WorldWideScience

Sample records for accurate automatic speech

  1. Why has (reasonably accurate) Automatic Speech Recognition been so hard to achieve?

    Wegmann, Steven

    2010-01-01

    Hidden Markov models (HMMs) have been successfully applied to automatic speech recognition for more than 35 years in spite of the fact that a key HMM assumption -- the statistical independence of frames -- is obviously violated by speech data. In fact, this data/model mismatch has inspired many attempts to modify or replace HMMs with alternative models that are better able to take into account the statistical dependence of frames. However it is fair to say that in 2010 the HMM is the consensus model of choice for speech recognition and that HMMs are at the heart of both commercially available products and contemporary research systems. In this paper we present a preliminary exploration aimed at understanding how speech data depart from HMMs and what effect this departure has on the accuracy of HMM-based speech recognition. Our analysis uses standard diagnostic tools from the field of statistics -- hypothesis testing, simulation and resampling -- which are rarely used in the field of speech recognition. Our ma...

  2. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference

    Lee, Byeongwook; Cho, Kwang-Hyun

    2016-11-01

    Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillation and the speech envelope has recently been obtained, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference to segment speech using its instantaneous phase information. We evaluated the proposed approach by the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test.

  3. Phoneme vs Grapheme Based Automatic Speech Recognition

    Magimai.-Doss, Mathew; Dines, John; Bourlard, Hervé; Hermansky, Hynek

    2004-01-01

    In recent literature, different approaches have been proposed to use graphemes as subword units with implicit source of phoneme information for automatic speech recognition. The major advantage of using graphemes as subword units is that the definition of lexicon is easy. In previous studies, results comparable to phoneme-based automatic speech recognition systems have been reported using context-independent graphemes or context-dependent graphemes with decision trees. In this paper, we study...

  4. Automatic Speech Segmentation Based on HMM

    M. Kroul

    2007-06-01

    Full Text Available This contribution deals with the problem of automatic phoneme segmentation using HMMs. Automatization of speech segmentation task is important for applications, where large amount of data is needed to process, so manual segmentation is out of the question. In this paper we focus on automatic segmentation of recordings, which will be used for triphone synthesis unit database creation. For speech synthesis, the speech unit quality is a crucial aspect, so the maximal accuracy in segmentation is needed here. In this work, different kinds of HMMs with various parameters have been trained and their usefulness for automatic segmentation is discussed. At the end of this work, some segmentation accuracy tests of all models are presented.

  5. A Statistical Approach to Automatic Speech Summarization

    Hori, Chiori; Furui, Sadaoki; Malkin, Rob; Yu, Hua; Waibel, Alex

    2003-12-01

    This paper proposes a statistical approach to automatic speech summarization. In our method, a set of words maximizing a summarization score indicating the appropriateness of summarization is extracted from automatically transcribed speech and then concatenated to create a summary. The extraction process is performed using a dynamic programming (DP) technique based on a target compression ratio. In this paper, we demonstrate how an English news broadcast transcribed by a speech recognizer is automatically summarized. We adapted our method, which was originally proposed for Japanese, to English by modifying the model for estimating word concatenation probabilities based on a dependency structure in the original speech given by a stochastic dependency context free grammar (SDCFG). We also propose a method of summarizing multiple utterances using a two-level DP technique. The automatically summarized sentences are evaluated by summarization accuracy based on a comparison with a manual summary of speech that has been correctly transcribed by human subjects. Our experimental results indicate that the method we propose can effectively extract relatively important information and remove redundant and irrelevant information from English news broadcasts.

  6. Personality in speech assessment and automatic classification

    Polzehl, Tim

    2015-01-01

    This work combines interdisciplinary knowledge and experience from research fields of psychology, linguistics, audio-processing, machine learning, and computer science. The work systematically explores a novel research topic devoted to automated modeling of personality expression from speech. For this aim, it introduces a novel personality assessment questionnaire and presents the results of extensive labeling sessions to annotate the speech data with personality assessments. It provides estimates of the Big 5 personality traits, i.e. openness, conscientiousness, extroversion, agreeableness, and neuroticism. Based on a database built on the questionnaire, the book presents models to tell apart different personality types or classes from speech automatically.

  7. Issues in acoustic modeling of speech for automatic speech recognition

    Gong, Yifan; Haton, Jean-Paul; Mari, Jean-François

    1994-01-01

    Projet RFIA; Stochastic modeling is a flexible method for handling the large variability in speech for recognition applications. In contrast to dynamic time warping where heuristic training methods for estimating word templates are used, stochastic modeling allows a probabilistic and automatic training for estimating models. This paper deals with the improvement of stochastic techniques, especially for a better representation of time varying phenomena.

  8. Hidden Markov models in automatic speech recognition

    Wrzoskowicz, Adam

    1993-11-01

    This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMM's designed to operate in a noisy environment. The author also describes a language for human-robot communications which was defined as a complex multilevel network from an HMM model of speech sounds geared towards Polish inflections. The author also added mandatory lexical and syntactic rules to the system for its communications vocabulary.

  9. Multilingual Vocabularies in Automatic Speech Recognition

    2000-08-01

    monolingual (a few thousands) is an obstacle to a full generalization of the inventories, then moved to the multilingual case. In the approach towards the...language. of multilingual models than the monolingual models, and it was specifically observed in the test with Spanish utterances. In fact...UNCLASSIFIED Defense Technical Information Center Compilation Part Notice ADP010389 TITLE: Multilingual Vocabularies in Automatic Speech Recognition

  10. Automatic Phonetic Transcription for Danish Speech Recognition

    Kirkedal, Andreas Søeborg

    Automatic speech recognition (ASR) uses dictionaries that map orthographic words to their phonetic representation. To minimize the occurrence of out-of-vocabulary words, ASR requires large phonetic dictionaries to model pronunciation. Hand-crafted high-quality phonetic dictionaries are difficult...... of automatic phonetic transcriptions vary greatly with respect to language and transcription strategy. For some languages where the difference between the graphemic and phonetic representations are small, graphemic transcriptions can be used to create ASR systems with acceptable performance. In other languages......, like Danish, the graphemic and phonetic representations are very dissimilar and more complex rewriting rules must be applied to create the correct phonetic representation. Automatic phonetic transcribers use different strategies, from deep analysis to shallow rewriting rules, to produce phonetic...

  11. Is markerless acquisition of speech production accurate ?

    Ouni, Slim; Dahmani, Sara

    2016-01-01

    International audience; In this study, the precision of markerless acquisition techniques have been assessed when used to acquire articulatory data for speech production studies. Two different markerless systems have been evaluated and compared to a marker-based one. The main finding is that both markerless systems provide reasonable result during normal speech and the quality is uneven during fast articulated speech. The quality of the data is dependent on the temporal resolution of the mark...

  12. Indonesian Automatic Speech Recognition For Command Speech Controller Multimedia Player

    Vivien Arief Wardhany

    2014-12-01

    Full Text Available The purpose of multimedia devices development is controlling through voice. Nowdays voice that can be recognized only in English. To overcome the issue, then recognition using Indonesian language model and accousticc model and dictionary. Automatic Speech Recognizier is build using engine CMU Sphinx with modified english language to Indonesian Language database and XBMC used as the multimedia player. The experiment is using 10 volunteers testing items based on 7 commands. The volunteers is classifiedd by the genders, 5 Male & 5 female. 10 samples is taken in each command, continue with each volunteer perform 10 testing command. Each volunteer also have to try all 7 command that already provided. Based on percentage clarification table, the word “Kanan” had the most recognize with percentage 83% while “pilih” is the lowest one. The word which had the most wrong clarification is “kembali” with percentagee 67%, while the word “kanan” is the lowest one. From the result of Recognition Rate by male there are several command such as “Kembali”, “Utama”, “Atas “ and “Bawah” has the low Recognition Rate. Especially for “kembali” cannot be recognized as the command in the female voices but in male voice that command has 4% of RR this is because the command doesn’t have similar word in english near to “kembali” so the system unrecognize the command. Also for the command “Pilih” using the female voice has 80% of RR but for the male voice has only 4% of RR. This problem is mostly because of the different voice characteristic between adult male and female which male has lower voice frequencies (from 85 to 180 Hz than woman (165 to 255 Hz.The result of the experiment showed that each man had different number of recognition rate caused by the difference tone, pronunciation, and speed of speech. For further work needs to be done in order to improving the accouracy of the Indonesian Automatic Speech Recognition system

  13. Automatic Speech Recognition from Neural Signals: A Focused Review

    Christian Herff

    2016-09-01

    Full Text Available Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to either loud environments, bothering bystanders or incapabilities to produce speech (i.e.~patients suffering from locked-in syndrome. For these reasons it would be highly desirable to not speak but to simply envision oneself to say words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people.This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefor better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography. As a first example of Automatic Speech Recognition techniques used from neural signals, we discuss the emph{Brain-to-text} system.

  14. Automatic speech recognition An evaluation of Google Speech

    Stenman, Magnus

    2015-01-01

    The use of speech recognition is increasing rapidly and is now available in smart TVs, desktop computers, every new smart phone, etc. allowing us to talk to computers naturally. With the use in home appliances, education and even in surgical procedures accuracy and speed becomes very important. This thesis aims to give an introduction to speech recognition and discuss its use in robotics. An evaluation of Google Speech, using Google’s speech API, in regards to word error rate and translation ...

  15. Confidence and rejection in automatic speech recognition

    Colton, Larry Don

    Automatic speech recognition (ASR) is performed imperfectly by computers. For some designated part (e.g., word or phrase) of the ASR output, rejection is deciding (yes or no) whether it is correct, and confidence is the probability (0.0 to 1.0) of it being correct. This thesis presents new methods of rejecting errors and estimating confidence for telephone speech. These are also called word or utterance verification and can be used in wordspotting or voice-response systems. Open-set or out-of-vocabulary situations are a primary focus. Language models are not considered. In vocabulary-dependent rejection all words in the target vocabulary are known in advance and a strategy can be developed for confirming each word. A word-specific artificial neural network (ANN) is shown to discriminate well, and scores from such ANNs are shown on a closed-set recognition task to reorder the N-best hypothesis list (N=3) for improved recognition performance. Segment-based duration and perceptual linear prediction (PLP) features are shown to perform well for such ANNs. The majority of the thesis concerns vocabulary- and task-independent confidence and rejection based on phonetic word models. These can be computed for words even when no training examples of those words have been seen. New techniques are developed using phoneme ranks instead of probabilities in each frame. These are shown to perform as well as the best other methods examined despite the data reduction involved. Certain new weighted averaging schemes are studied but found to give no performance benefit. Hierarchical averaging is shown to improve performance significantly: frame scores combine to make segment (phoneme state) scores, which combine to make phoneme scores, which combine to make word scores. Use of intermediate syllable scores is shown to not affect performance. Normalizing frame scores by an average of the top probabilities in each frame is shown to improve performance significantly. Perplexity of the wrong

  16. A Review on Speech Corpus Development for Automatic Speech Recognition in Indian Languages

    Cini kurian

    2015-05-01

    Full Text Available Corpus development gained much attention due to recent statistics based natural language processing. It has new applications in Language Technology, linguistic research, language education and information exchange. Corpus based Language research has an innovative outlook which will discard the aged linguistic theories. Speech corpus is the essential resources for building a speech recognizer. One of the main challenges faced by speech scientist is the unavailability of these resources. Very fewer efforts have been made in Indian languages to make these resources available to public compared to English. In this paper we review the efforts made in Indian languages for developing speech corpus for automatic speech recognition.

  17. Disordered Speech Assessment Using Automatic Methods Based on Quantitative Measures

    Christine Sapienza

    2005-06-01

    Full Text Available Speech quality assessment methods are necessary for evaluating and documenting treatment outcomes of patients suffering from degraded speech due to Parkinson's disease, stroke, or other disease processes. Subjective methods of speech quality assessment are more accurate and more robust than objective methods but are time-consuming and costly. We propose a novel objective measure of speech quality assessment that builds on traditional speech processing techniques such as dynamic time warping (DTW and the Itakura-Saito (IS distortion measure. Initial results show that our objective measure correlates well with the more expensive subjective methods.

  18. Modelling context in automatic speech recognition

    Wiggers, P.

    2008-01-01

    Speech is at the core of human communication. Speaking and listing comes so natural to us that we do not have to think about it at all. The underlying cognitive processes are very rapid and almost completely subconscious. It is hard, if not impossible not to understand speech. For computers on the o

  19. Automatic Blind Syllable Segmentation for Continuous Speech

    Villing, Rudi; Timoney, Joseph; Ward, Tomas

    2004-01-01

    In this paper a simple practical method for blind segmentation of continuous speech into its constituent syllables is presented. This technique which uses amplitude onset velocity and coarse spectral makeup to identify syllable boundaries is tested on a corpus of continuous speech and compared with an established segmentation algorithm. The results show substantial performance benefit using the proposed algorithm.

  20. Automatic speech signal segmentation based on the innovation adaptive filter

    Makowski Ryszard

    2014-06-01

    Full Text Available Speech segmentation is an essential stage in designing automatic speech recognition systems and one can find several algorithms proposed in the literature. It is a difficult problem, as speech is immensely variable. The aim of the authors’ studies was to design an algorithm that could be employed at the stage of automatic speech recognition. This would make it possible to avoid some problems related to speech signal parametrization. Posing the problem in such a way requires the algorithm to be capable of working in real time. The only such algorithm was proposed by Tyagi et al., (2006, and it is a modified version of Brandt’s algorithm. The article presents a new algorithm for unsupervised automatic speech signal segmentation. It performs segmentation without access to information about the phonetic content of the utterances, relying exclusively on second-order statistics of a speech signal. The starting point for the proposed method is time-varying Schur coefficients of an innovation adaptive filter. The Schur algorithm is known to be fast, precise, stable and capable of rapidly tracking changes in second order signal statistics. A transfer from one phoneme to another in the speech signal always indicates a change in signal statistics caused by vocal track changes. In order to allow for the properties of human hearing, detection of inter-phoneme boundaries is performed based on statistics defined on the mel spectrum determined from the reflection coefficients. The paper presents the structure of the algorithm, defines its properties, lists parameter values, describes detection efficiency results, and compares them with those for another algorithm. The obtained segmentation results, are satisfactory.

  1. Automatic discrimination between laughter and speech

    Truong, K.; Leeuwen, D. van

    2007-01-01

    Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues that can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker’s state and emotion can be revealed. This paper describes the dev

  2. Automatic Speech Recognition: Reliability and Pedagogical Implications for Teaching Pronunciation

    Kim, In-Seok

    2006-01-01

    This study examines the reliability of automatic speech recognition (ASR) software used to teach English pronunciation, focusing on one particular piece of software, "FluSpeak, as a typical example." Thirty-six Korean English as a Foreign Language (EFL) college students participated in an experiment in which they listened to 15 sentences…

  3. Jackson's Parrot: Samuel Beckett, Aphasic Speech Automatisms, and Psychosomatic Language.

    Salisbury, Laura; Code, Chris

    2016-06-01

    This article explores the relationship between automatic and involuntary language in the work of Samuel Beckett and late nineteenth-century neurological conceptions of language that emerged from aphasiology. Using the work of John Hughlings Jackson alongside contemporary neuroscientific research, we explore the significance of the lexical and affective symmetries between Beckett's compulsive and profoundly embodied language and aphasic speech automatisms. The interdisciplinary work in this article explores the paradox of how and why Beckett was able to search out a longed-for language of feeling that might disarticulate the classical bond between the language, intention, rationality and the human, in forms of expression that seem automatic and "readymade".

  4. Automatic Emotion Recognition in Speech: Possibilities and Significance

    Milana Bojanić

    2009-12-01

    Full Text Available Automatic speech recognition and spoken language understanding are crucial steps towards a natural humanmachine interaction. The main task of the speech communication process is the recognition of the word sequence, but the recognition of prosody, emotion and stress tags may be of particular importance as well. This paper discusses thepossibilities of recognition emotion from speech signal in order to improve ASR, and also provides the analysis of acoustic features that can be used for the detection of speaker’s emotion and stress. The paper also provides a short overview of emotion and stress classification techniques. The importance and place of emotional speech recognition is shown in the domain of human-computer interactive systems and transaction communication model. The directions for future work are given at the end of this work.

  5. Automatic speech segmentation using throat-acoustic correlation coefficients

    Mussabayev, Rustam Rafikovich; Kalimoldayev, Maksat N.; Amirgaliyev, Yedilkhan N.; Mussabayev, Timur R.

    2016-11-01

    This work considers one of the approaches to the solution of the task of discrete speech signal automatic segmentation. The aim of this work is to construct such an algorithm which should meet the following requirements: segmentation of a signal into acoustically homogeneous segments, high accuracy and segmentation speed, unambiguity and reproducibility of segmentation results, lack of necessity of preliminary training with the use of a special set consisting of manually segmented signals. Development of the algorithm which corresponds to the given requirements was conditioned by the necessity of formation of automatically segmented speech databases that have a large volume. One of the new approaches to the solution of this task is viewed in this article. For this purpose we use the new type of informative features named TAC-coefficients (Throat-Acoustic Correlation coefficients) which provide sufficient segmentation accuracy and effi- ciency.

  6. Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders

    Baghai-Ravary, Ladan

    2013-01-01

    Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders provides a survey of methods designed to aid clinicians in the diagnosis and monitoring of speech disorders such as dysarthria and dyspraxia, with an emphasis on the signal processing techniques, statistical validity of the results presented in the literature, and the appropriateness of methods that do not require specialized equipment, rigorously controlled recording procedures or highly skilled personnel to interpret results. Such techniques offer the promise of a simple and cost-effective, yet objective, assessment of a range of medical conditions, which would be of great value to clinicians. The ideal scenario would begin with the collection of examples of the clients’ speech, either over the phone or using portable recording devices operated by non-specialist nursing staff. The recordings could then be analyzed initially to aid diagnosis of conditions, and subsequently to monitor the clients’ progress and res...

  7. The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system

    Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.

    2009-01-01

    Objective: The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that spe

  8. Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

    Ye, Sherry

    2015-01-01

    NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.

  9. Robust Automatic Speech Recognition in Impulsive Noise Environment

    DINGPei; CAOZhigang

    2005-01-01

    This paper presents an efficient method to directly suppress the effect of impulsive noise for robust Automatic speech recognition (ASR). In this method, according to the noise sensitivity of each feature dimension,the observation vectors are divided into several parts, eachof which is assigned to a proper threshold. In recognition stage, the unreliable probability preponderance of incorrect competing path caused by impulsive noise is eliminated by Flooring observation probability (FOP) of eachfeature sub-vector at the Gaussian mixture level, so that the correct path will recover the priority of being chosen in decoding. Experimental results also demonstrate that the proposed method can significantly improve the recognition accuracy both in machinegun noise and simulated impulsive noise environments, while maintaining high performance for clean speech recognition.

  10. Studies on inter-speaker variability in speech and its application in automatic speech recognition

    S Umesh

    2011-10-01

    In this paper, we give an overview of the problem of inter-speaker variability and its study in many diverse areas of speech signal processing. We first give an overview of vowel-normalization studies that minimize variations in the acoustic representation of vowel realizations by different speakers. We then describe the universal-warping approach to speaker normalization which unifies many of the vowel normalization approaches and also shows the relation between speech production, perception and auditory processing. We then address the problem of inter-speaker variability in automatic speech recognition (ASR) and describe techniques that are used to reduce these effects and thereby improve the performance of speaker-independent ASR systems.

  11. Robust, accurate and fast automatic segmentation of the spinal cord.

    De Leener, Benjamin; Kadoury, Samuel; Cohen-Adad, Julien

    2014-09-01

    Spinal cord segmentation provides measures of atrophy and facilitates group analysis via inter-subject correspondence. Automatizing this procedure enables studies with large throughput and minimizes user bias. Although several automatic segmentation methods exist, they are often restricted in terms of image contrast and field-of-view. This paper presents a new automatic segmentation method (PropSeg) optimized for robustness, accuracy and speed. The algorithm is based on the propagation of a deformable model and is divided into three parts: firstly, an initialization step detects the spinal cord position and orientation using a circular Hough transform on multiple axial slices rostral and caudal to the starting plane and builds an initial elliptical tubular mesh. Secondly, a low-resolution deformable model is propagated along the spinal cord. To deal with highly variable contrast levels between the spinal cord and the cerebrospinal fluid, the deformation is coupled with a local contrast-to-noise adaptation at each iteration. Thirdly, a refinement process and a global deformation are applied on the propagated mesh to provide an accurate segmentation of the spinal cord. Validation was performed in 15 healthy subjects and two patients with spinal cord injury, using T1- and T2-weighted images of the entire spinal cord and on multiecho T2*-weighted images. Our method was compared against manual segmentation and against an active surface method. Results show high precision for all the MR sequences. Dice coefficients were 0.9 for the T1- and T2-weighted cohorts and 0.86 for the T2*-weighted images. The proposed method runs in less than 1min on a normal computer and can be used to quantify morphological features such as cross-sectional area along the whole spinal cord.

  12. Difficulties in Automatic Speech Recognition of Dysarthric Speakers and Implications for Speech-Based Applications Used by the Elderly: A Literature Review

    Young, Victoria; Mihailidis, Alex

    2010-01-01

    Despite their growing presence in home computer applications and various telephony services, commercial automatic speech recognition technologies are still not easily employed by everyone; especially individuals with speech disorders. In addition, relatively little research has been conducted on automatic speech recognition performance with older…

  13. Behavioral and Electrophysiological Evidence for Early and Automatic Detection of Phonological Equivalence in Variable Speech Inputs

    Kharlamov, Viktor; Campbell, Kenneth; Kazanina, Nina

    2011-01-01

    Speech sounds are not always perceived in accordance with their acoustic-phonetic content. For example, an early and automatic process of perceptual repair, which ensures conformity of speech inputs to the listener's native language phonology, applies to individual input segments that do not exist in the native inventory or to sound sequences that…

  14. Developing and Evaluating an Oral Skills Training Website Supported by Automatic Speech Recognition Technology

    Chen, Howard Hao-Jan

    2011-01-01

    Oral communication ability has become increasingly important to many EFL students. Several commercial software programs based on automatic speech recognition (ASR) technologies are available but their prices are not affordable for many students. This paper will demonstrate how the Microsoft Speech Application Software Development Kit (SASDK), a…

  15. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C., E-mail: calexandre@ien.gov.b, E-mail: mol@ien.gov.b, E-mail: cmnap@ien.gov.b, E-mail: mag@ien.gov.b [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil); Nomiya, Diogo V., E-mail: diogonomiya@gmail.co [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil)

    2009-07-01

    This work reports the implementation of a man-system interface based on automatic speech recognition, and its integration to a virtual nuclear power plant control desk. The later is aimed to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purpose. An automatic speech recognition system was developed to serve as a new interface with users, substituting computer keyboard and mouse. They can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis, and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)

  16. Noise robust automatic speech recognition with adaptive quantile based noise estimation and speech band emphasizing filter bank

    Bonde, Casper Stork; Graversen, Carina; Gregersen, Andreas Gregers;

    2005-01-01

    An important topic in Automatic Speech Recognition (ASR) is to reduce the effect of noise, in particular when mismatch exists between the training and application conditions. Many noise robutness schemes within the feature processing domain use as a prerequisite a noise estimate prior...... to the appearance of the speech signal which require noise robust voice activity detection and assumptions of stationary noise. However, both of these requirements are often not met and it is therefore of particular interest to investigate methods like the Quantile Based Noise Estimation (QBNE) mehtod which...... estimates the noise during speech and non-speech sections without the use of a voice activity detector. While the standard QBNE-method uses a fixed pre-defined quantile accross all frequency bands, this paper suggests adaptive QBNE (AQBNE) which adapts the quantile individually to each frequency band...

  17. Machine learning based sample extraction for automatic speech recognition using dialectal Assamese speech.

    Agarwalla, Swapna; Sarma, Kandarpa Kumar

    2016-06-01

    Automatic Speaker Recognition (ASR) and related issues are continuously evolving as inseparable elements of Human Computer Interaction (HCI). With assimilation of emerging concepts like big data and Internet of Things (IoT) as extended elements of HCI, ASR techniques are found to be passing through a paradigm shift. Oflate, learning based techniques have started to receive greater attention from research communities related to ASR owing to the fact that former possess natural ability to mimic biological behavior and that way aids ASR modeling and processing. The current learning based ASR techniques are found to be evolving further with incorporation of big data, IoT like concepts. Here, in this paper, we report certain approaches based on machine learning (ML) used for extraction of relevant samples from big data space and apply them for ASR using certain soft computing techniques for Assamese speech with dialectal variations. A class of ML techniques comprising of the basic Artificial Neural Network (ANN) in feedforward (FF) and Deep Neural Network (DNN) forms using raw speech, extracted features and frequency domain forms are considered. The Multi Layer Perceptron (MLP) is configured with inputs in several forms to learn class information obtained using clustering and manual labeling. DNNs are also used to extract specific sentence types. Initially, from a large storage, relevant samples are selected and assimilated. Next, a few conventional methods are used for feature extraction of a few selected types. The features comprise of both spectral and prosodic types. These are applied to Recurrent Neural Network (RNN) and Fully Focused Time Delay Neural Network (FFTDNN) structures to evaluate their performance in recognizing mood, dialect, speaker and gender variations in dialectal Assamese speech. The system is tested under several background noise conditions by considering the recognition rates (obtained using confusion matrices and manually) and computation time

  18. Multiclassifier fusion of an ultrasonic lip reader in automatic speech recognition

    Jennnings, David L.

    1994-12-01

    This thesis investigates the use of two active ultrasonic devices in collecting lip information for performing and enhancing automatic speech recognition. The two devices explored are called the 'Ultrasonic Mike' and the 'Lip Lock Loop.' The devices are tested in a speaker dependent isolated word recognition task with a vocabulary consisting of the spoken digits from zero to nine. Two automatic lip readers are designed and tested based on the output of the ultrasonic devices. The automatic lip readers use template matching and dynamic time warping to determine the best candidate for a given test utterance. The automatic lip readers alone achieve accuracies of 65-89%, depending on the number of reference templates used. Next the automatic lip reader is combined with a conventional automatic speech recognizer. Both classifier level fusion and feature level fusion are investigated. Feature fusion is based on combining the feature vectors prior to dynamic time warping. Classifier fusion is based on a pseudo probability mass function derived from the dynamic time warping distances. The combined systems are tested with various levels of acoustic noise added. In one typical test, at a signal to noise ratio of 0dB, the acoustic recognizer's accuracy alone was 78%, the automatic lip reader's accuracy was 69%, but the combined accuracy was 93%. This experiment demonstrates that a simple ultrasonic lip motion detector, that has an output data rate 12,500 times less than a typical video camera, can significantly improve the accuracy of automatic speech recognition in noise.

  19. Sensitivity Based Segmentation and Identification in Automatic Speech Recognition.

    1984-03-30

    by a network constructed from phonemic, phonetic , and phonological rules. Regardless of the speech processing system used, Klatt 1 2 has described...analysis, and its use in the segmentation and identification of the phonetic units of speech, that was initiated during the 1982 Summer Faculty Research...practicable framework for incorporation of acoustic- phonetic variance as well as time and talker normalization. XOI iF- ? ’:: .:- .- . . l ] 2 D

  20. Fusing Eye-gaze and Speech Recognition for Tracking in an Automatic Reading Tutor

    Rasmussen, Morten Højfeldt; Tan, Zheng-Hua

    2013-01-01

    In this paper we present a novel approach for automatically tracking the reading progress using a combination of eye-gaze tracking and speech recognition. The two are fused by first generating word probabilities based on eye-gaze information and then using these probabilities to augment the langu......In this paper we present a novel approach for automatically tracking the reading progress using a combination of eye-gaze tracking and speech recognition. The two are fused by first generating word probabilities based on eye-gaze information and then using these probabilities to augment...

  1. A Review on Speech Corpus Development for Automatic Speech Recognition in Indian Languages

    Cini kurian

    2015-01-01

    Corpus development gained much attention due to recent statistics based natural language processing. It has new applications in Language Technology, linguistic research, language education and information exchange. Corpus based Language research has an innovative outlook which will discard the aged linguistic theories. Speech corpus is the essential resources for building a speech recognizer. One of the main challenges faced by speech scientist is the unavailability of these resources. Very f...

  2. Automatic classification and accurate size measurement of blank mask defects

    Bhamidipati, Samir; Paninjath, Sankaranarayanan; Pereira, Mark; Buck, Peter

    2015-07-01

    A blank mask and its preparation stages, such as cleaning or resist coating, play an important role in the eventual yield obtained by using it. Blank mask defects' impact analysis directly depends on the amount of available information such as the number of defects observed, their accurate locations and sizes. Mask usability qualification at the start of the preparation process, is crudely based on number of defects. Similarly, defect information such as size is sought to estimate eventual defect printability on the wafer. Tracking of defect characteristics, specifically size and shape, across multiple stages, can further be indicative of process related information such as cleaning or coating process efficiencies. At the first level, inspection machines address the requirement of defect characterization by detecting and reporting relevant defect information. The analysis of this information though is still largely a manual process. With advancing technology nodes and reducing half-pitch sizes, a large number of defects are observed; and the detailed knowledge associated, make manual defect review process an arduous task, in addition to adding sensitivity to human errors. Cases where defect information reported by inspection machine is not sufficient, mask shops rely on other tools. Use of CDSEM tools is one such option. However, these additional steps translate into increased costs. Calibre NxDAT based MDPAutoClassify tool provides an automated software alternative to the manual defect review process. Working on defect images generated by inspection machines, the tool extracts and reports additional information such as defect location, useful for defect avoidance[4][5]; defect size, useful in estimating defect printability; and, defect nature e.g. particle, scratch, resist void, etc., useful for process monitoring. The tool makes use of smart and elaborate post-processing algorithms to achieve this. Their elaborateness is a consequence of the variety and

  3. Evaluating Automatic Speech Recognition-Based Language Learning Systems: A Case Study

    van Doremalen, Joost; Boves, Lou; Colpaert, Jozef; Cucchiarini, Catia; Strik, Helmer

    2016-01-01

    The purpose of this research was to evaluate a prototype of an automatic speech recognition (ASR)-based language learning system that provides feedback on different aspects of speaking performance (pronunciation, morphology and syntax) to students of Dutch as a second language. We carried out usability reviews, expert reviews and user tests to…

  4. Automatic Speech Recognition Technology as an Effective Means for Teaching Pronunciation

    Elimat, Amal Khalil; AbuSeileek, Ali Farhan

    2014-01-01

    This study aimed to explore the effect of using automatic speech recognition technology (ASR) on the third grade EFL students' performance in pronunciation, whether teaching pronunciation through ASR is better than regular instruction, and the most effective teaching technique (individual work, pair work, or group work) in teaching pronunciation…

  5. An open-set detection evaluation methodology for automatic emotion recognition in speech

    Truong, K.P.; Leeuwen, D.A. van

    2007-01-01

    In this paper, we present a detection approach and an ‘open-set’ detection evaluation methodology for automatic emotion recognition in speech. The traditional classification approach does not seem to be suitable and flexible enough for typical emotion recognition tasks. For example, classification d

  6. AUTOMATIC SPEECH RECOGNITION SYSTEM CONCERNING THE MOROCCAN DIALECTE (Darija and Tamazight)

    A. EL GHAZI; Daoui, C.; Idrissi, N

    2012-01-01

    In this work we present an automatic speech recognition system for Moroccan dialect mainly: Darija (Arab dialect) and Tamazight. Many approaches have been used to model the Arabic and Tamazightphonetic units. In this paper, we propose to use the hidden Markov model (HMM) for modeling these phoneticunits. Experimental results show that the proposed approach further improves the recognition.

  7. Errors in Automatic Speech Recognition versus Difficulties in Second Language Listening

    Mirzaei, Maryam Sadat; Meshgi, Kourosh; Akita, Yuya; Kawahara, Tatsuya

    2015-01-01

    Automatic Speech Recognition (ASR) technology has become a part of contemporary Computer-Assisted Language Learning (CALL) systems. ASR systems however are being criticized for their erroneous performance especially when utilized as a mean to develop skills in a Second Language (L2) where errors are not tolerated. Nevertheless, these errors can…

  8. Cross-modal enhancement of the MMN to speech-sounds indicates early and automatic integration of letters and speech-sounds.

    Froyen, Dries; Van Atteveldt, Nienke; Bonte, Milene; Blomert, Leo

    2008-01-01

    Recently brain imaging evidence indicated that letter/speech-sound integration, necessary for establishing fluent reading, takes place in auditory association areas and that the integration is influenced by stimulus onset asynchrony (SOA) between the letter and the speech-sound. In the present study, we used a specific ERP measure known for its automatic character, the mismatch negativity (MMN), to investigate the time course and automaticity of letter/speech-sound integration. We studied the effect of visual letters and SOA on the MMN elicited by a deviant speech-sound. We found a clear enhancement of the MMN by simultaneously presenting a letter, but without changing the auditory stimulation. This enhancement diminishes linearly with increasing SOA. These results suggest that letters and speech-sounds are processed as compound stimuli early and automatically in the auditory association cortex of fluent readers and that this processing is strongly dependent on timing.

  9. Native language shapes automatic neural processing of speech.

    Intartaglia, Bastien; White-Schwoch, Travis; Meunier, Christine; Roman, Stéphane; Kraus, Nina; Schön, Daniele

    2016-08-01

    The development of the phoneme inventory is driven by the acoustic-phonetic properties of one's native language. Neural representation of speech is known to be shaped by language experience, as indexed by cortical responses, and recent studies suggest that subcortical processing also exhibits this attunement to native language. However, most work to date has focused on the differences between tonal and non-tonal languages that use pitch variations to convey phonemic categories. The aim of this cross-language study is to determine whether subcortical encoding of speech sounds is sensitive to language experience by comparing native speakers of two non-tonal languages (French and English). We hypothesized that neural representations would be more robust and fine-grained for speech sounds that belong to the native phonemic inventory of the listener, and especially for the dimensions that are phonetically relevant to the listener such as high frequency components. We recorded neural responses of American English and French native speakers, listening to natural syllables of both languages. Results showed that, independently of the stimulus, American participants exhibited greater neural representation of the fundamental frequency compared to French participants, consistent with the importance of the fundamental frequency to convey stress patterns in English. Furthermore, participants showed more robust encoding and more precise spectral representations of the first formant when listening to the syllable of their native language as compared to non-native language. These results align with the hypothesis that language experience shapes sensory processing of speech and that this plasticity occurs as a function of what is meaningful to a listener.

  10. Preliminary Analysis of Automatic Speech Recognition and Synthesis Technology.

    1983-05-01

    but also suprasegmental (prosodic) information, such as appropriate stress levels and intonation patterns, to improve their output. This is a definite...Society of America Paper, 1975. " Suprasegmentals ." Cambridge, Mass.: MIT Press, 1970. M4akhoul, J., Viswanathan, R., and Huggins, W. F. "A Mixed...the " suprasegmentals " or influences of duration, fundamental frequency, and speech product ion power upon basic phonemes. These ...- 102- influences

  11. Arabic Language Learning Assisted by Computer, based on Automatic Speech Recognition

    Terbeh, Naim

    2012-01-01

    This work consists of creating a system of the Computer Assisted Language Learning (CALL) based on a system of Automatic Speech Recognition (ASR) for the Arabic language using the tool CMU Sphinx3 [1], based on the approach of HMM. To this work, we have constructed a corpus of six hours of speech recordings with a number of nine speakers. we find in the robustness to noise a grounds for the choice of the HMM approach [2]. the results achieved are encouraging since our corpus is made by only nine speakers, but they are always reasons that open the door for other improvement works.

  12. [Repetitive phenomenona in the spontaneous speech of aphasic patients: perseveration, stereotypy, echolalia, automatism and recurring utterance].

    Wallesch, C W; Brunner, R J; Seemüller, E

    1983-12-01

    Repetitive phenomena in spontaneous speech were investigated in 30 patients with chronic infarctions of the left hemisphere which included Broca's and/or Wernicke's area and/or the basal ganglia. Perseverations, stereotypies, and echolalias occurred with all types of brain lesions, automatisms and recurring utterances only with those patients, whose infarctions involved Wernicke's area and basal ganglia. These patients also showed more echolalic responses. The results are discussed in view of the role of the basal ganglia as motor program generators.

  13. Automatic evaluation of speech rhythm instability and acceleration in dysarthrias associated with basal ganglia dysfunction

    Jan eRusz

    2015-07-01

    Full Text Available Speech rhythm abnormalities are commonly present in patients with different neurodegenerative disorders. These alterations are hypothesized to be a consequence of disruption to the basal ganglia circuitry involving dysfunction of motor planning, programming and execution, which can be detected by a syllable repetition paradigm. Therefore, the aim of the present study was to design a robust signal processing technique that allows the automatic detection of spectrally-distinctive nuclei of syllable vocalizations and to determine speech features that represent rhythm instability and acceleration. A further aim was to elucidate specific patterns of dysrhythmia across various neurodegenerative disorders that share disruption of basal ganglia function. Speech samples based on repetition of the syllable /pa/ at a self-determined steady pace were acquired from 109 subjects, including 22 with Parkinson's disease (PD, 11 progressive supranuclear palsy (PSP, 9 multiple system atrophy (MSA, 24 ephedrone-induced parkinsonism (EP, 20 Huntington's disease (HD, and 23 healthy controls. Subsequently, an algorithm for the automatic detection of syllables as well as features representing rhythm instability and rhythm acceleration were designed. The proposed detection algorithm was able to correctly identify syllables and remove erroneous detections due to excessive inspiration and nonspeech sounds with a very high accuracy of 99.6%. Instability of vocal pace performance was observed in PSP, MSA, EP and HD groups. Significantly increased pace acceleration was observed only in the PD group. Although not significant, a tendency for pace acceleration was observed also in the PSP and MSA groups. Our findings underline the crucial role of the basal ganglia in the execution and maintenance of automatic speech motor sequences. We envisage the current approach to become the first step towards the development of acoustic technologies allowing automated assessment of rhythm

  14. Deformable meshes for medical image segmentation accurate automatic segmentation of anatomical structures

    Kainmueller, Dagmar

    2014-01-01

    ? Segmentation of anatomical structures in medical image data is an essential task in clinical practice. Dagmar Kainmueller introduces methods for accurate fully automatic segmentation of anatomical structures in 3D medical image data. The author's core methodological contribution is a novel deformation model that overcomes limitations of state-of-the-art Deformable Surface approaches, hence allowing for accurate segmentation of tip- and ridge-shaped features of anatomical structures. As for practical contributions, she proposes application-specific segmentation pipelines for a range of anatom

  15. An exploration of the potential of Automatic Speech Recognition to assist and enable receptive communication in higher education

    Mike Wald

    2006-12-01

    Full Text Available The potential use of Automatic Speech Recognition to assist receptive communication is explored. The opportunities and challenges that this technology presents students and staff to provide captioning of speech online or in classrooms for deaf or hard of hearing students and assist blind, visually impaired or dyslexic learners to read and search learning material more readily by augmenting synthetic speech with natural recorded real speech is also discussed and evaluated. The automatic provision of online lecture notes, synchronised with speech, enables staff and students to focus on learning and teaching issues, while also benefiting learners unable to attend the lecture or who find it difficult or impossible to take notes at the same time as listening, watching and thinking.

  16. Automatic transcription of continuous speech into syllable-like units for Indian languages

    G Lakshmi Sarada; A Lakshmi; Hema A Murthy; T Nagarajan

    2009-04-01

    The focus of this paper is to automatically segment and label continuous speech signal into syllable-like units for Indian languages. In this approach, the continuous speech signal is first automatically segmented into syllable-like units using group delay based algorithm. Similar syllable segments are then grouped together using an unsupervised and incremental training (UIT) technique. Isolated style HMM models are generated for each of the clusters during training. During testing, the speech signal is segmented into syllable-like units which are then tested against the HMMs obtained during training. This results in a syllable recognition performance of 42·6% and 39·94% for Tamil and Telugu. A new feature extraction technique that uses features extracted from multiple frame sizes and frame rates during both training and testing is explored for the syllable recognition task. This results in a recognition performance of 48·7% and 45·36%, for Tamil and Telugu respectively. The performance of segmentation followed by labelling is superior to that of a flat start syllable recogniser (27·8% and 28·8% for Tamil and Telugu respectively).

  17. AUTOMATIC SPEECH RECOGNITION – THE MAIN STAGES OVER LAST 50 YEARS

    I. B. Tampel

    2015-11-01

    Full Text Available The main stages of automatic speech recognition systems over last 50 years are regarded. The attempt is made to evaluate different methods in the context of approaching to functioning of biological systems. The method implementation based on dynamic programming algorithm and done in 1968 is considered as a benchmark. Shortcomings of the method, which make it possible to use it only for command recognition, are considered. The next method considered is based on a formalism of Markov chains. Based on the notion of coarticulation the necessity of applying context dependent triphones and biphones instead of context independent phonemes is shown. The problems of insufficiency of speech databases for triphone training which lead to state tying methods are explained. The importance of model adaptation and feature normalization methods providing better invariance to speakers, communication channels and additive noise are shown. Deep Neural Networks and Recurrent Networks are considered as the most up-to-date methods. The similarity of deep (multilayer neural networks and biological systems is noted. In conclusion, the problems and drawbacks of the modern systems of automatic speech recognition are described and prognosis of their development is given.

  18. A HYBRID METHOD FOR AUTOMATIC SPEECH RECOGNITION PERFORMANCE IMPROVEMENT IN REAL WORLD NOISY ENVIRONMENT

    Urmila Shrawankar

    2013-01-01

    Full Text Available It is a well known fact that, speech recognition systems perform well when the system is used in conditions similar to the one used to train the acoustic models. However, mismatches degrade the performance. In adverse environment, it is very difficult to predict the category of noise in advance in case of real world environmental noise and difficult to achieve environmental robustness. After doing rigorous experimental study it is observed that, a unique method is not available that will clean the noisy speech as well as preserve the quality which have been corrupted by real natural environmental (mixed noise. It is also observed that only back-end techniques are not sufficient to improve the performance of a speech recognition system. It is necessary to implement performance improvement techniques at every step of back-end as well as front-end of the Automatic Speech Recognition (ASR model. Current recognition systems solve this problem using a technique called adaptation. This study presents an experimental study that aims two points, first is to implement the hybrid method that will take care of clarifying the speech signal as much as possible with all combinations of filters and enhancement techniques. The second point is to develop a method for training all categories of noise that can adapt the acoustic models for a new environment that will help to improve the performance of the speech recognizer under real world environmental mismatched conditions. This experiment confirms that hybrid adaptation methods improve the ASR performance on both levels, (Signal-to-Noise Ratio SNR improvement as well as word recognition accuracy in real world noisy environment.

  19. Automatic Speech Segmentation Based On Audio and Optical Flow Visual Classification

    Behnam Torabi

    2014-10-01

    Full Text Available Automatic speech segmentation as an important part of speech recognition system (ASR is highly noise dependent. Noise is made by changes in the communication channel, background, level of speaking etc. In recent years, many researchers have proposed noise cancelation techniques and have added visual features from speaker’s face to reduce the effect of noise on ASR systems. Removing noise from audio signals depends on the type of the noise; so it cannot be used as a general solution. Adding visual features improve this lack of efficiency, but advanced methods of this type need manual extraction of visual features. In this paper we propose a completely automatic system which uses optical flow vectors from speaker’s image sequence to obtain visual features. Then, Hidden Markov Models are trained to segment audio signals from image sequences and audio features based on extracted optical flow. The developed segmentation system based on such method acts totally automatic and become more robust to noise.

  20. An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition

    Turicchia Lorenzo

    2007-01-01

    Full Text Available We describe an FFT-based companding algorithm for preprocessing speech before recognition. The algorithm mimics tone-to-tone suppression and masking in the auditory system to improve automatic speech recognition performance in noise. Moreover, it is also very computationally efficient and suited to digital implementations due to its use of the FFT. In an automotive digits recognition task with the CU-Move database recorded in real environmental noise, the algorithm improves the relative word error by 12.5% at -5 dB signal-to-noise ratio (SNR and by 6.2% across all SNRs (-5 dB SNR to +5 dB SNR. In the Aurora-2 database recorded with artificially added noise in several environments, the algorithm improves the relative word error rate in almost all situations.

  1. An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition

    Bhiksha Raj

    2007-06-01

    Full Text Available We describe an FFT-based companding algorithm for preprocessing speech before recognition. The algorithm mimics tone-to-tone suppression and masking in the auditory system to improve automatic speech recognition performance in noise. Moreover, it is also very computationally efficient and suited to digital implementations due to its use of the FFT. In an automotive digits recognition task with the CU-Move database recorded in real environmental noise, the algorithm improves the relative word error by 12.5% at −5 dB signal-to-noise ratio (SNR and by 6.2% across all SNRs (−5 dB SNR to +15 dB SNR. In the Aurora-2 database recorded with artificially added noise in several environments, the algorithm improves the relative word error rate in almost all situations.

  2. Language modeling for automatic speech recognition of inflective languages an applications-oriented approach using lexical data

    Donaj, Gregor

    2017-01-01

    This book covers language modeling and automatic speech recognition for inflective languages (e.g. Slavic languages), which represent roughly half of the languages spoken in Europe. These languages do not perform as well as English in speech recognition systems and it is therefore harder to develop an application with sufficient quality for the end user. The authors describe the most important language features for the development of a speech recognition system. This is then presented through the analysis of errors in the system and the development of language models and their inclusion in speech recognition systems, which specifically address the errors that are relevant for targeted applications. The error analysis is done with regard to morphological characteristics of the word in the recognized sentences. The book is oriented towards speech recognition with large vocabularies and continuous and even spontaneous speech. Today such applications work with a rather small number of languages compared to the nu...

  3. An HMM-Like Dynamic Time Warping Scheme for Automatic Speech Recognition

    Ing-Jr Ding

    2014-01-01

    Full Text Available In the past, the kernel of automatic speech recognition (ASR is dynamic time warping (DTW, which is feature-based template matching and belongs to the category technique of dynamic programming (DP. Although DTW is an early developed ASR technique, DTW has been popular in lots of applications. DTW is playing an important role for the known Kinect-based gesture recognition application now. This paper proposed an intelligent speech recognition system using an improved DTW approach for multimedia and home automation services. The improved DTW presented in this work, called HMM-like DTW, is essentially a hidden Markov model- (HMM- like method where the concept of the typical HMM statistical model is brought into the design of DTW. The developed HMM-like DTW method, transforming feature-based DTW recognition into model-based DTW recognition, will be able to behave as the HMM recognition technique and therefore proposed HMM-like DTW with the HMM-like recognition model will have the capability to further perform model adaptation (also known as speaker adaptation. A series of experimental results in home automation-based multimedia access service environments demonstrated the superiority and effectiveness of the developed smart speech recognition system by HMM-like DTW.

  4. Peripheral Nonlinear Time Spectrum Features Algorithm for Large Vocabulary Mandarin Automatic Speech Recognition

    Fadhil H. T. Al-dulaimy; WANG Zuoying

    2005-01-01

    This work describes an improved feature extractor algorithm to extract the peripheral features of point x(ti,fj) using a nonlinear algorithm to compute the nonlinear time spectrum (NL-TS) pattern. The algorithm observes n×n neighborhoods of the point in all directions, and then incorporates the peripheral features using the Mel frequency cepstrum components (MFCCs)-based feature extractor of the Tsinghua electronic engineering speech processing (THEESP) for Mandarin automatic speech recognition (MASR) system as replacements of the dynamic features with different feature combinations. In this algorithm, the orthogonal bases are extracted directly from the speech data using discrite cosime transformation (DCT) with 3×3 blocks on an NL-TS pattern as the peripheral features. The new primal bases are then selected and simplified in the form of the operator in the time direction and the operator in the frequency direction. The algorithm has 23.29% improvements of the relative error rate in comparison with the standard MFCC feature-set and the dynamic features in tests using THEESP with the duration distribution-based hidden Markov model (DDBHMM) based on MASR system.

  5. Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?

    Héctor Delgado

    2015-12-01

    Full Text Available This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.

  6. Suprasegmental speech cues are automatically processed by the human brain: a mismatch negativity study.

    Honbolygó, Ferenc; Csépe, Valéria; Ragó, Anett

    2004-06-03

    This study investigates the electrical brain activity correlates of the automatic detection of suprasegmental and local speech cues by using a passive oddball paradigm, in which the standard Hungarian word 'banán' ('banana' in English) was contrasted with two deviants: a voiceless phoneme deviant ('panán'), and a stress deviant, where the stress was on the second syllable, instead of the obligatory first one. As a result, we obtained the mismatch negativity component (MMN) of event-related brain potentials in each condition. The stress deviant elicited two MMNs: one as a response to the lack of stress as compared to the standard stimulus, and another to the additional stress. Our results support that the MMN is as valuable in investigating processing characteristics of suprasegmental features as in that of phonemic features. MMN data may provide further insight into pre-attentive processes contributing to spoken word recognition.

  7. Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

    Umit H. Yapanel

    2008-08-01

    Full Text Available A proven method for achieving effective automatic speech recognition (ASR due to speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed which require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN, despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN, where normalization is performed on-the-fly within a newly proposed PMVDR acoustic front end. The novel algorithm aspect is that in conventional frontend processing with PMVDR and VTLN, two separating warping phases are needed; while in the proposed BISN method only one single speaker dependent warp is used to achieve both the PMVDR perceptual warp and VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER by 24%, and (ii for a diverse noisy speech task (SPINE 2, where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.

  8. Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

    Yapanel UmitH

    2008-01-01

    Full Text Available A proven method for achieving effective automatic speech recognition (ASR due to speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed which require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN, despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN, where normalization is performed on-the-fly within a newly proposed PMVDR acoustic front end. The novel algorithm aspect is that in conventional frontend processing with PMVDR and VTLN, two separating warping phases are needed; while in the proposed BISN method only one single speaker dependent warp is used to achieve both the PMVDR perceptual warp and VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER by 24%, and (ii for a diverse noisy speech task (SPINE 2, where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.

  9. Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition

    Skowronski, Mark D.; Harris, John G.

    2004-09-01

    Mel frequency cepstral coefficients (MFCC) are the most widely used speech features in automatic speech recognition systems, primarily because the coefficients fit well with the assumptions used in hidden Markov models and because of the superior noise robustness of MFCC over alternative feature sets such as linear prediction-based coefficients. The authors have recently introduced human factor cepstral coefficients (HFCC), a modification of MFCC that uses the known relationship between center frequency and critical bandwidth from human psychoacoustics to decouple filter bandwidth from filter spacing. In this work, the authors introduce a variation of HFCC called HFCC-E in which filter bandwidth is linearly scaled in order to investigate the effects of wider filter bandwidth on noise robustness. Experimental results show an increase in signal-to-noise ratio of 7 dB over traditional MFCC algorithms when filter bandwidth increases in HFCC-E. An important attribute of both HFCC and HFCC-E is that the algorithms only differ from MFCC in the filter bank coefficients: increased noise robustness using wider filters is achieved with no additional computational cost.

  10. Long term Suboxone™ emotional reactivity as measured by automatic detection in speech.

    Hill, Edward; Han, David; Dumouchel, Pierre; Dehak, Najim; Quatieri, Thomas; Moehs, Charles; Oscar-Berman, Marlene; Giordano, John; Simpatico, Thomas; Barh, Debmalya; Blum, Kenneth

    2013-01-01

    Addictions to illicit drugs are among the nation's most critical public health and societal problems. The current opioid prescription epidemic and the need for buprenorphine/naloxone (Suboxone®; SUBX) as an opioid maintenance substance, and its growing street diversion provided impetus to determine affective states ("true ground emotionality") in long-term SUBX patients. Toward the goal of effective monitoring, we utilized emotion-detection in speech as a measure of "true" emotionality in 36 SUBX patients compared to 44 individuals from the general population (GP) and 33 members of Alcoholics Anonymous (AA). Other less objective studies have investigated emotional reactivity of heroin, methadone and opioid abstinent patients. These studies indicate that current opioid users have abnormal emotional experience, characterized by heightened response to unpleasant stimuli and blunted response to pleasant stimuli. However, this is the first study to our knowledge to evaluate "true ground" emotionality in long-term buprenorphine/naloxone combination (Suboxone™). We found in long-term SUBX patients a significantly flat affect (p<0.01), and they had less self-awareness of being happy, sad, and anxious compared to both the GP and AA groups. We caution definitive interpretation of these seemingly important results until we compare the emotional reactivity of an opioid abstinent control using automatic detection in speech. These findings encourage continued research strategies in SUBX patients to target the specific brain regions responsible for relapse prevention of opioid addiction.

  11. Long term Suboxone™ emotional reactivity as measured by automatic detection in speech.

    Edward Hill

    Full Text Available Addictions to illicit drugs are among the nation's most critical public health and societal problems. The current opioid prescription epidemic and the need for buprenorphine/naloxone (Suboxone®; SUBX as an opioid maintenance substance, and its growing street diversion provided impetus to determine affective states ("true ground emotionality" in long-term SUBX patients. Toward the goal of effective monitoring, we utilized emotion-detection in speech as a measure of "true" emotionality in 36 SUBX patients compared to 44 individuals from the general population (GP and 33 members of Alcoholics Anonymous (AA. Other less objective studies have investigated emotional reactivity of heroin, methadone and opioid abstinent patients. These studies indicate that current opioid users have abnormal emotional experience, characterized by heightened response to unpleasant stimuli and blunted response to pleasant stimuli. However, this is the first study to our knowledge to evaluate "true ground" emotionality in long-term buprenorphine/naloxone combination (Suboxone™. We found in long-term SUBX patients a significantly flat affect (p<0.01, and they had less self-awareness of being happy, sad, and anxious compared to both the GP and AA groups. We caution definitive interpretation of these seemingly important results until we compare the emotional reactivity of an opioid abstinent control using automatic detection in speech. These findings encourage continued research strategies in SUBX patients to target the specific brain regions responsible for relapse prevention of opioid addiction.

  12. Automatic lung segmentation in CT images with accurate handling of the hilar region.

    De Nunzio, Giorgio; Tommasi, Eleonora; Agrusti, Antonella; Cataldo, Rosella; De Mitri, Ivan; Favetta, Marco; Maglio, Silvio; Massafra, Andrea; Quarta, Maurizio; Torsello, Massimo; Zecca, Ilaria; Bellotti, Roberto; Tangaro, Sabina; Calvini, Piero; Camarlinghi, Niccolò; Falaschi, Fabio; Cerello, Piergiorgio; Oliva, Piernicola

    2011-02-01

    A fully automated and three-dimensional (3D) segmentation method for the identification of the pulmonary parenchyma in thorax X-ray computed tomography (CT) datasets is proposed. It is meant to be used as pre-processing step in the computer-assisted detection (CAD) system for malignant lung nodule detection that is being developed by the Medical Applications in a Grid Infrastructure Connection (MAGIC-5) Project. In this new approach the segmentation of the external airways (trachea and bronchi), is obtained by 3D region growing with wavefront simulation and suitable stop conditions, thus allowing an accurate handling of the hilar region, notoriously difficult to be segmented. Particular attention was also devoted to checking and solving the problem of the apparent 'fusion' between the lungs, caused by partial-volume effects, while 3D morphology operations ensure the accurate inclusion of all the nodules (internal, pleural, and vascular) in the segmented volume. The new algorithm was initially developed and tested on a dataset of 130 CT scans from the Italung-CT trial, and was then applied to the ANODE09-competition images (55 scans) and to the LIDC database (84 scans), giving very satisfactory results. In particular, the lung contour was adequately located in 96% of the CT scans, with incorrect segmentation of the external airways in the remaining cases. Segmentation metrics were calculated that quantitatively express the consistency between automatic and manual segmentations: the mean overlap degree of the segmentation masks is 0.96 ± 0.02, and the mean and the maximum distance between the mask borders (averaged on the whole dataset) are 0.74 ± 0.05 and 4.5 ± 1.5, respectively, which confirms that the automatic segmentations quite correctly reproduce the borders traced by the radiologist. Moreover, no tissue containing internal and pleural nodules was removed in the segmentation process, so that this method proved to be fit for the use in the

  13. The Usefulness of Automatic Speech Recognition (ASR Eyespeak Software in Improving Iraqi EFL Students’ Pronunciation

    Lina Fathi Sidig Sidgi

    2017-02-01

    Full Text Available The present study focuses on determining whether automatic speech recognition (ASR technology is reliable for improving English pronunciation to Iraqi EFL students. Non-native learners of English are generally concerned about improving their pronunciation skills, and Iraqi students face difficulties in pronouncing English sounds that are not found in their native language (Arabic. This study is concerned with ASR and its effectiveness in overcoming this difficulty. The data were obtained from twenty participants randomly selected from first-year college students at Al-Turath University College from the Department of English in Baghdad-Iraq. The students had participated in a two month pronunciation instruction course using ASR Eyespeak software. At the end of the pronunciation instruction course using ASR Eyespeak software, the students completed a questionnaire to get their opinions about the usefulness of the ASR Eyespeak in improving their pronunciation. The findings of the study revealed that the students found ASR Eyespeak software very useful in improving their pronunciation and helping them realise their pronunciation mistakes. They also reported that learning pronunciation with ASR Eyespeak enjoyable.

  14. Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients

    TjongWan Sen

    2009-11-01

    Full Text Available To improve the performance of phoneme based Automatic Speech Recognition (ASR in noisy environment; we developed a new technique that could add robustness to clean phonemes features. These robust features are obtained from Complex Wavelet Packet Transform (CWPT coefficients. Since the CWPT coefficients represent all different frequency bands of the input signal, decomposing the input signal into complete CWPT tree would also cover all frequencies involved in recognition process. For time overlapping signals with different frequency contents, e. g. phoneme signal with noises, its CWPT coefficients are the combination of CWPT coefficients of phoneme signal and CWPT coefficients of noises. The CWPT coefficients of phonemes signal would be changed according to frequency components contained in noises. Since the numbers of phonemes in every language are relatively small (limited and already well known, one could easily derive principal component vectors from clean training dataset using Principal Component Analysis (PCA. These principal component vectors could be used then to add robustness and minimize noises effects in testing phase. Simulation results, using Alpha Numeric 4 (AN4 from Carnegie Mellon University and NOISEX-92 examples from Rice University, showed that this new technique could be used as features extractor that improves the robustness of phoneme based ASR systems in various adverse noisy conditions and still preserves the performance in clean environments.

  15. Automatically high accurate and efficient photomask defects management solution for advanced lithography manufacture

    Zhu, Jun; Chen, Lijun; Ma, Lantao; Li, Dejian; Jiang, Wei; Pan, Lihong; Shen, Huiting; Jia, Hongmin; Hsiang, Chingyun; Cheng, Guojie; Ling, Li; Chen, Shijie; Wang, Jun; Liao, Wenkui; Zhang, Gary

    2014-04-01

    Defect review is a time consuming job. Human error makes result inconsistent. The defects located on don't care area would not hurt the yield and no need to review them such as defects on dark area. However, critical area defects can impact yield dramatically and need more attention to review them such as defects on clear area. With decrease in integrated circuit dimensions, mask defects are always thousands detected during inspection even more. Traditional manual or simple classification approaches are unable to meet efficient and accuracy requirement. This paper focuses on automatic defect management and classification solution using image output of Lasertec inspection equipment and Anchor pattern centric image process technology. The number of mask defect found during an inspection is always in the range of thousands or even more. This system can handle large number defects with quick and accurate defect classification result. Our experiment includes Die to Die and Single Die modes. The classification accuracy can reach 87.4% and 93.3%. No critical or printable defects are missing in our test cases. The missing classification defects are 0.25% and 0.24% in Die to Die mode and Single Die mode. This kind of missing rate is encouraging and acceptable to apply on production line. The result can be output and reloaded back to inspection machine to have further review. This step helps users to validate some unsure defects with clear and magnification images when captured images can't provide enough information to make judgment. This system effectively reduces expensive inline defect review time. As a fully inline automated defect management solution, the system could be compatible with current inspection approach and integrated with optical simulation even scoring function and guide wafer level defect inspection.

  16. Fitmunk: improving protein structures by accurate, automatic modeling of side-chain conformations.

    Porebski, Przemyslaw Jerzy; Cymborowski, Marcin; Pasenkiewicz-Gierula, Marta; Minor, Wladek

    2016-02-01

    Improvements in crystallographic hardware and software have allowed automated structure-solution pipelines to approach a near-`one-click' experience for the initial determination of macromolecular structures. However, in many cases the resulting initial model requires a laborious, iterative process of refinement and validation. A new method has been developed for the automatic modeling of side-chain conformations that takes advantage of rotamer-prediction methods in a crystallographic context. The algorithm, which is based on deterministic dead-end elimination (DEE) theory, uses new dense conformer libraries and a hybrid energy function derived from experimental data and prior information about rotamer frequencies to find the optimal conformation of each side chain. In contrast to existing methods, which incorporate the electron-density term into protein-modeling frameworks, the proposed algorithm is designed to take advantage of the highly discriminatory nature of electron-density maps. This method has been implemented in the program Fitmunk, which uses extensive conformational sampling. This improves the accuracy of the modeling and makes it a versatile tool for crystallographic model building, refinement and validation. Fitmunk was extensively tested on over 115 new structures, as well as a subset of 1100 structures from the PDB. It is demonstrated that the ability of Fitmunk to model more than 95% of side chains accurately is beneficial for improving the quality of crystallographic protein models, especially at medium and low resolutions. Fitmunk can be used for model validation of existing structures and as a tool to assess whether side chains are modeled optimally or could be better fitted into electron density. Fitmunk is available as a web service at http://kniahini.med.virginia.edu/fitmunk/server/ or at http://fitmunk.bitbucket.org/.

  17. Automatic Speech Recognition and Training for Severely Dysarthric Users of Assistive Technology: The STARDUST Project

    Parker, Mark; Cunningham, Stuart; Enderby, Pam; Hawley, Mark; Green, Phil

    2006-01-01

    The STARDUST project developed robust computer speech recognizers for use by eight people with severe dysarthria and concomitant physical disability to access assistive technologies. Independent computer speech recognizers trained with normal speech are of limited functional use by those with severe dysarthria due to limited and inconsistent…

  18. What’s Wrong With Automatic Speech Recognition (ASR) and How Can We Fix It?

    2013-03-01

    Alex Acero , Proc. ICASSP 2005). The authors attempt to use an underlying hidden generative model and demonstrate improved performance on phonetic...Gunawardana and Alex Acero , International Conference on Speech Communication and Technology, International Speech Communication Association...Geoffrey Zweig and Alex Acero , ASRU 2011 (accepted) (pdf)(+tech report) o "The Subspace Gaussian Mixture Model – a Structured Model for Speech

  19. Novel Techniques for Dialectal Arabic Speech Recognition

    Elmahdy, Mohamed; Minker, Wolfgang

    2012-01-01

    Novel Techniques for Dialectal Arabic Speech describes approaches to improve automatic speech recognition for dialectal Arabic. Since speech resources for dialectal Arabic speech recognition are very sparse, the authors describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic speech recognition, while assuming that MSA is always a second language for all Arabic speakers. In this book, Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect. ECA is the first ranked Arabic dialect in terms of number of speakers, and a high quality ECA speech corpus with accurate phonetic transcription has been collected. MSA acoustic models were trained using news broadcast speech. In order to cross-lingually use MSA in dialectal Arabic speech recognition, the authors have normalized the phoneme sets for MSA and ECA. After this normalization, they have applied state-of-the-art acoustic model adaptation techniques like Maximum Likelihood Linear Regression (MLLR) and M...

  20. Subjective Quality Measurement of Speech Its Evaluation, Estimation and Applications

    Kondo, Kazuhiro

    2012-01-01

    It is becoming crucial to accurately estimate and monitor speech quality in various ambient environments to guarantee high quality speech communication. This practical hands-on book shows speech intelligibility measurement methods so that the readers can start measuring or estimating speech intelligibility of their own system. The book also introduces subjective and objective speech quality measures, and describes in detail speech intelligibility measurement methods. It introduces a diagnostic rhyme test which uses rhyming word-pairs, and includes: An investigation into the effect of word familiarity on speech intelligibility. Speech intelligibility measurement of localized speech in virtual 3-D acoustic space using the rhyme test. Estimation of speech intelligibility using objective measures, including the ITU standard PESQ measures, and automatic speech recognizers.

  1. Automatic recognition of spontaneous emotions in speech using acoustic and lexical features

    Raaijmakers, S.; Truong, K.P.

    2008-01-01

    We developed acoustic and lexical classifiers, based on a boosting algorithm, to assess the separability on arousal and valence dimensions in spontaneous emotional speech. The spontaneous emotional speech data was acquired by inviting subjects to play a first-person shooter video game. Our acoustic

  2. Using competing speech to estimate articulatory automatization in children: the possible effect of masking level and subject grade.

    Manning, W H; Scheer, B R

    1978-09-01

    In order to study the possible influence of masking level and subject grade on a procedure for determining a child's articulatory automatization (Manning et al., 1976) 47 first and second grade and 49 third and fourth grade children were administered the McDonald Deep Test of Articulation under one of five conditions of auditory masking (earphones only, or presentation of competing speech at 50, 60, 70, or 80 dB SPL). Results indicated no significant difference in subject performance across the factors of masking level and subject grade. The findings suggest that these factors do not appear to be critical in the clinical application of the suggested procedure for estimating children's automatization of newly acquired phonemes.

  3. Highly accurate SVM model with automatic feature selection for word sense disambiguation

    王浩; 陈贵林; 吴连献

    2004-01-01

    A novel algorithm for word sense disambiguation(WSD) that is based on SVM model improved with automatic feature selection is introduced. This learning method employs rich contextual features to predict the proper senses for specific words. Experimental results show that this algorithm can achieve an execellent performance on the set of data released during the SENSEEVAL-2 competition. We present the results obtained and discuss the transplantation of this algorithm to other languages such as Chinese. Experimental results on Chinese corpus show that our algorithm achieves an accuracy of 70.0 % even with small training data.

  4. Automatic Rotation Recovery Algorithm for Accurate Digital Image and Video Watermarks Extraction

    Nasr addin Ahmed Salem Al-maweri

    2016-11-01

    Full Text Available Research in digital watermarking has evolved rapidly in the current decade. This evolution brought various different methods and algorithms for watermarking digital images and videos. Introduced methods in the field varies from weak to robust according to how tolerant the method is implemented to keep the existence of the watermark in the presence of attacks. Rotation attacks applied to the watermarked media is one of the serious attacks which many, if not most, algorithms cannot survive. In this paper, a new automatic rotation recovery algorithm is proposed. This algorithm can be plugged to any image or video watermarking algorithm extraction component. The main job for this method is to detect the geometrical distortion happens to the watermarked image/images sequence; recover the distorted scene to its original state in a blind and automatic way and then send it to be used by the extraction procedure. The work is limited to have a recovery process to zero padded rotations for now, cropped images after rotation is left as future work. The proposed algorithm is tested on top of extraction component. Both recovery accuracy and the extracted watermarks accuracy showed high performance level.

  5. Accurate and robust fully-automatic QCA: method and numerical validation.

    Hernández-Vela, Antonio; Gatta, Carlo; Escalera, Sergio; Igual, Laura; Martin-Yuste, Victoria; Radeva, Petia

    2011-01-01

    The Quantitative Coronary Angiography (QCA) is a methodology used to evaluate the arterial diseases and, in particular, the degree of stenosis. In this paper we propose AQCA, a fully automatic method for vessel segmentation based on graph cut theory. Vesselness, geodesic paths and a new multi-scale edgeness map are used to compute a globally optimal artery segmentation. We evaluate the method performance in a rigorous numerical way on two datasets. The method can detect an artery with precision 92.9 +/- 5% and sensitivity 94.2 +/- 6%. The average absolute distance error between detected and ground truth centerline is 1.13 +/- 0.11 pixels (about 0.27 +/- 0.025 mm) and the absolute relative error in the vessel caliber estimation is 2.93% with almost no bias. Moreover, the method can discriminate between arteries and catheter with an accuracy of 96.4%.

  6. Automatic generation of reaction energy databases from highly accurate atomization energy benchmark sets.

    Margraf, Johannes T; Ranasinghe, Duminda S; Bartlett, Rodney J

    2017-03-31

    In this contribution, we discuss how reaction energy benchmark sets can automatically be created from arbitrary atomization energy databases. As an example, over 11 000 reaction energies derived from the W4-11 database, as well as some relevant subsets are reported. Importantly, there is only very modest computational overhead involved in computing >11 000 reaction energies compared to 140 atomization energies, since the rate-determining step for either benchmark is performing the same 140 quantum chemical calculations. The performance of commonly used electronic structure methods for the new database is analyzed. This allows investigating the relationship between the performances for atomization and reaction energy benchmarks based on an identical set of molecules. The atomization energy is found to be a weak predictor for the overall usefulness of a method. The performance of density functional approximations in light of the number of empirically optimized parameters used in their design is also discussed.

  7. Estimation of Phoneme-Specific HMM Topologies for the Automatic Recognition of Dysarthric Speech

    Santiago-Omar Caballero-Morales

    2013-01-01

    Full Text Available Dysarthria is a frequently occurring motor speech disorder which can be caused by neurological trauma, cerebral palsy, or degenerative neurological diseases. Because dysarthria affects phonation, articulation, and prosody, spoken communication of dysarthric speakers gets seriously restricted, affecting their quality of life and confidence. Assistive technology has led to the development of speech applications to improve the spoken communication of dysarthric speakers. In this field, this paper presents an approach to improve the accuracy of HMM-based speech recognition systems. Because phonatory dysfunction is a main characteristic of dysarthric speech, the phonemes of a dysarthric speaker are affected at different levels. Thus, the approach consists in finding the most suitable type of HMM topology (Bakis, Ergodic for each phoneme in the speaker’s phonetic repertoire. The topology is further refined with a suitable number of states and Gaussian mixture components for acoustic modelling. This represents a difference when compared with studies where a single topology is assumed for all phonemes. Finding the suitable parameters (topology and mixtures components is performed with a Genetic Algorithm (GA. Experiments with a well-known dysarthric speech database showed statistically significant improvements of the proposed approach when compared with the single topology approach, even for speakers with severe dysarthria.

  8. User Experience of a Mobile Speaking Application with Automatic Speech Recognition for EFL Learning

    Ahn, Tae youn; Lee, Sangmin-Michelle

    2016-01-01

    With the spread of mobile devices, mobile phones have enormous potential regarding their pedagogical use in language education. The goal of this study is to analyse user experience of a mobile-based learning system that is enhanced by speech recognition technology for the improvement of EFL (English as a foreign language) learners' speaking…

  9. Automatically feeded firewood machine, efficient, safe and accurate; Automaattisyoettoeinen klapikone. Tehokas, turvallinen ja varma

    Riikkilae, M. [Metsaelehti, Helsinki (Finland)

    1994-12-31

    A new firewood chopping machine, developed by engineering workshop Kalevi Peurala, has an automated feed of the stems, which speeds up the chopping. The feeding roll sucks in the stem so the operator can go to fetch a new stem simultaneously. Additionally, the automatic feed lightens the chopping work. There is no need for holding the stem as the cutting and chopping blades split the wood into firewood. The automation also minimizes the dangers of the chopping work because the machine is totally encapsulated. The operator need not to go nearer the machine than to about one meters distance from the machine. Power supply is totally mechanical, arranged using a Cardan shaft. The power demand is low. The operator push the stem into the feeding channel, there a hook rising from the bottom of the channel draws the block between the draw-in roller and the guidance channel, pressed downwards with spring. The roller feeds the stem simultaneously towards the cutting blades. The blades roll in diverse directions. The 25 cm broad blades are pressed into the wood in slightly inclined position simultaneously from the top and the bottom of the cutter. The carving cutting diminishes the power demand. The splitting occurs simultaneously with the cutting. In the middle of the both cutting blades there are splitting dog which penetrate the block both downwards and upwards so the block splits simultaneously as it is cut. The length of the chopwood is controlled by changing the chain wheels. The machine contains chain wheels for 30 cm and 45 cm chopwood as standard equipment. The changing of the chain wheels takes about five minutes. Special tools are not needed. The maximum diameter of the processible wood is 20 cm. The machine is patented in Finland. The price of the machine is 30 000 FIM

  10. Fast, automatic, and accurate catheter reconstruction in HDR brachytherapy using an electromagnetic 3D tracking system

    Poulin, Eric; Racine, Emmanuel; Beaulieu, Luc, E-mail: Luc.Beaulieu@phy.ulaval.ca [Département de physique, de génie physique et d’optique et Centre de recherche sur le cancer de l’Université Laval, Université Laval, Québec, Québec G1V 0A6, Canada and Département de radio-oncologie et Axe Oncologie du Centre de recherche du CHU de Québec, CHU de Québec, 11 Côte du Palais, Québec, Québec G1R 2J6 (Canada); Binnekamp, Dirk [Integrated Clinical Solutions and Marketing, Philips Healthcare, Veenpluis 4-6, Best 5680 DA (Netherlands)

    2015-03-15

    Purpose: In high dose rate brachytherapy (HDR-B), current catheter reconstruction protocols are relatively slow and error prone. The purpose of this technical note is to evaluate the accuracy and the robustness of an electromagnetic (EM) tracking system for automated and real-time catheter reconstruction. Methods: For this preclinical study, a total of ten catheters were inserted in gelatin phantoms with different trajectories. Catheters were reconstructed using a 18G biopsy needle, used as an EM stylet and equipped with a miniaturized sensor, and the second generation Aurora{sup ®} Planar Field Generator from Northern Digital Inc. The Aurora EM system provides position and orientation value with precisions of 0.7 mm and 0.2°, respectively. Phantoms were also scanned using a μCT (GE Healthcare) and Philips Big Bore clinical computed tomography (CT) system with a spatial resolution of 89 μm and 2 mm, respectively. Reconstructions using the EM stylet were compared to μCT and CT. To assess the robustness of the EM reconstruction, five catheters were reconstructed twice and compared. Results: Reconstruction time for one catheter was 10 s, leading to a total reconstruction time inferior to 3 min for a typical 17-catheter implant. When compared to the μCT, the mean EM tip identification error was 0.69 ± 0.29 mm while the CT error was 1.08 ± 0.67 mm. The mean 3D distance error was found to be 0.66 ± 0.33 mm and 1.08 ± 0.72 mm for the EM and CT, respectively. EM 3D catheter trajectories were found to be more accurate. A maximum difference of less than 0.6 mm was found between successive EM reconstructions. Conclusions: The EM reconstruction was found to be more accurate and precise than the conventional methods used for catheter reconstruction in HDR-B. This approach can be applied to any type of catheters and applicators.

  11. Long Term Suboxone™ Emotional Reactivity As Measured by Automatic Detection in Speech

    2013-01-01

    Addictions to illicit drugs are among the nation’s most critical public health and societal problems. The current opioid prescription epidemic and the need for buprenorphine/naloxone (Suboxone®; SUBX) as an opioid maintenance substance, and its growing street diversion provided impetus to determine affective states (“true ground emotionality”) in long-term SUBX patients. Toward the goal of effective monitoring, we utilized emotion-detection in speech as a measure of “true” emotionality in 36 ...

  12. An automatic method for fast and accurate liver segmentation in CT images using a shape detection level set method

    Lee, Jeongjin; Kim, Namkug; Lee, Ho; Seo, Joon Beom; Won, Hyung Jin; Shin, Yong Moon; Shin, Yeong Gil

    2007-03-01

    Automatic liver segmentation is still a challenging task due to the ambiguity of liver boundary and the complex context of nearby organs. In this paper, we propose a faster and more accurate way of liver segmentation in CT images with an enhanced level set method. The speed image for level-set propagation is smoothly generated by increasing number of iterations in anisotropic diffusion filtering. This prevents the level-set propagation from stopping in front of local minima, which prevails in liver CT images due to irregular intensity distributions of the interior liver region. The curvature term of shape modeling level-set method captures well the shape variations of the liver along the slice. Finally, rolling ball algorithm is applied for including enhanced vessels near the liver boundary. Our approach are tested and compared to manual segmentation results of eight CT scans with 5mm slice distance using the average distance and volume error. The average distance error between corresponding liver boundaries is 1.58 mm and the average volume error is 2.2%. The average processing time for the segmentation of each slice is 5.2 seconds, which is much faster than the conventional ones. Accurate and fast result of our method will expedite the next stage of liver volume quantification for liver transplantations.

  13. An automatic speech recognition system with speaker-independent identification support

    Caranica, Alexandru; Burileanu, Corneliu

    2015-02-01

    The novelty of this work relies on the application of an open source research software toolkit (CMU Sphinx) to train, build and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to successfully decode offline voice commands on embedded hardware, such as an ARMv6 low-cost SoC, Raspberry PI. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low cost voice automation systems.

  14. A Robust Front-End Processor combining Mel Frequency Cepstral Coefficient and Sub-band Spectral Centroid Histogram methods for Automatic Speech Recognition

    R. Thangarajan

    2009-06-01

    Full Text Available Environmental robustness is an important area of research in speech recognition. Mismatch between trained speech models and actual speech to be recognized is due to factors like background noise. It can cause severe degradation in the accuracy of recognizers whichare based on commonly used features like mel-frequency cepstral co-efficient (MFCC and linear predictive coding (LPC. It is well understood that all previous auditory based feature extraction methods perform extremely well in terms of robustness due to the dominantfrequency information present in them. But these methods suffer from high computational cost. Another method called sub-band spectral centroid histograms (SSCH integrates dominant-frequency information with sub-band power information. This method is based onsub-band spectral centroids (SSC which are closely related to spectral peaks for both clean and noisy speech. Since SSC can be computed efficiently from short-term speech power spectrum estimate, SSCH method is quite robust to background additive noise at a lowercomputational cost. It has been noted that MFCC method outperforms SSCH method in the case of clean speech. However in the case of speech with additive noise, MFCC method degrades substantially. In this paper, both MFCC and SSCH feature extraction have beenimplemented in Carnegie Melon University (CMU Sphinx 4.0 and trained and tested on AN4 database for clean and noisy speech. Finally, a robust speech recognizer which automatically employs either MFCC or SSCH feature extraction methods based on the variance of shortterm power of the input utterance is suggested.

  15. Automatic pronunciation error detection in non-native speech: the case of vowel errors in Dutch.

    van Doremalen, Joost; Cucchiarini, Catia; Strik, Helmer

    2013-08-01

    This research is aimed at analyzing and improving automatic pronunciation error detection in a second language. Dutch vowels spoken by adult non-native learners of Dutch are used as a test case. A first study on Dutch pronunciation by L2 learners with different L1s revealed that vowel pronunciation errors are relatively frequent and often concern subtle acoustic differences between the realization and the target sound. In a second study automatic pronunciation error detection experiments were conducted to compare existing measures to a metric that takes account of the error patterns observed to capture relevant acoustic differences. The results of the two studies do indeed show that error patterns bear information that can be usefully employed in weighted automatic measures of pronunciation quality. In addition, it appears that combining such a weighted metric with existing measures improves the equal error rate by 6.1 percentage points from 0.297, for the Goodness of Pronunciation (GOP) algorithm, to 0.236.

  16. A Global Approach to Accurate and Automatic Quantitative Analysis of NMR Spectra by Complex Least-Squares Curve Fitting

    Martin, Y. L.

    The performance of quantitative analysis of 1D NMR spectra depends greatly on the choice of the NMR signal model. Complex least-squares analysis is well suited for optimizing the quantitative determination of spectra containing a limited number of signals (20). From a general point of view it is concluded, on the basis of mathematical considerations and numerical simulations, that, in the absence of truncation of the free-induction decay, complex least-squares curve fitting either in the time or in the frequency domain and linear-prediction methods are in fact nearly equivalent and give identical results. However, in the situation considered, complex least-squares analysis in the frequency domain is more flexible since it enables the quality of convergence to be appraised at every resonance position. An efficient data-processing strategy has been developed which makes use of an approximate conjugate-gradient algorithm. All spectral parameters (frequency, damping factors, amplitudes, phases, initial delay associated with intensity, and phase parameters of a baseline correction) are simultaneously managed in an integrated approach which is fully automatable. The behavior of the error as a function of the signal-to-noise ratio is theoretically estimated, and the influence of apodization is discussed. The least-squares curve fitting is theoretically proved to be the most accurate approach for quantitative analysis of 1D NMR data acquired with reasonable signal-to-noise ratio. The method enables complex spectral residuals to be sorted out. These residuals, which can be cumulated thanks to the possibility of correcting for frequency shifts and phase errors, extract systematic components, such as isotopic satellite lines, and characterize the shape and the intensity of the spectral distortion with respect to the Lorentzian model. This distortion is shown to be nearly independent of the chemical species, of the nature of the molecular site, and of the type of nucleus, but

  17. Amharic Speech Recognition for Speech Translation

    Melese, Michael; Besacier, Laurent; Meshesha, Million

    2016-01-01

    International audience; The state-of-the-art speech translation can be seen as a cascade of Automatic Speech Recognition, Statistical Machine Translation and Text-To-Speech synthesis. In this study an attempt is made to experiment on Amharic speech recognition for Amharic-English speech translation in tourism domain. Since there is no Amharic speech corpus, we developed a read-speech corpus of 7.43hr in tourism domain. The Amharic speech corpus has been recorded after translating standard Bas...

  18. Integrating Automatic Speech Recognition and Machine Translation for Better Translation Outputs

    Liyanapathirana, Jeevanthi

    than typing, making the translation process faster. The spoken translation is analyzed and combined with the machine translation output of the same sentence using different methods. We study a number of different translation models in the context of n-best list rescoring methods. As an alternative...... to the n-best list rescoring, we also use word graphs with the expectation of arriving at a tighter integration of ASR and MT models. Integration methods include constraining ASR models using language and translation models of MT, and vice versa. We currently develop and experiment different methods...... on the Danish – English language pair, with the use of a speech corpora and parallel text. The methods are investigated to check ways that the accuracy of the spoken translation of the translator can be increased with the use of machine translation outputs, which would be useful for potential computer...

  19. Speech Indexing

    Ordelman, R.J.F.; Jong, de F.M.G.; Leeuwen, van D.A.; Blanken, H.M.; de Vries, A.P.; Blok, H.E.; Feng, L.

    2007-01-01

    This chapter will focus on the automatic extraction of information from the speech in multimedia documents. This approach is often referred to as speech indexing and it can be regarded as a subfield of audio indexing that also incorporates for example the analysis of music and sounds. If the objecti

  20. Optimizing Automatic Speech Recognition for Low-Proficient Non-Native Speakers

    Catia Cucchiarini

    2010-01-01

    Full Text Available Computer-Assisted Language Learning (CALL applications for improving the oral skills of low-proficient learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-native ASR is still problematic, a possible solution is to elicit constrained responses from the learners. In this paper, we describe experiments aimed at selecting utterances from lists of responses. The first experiment on utterance selection indicates that the decoding process can be improved by optimizing the language model and the acoustic models, thus reducing the utterance error rate from 29–26% to 10–8%. Since giving feedback on incorrectly recognized utterances is confusing, we verify the correctness of the utterance before providing feedback. The results of the second experiment on utterance verification indicate that combining duration-related features with a likelihood ratio (LR yield an equal error rate (EER of 10.3%, which is significantly better than the EER for the other measures in isolation.

  1. Security and Hyper-accurate Positioning Monitoring with Automatic Dependent Surveillance-Broadcast (ADS-B) Project

    National Aeronautics and Space Administration — Lightning Ridge Technologies, working in collaboration with The Innovation Laboratory, Inc., extend Automatic Dependent Surveillance Broadcast (ADS-B) into a safe,...

  2. Security and Hyper-accurate Positioning Monitoring with Automatic Dependent Surveillance-Broadcast (ADS-B) Project

    National Aeronautics and Space Administration — Lightning Ridge Technologies, LLC, working in collaboration with The Innovation Laboratory, Inc., extend Automatic Dependent Surveillance ? Broadcast (ADS-B) into a...

  3. Automatic pose initialization for accurate 2D/3D registration applied to abdominal aortic aneurysm endovascular repair

    Miao, Shun; Lucas, Joseph; Liao, Rui

    2012-02-01

    Minimally invasive abdominal aortic aneurysm (AAA) stenting can be greatly facilitated by overlaying the preoperative 3-D model of the abdominal aorta onto the intra-operative 2-D X-ray images. Accurate 2-D/3-D registration in 3-D space makes the 2-D/3-D overlay robust to the change of C-Arm angulations. By far, the 2-D/3-D registration methods based on simulated X-ray projection images using multiple image planes have been shown to be able to provide satisfactory 3-D registration accuracy. However, one drawback of the intensity-based 2-D/3-D registration methods is that the similarity measure is usually highly non-convex and hence the optimizer can easily be trapped into local minima. User interaction therefore is often needed in the initialization of the position of the 3-D model in order to get a successful 2-D/3-D registration. In this paper, a novel 3-D pose initialization technique is proposed, as an extension of our previously proposed bi-plane 2-D/3-D registration method for AAA intervention [4]. The proposed method detects vessel bifurcation points and spine centerline in both 2-D and 3-D images, and utilizes landmark information to bring the 3-D volume into a 15mm capture range. The proposed landmark detection method was validated on real dataset, and is shown to be able to provide a good initialization for 2-D/3-D registration in [4], thus making the workflow fully automatic.

  4. A fully automatic tool to perform accurate flood mapping by merging remote sensing imagery and ancillary data

    D'Addabbo, Annarita; Refice, Alberto; Lovergine, Francesco; Pasquariello, Guido

    2016-04-01

    Flooding is one of the most frequent and expansive natural hazard. High-resolution flood mapping is an essential step in the monitoring and prevention of inundation hazard, both to gain insight into the processes involved in the generation of flooding events, and from the practical point of view of the precise assessment of inundated areas. Remote sensing data are recognized to be useful in this respect, thanks to the high resolution and regular revisit schedules of state-of-the-art satellites, moreover offering a synoptic overview of the extent of flooding. In particular, Synthetic Aperture Radar (SAR) data present several favorable characteristics for flood mapping, such as their relative insensitivity to the meteorological conditions during acquisitions, as well as the possibility of acquiring independently of solar illumination, thanks to the active nature of the radar sensors [1]. However, flood scenarios are typical examples of complex situations in which different factors have to be considered to provide accurate and robust interpretation of the situation on the ground: the presence of many land cover types, each one with a particular signature in presence of flood, requires modelling the behavior of different objects in the scene in order to associate them to flood or no flood conditions [2]. Generally, the fusion of multi-temporal, multi-sensor, multi-resolution and/or multi-platform Earth observation image data, together with other ancillary information, seems to have a key role in the pursuit of a consistent interpretation of complex scenes. In the case of flooding, distance from the river, terrain elevation, hydrologic information or some combination thereof can add useful information to remote sensing data. Suitable methods, able to manage and merge different kind of data, are so particularly needed. In this work, a fully automatic tool, based on Bayesian Networks (BNs) [3] and able to perform data fusion, is presented. It supplies flood maps

  5. Assessing the Performance of Automatic Speech Recognition Systems When Used by Native and Non-Native Speakers of Three Major Languages in Dictation Workflows

    Zapata, Julián; Kirkedal, Andreas Søeborg

    2015-01-01

    In this paper, we report on a two-part experiment aiming to assess and compare the performance of two types of automatic speech recognition (ASR) systems on two different computational platforms when used to augment dictation workflows. The experiment was performed with a sample of speakers...... of three major languages and with different linguistic profiles: non-native English speakers; non-native French speakers; and native Spanish speakers. The main objective of this experiment is to examine ASR performance in translation dictation (TD) and medical dictation (MD) workflows without manual...

  6. Full automatic fiducial marker detection on coil arrays for accurate instrumentation placement during MRI guided breast interventions

    Filippatos, Konstantinos; Boehler, Tobias; Geisler, Benjamin; Zachmann, Harald; Twellmann, Thorsten

    2010-02-01

    With its high sensitivity, dynamic contrast-enhanced MR imaging (DCE-MRI) of the breast is today one of the first-line tools for early detection and diagnosis of breast cancer, particularly in the dense breast of young women. However, many relevant findings are very small or occult on targeted ultrasound images or mammography, so that MRI guided biopsy is the only option for a precise histological work-up [1]. State-of-the-art software tools for computer-aided diagnosis of breast cancer in DCE-MRI data offer also means for image-based planning of biopsy interventions. One step in the MRI guided biopsy workflow is the alignment of the patient position with the preoperative MR images. In these images, the location and orientation of the coil localization unit can be inferred from a number of fiducial markers, which for this purpose have to be manually or semi-automatically detected by the user. In this study, we propose a method for precise, full-automatic localization of fiducial markers, on which basis a virtual localization unit can be subsequently placed in the image volume for the purpose of determining the parameters for needle navigation. The method is based on adaptive thresholding for separating breast tissue from background followed by rigid registration of marker templates. In an evaluation of 25 clinical cases comprising 4 different commercial coil array models and 3 different MR imaging protocols, the method yielded a sensitivity of 0.96 at a false positive rate of 0.44 markers per case. The mean distance deviation between detected fiducial centers and ground truth information that was appointed from a radiologist was 0.94mm.

  7. The software for automatic creation of the formal grammars used by speech recognition, computer vision, editable text conversion systems, and some new functions

    Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan

    2017-02-01

    For more flexibility of environmental perception by artificial intelligence it is needed to exist the supporting software modules, which will be able to automate the creation of specific language syntax and to make a further analysis for relevant decisions based on semantic functions. According of our proposed approach, of which implementation it is possible to create the couples of formal rules of given sentences (in case of natural languages) or statements (in case of special languages) by helping of computer vision, speech recognition or editable text conversion system for further automatic improvement. In other words, we have developed an approach, by which it can be achieved to significantly improve the training process automation of artificial intelligence, which as a result will give us a higher level of self-developing skills independently from us (from users). At the base of our approach we have developed a software demo version, which includes the algorithm and software code for the entire above mentioned component's implementation (computer vision, speech recognition and editable text conversion system). The program has the ability to work in a multi - stream mode and simultaneously create a syntax based on receiving information from several sources.

  8. Nonlinear spectro-temporal features based on a cochlear model for automatic speech recognition in a noisy situation.

    Choi, Yong-Sun; Lee, Soo-Young

    2013-09-01

    A nonlinear speech feature extraction algorithm was developed by modeling human cochlear functions, and demonstrated as a noise-robust front-end for speech recognition systems. The algorithm was based on a model of the Organ of Corti in the human cochlea with such features as such as basilar membrane (BM), outer hair cells (OHCs), and inner hair cells (IHCs). Frequency-dependent nonlinear compression and amplification of OHCs were modeled by lateral inhibition to enhance spectral contrasts. In particular, the compression coefficients had frequency dependency based on the psychoacoustic evidence. Spectral subtraction and temporal adaptation were applied in the time-frame domain. With long-term and short-term adaptation characteristics, these factors remove stationary or slowly varying components and amplify the temporal changes such as onset or offset. The proposed features were evaluated with a noisy speech database and showed better performance than the baseline methods such as mel-frequency cepstral coefficients (MFCCs) and RASTA-PLP in unknown noisy conditions.

  9. Cross-modal retrieval of scripted speech audio

    Owen, Charles B.; Makedon, Fillia

    1997-12-01

    This paper describes an approach to the problem of searching speech-based digital audio using cross-modal information retrieval. Audio containing speech (speech-based audio) is difficult to search. Open vocabulary speech recognition is advancing rapidly, but cannot yield high accuracy in either search or transcription modalities. However, text can be searched quickly and efficiently with high accuracy. Script- light digital audio is audio that has an available transcription. This is a surprisingly large class of content including legal testimony, broadcasting, dramatic productions and political meetings and speeches. An automatic mechanism for deriving the synchronization between the transcription and the audio allows for very accurate retrieval of segments of that audio. The mechanism described in this paper is based on building a transcription graph from the text and computing biphone probabilities for the audio. A modified beam search algorithm is presented to compute the alignment.

  10. 基于DIVA模型的语音-映射单元自动获取%Automatic acquisition of speech sound-target cells based on DIVA model

    张少白; 刘欣

    2013-01-01

    针对DIVA模型中存在的“感知能力与语音生成技巧发育不平衡”问题,提出了一种自动获取语音-映射单元的方法。该方法将人耳模拟为一个具有不同带宽的并联带通滤波器组,分别与模型中21维度的听觉存储空间相关联,对不同听觉的不同反应,分别考虑其频带的屏蔽效应、听觉响度与频率的关系。在读取语音输入信号的过程中,模型能较好地获得初始听觉表示,其方式与婴儿咿呀学语的过程基本一致。仿真实验表明,通过边界定义、相似性比较以及搜索更新等步骤,此方法能很好地进行初始输入模式的自组织匹配,并最终使DIVA模型更具语音获取的自然特性。%Contraposing the shortage of Directions Into Velocities of Articulators ( DIVA) model about“infants per-ceptual abilities do develop faster at first than their speech production skills”, the paper presents an automatic ac-quisition method of speech sound-target cells. The method simulates the human ear as a parallel band-pass filter group with different bandwidth and associates respectively;the filter with the 21-dimensional storage space of audi-tory sense in DIVA model. This method was done in order for different auditory reactions, the shielding effect of fre-quency band, sound loudness, and frequency relation could be considered respectively for this study. In the process of reading the input signal of speech, the model can acquire good initial hearing and the process is consistent with baby's babble. The simulation results show that through boundary definition, similarity comparison, searching and updates and so on, the method has nicer self-organized pattern matching effect for initial input, which makes the DIVA model a more natural characteristic regarding speech acquisition.

  11. Study on automatic prediction of sentential stress for Chinese Putonghua Text-to-Speech system with natural style

    SHAO Yanqiu; HAN Jiqing; ZHAO Yongzhen; LIU Ting

    2007-01-01

    Stress is an important parameter for prosody processing in speech synthesis. In this paper, we compare the acoustic features of neutral tone syllables and strong stress syllables with moderate stress syllables, including pitch, syllable duration, intensity and pause length after syllable. The relation between duration and pitch, as well as the Third Tone (T3) and pitch are also studied. Three stress prediction models based on ANN, i.e. the acoustic model,the linguistic model and the mixed model, are presented for predicting Chinese sentential stress.The results show that the mixed model performs better than the other two models. In order to solve the problem of the diversity of manual labeling, an evaluation index of support ratio is proposed.

  12. Comparison of Forced-Alignment Speech Recognition and Humans for Generating Reference VAD

    Kraljevski, Ivan; Tan, Zheng-Hua; Paola Bissiri, Maria

    2015-01-01

    This present paper aims to answer the question whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. An investigation of the level of agreement between automatic/manual VAD transcriptions and the ......This present paper aims to answer the question whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. An investigation of the level of agreement between automatic/manual VAD transcriptions...... and the reference ones produced by a human expert was carried out. Thereafter, statistical analysis was employed on the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels....

  13. Speech processing in mobile environments

    Rao, K Sreenivasa

    2014-01-01

    This book focuses on speech processing in the presence of low-bit rate coding and varying background environments. The methods presented in the book exploit the speech events which are robust in noisy environments. Accurate estimation of these crucial events will be useful for carrying out various speech tasks such as speech recognition, speaker recognition and speech rate modification in mobile environments. The authors provide insights into designing and developing robust methods to process the speech in mobile environments. Covering temporal and spectral enhancement methods to minimize the effect of noise and examining methods and models on speech and speaker recognition applications in mobile environments.

  14. Recognizing GSM Digital Speech

    2005-01-01

    The Global System for Mobile (GSM) environment encompasses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first one has already received much attention; however, source coding distortion and transmission errors must be explicitly addressed. In this paper, we propose an alternative front-end for speech recognition over GSM networks. This front-end is specially conceived to be effective against source c...

  15. A Blueprint for a Comprehensive Australian English Auditory-Visual Speech Corpus

    Burnham, D.; Ambikairajah, E.; Arciuli, J.; Bennamoun, M.; Best, C.T.; Bird, S.; Butcher, A.R.; Cassidy, S.; Chetty, G.; Cox, F.M.; Cutler, A.; Dale, R.; Epps, J.R.; Fletcher, J.M.; Goecke, R.; Grayden, D.B.; Hajek, J.T.; Ingram, J.C.; Ishihara, S.; Kemp, N.; Kinoshita, Y.; Kuratate, T.; Lewis, T.W.; Loakes, D.E.; Onslow, M.; Powers, D.M.; Rose, P.; Togneri, R.; Tran, D.; Wagner, M.

    2009-01-01

    Large auditory-visual (AV) speech corpora are the grist of modern research in speech science, but no such corpus exists for Australian English. This is unfortunate, for speech science is the brains behind speech technology and applications such as text-to-speech (TTS) synthesis, automatic speech rec

  16. Pattern recognition in speech and language processing

    Chou, Wu

    2003-01-01

    Minimum Classification Error (MSE) Approach in Pattern Recognition, Wu ChouMinimum Bayes-Risk Methods in Automatic Speech Recognition, Vaibhava Goel and William ByrneA Decision Theoretic Formulation for Adaptive and Robust Automatic Speech Recognition, Qiang HuoSpeech Pattern Recognition Using Neural Networks, Shigeru KatagiriLarge Vocabulary Speech Recognition Based on Statistical Methods, Jean-Luc GauvainToward Spontaneous Speech Recognition and Understanding, Sadaoki FuruiSpeaker Authentication, Qi Li and Biing-Hwang JuangHMMs for Language Processing Problems, Ri

  17. Speech transmission index from running speech: A neural network approach

    Li, F. F.; Cox, T. J.

    2003-04-01

    Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.

  18. Speech Recognition on Mobile Devices

    Tan, Zheng-Hua; Lindberg, Børge

    2010-01-01

    The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR...... in the mobile context covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within...... command and control, text entry and search are presented with an emphasis on mobile text entry....

  19. Speech Problems

    ... of your treatment plan may include seeing a speech therapist , a person who is trained to treat speech disorders. How often you have to see the speech therapist will vary — you'll probably start out seeing ...

  20. INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

    J. SANGEETHA

    2015-02-01

    Full Text Available This paper provides an interface between the machine translation and speech synthesis system for converting English speech to Tamil text in English to Tamil speech to speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation and text to speech synthesis. Many procedures for incorporation of speech recognition and machine translation have been projected. Still speech synthesis system has not yet been measured. In this paper, we focus on integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of speech synthesis, machine translation and the integration of machine translation and speech synthesis components. Here we implement a hybrid machine translation (combination of rule based and statistical machine translation and concatenative syllable based speech synthesis technique. In order to retain the naturalness and intelligibility of synthesized speech Auto Associative Neural Network (AANN prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.

  1. Automatic Reading

    胡迪

    2007-01-01

    <正>Reading is the key to school success and,like any skill,it takes practice.A child learns to walk by practising until he no longer has to think about how to put one foot in front of the other.The great athlete practises until he can play quickly,accurately and without thinking.Ed- ucators call it automaticity.

  2. Automatic discovery on auxiliary word usage-based part-of-speech and segmentation errors for Chinese language%基于助词用法的汉语词性、分词错误自动发现

    韩英杰; 张坤丽; 昝红英; 柴玉梅

    2011-01-01

    During the construction of auxiliary words knowledge base, used rule-based automatic annotation on auxiliary word's usage.After automatic annotation, found words part-of-speech and segmentation errors in annotated corpus.The discovery is benefit for the high quality chinese corpus and the development of the processing depth.%在构建助词知识库、标注大规模语料过程中使用了基于规则的助词用法自动标注的方法;对标注后的语料,发现基于规则的助词用法自动标注方法能够自动发现语料的部分词性、分词错误.这些错误的发现对研制高质量的语料库起到了积极的促进作用,并将语料加工深度向前推进.

  3. A real-time phoneme counting algorithm and application for speech rate monitoring.

    Aharonson, Vered; Aharonson, Eran; Raichlin-Levi, Katia; Sotzianu, Aviv; Amir, Ofer; Ovadia-Blechman, Zehava

    2017-03-01

    Adults who stutter can learn to control and improve their speech fluency by modifying their speaking rate. Existing speech therapy technologies can assist this practice by monitoring speaking rate and providing feedback to the patient, but cannot provide an accurate, quantitative measurement of speaking rate. Moreover, most technologies are too complex and costly to be used for home practice. We developed an algorithm and a smartphone application that monitor a patient's speaking rate in real time and provide user-friendly feedback to both patient and therapist. Our speaking rate computation is performed by a phoneme counting algorithm which implements spectral transition measure extraction to estimate phoneme boundaries. The algorithm is implemented in real time in a mobile application that presents its results in a user-friendly interface. The application incorporates two modes: one provides the patient with visual feedback of his/her speech rate for self-practice and another provides the speech therapist with recordings, speech rate analysis and tools to manage the patient's practice. The algorithm's phoneme counting accuracy was validated on ten healthy subjects who read a paragraph at slow, normal and fast paces, and was compared to manual counting of speech experts. Test-retest and intra-counter reliability were assessed. Preliminary results indicate differences of -4% to 11% between automatic and human phoneme counting. Differences were largest for slow speech. The application can thus provide reliable, user-friendly, real-time feedback for speaking rate control practice.

  4. Activation and application of an obligatory phonotactic constraint in German during automatic speech processing is revealed by human event-related potentials.

    Steinberg, Johanna; Truckenbrodt, Hubert; Jacobsen, Thomas

    2010-07-01

    In auditory speech processing, implicit linguistic knowledge is activated and applied on phonetic and segment-related phonological processing level even if the perceived sound sequence is outside the focus of attention. In this study, the effects of language-specific phonotactic restrictions on pre-attentive auditory speech processing were investigated, using the Mismatch Negativity component of the human event-related brain potential. In German grammar, the distribution of the velar and the palatal dorsal fricative is limited by an obligatory phonotactic constraint, Dorsal Fricative Assimilation, which demands that a vowel and a following dorsal fricative must have the same specifications for articulatory backness. For passive oddball stimulation, we used three phonotactically correct VC syllables and one incorrect VC syllable, composed of the vowels [epsilon] and [open o] and the fricatives [ç] and []. Stimuli were contrasted pairwise in experimental oddball blocks in a way that they differed in regard to their respective vowel but shared the fricative. Additionally to the usual Mismatch Negativity which is attributable to the change of the initial vowel and which was elicited by all deviants, we observed a second negative deflection in the deviant ERP elicited by the phonotactically ill-formed syllable only. This negativity cannot be attributed to any acoustical or phonemic difference between standard and deviant, it rather reflects the effect of a phonotactic evaluation process after both sounds of the syllable were identified. Our finding suggests that implicit phonotactic knowledge is activated and applied even outside the focus of the participants' attention.

  5. 维吾尔语语音识别中发音变异现象%Uyghur pronunciation variations in automatic speech recognition systems

    杨雅婷; 马博; 王磊; 吐尔洪·吾司曼; 李晓

    2011-01-01

    The recognition rate in Uyghur standard speech recognition systems is relatively low due to pronunciation variations in the corpus.The assimilation,weakening,loss,and vowel harmony in Uyghur speech were analyzed to provide a better understanding of the voice and rhythm characteristics.The Uyghur accent pronunciation variation rules are summarized by combining knowledge-based and data-driven methods with mapping of vowel and consonant pairs and developing of a phoneme confusion matrix.%维语口语发音中很多音素相对标准语产生了发音变异,基于标准语音的识别系统在识别带有发音变异的口语语料时识别率较低。该文针对维吾尔语同化、弱化、脱落、元音和谐等语流音变难点进行分析,对语音、韵律特性进行知识融合与技术创新,运用基于数据驱动和基于专家经验相结合的方法对维吾尔语方言口语中存在的发音变异现象进行研究,统计元音、辅音多发音变化映射对,建立音素混淆矩阵,为维吾尔语方言口语语音识别研究奠定基础。

  6. How should a speech recognizer work?

    Scharenborg, O.E.; Norris, D.G.; Bosch, L.F.M. ten; McQueen, J.M.

    2005-01-01

    Although researchers studying human speech recognition (HSR) and automatic speech recognition (ASR) share a common interest in how information processing systems (human or machine) recognize spoken language, there is little communication between the two disciplines. We suggest that this lack of comm

  7. Second Language Learners and Speech Act Comprehension

    Holtgraves, Thomas

    2007-01-01

    Recognizing the specific speech act ( Searle, 1969) that a speaker performs with an utterance is a fundamental feature of pragmatic competence. Past research has demonstrated that native speakers of English automatically recognize speech acts when they comprehend utterances (Holtgraves & Ashley, 2001). The present research examined whether this…

  8. Annotating Speech Corpus for Prosody Modeling in Indian Language Text to Speech Systems

    Kiruthiga S

    2012-01-01

    Full Text Available A spoken language system, it may either be a speech synthesis or a speech recognition system, starts with building a speech corpora. We give a detailed survey of issues and a methodology that selects the appropriate speech unit in building a speech corpus for Indian language Text to Speech systems. The paper ultimately aims to improve the intelligibility of the synthesized speech in Text to Speech synthesis systems. To begin with, an appropriate text file should be selected for building the speech corpus. Then a corresponding speech file is generated and stored. This speech file is the phonetic representation of the selected text file. The speech file is processed in different levels viz., paragraphs, sentences, phrases, words, syllables and phones. These are called the speech units of the file. Researches have been done taking these units as the basic unit for processing. This paper analyses the researches done using phones, diphones, triphones, syllables and polysyllables as their basic unit for speech synthesis. The paper also provides a recommended set of combinations for polysyllables. Concatenative speech synthesis involves the concatenation of these basic units to synthesize an intelligent, natural sounding speech. The speech units are annotated with relevant prosodic information about each unit, manually or automatically, based on an algorithm. The database consisting of the units along with their annotated information is called as the annotated speech corpus. A Clustering technique is used in the annotated speech corpus that provides way to select the appropriate unit for concatenation, based on the lowest total join cost of the speech unit.

  9. Transcribing Disordered Speech: By Target or by Production?

    Ball, Martin J.

    2008-01-01

    The ability to transcribe disordered speech is a vital tool for speech-language pathologists, as accurate description of a client's speech output is needed for both diagnosis and effective intervention. Clients in the speech clinic often use sounds that are not part of the target sound system and which may, in some cases, be sounds not found in…

  10. Automatic translation among spoken languages

    Walter, Sharon M.; Costigan, Kelly

    1994-01-01

    The Machine Aided Voice Translation (MAVT) system was developed in response to the shortage of experienced military field interrogators with both foreign language proficiency and interrogation skills. Combining speech recognition, machine translation, and speech generation technologies, the MAVT accepts an interrogator's spoken English question and translates it into spoken Spanish. The spoken Spanish response of the potential informant can then be translated into spoken English. Potential military and civilian applications for automatic spoken language translation technology are discussed in this paper.

  11. Key Arithmetic of Image Accurate Registration for Automatic PCB Drill Photoelectric Inspection%PCB钻孔自动光电检测精确配准关键算法

    黄永林; 叶玉堂; 陈镇龙; 刘霖

    2011-01-01

    In order to realize the fast automatic PCB drill inspection, a fast arithmetic of image registration between standard PCB and testing PCB is proposed. The basic process of the image algorithm and the system structure are introduced. The process of registration is mainly consisted of approximate registration and accurate registration. Edge extraction and Hough transformation are used to obtain 4 comers coordinate of PCB firstly; the coordinates of location hole of the testing PCB can be obtained after approximate registration; the coordinates of circle center of location hole between the testing PCB and standard PCB are used to process accurate registration. Experience shows the method greatly improved the accuracy of image registration. The speed of the arithmetic also meets the needs of the real time process.%为了实现快速印制电路板(PCB)钻孔自动检测,提出了一种PCB待测板和标准板快速自动配置算法.介绍了PCB钻孔自动光学检测系统的基本结构.采用粗匹配和精匹配两步进行配准的方法,通过边缘提取和Hough变换得到待测板4个角点坐标,利用角点坐标和标准板角点的关系进行粗配准,并找到待测板的定位孔;然后利用待测板定位孔圆心和标准板定位孔圆心的关系进行精确配准.实验证明,该算法计算速度和精度完全能够满足实际要求.

  12. Current trends in multilingual speech processing

    Hervé Bourlard; John Dines; Mathew Magimai-Doss; Philip N Garner; David Imseng; Petr Motlicek; Hui Liang; Lakshmi Saheer; Fabio Valente

    2011-10-01

    In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging challenges for the research community. Multilingual speech processing has been a topic of ongoing interest to the research community for many years and the field is now receiving renewed interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers. For example, discriminative features are seeing wide application by the speech recognition community, but additional issues arise when using such features in a multilingual setting. Another example is the apparent convergence of speech recognition and speech synthesis technologies in the form of statistical parametric methodologies. This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS) as well as cross-lingual speaker adaptation for TTS. The second driving force is the impetus being provided by both government and industry for technologies to help break down domestic and international language barriers, these also being barriers to the expansion of policy and commerce. Speech-to-speech and speech-to-text translation are thus emerging as key technologies at the heart of which lies multilingual speech processing.

  13. Automatic differentiation bibliography

    Corliss, G.F. (comp.)

    1992-07-01

    This is a bibliography of work related to automatic differentiation. Automatic differentiation is a technique for the fast, accurate propagation of derivative values using the chain rule. It is neither symbolic nor numeric. Automatic differentiation is a fundamental tool for scientific computation, with applications in optimization, nonlinear equations, nonlinear least squares approximation, stiff ordinary differential equation, partial differential equations, continuation methods, and sensitivity analysis. This report is an updated version of the bibliography which originally appeared in Automatic Differentiation of Algorithms: Theory, Implementation, and Application.

  14. Epoch-based analysis of speech signals

    B Yegnanarayana; Suryakanth V Gangashetty

    2011-10-01

    Speech analysis is traditionally performed using short-time analysis to extract features in time and frequency domains. The window size for the analysis is fixed somewhat arbitrarily, mainly to account for the time varying vocal tract system during production. However, speech in its primary mode of excitation is produced due to impulse-like excitation in each glottal cycle. Anchoring the speech analysis around the glottal closure instants (epochs) yields significant benefits for speech analysis. Epoch-based analysis of speech helps not only to segment the speech signals based on speech production characteristics, but also helps in accurate analysis of speech. It enables extraction of important acoustic-phonetic features such as glottal vibrations, formants, instantaneous fundamental frequency, etc. Epoch sequence is useful to manipulate prosody in speech synthesis applications. Accurate estimation of epochs helps in characterizing voice quality features. Epoch extraction also helps in speech enhancement and multispeaker separation. In this tutorial article, the importance of epochs for speech analysis is discussed, and methods to extract the epoch information are reviewed. Applications of epoch extraction for some speech applications are demonstrated.

  15. The Phase Spectra Based Feature for Robust Speech Recognition

    Abbasian ALI

    2009-07-01

    Full Text Available Speech recognition in adverse environment is one of the major issue in automatic speech recognition nowadays. While most current speech recognition system show to be highly efficient for ideal environment but their performance go down extremely when they are applied in real environment because of noise effected speech. In this paper a new feature representation based on phase spectra and Perceptual Linear Prediction (PLP has been suggested which can be used for robust speech recognition. It is shown that this new features can improve the performance of speech recognition not only in clean condition but also in various levels of noise condition when it is compared to PLP features.

  16. Automatic Sarcasm Detection: A Survey

    Joshi, Aditya; Bhattacharyya, Pushpak; Carman, Mark James

    2016-01-01

    Automatic sarcasm detection is the task of predicting sarcasm in text. This is a crucial step to sentiment analysis, considering prevalence and challenges of sarcasm in sentiment-bearing text. Beginning with an approach that used speech-based features, sarcasm detection has witnessed great interest from the sentiment analysis community. This paper is the first known compilation of past work in automatic sarcasm detection. We observe three milestones in the research so far: semi-supervised pat...

  17. CONCURRENT SPEECHES SEPARATION USING WRAPPED DISCRETE FOURIER TRANSFORM

    Zhang Xichun; Li Yunjie; Zhang Jun; Wei Gang

    2005-01-01

    This letter proposes a new method for concurrent voiced speech separation. Firstly the Wrapped Discrete Fourier Transform (WDFT) is used to decompose the harmonic spectra of the mixed speeches. Then the individual speech is reconstructed by using the sinusoidal speech model. By taking advantage of the non-uniform frequency resolution of WDFT, harmonic spectra parameters can be estimated and separated accurately. Experimental results on mixed vowels separation show that the proposed method can recover the original speeches effectively.

  18. Speech recognition from spectral dynamics

    Hynek Hermansky

    2011-10-01

    Information is carried in changes of a signal. The paper starts with revisiting Dudley’s concept of the carrier nature of speech. It points to its close connection to modulation spectra of speech and argues against short-term spectral envelopes as dominant carriers of the linguistic information in speech. The history of spectral representations of speech is briefly discussed. Some of the history of gradual infusion of the modulation spectrum concept into Automatic recognition of speech (ASR) comes next, pointing to the relationship of modulation spectrum processing to wellaccepted ASR techniques such as dynamic speech features or RelAtive SpecTrAl (RASTA) filtering. Next, the frequency domain perceptual linear prediction technique for deriving autoregressive models of temporal trajectories of spectral power in individual frequency bands is reviewed. Finally, posterior-based features, which allow for straightforward application of modulation frequency domain information, are described. The paper is tutorial in nature, aims at a historical global overview of attempts for using spectral dynamics in machine recognition of speech, and does not always provide enough detail of the described techniques. However, extensive references to earlier work are provided to compensate for the lack of detail in the paper.

  19. Plowing Speech

    Zla ba sgrol ma

    2009-01-01

    This file contains a plowing speech and a discussion about the speech This collection presents forty-nine audio files including: several folk song genres; folktales and; local history from the Sman shad Valley of Sde dge county World Oral Literature Project

  20. Speech is Golden

    Juel Henrichsen, Peter

    2014-01-01

    Most of the Danish municipalities are ready to begin to adopt automatic speech recognition, but at the same time remain nervous following a long series of bad business cases in the recent past. Complaints are voiced over costly licences and low service levels, typical effects of a de facto monopoly...... on the supply side. The present article reports on a new public action strategy which has taken shape in the course of 2013-14. While Denmark is a small language area, our public sector is well organised and has considerable purchasing power. Across this past year, Danish local authorities have organised around...... of the present article, in the role of economically neutral advisers. The aim of the initiative is to pave the way for the first profitable contract in the field - which we hope to see in 2014 - an event which would precisely break the present deadlock and open up a billion EUR market for speech technology...

  1. Speech coding

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained to be the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of speech signal getting corrupted by noise, cross-talk and distortion Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand digital transmission is relatively immune to noise, cross-talk and distortion primarily because of the capability to faithfully regenerate digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link Hence from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modem requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often referred to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques that is often interchangeably used with speech coding is the term voice coding. This term is more generic in the sense that the

  2. A Survey on Statistical Based Single Channel Speech Enhancement Techniques

    Sunnydayal. V

    2014-11-01

    Full Text Available Speech enhancement is a long standing problem with various applications like hearing aids, automatic recognition and coding of speech signals. Single channel speech enhancement technique is used for enhancement of the speech degraded by additive background noises. The background noise can have an adverse impact on our ability to converse without hindrance or smoothly in very noisy environments, such as busy streets, in a car or cockpit of an airplane. Such type of noises can affect quality and intelligibility of speech. This is a survey paper and its object is to provide an overview of speech enhancement algorithms so that enhance the noisy speech signal which is corrupted by additive noise. The algorithms are mainly based on statistical based approaches. Different estimators are compared. Challenges and Opportunities of speech enhancement are also discussed. This paper helps in choosing the best statistical based technique for speech enhancement

  3. ARMA Modelling for Whispered Speech

    Xue-li LI; Wei-dong ZHOU

    2010-01-01

    The Autoregressive Moving Average (ARMA) model for whispered speech is proposed. Compared with normal speech, whispered speech has no fundamental frequency because of the glottis being semi-opened and turbulent flow being created, and formant shifting exists in the lower frequency region due to the narrowing of the tract in the false vocal fold regions and weak acoustic coupling with the subglottal system. Analysis shows that the effect of the subglottal system is to introduce additional pole-zero pairs into the vocal tract transfer function. Theoretically, the method based on an ARMA process is superior to that based on an AR process in the spectral analysis of the whispered speech. Two methods, the least squared modified Yule-Walker likelihood estimate (LSMY) algorithm and the Frequency-Domain Steiglitz-Mcbride (FDSM) algorithm, are applied to the ARMA model for the whispered speech. The performance evaluation shows that the ARMA model is much more appropriate for representing the whispered speech than the AR model, and the FDSM algorithm provides a more accurate estimation of the whispered speech spectral envelope than the LSMY algorithm with higher computational complexity.

  4. An articulatorily constrained, maximum entropy approach to speech recognition and speech coding

    Hogden, J.

    1996-12-31

    Hidden Markov models (HMM`s) are among the most popular tools for performing computer speech recognition. One of the primary reasons that HMM`s typically outperform other speech recognition techniques is that the parameters used for recognition are determined by the data, not by preconceived notions of what the parameters should be. This makes HMM`s better able to deal with intra- and inter-speaker variability despite the limited knowledge of how speech signals vary and despite the often limited ability to correctly formulate rules describing variability and invariance in speech. In fact, it is often the case that when HMM parameter values are constrained using the limited knowledge of speech, recognition performance decreases. However, the structure of an HMM has little in common with the mechanisms underlying speech production. Here, the author argues that by using probabilistic models that more accurately embody the process of speech production, he can create models that have all the advantages of HMM`s, but that should more accurately capture the statistical properties of real speech samples--presumably leading to more accurate speech recognition. The model he will discuss uses the fact that speech articulators move smoothly and continuously. Before discussing how to use articulatory constraints, he will give a brief description of HMM`s. This will allow him to highlight the similarities and differences between HMM`s and the proposed technique.

  5. Objective voice and speech analysis of persons with chronic hoarseness by prosodic analysis of speech samples.

    Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar

    2016-10-01

    Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.

  6. Hate speech

    Anne Birgitta Nilsen

    2014-12-01

    Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance.The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience.The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  7. Speech enhancement

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be ""cleaned"" with digital signal processing tools before it is played out, transmitted, or stored.This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red

  8. Sequential Organization and Room Reverberation for Speech Segregation

    2012-02-28

    voiced portions account for about 75-80% of spoken English . Voiced speech is characterized by periodicity (or harmonicity), which has been used as a...onset and offset cues to extract unvoiced speech segments. Acoustic- phonetic features are then used to separate unvoiced speech from nonspeech...estimate is relatively accurate due to weak voiced speech at these frequencies. Based on this analysis and acoustic- phonetic characteristics of

  9. Segmentation, diarization and speech transcription : surprise data unraveled

    Huijbregts, Marijn Anthonius Henricus

    2008-01-01

    In this thesis, research on large vocabulary continuous speech recognition for unknown audio conditions is presented. For automatic speech recognition systems based on statistical methods, it is important that the conditions of the audio used for training the statistical models match the conditions

  10. In Vitro Evaluation of the iValve : A Novel Hands-Free Speech Valve

    van der Houwen, E.B.; van Kalkeren, T.A.; Burgerhof, J.G.; van der Laan, B.F.; Verkerke, G.J.

    2011-01-01

    Objectives: We performed in vitro evaluation of a novel, disposable, automatic hands-free tracheostoma speech valve for laryngectomy patients based upon the principle of inhalation. The commercially available automatic speech valves close upon strong exhalation and open again when the pressure drops

  11. "Rate My Therapist": Automated Detection of Empathy in Drug and Alcohol Counseling via Speech and Language Processing.

    Xiao, Bo; Imel, Zac E; Georgiou, Panayiotis G; Atkins, David C; Narayanan, Shrikanth S

    2015-01-01

    The technology for evaluating patient-provider interactions in psychotherapy-observational coding-has not changed in 70 years. It is labor-intensive, error prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks including ASR (1200 transcripts from heterogeneous psychotherapy sessions and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally-derived empathy ratings were evaluated against human ratings for each provider. Computationally-derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and F-score (a weighted average of sensitivity and specificity) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracies, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies.

  12. Speech dynamics

    Pols, L.C.W.

    2011-01-01

    In order for speech to be informative and communicative, segmental and suprasegmental variation is mandatory. Only this leads to meaningful words and sentences. The building blocks are no stable entities put next to each other (like beads on a string or like printed text), but there are gradual tran

  13. Speech Intelligibility

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena like the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, benefit using hearing aids or combinations of these things.

  14. Automatic sequences

    Haeseler, Friedrich

    2003-01-01

    Automatic sequences are sequences which are produced by a finite automaton. Although they are not random they may look as being random. They are complicated, in the sense of not being not ultimately periodic, they may look rather complicated, in the sense that it may not be easy to name the rule by which the sequence is generated, however there exists a rule which generates the sequence. The concept automatic sequences has special applications in algebra, number theory, finite automata and formal languages, combinatorics on words. The text deals with different aspects of automatic sequences, in particular:· a general introduction to automatic sequences· the basic (combinatorial) properties of automatic sequences· the algebraic approach to automatic sequences· geometric objects related to automatic sequences.

  15. Unusual ictal foreign language automatisms in temporal lobe epilepsy.

    Soe, Naing Ko; Lee, Sang Kun

    2014-12-01

    The distinct brain regions could be specifically involved in different languages and the differences in brain activation depending on the language proficiency and on the age of language acquisition. Speech disturbances are observed in the majority of temporal lobe complex motor seizures. Ictal verbalization had significant lateralization value: 90% of patients with this manifestation had seizure focus in the non-dominant temporal lobe. Although, ictal speech automatisms are usually uttered in the patient's native language, ictal speech foreign language automatisms are unusual presentations of non-dominent temporal lobe epilepsy. The release of isolated foreign language area could be possible depending on the pattern of ictal spreading of non-dominant hemisphere. Most of the case reports in ictal speech foreign language automatisms were men. In this case report, we observed ictal foreign language automatisms in middle age Korean woman.

  16. ASLP-MULAN: Audio speech and language processing for multimedia analytics

    2016-01-01

    Our intention is generating the right mixture of audio, speech and language technologies with big data ones. Some audio, speech and language automatic technologies are available or gaining enough degree of maturity as to be able to help to this objective: automatic speech transcription, query by spoken example, spoken information retrieval, natural language processing, unstructured multimedia contents transcription and description, multimedia files summarization, spoken emotion detection and ...

  17. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    It is a tedious and subjective task to measure severity of a dysarthria by manually evaluating his/her speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signal for a certain word and distinguished speech signal for different words. As an application, it can be used to assess speech quality and forecast speech recognition rate of speech made by an individual dysarthric speaker before actual exhaustive implementation of an automatic speech recognition system for the speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations had been done by comparing its predicted recognition rates with ones predicted by the standard methods called the articulatory and intelligibility tests based on the two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting recognition rate of dysarthric speech. All experiments had been done on speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.

  18. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.

  19. Speech communications in noise

    1984-07-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  20. Speech recognition employing biologically plausible receptive fields

    Fereczkowski, Michal; Bothe, Hans-Heinrich

    2011-01-01

    The main idea of the project is to build a widely speaker-independent, biologically motivated automatic speech recognition (ASR) system. The two main differences between our approach and current state-of-the-art ASRs are that i) the features used here are based on the responses of neuronlike spec...

  1. OLIVE: Speech-Based Video Retrieval

    Jong, de Franciska; Gauvain, Jean-Luc; Hartog, den Jurgen; Netter, Klaus

    1999-01-01

    This paper describes the Olive project which aims to support automated indexing of video material by use of human language technologies. Olive is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which serve as the

  2. Presentation video retrieval using automatically recovered slide and spoken text

    Cooper, Matthew

    2013-03-01

    Video is becoming a prevalent medium for e-learning. Lecture videos contain text information in both the presentation slides and lecturer's speech. This paper examines the relative utility of automatically recovered text from these sources for lecture video retrieval. To extract the visual information, we automatically detect slides within the videos and apply optical character recognition to obtain their text. Automatic speech recognition is used similarly to extract spoken text from the recorded audio. We perform controlled experiments with manually created ground truth for both the slide and spoken text from more than 60 hours of lecture video. We compare the automatically extracted slide and spoken text in terms of accuracy relative to ground truth, overlap with one another, and utility for video retrieval. Results reveal that automatically recovered slide text and spoken text contain different content with varying error profiles. Experiments demonstrate that automatically extracted slide text enables higher precision video retrieval than automatically recovered spoken text.

  3. Speech Enhancement based on Compressive Sensing Algorithm

    Sulong, Amart; Gunawan, Teddy S.; Khalifa, Othman O.; Chebil, Jalel

    2013-12-01

    There are various methods, in performance of speech enhancement, have been proposed over the years. The accurate method for the speech enhancement design mainly focuses on quality and intelligibility. The method proposed with high performance level. A novel speech enhancement by using compressive sensing (CS) is a new paradigm of acquiring signals, fundamentally different from uniform rate digitization followed by compression, often used for transmission or storage. Using CS can reduce the number of degrees of freedom of a sparse/compressible signal by permitting only certain configurations of the large and zero/small coefficients, and structured sparsity models. Therefore, CS is significantly provides a way of reconstructing a compressed version of the speech in the original signal by taking only a small amount of linear and non-adaptive measurement. The performance of overall algorithms will be evaluated based on the speech quality by optimise using informal listening test and Perceptual Evaluation of Speech Quality (PESQ). Experimental results show that the CS algorithm perform very well in a wide range of speech test and being significantly given good performance for speech enhancement method with better noise suppression ability over conventional approaches without obvious degradation of speech quality.

  4. Language-Independent Automatic Evaluation of Intelligibility of Chronically Hoarse Persons

    Haderlein, Tino; Middag, Catherine; Martens, Jean-Pierre; Döllinger, Michael; Nöth, Elmar

    2014-01-01

    Objective: Automatic intelligibility assessment using automatic speech recognition is usually language specific. In this study, a language-independent approach is proposed. It uses models that are trained with Flemish speech, and it is applied to assess chronically hoarse German speakers. The research questions are here: is it possible to construct suitable acoustic features that generalize to other languages and a speech disorder, and is the generated model for intelligibility also suitable ...

  5. A Research of Speech Emotion Recognition Based on Deep Belief Network and SVM

    Chenchen Huang

    2014-01-01

    Full Text Available Feature extraction is a very important part in speech emotion recognition, and in allusion to feature extraction in speech emotion recognition problems, this paper proposed a new method of feature extraction, using DBNs in DNN to extract emotional features in speech signal automatically. By training a 5 layers depth DBNs, to extract speech emotion feature and incorporate multiple consecutive frames to form a high dimensional feature. The features after training in DBNs were the input of nonlinear SVM classifier, and finally speech emotion recognition multiple classifier system was achieved. The speech emotion recognition rate of the system reached 86.5%, which was 7% higher than the original method.

  6. Ensemble Feature Extraction Modules for Improved Hindi Speech Recognition System

    Malay Kumar

    2012-05-01

    Full Text Available Speech is the most natural way of communication between human beings. The field of speech recognition generates intrigues of man - machine conversation and due to its versatile applications; automatic speech recognition systems have been designed. In this paper we are presenting a novel approach for Hindi speech recognition by ensemble feature extraction modules of ASR systems and their outputs have been combined using voting technique ROVER. Experimental results have been shown that proposed system will produce better result than traditional ASR systems.

  7. 78 FR 49717 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    2013-08-15

    ... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With... Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications...

  8. Predicting speech intelligibility in adverse conditions: evaluation of the speech-based envelope power spectrum model

    2011-01-01

    are based on the long-term SNRenv. As an attempt to extent the model to deal with fluctuating interferers, a short-time version of the sEPSM is presented. The SNRenv of a speech sample is estimated from a combination of SNRenv-values calculated in short time frames. The model is evaluated in adverse......The speech-based envelope power spectrum model (sEPSM) [Jørgensen and Dau (2011). J. Acoust. Soc. Am., 130 (3), 1475–1487] estimates the envelope signal-to-noise ratio (SNRenv) of distorted speech and accurately describes the speech recognition thresholds (SRT) for normal-hearing listeners...... conditions by comparing predictions to measured data from [Kjems et al. (2009). J. Acoust. Soc. Am. 126 (3), 1415-1426] where speech is mixed with four different interferers, including speech-shaped noise, bottle noise, car noise, and cafe noise. The model accounts well for the differences in intelligibility...

  9. Going to a Speech Therapist

    ... Video: Getting an X-ray Going to a Speech Therapist KidsHealth > For Kids > Going to a Speech Therapist ... therapists (also called speech-language pathologists ). What Do Speech Therapists Help With? Speech therapists help people of all ...

  10. Speech research

    1992-06-01

    Phonology is traditionally seen as the discipline that concerns itself with the building blocks of linguistic messages. It is the study of the structure of sound inventories of languages and of the participation of sounds in rules or processes. Phonetics, in contrast, concerns speech sounds as produced and perceived. Two extreme positions on the relationship between phonological messages and phonetic realizations are represented in the literature. One holds that the primary home for linguistic symbols, including phonological ones, is the human mind, itself housed in the human brain. The second holds that their primary home is the human vocal tract.

  11. Effective Prediction of Errors by Non-native Speakers Using Decision Tree for Speech Recognition-Based CALL System

    Wang, Hongcui; Kawahara, Tatsuya

    CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it still remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or the ASR grammar network. However, this approach easily falls in the trade-off of coverage of errors and the increase of perplexity. To solve the problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, to achieve both better coverage of errors and smaller perplexity, resulting in significant improvement in ASR accuracy.

  12. Multi-thread Parallel Speech Recognition for Mobile Applications

    LOJKA Martin

    2014-05-01

    Full Text Available In this paper, the server based solution of the multi-thread large vocabulary automatic speech recognition engine is described along with the Android OS and HTML5 practical application examples. The basic idea was to bring speech recognition available for full variety of applications for computers and especially for mobile devices. The speech recognition engine should be independent of commercial products and services (where the dictionary could not be modified. Using of third-party services could be also a security and privacy problem in specific applications, when the unsecured audio data could not be sent to uncontrolled environments (voice data transferred to servers around the globe. Using our experience with speech recognition applications, we have been able to construct a multi-thread speech recognition serverbased solution designed for simple applications interface (API to speech recognition engine modified to specific needs of particular application.

  13. Speech production, Psychology of

    Schriefers, H.J.; Vigliocco, G.

    2015-01-01

    Research on speech production investigates the cognitive processes involved in transforming thoughts into speech. This article starts with a discussion of the methodological issues inherent to research in speech production that illustrates how empirical approaches to speech production must differ fr

  14. Speech Enhancement

    Benesty, Jacob; Jensen, Jesper Rindom; Christensen, Mads Græsbøll;

    of methods and have been introduced in somewhat different contexts. Linear filtering methods originate in stochastic processes, while subspace methods have largely been based on developments in numerical linear algebra and matrix approximation theory. This book bridges the gap between these two classes......Speech enhancement is a classical problem in signal processing, yet still largely unsolved. Two of the conventional approaches for solving this problem are linear filtering, like the classical Wiener filter, and subspace methods. These approaches have traditionally been treated as different classes...... of methods by showing how the ideas behind subspace methods can be incorporated into traditional linear filtering. In the context of subspace methods, the enhancement problem can then be seen as a classical linear filter design problem. This means that various solutions can more easily be compared...

  15. Children's perception of their synthetically corrected speech production.

    Strömbergsson, Sofia; Wengelin, Asa; House, David

    2014-06-01

    We explore children's perception of their own speech - in its online form, in its recorded form, and in synthetically modified forms. Children with phonological disorder (PD) and children with typical speech and language development (TD) performed tasks of evaluating accuracy of the different types of speech stimuli, either immediately after having produced the utterance or after a delay. In addition, they performed a task designed to assess their ability to detect synthetic modification. Both groups showed high performance in tasks involving evaluation of other children's speech, whereas in tasks of evaluating one's own speech, the children with PD were less accurate than their TD peers. The children with PD were less sensitive to misproductions in immediate conjunction with their production of an utterance, and more accurate after a delay. Within-category modification often passed undetected, indicating a satisfactory quality of the generated speech. Potential clinical benefits of using corrective re-synthesis are discussed.

  16. Phonetic Alphabet for Speech Recognition of Czech

    J. Uhlir

    1997-12-01

    Full Text Available In the paper we introduce and discuss an alphabet that has been proposed for phonemicly oriented automatic speech recognition. The alphabet, denoted as a PAC (Phonetic Alphabet for Czech consists of 48 basic symbols that allow for distinguishing all major events occurring in spoken Czech language. The symbols can be used both for phonetic transcription of Czech texts as well as for labeling recorded speech signals. From practical reasons, the alphabet occurs in two versions; one utilizes Czech native characters and the other employs symbols similar to those used for English in the DARPA and NIST alphabets.

  17. Sentence Clustering Using Parts-of-Speech

    Richard Khoury

    2012-02-01

    Full Text Available Clustering algorithms are used in many Natural Language Processing (NLP tasks. They have proven to be popular and effective tools to use to discover groups of similar linguistic items. In this exploratory paper, we propose a new clustering algorithm to automatically cluster together similar sentences based on the sentences’ part-of-speech syntax. The algorithm generates and merges together the clusters using a syntactic similarity metric based on a hierarchical organization of the parts-of-speech. We demonstrate the features of this algorithm by implementing it in a question type classification system, in order to determine the positive or negative impact of different changes to the algorithm.

  18. Speech therapy with obturator.

    Shyammohan, A; Sreenivasulu, D

    2010-12-01

    Rehabilitation of speech is tantamount to closure of defect in cases with velopharyngeal insufficiency. Often the importance of speech therapy is sidelined during the fabrication of obturators. Usually the speech part is taken up only at a later stage and is relegated entirely to a speech therapist without the active involvement of the prosthodontist. The article suggests a protocol for speech therapy in such cases to be done in unison with a prosthodontist.

  19. A Statistical Quality Model for Data-Driven Speech Animation.

    Ma, Xiaohan; Deng, Zhigang

    2012-11-01

    In recent years, data-driven speech animation approaches have achieved significant successes in terms of animation quality. However, how to automatically evaluate the realism of novel synthesized speech animations has been an important yet unsolved research problem. In this paper, we propose a novel statistical model (called SAQP) to automatically predict the quality of on-the-fly synthesized speech animations by various data-driven techniques. Its essential idea is to construct a phoneme-based, Speech Animation Trajectory Fitting (SATF) metric to describe speech animation synthesis errors and then build a statistical regression model to learn the association between the obtained SATF metric and the objective speech animation synthesis quality. Through delicately designed user studies, we evaluate the effectiveness and robustness of the proposed SAQP model. To the best of our knowledge, this work is the first-of-its-kind, quantitative quality model for data-driven speech animation. We believe it is the important first step to remove a critical technical barrier for applying data-driven speech animation techniques to numerous online or interactive talking avatar applications.

  20. Filled pause refinement based on the pronunciation probability for lecture speech.

    Long, Yan-Hua; Ye, Hong

    2015-01-01

    Nowadays, although automatic speech recognition has become quite proficient in recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies remains problematic, such as spontaneous conversational and lecture speech. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. Most recent studies have shown that FPs are widely believed to increase the error rates for state-of-the-art speech transcription, primarily because most FPs are not well annotated or provided in training data transcriptions and because of the similarities in acoustic characteristics between FPs and some common non-content words. To enhance the speech transcription system, we propose a new automatic refinement approach to detect FPs in British English lecture speech transcription. This approach combines the pronunciation probabilities for each word in the dictionary and acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, in which only imperfect training transcriptions are available. Successful results are achieved for both the development and evaluation datasets. Acoustic models trained on different styles of speech genres have been investigated with respect to FP refinement. To further validate the effectiveness of the proposed approach, speech transcription performance has also been examined using systems built on training data transcriptions with and without FP refinement.

  1. Detection and Separation of Speech Events in Meeting Recordings Using a Microphone Array

    Yamada Miichi

    2007-01-01

    Full Text Available When applying automatic speech recognition (ASR to meeting recordings including spontaneous speech, the performance of ASR is greatly reduced by the overlap of speech events. In this paper, a method of separating the overlapping speech events by using an adaptive beamforming (ABF framework is proposed. The main feature of this method is that all the information necessary for the adaptation of ABF, including microphone calibration, is obtained from meeting recordings based on the results of speech-event detection. The performance of the separation is evaluated via ASR using real meeting recordings.

  2. Comparing Speech Recognition Systems (Microsoft API, Google API And CMU Sphinx

    Veton Këpuska

    2017-03-01

    Full Text Available The idea of this paper is to design a tool that will be used to test and compare commercial speech recognition systems, such as Microsoft Speech API and Google Speech API, with open-source speech recognition systems such as Sphinx-4. The best way to compare automatic speech recognition systems in different environments is by using some audio recordings that were selected from different sources and calculating the word error rate (WER. Although the WER of the three aforementioned systems were acceptable, it was observed that the Google API is superior.

  3. Delayed Speech or Language Development

    ... to 2-Year-Old Delayed Speech or Language Development KidsHealth > For Parents > Delayed Speech or Language Development ... child is right on schedule. Normal Speech & Language Development It's important to discuss early speech and language ...

  4. Robust Speech Recognition Using a Harmonic Model

    许超; 曹志刚

    2004-01-01

    Automatic speech recognition under conditions of a noisy environment remains a challenging problem. Traditionally, methods focused on noise structure, such as spectral subtraction, have been employed to address this problem, and thus the performance of such methods depends on the accuracy in noise estimation. In this paper, an alternative method, using a harmonic-based spectral reconstruction algorithm, is proposed for the enhancement of robust automatic speech recognition. Neither noise estimation nor noise-model training are required in the proposed approach. A spectral subtraction integrated autocorrelation function is proposed to determine the pitch for the harmonic model. Recognition results show that the harmonic-based spectral reconstruction approach outperforms spectral subtraction in the middle- and low-signal noise ratio (SNR) ranges. The advantage of the proposed method is more manifest for non-stationary noise, as the algorithm does not require an assumption of stationary noise.

  5. Speech production in amplitude-modulated noise

    Macdonald, Ewen N; Raufer, Stefan

    2013-01-01

    The Lombard effect refers to the phenomenon where talkers automatically increase their level of speech in a noisy environment. While many studies have characterized how the Lombard effect influences different measures of speech production (e.g., F0, spectral tilt, etc.), few have investigated...... the consequences of temporally fluctuating noise. In the present study, 20 talkers produced speech in a variety of noise conditions, including both steady-state and amplitude-modulated white noise. While listening to noise over headphones, talkers produced randomly generated five word sentences. Similar...... to previous studies, talkers raised the level of their voice in steady-state noise. While talkers also increased the level of their voice in amplitude-modulated noise, the increase was not as large as that observed in steady-state noise. Importantly, for the 2 and 4 Hz amplitude-modulated noise conditions...

  6. Post-editing through Speech Recognition

    Mesa-Lao, Bartolomé

    In the past couple of years automatic speech recognition (ASR) software has quietly created a niche for itself in many situations of our lives. Nowadays it can be found at the other end of customer-support hotlines, it is built into operating systems and it is offered as an alternative text......-input method for smartphones. On another front, given the significant improvements in Machine Translation (MT) quality and the increasing demand for translations, post-editing of MT is becoming a popular practice in the translation industry, since it has been shown to allow for larger volumes of translations...... to be produced saving time and costs. The translation industry is at a deeply transformative point in its evolution and the coming years herald an era of converge where speech technology could make a difference. As post-editing services are becoming a common practice among language service providers and speech...

  7. Modeling Speech Intelligibility in Hearing Impaired Listeners

    Scheidiger, Christoph; Jørgensen, Søren; Dau, Torsten

    2014-01-01

    Models of speech intelligibility (SI) have a long history, starting with the articulation index (AI, [17]), followed by the SI index (SI I, [18]) and the speech transmission index (STI, [7]), to only name a few. However, these models fail to accurately predict SI with nonlinearly processed noisy...... speech, e.g. phase jitter or spectral subtraction. Recent studies predict SI for normal-hearing (NH) listeners based on a signal-to-noise ratio measure in the envelope domain (SNRenv), in the framework of the speech-based envelope power spectrum model (sEPSM, [20, 21]). These models have shown good...... agreement with measured data under a broad range of conditions, including stationary and modulated interferers, reverberation, and spectral subtraction. Despite the advances in modeling intelligibility in NH listeners, a broadly applicable model that can predict SI in hearing-impaired (HI) listeners...

  8. Heart Rate Extraction from Vowel Speech Signals

    Abdelwadood Mesleh; Dmitriy Skopin; Sergey Baglikov; Anas Quteishat

    2012-01-01

    This paper presents a novel non-contact heart rate extraction method from vowel speech signals.The proposed method is based on modeling the relationship between speech production of vowel speech signals and heart activities for humans where it is observed that the moment of heart beat causes a short increment (evolution) of vowel speech formants.The short-time Fourier transform (STFT) is used to detect the formant maximum peaks so as to accurately estimate the heart rate.Compared with traditional contact pulse oximeter,the average accuracy of the proposed non-contact heart rate extraction method exceeds 95%.The proposed non-contact heart rate extraction method is expected to play an important role in modern medical applications.

  9. Automatic analysis of multiparty meetings

    Steve Renals

    2011-10-01

    This paper is about the recognition and interpretation of multiparty meetings captured as audio, video and other signals. This is a challenging task since the meetings consist of spontaneous and conversational interactions between a number of participants: it is a multimodal, multiparty, multistream problem. We discuss the capture and annotation of the Augmented Multiparty Interaction (AMI) meeting corpus, the development of a meeting speech recognition system, and systems for the automatic segmentation, summarization and social processing of meetings, together with some example applications based on these systems.

  10. Accurate guitar tuning by cochlear implant musicians.

    Lu, Thomas; Huang, Juan; Zeng, Fan-Gang

    2014-01-01

    Modern cochlear implant (CI) users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task.

  11. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    2013-08-15

    ... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With... this document, the Commission amends telecommunications relay services (TRS) mandatory...

  12. Emotion Recognition from Persian Speech with Neural Network

    Mina Hamidi

    2012-10-01

    Full Text Available In this paper, we report an effort towards automatic recognition of emotional states from continuousPersian speech. Due to the unavailability of appropriate database in the Persian language for emotionrecognition, at first, we built a database of emotional speech in Persian. This database consists of 2400wave clips modulated with anger, disgust, fear, sadness, happiness and normal emotions. Then we extractprosodic features, including features related to the pitch, intensity and global characteristics of the speechsignal. Finally, we applied neural networks for automatic recognition of emotion. The resulting averageaccuracy was about 78%.

  13. Fifty years of progress in speech understanding systems

    Zue, Victor

    2004-10-01

    Researchers working on human-machine interfaces realized nearly 50 years ago that automatic speech recognition (ASR) alone is not sufficient; one needs to impart linguistic knowledge to the system such that the signal could ultimately be understood. A speech understanding system combines speech recognition (i.e., the speech to symbols conversion) with natural language processing (i.e., the symbol to meaning transformation) to achieve understanding. Speech understanding research dates back to the DARPA Speech Understanding Project in the early 1970s. However, large-scale efforts only began in earnest in the late 1980s, with government research programs in the U.S. and Europe providing the impetus. This has resulted in many innovations including novel approaches to natural language understanding (NLU) for speech input, and integration techniques for ASR and NLU. In the past decade, speech understanding systems have become major building blocks of conversational interfaces that enable users to access and manage information using spoken dialogue, incorporating language generation, discourse modeling, dialogue management, and speech synthesis. Today, we are at the threshold of developing multimodal interfaces, augmenting sound with sight and touch. This talk will highlight past work and speculate on the future. [Work supported by an industrial consortium of the MIT Oxygen Project.

  14. An effective cluster-based model for robust speech detection and speech recognition in noisy environments.

    Górriz, J M; Ramírez, J; Segura, J C; Puntonet, C G

    2006-07-01

    This paper shows an accurate speech detection algorithm for improving the performance of speech recognition systems working in noisy environments. The proposed method is based on a hard decision clustering approach where a set of prototypes is used to characterize the noisy channel. Detecting the presence of speech is enabled by a decision rule formulated in terms of an averaged distance between the observation vector and a cluster-based noise model. The algorithm benefits from using contextual information, a strategy that considers not only a single speech frame but also a neighborhood of data in order to smooth the decision function and improve speech detection robustness. The proposed scheme exhibits reduced computational cost making it adequate for real time applications, i.e., automated speech recognition systems. An exhaustive analysis is conducted on the AURORA 2 and AURORA 3 databases in order to assess the performance of the algorithm and to compare it to existing standard voice activity detection (VAD) methods. The results show significant improvements in detection accuracy and speech recognition rate over standard VADs such as ITU-T G.729, ETSI GSM AMR, and ETSI AFE for distributed speech recognition and a representative set of recently reported VAD algorithms.

  15. Automated Gesturing for Virtual Characters: Speech-driven and Text-driven Approaches

    Goranka Zoric

    2006-04-01

    Full Text Available We present two methods for automatic facial gesturing of graphically embodied animated agents. In one case, conversational agent is driven by speech in automatic Lip Sync process. By analyzing speech input, lip movements are determined from the speech signal. Another method provides virtual speaker capable of reading plain English text and rendering it in a form of speech accompanied by the appropriate facial gestures. Proposed statistical model for generating virtual speaker’s facial gestures can be also applied as addition to lip synchronization process in order to obtain speech driven facial gesturing. In this case statistical model will be triggered with the input speech prosody instead of lexical analysis of the input text.

  16. The role of visual spatial attention in audiovisual speech perception

    Andersen, Tobias; Tiippana, K.; Laarni, J.

    2009-01-01

    integration did not change. Visual spatial attention was also able to select between the faces when lip reading. This suggests that visual spatial attention acts at the level of visual speech perception prior to audiovisual integration and that the effect propagates through audiovisual integration......Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive...... but recent reports have challenged this view. Here we study the effect of visual spatial attention on the McGurk effect. By presenting a movie of two faces symmetrically displaced to each side of a central fixation point and dubbed with a single auditory speech track, we were able to discern the influences...

  17. Auto Spell Suggestion for High Quality Speech Synthesis in Hindi

    Kabra, Shikha; Agarwal, Ritika

    2014-02-01

    The goal of Text-to-Speech (TTS) synthesis in a particular language is to convert arbitrary input text to intelligible and natural sounding speech. However, for a particular language like Hindi, which is a highly confusing language (due to very close spellings), it is not an easy task to identify errors/mistakes in input text and an incorrect text degrade the quality of output speech hence this paper is a contribution to the development of high quality speech synthesis with the involvement of Spellchecker which generates spell suggestions for misspelled words automatically. Involvement of spellchecker would increase the efficiency of speech synthesis by providing spell suggestions for incorrect input text. Furthermore, we have provided the comparative study for evaluating the resultant effect on to phonetic text by adding spellchecker on to input text.

  18. Adaptive Recognition of Phonemes from Speaker - Connected-Speech Using Alisa.

    Osella, Stephen Albert

    The purpose of this dissertation research is to investigate a novel approach to automatic speech recognition (ASR). The successes that have been achieved in ASR have relied heavily on the use of a language grammar, which significantly constrains the ASR process. By using grammar to provide most of the recognition ability, the ASR system does not have to be as accurate at the low-level recognition stage. The ALISA Phonetic Transcriber (APT) algorithm is proposed as a way to improve ASR by enhancing the lowest -level recognition stage. The objective of the APT algorithm is to classify speech frames (a short sequence of speech signal samples) into a small set of phoneme classes. The APT algorithm constructs the mapping from speech frames to phoneme labels through a multi-layer feedforward process. A design principle of APT is that final decisions are delayed as long as possible. Instead of attempting to optimize the decision making at each processing level individually, each level generates a list of candidate solutions that are passed on to the next level of processing. The later processing levels use these candidate solutions to resolve ambiguities. The scope of this dissertation is the design of the APT algorithm up to the speech-frame classification stage. In future research, the APT algorithm will be extended to the word recognition stage. In particular, the APT algorithm could serve as the front-end stage to a Hidden Markov Model (HMM) based word recognition system. In such a configuration, the APT algorithm would provide the HMM with the requisite phoneme state-probability estimates. To date, the APT algorithm has been tested with the TIMIT and NTIMIT speech databases. The APT algorithm has been trained and tested on the SX and SI sentence texts using both male and female speakers. Results indicate better performance than those results obtained using a neural network based speech-frame classifier. The performance of the APT algorithm has been evaluated for

  19. Semi-automatic segmentation for 3D motion analysis of the tongue with dynamic MRI.

    Lee, Junghoon; Woo, Jonghye; Xing, Fangxu; Murano, Emi Z; Stone, Maureen; Prince, Jerry L

    2014-12-01

    Dynamic MRI has been widely used to track the motion of the tongue and measure its internal deformation during speech and swallowing. Accurate segmentation of the tongue is a prerequisite step to define the target boundary and constrain the tracking to tissue points within the tongue. Segmentation of 2D slices or 3D volumes is challenging because of the large number of slices and time frames involved in the segmentation, as well as the incorporation of numerous local deformations that occur throughout the tongue during motion. In this paper, we propose a semi-automatic approach to segment 3D dynamic MRI of the tongue. The algorithm steps include seeding a few slices at one time frame, propagating seeds to the same slices at different time frames using deformable registration, and random walker segmentation based on these seed positions. This method was validated on the tongue of five normal subjects carrying out the same speech task with multi-slice 2D dynamic cine-MR images obtained at three orthogonal orientations and 26 time frames. The resulting semi-automatic segmentations of a total of 130 volumes showed an average dice similarity coefficient (DSC) score of 0.92 with less segmented volume variability between time frames than in manual segmentations.

  20. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    Heracleous Panikos

    2007-01-01

    Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible speech, but also very quietly uttered speech (nonaudible murmur. As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc. for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved for a 20 k dictation task a word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.

  1. THE BASIS FOR SPEECH PREVENTION

    Jordan JORDANOVSKI

    1997-06-01

    Full Text Available The speech is a tool for accurate communication of ideas. When we talk about speech prevention as a practical realization of the language, we are referring to the fact that it should be comprised of the elements of the criteria as viewed from the perspective of the standards. This criteria, in the broad sense of the word, presupposes an exact realization of the thought expressed between the speaker and the recipient.The absence of this criterion catches the eye through the practical realization of the language and brings forth consequences, often hidden very deeply in the human psyche. Their outer manifestation already represents a delayed reaction of the social environment. The foundation for overcoming and standardization of this phenomenon must be the anatomy-physiological patterns of the body, accomplished through methods in concordance with the nature of the body.

  2. Speech 7 through 12.

    Nederland Independent School District, TX.

    GRADES OR AGES: Grades 7 through 12. SUBJECT MATTER: Speech. ORGANIZATION AND PHYSICAL APPEARANCE: Following the foreward, philosophy and objectives, this guide presents a speech curriculum. The curriculum covers junior high and Speech I, II, III (senior high). Thirteen units of study are presented for junior high, each unit is divided into…

  3. A Spoken Access Approach for Chinese Text and Speech Information Retrieval.

    Chien, Lee-Feng; Wang, Hsin-Min; Bai, Bo-Ren; Lin, Sun-Chein

    2000-01-01

    Presents an efficient spoken-access approach for both Chinese text and Mandarin speech information retrieval. Highlights include human-computer interaction via voice input, speech query recognition at the syllable level, automatic term suggestion, relevance feedback techniques, and experiments that show an improvement in the effectiveness of…

  4. Unobtrusive multimodal emotion detection in adaptive interfaces: speech and facial expressions

    Truong, K.P.; Leeuwen, D.A. van; Neerincx, M.A.

    2007-01-01

    Two unobtrusive modalities for automatic emotion recognition are discussed: speech and facial expressions. First, an overview is given of emotion recognition studies based on a combination of speech and facial expressions. We will identify difficulties concerning data collection, data fusion, system

  5. Automatic Recognition of Element Classes and Boundaries in the Birdsong with Variable Sequences.

    Takuya Koumura

    Full Text Available Researches on sequential vocalization often require analysis of vocalizations in long continuous sounds. In such studies as developmental ones or studies across generations in which days or months of vocalizations must be analyzed, methods for automatic recognition would be strongly desired. Although methods for automatic speech recognition for application purposes have been intensively studied, blindly applying them for biological purposes may not be an optimal solution. This is because, unlike human speech recognition, analysis of sequential vocalizations often requires accurate extraction of timing information. In the present study we propose automated systems suitable for recognizing birdsong, one of the most intensively investigated sequential vocalizations, focusing on the three properties of the birdsong. First, a song is a sequence of vocal elements, called notes, which can be grouped into categories. Second, temporal structure of birdsong is precisely controlled, meaning that temporal information is important in song analysis. Finally, notes are produced according to certain probabilistic rules, which may facilitate the accurate song recognition. We divided the procedure of song recognition into three sub-steps: local classification, boundary detection, and global sequencing, each of which corresponds to each of the three properties of birdsong. We compared the performances of several different ways to arrange these three steps. As results, we demonstrated a hybrid model of a deep convolutional neural network and a hidden Markov model was effective. We propose suitable arrangements of methods according to whether accurate boundary detection is needed. Also we designed the new measure to jointly evaluate the accuracy of note classification and boundary detection. Our methods should be applicable, with small modification and tuning, to the songs in other species that hold the three properties of the sequential vocalization.

  6. Automatic Recognition of Element Classes and Boundaries in the Birdsong with Variable Sequences

    Okanoya, Kazuo

    2016-01-01

    Researches on sequential vocalization often require analysis of vocalizations in long continuous sounds. In such studies as developmental ones or studies across generations in which days or months of vocalizations must be analyzed, methods for automatic recognition would be strongly desired. Although methods for automatic speech recognition for application purposes have been intensively studied, blindly applying them for biological purposes may not be an optimal solution. This is because, unlike human speech recognition, analysis of sequential vocalizations often requires accurate extraction of timing information. In the present study we propose automated systems suitable for recognizing birdsong, one of the most intensively investigated sequential vocalizations, focusing on the three properties of the birdsong. First, a song is a sequence of vocal elements, called notes, which can be grouped into categories. Second, temporal structure of birdsong is precisely controlled, meaning that temporal information is important in song analysis. Finally, notes are produced according to certain probabilistic rules, which may facilitate the accurate song recognition. We divided the procedure of song recognition into three sub-steps: local classification, boundary detection, and global sequencing, each of which corresponds to each of the three properties of birdsong. We compared the performances of several different ways to arrange these three steps. As results, we demonstrated a hybrid model of a deep convolutional neural network and a hidden Markov model was effective. We propose suitable arrangements of methods according to whether accurate boundary detection is needed. Also we designed the new measure to jointly evaluate the accuracy of note classification and boundary detection. Our methods should be applicable, with small modification and tuning, to the songs in other species that hold the three properties of the sequential vocalization. PMID:27442240

  7. Speech in spinocerebellar ataxia.

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia.

  8. Digital speech processing using Matlab

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  9. Commercial applications of speech interface technology: an industry at the threshold.

    Oberteuffer, J A

    1995-10-24

    Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loud-speakers to give easy accessibility to increasingly intelligent machines.

  10. Automatic Construction of Accurate Models of Physical Systems

    1998-07-10

    order to develop an AI-adapted global optimization tool that combines qualitative reasoning and local numerical methods; see the Annals of Mathematics and...34Global Solutions for Nonlinear Systems using Qualitative Reasoning," Annals of Mathematics and Artificial Intelligence (1998) • • E. Bradley and

  11. Fast and accurate automatic structure prediction with HHpred.

    Hildebrand, Andrea; Remmert, Michael; Biegert, Andreas; Söding, Johannes

    2009-01-01

    Automated protein structure prediction is becoming a mainstream tool for biological research. This has been fueled by steady improvements of publicly available automated servers over the last decade, in particular their ability to build good homology models for an increasing number of targets by reliably detecting and aligning more and more remotely homologous templates. Here, we describe the three fully automated versions of the HHpred server that participated in the community-wide blind protein structure prediction competition CASP8. What makes HHpred unique is the combination of usability, short response times (typically under 15 min) and a model accuracy that is competitive with those of the best servers in CASP8.

  12. Automatic detection of articulation disorders in children with cleft lip and palate.

    Maier, Andreas; Hönig, Florian; Bocklet, Tobias; Nöth, Elmar; Stelzle, Florian; Nkenke, Emeka; Schuster, Maria

    2009-11-01

    Speech of children with cleft lip and palate (CLP) is sometimes still disordered even after adequate surgical and nonsurgical therapies. Such speech shows complex articulation disorders, which are usually assessed perceptually, consuming time and manpower. Hence, there is a need for an easy to apply and reliable automatic method. To create a reference for an automatic system, speech data of 58 children with CLP were assessed perceptually by experienced speech therapists for characteristic phonetic disorders at the phoneme level. The first part of the article aims to detect such characteristics by a semiautomatic procedure and the second to evaluate a fully automatic, thus simple, procedure. The methods are based on a combination of speech processing algorithms. The semiautomatic method achieves moderate to good agreement (kappa approximately 0.6) for the detection of all phonetic disorders. On a speaker level, significant correlations between the perceptual evaluation and the automatic system of 0.89 are obtained. The fully automatic system yields a correlation on the speaker level of 0.81 to the perceptual evaluation. This correlation is in the range of the inter-rater correlation of the listeners. The automatic speech evaluation is able to detect phonetic disorders at an experts'level without any additional human postprocessing.

  13. Frequent word section extraction in a presentation speech by an effective dynamic programming algorithm.

    Itoh, Yoshiaki; Tanaka, Kazuyo

    2004-08-01

    Word frequency in a document has often been utilized in text searching and summarization. Similarly, identifying frequent words or phrases in a speech data set for searching and summarization would also be meaningful. However, obtaining word frequency in a speech data set is difficult, because frequent words are often special terms in the speech and cannot be recognized by a general speech recognizer. This paper proposes another approach that is effective for automatic extraction of such frequent word sections in a speech data set. The proposed method is applicable to any domain of monologue speech, because no language models or specific terms are required in advance. The extracted sections can be regarded as speech labels of some kind or a digest of the speech presentation. The frequent word sections are determined by detecting similar sections, which are sections of audio data that represent the same word or phrase. The similar sections are detected by an efficient algorithm, called Shift Continuous Dynamic Programming (Shift CDP), which realizes fast matching between arbitrary sections in the reference speech pattern and those in the input speech, and enables frame-synchronous extraction of similar sections. In experiments, the algorithm is applied to extract the repeated sections in oral presentation speeches recorded in academic conferences in Japan. The results show that Shift CDP successfully detects similar sections and identifies the frequent word sections in individual presentation speeches, without prior domain knowledge, such as language models and terms.

  14. Term clouds as surrogates for user generated speech

    M. Tsagkias; M. Larson; M. de Rijke

    2008-01-01

    User generated spoken audio remains a challenge for Automatic Speech Recognition (ASR) technology and content-based audio surrogates derived from ASR-transcripts must be error robust. An investigation of the use of term clouds as surrogates for podcasts demonstrates that ASR term clouds closely appr

  15. Intonation and Dialog Context as Constraints for Speech Recognition.

    Taylor, Paul; King, Simon; Isard, Stephen; Wright, Helen

    1998-01-01

    Describes how to use intonation and dialog context to improve the performance of an automatic speech-recognition system. Experiments utilized the DCIEM Maptask corpus, using a separate bigram language model for each type of move and showing that, with the correct move-specific language model for each utterance in the test set, the recognizer's…

  16. Story Segmentation for Speech Transcripts in Sparse Data Conditions

    Werff, van der Laurens

    2010-01-01

    Information Retrieval systems determine relevance by comparing information needs with the content of potential retrieval units. Unlike most textual data, automatically generated speech transcripts cannot by default be easily divided into obvious retrieval units due to a lack of explicit structural m

  17. Speech intelligibility metrics in small unoccupied classrooms

    Cruikshank, Matthew E.; Carney, Melinda J.; Cheenne, Dominique J.

    2005-04-01

    Nine small volume classrooms in schools located in the Chicago suburbs were tested to quantify speech intelligibility at various seat locations. Several popular intelligibility metrics were investigated, including Speech Transmission Index (STI), %Alcons, Signal to Noise Ratios (SNR), and 80 ms Useful/Detrimental Ratios (U80). Incorrect STI values were experienced in high noise environments, while the U80s and the SNRs were found to be the most accurate methodologies. Test results are evaluated against the guidelines of ANSI S12.60-2002, and match the data from previous research.

  18. Exploration of Speech Planning and Producing by Speech Error Analysis

    冷卉

    2012-01-01

    Speech error analysis is an indirect way to discover speech planning and producing processes. From some speech errors made by people in their daily life, linguists and learners can reveal the planning and producing processes more easily and clearly.

  19. Indirect Speech Acts

    李威

    2001-01-01

    Indirect speech acts are frequently used in verbal communication, the interpretation of them is of great importance in order to meet the demands of the development of students' communicative competence. This paper, therefore, intends to present Searle' s indirect speech acts and explore the way how indirect speech acts are interpreted in accordance with two influential theories. It consists of four parts. Part one gives a general introduction to the notion of speech acts theory. Part two makes an elaboration upon the conception of indirect speech act theory proposed by Searle and his supplement and development of illocutionary acts. Part three deals with the interpretation of indirect speech acts. Part four draws implication from the previous study and also serves as the conclusion of the dissertation.

  20. Charisma in business speeches

    Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter

    2016-01-01

    of the acoustic-prosodic signal, secondly, focuses on business speeches like product presentations, and, thirdly, in doing so, advances the still fairly fragmentary evidence on the prosodic correlates of charismatic speech. We show that the prosodic features of charisma in political speeches also apply...... to business speeches. Consistent with the public opinion, our findings are indicative of Steve Jobs being a more charismatic speaker than Mark Zuckerberg. Beyond previous studies, our data suggest that rhythm and emphatic accentuation are also involved in conveying charisma. Furthermore, the differences...

  1. Principles of speech coding

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication-including voice-will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networksOffering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the

  2. Advances in Speech Recognition

    Neustein, Amy

    2010-01-01

    This volume is comprised of contributions from eminent leaders in the speech industry, and presents a comprehensive and in depth analysis of the progress of speech technology in the topical areas of mobile settings, healthcare and call centers. The material addresses the technical aspects of voice technology within the framework of societal needs, such as the use of speech recognition software to produce up-to-date electronic health records, not withstanding patients making changes to health plans and physicians. Included will be discussion of speech engineering, linguistics, human factors ana

  3. Emotional State Categorization from Speech: Machine vs. Human

    Shaukat, Arslan

    2010-01-01

    This paper presents our investigations on emotional state categorization from speech signals with a psychologically inspired computational model against human performance under the same experimental setup. Based on psychological studies, we propose a multistage categorization strategy which allows establishing an automatic categorization model flexibly for a given emotional speech categorization task. We apply the strategy to the Serbian Emotional Speech Corpus (GEES) and the Danish Emotional Speech Corpus (DES), where human performance was reported in previous psychological studies. Our work is the first attempt to apply machine learning to the GEES corpus where the human recognition rates were only available prior to our study. Unlike the previous work on the DES corpus, our work focuses on a comparison to human performance under the same experimental settings. Our studies suggest that psychology-inspired systems yield behaviours that, to a great extent, resemble what humans perceived and their performance ...

  4. Speech-Language Therapy (For Parents)

    ... Feeding Your 1- to 2-Year-Old Speech-Language Therapy KidsHealth > For Parents > Speech-Language Therapy A ... with speech and/or language disorders. Speech Disorders, Language Disorders, and Feeding Disorders A speech disorder refers ...

  5. Speech Compression for Noise-Corrupted Thai Expressive Speech

    Suphattharachai Chomphan

    2011-01-01

    Full Text Available Problem statement: In speech communication, speech coding aims at preserving the speech quality with lower coding bitrate. When considering the communication environment, various types of noises deteriorates the speech quality. The expressive speech with different speaking styles may cause different speech quality with the same coding method. Approach: This research proposed a study of speech compression for noise-corrupted Thai expressive speech by using two coding methods of CS-ACELP and MP-CELP. The speech material included a hundredmale speech utterances and a hundred female speech utterances. Four speaking styles included enjoyable, sad, angry and reading styles. Five sentences of Thai speech were chosen. Three types of noises were included (train, car and air conditioner. Five levels of each type of noise were varied from 0-20 dB. The subjective test of mean opinion score was exploited in the evaluation process. Results: The experimental results showed that CS-ACELP gave the better speech quality than that of MP-CELP at all three bitrates of 6000, 8600-12600 bps. When considering the levels of noise, the 20-dB noise gave the best speech quality, while 0-dB noise gave the worst speech quality. When considering the speech gender, female speech gave the better results than that of male speech. When considering the types of noise, the air-conditioner noise gave the best speech quality, while the train noise gave the worst speech quality. Conclusion: From the study, it can be seen that coding methods, types of noise, levels of noise, speech gender influence on the coding speech quality.

  6. Dual Key Speech Encryption Algorithm Based Underdetermined BSS

    Huan Zhao

    2014-01-01

    Full Text Available When the number of the mixed signals is less than that of the source signals, the underdetermined blind source separation (BSS is a significant difficult problem. Due to the fact that the great amount data of speech communications and real-time communication has been required, we utilize the intractability of the underdetermined BSS problem to present a dual key speech encryption method. The original speech is mixed with dual key signals which consist of random key signals (one-time pad generated by secret seed and chaotic signals generated from chaotic system. In the decryption process, approximate calculation is used to recover the original speech signals. The proposed algorithm for speech signals encryption can resist traditional attacks against the encryption system, and owing to approximate calculation, decryption becomes faster and more accurate. It is demonstrated that the proposed method has high level of security and can recover the original signals quickly and efficiently yet maintaining excellent audio quality.

  7. Dual key speech encryption algorithm based underdetermined BSS.

    Zhao, Huan; He, Shaofang; Chen, Zuo; Zhang, Xixiang

    2014-01-01

    When the number of the mixed signals is less than that of the source signals, the underdetermined blind source separation (BSS) is a significant difficult problem. Due to the fact that the great amount data of speech communications and real-time communication has been required, we utilize the intractability of the underdetermined BSS problem to present a dual key speech encryption method. The original speech is mixed with dual key signals which consist of random key signals (one-time pad) generated by secret seed and chaotic signals generated from chaotic system. In the decryption process, approximate calculation is used to recover the original speech signals. The proposed algorithm for speech signals encryption can resist traditional attacks against the encryption system, and owing to approximate calculation, decryption becomes faster and more accurate. It is demonstrated that the proposed method has high level of security and can recover the original signals quickly and efficiently yet maintaining excellent audio quality.

  8. A Self-Instructional Device for Conditioning Accurate Prosody.

    Buiten, Roger; Lane, Harlan

    1965-01-01

    A self-instructional device for conditioning accurate prosody in second-language learning is described in this article. The Speech Auto-Instructional Device (SAID) is electro-mechanical and performs three functions: SAID (1) presents to the student tape-recorded pattern sentences that are considered standards in prosodic performance; (2) processes…

  9. Speech recognition systems on the Cell Broadband Engine

    Liu, Y; Jones, H; Vaidya, S; Perrone, M; Tydlitat, B; Nanda, A

    2007-04-20

    In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine{trademark} (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time--a channel density that is orders-of-magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.

  10. MMSE based Noise Tracking and Echo Cancellation of Speech Signals

    Praveen. N

    2014-03-01

    Full Text Available During the recent years, there have been many studies on automatic audio classification and segmentation to use several features and techniques. The signal recorded at a microphone in a room incorporates the direct arrival of the sound from the source as well as the multiple weaker copies of the same signal that are created by sound reflections off the room walls. We call it as acoustic. Due to complications associated with exact tracking of the target in three dimensional environments, in many applications such as source localization and speech recognition the reverberate patterns imposed by the environment are seen as undesirable. In an auditorium, the common disturbances in public speech are Noise, Acoustic Echo etc. The MMSE estimator is one of the algorithms proposed for removal of additive background noise. It is a single channel speech enhancement technique for the enhancement of speech degraded by additive background noise. Background noise can effect our conversation in a noisy environment like in streets or in a car, when sending speech from the cockpit of an airplane to the ground or to the cabin and can effect both quality and intelligibility of speech. With the passage of time Spectral subtraction has undergone many modifications. This is a review paper and its objective is to provide an overview of MMSE estimator that has been proposed for enhancement of speech degraded by additive background noise during past decades

  11. Free Speech Yearbook 1976.

    Phifer, Gregg, Ed.

    The articles collected in this annual address several aspects of First Amendment Law. The following titles are included: "Freedom of Speech As an Academic Discipline" (Franklyn S. Haiman), "Free Speech and Foreign-Policy Decision Making" (Douglas N. Freeman), "The Supreme Court and the First Amendment: 1975-1976"…

  12. Private Speech in Ballet

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  13. Tracking Speech Sound Acquisition

    Powell, Thomas W.

    2011-01-01

    This article describes a procedure to aid in the clinical appraisal of child speech. The approach, based on the work by Dinnsen, Chin, Elbert, and Powell (1990; Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. "Journal of Speech and Hearing Research", 33, 28-37), uses a railway idiom to track gains in…

  14. Preschool Connected Speech Inventory.

    DiJohnson, Albert; And Others

    This speech inventory developed for a study of aurally handicapped preschool children (see TM 001 129) provides information on intonation patterns in connected speech. The inventory consists of a list of phrases and simple sentences accompanied by pictorial clues. The test is individually administered by a teacher-examiner who presents the spoken…

  15. Free Speech. No. 38.

    Kane, Peter E., Ed.

    This issue of "Free Speech" contains the following articles: "Daniel Schoor Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tome Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds Voted For Schorr Inquiry" by Richard Lyons, "Erosion of the…

  16. Advertising and Free Speech.

    Hyman, Allen, Ed.; Johnson, M. Bruce, Ed.

    The articles collected in this book originated at a conference at which legal and economic scholars discussed the issue of First Amendment protection for commercial speech. The first article, in arguing for freedom for commercial speech, finds inconsistent and untenable the arguments of those who advocate freedom from regulation for political…

  17. Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-tonoise ratio in the envelope domain (SNRenv), which was demonstrated...... to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating...... from computational auditory scene analysis and further support the hypothesis that the SNRenv is a powerful metric for speech intelligibility prediction....

  18. Automatic lexical classification: bridging research and practice.

    Korhonen, Anna

    2010-08-13

    Natural language processing (NLP)--the automatic analysis, understanding and generation of human language by computers--is vitally dependent on accurate knowledge about words. Because words change their behaviour between text types, domains and sub-languages, a fully accurate static lexical resource (e.g. a dictionary, word classification) is unattainable. Researchers are now developing techniques that could be used to automatically acquire or update lexical resources from textual data. If successful, the automatic approach could considerably enhance the accuracy and portability of language technologies, such as machine translation, text mining and summarization. This paper reviews the recent and on-going research in automatic lexical acquisition. Focusing on lexical classification, it discusses the many challenges that still need to be met before the approach can benefit NLP on a large scale.

  19. Evaluating airborne sound insulation in terms of speech intelligibility.

    Park, H K; Bradley, J S; Gover, B N

    2008-03-01

    This paper reports on an evaluation of ratings of the sound insulation of simulated walls in terms of the intelligibility of speech transmitted through the walls. Subjects listened to speech modified to simulate transmission through 20 different walls with a wide range of sound insulation ratings, with constant ambient noise. The subjects' mean speech intelligibility scores were compared with various physical measures to test the success of the measures as sound insulation ratings. The standard Sound Transmission Class (STC) and Weighted Sound Reduction Index ratings were only moderately successful predictors of intelligibility scores, and eliminating the 8 dB rule from STC led to very modest improvements. Various previously established speech intelligibility measures (e.g., Articulation Index or Speech Intelligibility Index) and measures derived from them, such as the Articulation Class, were all relatively strongly related to speech intelligibility scores. In general, measures that involved arithmetic averages or summations of decibel values over frequency bands important for speech were most strongly related to intelligibility scores. The two most accurate predictors of the intelligibility of transmitted speech were an arithmetic average transmission loss over the frequencies from 200 to 2.5 kHz and the addition of a new spectrum weighting term to R(w) that included frequencies from 400 to 2.5 kHz.

  20. Environmental Contamination of Normal Speech.

    Harley, Trevor A.

    1990-01-01

    Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…

  1. UMLS-based automatic image indexing.

    Sneiderman, C; Sneiderman, Charles Alan; Demner-Fushman, D; Demner-Fushman, Dina; Fung, K W; Fung, Kin Wah; Bray, B; Bray, Bruce

    2008-01-01

    To date, most accurate image retrieval techniques rely on textual descriptions of images. Our goal is to automatically generate indexing terms for an image extracted from a biomedical article by identifying Unified Medical Language System (UMLS) concepts in image caption and its discussion in the text. In a pilot evaluation of the suggested image indexing method by five physicians, a third of the automatically identified index terms were found suitable for indexing.

  2. Speech endpoint detection in real noise environments

    GUO Yanmeng; FU Qiang; YAN Yonghong

    2007-01-01

    A method of speech endpoint detection in environments of complicated additive noise is presented. Based on the analysis of noise, an adaptive model of stationary noise is proposed to detect the section where the signal is nonstationary. Then the voice is detected in this section by its harmonic structure, and the accurate endpoint is searched using energy.Compared with the typical algorithms, this algorithm operates reliably in most real noise environments.

  3. Accurate guitar tuning by cochlear implant musicians.

    Thomas Lu

    Full Text Available Modern cochlear implant (CI users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to <0.5 Hz. One subject had normal contralateral hearing and produced more accurate tuning with CI than his normal ear. To understand these counterintuitive findings, we presented tones sequentially and found that tuning error was larger at ∼ 30 Hz for both subjects. A third subject, a non-musician CI user with normal contralateral hearing, showed similar trends in performance between CI and normal hearing ears but with less precision. This difference, along with electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task.

  4. Global Freedom of Speech

    Binderup, Lars Grassme

    2007-01-01

    , as opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal...... egalitarian reasons for free speech - reasons from overall welfare, from autonomy and from respect for the equality of citizens - it is argued that these reasons outweigh the proposed reasons for curbing culturally offensive speech. Currently controversial cases such as that of the Danish Cartoon Controversy...

  5. Effect of talker and speaking style on the Speech Transmission Index (L)

    van Wijngaarden, Sander J.; Houtgast, Tammo

    2004-01-01

    The Speech Transmission Index (STI) is routinely applied for predicting the intelligibility of messages (sentences) in noise and reverberation. Despite clear evidence that the STI is capable of doing so accurately, recent results indicate that the STI sometimes underestimates the effect of reverberation on sentence intelligibility. To investigate the influence of talker and speaking style, the Speech Reception Threshold in noise and reverberation was measured for three talkers, differing in clarity of articulation and speaking style. For very clear speech, the standard STI yields accurate results. For more conversational speech by an untrained talker, the effect of reverberation is underestimated. Measurements of the envelope spectrum reveal that conversational speech has relatively stronger contributions by higher (> 12.5 Hz) modulation frequencies. By modifying the STI calculation procedure to include modulations in the range 12.5-31.5 Hz, better results are obtained for conversational speech. A speaking-style-dependent choice for the STI modulation frequency range is proposed.

  6. One-against-all weighted dynamic time warping for language-independent and speaker-dependent speech recognition in adverse conditions.

    Xianglilan Zhang

    Full Text Available Considering personal privacy and difficulty of obtaining training material for many seldom used English words and (often non-English names, language-independent (LI with lightweight speaker-dependent (SD automatic speech recognition (ASR is a promising option to solve the problem. The dynamic time warping (DTW algorithm is the state-of-the-art algorithm for small foot-print SD ASR applications with limited storage space and small vocabulary, such as voice dialing on mobile devices, menu-driven recognition, and voice control on vehicles and robotics. Even though we have successfully developed two fast and accurate DTW variations for clean speech data, speech recognition for adverse conditions is still a big challenge. In order to improve recognition accuracy in noisy environment and bad recording conditions such as too high or low volume, we introduce a novel one-against-all weighted DTW (OAWDTW. This method defines a one-against-all index (OAI for each time frame of training data and applies the OAIs to the core DTW process. Given two speech signals, OAWDTW tunes their final alignment score by using OAI in the DTW process. Our method achieves better accuracies than DTW and merge-weighted DTW (MWDTW, as 6.97% relative reduction of error rate (RRER compared with DTW and 15.91% RRER compared with MWDTW are observed in our extensive experiments on one representative SD dataset of four speakers' recordings. To the best of our knowledge, OAWDTW approach is the first weighted DTW specially designed for speech data in adverse conditions.

  7. Design of an expert system for phonetic speech recognition

    Carbonell, N.; Haton, J.P.; Pierrel, J.M.; Lonchamp, F.

    1983-07-01

    Expert systems have been extensively used as a means for integrating the expertise of a human being into an artificial intelligence system. The authors are presently designing an expert system which will integrate the strategy and the knowledge of a phonetician reading a speech spectrogram. Their goal is twofold, firstly to obtain a better insight into the acoustic-decoding of speech, and, secondly, to improve the efficiency of present automatic phonetic recognition systems. This paper presents a preliminary description of the project, especially the overall strategy of the expert and the role of duration parameters in the segmentation and identification processes.

  8. The Rhetoric in English Speech

    马鑫

    2014-01-01

    English speech has a very long history and always attached importance of people highly. People usually give a speech in economic activities, political forums and academic reports to express their opinions to investigate or persuade others. English speech plays a rather important role in English literature. The distinct theme of speech should attribute to the rhetoric. It discusses parallelism, repetition and rhetorical question in English speech, aiming to help people appreciate better the charm of them.

  9. Noise estimation Algorithms for Speech Enhancement in highly non-stationary Environments

    Anuradha R Fukane

    2011-03-01

    Full Text Available A noise estimation algorithm plays an important role in speech enhancement. Speech enhancement for automatic speaker recognition system, Man-Machine communication, Voice recognition systems, speech coders, Hearing aids, Video conferencing and many applications are related to speech processing. All these systems are real world systems and input available for these systems is only the noisy speech signal, before applying to these systems we have to remove the noise component from noisy speech signal means enhanced speech signal can be applied to these systems. In most speech enhancement algorithms, it is assumed that an estimate of noise spectrum is available. Noise estimate is critical part and it is important for speech enhancement algorithms. If the noise estimate is too low then annoying residual noise will be available and if the noise estimate is too high then speech will get distorted and loss intelligibility. This paper focus on the different approaches of noise estimation. Section I introduction, Section II explains simple approach of Voice activity detector (VAD for noise estimation, Section III explains different classes of noise estimation algorithms, Section IV explains performance evaluation of noise estimation algorithms, Section V conclusion.

  10. Speech intelligibility in hospitals.

    Ryherd, Erica E; Moeller, Michael; Hsu, Timothy

    2013-07-01

    Effective communication between staff members is key to patient safety in hospitals. A variety of patient care activities including admittance, evaluation, and treatment rely on oral communication. Surprisingly, published information on speech intelligibility in hospitals is extremely limited. In this study, speech intelligibility measurements and occupant evaluations were conducted in 20 units of five different U.S. hospitals. A variety of unit types and locations were studied. Results show that overall, no unit had "good" intelligibility based on the speech intelligibility index (SII > 0.75) and several locations found to have "poor" intelligibility (SII speech intelligibility across a variety of hospitals and unit types, offers some evidence of the positive impact of absorption on intelligibility, and identifies areas for future research.

  11. Anxiety and ritualized speech

    Lalljee, Mansur; Cook, Mark

    1975-01-01

    The experiment examines the effects on a number of words that seem irrelevant to semantic communication. The Units of Ritualized Speech (URSs) considered are: 'I mean', 'in fact', 'really', 'sort of', 'well' and 'you know'. (Editor)

  12. Speech disorders - children

    ... this page: //medlineplus.gov/ency/article/001430.htm Speech disorders - children To use the sharing features on ... 2017, A.D.A.M., Inc. Duplication for commercial use must be authorized in writing by ADAM ...

  13. Speech impairment (adult)

    ... this page: //medlineplus.gov/ency/article/003204.htm Speech impairment (adult) To use the sharing features on ... 2017, A.D.A.M., Inc. Duplication for commercial use must be authorized in writing by ADAM ...

  14. Speech Compression and Synthesis

    1980-10-01

    phonological rules combined with diphone improved the algorithms used by the phonetic synthesis prog?Im for gain normalization and time... phonetic vocoder, spectral template. i0^Th^TreprtTörc"u’d1sTuV^ork for the past two years on speech compression’and synthesis. Since there was an...from Block 19: speech recognition, pnoneme recogmtion. initial design for a phonetic recognition program. We also recorded ana partially labeled a

  15. On pre-image iterations for speech enhancement.

    Leitner, Christina; Pernkopf, Franz

    2015-01-01

    In this paper, we apply kernel PCA for speech enhancement and derive pre-image iterations for speech enhancement. Both methods make use of a Gaussian kernel. The kernel variance serves as tuning parameter that has to be adapted according to the SNR and the desired degree of de-noising. We develop a method to derive a suitable value for the kernel variance from a noise estimate to adapt pre-image iterations to arbitrary SNRs. In experiments, we compare the performance of kernel PCA and pre-image iterations in terms of objective speech quality measures and automatic speech recognition. The speech data is corrupted by white and colored noise at 0, 5, 10, and 15 dB SNR. As a benchmark, we provide results of the generalized subspace method, of spectral subtraction, and of the minimum mean-square error log-spectral amplitude estimator. In terms of the scores of the PEASS (Perceptual Evaluation Methods for Audio Source Separation) toolbox, the proposed methods achieve a similar performance as the reference methods. The speech recognition experiments show that the utterances processed by pre-image iterations achieve a consistently better word recognition accuracy than the unprocessed noisy utterances and than the utterances processed by the generalized subspace method.

  16. Study of acoustic correlates associate with emotional speech

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions; neutral, sad, angry, happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the difference in formant pattern between [happiness/anger] and [neutral/sadness] are better reflected in back vowels such as /a/(/father/) than in front vowels. Detailed results on intra- and interspeaker variability will be reported.

  17. Speech and language therapy for aphasia following subacute stroke

    Engin Koyuncu

    2016-01-01

    Full Text Available The aim of this study was to investigate the time window, duration and intensity of optimal speech and language therapy applied to aphasic patients with subacute stroke in our hospital. The study consisted of 33 patients being hospitalized for stroke rehabilitation in our hospital with first stroke but without previous history of speech and language therapy. Sixteen sessions of impairment-based speech and language therapy were applied to the patients, 30-60 minutes per day, 2 days a week, for 8 successive weeks. Aphasia assessment in stroke patients was performed with Gülhane Aphasia Test-2 before and after treatment. Compared with before treatment, fluency of speech, listening comprehension, reading comprehension, oral motor evaluation, automatic speech, repetition and naming were improved after treatment. This suggests that 16 seesions of speech and language therapy, 30-60 minutes per day, 2 days a week, for 8 successive weeks, are effective in the treatment of aphasic patients with subacute stroke.

  18. Speech and language therapy for aphasia following subacute stroke

    Engin Koyuncu; Pnar am; Nermin Altnok; Duygu Ekinci all; Tuba Yarbay Duman; Nee zgirgin

    2016-01-01

    hTe aim of this study was to investigate the time window, duration and intensity of optimal speech and language therapy applied to aphasic patients with subacute stroke in our hospital. hTe study consisted of 33 patients being hospitalized for stroke rehabilitation in our hospital with ifrst stroke but without previous history of speech and language therapy. Sixteen sessions of impairment-based speech and language therapy were applied to the patients, 30–60 minutes per day, 2 days a week, for 8 successive weeks. Aphasia assess-ment in stroke patients was performed with Gülhane Aphasia Test-2 before and atfer treatment. Compared with before treatment, fluency of speech, listening comprehension, reading comprehension, oral motor evaluation, automatic speech, repetition and naming were improved atfer treatment. hTis suggests that 16 seesions of speech and language therapy, 30–60 minutes per day, 2 days a week, for 8 successive weeks, are effective in the treatment of aphasic patients with subacute stroke.

  19. Analysis of Phonetic Transcriptions for Danish Automatic Speech Recognition

    Kirkedal, Andreas Søeborg

    2013-01-01

    recognition system depends heavily on the dictionary and the transcriptions therein. This paper presents an analysis of phonetic/phonemic features that are salient for current Danish ASR systems. This preliminary study consists of a series of experiments using an ASR system trained on the DK-PAROLE corpus....... The analysis indicates that transcribing e.g. stress or vowel duration has a negative impact on performance. The best performance is obtained with coarse phonetic annotation and improves performance 1% word error rate and 3.8% sentence error rate....

  20. Stochastic Modeling as a Means of Automatic Speech Recognition

    1975-04-01

    posienon probability Pr( X| I :T|=x| l:T| | Y| I :T|«.y! I :T|. A, I, P, S ). where A, L, P, S represent the acoustic- phonetic , lexical, phonological , and... phonetic sequence by using multiple dictionary entries, phonological rules embedded in the dictionary, and a "degarbling" procedure. The search is...statistical model i f the hnguage. 2) a phonemic dictionary and statistical phonological rules. 3) a phonetic matching algorithm. 4) word level search

  1. An algorithm for automatic speech understanding over word graphs

    Sanchís Arnal, Emilio; Calvo Lance, Marcos; Gómez Adrian, Jon Ander; Hurtado Oliver, Lluis Felip

    2012-01-01

    [ES] En este trabajo se propone un algoritmo para la comprensión automática del habla que toma como entrada un grafo de palabras. Este grafo es procesado en primer lugar mediante un algoritmo de programación dinámica, obteniendo como resultado un segundo grafo enriquecido con información semántica. El cálculo del mejor camino sobre este segundo grafo permite obtener la secuencia de conceptos más verosímil de acuerdo con la evidencia acústica re¿ejada en el grafo de palabras. Ta...

  2. Cued Speech Gesture Recognition: A First Prototype Based on Early Reduction

    Caplier Alice

    2007-01-01

    Full Text Available Cued Speech is a specific linguistic code for hearing-impaired people. It is based on both lip reading and manual gestures. In the context of THIMP (Telephony for the Hearing-IMpaired Project, we work on automatic cued speech translation. In this paper, we only address the problem of automatic cued speech manual gesture recognition. Such a gesture recognition issue is really common from a theoretical point of view, but we approach it with respect to its particularities in order to derive an original method. This method is essentially built around a bioinspired method called early reduction. Prior to a complete analysis of each image of a sequence, the early reduction process automatically extracts a restricted number of key images which summarize the whole sequence. Only the key images are studied from a temporal point of view with lighter computation than the complete sequence.

  3. Cued Speech Gesture Recognition: A First Prototype Based on Early Reduction

    Pascal Perret

    2008-01-01

    Full Text Available Cued Speech is a specific linguistic code for hearing-impaired people. It is based on both lip reading and manual gestures. In the context of THIMP (Telephony for the Hearing-IMpaired Project, we work on automatic cued speech translation. In this paper, we only address the problem of automatic cued speech manual gesture recognition. Such a gesture recognition issue is really common from a theoretical point of view, but we approach it with respect to its particularities in order to derive an original method. This method is essentially built around a bioinspired method called early reduction. Prior to a complete analysis of each image of a sequence, the early reduction process automatically extracts a restricted number of key images which summarize the whole sequence. Only the key images are studied from a temporal point of view with lighter computation than the complete sequence.

  4. SPEECH DISORDERS ENCOUNTERED DURING SPEECH THERAPY AND THERAPY TECHNIQUES

    İlhan ERDEM

    2013-06-01

    Full Text Available Speech which is a physical and mental process, agreed signs and sounds to create a sense of mind to the message that change . Process to identify the sounds of speech it is essential to know the structure and function of various organs which allows to happen the conversation. Speech is a physical and mental process so many factors can lead to speech disorders. Speech disorder can be about language acquisitions as well as it can be caused medical and psychological many factors. Disordered speech, language, medical and psychological conditions as well as acquisitions also be caused by many factors. Speaking, is the collective work of many organs, such as an orchestra. Mental dimension of the speech disorder which is a very complex skill so it must be found which of these obstacles inhibit conversation. Speech disorder is a defect in speech flow, rhythm, tizliğinde, beats, the composition and vocalization. In this study, speech disorders such as articulation disorders, stuttering, aphasia, dysarthria, a local dialect speech, , language and lip-laziness, rapid speech peech defects in a term of language skills. This causes of speech disorders were investigated and presented suggestions for remedy was discussed.

  5. Comparative wavelet, PLP, and LPC speech recognition techniques on the Hindi speech digits database

    Mishra, A. N.; Shrotriya, M. C.; Sharan, S. N.

    2010-02-01

    In view of the growing use of automatic speech recognition in the modern society, we study various alternative representations of the speech signal that have the potential to contribute to the improvement of the recognition performance. In this paper wavelet based features using different wavelets are used for Hindi digits recognition. The recognition performance of these features has been compared with Linear Prediction Coefficients (LPC) and Perceptual Linear Prediction (PLP) features. All features have been tested using Hidden Markov Model (HMM) based classifier for speaker independent Hindi digits recognition. The recognition performance of PLP features is11.3% better than LPC features. The recognition performance with db10 features has shown a further improvement of 12.55% over PLP features. The recognition performance with db10 is best among all wavelet based features.

  6. Improving the speech intelligibility in classrooms

    Lam, Choi Ling Coriolanus

    One of the major acoustical concerns in classrooms is the establishment of effective verbal communication between teachers and students. Non-optimal acoustical conditions, resulting in reduced verbal communication, can cause two main problems. First, they can lead to reduce learning efficiency. Second, they can also cause fatigue, stress, vocal strain and health problems, such as headaches and sore throats, among teachers who are forced to compensate for poor acoustical conditions by raising their voices. Besides, inadequate acoustical conditions can induce the usage of public address system. Improper usage of such amplifiers or loudspeakers can lead to impairment of students' hearing systems. The social costs of poor classroom acoustics will be large to impair the learning of children. This invisible problem has far reaching implications for learning, but is easily solved. Many researches have been carried out that they have accurately and concisely summarized the research findings on classrooms acoustics. Though, there is still a number of challenging questions remaining unanswered. Most objective indices for speech intelligibility are essentially based on studies of western languages. Even several studies of tonal languages as Mandarin have been conducted, there is much less on Cantonese. In this research, measurements have been done in unoccupied rooms to investigate the acoustical parameters and characteristics of the classrooms. The speech intelligibility tests, which based on English, Mandarin and Cantonese, and the survey were carried out on students aged from 5 years old to 22 years old. It aims to investigate the differences in intelligibility between English, Mandarin and Cantonese of the classrooms in Hong Kong. The significance on speech transmission index (STI) related to Phonetically Balanced (PB) word scores will further be developed. Together with developed empirical relationship between the speech intelligibility in classrooms with the variations

  7. Practical speech user interface design

    Lewis, James R

    2010-01-01

    Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application

  8. 'What is it?' A functional MRI and SPECT study of ictal speech in a second language

    Navarro, V.; Chauvire, V.; Baulac, M.; Cohen, L. [Department of Neurology, AP-HP, Hopital de la Pitie-Salpetriere, IFR 70, Paris (France); Delmaire, Ch.; Lehericy, St. [Department of Neuroradiology, AP-HP, Hopital de la Pitie-Salpetriere, IFR 70, Paris (France); Habert, M.O. [Department of Nuclear Medicine, AP-HP, Hopital de la Pitie-Salpetriere, IFR 70, Paris (France); Footnick, R.; Pallier, Ch. [INSERM, U562, CEA/DSV, IFR 49, Orsay (France); Baulac, M.; Cohen, L. [Universite Paris VI, Faculte Pitie-Salpetriere, Paris (France)

    2009-07-01

    Neuronal networks involved in second language (L2) processing vary between normal subjects. Patients with epilepsy may have ictal speech automatisms in their second language. To delineate the brain systems involved in L2 ictal speech, we combined functional MRI during bilingual tasks and ictal - inter-ictal single-photon emission computed tomography in a patient who presented L2 ictal speech productions. These analyses showed that the networks activated by the seizure and those activated by L2 processing intersected in the right hippocampus. These results may provide some insights both into the pathophysiology of ictal speech and into the brain organization for L2. (authors)

  9. Speech-Language Therapy (For Parents)

    ... Speech-language pathologists (SLPs), often informally known as speech therapists, are professionals educated in the study of human ... Palate Hearing Evaluation in Children Going to a Speech Therapist Stuttering Hearing Impairment Speech Problems Cleft Lip and ...

  10. Degraded neural and behavioral processing of speech sounds in a rat model of Rett syndrome.

    Engineer, Crystal T; Rahebi, Kimiya C; Borland, Michael S; Buell, Elizabeth P; Centanni, Tracy M; Fink, Melyssa K; Im, Kwok W; Wilson, Linda G; Kilgard, Michael P

    2015-11-01

    Individuals with Rett syndrome have greatly impaired speech and language abilities. Auditory brainstem responses to sounds are normal, but cortical responses are highly abnormal. In this study, we used the novel rat Mecp2 knockout model of Rett syndrome to document the neural and behavioral processing of speech sounds. We hypothesized that both speech discrimination ability and the neural response to speech sounds would be impaired in Mecp2 rats. We expected that extensive speech training would improve speech discrimination ability and the cortical response to speech sounds. Our results reveal that speech responses across all four auditory cortex fields of Mecp2 rats were hyperexcitable, responded slower, and were less able to follow rapidly presented sounds. While Mecp2 rats could accurately perform consonant and vowel discrimination tasks in quiet, they were significantly impaired at speech sound discrimination in background noise. Extensive speech training improved discrimination ability. Training shifted cortical responses in both Mecp2 and control rats to favor the onset of speech sounds. While training increased the response to low frequency sounds in control rats, the opposite occurred in Mecp2 rats. Although neural coding and plasticity are abnormal in the rat model of Rett syndrome, extensive therapy appears to be effective. These findings may help to explain some aspects of communication deficits in Rett syndrome and suggest that extensive rehabilitation therapy might prove beneficial.

  11. Phone Duration Modeling of Affective Speech Using Support Vector Regression

    Alexandros Lazaridis

    2012-07-01

    Full Text Available In speech synthesis accurate modeling of prosody is important for producing high quality synthetic speech. One of the main aspects of prosody is phone duration. Robust phone duration modeling is a prerequisite for synthesizing emotional speech with natural sounding. In this work ten phone duration models are evaluated. These models belong to well known and widely used categories of algorithms, such as the decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms. Furthermore, we investigate the effectiveness of Support Vector Regression (SVR in phone duration modeling in the context of emotional speech. The evaluation of the eleven models is performed on a Modern Greek emotional speech database which consists of four categories of emotional speech (anger, fear, joy, sadness plus neutral speech. The experimental results demonstrated that the SVR-based modeling outperforms the other ten models across all the four emotion categories. Specifically, the SVR model achieved an average relative reduction of 8% in terms of root mean square error (RMSE throughout all emotional categories.

  12. Exploring expressivity and emotion with artificial voice and speech technologies.

    Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James

    2013-10-01

    Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.

  13. Mandarin Digits Speech Recognition Using Support Vector Machines

    XIE Xiang; KUANG Jing-ming

    2005-01-01

    A method of applying support vector machine (SVM) in speech recognition was proposed, and a speech recognition system for mandarin digits was built up by SVMs. In the system, vectors were linearly extracted from speech feature sequence to make up time-aligned input patterns for SVM, and the decisions of several 2-class SVM classifiers were employed for constructing an N-class classifier. Four kinds of SVM kernel functions were compared in the experiments of speaker-independent speech recognition of mandarin digits. And the kernel of radial basis function has the highest accurate rate of 99.33%, which is better than that of the baseline system based on hidden Markov models (HMM) (97.08%). And the experiments also show that SVM can outperform HMM especially when the samples for learning were very limited.

  14. SEMI-AUTOMATIC SPEAKER VERIFICATION SYSTEM

    E. V. Bulgakova

    2016-03-01

    Full Text Available Subject of Research. The paper presents a semi-automatic speaker verification system based on comparing of formant values, statistics of phone lengths and melodic characteristics as well. Due to the development of speech technology, there is an increased interest now in searching for expert speaker verification systems, which have high reliability and low labour intensiveness because of the automation of data processing for the expert analysis. System Description. We present a description of a novel system analyzing similarity or distinction of speaker voices based on comparing statistics of phone lengths, formant features and melodic characteristics. The characteristic feature of the proposed system based on fusion of methods is a weak correlation between the analyzed features that leads to a decrease in the error rate of speaker recognition. The system advantage is the possibility to carry out rapid analysis of recordings since the processes of data preprocessing and making decision are automated. We describe the functioning methods as well as fusion of methods to combine their decisions. Main Results. We have tested the system on the speech database of 1190 target trials and 10450 non-target trials, including the Russian speech of the male and female speakers. The recognition accuracy of the system is 98.59% on the database containing records of the male speech, and 96.17% on the database containing records of the female speech. It was also experimentally established that the formant method is the most reliable of all used methods. Practical Significance. Experimental results have shown that proposed system is applicable for the speaker recognition task in the course of phonoscopic examination.

  15. Tackling the complexity in speech

    section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore......, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about...... the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations...

  16. Denial Denied: Freedom of Speech

    Glen Newey

    2009-12-01

    Full Text Available Free speech is a widely held principle. This is in some ways surprising, since formal and informal censorship of speech is widespread, and rather different issues seem to arise depending on whether the censorship concerns who speaks, what content is spoken or how it is spoken. I argue that despite these facts, free speech can indeed be seen as a unitary principle. On my analysis, the core of the free speech principle is the denial of the denial of speech, whether to a speaker, to a proposition, or to a mode of expression. Underlying free speech is the principle of freedom of association, according to which speech is both a precondition of future association (e.g. as a medium for negotiation and a mode of association in its own right. I conclude by applying this account briefly to two contentious issues: hate speech and pornography.

  17. Speaking Fluently And Accurately

    JosephDeVeto

    2004-01-01

    Even after many years of study,students make frequent mistakes in English. In addition, many students still need a long time to think of what they want to say. For some reason, in spite of all the studying, students are still not quite fluent.When I teach, I use one technique that helps students not only speak more accurately, but also more fluently. That technique is dictations.

  18. On Optimal Linear Filtering of Speech for Near-End Listening Enhancement

    Taal, Cees H.; Jensen, Jesper; Leijon, Arne

    2013-01-01

    In this letter the focus is on linear filtering of speech before degradation due to additive background noise. The goal is to design the filter such that the speech intelligibility index (SII) is maximized when the speech is played back in a known noisy environment. Moreover, a power constraint...... suboptimal. In this work we propose a nonlinear approximation of the SII which is accurate for all SNRs. Experiments show large intelligibility improvements with the proposed method over the unprocessed noisy speech and better performance than one state-of-the art method....

  19. Exploring the role of low level visual processing in letter-speech sound integration: a visual MMN study

    Dries Froyen

    2010-04-01

    Full Text Available In contrast with for example audiovisual speech, the relation between visual and auditory properties of letters and speech sounds is artificial and learned only by explicit instruction. The arbitrariness of the audiovisual link together with the widespread usage of letter – speech sound pairs in alphabetic languages makes those audiovisual objects a unique subject for cross-modal research. Brain imaging evidence has indicated that heteromodal areas in superior temporal, as well as modality-specific auditory cortex are involved in letter – speech sound processing. The role of low level visual areas, however, remains unclear. In this study the visual counterpart of the auditory mismatch negativity (MMN is used to investigate the influences of speech sounds on letter processing. Letter and non-letter deviants were infrequently presented in a train of standard letters, either in isolation or simultaneously with speech sounds. Although previous findings showed that letters systematically modulate speech sound processing (reflected by auditory MMN amplitude modulation, the reverse does not seem to hold: our results did not show evidence for an automatic influence of speech sounds on letter processing (no visual MMN amplitude modulation. This apparent asymmetric recruitment of low level sensory cortices during letter - speech sound processing, contrasts with the symmetric involvement of these cortices in audiovisual speech processing, and is possibly due to the arbitrary nature of the link between letters and speech sounds.

  20. Exploring the Role of Low Level Visual Processing in Letter–Speech Sound Integration: A Visual MMN Study

    Froyen, Dries; van Atteveldt, Nienke; Blomert, Leo

    2009-01-01

    In contrast with for example audiovisual speech, the relation between visual and auditory properties of letters and speech sounds is artificial and learned only by explicit instruction. The arbitrariness of the audiovisual link together with the widespread usage of letter–speech sound pairs in alphabetic languages makes those audiovisual objects a unique subject for crossmodal research. Brain imaging evidence has indicated that heteromodal areas in superior temporal, as well as modality-specific auditory cortex are involved in letter–speech sound processing. The role of low level visual areas, however, remains unclear. In this study the visual counterpart of the auditory mismatch negativity (MMN) is used to investigate the influences of speech sounds on letter processing. Letter and non-letter deviants were infrequently presented in a train of standard letters, either in isolation or simultaneously with speech sounds. Although previous findings showed that letters systematically modulate speech sound processing (reflected by auditory MMN amplitude modulation), the reverse does not seem to hold: our results did not show evidence for an automatic influence of speech sounds on letter processing (no visual MMN amplitude modulation). This apparent asymmetric recruitment of low level sensory cortices during letter–speech sound processing, contrasts with the symmetric involvement of these cortices in audiovisual speech processing, and is possibly due to the arbitrary nature of the link between letters and speech sounds. PMID:20428501

  1. Exploring the Role of Low Level Visual Processing in Letter-Speech Sound Integration: A Visual MMN Study.

    Froyen, Dries; van Atteveldt, Nienke; Blomert, Leo

    2010-01-01

    In contrast with for example audiovisual speech, the relation between visual and auditory properties of letters and speech sounds is artificial and learned only by explicit instruction. The arbitrariness of the audiovisual link together with the widespread usage of letter-speech sound pairs in alphabetic languages makes those audiovisual objects a unique subject for crossmodal research. Brain imaging evidence has indicated that heteromodal areas in superior temporal, as well as modality-specific auditory cortex are involved in letter-speech sound processing. The role of low level visual areas, however, remains unclear. In this study the visual counterpart of the auditory mismatch negativity (MMN) is used to investigate the influences of speech sounds on letter processing. Letter and non-letter deviants were infrequently presented in a train of standard letters, either in isolation or simultaneously with speech sounds. Although previous findings showed that letters systematically modulate speech sound processing (reflected by auditory MMN amplitude modulation), the reverse does not seem to hold: our results did not show evidence for an automatic influence of speech sounds on letter processing (no visual MMN amplitude modulation). This apparent asymmetric recruitment of low level sensory cortices during letter-speech sound processing, contrasts with the symmetric involvement of these cortices in audiovisual speech processing, and is possibly due to the arbitrary nature of the link between letters and speech sounds.

  2. RECOGNISING SPEECH ACTS

    Phyllis Kaburise

    2012-09-01

    Full Text Available Speech Act Theory (SAT, a theory in pragmatics, is an attempt to describe what happens during linguistic interactions. Inherent within SAT is the idea that language forms and intentions are relatively formulaic and that there is a direct correspondence between sentence forms (for example, in terms of structure and lexicon and the function or meaning of an utterance. The contention offered in this paper is that when such a correspondence does not exist, as in indirect speech utterances, this creates challenges for English second language speakers and may result in miscommunication. This arises because indirect speech acts allow speakers to employ various pragmatic devices such as inference, implicature, presuppositions and context clues to transmit their messages. Such devices, operating within the non-literal level of language competence, may pose challenges for ESL learners.

  3. Speech spectrogram expert

    Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

    1983-01-01

    Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90percent accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (spex) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relates to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an english spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

  4. Protection limits on free speech

    李敏

    2014-01-01

    Freedom of speech is one of the basic rights of citizens should receive broad protection, but in the real context of China under what kind of speech can be protected and be restricted, how to grasp between state power and free speech limit is a question worth considering. People tend to ignore the freedom of speech and its function, so that some of the rhetoric cannot be demonstrated in the open debates.

  5. Designing speech for a recipient

    Fischer, Kerstin

    is investigated on three candidates for so-called ‘simplified registers’: speech to children (also called motherese or baby talk), speech to foreigners (also called foreigner talk) and speech to robots. The volume integrates research from various disciplines, such as psychology, sociolinguistics...

  6. Abortion and compelled physician speech.

    Orentlicher, David

    2015-01-01

    Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading.

  7. SPEECH DISORDERS ENCOUNTERED DURING SPEECH THERAPY AND THERAPY TECHNIQUES

    2013-01-01

    Speech which is a physical and mental process, agreed signs and sounds to create a sense of mind to the message that change . Process to identify the sounds of speech it is essential to know the structure and function of various organs which allows to happen the conversation. Speech is a physical and mental process so many factors can lead to speech disorders. Speech disorder can be about language acquisitions as well as it can be caused medical and psychological many factors. Disordered sp...

  8. A Speech Intelligibility Index-based approach to predict the speech reception threshold for sentences in fluctuating noise for normal-hearing listeners

    Rhebergen, Koenraad S.; Versfeld, Niek J.

    2005-04-01

    The SII model in its present form (ANSI S3.5-1997, American National Standards Institute, New York) can accurately describe intelligibility for speech in stationary noise but fails to do so for nonstationary noise maskers. Here, an extension to the SII model is proposed with the aim to predict the speech intelligibility in both stationary and fluctuating noise. The basic principle of the present approach is that both speech and noise signal are partitioned into small time frames. Within each time frame the conventional SII is determined, yielding the speech information available to the listener at that time frame. Next, the SII values of these time frames are averaged, resulting in the SII for that particular condition. Using speech reception threshold (SRT) data from the literature, the extension to the present SII model can give a good account for SRTs in stationary noise, fluctuating speech noise, interrupted noise, and multiple-talker noise. The predictions for sinusoidally intensity modulated (SIM) noise and real speech or speech-like maskers are better than with the original SII model, but are still not accurate. For the latter type of maskers, informational masking may play a role. .

  9. Speech intelligibility measure for vocal control of an automaton

    Naranjo, Michel; Tsirigotis, Georgios

    1998-07-01

    The acceleration of investigations in Speech Recognition allows to augur, in the next future, a wide establishment of Vocal Control Systems in the production units. The communication between a human and a machine necessitates technical devices that emit, or are submitted to important noise perturbations. The vocal interface introduces a new control problem of a deterministic automaton using uncertain information. The purpose is to place exactly the automaton in a final state, ordered by voice, from an unknown initial state. The whole Speech Processing procedure, presented in this paper, has for input the temporal speech signal of a word and for output a recognised word labelled with an intelligibility index given by the recognition quality. In the first part, we present the essential psychoacoustic concepts for the automatic calculation of the loudness of a speech signal. The architecture of a Time Delay Neural Network is presented in second part where we also give the results of the recognition. The theory of the fuzzy subset, in third part, allows to extract at the same time a recognised word and its intelligibility index. In the fourth part, an Anticipatory System models the control of a Sequential Machine. A prediction phase and an updating one appear which involve data coming from the information system. A Bayesian decision strategy is used and the criterion is a weighted sum of criteria defined from information, minimum path functions and speech intelligibility measure.

  10. Mandarin Visual Speech Information

    Chen, Trevor H.

    2010-01-01

    While the auditory-only aspects of Mandarin speech are heavily-researched and well-known in the field, this dissertation addresses its lesser-known aspects: The visual and audio-visual perception of Mandarin segmental information and lexical-tone information. Chapter II of this dissertation focuses on the audiovisual perception of Mandarin…

  11. Speech After Banquet

    Yang, Chen Ning

    2013-05-01

    I am usually not so short of words, but the previous speeches have rendered me really speechless. I have known and admired the eloquence of Freeman Dyson, but I did not know that there is a hidden eloquence in my colleague George Sterman...

  12. Speech and Hearing Therapy.

    Sakata, Reiko; Sakata, Robert

    1978-01-01

    In the public school, the speech and hearing therapist attempts to foster child growth and development through the provision of services basic to awareness of self and others, management of personal and social interactions, and development of strategies for coping with the handicap. (MM)

  13. The Commercial Speech Doctrine.

    Luebke, Barbara F.

    In its 1942 ruling in the "Valentine vs. Christensen" case, the Supreme Court established the doctrine that commercial speech is not protected by the First Amendment. In 1975, in the "Bigelow vs. Virginia" case, the Supreme Court took a decisive step toward abrogating that doctrine, by ruling that advertising is not stripped of…

  14. Superior Speech Acquisition and Robust Automatic Speech Recognition for Integrated Spacesuit Audio Systems Project

    National Aeronautics and Space Administration — Astronauts suffer from poor dexterity of their hands due to the clumsy spacesuit gloves during Extravehicular Activity (EVA) operations and NASA has had a widely...

  15. Metaheuristic applications to speech enhancement

    Kunche, Prajna

    2016-01-01

    This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.

  16. A Mobile Phone based Speech Therapist

    Pandey, Vinod K.; Pande, Arun; Kopparapu, Sunil Kumar

    2016-01-01

    Patients with articulatory disorders often have difficulty in speaking. These patients need several speech therapy sessions to enable them speak normally. These therapy sessions are conducted by a specialized speech therapist. The goal of speech therapy is to develop good speech habits as well as to teach how to articulate sounds the right way. Speech therapy is critical for continuous improvement to regain normal speech. Speech therapy sessions require a patient to travel to a hospital or a ...

  17. Parameter estimation of labial movements in speech production: implications for speech motor control.

    Hinton, V A; Robey, R R

    1995-08-01

    Central to theories of speech motor control are estimates on magnitudes of lip activity expressed in terms of central tendency, variability, and interrelatedness. In fact, the tenability of each of two competing theories of motor control for speech production rests solely on the observation of the predicted direction of the correlation coefficient (one positive and one negative) that indexes the relationship of concurrent lip activity. Each theory, however, predicts a relationship that is the complete opposite of the relationship predicted by the other. That is, one theory proposes that the labial system functions on the basis of complementary variation, whereas the other assumes positive covariation, or complementary modulation. In apparent contradiction, each prediction has been observed under laboratory conditions. The explanation for this apparent contradiction resides in the small sample sizes upon which each estimate was based. The minimum number of observations that are necessary to achieve accurate estimates of lip displacement parameters has remained unclear. This paper addresses three fundamental questions: (a) how many observations of on-task behavior are necessary to accurately estimate mean and variance values for the magnitude of upper lip displacement in a speech production experiment?, (b) what is the analogous number of observations for estimating the same values of lower lip displacement (together with the mandible) in the same context?, and (c) how many observations are necessary to accurately estimate the correlation coefficient indexing the relationship of lip displacements during the production of speech? Answers to these questions are accomplished through a review of estimator properties, a Monte Carlo computer simulation, and through laboratory observations.(ABSTRACT TRUNCATED AT 250 WORDS)

  18. Automatic Fiscal Stabilizers

    Narcis Eduard Mitu

    2013-11-01

    Full Text Available Policies or institutions (built into an economic system that automatically tend to dampen economic cycle fluctuations in income, employment, etc., without direct government intervention. For example, in boom times, progressive income tax automatically reduces money supply as incomes and spendings rise. Similarly, in recessionary times, payment of unemployment benefits injects more money in the system and stimulates demand. Also called automatic stabilizers or built-in stabilizers.

  19. Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

    van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

    2007-01-01

    Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…

  20. Direct and indirect measures of speech articulator motions using low power EM sensors

    Barnes, T; Burnett, G; Gable, T; Holzrichter, J F; Ng, L

    1999-05-12

    Low power Electromagnetic (EM) Wave sensors can measure general properties of human speech articulator motions, as speech is produced. See Holzrichter, Burnett, Ng, and Lea, J.Acoust.Soc.Am. 103 (1) 622 (1998). Experiments have demonstrated extremely accurate pitch measurements (< 1 Hz per pitch cycle) and accurate onset of voiced speech. Recent measurements of pressure-induced tracheal motions enable very good spectra and amplitude estimates of a voiced excitation function. The use of the measured excitation functions and pitch synchronous processing enable the determination of each pitch cycle of an accurate transfer function and, indirectly, of the corresponding articulator motions. In addition, direct measurements have been made of EM wave reflections from articulator interfaces, including jaw, tongue, and palate, simultaneously with acoustic and glottal open/close signals. While several types of EM sensors are suitable for speech articulator measurements, the homodyne sensor has been found to provide good spatial and temporal resolution for several applications.

  1. Inner Speech in People with Aphasia

    William Hayward

    2014-05-01

    Here, we have demonstrated that subjective reports of IS are meaningful in at least some individuals with aphasia. Additional research is needed to confirm the degree to which self-reported IS accurately reflects phonological access, as well as to determine which processes of word retrieval and self-monitoring are needed for reliable insights into inner speech and how specific language deficits interact with these insights. Examining insight into IS could inform our understanding of the psychological and neural bases of word retrieval as well as provide a novel tool for early prognosis and individualized aphasia treatment.

  2. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar.

    Shin, Young Hoon; Seo, Jiwon

    2016-10-29

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.

  3. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar

    Young Hoon Shin

    2016-10-01

    Full Text Available People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker’s vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.

  4. Analytical Study on Fundamental Frequency Contours of Thai Expressive Speech Using Fujisaki's Model

    Suphattharachai Chomphan

    2010-01-01

    Full Text Available Problem statement: In spontaneous speech communication, prosody is an important factor that must be taken into account, since the prosody effects on not only the naturalness but also the intelligibility of speech. Focusing on synthesis of Thai expressive speech, a number of systems has been developed for years. However, the expressive speech with various speaking styles has not been accomplished. To achieve the generation of expressive speech, we need to model the fundamental frequency (F0 contours accurately to preserve the speech prosody. Approach: Therefore this study proposes an analysis of model parameters for Thai speech prosody with three speaking styles and two genders which is a preliminary work for speech synthesis. Fujisaki's modeling; a powerful tool to model the F0 contour has been adopted, while the speaking styles of happiness, sadness and reading have been considered. Seven derived parameters from the Fujisaki's model are as follows. The first parameter is baseline frequency which is the lowest level of F0 contour. The second and third parameters are the numbers of phrase commands and tone commands which reflect the frequencies of surges of the utterance in global and local levels, respectively. The fourth and fifth parameters are phrase command and tone command durations which reflect the speed of speaking and the length of a syllable, respectively. The sixth and seventh parameters are amplitudes of phrase command and tone command which reflect the energy of the global speech and the energy of local syllable. Results: In the experiments, each speaking style includes 200 samples of one sentence with male and female speech. Therefore our speech database contains 1200 utterances in total. The results show that most of the proposed parameters can distinguish three kinds of speaking styles explicitly. Conclusion: From the finding, it is a strong evidence to further apply the successful parameters in the speech synthesis systems or

  5. Preferred real-ear insertion gain on a commercial hearing aid at different speech and noise levels.

    Kuk, F K; Harper, T; Doubek, K

    1994-03-01

    In the present study, we measured preferred real-ear insertion gain (REIG) under different levels of speech and noise to assess whether current automatic gain control (AGC) and automatic signal processing (ASP) hearing aids are operating optimally. Preferred REIG for optimal speech clarity was determined under seven speech and noise conditions. In four conditions, speech (discourse passages) was varied from 55 dB SPL to 85 dB SPL in 10-dB steps at a fixed signal-to-noise ratio (S/N) of +5. In the remaining conditions, speech was fixed at 65 dB SPL while the noise level was varied in 5-dB steps to yield S/Ns from +10 to -5. The results showed that subjects selected less gain as speech or noise levels were increased. In general, less overall gain was selected as speech level was increased, and less overall gain, especially in the low-frequency region, was selected as the S/N ratio became progressively poorer. These results are discussed in relation to how hearing aids with adaptive frequency/gain responses should respond to varying input levels to achieve optimal clarity of speech.

  6. Sensorimotor Interactions in Speech Learning

    Douglas M Shiller

    2011-10-01

    Full Text Available Auditory input is essential for normal speech development and plays a key role in speech production throughout the life span. In traditional models, auditory input plays two critical roles: 1 establishing the acoustic correlates of speech sounds that serve, in part, as the targets of speech production, and 2 as a source of feedback about a talker's own speech outcomes. This talk will focus on both of these roles, describing a series of studies that examine the capacity of children and adults to adapt to real-time manipulations of auditory feedback during speech production. In one study, we examined sensory and motor adaptation to a manipulation of auditory feedback during production of the fricative “s”. In contrast to prior accounts, adaptive changes were observed not only in speech motor output but also in subjects' perception of the sound. In a second study, speech adaptation was examined following a period of auditory–perceptual training targeting the perception of vowels. The perceptual training was found to systematically improve subjects' motor adaptation response to altered auditory feedback during speech production. The results of both studies support the idea that perceptual and motor processes are tightly coupled in speech production learning, and that the degree and nature of this coupling may change with development.

  7. Aero-tactile integration in speech perception

    Gick, Bryan; Derrick, Donald

    2013-01-01

    Visual information from a speaker’s face can enhance1 or interfere with2 accurate auditory perception. This integration of information across auditory and visual streams has been observed in functional imaging studies3,4, and has typically been attributed to the frequency and robustness with which perceivers jointly encounter event-specific information from these two modalities5. Adding the tactile modality has long been considered a crucial next step in understanding multisensory integration. However, previous studies have found an influence of tactile input on speech perception only under limited circumstances, either where perceivers were aware of the task6,7 or where they had received training to establish a cross-modal mapping8–10. Here we show that perceivers integrate naturalistic tactile information during auditory speech perception without previous training. Drawing on the observation that some speech sounds produce tiny bursts of aspiration (such as English ‘p’)11, we applied slight, inaudible air puffs on participants’ skin at one of two locations: the right hand or the neck. Syllables heard simultaneously with cutaneous air puffs were more likely to be heard as aspirated (for example, causing participants to mishear ‘b’ as ‘p’). These results demonstrate that perceivers integrate event-relevant tactile information in auditory perception in much the same way as they do visual information. PMID:19940925

  8. Objective speech quality measures for Internet telephony

    Hall, Timothy A.

    2001-07-01

    Measuring voice quality for telephony is not a new problem. However, packet-switched, best-effort networks such as the Internet present significant new challenges for the delivery of real-time voice traffic. Unlike the circuit-switched PSTN, Internet protocol (IP) networks guarantee neither sufficient bandwidth for the voice traffic nor a constant, minimal delay. Dropped packets and varying delays introduce distortions not found in traditional telephony. In addition, if a low bitrate codec is used in voice over IP (VoIP) to achieve a high compression ratio, the original waveform can be significantly distorted. These new potential sources of signal distortion present significant challenges for objectively measuring speech quality. Measurement techniques designed for the PSTN may not perform well in VoIP environments. Our objective is to find a speech quality metric that accurately predicts subjective human perception under the conditions present in VoIP systems. To do this, we compared three types of measures: perceptually weighted distortion measures such as enhanced modified Bark spectral distance (EMBSD) and measuring normalizing blocks (MNB), word-error rates of continuous speech recognizers, and the ITU E-model. We tested the performance of these measures under conditions typical of a VoIP system. We found that the E-model had the highest correlation with mean opinion scores (MOS). The E-model is well-suited for online monitoring because it does not use the original (undistorted) signal to compute its quality metric and because it is computationally simple.

  9. An acoustic feature-based similarity scoring system for speech rehabilitation assistance.

    Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny

    2016-08-01

    The purpose of this study is to develop a tool to assist speech therapy and rehabilitation, which focused on automatic scoring based on the comparison of the patient's speech with another normal speech on several aspects including pitch, vowel, voiced-unvoiced segments, strident fricative and sound intensity. The pitch estimation employed the use of cepstrum-based algorithm for its robustness; the vowel classification used multilayer perceptron (MLP) to classify vowel from pitch and formants; and the strident fricative detection was based on the major peak spectral intensity, location and the pitch existence in the segment. In order to evaluate the performance of the system, this study analyzed eight patient's speech recordings (four males, four females; 4-58-years-old), which had been recorded in previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. The experiment result on pitch algorithm showed that the cepstrum method had 5.3% of gross pitch error from a total of 2086 frames. On the vowel classification algorithm, MLP method provided 93% accuracy (men), 87% (women) and 84% (children). In total, the overall results showed that 156 tool's grading results (81%) were consistent compared to 192 audio and visual observations done by four experienced respondents. Implication for Rehabilitation Difficulties in communication may limit the ability of a person to transfer and exchange information. The fact that speech is one of the primary means of communication has encouraged the needs of speech diagnosis and rehabilitation. The advances of technology in computer-assisted speech therapy (CAST) improve the quality, time efficiency of the diagnosis and treatment of the disorders. The present study attempted to develop tool to assist speech therapy and rehabilitation, which provided simple interface to let the assessment be done even by the patient himself without the need of particular knowledge of speech processing while at the

  10. Combining Knowledge Sources to Reorder N-Best Speech Hypothesis Lists

    Rayner, M; Digalakis, V; Price, P; Rayner, Manny; Carter, David; Digalakis, Vassilios; Price, Patti

    1994-01-01

    A simple and general method is described that can combine different knowledge sources to reorder N-best lists of hypotheses produced by a speech recognizer. The method is automatically trainable, acquiring information from both positive and negative examples. Experiments are described in which it was tested on a 1000-utterance sample of unseen ATIS data.

  11. Measuring Prevalence of Other-Oriented Transactive Contributions Using an Automated Measure of Speech Style Accommodation

    Gweon, Gahgene; Jain, Mahaveer; McDonough, John; Raj, Bhiksha; Rose, Carolyn P.

    2013-01-01

    This paper contributes to a theory-grounded methodological foundation for automatic collaborative learning process analysis. It does this by illustrating how insights from the social psychology and sociolinguistics of speech style provide a theoretical framework to inform the design of a computational model. The purpose of that model is to detect…

  12. Compliance, quality of life and quantitative voice quality aspects of hands-free speech.

    Coul, B.M.R. op de; Ackerstaff, A.H.; As-Brooks, C.J. van; Hoogen, F.J.A. van den; Meeuwis, C.A.; Manni, J.J.; Hilgers, F.J.M.

    2005-01-01

    CONCLUSIONS: With the use of a new automatic stoma valve (ASV) it appears possible to rehabilitate patients who have previously been unsuccessful in acquiring hands-free speech. As well as making daily ASV use possible for an additional group of patients, this new device was also appreciated by many

  13. The Promise of NLP and Speech Processing Technologies in Language Assessment

    Chapelle, Carol A.; Chung, Yoo-Ree

    2010-01-01

    Advances in natural language processing (NLP) and automatic speech recognition and processing technologies offer new opportunities for language testing. Despite their potential uses on a range of language test item types, relatively little work has been done in this area, and it is therefore not well understood by test developers, researchers or…

  14. Speech-based recognition of self-reported and observed emotion in a dimensional space

    Truong, Khiet P.; Leeuwen, van David A.; Jong, de Franciska M.G.

    2012-01-01

    The differences between self-reported and observed emotion have only marginally been investigated in the context of speech-based automatic emotion recognition. We address this issue by comparing self-reported emotion ratings to observed emotion ratings and look at how differences between these two t

  15. Variation and Synthetic Speech

    Miller, C; Massey, N; Miller, Corey; Karaali, Orhan; Massey, Noel

    1997-01-01

    We describe the approach to linguistic variation taken by the Motorola speech synthesizer. A pan-dialectal pronunciation dictionary is described, which serves as the training data for a neural network based letter-to-sound converter. Subsequent to dictionary retrieval or letter-to-sound generation, pronunciations are submitted a neural network based postlexical module. The postlexical module has been trained on aligned dictionary pronunciations and hand-labeled narrow phonetic transcriptions. This architecture permits the learning of individual postlexical variation, and can be retrained for each speaker whose voice is being modeled for synthesis. Learning variation in this way can result in greater naturalness for the synthetic speech that is produced by the system.

  16. [Improving speech comprehension using a new cochlear implant speech processor].

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise.In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  17. Neurophysiology of speech differences in childhood apraxia of speech.

    Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.

  18. Hiding Information under Speech

    2005-12-12

    as it arrives in real time, and it disappears as fast as it arrives. Furthermore, our cognitive process for translating audio sounds to the meaning... steganography , whose goal is to make the embedded data completely undetectable. In addi- tion, we must dismiss the idea of hiding data by using any...therefore, an image has more room to hide data; and (2) speech steganography has not led to many money-making commercial businesses. For these two

  19. Speech Quality Measurement

    1977-06-10

    noise test , t=2 for t1-v low p’ass f lit er te st ,and t 3 * or theit ADP(NI cod ing tevst ’*s is the sub lec nube 0l e tet Bostz- Av L b U0...a 1ý...it aepa rate, speech clu.1 t laboratory and controlled by the NOVA 830 computoer . Bach of tho stations has a CRT, .15 response buttons, a "rad button

  20. Two distinct auditory-motor circuits for monitoring speech production as revealed by content-specific suppression of auditory cortex.

    Ylinen, Sari; Nora, Anni; Leminen, Alina; Hakala, Tero; Huotilainen, Minna; Shtyrov, Yury; Mäkelä, Jyrki P; Service, Elisabet

    2015-06-01

    Speech production, both overt and covert, down-regulates the activation of auditory cortex. This is thought to be due to forward prediction of the sensory consequences of speech, contributing to a feedback control mechanism for speech production. Critically, however, these regulatory effects should be specific to speech content to enable accurate speech monitoring. To determine the extent to which such forward prediction is content-specific, we recorded the brain's neuromagnetic responses to heard multisyllabic pseudowords during covert rehearsal in working memory, contrasted with a control task. The cortical auditory processing of target syllables was significantly suppressed during rehearsal compared with control, but only when they matched the rehearsed items. This critical specificity to speech content enables accurate speech monitoring by forward prediction, as proposed by current models of speech production. The one-to-one phonological motor-to-auditory mappings also appear to serve the maintenance of information in phonological working memory. Further findings of right-hemispheric suppression in the case of whole-item matches and left-hemispheric enhancement for last-syllable mismatches suggest that speech production is monitored by 2 auditory-motor circuits operating on different timescales: Finer grain in the left versus coarser grain in the right hemisphere. Taken together, our findings provide hemisphere-specific evidence of the interface between inner and heard speech.

  1. Mediation and Automatization.

    Hutchins, Edwin

    This paper discusses the relationship between the mediation of task performance by some structure that is not inherent in the task domain itself and the phenomenon of automatization, in which skilled performance becomes effortless or phenomenologically "automatic" after extensive practice. The use of a common simple explicit mediating…

  2. Digital automatic gain control

    Uzdy, Z.

    1980-01-01

    Performance analysis, used to evaluated fitness of several circuits to digital automatic gain control (AGC), indicates that digital integrator employing coherent amplitude detector (CAD) is best device suited for application. Circuit reduces gain error to half that of conventional analog AGC while making it possible to automatically modify response of receiver to match incoming signal conditions.

  3. Automatic Differentiation Package

    2007-03-01

    Sacado is an automatic differentiation package for C++ codes using operator overloading and C++ templating. Sacado provide forward, reverse, and Taylor polynomial automatic differentiation classes and utilities for incorporating these classes into C++ codes. Users can compute derivatives of computations arising in engineering and scientific applications, including nonlinear equation solving, time integration, sensitivity analysis, stability analysis, optimization and uncertainity quantification.

  4. Musical expertise and foreign speech perception

    Eduardo eMartínez-Montes

    2013-11-01

    Full Text Available The aim of this experiment was to investigate the influence of musical expertise on the automatic perception of foreign syllables and harmonic sounds. Participants were Cuban students with high level of expertise in music or in visual arts and with the same level of general education and socio-economic background. We used a multi-feature Mismatch Negativity (MMN design with sequences of either syllables in Mandarin Chinese or harmonic sounds, both comprising deviants in pitch contour, duration and Voice Onset Time (VOT or equivalent that were either far from (Large deviants or close to (Small deviants the standard. For both Mandarin syllables and harmonic sounds, results were clear-cut in showing larger MMNs to pitch contour deviants in musicians than in visual artists. Results were less clear for duration and VOT deviants, possibly because of the specific characteristics of the stimuli. Results are interpreted as reflecting similar processing of pitch contour in speech and non-speech sounds. The implications of these results for understanding the influence of intense musical training from childhood to adulthood and of genetic predispositions for music on foreign language perception is discussed.

  5. Speech recognition in university classrooms

    Wald, Mike; Bain, Keith; Basson, Sara H

    2002-01-01

    The LIBERATED LEARNING PROJECT (LLP) is an applied research project studying two core questions: 1) Can speech recognition (SR) technology successfully digitize lectures to display spoken words as text in university classrooms? 2) Can speech recognition technology be used successfully as an alternative to traditional classroom notetaking for persons with disabilities? This paper addresses these intriguing questions and explores the underlying complex relationship between speech recognition te...

  6. Huntington's Disease: Speech, Language and Swallowing

    ... Disease Society of America Huntington's Disease Youth Organization Movement Disorder Society National Institute of Neurological Disorders and Stroke Typical Speech and Language Development Learning More Than One Language Adult Speech and Language Child Speech and Language Swallowing ...

  7. Child vocalization composition as discriminant information for automatic autism detection.

    Xu, Dongxin; Gilkerson, Jill; Richards, Jeffrey; Yapanel, Umit; Gray, Sharmi

    2009-01-01

    Early identification is crucial for young children with autism to access early intervention. The existing screens require either a parent-report questionnaire and/or direct observation by a trained practitioner. Although an automatic tool would benefit parents, clinicians and children, there is no automatic screening tool in clinical use. This study reports a fully automatic mechanism for autism detection/screening for young children. This is a direct extension of the LENA (Language ENvironment Analysis) system, which utilizes speech signal processing technology to analyze and monitor a child's natural language environment and the vocalizations/speech of the child. It is discovered that child vocalization composition contains rich discriminant information for autism detection. By applying pattern recognition and machine learning approaches to child vocalization composition data, accuracy rates of 85% to 90% in cross-validation tests for autism detection have been achieved at the equal-error-rate (EER) point on a data set with 34 children with autism, 30 language delayed children and 76 typically developing children. Due to its easy and automatic procedure, it is believed that this new tool can serve a significant role in childhood autism screening, especially in regards to population-based or universal screening.

  8. Variable-mass Thermodynamics Calculation Model for Gas-operated Automatic Weapon%Variable-mass Thermodynamics Calculation Model for Gas-operated Automatic Weapon

    陈建彬; 吕小强

    2011-01-01

    Aiming at the fact that the energy and mass exchange phenomena exist between barrel and gas-operated device of the automatic weapon, for describing its interior ballistics and dynamic characteristics of the gas-operated device accurately, a new variable-mass thermodynamics model is built. It is used to calculate the automatic mechanism velocity of a certain automatic weapon, the calculation results coincide with the experimental results better, and thus the model is validated. The influences of structure parameters on gas-operated device' s dynamic characteristics are discussed. It shows that the model is valuable for design and accurate performance prediction of gas-operated automatic weapon.

  9. A Joint Approach for Single-Channel Speaker Identification and Speech Separation

    Mowlaee, Pejman; Saeidi, Rahim; Christensen, Mads Græsbøll;

    2012-01-01

    In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification a single-channel speaker identification algorithm is proposed which provides an estimate of signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose...... a sinusoidal model-based algorithm. The speech separation algorithm consists of a double-talk/single-talk detector followed by a minimum mean square error estimator of sinusoidal parameters for finding optimal codevectors from pre-trained speaker codebooks. In evaluating the proposed system, we start from...... a situation where we have prior information of codebook indices, speaker identities and SSR-level, and then, by relaxing these assumptions one by one, we demonstrate the efficiency of the proposed fully blind system. In contrast to previous studies that mostly focus on automatic speech recognition (ASR...

  10. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

    Partila, Pavol; Voznak, Miroslav; Tovarek, Jaromir

    2015-01-01

    The impact of the classification method and features selection for the speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is wide usability in nowadays automatic voice controlled systems. Berlin database of emotional recordings was used in this experiment. Classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture model is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system due to its accuracy and efficiency. PMID:26346654

  11. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

    Pavol Partila

    2015-01-01

    Full Text Available The impact of the classification method and features selection for the speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is wide usability in nowadays automatic voice controlled systems. Berlin database of emotional recordings was used in this experiment. Classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture model is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system due to its accuracy and efficiency.

  12. Modeling Co-evolution of Speech and Biology.

    de Boer, Bart

    2016-04-01

    Two computer simulations are investigated that model interaction of cultural evolution of language and biological evolution of adaptations to language. Both are agent-based models in which a population of agents imitates each other using realistic vowels. The agents evolve under selective pressure for good imitation. In one model, the evolution of the vocal tract is modeled; in the other, a cognitive mechanism for perceiving speech accurately is modeled. In both cases, biological adaptations to using and learning speech evolve, even though the system of speech sounds itself changes at a more rapid time scale than biological evolution. However, the fact that the available acoustic space is used maximally (a self-organized result of cultural evolution) is constant, and therefore biological evolution does have a stable target. This work shows that when cultural and biological traits are continuous, their co-evolution may lead to cognitive adaptations that are strong enough to detect empirically.

  13. An overview of the SPHINX speech recognition system

    Lee, Kai-Fu; Hon, Hsiao-Wuen; Reddy, Raj

    1990-01-01

    A description is given of SPHINX, a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition. SPHINX is based on discrete hidden Markov models (HMMs) with linear-predictive-coding derived parameters. To provide speaker independence, knowledge was added to these HMMs in several ways: multiple codebooks of fixed-width parameters, and an enhanced recognizer with carefully designed models and word-duration modeling. To deal with coarticulation in continuous speech, yet still adequately represent a large vocabulary, two new subword speech units are introduced: function-word-dependent phone models and generalized triphone models. With grammars of perplexity 997, 60, and 20, SPHINX attained word accuracies of 71, 94, and 96 percent, respectively, on a 997-word task.

  14. Comparison of Classification Methods for Detecting Emotion from Mandarin Speech

    Pao, Tsang-Long; Chen, Yu-Te; Yeh, Jun-Heng

    It is said that technology comes out from humanity. What is humanity? The very definition of humanity is emotion. Emotion is the basis for all human expression and the underlying theme behind everything that is done, said, thought or imagined. Making computers being able to perceive and respond to human emotion, the human-computer interaction will be more natural. Several classifiers are adopted for automatically assigning an emotion category, such as anger, happiness or sadness, to a speech utterance. These classifiers were designed independently and tested on various emotional speech corpora, making it difficult to compare and evaluate their performance. In this paper, we first compared several popular classification methods and evaluated their performance by applying them to a Mandarin speech corpus consisting of five basic emotions, including anger, happiness, boredom, sadness and neutral. The extracted feature streams contain MFCC, LPCC, and LPC. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% for the 5-class emotion recognition and outperforms other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compared these classifiers by applying them to another Mandarin expressive speech corpus consisting of two emotions. The experimental results still show that the proposed WD-MKNN outperforms others.

  15. Teaching Speech Acts

    Teaching Speech Acts

    2007-01-01

    Full Text Available In this paper I argue that pragmatic ability must become part of what we teach in the classroom if we are to realize the goals of communicative competence for our students. I review the research on pragmatics, especially those articles that point to the effectiveness of teaching pragmatics in an explicit manner, and those that posit methods for teaching. I also note two areas of scholarship that address classroom needs—the use of authentic data and appropriate assessment tools. The essay concludes with a summary of my own experience teaching speech acts in an advanced-level Portuguese class.

  16. Online Speech/Music Segmentation Based on the Variance Mean of Filter Bank Energy

    Kos, Marko; Grašič, Matej; Kačič, Zdravko

    2009-12-01

    This paper presents a novel feature for online speech/music segmentation based on the variance mean of filter bank energy (VMFBE). The idea that encouraged the feature's construction is energy variation in a narrow frequency sub-band. The energy varies more rapidly, and to a greater extent for speech than for music. Therefore, an energy variance in such a sub-band is greater for speech than for music. The radio broadcast database and the BNSI broadcast news database were used for feature discrimination and segmentation ability evaluation. The calculation procedure of the VMFBE feature has 4 out of 6 steps in common with the MFCC feature calculation procedure. Therefore, it is a very convenient speech/music discriminator for use in real-time automatic speech recognition systems based on MFCC features, because valuable processing time can be saved, and computation load is only slightly increased. Analysis of the feature's speech/music discriminative ability shows an average error rate below 10% for radio broadcast material and it outperforms other features used for comparison, by more than 8%. The proposed feature as a stand-alone speech/music discriminator in a segmentation system achieves an overall accuracy of over 94% on radio broadcast material.

  17. Magneto encephalography (MEG: perspectives of speech areas functional mapping in human subjects

    Butorina A. V.

    2012-06-01

    Full Text Available One of the main problems in clinical practice and academic research is how to localize speech zones in the human brain. Two speech areas (Broca and Wernicke areas that are responsible for language production and for understanding of written and spoken language have been known since the past century. Their location and even hemispheric lateralization have a substantial inter-individual variability, especially in neurosurgery patients. Wada test is one of the most frequently used invasive methodology for speech hemispheric lateralization in neurosurgery patients. However, besides relatively high-risk of Wada test for patient's health, it has its own limitation, e. g. low reliability of Wada-based evidence of verbal memory brain lateralization. Therefore, there is an urgent need for non-invasive, reliable methods of speech zones mapping.The current review summarizes the recent experimental evidence from magnitoencephalographic (MEG research suggesting that speech areas are included in the speech processing within the first 200 ms after the word onset. The electro-magnetic response to deviant word, mismatch negativity wave with latency of 100—200 ms, can be recorded from auditory cortex within the oddball-paradigm. We provide the arguments that basic features of this brain response, such as its automatic, pre-attentive nature, high signal to noise ratio, source localization at superior temporal sulcus, make it a promising vehicle for non-invasive MEG-based speech areas mapping in neurosurgery.

  18. Online Speech/Music Segmentation Based on the Variance Mean of Filter Bank Energy

    Zdravko Kačič

    2009-01-01

    Full Text Available This paper presents a novel feature for online speech/music segmentation based on the variance mean of filter bank energy (VMFBE. The idea that encouraged the feature's construction is energy variation in a narrow frequency sub-band. The energy varies more rapidly, and to a greater extent for speech than for music. Therefore, an energy variance in such a sub-band is greater for speech than for music. The radio broadcast database and the BNSI broadcast news database were used for feature discrimination and segmentation ability evaluation. The calculation procedure of the VMFBE feature has 4 out of 6 steps in common with the MFCC feature calculation procedure. Therefore, it is a very convenient speech/music discriminator for use in real-time automatic speech recognition systems based on MFCC features, because valuable processing time can be saved, and computation load is only slightly increased. Analysis of the feature's speech/music discriminative ability shows an average error rate below 10% for radio broadcast material and it outperforms other features used for comparison, by more than 8%. The proposed feature as a stand-alone speech/music discriminator in a segmentation system achieves an overall accuracy of over 94% on radio broadcast material.

  19. Technological evaluation of gesture and speech interfaces for enabling dismounted soldier-robot dialogue

    Kattoju, Ravi Kiran; Barber, Daniel J.; Abich, Julian; Harris, Jonathan

    2016-05-01

    With increasing necessity for intuitive Soldier-robot communication in military operations and advancements in interactive technologies, autonomous robots have transitioned from assistance tools to functional and operational teammates able to service an array of military operations. Despite improvements in gesture and speech recognition technologies, their effectiveness in supporting Soldier-robot communication is still uncertain. The purpose of the present study was to evaluate the performance of gesture and speech interface technologies to facilitate Soldier-robot communication during a spatial-navigation task with an autonomous robot. Gesture and speech semantically based spatial-navigation commands leveraged existing lexicons for visual and verbal communication from the U.S Army field manual for visual signaling and a previously established Squad Level Vocabulary (SLV). Speech commands were recorded by a Lapel microphone and Microsoft Kinect, and classified by commercial off-the-shelf automatic speech recognition (ASR) software. Visual signals were captured and classified using a custom wireless gesture glove and software. Participants in the experiment commanded a robot to complete a simulated ISR mission in a scaled down urban scenario by delivering a sequence of gesture and speech commands, both individually and simultaneously, to the robot. Performance and reliability of gesture and speech hardware interfaces and recognition tools were analyzed and reported. Analysis of experimental results demonstrated the employed gesture technology has significant potential for enabling bidirectional Soldier-robot team dialogue based on the high classification accuracy and minimal training required to perform gesture commands.

  20. PESQ Based Speech Intelligibility Measurement

    Beerends, J.G.; Buuren, R.A. van; Vugt, J.M. van; Verhave, J.A.

    2009-01-01

    Several measurement techniques exist to quantify the intelligibility of a speech transmission chain. In the objective domain, the Articulation Index [1] and the Speech Transmission Index STI [2], [3], [4], [5] have been standardized for predicting intelligibility. The STI uses a signal that contains

  1. Separating Underdetermined Convolutive Speech Mixtures

    Pedersen, Michael Syskind; Wang, DeLiang; Larsen, Jan

    2006-01-01

    a method for underdetermined blind source separation of convolutive mixtures. The proposed framework is applicable for separation of instantaneous as well as convolutive speech mixtures. It is possible to iteratively extract each speech signal from the mixture by combining blind source separation...

  2. Perceptual Learning of Interrupted Speech

    Benard, Michel Ruben; Başkent, Deniz

    2013-01-01

    The intelligibility of periodically interrupted speech improves once the silent gaps are filled with noise bursts. This improvement has been attributed to phonemic restoration, a top-down repair mechanism that helps intelligibility of degraded speech in daily life. Two hypotheses were investigated u

  3. Connected digit speech recognition system for Malayalam language

    Cini Kurian; Kannan Balakrishnan

    2013-12-01

    A connected digit speech recognition is important in many applications such as automated banking system, catalogue-dialing, automatic data entry, automated banking system, etc. This paper presents an optimum speaker-independent connected digit recognizer for Malayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficient for speech parameterization and continuous density Hidden Markov Model (HMM) in the recognition process. Viterbi algorithm is used for decoding. The training data base has the utterance of 21 speakers from the age group of 20 to 40 years and the sound is recorded in the normal office environment where each speaker is asked to read 20 set of continuous digits. The system obtained an accuracy of 99.5 % with the unseen data.

  4. Approximated mutual information training for speech recognition using myoelectric signals.

    Guo, Hua J; Chan, A D C

    2006-01-01

    A new training algorithm called the approximated maximum mutual information (AMMI) is proposed to improve the accuracy of myoelectric speech recognition using hidden Markov models (HMMs). Previous studies have demonstrated that automatic speech recognition can be performed using myoelectric signals from articulatory muscles of the face. Classification of facial myoelectric signals can be performed using HMMs that are trained using the maximum likelihood (ML) algorithm; however, this algorithm maximizes the likelihood of the observations in the training sequence, which is not directly associated with optimal classification accuracy. The AMMI training algorithm attempts to maximize the mutual information, thereby training the HMMs to optimize their parameters for discrimination. Our results show that AMMI training consistently reduces the error rates compared to these by the ML training, increasing the accuracy by approximately 3% on average.

  5. The Speech-Language Interface in the Spoken Language Translator

    Carter, D; Carter, David; Rayner, Manny

    1994-01-01

    Abstract: The Spoken Language Translator is a prototype for practically useful systems capable of translating continuous spoken language within restricted domains. The prototype system translates air travel (ATIS) queries from spoken English to spoken Swedish and to French. It is constructed, with as few modifications as possible, from existing pieces of speech and language processing software. The speech recognizer and language understander are connected by a fairly conventional pipelined N-best interface. This paper focuses on the ways in which the language processor makes intelligent use of the sentence hypotheses delivered by the recognizer. These ways include (1) producing modified hypotheses to reflect the possible presence of repairs in the uttered word sequence; (2) fast parsing with a version of the grammar automatically specialized to the more frequent constructions in the training corpus; and (3) allowing syntactic and semantic factors to interact with acoustic ones in the choice of a meaning struc...

  6. Speech Compression Using Multecirculerletet Transform

    Sulaiman Murtadha

    2012-01-01

    Full Text Available Compressing the speech reduces the data storage requirements, leading to reducing the time of transmitting the digitized speech over long-haul links like internet. To obtain best performance in speech compression, wavelet transforms require filters that combine a number of desirable properties, such as orthogonality and symmetry.The MCT bases functions are derived from GHM bases function using 2D linear convolution .The fast computation algorithm methods introduced here added desirable features to the current transform. We further assess the performance of the MCT in speech compression application. This paper discusses the effect of using DWT and MCT (one and two dimension on speech compression. DWT and MCT performances in terms of compression ratio (CR, mean square error (MSE and peak signal to noise ratio (PSNR are assessed. Computer simulation results indicate that the two dimensions MCT offer a better compression ratio, MSE and PSNR than DWT.

  7. Word Automaticity of Tree Automatic Scattered Linear Orderings Is Decidable

    Huschenbett, Martin

    2012-01-01

    A tree automatic structure is a structure whose domain can be encoded by a regular tree language such that each relation is recognisable by a finite automaton processing tuples of trees synchronously. Words can be regarded as specific simple trees and a structure is word automatic if it is encodable using only these trees. The question naturally arises whether a given tree automatic structure is already word automatic. We prove that this problem is decidable for tree automatic scattered linear orderings. Moreover, we show that in case of a positive answer a word automatic presentation is computable from the tree automatic presentation.

  8. Functional analysis screening for problem behavior maintained by automatic reinforcement.

    Querim, Angie C; Iwata, Brian A; Roscoe, Eileen M; Schlichenmeyer, Kevin J; Ortega, Javier Virués; Hurl, Kylee E

    2013-01-01

    A common finding in previous research is that problem behavior maintained by automatic reinforcement continues to occur in the alone condition of a functional analysis (FA), whereas behavior maintained by social reinforcement typically is extinguished. Thus, the alone condition may represent an efficient screening procedure when maintenance by automatic reinforcement is suspected. We conducted a series of 5-min alone (or no-interaction) probes for 30 cases of problem behavior and compared initial predictions of maintenance or extinction to outcomes obtained in subsequent FAs. Results indicated that data from the screening procedure accurately predicted that problem behavior was maintained by automatic reinforcement in 21 of 22 cases and by social reinforcement in 7 of 8 cases. Thus, results of the screening accurately predicted the function of problem behavior (social vs. automatic reinforcement) in 28 of 30 cases.

  9. The timing and effort of lexical access in natural and degraded speech

    Anita Eva Wagner

    2016-03-01

    Full Text Available Understanding speech is effortless in ideal situations, and although adverse conditions, such as caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech.This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners’ ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners' quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased effort in processing natural stimuli in particular in presence of mismatching cues. Signal degradation reduced listeners’ ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why

  10. PCA-Based Speech Enhancement for Distorted Speech Recognition

    Tetsuya Takiguchi

    2007-09-01

    Full Text Available We investigated a robust speech feature extraction method using kernel PCA (Principal Component Analysis for distorted speech recognition. Kernel PCA has been suggested for various image processing tasks requiring an image model, such as denoising, where a noise-free image is constructed from a noisy input image. Much research for robust speech feature extraction has been done, but it remains difficult to completely remove additive or convolution noise (distortion. The most commonly used noise-removal techniques are based on the spectraldomain operation, and then for speech recognition, the MFCC (Mel Frequency Cepstral Coefficient is computed, where DCT (Discrete Cosine Transform is applied to the mel-scale filter bank output. This paper describes a new PCA-based speech enhancement algorithm using kernel PCA instead of DCT, where the main speech element is projected onto low-order features, while the noise or distortion element is projected onto high-order features. Its effectiveness is confirmed by word recognition experiments on distorted speech.

  11. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we examined identical target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues.

  12. Acoustic and perceptual parameters relating to connected speech are more reliable measures of hoarseness than parameters relating to sustained vowels.

    Halberstam, Benjamin

    2004-01-01

    This report investigates the correlations between acoustic parameters and the perception of hoarseness by trained listeners. Both sustained vowels and connected speech were examined. Fourteen acoustic parameters from samples of sustained vowels and 2 from connected speech were measured. The results show that jitter, shimmer and cepstral peak prominence (CPP) are correlated with the perception of hoarseness in sustained vowels. CPP is strongly correlated with the perception of hoarseness in connected speech. Some evidence is seen that perception of hoarseness in connected speech is more valid than the perception of hoarseness in sustained vowels. It is concluded that CPP for connected speech is a more valid objective measure of hoarseness than jitter, shimmer or CPP for sustained vowels and that perception of hoarseness may be most accurate in connected speech, rather than isolated vowels.

  13. Semi-automatic object tracking in video sequences

    Lecumberry, Federico; Pardo, Álvaro

    2005-01-01

    A method is presented for semi-automatic object tracking in video sequences using multiple features and a method for probabilistic relaxation to improve the tracking results producing smooth and accurate tracked borders. Starting from a given initial position of the object in the first frame the proposed method automatically tracks the object in the sequence modelling the a posteriori probabilities of a set of features such as color, position and motion, depth, etc. Facultad de Informática

  14. Comparison of voice acquisition methodologies in speech research.

    Vogel, Adam P; Maruff, Paul

    2008-11-01

    The use of voice acoustic techniques has the potential to extend beyond work devoted purely to speech or vocal pathology. For this to occur, however, researchers and clinicians will require acquisition technologies that provide fast, accurate, and cost-effective methods for recording data. Therefore, the present study aimed to compare industry-standard techniques for acquiring high-quality acoustic signals (e.g., hard drive and solid-state recorder) with widely available and easy-to-use, computer-based (standard laptop) data-acquisition methods. Speech samples were simultaneously acquired from 15 healthy controls using all three methods and were analyzed using identical analysis techniques. Data from all three acquisition methods were directly compared using a variety of acoustic correlates. The results suggested that selected acoustic measures (e.g., f 0, noise-to-harmonic ratio, number of pauses) were accurately obtained using all three methods; however, minimum recording standards were required for widely used measures of perturbation.

  15. Neural correlates of speech anticipatory anxiety in generalized social phobia.

    Lorberbaum, Jeffrey P; Kose, Samet; Johnson, Michael R; Arana, George W; Sullivan, Lindsay K; Hamner, Mark B; Ballenger, James C; Lydiard, R Bruce; Brodrick, Peter S; Bohning, Daryl E; George, Mark S

    2004-12-22

    Patients with generalized social phobia fear embarrassment in most social situations. Little is known about its functional neuroanatomy. We studied BOLD-fMRI brain activity while generalized social phobics and healthy controls anticipated making public speeches. With anticipation minus rest, 8 phobics compared to 6 controls showed greater subcortical, limbic, and lateral paralimbic activity (pons, striatum, amygdala/uncus/anterior parahippocampus, insula, temporal pole)--regions important in automatic emotional processing--and less cortical activity (dorsal anterior cingulate/prefrontal cortex)--regions important in cognitive processing. Phobics may become so anxious, they cannot think clearly or vice versa.

  16. Hate Speech/Free Speech: Using Feminist Perspectives To Foster On-Campus Dialogue.

    Cornwell, Nancy; Orbe, Mark P.; Warren, Kiesha

    1999-01-01

    Explores the complex issues inherent in the tension between hate speech and free speech, focusing on the phenomenon of hate speech on college campuses. Describes the challenges to hate speech made by critical race theorists and explains how a feminist critique can reorient the parameters of hate speech. (SLD)

  17. Beta rhythm modulation by speech sounds: somatotopic mapping in somatosensory cortex.

    Bartoli, Eleonora; Maffongelli, Laura; Campus, Claudio; D'Ausilio, Alessandro

    2016-08-08

    During speech listening motor regions are somatotopically activated, resembling the activity that subtends actual speech production, suggesting that motor commands can be retrieved from sensory inputs. Crucially, the efficient motor control of the articulators relies on the accurate anticipation of the somatosensory reafference. Nevertheless, evidence about somatosensory activities elicited by auditory speech processing is sparse. The present work looked for specific interactions between auditory speech presentation and somatosensory cortical information processing. We used an auditory speech identification task with sounds having different place of articulation (bilabials and dentals). We tested whether coupling the auditory task with a peripheral electrical stimulation of the lips would affect the pattern of sensorimotor electroencephalographic rhythms. Peripheral electrical stimulation elicits a series of spectral perturbations of which the beta rebound reflects the return-to-baseline stage of somatosensory processing. We show a left-lateralized and selective reduction in the beta rebound following lip somatosensory stimulation when listening to speech sounds produced with the lips (i.e. bilabials). Thus, the somatosensory processing could not return to baseline due to the recruitment of the same neural resources by speech stimuli. Our results are a clear demonstration that heard speech sounds are somatotopically mapped onto somatosensory cortices, according to place of articulation.

  18. Beta rhythm modulation by speech sounds: somatotopic mapping in somatosensory cortex

    Bartoli, Eleonora; Maffongelli, Laura; Campus, Claudio; D’Ausilio, Alessandro

    2016-01-01

    During speech listening motor regions are somatotopically activated, resembling the activity that subtends actual speech production, suggesting that motor commands can be retrieved from sensory inputs. Crucially, the efficient motor control of the articulators relies on the accurate anticipation of the somatosensory reafference. Nevertheless, evidence about somatosensory activities elicited by auditory speech processing is sparse. The present work looked for specific interactions between auditory speech presentation and somatosensory cortical information processing. We used an auditory speech identification task with sounds having different place of articulation (bilabials and dentals). We tested whether coupling the auditory task with a peripheral electrical stimulation of the lips would affect the pattern of sensorimotor electroencephalographic rhythms. Peripheral electrical stimulation elicits a series of spectral perturbations of which the beta rebound reflects the return-to-baseline stage of somatosensory processing. We show a left-lateralized and selective reduction in the beta rebound following lip somatosensory stimulation when listening to speech sounds produced with the lips (i.e. bilabials). Thus, the somatosensory processing could not return to baseline due to the recruitment of the same neural resources by speech stimuli. Our results are a clear demonstration that heard speech sounds are somatotopically mapped onto somatosensory cortices, according to place of articulation. PMID:27499204

  19. Robust F0 Estimation Using ELS-Based Robust Complex Speech Analysis

    Funaki, Keiichi; Kinjo, Tatsuhiko

    Complex speech analysis for an analytic speech signal can accurately estimate the spectrum in low frequencies since the analytic signal provides spectrum only over positive frequencies. The remarkable feature makes it possible to realize more accurate F0 estimation using complex residual signal extracted by complex-valued speech analysis. We have already proposed F0 estimation using complex LPC residual, in which the autocorrelation function weighted by AMDF was adopted as the criterion. The method adopted MMSE-based complex LPC analysis and it has been reported that it can estimate more accurate F0 for IRS filtered speech corrupted by white Gauss noise although it can not work better for the IRS filtered speech corrupted by pink noise. In this paper, robust complex speech analysis based on ELS (Extended Least Square) method is introduced in order to overcome the drawback. The experimental results for additive white Gauss or pink noise demonstrate that the proposed algorithm based on robust ELS-based complex AR analysis can perform better than other methods.

  20. The Stylistic Analysis of Public Speech

    李龙

    2011-01-01

    Public speech is a very important part in our daily life.The ability to deliver a good public speech is something we need to learn and to have,especially,in the service sector.This paper attempts to analyze the style of public speech,in the hope of providing inspiration to us whenever delivering such a speech.

  1. Linguistic Units and Speech Production Theory.

    MacNeilage, Peter F.

    This paper examines the validity of the concept of linguistic units in a theory of speech production. Substantiating data are drawn from the study of the speech production process itself. Secondarily, an attempt is made to reconcile the postulation of linguistic units in speech production theory with their apparent absence in the speech signal.…

  2. Automated Speech Rate Measurement in Dysarthria

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-01-01

    Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…

  3. Coevolution of Human Speech and Trade

    Horan, R.D.; Bulte, E.H.; Shogren, J.F.

    2008-01-01

    We propose a paleoeconomic coevolutionary explanation for the origin of speech in modern humans. The coevolutionary process, in which trade facilitates speech and speech facilitates trade, gives rise to multiple stable trajectories. While a `trade-speech¿ equilibrium is not an inevitable outcome for

  4. Connected Speech Processes in Australian English.

    Ingram, J. C. L.

    1989-01-01

    Explores the role of Connected Speech Processes (CSP) in accounting for sociolinguistically significant dimensions of speech variation, and presents initial findings on the distribution of CSPs in the speech of Australian adolescents. The data were gathered as part of a wider survey of speech of Brisbane school children. (Contains 26 references.)…

  5. Automatic River Network Extraction from LIDAR Data

    Maderal, E. N.; Valcarcel, N.; Delgado, J.; Sevilla, C.; Ojeda, J. C.

    2016-06-01

    National Geographic Institute of Spain (IGN-ES) has launched a new production system for automatic river network extraction for the Geospatial Reference Information (GRI) within hydrography theme. The goal is to get an accurate and updated river network, automatically extracted as possible. For this, IGN-ES has full LiDAR coverage for the whole Spanish territory with a density of 0.5 points per square meter. To implement this work, it has been validated the technical feasibility, developed a methodology to automate each production phase: hydrological terrain models generation with 2 meter grid size and river network extraction combining hydrographic criteria (topographic network) and hydrological criteria (flow accumulation river network), and finally the production was launched. The key points of this work has been managing a big data environment, more than 160,000 Lidar data files, the infrastructure to store (up to 40 Tb between results and intermediate files), and process; using local virtualization and the Amazon Web Service (AWS), which allowed to obtain this automatic production within 6 months, it also has been important the software stability (TerraScan-TerraSolid, GlobalMapper-Blue Marble , FME-Safe, ArcGIS-Esri) and finally, the human resources managing. The results of this production has been an accurate automatic river network extraction for the whole country with a significant improvement for the altimetric component of the 3D linear vector. This article presents the technical feasibility, the production methodology, the automatic river network extraction production and its advantages over traditional vector extraction systems.

  6. Application of Multi- Tier Applications Technology Datasnap in Designing a System of Automatic Segmentation and Recognition of Sppech Signal

    Yedilkhan N. Amirgaliyev

    2016-03-01

    Full Text Available In this paper we will address current issues in the field of development and application of automatic identification systems and segmentation of speech signals. The basic criteria for the shortcomings of such systems were formulated. The review of the types of speech recognition systems was conducted, and the optimum architecture for them, including information used in leading IT companies was described. The possibility of using multi-tier architectures for solving problems of speech recognition and their advantages were considered. Also practical implementation of multi-tier architecture based on DataSnap technology in voice recognition system for geo search in Kazakh language was described.

  7. Study on Unequal Error Protection for Distributed Speech Recognition System

    XIE Xiang; WANG Si-yao; LIU Jia-kang

    2006-01-01

    The unequal error protection (UEP) is applied in distributed speech recognition (DSR) system and three schemes are proposed. All of these three schemes are evaluated on the GSM simulating platform for recognizing mandarin digit strings and compared with the equal error protection (EEP) scheme. Experiments show that UEP can protect the data transmitted in DSR system more effectively, which results in a higher word accurate rate of DSR system.

  8. Emotional Speech Processing at the Intersection of Prosody and Semantics

    Rachel Schwartz; Pell, Marc D.

    2012-01-01

    The ability to accurately perceive emotions is crucial for effective social interaction. Many questions remain regarding how different sources of emotional cues in speech (e.g., prosody, semantic information) are processed during emotional communication. Using a cross-modal emotional priming paradigm (Facial affect decision task), we compared the relative contributions of processing utterances with single-channel (prosody-only) versus multi-channel (prosody and semantic) cues on the perceptio...

  9. A Survey on Speech Enhancement Methodologies

    Ravi Kumar. K

    2016-12-01

    Full Text Available Speech enhancement is a technique which processes the noisy speech signal. The aim of speech enhancement is to improve the perceived quality of speech and/or to improve its intelligibility. Due to its vast applications in mobile telephony, VOIP, hearing aids, Skype and speaker recognition, the challenges in speech enhancement have grown over the years. It is more challenging to suppress back ground noise that effects human communication in noisy environments like airports, road works, traffic, and cars. The objective of this survey paper is to outline the single channel speech enhancement methodologies used for enhancing the speech signal which is corrupted with additive background noise and also discuss the challenges and opportunities of single channel speech enhancement. This paper mainly focuses on transform domain techniques and supervised (NMF, HMM speech enhancement techniques. This paper gives frame work for developments in speech enhancement methodologies

  10. Speech enhancement theory and practice

    Loizou, Philipos C

    2013-01-01

    With the proliferation of mobile devices and hearing devices, including hearing aids and cochlear implants, there is a growing and pressing need to design algorithms that can improve speech intelligibility without sacrificing quality. Responding to this need, Speech Enhancement: Theory and Practice, Second Edition introduces readers to the basic problems of speech enhancement and the various algorithms proposed to solve these problems. Updated and expanded, this second edition of the bestselling textbook broadens its scope to include evaluation measures and enhancement algorithms aimed at impr

  11. Computational neuroanatomy of speech production.

    Hickok, Gregory

    2012-01-05

    Speech production has been studied predominantly from within two traditions, psycholinguistics and motor control. These traditions have rarely interacted, and the resulting chasm between these approaches seems to reflect a level of analysis difference: whereas motor control is concerned with lower-level articulatory control, psycholinguistics focuses on higher-level linguistic processing. However, closer examination of both approaches reveals a substantial convergence of ideas. The goal of this article is to integrate psycholinguistic and motor control approaches to speech production. The result of this synthesis is a neuroanatomically grounded, hierarchical state feedback control model of speech production.

  12. Speech motor control and acute mountain sickness

    Cymerman, Allen; Lieberman, Philip; Hochstadt, Jesse; Rock, Paul B.; Butterfield, Gail E.; Moore, Lorna G.

    2002-01-01

    BACKGROUND: An objective method that accurately quantifies the severity of Acute Mountain Sickness (AMS) symptoms is needed to enable more reliable evaluation of altitude acclimatization and testing of potentially beneficial interventions. HYPOTHESIS: Changes in human articulation, as quantified by timed variations in acoustic waveforms of specific spoken words (voice onset time; VOT), are correlated with the severity of AMS. METHODS: Fifteen volunteers were exposed to a simulated altitude of 4300 m (446 mm Hg) in a hypobaric chamber for 48 h. Speech motor control was determined from digitally recorded and analyzed timing patterns of 30 different monosyllabic words characterized as voiced and unvoiced, and as labial, alveolar, or velar. The Environmental Symptoms Questionnaire (ESQ) was used to assess AMS. RESULTS: Significant AMS symptoms occurred after 4 h, peaked at 16 h, and returned toward baseline after 48 h. Labial VOTs were shorter after 4 and 39 h of exposure; velar VOTs were altered only after 4 h; and there were no changes in alveolar VOTs. The duration of vowel sounds was increased after 4 h of exposure and returned to normal thereafter. Only 1 of 15 subjects did not increase vowel time after 4 h of exposure. The 39-h labial (p = 0.009) and velar (p = 0.037) voiced-unvoiced timed separations consonants and the symptoms of AMS were significantly correlated. CONCLUSIONS: Two objective measures of speech production were affected by exposure to 4300 m altitude and correlated with AMS severity. Alterations in speech production may represent an objective measure of AMS and central vulnerability to hypoxia.

  13. Steganalysis of recorded speech

    Johnson, Micah K.; Lyu, Siwei; Farid, Hany

    2005-03-01

    Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.

  14. Speech recovery device

    Frankle, Christen M.

    2004-04-20

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  15. Speech recovery device

    Frankle, Christen M.

    2000-10-19

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  16. HarkMan-A Vocabulary-Independent Keyword Spotter for Spontaneous Chinese Speech

    ZHENG Fang; XU Mingxing; MOU Xiaolong; WU Jian; WU Wenhu; FANG Ditang

    1999-01-01

    In this paper, a novel technique adopted in HarkMan is introduced. HarkMan is a keyword-spotter designed to automatically spot the given words of avocabulary-independent task in unconstrained Chinese telephone speech. The speaking manner and the number of keywords are not limited. This paper focuses on the novel technique which addresses acoustic modeling, keyword spotting network, search strategies, robustness, and rejection.The underlying technologies used in HarkMan given in this paper are useful not only for keyword spotting but also for continuous speech recognition. The system has achieved a figure-of-merit value over 90%.

  17. Join Cost for Unit Selection Speech Synthesis

    Vepa, Jithendra

    2004-01-01

    Undoubtedly, state-of-the-art unit selection-based concatenative speech systems produce very high quality synthetic speech. this is due to a large speech database containing many instances of each speech unit, with a varied and natural distribution of prosodic and spectral characteristics. the join cost, which measures how well two units can be joined together is one of the main criteria for selecting appropriate units from this large speech database. The ideal join cost is one that measur...

  18. Preventive measures in speech and language therapy

    Slokar, Polona

    2014-01-01

    Preventive care plays an important role in speech and language therapy. Through training, a speech and language therapist informs the expert and the general public about his efforts in the field of feeding, speech and language development, as well as about the missing elements that may appear in relation to communication and feeding. A speech and language therapist is also responsible for early detection of irregularities and of those factors which affect speech and language development. To a...

  19. Speech distortion measure based on auditory properties

    CHEN Guo; HU Xiulin; ZHANG Yunyu; ZHU Yaoting

    2000-01-01

    The Perceptual Spectrum Distortion (PSD), based on auditory properties of human being, is presented to measure speech distortion. The PSD measure calculates the speech distortion distance by simulating the auditory properties of human being and converting short-time speech power spectrum to auditory perceptual spectrum. Preliminary simulative experiments in comparison with the Itakura measure have been done. The results show that the PSD measure is a perferable speech distortion measure and more consistent with subjective assessment of speech quality.

  20. Speech of people with autism: Echolalia and echolalic speech

    Błeszyński, Jacek Jarosław

    2013-01-01

    Speech of people with autism is recognised as one of the basic diagnostic, therapeutic and theoretical problems. One of the most common symptoms of autism in children is echolalia, described here as being of different types and severity. This paper presents the results of studies into different levels of echolalia, both in normally developing children and in children diagnosed with autism, discusses the differences between simple echolalia and echolalic speech - which can be considered to b...

  1. Semi-automatic knee cartilage segmentation

    Dam, Erik B.; Folkesson, Jenny; Pettersen, Paola C.; Christiansen, Claus

    2006-03-01

    Osteo-Arthritis (OA) is a very common age-related cause of pain and reduced range of motion. A central effect of OA is wear-down of the articular cartilage that otherwise ensures smooth joint motion. Quantification of the cartilage breakdown is central in monitoring disease progression and therefore cartilage segmentation is required. Recent advances allow automatic cartilage segmentation with high accuracy in most cases. However, the automatic methods still fail in some problematic cases. For clinical studies, even if a few failing cases will be averaged out in the overall results, this reduces the mean accuracy and precision and thereby necessitates larger/longer studies. Since the severe OA cases are often most problematic for the automatic methods, there is even a risk that the quantification will introduce a bias in the results. Therefore, interactive inspection and correction of these problematic cases is desirable. For diagnosis on individuals, this is even more crucial since the diagnosis will otherwise simply fail. We introduce and evaluate a semi-automatic cartilage segmentation method combining an automatic pre-segmentation with an interactive step that allows inspection and correction. The automatic step consists of voxel classification based on supervised learning. The interactive step combines a watershed transformation of the original scan with the posterior probability map from the classification step at sub-voxel precision. We evaluate the method for the task of segmenting the tibial cartilage sheet from low-field magnetic resonance imaging (MRI) of knees. The evaluation shows that the combined method allows accurate and highly reproducible correction of the segmentation of even the worst cases in approximately ten minutes of interaction.

  2. Automatic Program Development

    by members of the IFIP Working Group 2.1 of which Bob was an active member. All papers are related to some of the research interests of Bob and, in particular, to the transformational development of programs and their algorithmic derivation from formal specifications. Automatic Program Development offers......Automatic Program Development is a tribute to Robert Paige (1947-1999), our accomplished and respected colleague, and moreover our good friend, whose untimely passing was a loss to our academic and research community. We have collected the revised, updated versions of the papers published in his...... honor in the Higher-Order and Symbolic Computation Journal in the years 2003 and 2005. Among them there are two papers by Bob: (i) a retrospective view of his research lines, and (ii) a proposal for future studies in the area of the automatic program derivation. The book also includes some papers...

  3. Automatic text summarization

    Torres Moreno, Juan Manuel

    2014-01-01

    This new textbook examines the motivations and the different algorithms for automatic document summarization (ADS). We performed a recent state of the art. The book shows the main problems of ADS, difficulties and the solutions provided by the community. It presents recent advances in ADS, as well as current applications and trends. The approaches are statistical, linguistic and symbolic. Several exemples are included in order to clarify the theoretical concepts.  The books currently available in the area of Automatic Document Summarization are not recent. Powerful algorithms have been develop

  4. Automatic Camera Control

    Burelli, Paolo; Preuss, Mike

    2014-01-01

    Automatically generating computer animations is a challenging and complex problem with applications in games and film production. In this paper, we investigate howto translate a shot list for a virtual scene into a series of virtual camera configurations — i.e automatically controlling the virtual...... camera. We approach this problem by modelling it as a dynamic multi-objective optimisation problem and show how this metaphor allows a much richer expressiveness than a classical single objective approach. Finally, we showcase the application of a multi-objective evolutionary algorithm to generate a shot...

  5. Implementation of Phonetic Context Variable Length Unit Selection Module for Malay Text to Speech

    Tian-Swee Tan

    2008-01-01

    Full Text Available Problem statement: The main problem with current Malay Text-To-Speech (MTTS synthesis system is the poor quality of the generated speech sound due to the inability of traditional TTS system to provide multiple choices of unit for generating more accurate synthesized speech. Approach: This study proposes a phonetic context variable length unit selection MTTS system that is capable of providing more natural and accurate unit selection for synthesized speech. It implemented a phonetic context algorithm for unit selection for MTTS. The unit selection method (without phonetic context may encounter the problem of selecting the speech unit from different sources and affect the quality of concatenation. This study proposes the design of speech corpus and unit selection method according to phonetic context so that it can select a string of continuous phoneme from same source instead of individual phoneme from different sources. This can further reduce the concatenation point and increase the quality of concatenation. The speech corpus was transcribed according to phonetic context to preserve the phonetic information. This method utilizes word base concatenation method. Firstly it will search through the speech corpus for the target word, if the target is found; it will be used for concatenation. If the word does not exist, then it will construct the words from phoneme sequence. Results: This system had been tested with 40 participants in Mean Opinion Score (MOS listening test with the average rates for naturalness, pronunciation and intelligibility are 3.9, 4.1 and 3.9. Conclusion/Recommendation: Through this study, a very first version of Corpus-based MTTS has been designed; it has improved the naturalness, pronunciation and intelligibility of synthetic speech. But it still has some lacking that need to be perfected such as the prosody module to support the phrasing analysis and intonation of input text to match with the waveform modifier.

  6. Why Go to Speech Therapy?

    ... a Difference (PDF) Brief History About The Founder Corporate Directors Audit The Facts FAQ Basic Research Resources ... teens who stutter make positive changes in their communication skills. As you work with your speech pathologist ...

  7. Emotion Recognition using Speech Features

    Rao, K Sreenivasa

    2013-01-01

    “Emotion Recognition Using Speech Features” covers emotion-specific features present in speech and discussion of suitable models for capturing emotion-specific information for distinguishing different emotions.  The content of this book is important for designing and developing  natural and sophisticated speech systems. Drs. Rao and Koolagudi lead a discussion of how emotion-specific information is embedded in speech and how to acquire emotion-specific knowledge using appropriate statistical models. Additionally, the authors provide information about using evidence derived from various features and models. The acquired emotion-specific knowledge is useful for synthesizing emotions. Discussion includes global and local prosodic features at syllable, word and phrase levels, helpful for capturing emotion-discriminative information; use of complementary evidences obtained from excitation sources, vocal tract systems and prosodic features in order to enhance the emotion recognition performance;  and pro...

  8. English Speeches Of Three Minutes

    凌和军; 丁小琴

    2002-01-01

    English speeches, which were made at the beginning of this term, are popular among us, English learners, as it is very useful for us to improve our spoken English. So each of us feels very interested te join the activity.

  9. Delayed Speech or Language Development

    ... What Parents Can Do en español Retraso en el desarrollo del habla o del lenguaje Your son ... for communication exchange and participation? What kind of feedback does the child get? When speech, language, hearing, ...

  10. Writing, Inner Speech, and Meditation.

    Moffett, James

    1982-01-01

    Examines the interrelationships among meditation, inner speech (stream of consciousness), and writing. Considers the possibilities and implications of using the techniques of meditation in educational settings, especially in the writing classroom. (RL)

  11. Real-time automatic registration in optical surgical navigation

    Lin, Qinyong; Yang, Rongqian; Cai, Ken; Si, Xuan; Chen, Xiuwen; Wu, Xiaoming

    2016-05-01

    An image-guided surgical navigation system requires the improvement of the patient-to-image registration time to enhance the convenience of the registration procedure. A critical step in achieving this aim is performing a fully automatic patient-to-image registration. This study reports on a design of custom fiducial markers and the performance of a real-time automatic patient-to-image registration method using these markers on the basis of an optical tracking system for rigid anatomy. The custom fiducial markers are designed to be automatically localized in both patient and image spaces. An automatic localization method is performed by registering a point cloud sampled from the three dimensional (3D) pedestal model surface of a fiducial marker to each pedestal of fiducial markers searched in image space. A head phantom is constructed to estimate the performance of the real-time automatic registration method under four fiducial configurations. The head phantom experimental results demonstrate that the real-time automatic registration method is more convenient, rapid, and accurate than the manual method. The time required for each registration is approximately 0.1 s. The automatic localization method precisely localizes the fiducial markers in image space. The averaged target registration error for the four configurations is approximately 0.7 mm. The automatic registration performance is independent of the positions relative to the tracking system and the movement of the patient during the operation.

  12. The Effectiveness of Computer-Based Speech Corrective Feedback for Improving Segmental Quality in L2 Dutch

    Neri, Ambra; Cucchiarini, Catia; Strik, Helmer

    2008-01-01

    Although the success of automatic speech recognition (ASR)-based Computer Assisted Pronunciation Training (CAPT) systems is increasing, little is known about the pedagogical effectiveness of these systems. This is particularly regrettable because ASR technology still suffers from limitations that may result in the provision of erroneous feedback,…

  13. Auditory Sensitivity, Speech Perception, L1 Chinese, and L2 English Reading Abilities in Hong Kong Chinese Children

    Zhang, Juan; McBride-Chang, Catherine

    2014-01-01

    A 4-stage developmental model, in which auditory sensitivity is fully mediated by speech perception at both the segmental and suprasegmental levels, which are further related to word reading through their associations with phonological awareness, rapid automatized naming, verbal short-term memory and morphological awareness, was tested with…

  14. Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity

    Moses, David A.; Mesgarani, Nima; Leonard, Matthew K.; Chang, Edward F.

    2016-10-01

    Objective. The superior temporal gyrus (STG) and neighboring brain regions play a key role in human language processing. Previous studies have attempted to reconstruct speech information from brain activity in the STG, but few of them incorporate the probabilistic framework and engineering methodology used in modern speech recognition systems. In this work, we describe the initial efforts toward the design of a neural speech recognition (NSR) system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes using the high gamma band power of local field potentials in the STG and neighboring cortical areas obtained via electrocorticography. Approach. The system implements a Viterbi decoder that incorporates phoneme likelihood estimates from a linear discriminant analysis model and transition probabilities from an n-gram phonemic language model. Grid searches were used in an attempt to determine optimal parameterizations of the feature vectors and Viterbi decoder. Main results. The performance of the system was significantly improved by using spatiotemporal representations of the neural activity (as opposed to purely spatial representations) and by including language modeling and Viterbi decoding in the NSR system. Significance. These results emphasize the importance of modeling the temporal dynamics of neural responses when analyzing their variations with respect to varying stimuli and demonstrate that speech recognition techniques can be successfully leveraged when decoding speech from neural signals. Guided by the results detailed in this work, further development of the NSR system could have applications in the fields of automatic speech recognition and neural prosthetics.

  15. Enhancement of speech signals - with a focus on voiced speech models

    Nørholm, Sidsel Marie

    This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based on this mo......This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based...

  16. The use of phonetic motor invariants can improve automatic phoneme discrimination.

    Claudio Castellini

    Full Text Available We investigate the use of phonetic motor invariants (MIs, that is, recurring kinematic patterns of the human phonetic articulators, to improve automatic phoneme discrimination. Using a multi-subject database of synchronized speech and lips/tongue trajectories, we first identify MIs commonly associated with bilabial and dental consonants, and use them to simultaneously segment speech and motor signals. We then build a simple neural network-based regression schema (called Audio-Motor Map, AMM mapping audio features of these segments to the corresponding MIs. Extensive experimental results show that (a a small set of features extracted from the MIs, as originally gathered from articulatory sensors, are dramatically more effective than a large, state-of-the-art set of audio features, in automatically discriminating bilabials from dentals; (b the same features, extracted from AMM-reconstructed MIs, are as effective as or better than the audio features, when testing across speakers and coarticulating phonemes; and dramatically better as noise is added to the speech signal. These results seem to support some of the claims of the motor theory of speech perception and add experimental evidence of the actual usefulness of MIs in the more general framework of automated speech recognition.

  17. Automatic Complexity Analysis

    Rosendahl, Mads

    1989-01-01

    One way to analyse programs is to to derive expressions for their computational behaviour. A time bound function (or worst-case complexity) gives an upper bound for the computation time as a function of the size of input. We describe a system to derive such time bounds automatically using abstract...

  18. Exploring Automatization Processes.

    DeKeyser, Robert M.

    1996-01-01

    Presents the rationale for and the results of a pilot study attempting to document in detail how automatization takes place as the result of different kinds of intensive practice. Results show that reaction times and error rates gradually decline with practice, and the practice effect is skill-specific. (36 references) (CK)

  19. Investigation of objective measures for intelligibility prediction of noise-reduced speech for Chinese, Japanese, and English.

    Li, Junfeng; Xia, Risheng; Ying, Dongwen; Yan, Yonghong; Akagi, Masato

    2014-12-01

    Many objective measures have been reported to predict speech intelligibility in noise, most of which were designed and evaluated with English speech corpora. Given the different perceptual cues used by native listeners of different languages, examining whether there is any language effect when the same objective measure is used to predict speech intelligibility in different languages is of great interest, particularly when non-linear noise-reduction processing is involved. In the present study, an extensive evaluation is taken of objective measures for speech intelligibility prediction of noisy speech processed by noise-reduction algorithms in Chinese, Japanese, and English. Of all the objective measures tested, the short-time objective intelligibility (STOI) measure produced the most accurate results in speech intelligibility prediction for Chinese, while the normalized covariance metric (NCM) and middle-level coherence speech intelligibility index ( CSIIm) incorporating the signal-dependent band-importance functions (BIFs) produced the most accurate results for Japanese and English, respectively. The objective measures that performed best in predicting the effect of non-linear noise-reduction processing in speech intelligibility were found to be the BIF-modified NCM measure for Chinese, the STOI measure for Japanese, and the BIF-modified CSIIm measure for English. Most of the objective measures examined performed differently even under the same conditions for different languages.

  20. Effect of speaking styles on the relevance of the Speech Transmission Index in the presence of reverberation

    van Wijngaarden, Sander J.; Houtgast, Tammo

    2003-04-01

    The Speech Transmission Index (STI) predicts speech intelligibility under various speech degrading conditions, including noise and reverberation. Measurements of sentence intelligibility in noise and reverberation, for three different talkers adopting different speaking styles, indicate that the STI is sometimes inaccurate in the presence of reverberation. For a trained talker speaking clearly, the STI predictions are accurate, but for conversational speech by untrained talkers, the effect of reverberation is underestimated. By using a wider range of modulation frequencies than prescribed for the standardized STI calculation (up to 31.5 Hz instead of 12.5 Hz), more accurate predictions are obtained for conversational speech. This can be understood by comparing the envelope spectrum between talkers and speaking styles: conversational speech tends to spread more of the energy in its envelope spectrum to higher modulation frequencies. Since higher modulation frequencies are more susceptible to reverberation, conversational speech tends to be affected more by reverberation than clear speech. To investigate the range of between-talker variations, envelope spectra were calculated for a population of 134 talkers. Upper boundaries of the STI modulation frequency range of 12.5 and 31.5 Hz seem approximately appropriate for the 5th and 95th percentile of this population.

  1. Tactile Modulation of Emotional Speech Samples

    Katri Salminen

    2012-01-01

    Full Text Available Traditionally only speech communicates emotions via mobile phone. However, in daily communication the sense of touch mediates emotional information during conversation. The present aim was to study if tactile stimulation affects emotional ratings of speech when measured with scales of pleasantness, arousal, approachability, and dominance. In the Experiment 1 participants rated speech-only and speech-tactile stimuli. The tactile signal mimicked the amplitude changes of the speech. In the Experiment 2 the aim was to study whether the way the tactile signal was produced affected the ratings. The tactile signal either mimicked the amplitude changes of the speech sample in question, or the amplitude changes of another speech sample. Also, concurrent static vibration was included. The results showed that the speech-tactile stimuli were rated as more arousing and dominant than the speech-only stimuli. The speech-only stimuli were rated as more approachable than the speech-tactile stimuli, but only in the Experiment 1. Variations in tactile stimulation also affected the ratings. When the tactile stimulation was static vibration the speech-tactile stimuli were rated as more arousing than when the concurrent tactile stimulation was mimicking speech samples. The results suggest that tactile stimulation offers new ways of modulating and enriching the interpretation of speech.

  2. An Approach to Hide Secret Speech Information

    WU Zhi-jun; DUAN Hai-xin; LI Xing

    2006-01-01

    This paper presented an approach to hide secret speech information in code excited linear prediction(CELP)-based speech coding scheme by adopting the analysis-by-synthesis (ABS)-based algorithm of speech information hiding and extracting for the purpose of secure speech communication. The secret speech is coded in 2.4Kb/s mixed excitation linear prediction (MELP), which is embedded in CELP type public speech. The ABS algorithm adopts speech synthesizer in speech coder. Speech embedding and coding are synchronous, i.e. a fusion of speech information data of public and secret. The experiment of embedding 2.4 Kb/s MELP secret speech in G.728 scheme coded public speech transmitted via public switched telephone network (PSTN) shows that the proposed approach satisfies the requirements of information hiding, meets the secure communication speech quality constraints, and achieves high hiding capacity of average 3.2 Kb/s with an excellent speech quality and complicating speakers' recognition.

  3. Impaired motor speech performance in Huntington's disease.

    Skodda, Sabine; Schlegel, Uwe; Hoffmann, Rainer; Saft, Carsten

    2014-04-01

    Dysarthria is a common symptom of Huntington's disease and has been reported, besides other features, to be characterized by alterations of speech rate and regularity. However, data on the specific pattern of motor speech impairment and their relationship to other motor and neuropsychological symptoms are sparse. Therefore, the aim of the present study was to describe and objectively analyse different speech parameters with special emphasis on the aspect of speech timing of connected speech and non-speech verbal utterances. 21 patients with manifest Huntington's disease and 21 age- and gender-matched healthy controls had to perform a reading task and several syllable repetition tasks. Computerized acoustic analysis of different variables for the measurement of speech rate and regularity generated a typical pattern of impaired motor speech performance with a reduction of speech rate, an increase of pauses and a marked disability to steadily repeat single syllables. Abnormalities of speech parameters were more pronounced in the subgroup of patients with Huntington's disease receiving antidopaminergic medication, but were also present in the drug-naïve patients. Speech rate related to connected speech and parameters of syllable repetition showed correlations to overall motor impairment, capacity of tapping in a quantitative motor assessment and some score of cognitive function. After these preliminary data, further investigations on patients in different stages of disease are warranted to survey if the analysis of speech and non-speech verbal utterances might be a helpful additional tool for the monitoring of functional disability in Huntington's disease.

  4. Automaticity and Reading: Perspectives from the Instance Theory of Automatization.

    Logan, Gordon D.

    1997-01-01

    Reviews recent literature on automaticity, defining the criteria that distinguish automatic processing from non-automatic processing, and describing modern theories of the underlying mechanisms. Focuses on evidence from studies of reading and draws implications from theory and data for practical issues in teaching reading. Suggests that…

  5. Neural pathways for visual speech perception

    Lynne E Bernstein

    2014-12-01

    Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1 The visual perception of speech relies on visual pathway representations of speech qua speech. (2 A proposed site of these representations, the temporal visual speech area (TVSA has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS. (3 Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

  6. Human and automatic speaker recognition over telecommunication channels

    Fernández Gallardo, Laura

    2016-01-01

    This work addresses the evaluation of the human and the automatic speaker recognition performances under different channel distortions caused by bandwidth limitation, codecs, and electro-acoustic user interfaces, among other impairments. Its main contribution is the demonstration of the benefits of communication channels of extended bandwidth, together with an insight into how speaker-specific characteristics of speech are preserved through different transmissions. It provides sufficient motivation for considering speaker recognition as a criterion for the migration from narrowband to enhanced bandwidths, such as wideband and super-wideband.

  7. Investigation of in-vehicle speech intelligibility metrics for normal hearing and hearing impaired listeners

    Samardzic, Nikolina

    The effectiveness of in-vehicle speech communication can be a good indicator of the perception of the overall vehicle quality and customer satisfaction. Currently available speech intelligibility metrics do not account in their procedures for essential parameters needed for a complete and accurate evaluation of in-vehicle speech intelligibility. These include the directivity and the distance of the talker with respect to the listener, binaural listening, hearing profile of the listener, vocal effort, and multisensory hearing. In the first part of this research the effectiveness of in-vehicle application of these metrics is investigated in a series of studies to reveal their shortcomings, including a wide range of scores resulting from each of the metrics for a given measurement configuration and vehicle operating condition. In addition, the nature of a possible correlation between the scores obtained from each metric is unknown. The metrics and the subjective perception of speech intelligibility using, for example, the same speech material have not been compared in literature. As a result, in the second part of this research, an alternative method for speech intelligibility evaluation is proposed for use in the automotive industry by utilizing a virtual reality driving environment for ultimately setting targets, including the associated statistical variability, for future in-vehicle speech intelligibility evaluation. The Speech Intelligibility Index (SII) was evaluated at the sentence Speech Receptions Threshold (sSRT) for various listening situations and hearing profiles using acoustic perception jury testing and a variety of talker and listener configurations and background noise. In addition, the effect of individual sources and transfer paths of sound in an operating vehicle to the vehicle interior sound, specifically their effect on speech intelligibility was quantified, in the framework of the newly developed speech intelligibility evaluation method. Lastly

  8. Automatic modulation recognition of communication signals

    Azzouz, Elsayed Elsayed

    1996-01-01

    Automatic modulation recognition is a rapidly evolving area of signal analysis. In recent years, interest from the academic and military research institutes has focused around the research and development of modulation recognition algorithms. Any communication intelligence (COMINT) system comprises three main blocks: receiver front-end, modulation recogniser and output stage. Considerable work has been done in the area of receiver front-ends. The work at the output stage is concerned with information extraction, recording and exploitation and begins with signal demodulation, that requires accurate knowledge about the signal modulation type. There are, however, two main reasons for knowing the current modulation type of a signal; to preserve the signal information content and to decide upon the suitable counter action, such as jamming. Automatic Modulation Recognition of Communications Signals describes in depth this modulation recognition process. Drawing on several years of research, the authors provide a cr...

  9. Development of an automatic pipeline scanning system

    Kim, Jae H.; Lee, Jae C.; Moon, Soon S.; Eom, Heung S.; Choi, Yu R

    1999-11-01

    Pressure pipe inspection in nuclear power plants is one of the mandatory regulation items. Comparing to manual ultrasonic inspection, automatic inspection has the benefits of more accurate and reliable inspection results and reduction of radiation disposal. final object of this project is to develop an automatic pipeline inspection system of pressure pipe welds in nuclear power plants. We developed a pipeline scanning robot with four magnetic wheels and 2-axis manipulator for controlling ultrasonic transducers, and developed the robot control computer which controls the robot to navigate along inspection path exactly. We expect our system can contribute to reduction of inspection time, performance enhancement, and effective management of inspection results. The system developed by this project can be practically used for inspection works after field tests. (author)

  10. An Approximate Approach to Automatic Kernel Selection.

    Ding, Lizhong; Liao, Shizhong

    2016-02-02

    Kernel selection is a fundamental problem of kernel-based learning algorithms. In this paper, we propose an approximate approach to automatic kernel selection for regression from the perspective of kernel matrix approximation. We first introduce multilevel circulant matrices into automatic kernel selection, and develop two approximate kernel selection algorithms by exploiting the computational virtues of multilevel circulant matrices. The complexity of the proposed algorithms is quasi-linear in the number of data points. Then, we prove an approximation error bound to measure the effect of the approximation in kernel matrices by multilevel circulant matrices on the hypothesis and further show that the approximate hypothesis produced with multilevel circulant matrices converges to the accurate hypothesis produced with kernel matrices. Experimental evaluations on benchmark datasets demonstrate the effectiveness of approximate kernel selection.

  11. Child directed speech, speech in noise and hyperarticulated speech in the Pacific Northwest

    Wright, Richard; Carmichael, Lesley; Beckford Wassink, Alicia; Galvin, Lisa

    2001-05-01

    Three types of exaggerated speech are thought to be systematic responses to accommodate the needs of the listener: child-directed speech (CDS), hyperspeech, and the Lombard response. CDS (e.g., Kuhl et al., 1997) occurs in interactions with young children and infants. Hyperspeech (Johnson et al., 1993) is a modification in response to listeners difficulties in recovering the intended message. The Lombard response (e.g., Lane et al., 1970) is a compensation for increased noise in the signal. While all three result from adaptations to accommodate the needs of the listener, and therefore should share some features, the triggering conditions are quite different, and therefore should exhibit differences in their phonetic outcomes. While CDS has been the subject of a variety of acoustic studies, it has never been studied in the broader context of the other ``exaggerated'' speech styles. A large crosslinguistic study was undertaken that compares speech produced under four conditions: spontaneous conversations, CDS aimed at 6-9-month-old infants, hyperarticulated speech, and speech in noise. This talk will present some findings for North American English as spoken in the Pacific Northwest. The measures include f0, vowel duration, F1 and F2 at vowel midpoint, and intensity.

  12. Merge-Weighted Dynamic Time Warping for Speech Recognition

    张湘莉兰; 骆志刚; 李明

    2014-01-01

    Obtaining training material for rarely used English words and common given names from countries where English is not spoken is difficult due to excessive time, storage and cost factors. By considering personal privacy, language-independent (LI) with lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a convenient option to solve the problem. The dynamic time warping (DTW) algorithm is the state-of-the-art algorithm for small-footprint SD ASR for real-time applications with limited storage and small vocabularies. These applications include voice dialing on mobile devices, menu-driven recognition, and voice control on vehicles and robotics. However, traditional DTW has several limitations, such as high computational complexity, constraint induced coarse approximation, and inaccuracy problems. In this paper, we introduce the merge-weighted dynamic time warping (MWDTW) algorithm. This method defines a template confidence index for measuring the similarity between merged training data and testing data, while following the core DTW process. MWDTW is simple, efficient, and easy to implement. With extensive experiments on three representative SD speech recognition datasets, we demonstrate that our method outperforms DTW, DTW on merged speech data, the hidden Markov model (HMM) significantly, and is also six times faster than DTW overall.

  13. Objective Gender and Age Recognition from Speech Sentences

    Fatima K. Faek

    2015-10-01

    Full Text Available In this work, an automatic gender and age recognizer from speech is investigated. The relevant features to gender recognition are selected from the first four formant frequencies and twelve MFCCs and feed the SVM classifier. While the relevant features to age has been used with k-NN classifier for the age recognizer model, using MATLAB as a simulation tool. A special selection of robust features is used in this work to improve the results of the gender and age classifiers based on the frequency range that the feature represents. The gender and age classification algorithms are evaluated using 114 (clean and noisy speech samples uttered in Kurdish language. The model of two classes (adult males and adult females gender recognition, reached 96% recognition accuracy. While for three categories classification (adult males, adult females, and children, the model achieved 94% recognition accuracy. For the age recognition model, seven groups according to their ages are categorized. The model performance after selecting the relevant features to age achieved 75.3%. For further improvement a de-noising technique is used with the noisy speech signals, followed by selecting the proper features that are affected by the de-noising process and result in 81.44% recognition accuracy.

  14. Musical melody and speech intonation: singing a different tune.

    Robert J Zatorre

    Full Text Available Music and speech are often cited as characteristically human forms of communication. Both share the features of hierarchical structure, complex sound systems, and sensorimotor sequencing demands, and both are used to convey and influence emotions, among other functions [1]. Both music and speech also prominently use acoustical frequency modulations, perceived as variations in pitch, as part of their communicative repertoire. Given these similarities, and the fact that pitch perception and production involve the same peripheral transduction system (cochlea and the same production mechanism (vocal tract, it might be natural to assume that pitch processing in speech and music would also depend on the same underlying cognitive and neural mechanisms. In this essay we argue that the processing of pitch information differs significantly for speech and music; specifically, we suggest that there are two pitch-related processing systems, one for more coarse-grained, approximate analysis and one for more fine-grained accurate representation, and that the latter is unique to music. More broadly, this dissociation offers clues about the interface between sensory and motor systems, and highlights the idea that multiple processing streams are a ubiquitous feature of neuro-cognitive architectures.

  15. Emotional speech processing at the intersection of prosody and semantics.

    Rachel Schwartz

    Full Text Available The ability to accurately perceive emotions is crucial for effective social interaction. Many questions remain regarding how different sources of emotional cues in speech (e.g., prosody, semantic information are processed during emotional communication. Using a cross-modal emotional priming paradigm (Facial affect decision task, we compared the relative contributions of processing utterances with single-channel (prosody-only versus multi-channel (prosody and semantic cues on the perception of happy, sad, and angry emotional expressions. Our data show that emotional speech cues produce robust congruency effects on decisions about an emotionally related face target, although no processing advantage occurred when prime stimuli contained multi-channel as opposed to single-channel speech cues. Our data suggest that utterances with prosodic cues alone and utterances with combined prosody and semantic cues both activate knowledge that leads to emotional congruency (priming effects, but that the convergence of these two information sources does not always heighten access to this knowledge during emotional speech processing.

  16. Emotional speech processing at the intersection of prosody and semantics.

    Schwartz, Rachel; Pell, Marc D

    2012-01-01

    The ability to accurately perceive emotions is crucial for effective social interaction. Many questions remain regarding how different sources of emotional cues in speech (e.g., prosody, semantic information) are processed during emotional communication. Using a cross-modal emotional priming paradigm (Facial affect decision task), we compared the relative contributions of processing utterances with single-channel (prosody-only) versus multi-channel (prosody and semantic) cues on the perception of happy, sad, and angry emotional expressions. Our data show that emotional speech cues produce robust congruency effects on decisions about an emotionally related face target, although no processing advantage occurred when prime stimuli contained multi-channel as opposed to single-channel speech cues. Our data suggest that utterances with prosodic cues alone and utterances with combined prosody and semantic cues both activate knowledge that leads to emotional congruency (priming) effects, but that the convergence of these two information sources does not always heighten access to this knowledge during emotional speech processing.

  17. Objective Speech Quality Measurement Using Statistical Data Mining

    Wai-Yip Chan

    2005-06-01

    Full Text Available Measuring speech quality by machines overcomes two major drawbacks of subjective listening tests, their low speed and high cost. Real-time, accurate, and economical objective measurement of speech quality opens up a wide range of applications that cannot be supported with subjective listening tests. In this paper, we propose a statistical data mining approach to design objective speech quality measurement algorithms. A large pool of perceptual distortion features is extracted from the speech signal. We examine using classification and regression trees (CART and multivariate adaptive regression splines (MARS, separately and jointly, to select the most salient features from the pool, and to construct good estimators of subjective listening quality based on the selected features. We show designs that use perceptually significant features and outperform the state-of-the-art objective measurement algorithm. The designed algorithms are computationally simple, making them suitable for real-time implementation. The proposed design method is scalable with the amount of learning data; thus, performance can be improved with more offline or online training.

  18. NICT/ATR Chinese-Japanese-English Speech-to-Speech Translation System

    Tohru Shimizu; Yutaka Ashikari; Eiichiro Sumita; ZHANG Jinsong; Satoshi Nakamura

    2008-01-01

    This paper describes the latest version of the Chinese-Japanese-English handheld speech-to-speech translation system developed by NICT/ATR,which is now ready to be deployed for travelers.With the entire speech-to-speech translation function being implemented into one terminal,it realizes real-time,location-free speech-to-speech translation.A new noise-suppression technique notably improves the speech recognition performance.Corpus-based approaches of speech recognition,machine translation,and speech synthesis enable coverage of a wide variety of topics and portability to other languages.Test results show that the character accuracy of speech recognition is 82%-94% for Chinese speech,with a bilingual evaluation understudy score of machine translation is 0.55-0.74 for Chinese-Japanese and Chinese-English.

  19. Binary Masking & Speech Intelligibility

    Boldt, Jesper

    The purpose of this thesis is to examine how binary masking can be used to increase intelligibility in situations where hearing impaired listeners have difficulties understanding what is being said. The major part of the experiments carried out in this thesis can be categorized as either experime...... mask using a directional system and a method for correcting errors in the target binary mask. The last part of the thesis, proposes a new method for objective evaluation of speech intelligibility.......The purpose of this thesis is to examine how binary masking can be used to increase intelligibility in situations where hearing impaired listeners have difficulties understanding what is being said. The major part of the experiments carried out in this thesis can be categorized as either...... experiments under ideal conditions or as experiments under more realistic conditions useful for real-life applications such as hearing aids. In the experiments under ideal conditions, the previously defined ideal binary mask is evaluated using hearing impaired listeners, and a novel binary mask -- the target...

  20. Critical Thinking Process in English Speech

    WANG Jia-li

    2016-01-01

    With the development of mass media, English speech has become an important way for international cultural exchange in the context of globalization. Whether it is a political speech, a motivational speech, or an ordinary public speech, the wisdom and charm of critical thinking are always given into full play. This study analyzes the cultivation of critical thinking in English speech with the aid of representative examples, which is significant for cultivating college students’critical thinking as well as developing their critical thinking skills in English speech.

  1. Intra- and Inter-database Study for Arabic, English, and German Databases: Do Conventional Speech Features Detect Voice Pathology?

    Ali, Zulfiqar; Alsulaiman, Mansour; Muhammad, Ghulam; Elamvazuthi, Irraivan; Al-Nasheri, Ahmed; Mesallam, Tamer A; Farahat, Mohamed; Malki, Khalid H

    2016-10-10

    A large population around the world has voice complications. Various approaches for subjective and objective evaluations have been suggested in the literature. The subjective approach strongly depends on the experience and area of expertise of a clinician, and human error cannot be neglected. On the other hand, the objective or automatic approach is noninvasive. Automatic developed systems can provide complementary information that may be helpful for a clinician in the early screening of a voice disorder. At the same time, automatic systems can be deployed in remote areas where a general practitioner can use them and may refer the patient to a specialist to avoid complications that may be life threatening. Many automatic systems for disorder detection have been developed by applying different types of conventional speech features such as the linear prediction coefficients, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients (MFCCs). This study aims to ascertain whether conventional speech features detect voice pathology reliably, and whether they can be correlated with voice quality. To investigate this, an automatic detection system based on MFCC was developed, and three different voice disorder databases were used in this study. The experimental results suggest that the accuracy of the MFCC-based system varies from database to database. The detection rate for the intra-database ranges from 72% to 95%, and that for the inter-database is from 47% to 82%. The results conclude that conventional speech features are not correlated with voice, and hence are not reliable in pathology detection.

  2. Memory conserving speech synthesis DSP chip; Sho memory onsei gosei DSP chip

    NONE

    1999-03-01

    which can be incorporated into a system having large restriction in terms of hardware. The system was incorporated successfully into a digital signal processor (DSP) chip for an answering phone. Reduction in speech elements, compression of an element dictionary, and simplification of synthesis processing have made the speech synthesizing part including the element dictionary as very compact as about 40 K bytes. The speech call-out function automatically gives accents on names registered in an address book with Katakana characters, converts them into synthesized sound and reads them out. The DSP chip is mounted on answering phone sets with number display, and has been put on sale since August 1998. (translated by NEDO)

  3. Subjective and Objective Quality Assessment of Single-Channel Speech Separation Algorithms

    Mowlaee, Pejman; Saeidi, Rahim; Christensen, Mads Græsbøll;

    2012-01-01

    Previous studies on performance evaluation of single-channel speech separation (SCSS) algorithms mostly focused on automatic speech recognition (ASR) accuracy as their performance measure. Assessing the separated signals by different metrics other than this has the benefit that the results...... methods for audio source separation (PEASS) measures. In our experiments, we apply these measures on the separated signals obtained by two well-known systems in the SCSS challenge to assess the objective and subjective quality of their output signals. Comparing subjective and objective measurements shows...... that PESQ and PEASS quality metrics predict well the subjective quality of separated signals obtained by the separation systems. From the results it is observed that the short-time objective intelligibility (STOI) measure predict the speech intelligibility results....

  4. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    Santiago-Omar Caballero-Morales

    2013-01-01

    Full Text Available An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR system was built with Hidden Markov Models (HMMs, where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness. Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR’s output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of emotional state of Mexican Spanish speech.

  5. Prosody's Contribution to Fluency: An Examination of the Theory of Automatic Information Processing

    Schrauben, Julie E.

    2010-01-01

    LaBerge and Samuels' (1974) theory of automatic information processing in reading offers a model that explains how and where the processing of information occurs and the degree to which processing of information occurs. These processes are dependent upon two criteria: accurate word decoding and automatic word recognition. However, LaBerge and…

  6. Automaticity or active control

    Tudoran, Ana Alina; Olsen, Svein Ottar

    aspects of the construct, such as routine, inertia, automaticity, or very little conscious deliberation. The data consist of 2962 consumers participating in a large European survey. The results show that habit strength significantly moderates the association between satisfaction and action loyalty, and......This study addresses the quasi-moderating role of habit strength in explaining action loyalty. A model of loyalty behaviour is proposed that extends the traditional satisfaction–intention–action loyalty network. Habit strength is conceptualised as a cognitive construct to refer to the psychological......, respectively, between intended loyalty and action loyalty. At high levels of habit strength, consumers are more likely to free up cognitive resources and incline the balance from controlled to routine and automatic-like responses....

  7. Automatic Ultrasound Scanning

    Moshavegh, Ramin

    Medical ultrasound has been a widely used imaging modality in healthcare platforms for examination, diagnostic purposes, and for real-time guidance during surgery. However, despite the recent advances, medical ultrasound remains the most operator-dependent imaging modality, as it heavily relies...... on the user adjustments on the scanner interface to optimize the scan settings. This explains the huge interest in the subject of this PhD project entitled “AUTOMATIC ULTRASOUND SCANNING”. The key goals of the project have been to develop automated techniques to minimize the unnecessary settings...... on the scanners, and to improve the computer-aided diagnosis (CAD) in ultrasound by introducing new quantitative measures. Thus, four major issues concerning automation of the medical ultrasound are addressed in this PhD project. They touch upon gain adjustments in ultrasound, automatic synthetic aperture image...

  8. Automatic trend estimation

    Vamos¸, C˘alin

    2013-01-01

    Our book introduces a method to evaluate the accuracy of trend estimation algorithms under conditions similar to those encountered in real time series processing. This method is based on Monte Carlo experiments with artificial time series numerically generated by an original algorithm. The second part of the book contains several automatic algorithms for trend estimation and time series partitioning. The source codes of the computer programs implementing these original automatic algorithms are given in the appendix and will be freely available on the web. The book contains clear statement of the conditions and the approximations under which the algorithms work, as well as the proper interpretation of their results. We illustrate the functioning of the analyzed algorithms by processing time series from astrophysics, finance, biophysics, and paleoclimatology. The numerical experiment method extensively used in our book is already in common use in computational and statistical physics.

  9. Automatic design of decision-tree algorithms with evolutionary algorithms.

    Barros, Rodrigo C; Basgalupp, Márcio P; de Carvalho, André C P L F; Freitas, Alex A

    2013-01-01

    This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability to provide an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated over more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI datasets and 10 microarray gene expression datasets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.

  10. Acoustic analysis assessment in speech pathology detection

    Panek Daria

    2015-09-01

    Full Text Available Automatic detection of voice pathologies enables non-invasive, low cost and objective assessments of the presence of disorders, as well as accelerating and improving the process of diagnosis and clinical treatment given to patients. In this work, a vector made up of 28 acoustic parameters is evaluated using principal component analysis (PCA, kernel principal component analysis (kPCA and an auto-associative neural network (NLPCA in four kinds of pathology detection (hyperfunctional dysphonia, functional dysphonia, laryngitis, vocal cord paralysis using the a, i and u vowels, spoken at a high, low and normal pitch. The results indicate that the kPCA and NLPCA methods can be considered a step towards pathology detection of the vocal folds. The results show that such an approach provides acceptable results for this purpose, with the best efficiency levels of around 100%. The study brings the most commonly used approaches to speech signal processing together and leads to a comparison of the machine learning methods determining the health status of the patient

  11. Automatic food decisions

    Mueller Loose, Simone

    Consumers' food decisions are to a large extent shaped by automatic processes, which are either internally directed through learned habits and routines or externally influenced by context factors and visual information triggers. Innovative research methods such as eye tracking, choice experiments...... and food diaries allow us to better understand the impact of unconscious processes on consumers' food choices. Simone Mueller Loose will provide an overview of recent research insights into the effects of habit and context on consumers' food choices....

  12. Comparing the influence of spectro-temporal integration in computational speech segregation

    Bentsen, Thomas; May, Tobias; Kressner, Abigail Anne

    2016-01-01

    The goal of computational speech segregation systems is to automatically segregate a target speaker from interfering maskers. Typically, these systems include a feature extraction stage in the front-end and a classification stage in the back-end. A spectrotemporal integration strategy can...... metric that comprehensively predicts computational segregation performance and correlates well with intelligibility. The outcome of this study could help to identify the most effective spectro-temporal integration strategy for computational segregation systems....

  13. A continuous speech recognition approach for the design of a dictation machine

    Smaïli, Kamel; Charpillet, François; Pierrel, Jean-Mari; Haton, Jean-Paul

    1991-01-01

    International audience; The oral entry of texts (dictation machine) remains an important potential field of application for automatic speech recognition. The RFIA group of CRIN/INRIA has been investigating this research area for the french language during the past ten years. We propose in this paper a general presentation of the present state of our MAUD system which is based upon four major interacting components: an acoustic phonetic decoder, a lexical component, a linguistic model and a us...

  14. Automatic 3D modeling of the urban landscape

    I. Esteban; J. Dijk; F. Groen

    2010-01-01

    In this paper we present a fully automatic system for building 3D models of urban areas at the street level. We propose a novel approach for the accurate estimation of the scale consistent camera pose given two previous images. We employ a new method for global optimization and use a novel sampling

  15. Automatic 3D Modeling of the Urban Landscape

    Esteban, I.; Dijk, J.; Groen, F.A.

    2010-01-01

    In this paper we present a fully automatic system for building 3D models of urban areas at the street level. We propose a novel approach for the accurate estimation of the scale consistent camera pose given two previous images. We employ a new method for global optimization and use a novel sampling

  16. Automatic classification of trees from laser scanning point clouds

    Sirmacek, B.; Lindenbergh, R.C.

    2015-01-01

    Development of laser scanning technologies has promoted tree monitoring studies to a new level, as the laser scanning point clouds enable accurate 3D measurements in a fast and environmental friendly manner. In this paper, we introduce a probability matrix computation based algorithm for automatical

  17. Automatization of lexicographic work

    Iztok Kosem

    2013-12-01

    Full Text Available A new approach to lexicographic work, in which the lexicographer is seen more as a validator of the choices made by computer, was recently envisaged by Rundell and Kilgarriff (2011. In this paper, we describe an experiment using such an approach during the creation of Slovene Lexical Database (Gantar, Krek, 2011. The corpus data, i.e. grammatical relations, collocations, examples, and grammatical labels, were automatically extracted from 1,18-billion-word Gigafida corpus of Slovene. The evaluation of the extracted data consisted of making a comparison between the time spent writing a manual entry and a (semi-automatic entry, and identifying potential improvements in the extraction algorithm and in the presentation of data. An important finding was that the automatic approach was far more effective than the manual approach, without any significant loss of information. Based on our experience, we would propose a slightly revised version of the approach envisaged by Rundell and Kilgarriff in which the validation of data is left to lower-level linguists or crowd-sourcing, whereas high-level tasks such as meaning description remain the domain of lexicographers. Such an approach indeed reduces the scope of lexicographer’s work, however it also results in the ability of bringing the content to the users more quickly.

  18. Automatic differentiation for reduced sequential quadratic programming

    Liao Liangcai; Li Jin; Tan Yuejin

    2007-01-01

    In order to slove the large-scale nonlinear programming (NLP) problems efficiently, an efficient optimization algorithm based on reduced sequential quadratic programming (rSQP) and automatic differentiation (AD) is presented in this paper. With the characteristics of sparseness, relatively low degrees of freedom and equality constraints utilized, the nonlinear programming problem is solved by improved rSQP solver. In the solving process, AD technology is used to obtain accurate gradient information. The numerical results show that the combined algorithm, which is suitable for large-scale process optimization problems, can calculate more efficiently than rSQP itself.

  19. The Role of Visual Speech Information in Supporting Perceptual Learning of Degraded Speech

    Wayne, Rachel V.; Johnsrude, Ingrid S.

    2012-01-01

    Following cochlear implantation, hearing-impaired listeners must adapt to speech as heard through their prosthesis. Visual speech information (VSI; the lip and facial movements of speech) is typically available in everyday conversation. Here, we investigate whether learning to understand a popular auditory simulation of speech as transduced by a…

  20. Visual Context Enhanced: The Joint Contribution of Iconic Gestures and Visible Speech to Degraded Speech Comprehension

    Drijvers, Linda; Ozyurek, Asli

    2017-01-01

    Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Method:…

  1. Connected Speech Processes in Developmental Speech Impairment: Observations from an Electropalatographic Perspective

    Howard, Sara

    2004-01-01

    This paper uses a combination of perceptual and electropalatographic (EPG) analysis to explore the presence and characteristics of connected speech processes in the speech output of five older children with developmental speech impairments. Each of the children is shown to use some processes typical of normal speech production but also to use a…

  2. A Danish open-set speech corpus for competing-speech studies

    Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias

    2014-01-01

    Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SR...

  3. The treatment of apraxia of speech : Speech and music therapy, an innovative joint effort

    Hurkmans, Josephus Johannes Stephanus

    2016-01-01

    Apraxia of Speech (AoS) is a neurogenic speech disorder. A wide variety of behavioural methods have been developed to treat AoS. Various therapy programmes use musical elements to improve speech production. A unique therapy programme combining elements of speech therapy and music therapy is called S

  4. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  5. Experimental study on phase perception in speech

    BU Fanliang; CHEN Yanpu

    2003-01-01

    As the human ear is dull to the phase in speech, little attention has been paid tophase information in speech coding. In fact, the speech perceptual quality may be degeneratedif the phase distortion is very large. The perceptual effect of the STFT (Short time Fouriertransform) phase spectrum is studied by auditory subjective hearing tests. Three main con-clusions are (1) If the phase information is neglected completely, the subjective quality of thereconstructed speech may be very poor; (2) Whether the neglected phase is in low frequencyband or high frequency band, the difference from the original speech can be perceived by ear;(3) It is very difficult for the human ear to perceive the difference of speech quality betweenoriginal speech and reconstructed speech while the phase quantization step size is shorter thanπ/7.

  6. Quick Statistics about Voice, Speech, and Language

    ... here Home » Health Info » Statistics and Epidemiology Quick Statistics About Voice, Speech, Language Voice, Speech, Language, and ... no 205. Hyattsville, MD: National Center for Health Statistics. 2015. Hoffman HJ, Li C-M, Losonczy K, ...

  7. STUDY ON PHASE PERCEPTION IN SPEECH

    Tong Ming; Bian Zhengzhong; Li Xiaohui; Dai Qijun; Chen Yanpu

    2003-01-01

    The perceptual effect of the phase information in speech has been studied by auditorysubjective tests. On the condition that the phase spectrum in speech is changed while amplitudespectrum is unchanged, the tests show that: (1) If the envelop of the reconstructed speech signalis unchanged, there is indistinctive auditory perception between the original speech and thereconstructed speech; (2) The auditory perception effect of the reconstructed speech mainly lieson the amplitude of the derivative of the additive phase; (3) td is the maximum relative time shiftbetween different frequency components of the reconstructed speech signal. The speech qualityis excellent while td <10ms; good while 10ms< td <20ms; common while 20ms< td <35ms, andpoor while td >35ms.

  8. What Is Language? What Is Speech?

    ... request did not produce results) Speech is the verbal means of communicating. Speech consists of the following: ... questions and requests for information from members and non-members. Available 8:30 a.m.–5:00 ...

  9. Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

    Yue Zhao

    2012-12-01

    Full Text Available Audio‐visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi‐stream Dynamic Bayesian Network and coupled HMM are widely used for audio‐visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN to perform unsupervised extraction of spatial‐temporal multimodal features from Tibetan audio‐visual speech data and build an accurate audio‐visual speech recognition model under a no frame‐independency assumption. The experiment results on Tibetan speech data from some real‐world environments showed the proposed DDBN outperforms the state‐of‐art methods in word recognition accuracy.

  10. Automatic segmentation of psoriasis lesions

    Ning, Yang; Shi, Chenbo; Wang, Li; Shu, Chang

    2014-10-01

    The automatic segmentation of psoriatic lesions is widely researched these years. It is an important step in Computer-aid methods of calculating PASI for estimation of lesions. Currently those algorithms can only handle single erythema or only deal with scaling segmentation. In practice, scaling and erythema are often mixed together. In order to get the segmentation of lesions area - this paper proposes an algorithm based on Random forests with color and texture features. The algorithm has three steps. The first step, the polarized light is applied based on the skin's Tyndall-effect in the imaging to eliminate the reflection and Lab color space are used for fitting the human perception. The second step, sliding window and its sub windows are used to get textural feature and color feature. In this step, a feature of image roughness has been defined, so that scaling can be easily separated from normal skin. In the end, Random forests will be used to ensure the generalization ability of the algorithm. This algorithm can give reliable segmentation results even the image has different lighting conditions, skin types. In the data set offered by Union Hospital, more than 90% images can be segmented accurately.

  11. High Performance Speech Compression System

    2001-01-01

    Since Pulse Code Modulation emerged in 1937, digitized speec h has experienced rapid development due to its outstanding voice quali ty, reliability, robustness and security in communication. But how to reduce channel width without loss of speech quality remains a crucial problem in speech coding theory. A new full-duplex digital speech comm unication system based on the Vocoder of AMBE-1000 and microcontroller ATMEL 89C51 is introduced. It shows higher voice quality than current mobile phone system with only a quarter of channel width needed for t he latter. The prospective areas in which the system can be applied in clude satellite communication, IP Phone, virtual meeting and the most important, defence industry.

  12. Spatial localization of speech segments

    Karlsen, Brian Lykkegaard

    1999-01-01

    angle the target is likely to have originated from. The model is trained on the experimental data. On the basis of the experimental results, it is concluded that the human ability to localize speech segments in adverse noise depends on the speech segment as well as its point of origin in space...... the task of the experiment. The psychoacoustical experiment used naturally-spoken Danish consonant-vowel combinations as targets presented in diffuse speech-shaped noise at a peak SNR of -10 dB. The subjects were normal hearing persons. The experiment took place in an anechoic chamber where eight...... loudspeakers were suspended so that they surrounded the subjects in the horizontal plane. The subjects were required to push a button on a pad indicating where they had localized the target to in the horizontal plane. The response pad had twelve buttons arranged uniformly in a circle and two further buttons so...

  13. MUSAN: A Music, Speech, and Noise Corpus

    Snyder, David; Chen, Guoguo; Povey, Daniel

    2015-01-01

    This report introduces a new corpus of music, speech, and noise. This dataset is suitable for training models for voice activity detection (VAD) and music/speech discrimination. Our corpus is released under a flexible Creative Commons license. The dataset consists of music from several genres, speech from twelve languages, and a wide assortment of technical and non-technical noises. We demonstrate use of this corpus for music/speech discrimination on Broadcast news and VAD for speaker identif...

  14. Freedom of Speech as an Academic Discipline.

    Haiman, Franklyn S.

    Since its formation, the Speech Communication Association's Committee on Freedom of Speech has played a critical leadership role in course offerings, research efforts, and regional activities in freedom of speech. Areas in which research has been done and in which further research should be carried out include: historical-critical research, in…

  15. Development of binaural speech transmission index

    Wijngaarden, S.J. van; Drullman, R.

    2006-01-01

    Although the speech transmission index (STI) is a well-accepted and standardized method for objective prediction of speech intelligibility in a wide range of-environments and applications, it is essentially a monaural model. Advantages of binaural hearing to the intelligibility of speech are disrega

  16. Recovering Asynchronous Watermark Tones from Speech

    2009-04-01

    Audio steganography for covert data transmission by impercep- tible tone insertion,” Proceedings Communications Sys- tems and Applications, IEEE, vol. 4, pp. 1647–1653, 2004. 1408 ...by a comfortable margin. Index Terms— Speech Watermarking, Hidden Tones, Speech Steganography , Speech Data Hiding 1. BACKGROUND Imperceptibly

  17. Audiovisual Asynchrony Detection in Human Speech

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  18. Application of wavelets in speech processing

    Farouk, Mohamed Hesham

    2014-01-01

    This book provides a survey on wide-spread of employing wavelets analysis  in different applications of speech processing. The author examines development and research in different application of speech processing. The book also summarizes the state of the art research on wavelet in speech processing.

  19. Cognitive Functions in Childhood Apraxia of Speech

    Nijland, Lian; Terband, Hayo; Maassen, Ben

    2015-01-01

    Purpose: Childhood apraxia of speech (CAS) is diagnosed on the basis of specific speech characteristics, in the absence of problems in hearing, intelligence, and language comprehension. This does not preclude the possibility that children with this speech disorder might demonstrate additional problems. Method: Cognitive functions were investigated…

  20. Cognitive functions in Childhood Apraxia of Speech

    Nijland, L.; Terband, H.; Maassen, B.

    2015-01-01

    Purpose: Childhood Apraxia of Speech (CAS) is diagnosed on the basis of specific speech characteristics, in the absence of problems in hearing, intelligence, and language comprehension. This does not preclude the possibility that children with this speech disorder might demonstrate additional proble

  1. Speech-Song Interface of Chinese Speakers

    Mang, Esther

    2007-01-01

    Pitch is a psychoacoustic construct crucial in the production and perception of speech and songs. This article is an exploration of the interface of speech and song performance of Chinese speakers. Although parallels might be drawn from the prosodic and sound structures of the linguistic and musical systems, perceiving and producing speech and…

  2. Factors of Politeness and Indirect Speech Acts

    杨雪梅

    2016-01-01

    Polite principle is influenced deeply by a nation's history,culture,custom and so on,therefor different countries have different understandings and expressions of politeness and indirect speech acts.This paper shows some main factors influencing a polite speech.Through this article,readers can comprehensively know about politeness and indirect speech acts.

  3. Speech and Debate as Civic Education

    Hogan, J. Michael; Kurr, Jeffrey A.; Johnson, Jeremy D.; Bergmaier, Michael J.

    2016-01-01

    In light of the U.S. Senate's designation of March 15, 2016 as "National Speech and Debate Education Day" (S. Res. 398, 2016), it only seems fitting that "Communication Education" devote a special section to the role of speech and debate in civic education. Speech and debate have been at the heart of the communication…

  4. Audiovisual Speech Integration and Lipreading in Autism

    Smith, Elizabeth G.; Bennetto, Loisa

    2007-01-01

    Background: During speech perception, the ability to integrate auditory and visual information causes speech to sound louder and be more intelligible, and leads to quicker processing. This integration is important in early language development, and also continues to affect speech comprehension throughout the lifespan. Previous research shows that…

  5. Liberalism, Speech Codes, and Related Problems.

    Sunstein, Cass R.

    1993-01-01

    It is argued that universities are pervasively and necessarily engaged in regulation of speech, which complicates many existing claims about hate speech codes on campus. The ultimate test is whether the restriction on speech is a legitimate part of the institution's mission, commitment to liberal education. (MSE)

  6. Gesture & Speech Based Appliance Control

    Dr. Sayleegharge,

    2014-01-01

    Full Text Available This document explores the use of speech & gestures to control home appliances. Aiming at the aging population of the world and relieving them from their dependencies. The two approaches used to sail through the target are the MFCC approach for speech processing and the Identification of Characteristic Point Algorithm for gesture recognition. A barrier preventing wide adoption is that this audience can find controlling assistive technology difficult, as they are less dexterous and computer literate. Our results hope to provide a more natural and intuitive interface to help bridge the gap between technology and elderly users.

  7. Discriminative learning for speech recognition

    He, Xiadong

    2008-01-01

    In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model. A detailed study is presented on unifying the common objective functions for discriminative learning in speech recognition, namely maximum mutual information (MMI), minimum classification error, and minimum phone/word error. The unification is presented, with rigorous mathematical analysis, in a common rational-functio

  8. Reproducible Research in Speech Sciences

    Kandaacute;lmandaacute;n Abari

    2012-11-01

    Full Text Available Reproducible research is the minimum standard of scientific claims in cases when independent replication proves to be difficult. With the special combination of available software tools, we provide a reproducibility recipe for the experimental research conducted in some fields of speech sciences. We have based our model on the triad of the R environment, the EMU-format speech database, and the executable publication. We present the use of three typesetting systems (LaTeX, Markdown, Org, with the help of a mini research.

  9. Speech Communication and Telephone Networks

    Gierlich, H. W.

    Speech communication over telephone networks has one major constraint: The communication has to be “real time”. The basic principle since the beginning of all telephone networks has been to provide a communication system capable of substituting the air path between two persons having a conversation at 1-m distance. This is the so-called orthotelephonic reference position [7]. Although many technical compromises must be made to enable worldwide communication over telephone networks, it is still the goal to achieve speech quality performance which is close to this reference.

  10. Relationship between Speech Intelligibility and Speech Comprehension in Babble Noise

    Fontan, Lionel; Tardieu, Julien; Gaillard, Pascal; Woisard, Virginie; Ruiz, Robert

    2015-01-01

    Purpose: The authors investigated the relationship between the intelligibility and comprehension of speech presented in babble noise. Method: Forty participants listened to French imperative sentences (commands for moving objects) in a multitalker babble background for which intensity was experimentally controlled. Participants were instructed to…

  11. Speech intelligibility of native and non-native speech

    Wijngaarden, S.J. van

    1999-01-01

    The intelligibility of speech is known to be lower if the talker is non-native instead of native for the given language. This study is aimed at quantifying the overall degradation due to acoustic-phonetic limitations of non-native talkers of Dutch, specifically of Dutch-speaking Americans who have l

  12. Speech perception in children with speech output disorders.

    Nijland, L.

    2009-01-01

    Research in the field of speech production pathology is dominated by describing deficits in output. However, perceptual problems might underlie, precede, or interact with production disorders. The present study hypothesizes that the level of the production disorders is linked to level of perception

  13. Speech entrainment enables patients with Broca's aphasia to produce fluent speech.

    Fridriksson, Julius; Hubbard, H Isabel; Hudspeth, Sarah Grace; Holland, Audrey L; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-12-01

    A distinguishing feature of Broca's aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect 'speech entrainment' and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca's aphasia. In Experiment 1, 13 patients with Broca's aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca's area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and

  14. Effect of talker and speaking style on the Speech Transmission Index (L)

    Wijngaarden, S.J. van; Houtgast, T.

    2004-01-01

    The Speech Transmission Index (STI) is routinely applied for predicting the intelligibility of messages (sentences) in noise and reverberation. Despite clear evidence that the STI is capable of doing so accurately, recent results indicate that the STI sometimes underestimates the effect of reverbera

  15. Assessing Disfluencies in School-Age Children Who Stutter: How Much Speech Is Enough?

    Gregg, Brent A.; Sawyer, Jean

    2015-01-01

    The question of what size speech sample is sufficient to accurately identify stuttering and its myriad characteristics is a valid one. Short samples have a risk of over- or underrepresenting disfluency types or characteristics. In recent years, there has been a trend toward using shorter samples because they are less time-consuming for…

  16. Automatic Configuration in NTP

    Jiang Zongli(蒋宗礼); Xu Binbin

    2003-01-01

    NTP is nowadays the most widely used distributed network time protocol, which aims at synchronizing the clocks of computers in a network and keeping the accuracy and validation of the time information which is transmitted in the network. Without automatic configuration mechanism, the stability and flexibility of the synchronization network built upon NTP protocol are not satisfying. P2P's resource discovery mechanism is used to look for time sources in a synchronization network, and according to the network environment and node's quality, the synchronization network is constructed dynamically.

  17. A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

    Lavner Yizhar

    2009-01-01

    Full Text Available We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation to our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the test phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm, on a database of more than 12 hours of speech and more than 22 hours of music showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.

  18. Intelligibility Evaluation of Pathological Speech through Multigranularity Feature Extraction and Optimization

    Ma, Lin; Zhang, Mancai

    2017-01-01

    Pathological speech usually refers to speech distortion resulting from illness or other biological insults. The assessment of pathological speech plays an important role in assisting the experts, while automatic evaluation of speech intelligibility is difficult because it is usually nonstationary and mutational. In this paper, we carry out an independent innovation of feature extraction and reduction, and we describe a multigranularity combined feature scheme which is optimized by the hierarchical visual method. A novel method of generating feature set based on S-transform and chaotic analysis is proposed. There are BAFS (430, basic acoustics feature), local spectral characteristics MSCC (84, Mel S-transform cepstrum coefficients), and chaotic features (12). Finally, radar chart and F-score are proposed to optimize the features by the hierarchical visual fusion. The feature set could be optimized from 526 to 96 dimensions based on NKI-CCRT corpus and 104 dimensions based on SVD corpus. The experimental results denote that new features by support vector machine (SVM) have the best performance, with a recognition rate of 84.4% on NKI-CCRT corpus and 78.7% on SVD corpus. The proposed method is thus approved to be effective and reliable for pathological speech intelligibility evaluation. PMID:28194222

  19. Joint spatial-spectral feature space clustering for speech activity detection from ECoG signals.

    Kanas, Vasileios G; Mporas, Iosif; Benz, Heather L; Sgarbas, Kyriakos N; Bezerianos, Anastasios; Crone, Nathan E

    2014-04-01

    Brain-machine interfaces for speech restoration have been extensively studied for more than two decades. The success of such a system will depend in part on selecting the best brain recording sites and signal features corresponding to speech production. The purpose of this study was to detect speech activity automatically from electrocorticographic signals based on joint spatial-frequency clustering of the ECoG feature space. For this study, the ECoG signals were recorded while a subject performed two different syllable repetition tasks. We found that the optimal frequency resolution to detect speech activity from ECoG signals was 8 Hz, achieving 98.8% accuracy by employing support vector machines as a classifier. We also defined the cortical areas that held the most information about the discrimination of speech and nonspeech time intervals. Additionally, the results shed light on the distinct cortical areas associated with the two syllables repetition tasks and may contribute to the development of portable ECoG-based communication.

  20. Speech perception of noise with binary gains

    Wang, DeLiang; Kjems, Ulrik; Pedersen, Michael Syskind;

    2008-01-01

    For a given mixture of speech and noise, an ideal binary time-frequency mask is constructed by comparing speech energy and noise energy within local time-frequency units. It is observed that listeners achieve nearly perfect speech recognition from gated noise with binary gains prescribed...... by the ideal binary mask. Only 16 filter channels and a frame rate of 100 Hz are sufficient for high intelligibility. The results show that, despite a dramatic reduction of speech information, a pattern of binary gains provides an adequate basis for speech perception....

  1. Speech in Mobile and Pervasive Environments

    Rajput, Nitendra

    2012-01-01

    This book brings together the latest research in one comprehensive volume that deals with issues related to speech processing on resource-constrained, wireless, and mobile devices, such as speech recognition in noisy environments, specialized hardware for speech recognition and synthesis, the use of context to enhance recognition, the emerging and new standards required for interoperability, speech applications on mobile devices, distributed processing between the client and the server, and the relevance of Speech in Mobile and Pervasive Environments for developing regions--an area of explosiv

  2. Perceived Speech Quality Estimation Using DTW Algorithm

    S. Arsenovski

    2009-06-01

    Full Text Available In this paper a method for speech quality estimation is evaluated by simulating the transfer of speech over packet switched and mobile networks. The proposed system uses Dynamic Time Warping algorithm for test and received speech comparison. Several tests have been made on a test speech sample of a single speaker with simulated packet (frame loss effects on the perceived speech. The achieved results have been compared with measured PESQ values on the used transmission channel and their correlation has been observed.

  3. Text To Speech System for Telugu Language

    Siva kumar, M; E. Prakash Babu

    2014-01-01

    Telugu is one of the oldest languages in India. This paper describes the development of Telugu Text-to-Speech System (TTS).In Telugu TTS the input is Telugu text in Unicode. The voices are sampled from real recorded speech. The objective of a text to speech system is to convert an arbitrary text into its corresponding spoken waveform. Speech synthesis is a process of building machinery that can generate human-like speech from any text input to imitate human speakers. Text proc...

  4. Recent Advances in Robust Speech Recognition Technology

    Ramírez, Javier

    2011-01-01

    This E-book is a collection of articles that describe advances in speech recognition technology. Robustness in speech recognition refers to the need to maintain high speech recognition accuracy even when the quality of the input speech is degraded, or when the acoustical, articulate, or phonetic characteristics of speech in the training and testing environments differ. Obstacles to robust recognition include acoustical degradations produced by additive noise, the effects of linear filtering, nonlinearities in transduction or transmission, as well as impulsive interfering sources, and diminishe

  5. [Nature of speech disorders in Parkinson disease].

    Pawlukowska, W; Honczarenko, K; Gołąb-Janowska, M

    2013-01-01

    The aim of the study was to discuss physiology and pathology of speech and review of the literature on speech disorders in Parkinson disease. Additionally, the most effective methods to diagnose the speech disorders in Parkinson disease were also stressed. Afterward, articulatory, respiratory, acoustic and pragmatic factors contributing to the exacerbation of the speech disorders were discussed. Furthermore, the study dealt with the most important types of speech treatment techniques available (pharmacological and behavioral) and a significance of Lee Silverman Voice Treatment was highlighted.

  6. Fast Monaural Separation of Speech

    Pontoppidan, Niels Henrik; Dyrholm, Mads

    2003-01-01

    a Factorial Hidden Markov Model, with non-stationary assumptions on the source autocorrelations modelled through the Factorial Hidden Markov Model, leads to separation in the monaural case. By extending Hansens work we find that Roweis' assumptions are necessary for monaural speech separation. Furthermore we...

  7. Aerosol Emission during Human Speech

    Asadi, Sima; Ristenpart, William

    2016-11-01

    The traditional emphasis for airborne disease transmission has been on coughing and sneezing, which are dramatic expiratory events that yield easily visible droplets. Recent research suggests that normal speech can release even larger quantities of aerosols that are too small to see with the naked eye, but are nonetheless large enough to carry a variety of pathogens (e.g., influenza A). This observation raises an important question: what types of speech emit the most aerosols? Here we show that the concentration of aerosols emitted during healthy human speech is positively correlated with both the amplitude (loudness) and fundamental frequency (pitch) of the vocalization. Experimental measurements with an aerodynamic particle sizer (APS) indicate that speaking in a loud voice (95 decibels) yields up to fifty times more aerosols than in a quiet voice (75 decibels), and that sounds associated with certain phonemes (e.g., [a] or [o]) release more aerosols than others. We interpret these results in terms of the egressive airflow rate associated with each phoneme and the corresponding fundamental frequency, which is known to vary significantly with gender and age. The results suggest that individual speech patterns could affect the probability of airborne disease transmission.

  8. Affecting Critical Thinking through Speech.

    O'Keefe, Virginia P.

    Intended for teachers, this booklet shows how spoken language can affect student thinking and presents strategies for teaching critical thinking skills. The first section discusses the theoretical and research bases for promoting critical thinking through speech, defines critical thinking, explores critical thinking as abstract thinking, and tells…

  9. Acoustic Analysis of PD Speech

    Karen Chenausky

    2011-01-01

    Full Text Available According to the U.S. National Institutes of Health, approximately 500,000 Americans have Parkinson's disease (PD, with roughly another 50,000 receiving new diagnoses each year. 70%–90% of these people also have the hypokinetic dysarthria associated with PD. Deep brain stimulation (DBS substantially relieves motor symptoms in advanced-stage patients for whom medication produces disabling dyskinesias. This study investigated speech changes as a result of DBS settings chosen to maximize motor performance. The speech of 10 PD patients and 12 normal controls was analyzed for syllable rate and variability, syllable length patterning, vowel fraction, voice-onset time variability, and spirantization. These were normalized by the controls' standard deviation to represent distance from normal and combined into a composite measure. Results show that DBS settings relieving motor symptoms can improve speech, making it up to three standard deviations closer to normal. However, the clinically motivated settings evaluated here show greater capacity to impair, rather than improve, speech. A feedback device developed from these findings could be useful to clinicians adjusting DBS parameters, as a means for ensuring they do not unwittingly choose DBS settings which impair patients' communication.

  10. Gaucho Gazette: Speech and Sensationalism

    Roberto José Ramos

    2013-07-01

    Full Text Available The Gaucho Gazette presents itself as a “popular newspaper”. Attempts to produce a denial about his aesthetic tabloid. Search only say that discloses what happens, as if the media were merely a reflection of society. This paper will seek to understand and explain your Sensationalism, through their speeches. Use for both, semiology, Roland Barthes, in their possibilities transdisciplinary.

  11. Automatic detecting method of LED signal lamps on fascia based on color image

    Peng, Xiaoling; Hou, Wenguang; Ding, Mingyue

    2009-10-01

    Instrument display panel is one of the most important parts of automobiles. Automatic detection of LED signal lamps is critical to ensure the reliability of automobile systems. In this paper, an automatic detection method was developed which is composed of three parts in the automatic detection: the shape of LED lamps, the color of LED lamps, and defect spots inside the lamps. More than hundreds of fascias were detected with the automatic detection algorithm. The speed of the algorithm is quite fast and satisfied with the real-time request of the system. Further, the detection result was demonstrated to be stable and accurate.

  12. Suppression of the µ rhythm during speech and non-speech discrimination revealed by independent component analysis: implications for sensorimotor integration in speech processing.

    Andrew Bowers

    Full Text Available BACKGROUND: Constructivist theories propose that articulatory hypotheses about incoming phonetic targets may function to enhance perception by limiting the possibilities for sensory analysis. To provide evidence for this proposal, it is necessary to map ongoing, high-temporal resolution changes in sensorimotor activity (i.e., the sensorimotor μ rhythm to accurate speech and non-speech discrimination performance (i.e., correct trials.. METHODS: Sixteen participants (15 female and 1 male were asked to passively listen to or actively identify speech and tone-sweeps in a two-force choice discrimination task while the electroencephalograph (EEG was recorded from 32 channels. The stimuli were presented at signal-to-noise ratios (SNRs in which discrimination accuracy was high (i.e., 80-100% and low SNRs producing discrimination performance at chance. EEG data were decomposed using independent component analysis and clustered across participants using principle component methods in EEGLAB. RESULTS: ICA revealed left and right sensorimotor µ components for 14/16 and 13/16 participants respectively that were identified on the basis of scalp topography, spectral peaks, and localization to the precentral and postcentral gyri. Time-frequency analysis of left and right lateralized µ component clusters revealed significant (pFDR<.05 suppression in the traditional beta frequency range (13-30 Hz prior to, during, and following syllable discrimination trials. No significant differences from baseline were found for passive tasks. Tone conditions produced right µ beta suppression following stimulus onset only. For the left µ, significant differences in the magnitude of beta suppression were found for correct speech discrimination trials relative to chance trials following stimulus offset. CONCLUSIONS: Findings are consistent with constructivist, internal model theories proposing that early forward motor models generate predictions about likely phonemic units

  13. Comparison of HMM and DTW methods in automatic recognition of pathological phoneme pronunciation

    Wielgat, Robert; Zielinski, Tomasz P.; Swietojanski, Pawel; Zoladz, Piotr; Król, Daniel; Wozniak, Tomasz; Grabias, Stanislaw

    2007-01-01

    In the paper recently proposed Human Factor Cepstral Coefficients (HFCC) are used to automatic recognition of pathological phoneme pronunciation in speech of impaired children and efficiency of this approach is compared to application of the standard Mel-Frequency Cepstral Coefficients (MFCC) as a feature vector. Both dynamic time warping (DTW), working on whole words or embedded phoneme patterns, and hidden Markov models (HMM) are used as classifiers in the presented research. Obtained resul...

  14. Modeling speech imitation and ecological learning of auditory-motor maps

    Claudia eCanevari

    2013-06-01

    Full Text Available Classical models of speech consider an antero-posterior distinction between perceptive and productive functions. However, the selective alteration of neural activity in speech motor centers, via transcranial magnetic stimulation, was shown to affect speech discrimination. On the automatic speech recognition (ASR side, the recognition systems have classically relied solely on acoustic data, achieving rather good performance in optimal listening conditions. The main limitations of current ASR are mainly evident in the realistic use of such systems. These limitations can be partly reduced by using normalization strategies that minimize inter-speaker variability by either explicitly removing speakers’ peculiarities or adapting different speakers to a reference model. In this paper we aim at modeling a motor-based imitation learning mechanism in ASR. We tested the utility of a speaker normalization strategy that uses motor representations of speech and compare it with strategies that ignore the motor domain. Specifically, we first trained a regressor through state-of-the-art machine learning techniques to build an auditory-motor mapping, in a sense mimicking a human learner that tries to reproduce utterances produced by other speakers. This auditory-motor mapping maps the speech acoustics of a speaker into the motor plans of a reference speaker. Since, during recognition, only speech acoustics are available, the mapping is necessary to recover motor information. Subsequently, in a phone classification task, we tested the system on either one of the speakers that was used during training or a new one. Results show that in both cases the motor-based speaker normalization strategy almost always outperforms all other strategies where only acoustics is taken into account.

  15. Speech recognition with amplitude and frequency modulations

    Zeng, Fan-Gang; Nie, Kaibao; Stickney, Ginger S.; Kong, Ying-Yee; Vongphoe, Michael; Bhargave, Ashish; Wei, Chaogang; Cao, Keli

    2005-02-01

    Amplitude modulation (AM) and frequency modulation (FM) are commonly used in communication, but their relative contributions to speech recognition have not been fully explored. To bridge this gap, we derived slowly varying AM and FM from speech sounds and conducted listening tests using stimuli with different modulations in normal-hearing and cochlear-implant subjects. We found that although AM from a limited number of spectral bands may be sufficient for speech recognition in quiet, FM significantly enhances speech recognition in noise, as well as speaker and tone recognition. Additional speech reception threshold measures revealed that FM is particularly critical for speech recognition with a competing voice and is independent of spectral resolution and similarity. These results suggest that AM and FM provide independent yet complementary contributions to support robust speech recognition under realistic listening situations. Encoding FM may improve auditory scene analysis, cochlear-implant, and audiocoding performance. auditory analysis | cochlear implant | neural code | phase | scene analysis

  16. Speech motor learning in profoundly deaf adults.

    Nasir, Sazzad M; Ostry, David J

    2008-10-01

    Speech production, like other sensorimotor behaviors, relies on multiple sensory inputs--audition, proprioceptive inputs from muscle spindles and cutaneous inputs from mechanoreceptors in the skin and soft tissues of the vocal tract. However, the capacity for intelligible speech by deaf speakers suggests that somatosensory input alone may contribute to speech motor control and perhaps even to speech learning. We assessed speech motor learning in cochlear implant recipients who were tested with their implants turned off. A robotic device was used to alter somatosensory feedback by displacing the jaw during speech. We found that implant subjects progressively adapted to the mechanical perturbation with training. Moreover, the corrections that we observed were for movement deviations that were exceedingly small, on the order of millimeters, indicating that speakers have precise somatosensory expectations. Speech motor learning is substantially dependent on somatosensory input.

  17. Speech Enhancement with Natural Sounding Residual Noise Based on Connected Time-Frequency Speech Presence Regions

    Sørensen Karsten Vandborg

    2005-01-01

    Full Text Available We propose time-frequency domain methods for noise estimation and speech enhancement. A speech presence detection method is used to find connected time-frequency regions of speech presence. These regions are used by a noise estimation method and both the speech presence decisions and the noise estimate are used in the speech enhancement method. Different attenuation rules are applied to regions with and without speech presence to achieve enhanced speech with natural sounding attenuated background noise. The proposed speech enhancement method has a computational complexity, which makes it feasible for application in hearing aids. An informal listening test shows that the proposed speech enhancement method has significantly higher mean opinion scores than minimum mean-square error log-spectral amplitude (MMSE-LSA and decision-directed MMSE-LSA.

  18. Automatic basal slice detection for cardiac analysis

    Paknezhad, Mahsa; Marchesseau, Stephanie; Brown, Michael S.

    2016-03-01

    Identification of the basal slice in cardiac imaging is a key step to measuring the ejection fraction (EF) of the left ventricle (LV). Despite research on cardiac segmentation, basal slice identification is routinely performed manually. Manual identification, however, has been shown to have high inter-observer variability, with a variation of the EF by up to 8%. Therefore, an automatic way of identifying the basal slice is still required. Prior published methods operate by automatically tracking the mitral valve points from the long-axis view of the LV. These approaches assumed that the basal slice is the first short-axis slice below the mitral valve. However, guidelines published in 2013 by the society for cardiovascular magnetic resonance indicate that the basal slice is the uppermost short-axis slice with more than 50% myocardium surrounding the blood cavity. Consequently, these existing methods are at times identifying the incorrect short-axis slice. Correct identification of the basal slice under these guidelines is challenging due to the poor image quality and blood movement during image acquisition. This paper proposes an automatic tool that focuses on the two-chamber slice to find the basal slice. To this end, an active shape model is trained to automatically segment the two-chamber view for 51 samples using the leave-one-out strategy. The basal slice was detected using temporal binary profiles created for each short-axis slice from the segmented two-chamber slice. From the 51 successfully tested samples, 92% and 84% of detection results were accurate at the end-systolic and the end-diastolic phases of the cardiac cycle, respectively.

  19. Speech Evaluation with Special Focus on Children Suffering from Apraxia of Speech

    Manasi Dixit

    2013-07-01

    Full Text Available Speech disorders are very complicated in individuals suffering from Apraxia of Speech-AOS. In this paper ,the pathological cases of speech disabled children affected with AOS are analyzed. The speech signalsamples of childrenSpeech disorders are very complicated in individuals suffering from Apraxia of Speech-AOS. In this paper ,the pathological cases of speech disabled children affected with AOS are analyzed. The speech signalsamples of children of age between three to eight years are considered for the present study. These speechsignals are digitized and enhanced using the using the Speech Pause Index, Jitter,Skew ,Kurtosis analysisThis analysis is conducted on speech data samples which are concerned with both place of articulation andmanner of articulation. The speech disability of pathological subjects was estimated using results of aboveanalysis. of age between three to eight years are considered for the present study. These speechsignals are digitized and enhanced using the using the Speech Pause Index, Jitter,Skew ,Kurtosis analysisThis analysis is conducted on speech data samples which are concerned with both place of articulation andmanner of articulation. The speech disability of pathological subjects was estimated using results of aboveanalysis.

  20. Comparison of automatic control systems

    Oppelt, W

    1941-01-01

    This report deals with a reciprocal comparison of an automatic pressure control, an automatic rpm control, an automatic temperature control, and an automatic directional control. It shows the difference between the "faultproof" regulator and the actual regulator which is subject to faults, and develops this difference as far as possible in a parallel manner with regard to the control systems under consideration. Such as analysis affords, particularly in its extension to the faults of the actual regulator, a deep insight into the mechanism of the regulator process.