WorldWideScience

Sample records for accurate automatic speech

  1. Thai Automatic Speech Recognition

    National Research Council Canada - National Science Library

    Suebvisai, Sinaporn; Charoenpornsawat, Paisarn; Black, Alan; Woszczyna, Monika; Schultz, Tanja

    2005-01-01

    .... We focus on the discussion of the rapid deployment of ASR for Thai under limited time and data resources, including rapid data collection issues, acoustic model bootstrap, and automatic generation of pronunciations...

  2. Automatic speech recognition (ASR) based approach for speech therapy of aphasic patients: A review

    Science.gov (United States)

    Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH

    2017-09-01

    This paper reviews the state-of-the-art an automatic speech recognition (ASR) based approach for speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that transfers human speech into transcript text by matching with the system's library. This is particularly useful in speech rehabilitation therapies as they provide accurate, real-time evaluation for speech input from an individual with speech disorder. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback response to their mistakes. However, the accuracy of ASR is dependent on many factors such as, phoneme recognition, speech continuity, speaker and environmental differences as well as our depth of knowledge on human language understanding. Hence, the review examines recent development of ASR technologies and its performance for individuals with speech and language disorders.

  3. Predicting automatic speech recognition performance over communication channels from instrumental speech quality and intelligibility scores

    NARCIS (Netherlands)

    Gallardo, L.F.; Möller, S.; Beerends, J.

    2017-01-01

    The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility

  4. Automatic Smoker Detection from Telephone Speech Signals

    DEFF Research Database (Denmark)

    Poorjam, Amir Hossein; Hesaraki, Soheila; Safavi, Saeid

    2017-01-01

    This paper proposes an automatic smoking habit detection from spontaneous telephone speech signals. In this method, each utterance is modeled using i-vector and non-negative factor analysis (NFA) frameworks, which yield low-dimensional representation of utterances by applying factor analysis...... method is evaluated on telephone speech signals of speakers whose smoking habits are known drawn from the National Institute of Standards and Technology (NIST) 2008 and 2010 Speaker Recognition Evaluation databases. Experimental results over 1194 utterances show the effectiveness of the proposed approach...... for the automatic smoking habit detection task....

  5. Personality in speech assessment and automatic classification

    CERN Document Server

    Polzehl, Tim

    2015-01-01

    This work combines interdisciplinary knowledge and experience from research fields of psychology, linguistics, audio-processing, machine learning, and computer science. The work systematically explores a novel research topic devoted to automated modeling of personality expression from speech. For this aim, it introduces a novel personality assessment questionnaire and presents the results of extensive labeling sessions to annotate the speech data with personality assessments. It provides estimates of the Big 5 personality traits, i.e. openness, conscientiousness, extroversion, agreeableness, and neuroticism. Based on a database built on the questionnaire, the book presents models to tell apart different personality types or classes from speech automatically.

  6. Hidden Markov models in automatic speech recognition

    Science.gov (United States)

    Wrzoskowicz, Adam

    1993-11-01

    This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMM's designed to operate in a noisy environment. The author also describes a language for human-robot communications which was defined as a complex multilevel network from an HMM model of speech sounds geared towards Polish inflections. The author also added mandatory lexical and syntactic rules to the system for its communications vocabulary.

  7. Development of a System for Automatic Recognition of Speech

    Directory of Open Access Journals (Sweden)

    Roman Jarina

    2003-01-01

    Full Text Available The article gives a review of a research on processing and automatic recognition of speech signals (ARR at the Department of Telecommunications of the Faculty of Electrical Engineering, University of iilina. On-going research is oriented to speech parametrization using 2-dimensional cepstral analysis, and to an application of HMMs and neural networks for speech recognition in Slovak language. The article summarizes achieved results and outlines future orientation of our research in automatic speech recognition.

  8. Indonesian Automatic Speech Recognition For Command Speech Controller Multimedia Player

    Directory of Open Access Journals (Sweden)

    Vivien Arief Wardhany

    2014-12-01

    Full Text Available The purpose of multimedia devices development is controlling through voice. Nowdays voice that can be recognized only in English. To overcome the issue, then recognition using Indonesian language model and accousticc model and dictionary. Automatic Speech Recognizier is build using engine CMU Sphinx with modified english language to Indonesian Language database and XBMC used as the multimedia player. The experiment is using 10 volunteers testing items based on 7 commands. The volunteers is classifiedd by the genders, 5 Male & 5 female. 10 samples is taken in each command, continue with each volunteer perform 10 testing command. Each volunteer also have to try all 7 command that already provided. Based on percentage clarification table, the word “Kanan” had the most recognize with percentage 83% while “pilih” is the lowest one. The word which had the most wrong clarification is “kembali” with percentagee 67%, while the word “kanan” is the lowest one. From the result of Recognition Rate by male there are several command such as “Kembali”, “Utama”, “Atas “ and “Bawah” has the low Recognition Rate. Especially for “kembali” cannot be recognized as the command in the female voices but in male voice that command has 4% of RR this is because the command doesn’t have similar word in english near to “kembali” so the system unrecognize the command. Also for the command “Pilih” using the female voice has 80% of RR but for the male voice has only 4% of RR. This problem is mostly because of the different voice characteristic between adult male and female which male has lower voice frequencies (from 85 to 180 Hz than woman (165 to 255 Hz.The result of the experiment showed that each man had different number of recognition rate caused by the difference tone, pronunciation, and speed of speech. For further work needs to be done in order to improving the accouracy of the Indonesian Automatic Speech Recognition system

  9. Automatic Speech Recognition from Neural Signals: A Focused Review

    Directory of Open Access Journals (Sweden)

    Christian Herff

    2016-09-01

    Full Text Available Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to either loud environments, bothering bystanders or incapabilities to produce speech (i.e.~patients suffering from locked-in syndrome. For these reasons it would be highly desirable to not speak but to simply envision oneself to say words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people.This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefor better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography. As a first example of Automatic Speech Recognition techniques used from neural signals, we discuss the emph{Brain-to-text} system.

  10. Automatic Phonetic Transcription for Danish Speech Recognition

    DEFF Research Database (Denmark)

    Kirkedal, Andreas Søeborg

    , like Danish, the graphemic and phonetic representations are very dissimilar and more complex rewriting rules must be applied to create the correct phonetic representation. Automatic phonetic transcribers use different strategies, from deep analysis to shallow rewriting rules, to produce phonetic......, syllabication, stød and several other suprasegmental features (Kirkedal, 2013). Simplifying the transcriptions by filtering out the symbols for suprasegmental features in a post-processing step produces a format that is suitable for ASR purposes. eSpeak is an open source speech synthesizer originally created...... for particular words and word classes in addition. In comparison, English has 5,852 spelling-tophoneme rules and 4,133 additional rules and 8,278 rules and 3,829 additional rules. Phonix applies deep morphological analysis as a preprocessing step. Should the analysis fail, several fallback strategies...

  11. Automatic speech recognition for radiological reporting

    International Nuclear Information System (INIS)

    Vidal, B.

    1991-01-01

    Large vocabulary speech recognition, its techniques and its software and hardware technology, are being developed, aimed at providing the office user with a tool that could significantly improve both quantity and quality of his work: the dictation machine, which allows memos and documents to be input using voice and a microphone instead of fingers and a keyboard. The IBM Rome Science Center, together with the IBM Research Division, has built a prototype recognizer that accepts sentences in natural language from 20.000-word Italian vocabulary. The unit runs on a personal computer equipped with a special hardware capable of giving all the necessary computing power. The first laboratory experiments yielded very interesting results and pointed out such system characteristics to make its use possible in operational environments. To this purpose, the dictation of medical reports was considered as a suitable application. In cooperation with the 2nd Radiology Department of S. Maria della Misericordia Hospital (Udine, Italy), a system was experimented by radiology department doctors during their everyday work. The doctors were able to directly dictate their reports to the unit. The text appeared immediately on the screen, and eventual errors could be corrected either by voice or by using the keyboard. At the end of report dictation, the doctors could both print and archive the text. The report could also be forwarded to hospital information system, when the latter was available. Our results have been very encouraging: the system proved to be robust, simple to use, and accurate (over 95% average recognition rate). The experiment was precious for suggestion and comments, and its results are useful for system evolution towards improved system management and efficency

  12. Analysis of Phonetic Transcriptions for Danish Automatic Speech Recognition

    DEFF Research Database (Denmark)

    Kirkedal, Andreas Søeborg

    2013-01-01

    Automatic speech recognition (ASR) relies on three resources: audio, orthographic transcriptions and a pronunciation dictionary. The dictionary or lexicon maps orthographic words to sequences of phones or phonemes that represent the pronunciation of the corresponding word. The quality of a speech....... The analysis indicates that transcribing e.g. stress or vowel duration has a negative impact on performance. The best performance is obtained with coarse phonetic annotation and improves performance 1% word error rate and 3.8% sentence error rate....

  13. Recent advances in Automatic Speech Recognition for Vietnamese

    OpenAIRE

    Le , Viet-Bac; Besacier , Laurent; Seng , Sopheap; Bigi , Brigitte; Do , Thi-Ngoc-Diep

    2008-01-01

    International audience; This paper presents our recent activities for automatic speech recognition for Vietnamese. First, our text data collection and processing methods and tools are described. For language modeling, we investigate word, sub-word and also hybrid word/sub-word models. For acoustic modeling, when only limited speech data are available for Vietnamese, we propose some crosslingual acoustic modeling techniques. Furthermore, since the use of sub-word units can reduce the high out-...

  14. Automatic discrimination between laughter and speech

    NARCIS (Netherlands)

    Truong, K.; Leeuwen, D. van

    2007-01-01

    Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues that can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker’s state and emotion can be revealed. This paper describes the

  15. Automatic speech signal segmentation based on the innovation adaptive filter

    Directory of Open Access Journals (Sweden)

    Makowski Ryszard

    2014-06-01

    Full Text Available Speech segmentation is an essential stage in designing automatic speech recognition systems and one can find several algorithms proposed in the literature. It is a difficult problem, as speech is immensely variable. The aim of the authors’ studies was to design an algorithm that could be employed at the stage of automatic speech recognition. This would make it possible to avoid some problems related to speech signal parametrization. Posing the problem in such a way requires the algorithm to be capable of working in real time. The only such algorithm was proposed by Tyagi et al., (2006, and it is a modified version of Brandt’s algorithm. The article presents a new algorithm for unsupervised automatic speech signal segmentation. It performs segmentation without access to information about the phonetic content of the utterances, relying exclusively on second-order statistics of a speech signal. The starting point for the proposed method is time-varying Schur coefficients of an innovation adaptive filter. The Schur algorithm is known to be fast, precise, stable and capable of rapidly tracking changes in second order signal statistics. A transfer from one phoneme to another in the speech signal always indicates a change in signal statistics caused by vocal track changes. In order to allow for the properties of human hearing, detection of inter-phoneme boundaries is performed based on statistics defined on the mel spectrum determined from the reflection coefficients. The paper presents the structure of the algorithm, defines its properties, lists parameter values, describes detection efficiency results, and compares them with those for another algorithm. The obtained segmentation results, are satisfactory.

  16. Automatic Emotion Recognition in Speech: Possibilities and Significance

    Directory of Open Access Journals (Sweden)

    Milana Bojanić

    2009-12-01

    Full Text Available Automatic speech recognition and spoken language understanding are crucial steps towards a natural humanmachine interaction. The main task of the speech communication process is the recognition of the word sequence, but the recognition of prosody, emotion and stress tags may be of particular importance as well. This paper discusses thepossibilities of recognition emotion from speech signal in order to improve ASR, and also provides the analysis of acoustic features that can be used for the detection of speaker’s emotion and stress. The paper also provides a short overview of emotion and stress classification techniques. The importance and place of emotional speech recognition is shown in the domain of human-computer interactive systems and transaction communication model. The directions for future work are given at the end of this work.

  17. Accurate automatic tuning circuit for bipolar integrated filters

    NARCIS (Netherlands)

    de Heij, Wim J.A.; de Heij, W.J.A.; Hoen, Klaas; Hoen, Klaas; Seevinck, Evert; Seevinck, E.

    1990-01-01

    An accurate automatic tuning circuit for tuning the cutoff frequency and Q-factor of high-frequency bipolar filters is presented. The circuit is based on a voltage controlled quadrature oscillator (VCO). The frequency and the RMS (root mean square) amplitude of the oscillator output signal are

  18. Leveraging Automatic Speech Recognition Errors to Detect Challenging Speech Segments in TED Talks

    Science.gov (United States)

    Mirzaei, Maryam Sadat; Meshgi, Kourosh; Kawahara, Tatsuya

    2016-01-01

    This study investigates the use of Automatic Speech Recognition (ASR) systems to epitomize second language (L2) listeners' problems in perception of TED talks. ASR-generated transcripts of videos often involve recognition errors, which may indicate difficult segments for L2 listeners. This paper aims to discover the root-causes of the ASR errors…

  19. Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders

    CERN Document Server

    Baghai-Ravary, Ladan

    2013-01-01

    Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders provides a survey of methods designed to aid clinicians in the diagnosis and monitoring of speech disorders such as dysarthria and dyspraxia, with an emphasis on the signal processing techniques, statistical validity of the results presented in the literature, and the appropriateness of methods that do not require specialized equipment, rigorously controlled recording procedures or highly skilled personnel to interpret results. Such techniques offer the promise of a simple and cost-effective, yet objective, assessment of a range of medical conditions, which would be of great value to clinicians. The ideal scenario would begin with the collection of examples of the clients’ speech, either over the phone or using portable recording devices operated by non-specialist nursing staff. The recordings could then be analyzed initially to aid diagnosis of conditions, and subsequently to monitor the clients’ progress and res...

  20. Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

    Science.gov (United States)

    Ye, Sherry

    2015-01-01

    NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.

  1. Experiments on Automatic Recognition of Nonnative Arabic Speech

    Directory of Open Access Journals (Sweden)

    Douglas O'Shaughnessy

    2008-05-01

    Full Text Available The automatic recognition of foreign-accented Arabic speech is a challenging task since it involves a large number of nonnative accents. As well, the nonnative speech data available for training are generally insufficient. Moreover, as compared to other languages, the Arabic language has sparked a relatively small number of research efforts. In this paper, we are concerned with the problem of nonnative speech in a speaker independent, large-vocabulary speech recognition system for modern standard Arabic (MSA. We analyze some major differences at the phonetic level in order to determine which phonemes have a significant part in the recognition performance for both native and nonnative speakers. Special attention is given to specific Arabic phonemes. The performance of an HMM-based Arabic speech recognition system is analyzed with respect to speaker gender and its native origin. The WestPoint modern standard Arabic database from the language data consortium (LDC and the hidden Markov Model Toolkit (HTK are used throughout all experiments. Our study shows that the best performance in the overall phoneme recognition is obtained when nonnative speakers are involved in both training and testing phases. This is not the case when a language model and phonetic lattice networks are incorporated in the system. At the phonetic level, the results show that female nonnative speakers perform better than nonnative male speakers, and that emphatic phonemes yield a significant decrease in performance when they are uttered by both male and female nonnative speakers.

  2. Experiments on Automatic Recognition of Nonnative Arabic Speech

    Directory of Open Access Journals (Sweden)

    Selouani Sid-Ahmed

    2008-01-01

    Full Text Available The automatic recognition of foreign-accented Arabic speech is a challenging task since it involves a large number of nonnative accents. As well, the nonnative speech data available for training are generally insufficient. Moreover, as compared to other languages, the Arabic language has sparked a relatively small number of research efforts. In this paper, we are concerned with the problem of nonnative speech in a speaker independent, large-vocabulary speech recognition system for modern standard Arabic (MSA. We analyze some major differences at the phonetic level in order to determine which phonemes have a significant part in the recognition performance for both native and nonnative speakers. Special attention is given to specific Arabic phonemes. The performance of an HMM-based Arabic speech recognition system is analyzed with respect to speaker gender and its native origin. The WestPoint modern standard Arabic database from the language data consortium (LDC and the hidden Markov Model Toolkit (HTK are used throughout all experiments. Our study shows that the best performance in the overall phoneme recognition is obtained when nonnative speakers are involved in both training and testing phases. This is not the case when a language model and phonetic lattice networks are incorporated in the system. At the phonetic level, the results show that female nonnative speakers perform better than nonnative male speakers, and that emphatic phonemes yield a significant decrease in performance when they are uttered by both male and female nonnative speakers.

  3. The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system

    NARCIS (Netherlands)

    Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.

    2009-01-01

    Objective: The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that

  4. The benefit obtained from visually displayed text from an automatic speech recognizer during listening to speech presented in noise

    NARCIS (Netherlands)

    Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.

    2008-01-01

    OBJECTIVES: The aim of this study was to evaluate the benefit that listeners obtain from visually presented output from an automatic speech recognition (ASR) system during listening to speech in noise. DESIGN: Auditory-alone and audiovisual speech reception thresholds (SRTs) were measured. The SRT

  5. Automatic Speech Recognition Systems for the Evaluation of Voice and Speech Disorders in Head and Neck Cancer

    OpenAIRE

    Andreas Maier; Tino Haderlein; Florian Stelzle; Elmar Nöth; Emeka Nkenke; Frank Rosanowski; Anne Schützenberger; Maria Schuster

    2010-01-01

    In patients suffering from head and neck cancer, speech intelligibility is often restricted. For assessment and outcome measurements, automatic speech recognition systems have previously been shown to be appropriate for objective and quick evaluation of intelligibility. In this study we investigate the applicability of the method to speech disorders caused by head and neck cancer. Intelligibility was quantified by speech recognition on recordings of a standard text read by 41 German laryngect...

  6. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    Science.gov (United States)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It can enjoy a fast rate of data/text entry, small overall size, and can be lightweight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beam forming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaption, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMMbased ASR components were developed. They can help real-time ASR system designers select proper tasks when in the face of constraints in computational resources.

  7. The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system.

    Science.gov (United States)

    Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo

    2009-04-01

    The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained

  8. Automatic Speech Recognition Systems for the Evaluation of Voice and Speech Disorders in Head and Neck Cancer

    Directory of Open Access Journals (Sweden)

    Andreas Maier

    2010-01-01

    Full Text Available In patients suffering from head and neck cancer, speech intelligibility is often restricted. For assessment and outcome measurements, automatic speech recognition systems have previously been shown to be appropriate for objective and quick evaluation of intelligibility. In this study we investigate the applicability of the method to speech disorders caused by head and neck cancer. Intelligibility was quantified by speech recognition on recordings of a standard text read by 41 German laryngectomized patients with cancer of the larynx or hypopharynx and 49 German patients who had suffered from oral cancer. The speech recognition provides the percentage of correctly recognized words of a sequence, that is, the word recognition rate. Automatic evaluation was compared to perceptual ratings by a panel of experts and to an age-matched control group. Both patient groups showed significantly lower word recognition rates than the control group. Automatic speech recognition yielded word recognition rates which complied with experts' evaluation of intelligibility on a significant level. Automatic speech recognition serves as a good means with low effort to objectify and quantify the most important aspect of pathologic speech—the intelligibility. The system was successfully applied to voice and speech disorders.

  9. Automatic speech recognition for report generation in computed tomography

    International Nuclear Information System (INIS)

    Teichgraeber, U.K.M.; Ehrenstein, T.; Lemke, M.; Liebig, T.; Stobbe, H.; Hosten, N.; Keske, U.; Felix, R.

    1999-01-01

    Purpose: A study was performed to compare the performance of automatic speech recognition (ASR) with conventional transcription. Materials and Methods: 100 CT reports were generated by using ASR and 100 CT reports were dictated and written by medical transcriptionists. The time for dictation and correction of errors by the radiologist was assessed and the type of mistakes was analysed. The text recognition rate was calculated in both groups and the average time between completion of the imaging study by the technologist and generation of the written report was assessed. A commercially available speech recognition technology (ASKA Software, IBM Via Voice) running of a personal computer was used. Results: The time for the dictation using digital voice recognition was 9.4±2.3 min compared to 4.5±3.6 min with an ordinary Dictaphone. The text recognition rate was 97% with digital voice recognition and 99% with medical transcriptionists. The average time from imaging completion to written report finalisation was reduced from 47.3 hours with medical transcriptionists to 12.7 hours with ASR. The analysis of misspellings demonstrated (ASR vs. medical transcriptionists): 3 vs. 4 for syntax errors, 0 vs. 37 orthographic mistakes, 16 vs. 22 mistakes in substance and 47 vs. erroneously applied terms. Conclusions: The use of digital voice recognition as a replacement for medical transcription is recommendable when an immediate availability of written reports is necessary. (orig.) [de

  10. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    Energy Technology Data Exchange (ETDEWEB)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C., E-mail: calexandre@ien.gov.b, E-mail: mol@ien.gov.b, E-mail: cmnap@ien.gov.b, E-mail: mag@ien.gov.b [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil); Nomiya, Diogo V., E-mail: diogonomiya@gmail.co [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil)

    2009-07-01

    This work reports the implementation of a man-system interface based on automatic speech recognition, and its integration to a virtual nuclear power plant control desk. The later is aimed to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purpose. An automatic speech recognition system was developed to serve as a new interface with users, substituting computer keyboard and mouse. They can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis, and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)

  11. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    International Nuclear Information System (INIS)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C.; Nomiya, Diogo V.

    2009-01-01

    This work reports the implementation of a man-system interface based on automatic speech recognition, and its integration to a virtual nuclear power plant control desk. The later is aimed to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purpose. An automatic speech recognition system was developed to serve as a new interface with users, substituting computer keyboard and mouse. They can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis, and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)

  12. Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems.

    Science.gov (United States)

    Greene, Beth G; Logan, John S; Pisoni, David B

    1986-03-01

    We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.

  13. Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems

    Science.gov (United States)

    GREENE, BETH G.; LOGAN, JOHN S.; PISONI, DAVID B.

    2012-01-01

    We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered. PMID:23225916

  14. Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; Ordelman, Roeland J.F.; de Jong, Franciska M.G.

    2007-01-01

    This paper reports on the setup and evaluation of robust speech recognition system parts, geared towards transcript generation for heterogeneous, real-life media collections. The system is deployed for generating speech transcripts for the NIST/TRECVID-2007 test collection, part of a Dutch real-life

  15. The Effects of Background Noise on the Performance of an Automatic Speech Recogniser

    Science.gov (United States)

    Littlefield, Jason; HashemiSakhtsari, Ahmad

    2002-11-01

    Ambient or environmental noise is a major factor that affects the performance of an automatic speech recognizer. Large vocabulary, speaker-dependent, continuous speech recognizers are commercially available. Speech recognizers, perform well in a quiet environment, but poorly in a noisy environment. Speaker-dependent speech recognizers require training prior to them being tested, where the level of background noise in both phases affects the performance of the recognizer. This study aims to determine whether the best performance of a speech recognizer occurs when the levels of background noise during the training and test phases are the same, and how the performance is affected when the levels of background noise during the training and test phases are different. The relationship between the performance of the speech recognizer and upgrading the computer speed and amount of memory as well as software version was also investigated.

  16. Automatic initial and final segmentation in cleft palate speech of Mandarin speakers.

    Directory of Open Access Journals (Sweden)

    Ling He

    Full Text Available The speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, one syllable is composed of two parts: initial and final. In cleft palate speech, the resonance disorders occur at the finals and the voiced initials, while the articulation disorders occur at the unvoiced initials. Thus, the initials and finals are the minimum speech units, which could reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed. It is an important preprocessing step in cleft palate speech signal processing. The tested cleft palate speech utterances are collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which has the largest cleft palate patients in China. The cleft palate speech data includes 824 speech segments, and the control samples contain 228 speech segments. The syllables are extracted from the speech utterances firstly. The proposed syllable extraction method avoids the training stage, and achieves a good performance for both voiced and unvoiced speech. Then, the syllables are classified into with "quasi-unvoiced" or with "quasi-voiced" initials. Respective initial/final segmentation methods are proposed to these two types of syllables. Moreover, a two-step segmentation method is proposed. The rough locations of syllable and initial/final boundaries are refined in the second segmentation step, in order to improve the robustness of segmentation accuracy. The experiments show that the initial/final segmentation accuracies for syllables with quasi-unvoiced initials are higher than quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4ms for syllables with quasi-unvoiced initials, and 25.7ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all the syllables is 91.69%. For the control samples, P30 for all the

  17. Automatic transcription of continuous speech into syllable-like units ...

    Indian Academy of Sciences (India)

    style HMM models are generated for each of the clusters during training. During testing .... manual segmentation at syllable-like units followed by isolated style recognition of continu- ous speech ..... obtaining demisyllabic reference patterns.

  18. Practising verbal maritime communication with computer dialogue systems using automatic speech recognition (My Practice session)

    OpenAIRE

    John, Peter; Wellmann, J.; Appell, J.E.

    2016-01-01

    This My Practice session presents a novel online tool for practising verbal communication in a maritime setting. It is based on low-fi ChatBot simulation exercises which employ computer-based dialogue systems. The ChatBot exercises are equipped with an automatic speech recognition engine specifically designed for maritime communication. The speech input and output functionality enables learners to communicate with the computer freely and spontaneously. The exercises replicate real communicati...

  19. Fusing Eye-gaze and Speech Recognition for Tracking in an Automatic Reading Tutor

    DEFF Research Database (Denmark)

    Rasmussen, Morten Højfeldt; Tan, Zheng-Hua

    2013-01-01

    In this paper we present a novel approach for automatically tracking the reading progress using a combination of eye-gaze tracking and speech recognition. The two are fused by first generating word probabilities based on eye-gaze information and then using these probabilities to augment the langu......In this paper we present a novel approach for automatically tracking the reading progress using a combination of eye-gaze tracking and speech recognition. The two are fused by first generating word probabilities based on eye-gaze information and then using these probabilities to augment...

  20. Preliminary Analysis of Automatic Speech Recognition and Synthesis Technology.

    Science.gov (United States)

    1983-05-01

    ANDELES CA 0 SHDAP ET AL MAY 93 UNCISSIFED UCG -020-8 MDA04-8’-C-415F/ 17/2 N mE = h IEEE 11111 10’ ~ 2.0 11-41 & 11111I25IID MICROCOPY RESOLUTION TEST...speech. Private industry, which sees a major market for improved speech recognition systems, is attempting to solve the problems involved in...manufacturer is able to market such a recognition system. A second requirement for the spotting of keywords in distress signals concerns the need for a

  1. Automatic speech recognition used for evaluation of text-to-speech systems

    Czech Academy of Sciences Publication Activity Database

    Vích, Robert; Nouza, J.; Vondra, Martin

    -, č. 5042 (2008), s. 136-148 ISSN 0302-9743 R&D Projects: GA AV ČR 1ET301710509; GA AV ČR 1QS108040569 Institutional research plan: CEZ:AV0Z20670512 Keywords : speech recognition * speech processing Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering

  2. Multilingual Techniques for Low Resource Automatic Speech Recognition

    Science.gov (United States)

    2016-05-20

    linguistic and ASR expertise, and Regina, bringing in another point of view from the NLP side, really help shape the direction of some of the work in...improved keyword spotting. In Proc. ASRU, 2013. 41 [43] K. Kirchhoff and D. Vergyri. Cross-dialectal data sharing for acoustic modeling in Arabic speech

  3. Sparse coding of the modulation spectrum for noise-robust automatic speech recognition

    NARCIS (Netherlands)

    Ahmadi, S.; Ahadi, S.M.; Cranen, B.; Boves, L.W.J.

    2014-01-01

    The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equivalent of a sequence of short-time power spectra, mainly to simplify the computation of the

  4. AUTOMATIC SPEECH RECOGNITION SYSTEM CONCERNING THE MOROCCAN DIALECTE (Darija and Tamazight)

    OpenAIRE

    A. EL GHAZI; C. DAOUI; N. IDRISSI

    2012-01-01

    In this work we present an automatic speech recognition system for Moroccan dialect mainly: Darija (Arab dialect) and Tamazight. Many approaches have been used to model the Arabic and Tamazightphonetic units. In this paper, we propose to use the hidden Markov model (HMM) for modeling these phoneticunits. Experimental results show that the proposed approach further improves the recognition.

  5. The Automatic Annotation of the Semiotic Type of Hand Gestures in Obama’s Humorous Speeches

    DEFF Research Database (Denmark)

    Navarretta, Costanza

    2018-01-01

    is expressed by speech or by adding new information to what is uttered. The automatic classification of the semiotic type of gestures from their shape description can contribute to their interpretation in human-human communication and in advanced multimodal interactive systems. We annotated and analysed hand...

  6. Evaluation of missing data techniques for in-car automatic speech recognition

    OpenAIRE

    Wang, Y.; Vuerinckx, R.; Gemmeke, J.F.; Cranen, B.; Hamme, H. Van

    2009-01-01

    Wang Y., Vuerinckx R., Gemmeke J., Cranen B., Van hamme H., ''Evaluation of missing data techniques for in-car automatic speech recognition'', Proceedings NAG/DAGA 2009 - international conference on acoustics, 4 pp., March 23-26, 2009, Rotterdam, The Netherlands.

  7. Accelerometer-based automatic voice onset detection in speech mapping with navigated repetitive transcranial magnetic stimulation.

    Science.gov (United States)

    Vitikainen, Anne-Mari; Mäkelä, Elina; Lioumis, Pantelis; Jousmäki, Veikko; Mäkelä, Jyrki P

    2015-09-30

    The use of navigated repetitive transcranial magnetic stimulation (rTMS) in mapping of speech-related brain areas has recently shown to be useful in preoperative workflow of epilepsy and tumor patients. However, substantial inter- and intraobserver variability and non-optimal replicability of the rTMS results have been reported, and a need for additional development of the methodology is recognized. In TMS motor cortex mappings the evoked responses can be quantitatively monitored by electromyographic recordings; however, no such easily available setup exists for speech mappings. We present an accelerometer-based setup for detection of vocalization-related larynx vibrations combined with an automatic routine for voice onset detection for rTMS speech mapping applying naming. The results produced by the automatic routine were compared with the manually reviewed video-recordings. The new method was applied in the routine navigated rTMS speech mapping for 12 consecutive patients during preoperative workup for epilepsy or tumor surgery. The automatic routine correctly detected 96% of the voice onsets, resulting in 96% sensitivity and 71% specificity. Majority (63%) of the misdetections were related to visible throat movements, extra voices before the response, or delayed naming of the previous stimuli. The no-response errors were correctly detected in 88% of events. The proposed setup for automatic detection of voice onsets provides quantitative additional data for analysis of the rTMS-induced speech response modifications. The objectively defined speech response latencies increase the repeatability, reliability and stratification of the rTMS results. Copyright © 2015 Elsevier B.V. All rights reserved.

  8. How Accurately Can the Google Web Speech API Recognize and Transcribe Japanese L2 English Learners' Oral Production?

    Science.gov (United States)

    Ashwell, Tim; Elam, Jesse R.

    2017-01-01

    The ultimate aim of our research project was to use the Google Web Speech API to automate scoring of elicited imitation (EI) tests. However, in order to achieve this goal, we had to take a number of preparatory steps. We needed to assess how accurate this speech recognition tool is in recognizing native speakers' production of the test items; we…

  9. Developing a broadband automatic speech recognition system for Afrikaans

    CSIR Research Space (South Africa)

    De Wet, Febe

    2011-08-01

    Full Text Available baseline transcription for the news data. The match between a baseline transcription and its corre- sponding audio can be evaluated automatically using an ASR system in forced alignment mode. Only those bulletins for which a bad match is indicated... Component Index for data [3]. occurrence of Afrikaans words3. Other text corpora that are currently under construction in- clude daily downloads of the scripts of news bulletins that are read on an Afrikaans radio station as well as transcripts of par...

  10. Development an Automatic Speech to Facial Animation Conversion for Improve Deaf Lives

    Directory of Open Access Journals (Sweden)

    S. Hamidreza Kasaei

    2011-05-01

    Full Text Available In this paper, we propose design and initial implementation of a robust system which can automatically translates voice into text and text to sign language animations. Sign Language
    Translation Systems could significantly improve deaf lives especially in communications, exchange of information and employment of machine for translation conversations from one language to another has. Therefore, considering these points, it seems necessary to study the speech recognition. Usually, the voice recognition algorithms address three major challenges. The first is extracting feature form speech and the second is when limited sound gallery are available for recognition, and the final challenge is to improve speaker dependent to speaker independent voice recognition. Extracting feature form speech is an important stage in our method. Different procedures are available for extracting feature form speech. One of the commonest of which used in speech
    recognition systems is Mel-Frequency Cepstral Coefficients (MFCCs. The algorithm starts with preprocessing and signal conditioning. Next extracting feature form speech using Cepstral coefficients will be done. Then the result of this process sends to segmentation part. Finally recognition part recognizes the words and then converting word recognized to facial animation. The project is still in progress and some new interesting methods are described in the current report.

  11. Accurate and precise determination of small quantity uranium by means of automatic potentiometric titration

    International Nuclear Information System (INIS)

    Liu Quanwei; Luo Zhongyan; Zhu Haiqiao; Wu Jizong

    2007-01-01

    For high radioactivity level of dissolved solution of spent fuel and the solution of uranium product, radioactive hazard must be considered and reduced as low as possible during accurate determination of uranium. In this work automatic potentiometric titration was applied and the sample only 10 mg of uranium contained was taken in order to reduce the harm of analyzer suffered from the radioactivity. RSD<0.06%, at the same time the result can be corrected for more reliable and accurate measurement. The determination method can effectively reduce the harm of analyzer suffered from the radioactivity, and meets the requirement of reliable accurate measurement of uranium. (authors)

  12. Automatic analysis of slips of the tongue: Insights into the cognitive architecture of speech production.

    Science.gov (United States)

    Goldrick, Matthew; Keshet, Joseph; Gustafson, Erin; Heller, Jordana; Needle, Jeremy

    2016-04-01

    Traces of the cognitive mechanisms underlying speaking can be found within subtle variations in how we pronounce sounds. While speech errors have traditionally been seen as categorical substitutions of one sound for another, acoustic/articulatory analyses show they partially reflect the intended sound. When "pig" is mispronounced as "big," the resulting /b/ sound differs from correct productions of "big," moving towards intended "pig"-revealing the role of graded sound representations in speech production. Investigating the origins of such phenomena requires detailed estimation of speech sound distributions; this has been hampered by reliance on subjective, labor-intensive manual annotation. Computational methods can address these issues by providing for objective, automatic measurements. We develop a novel high-precision computational approach, based on a set of machine learning algorithms, for measurement of elicited speech. The algorithms are trained on existing manually labeled data to detect and locate linguistically relevant acoustic properties with high accuracy. Our approach is robust, is designed to handle mis-productions, and overall matches the performance of expert coders. It allows us to analyze a very large dataset of speech errors (containing far more errors than the total in the existing literature), illuminating properties of speech sound distributions previously impossible to reliably observe. We argue that this provides novel evidence that two sources both contribute to deviations in speech errors: planning processes specifying the targets of articulation and articulatory processes specifying the motor movements that execute this plan. These findings illustrate how a much richer picture of speech provides an opportunity to gain novel insights into language processing. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Integrating Automatic Speech Recognition and Machine Translation for Better Translation Outputs

    DEFF Research Database (Denmark)

    Liyanapathirana, Jeevanthi

    translations, combining machine translation with computer assisted translation has drawn attention in current research. This combines two prospects: the opportunity of ensuring high quality translation along with a significant performance gain. Automatic Speech Recognition (ASR) is another important area......, which caters important functionalities in language processing and natural language understanding tasks. In this work we integrate automatic speech recognition and machine translation in parallel. We aim to avoid manual typing of possible translations as dictating the translation would take less time...... to the n-best list rescoring, we also use word graphs with the expectation of arriving at a tighter integration of ASR and MT models. Integration methods include constraining ASR models using language and translation models of MT, and vice versa. We currently develop and experiment different methods...

  14. Automatic feedback to promote safe walking and speech loudness control in persons with multiple disabilities: two single-case studies.

    Science.gov (United States)

    Lancioni, Giulio E; Singh, Nirbhay N; O'Reilly, Mark F; Green, Vanessa A; Alberti, Gloria; Boccasini, Adele; Smaldone, Angela; Oliva, Doretta; Bosco, Andrea

    2014-08-01

    Assessing automatic feedback technologies to promote safe travel and speech loudness control in two men with multiple disabilities, respectively. The men were involved in two single-case studies. In Study I, the technology involved a microprocessor, two photocells, and a verbal feedback device. The man received verbal alerting/feedback when the photocells spotted an obstacle in front of him. In Study II, the technology involved a sound-detecting unit connected to a throat and an airborne microphone, and to a vibration device. Vibration occurred when the man's speech loudness exceeded a preset level. The man included in Study I succeeded in using the automatic feedback in substitution of caregivers' alerting/feedback for safe travel. The man of Study II used the automatic feedback to successfully reduce his speech loudness. Automatic feedback can be highly effective in helping persons with multiple disabilities improve their travel and speech performance.

  15. [Repetitive phenomenona in the spontaneous speech of aphasic patients: perseveration, stereotypy, echolalia, automatism and recurring utterance].

    Science.gov (United States)

    Wallesch, C W; Brunner, R J; Seemüller, E

    1983-12-01

    Repetitive phenomena in spontaneous speech were investigated in 30 patients with chronic infarctions of the left hemisphere which included Broca's and/or Wernicke's area and/or the basal ganglia. Perseverations, stereotypies, and echolalias occurred with all types of brain lesions, automatisms and recurring utterances only with those patients, whose infarctions involved Wernicke's area and basal ganglia. These patients also showed more echolalic responses. The results are discussed in view of the role of the basal ganglia as motor program generators.

  16. Automatic emissive probe apparatus for accurate plasma and vacuum space potential measurements

    Science.gov (United States)

    Jianquan, LI; Wenqi, LU; Jun, XU; Fei, GAO; Younian, WANG

    2018-02-01

    We have developed an automatic emissive probe apparatus based on the improved inflection point method of the emissive probe for accurate measurements of both plasma potential and vacuum space potential. The apparatus consists of a computer controlled data acquisition card, a working circuit composed by a biasing unit and a heating unit, as well as an emissive probe. With the set parameters of the probe scanning bias, the probe heating current and the fitting range, the apparatus can automatically execute the improved inflection point method and give the measured result. The validity of the automatic emissive probe apparatus is demonstrated in a test measurement of vacuum potential distribution between two parallel plates, showing an excellent accuracy of 0.1 V. Plasma potential was also measured, exhibiting high efficiency and convenient use of the apparatus for space potential measurements.

  17. Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes.

    Science.gov (United States)

    Meyer, Bernd T; Brand, Thomas; Kollmeier, Birger

    2011-01-01

    The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system with special focus on intrinsic variations of speech, such as speaking rate and effort, altered pitch, and the presence of dialect and accent. Second, it is investigated if the most common ASR features contain all information required to recognize speech in noisy environments by using resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system achieved the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which is an estimate for the human-machine gap in terms of the SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variabilities result in strong increases of error rates, both in human speech recognition (HSR) and ASR (with a relative increase of up to 120%). An analysis of phoneme duration and recognition rates indicates that human listeners are better able to identify temporal cues than the machine at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems.

  18. Deformable meshes for medical image segmentation accurate automatic segmentation of anatomical structures

    CERN Document Server

    Kainmueller, Dagmar

    2014-01-01

    ? Segmentation of anatomical structures in medical image data is an essential task in clinical practice. Dagmar Kainmueller introduces methods for accurate fully automatic segmentation of anatomical structures in 3D medical image data. The author's core methodological contribution is a novel deformation model that overcomes limitations of state-of-the-art Deformable Surface approaches, hence allowing for accurate segmentation of tip- and ridge-shaped features of anatomical structures. As for practical contributions, she proposes application-specific segmentation pipelines for a range of anatom

  19. Automatic evaluation of speech rhythm instability and acceleration in dysarthrias associated with basal ganglia dysfunction

    Directory of Open Access Journals (Sweden)

    Jan eRusz

    2015-07-01

    Full Text Available Speech rhythm abnormalities are commonly present in patients with different neurodegenerative disorders. These alterations are hypothesized to be a consequence of disruption to the basal ganglia circuitry involving dysfunction of motor planning, programming and execution, which can be detected by a syllable repetition paradigm. Therefore, the aim of the present study was to design a robust signal processing technique that allows the automatic detection of spectrally-distinctive nuclei of syllable vocalizations and to determine speech features that represent rhythm instability and acceleration. A further aim was to elucidate specific patterns of dysrhythmia across various neurodegenerative disorders that share disruption of basal ganglia function. Speech samples based on repetition of the syllable /pa/ at a self-determined steady pace were acquired from 109 subjects, including 22 with Parkinson's disease (PD, 11 progressive supranuclear palsy (PSP, 9 multiple system atrophy (MSA, 24 ephedrone-induced parkinsonism (EP, 20 Huntington's disease (HD, and 23 healthy controls. Subsequently, an algorithm for the automatic detection of syllables as well as features representing rhythm instability and rhythm acceleration were designed. The proposed detection algorithm was able to correctly identify syllables and remove erroneous detections due to excessive inspiration and nonspeech sounds with a very high accuracy of 99.6%. Instability of vocal pace performance was observed in PSP, MSA, EP and HD groups. Significantly increased pace acceleration was observed only in the PD group. Although not significant, a tendency for pace acceleration was observed also in the PSP and MSA groups. Our findings underline the crucial role of the basal ganglia in the execution and maintenance of automatic speech motor sequences. We envisage the current approach to become the first step towards the development of acoustic technologies allowing automated assessment of rhythm

  20. An exploration of the potential of Automatic Speech Recognition to assist and enable receptive communication in higher education

    Directory of Open Access Journals (Sweden)

    Mike Wald

    2006-12-01

    Full Text Available The potential use of Automatic Speech Recognition to assist receptive communication is explored. The opportunities and challenges that this technology presents students and staff to provide captioning of speech online or in classrooms for deaf or hard of hearing students and assist blind, visually impaired or dyslexic learners to read and search learning material more readily by augmenting synthetic speech with natural recorded real speech is also discussed and evaluated. The automatic provision of online lecture notes, synchronised with speech, enables staff and students to focus on learning and teaching issues, while also benefiting learners unable to attend the lecture or who find it difficult or impossible to take notes at the same time as listening, watching and thinking.

  1. Behavioral and electrophysiological evidence for early and automatic detection of phonological equivalence in variable speech inputs.

    Science.gov (United States)

    Kharlamov, Viktor; Campbell, Kenneth; Kazanina, Nina

    2011-11-01

    Speech sounds are not always perceived in accordance with their acoustic-phonetic content. For example, an early and automatic process of perceptual repair, which ensures conformity of speech inputs to the listener's native language phonology, applies to individual input segments that do not exist in the native inventory or to sound sequences that are illicit according to the native phonotactic restrictions on sound co-occurrences. The present study with Russian and Canadian English speakers shows that listeners may perceive phonetically distinct and licit sound sequences as equivalent when the native language system provides robust evidence for mapping multiple phonetic forms onto a single phonological representation. In Russian, due to an optional but productive t-deletion process that affects /stn/ clusters, the surface forms [sn] and [stn] may be phonologically equivalent and map to a single phonological form /stn/. In contrast, [sn] and [stn] clusters are usually phonologically distinct in (Canadian) English. Behavioral data from identification and discrimination tasks indicated that [sn] and [stn] clusters were more confusable for Russian than for English speakers. The EEG experiment employed an oddball paradigm with nonwords [asna] and [astna] used as the standard and deviant stimuli. A reliable mismatch negativity response was elicited approximately 100 msec postchange in the English group but not in the Russian group. These findings point to a perceptual repair mechanism that is engaged automatically at a prelexical level to ensure immediate encoding of speech inputs in phonological terms, which in turn enables efficient access to the meaning of a spoken utterance.

  2. Fidelity of Automatic Speech Processing for Adult and Child Talker Classifications.

    Directory of Open Access Journals (Sweden)

    Mark VanDam

    Full Text Available Automatic speech processing (ASP has recently been applied to very large datasets of naturalistically collected, daylong recordings of child speech via an audio recorder worn by young children. The system developed by the LENA Research Foundation analyzes children's speech for research and clinical purposes, with special focus on of identifying and tagging family speech dynamics and the at-home acoustic environment from the auditory perspective of the child. A primary issue for researchers, clinicians, and families using the Language ENvironment Analysis (LENA system is to what degree the segment labels are valid. This classification study evaluates the performance of the computer ASP output against 23 trained human judges who made about 53,000 judgements of classification of segments tagged by the LENA ASP. Results indicate performance consistent with modern ASP such as those using HMM methods, with acoustic characteristics of fundamental frequency and segment duration most important for both human and machine classifications. Results are likely to be important for interpreting and improving ASP output.

  3. Automatic generation of a subject-specific model for accurate markerless motion capture and biomechanical applications.

    Science.gov (United States)

    Corazza, Stefano; Gambaretto, Emiliano; Mündermann, Lars; Andriacchi, Thomas P

    2010-04-01

    A novel approach for the automatic generation of a subject-specific model consisting of morphological and joint location information is described. The aim is to address the need for efficient and accurate model generation for markerless motion capture (MMC) and biomechanical studies. The algorithm applied and expanded on previous work on human shapes space by embedding location information for ten joint centers in a subject-specific free-form surface. The optimal locations of joint centers in the 3-D mesh were learned through linear regression over a set of nine subjects whose joint centers were known. The model was shown to be sufficiently accurate for both kinematic (joint centers) and morphological (shape of the body) information to allow accurate tracking with MMC systems. The automatic model generation algorithm was applied to 3-D meshes of different quality and resolution such as laser scans and visual hulls. The complete method was tested using nine subjects of different gender, body mass index (BMI), age, and ethnicity. Experimental training error and cross-validation errors were 19 and 25 mm, respectively, on average over the joints of the ten subjects analyzed in the study.

  4. Language modeling for automatic speech recognition of inflective languages an applications-oriented approach using lexical data

    CERN Document Server

    Donaj, Gregor

    2017-01-01

    This book covers language modeling and automatic speech recognition for inflective languages (e.g. Slavic languages), which represent roughly half of the languages spoken in Europe. These languages do not perform as well as English in speech recognition systems and it is therefore harder to develop an application with sufficient quality for the end user. The authors describe the most important language features for the development of a speech recognition system. This is then presented through the analysis of errors in the system and the development of language models and their inclusion in speech recognition systems, which specifically address the errors that are relevant for targeted applications. The error analysis is done with regard to morphological characteristics of the word in the recognized sentences. The book is oriented towards speech recognition with large vocabularies and continuous and even spontaneous speech. Today such applications work with a rather small number of languages compared to the nu...

  5. An HMM-Like Dynamic Time Warping Scheme for Automatic Speech Recognition

    Directory of Open Access Journals (Sweden)

    Ing-Jr Ding

    2014-01-01

    Full Text Available In the past, the kernel of automatic speech recognition (ASR is dynamic time warping (DTW, which is feature-based template matching and belongs to the category technique of dynamic programming (DP. Although DTW is an early developed ASR technique, DTW has been popular in lots of applications. DTW is playing an important role for the known Kinect-based gesture recognition application now. This paper proposed an intelligent speech recognition system using an improved DTW approach for multimedia and home automation services. The improved DTW presented in this work, called HMM-like DTW, is essentially a hidden Markov model- (HMM- like method where the concept of the typical HMM statistical model is brought into the design of DTW. The developed HMM-like DTW method, transforming feature-based DTW recognition into model-based DTW recognition, will be able to behave as the HMM recognition technique and therefore proposed HMM-like DTW with the HMM-like recognition model will have the capability to further perform model adaptation (also known as speaker adaptation. A series of experimental results in home automation-based multimedia access service environments demonstrated the superiority and effectiveness of the developed smart speech recognition system by HMM-like DTW.

  6. Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?

    Directory of Open Access Journals (Sweden)

    Héctor Delgado

    2015-12-01

    Full Text Available This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.

  7. Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?

    Directory of Open Access Journals (Sweden)

    Héctor Delgado

    2015-06-01

    This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.

  8. Biologically-Inspired Spike-Based Automatic Speech Recognition of Isolated Digits Over a Reproducing Kernel Hilbert Space

    Directory of Open Access Journals (Sweden)

    Kan Li

    2018-04-01

    Full Text Available This paper presents a novel real-time dynamic framework for quantifying time-series structure in spoken words using spikes. Audio signals are converted into multi-channel spike trains using a biologically-inspired leaky integrate-and-fire (LIF spike generator. These spike trains are mapped into a function space of infinite dimension, i.e., a Reproducing Kernel Hilbert Space (RKHS using point-process kernels, where a state-space model learns the dynamics of the multidimensional spike input using gradient descent learning. This kernelized recurrent system is very parsimonious and achieves the necessary memory depth via feedback of its internal states when trained discriminatively, utilizing the full context of the phoneme sequence. A main advantage of modeling nonlinear dynamics using state-space trajectories in the RKHS is that it imposes no restriction on the relationship between the exogenous input and its internal state. We are free to choose the input representation with an appropriate kernel, and changing the kernel does not impact the system nor the learning algorithm. Moreover, we show that this novel framework can outperform both traditional hidden Markov model (HMM speech processing as well as neuromorphic implementations based on spiking neural network (SNN, yielding accurate and ultra-low power word spotters. As a proof of concept, we demonstrate its capabilities using the benchmark TI-46 digit corpus for isolated-word automatic speech recognition (ASR or keyword spotting. Compared to HMM using Mel-frequency cepstral coefficient (MFCC front-end without time-derivatives, our MFCC-KAARMA offered improved performance. For spike-train front-end, spike-KAARMA also outperformed state-of-the-art SNN solutions. Furthermore, compared to MFCCs, spike trains provided enhanced noise robustness in certain low signal-to-noise ratio (SNR regime.

  9. Biologically-Inspired Spike-Based Automatic Speech Recognition of Isolated Digits Over a Reproducing Kernel Hilbert Space.

    Science.gov (United States)

    Li, Kan; Príncipe, José C

    2018-01-01

    This paper presents a novel real-time dynamic framework for quantifying time-series structure in spoken words using spikes. Audio signals are converted into multi-channel spike trains using a biologically-inspired leaky integrate-and-fire (LIF) spike generator. These spike trains are mapped into a function space of infinite dimension, i.e., a Reproducing Kernel Hilbert Space (RKHS) using point-process kernels, where a state-space model learns the dynamics of the multidimensional spike input using gradient descent learning. This kernelized recurrent system is very parsimonious and achieves the necessary memory depth via feedback of its internal states when trained discriminatively, utilizing the full context of the phoneme sequence. A main advantage of modeling nonlinear dynamics using state-space trajectories in the RKHS is that it imposes no restriction on the relationship between the exogenous input and its internal state. We are free to choose the input representation with an appropriate kernel, and changing the kernel does not impact the system nor the learning algorithm. Moreover, we show that this novel framework can outperform both traditional hidden Markov model (HMM) speech processing as well as neuromorphic implementations based on spiking neural network (SNN), yielding accurate and ultra-low power word spotters. As a proof of concept, we demonstrate its capabilities using the benchmark TI-46 digit corpus for isolated-word automatic speech recognition (ASR) or keyword spotting. Compared to HMM using Mel-frequency cepstral coefficient (MFCC) front-end without time-derivatives, our MFCC-KAARMA offered improved performance. For spike-train front-end, spike-KAARMA also outperformed state-of-the-art SNN solutions. Furthermore, compared to MFCCs, spike trains provided enhanced noise robustness in certain low signal-to-noise ratio (SNR) regime.

  10. Long term Suboxone™ emotional reactivity as measured by automatic detection in speech.

    Directory of Open Access Journals (Sweden)

    Edward Hill

    Full Text Available Addictions to illicit drugs are among the nation's most critical public health and societal problems. The current opioid prescription epidemic and the need for buprenorphine/naloxone (Suboxone®; SUBX as an opioid maintenance substance, and its growing street diversion provided impetus to determine affective states ("true ground emotionality" in long-term SUBX patients. Toward the goal of effective monitoring, we utilized emotion-detection in speech as a measure of "true" emotionality in 36 SUBX patients compared to 44 individuals from the general population (GP and 33 members of Alcoholics Anonymous (AA. Other less objective studies have investigated emotional reactivity of heroin, methadone and opioid abstinent patients. These studies indicate that current opioid users have abnormal emotional experience, characterized by heightened response to unpleasant stimuli and blunted response to pleasant stimuli. However, this is the first study to our knowledge to evaluate "true ground" emotionality in long-term buprenorphine/naloxone combination (Suboxone™. We found in long-term SUBX patients a significantly flat affect (p<0.01, and they had less self-awareness of being happy, sad, and anxious compared to both the GP and AA groups. We caution definitive interpretation of these seemingly important results until we compare the emotional reactivity of an opioid abstinent control using automatic detection in speech. These findings encourage continued research strategies in SUBX patients to target the specific brain regions responsible for relapse prevention of opioid addiction.

  11. Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

    Directory of Open Access Journals (Sweden)

    Umit H. Yapanel

    2008-08-01

    Full Text Available A proven method for achieving effective automatic speech recognition (ASR due to speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed which require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN, despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN, where normalization is performed on-the-fly within a newly proposed PMVDR acoustic front end. The novel algorithm aspect is that in conventional frontend processing with PMVDR and VTLN, two separating warping phases are needed; while in the proposed BISN method only one single speaker dependent warp is used to achieve both the PMVDR perceptual warp and VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER by 24%, and (ii for a diverse noisy speech task (SPINE 2, where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.

  12. Exploiting Speech for Automatic TV Delinearization: From Streams to Cross-Media Semantic Navigation

    Directory of Open Access Journals (Sweden)

    Guinaudeau Camille

    2011-01-01

    Full Text Available The gradual migration of television from broadcast diffusion to Internet diffusion offers countless possibilities for the generation of rich navigable contents. However, it also raises numerous scientific issues regarding delinearization of TV streams and content enrichment. In this paper, we study how speech can be used at different levels of the delinearization process, using automatic speech transcription and natural language processing (NLP for the segmentation and characterization of TV programs and for the generation of semantic hyperlinks in videos. Transcript-based video delinearization requires natural language processing techniques robust to transcription peculiarities, such as transcription errors, and to domain and genre differences. We therefore propose to modify classical NLP techniques, initially designed for regular texts, to improve their robustness in the context of TV delinearization. We demonstrate that the modified NLP techniques can efficiently handle various types of TV material and be exploited for program description, for topic segmentation, and for the generation of semantic hyperlinks between multimedia contents. We illustrate the concept of cross-media semantic navigation with a description of our news navigation demonstrator presented during the NEM Summit 2009.

  13. Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

    Directory of Open Access Journals (Sweden)

    Yapanel UmitH

    2008-01-01

    Full Text Available A proven method for achieving effective automatic speech recognition (ASR due to speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed which require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN, despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN, where normalization is performed on-the-fly within a newly proposed PMVDR acoustic front end. The novel algorithm aspect is that in conventional frontend processing with PMVDR and VTLN, two separating warping phases are needed; while in the proposed BISN method only one single speaker dependent warp is used to achieve both the PMVDR perceptual warp and VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER by 24%, and (ii for a diverse noisy speech task (SPINE 2, where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.

  14. A new automatic blood pressure kit auscultates for accurate reading with a smartphone

    Science.gov (United States)

    Wu, Hongjun; Wang, Bingjian; Zhu, Xinpu; Chu, Guang; Zhang, Zhi

    2016-01-01

    Abstract The widely used oscillometric automated blood pressure (BP) monitor was continuously questioned on its accuracy. A novel BP kit named Accutension which adopted Korotkoff auscultation method was then devised. Accutension worked with a miniature microphone, a pressure sensor, and a smartphone. The BP values were automatically displayed on the smartphone screen through the installed App. Data recorded in the phone could be played back and reconfirmed after measurement. They could also be uploaded and saved to the iCloud. The accuracy and consistency of this novel electronic auscultatory sphygmomanometer was preliminarily verified here. Thirty-two subjects were included and 82 qualified readings were obtained. The mean differences ± SD for systolic and diastolic BP readings between Accutension and mercury sphygmomanometer were 0.87 ± 2.86 and −0.94 ± 2.93 mm Hg. Agreements between Accutension and mercury sphygmomanometer were highly significant for systolic (ICC = 0.993, 95% confidence interval (CI): 0.989–0.995) and diastolic (ICC = 0.987, 95% CI: 0.979–0.991). In conclusion, Accutension worked accurately based on our pilot study data. The difference was acceptable. ICC and Bland–Altman plot charts showed good agreements with manual measurements. Systolic readings of Accutension were slightly higher than those of manual measurement, while diastolic readings were slightly lower. One possible reason was that Accutension captured the first and the last korotkoff sound more sensitively than human ear during manual measurement and avoided sound missing, so that it might be more accurate than traditional mercury sphygmomanometer. By documenting and analyzing of variant tendency of BP values, Accutension helps management of hypertension and therefore contributes to the mobile heath service. PMID:27512876

  15. Two Methods of Automatic Evaluation of Speech Signal Enhancement Recorded in the Open-Air MRI Environment

    Science.gov (United States)

    Přibil, Jiří; Přibilová, Anna; Frollo, Ivan

    2017-12-01

    The paper focuses on two methods of evaluation of successfulness of speech signal enhancement recorded in the open-air magnetic resonance imager during phonation for the 3D human vocal tract modeling. The first approach enables to obtain a comparison based on statistical analysis by ANOVA and hypothesis tests. The second method is based on classification by Gaussian mixture models (GMM). The performed experiments have confirmed that the proposed ANOVA and GMM classifiers for automatic evaluation of the speech quality are functional and produce fully comparable results with the standard evaluation based on the listening test method.

  16. Automatically high accurate and efficient photomask defects management solution for advanced lithography manufacture

    Science.gov (United States)

    Zhu, Jun; Chen, Lijun; Ma, Lantao; Li, Dejian; Jiang, Wei; Pan, Lihong; Shen, Huiting; Jia, Hongmin; Hsiang, Chingyun; Cheng, Guojie; Ling, Li; Chen, Shijie; Wang, Jun; Liao, Wenkui; Zhang, Gary

    2014-04-01

    Defect review is a time consuming job. Human error makes result inconsistent. The defects located on don't care area would not hurt the yield and no need to review them such as defects on dark area. However, critical area defects can impact yield dramatically and need more attention to review them such as defects on clear area. With decrease in integrated circuit dimensions, mask defects are always thousands detected during inspection even more. Traditional manual or simple classification approaches are unable to meet efficient and accuracy requirement. This paper focuses on automatic defect management and classification solution using image output of Lasertec inspection equipment and Anchor pattern centric image process technology. The number of mask defect found during an inspection is always in the range of thousands or even more. This system can handle large number defects with quick and accurate defect classification result. Our experiment includes Die to Die and Single Die modes. The classification accuracy can reach 87.4% and 93.3%. No critical or printable defects are missing in our test cases. The missing classification defects are 0.25% and 0.24% in Die to Die mode and Single Die mode. This kind of missing rate is encouraging and acceptable to apply on production line. The result can be output and reloaded back to inspection machine to have further review. This step helps users to validate some unsure defects with clear and magnification images when captured images can't provide enough information to make judgment. This system effectively reduces expensive inline defect review time. As a fully inline automated defect management solution, the system could be compatible with current inspection approach and integrated with optical simulation even scoring function and guide wafer level defect inspection.

  17. Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients

    Directory of Open Access Journals (Sweden)

    TjongWan Sen

    2009-11-01

    Full Text Available To improve the performance of phoneme based Automatic Speech Recognition (ASR in noisy environment; we developed a new technique that could add robustness to clean phonemes features. These robust features are obtained from Complex Wavelet Packet Transform (CWPT coefficients. Since the CWPT coefficients represent all different frequency bands of the input signal, decomposing the input signal into complete CWPT tree would also cover all frequencies involved in recognition process. For time overlapping signals with different frequency contents, e. g. phoneme signal with noises, its CWPT coefficients are the combination of CWPT coefficients of phoneme signal and CWPT coefficients of noises. The CWPT coefficients of phonemes signal would be changed according to frequency components contained in noises. Since the numbers of phonemes in every language are relatively small (limited and already well known, one could easily derive principal component vectors from clean training dataset using Principal Component Analysis (PCA. These principal component vectors could be used then to add robustness and minimize noises effects in testing phase. Simulation results, using Alpha Numeric 4 (AN4 from Carnegie Mellon University and NOISEX-92 examples from Rice University, showed that this new technique could be used as features extractor that improves the robustness of phoneme based ASR systems in various adverse noisy conditions and still preserves the performance in clean environments.

  18. The Usefulness of Automatic Speech Recognition (ASR Eyespeak Software in Improving Iraqi EFL Students’ Pronunciation

    Directory of Open Access Journals (Sweden)

    Lina Fathi Sidig Sidgi

    2017-02-01

    Full Text Available The present study focuses on determining whether automatic speech recognition (ASR technology is reliable for improving English pronunciation to Iraqi EFL students. Non-native learners of English are generally concerned about improving their pronunciation skills, and Iraqi students face difficulties in pronouncing English sounds that are not found in their native language (Arabic. This study is concerned with ASR and its effectiveness in overcoming this difficulty. The data were obtained from twenty participants randomly selected from first-year college students at Al-Turath University College from the Department of English in Baghdad-Iraq. The students had participated in a two month pronunciation instruction course using ASR Eyespeak software. At the end of the pronunciation instruction course using ASR Eyespeak software, the students completed a questionnaire to get their opinions about the usefulness of the ASR Eyespeak in improving their pronunciation. The findings of the study revealed that the students found ASR Eyespeak software very useful in improving their pronunciation and helping them realise their pronunciation mistakes. They also reported that learning pronunciation with ASR Eyespeak enjoyable.

  19. Improvement of the exponential experiment system for the automatical and accurate measurement of the exponential decay constant

    International Nuclear Information System (INIS)

    Shin, Hee Sung; Jang, Ji Woon; Lee, Yoon Hee; Hwang, Yong Hwa; Kim, Ho Dong

    2004-01-01

    The previous exponential experiment system has been improved for the automatical and accurate axial movement of the neutron source and detector with attaching the automatical control system which consists of a Programmable Logical Controller(PLC) and a stepping motor set. The automatic control program which controls MCA and PLC consistently has been also developed on the basis of GENIE 2000 library. The exponential experiments have been carried out for Kori 1 unit spent fuel assemblies, C14, J14 and G23, and Kori 2 unit spent fuel assembly, J44, using the improved systematical measurement system. As the results, the average exponential decay constants for 4 assemblies are determined to be 0.1302, 0.1267, 0.1247, and 0.1210, respectively, with the application of poisson regression

  20. Novel Techniques for Dialectal Arabic Speech Recognition

    CERN Document Server

    Elmahdy, Mohamed; Minker, Wolfgang

    2012-01-01

    Novel Techniques for Dialectal Arabic Speech describes approaches to improve automatic speech recognition for dialectal Arabic. Since speech resources for dialectal Arabic speech recognition are very sparse, the authors describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic speech recognition, while assuming that MSA is always a second language for all Arabic speakers. In this book, Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect. ECA is the first ranked Arabic dialect in terms of number of speakers, and a high quality ECA speech corpus with accurate phonetic transcription has been collected. MSA acoustic models were trained using news broadcast speech. In order to cross-lingually use MSA in dialectal Arabic speech recognition, the authors have normalized the phoneme sets for MSA and ECA. After this normalization, they have applied state-of-the-art acoustic model adaptation techniques like Maximum Likelihood Linear Regression (MLLR) and M...

  1. Contribution to automatic speech recognition. Analysis of the direct acoustical signal. Recognition of isolated words and phoneme identification

    International Nuclear Information System (INIS)

    Dupeyrat, Benoit

    1981-01-01

    This report deals with the acoustical-phonetic step of the automatic recognition of the speech. The parameters used are the extrema of the acoustical signal (coded in amplitude and duration). This coding method, the properties of which are described, is simple and well adapted to a digital processing. The quality and the intelligibility of the coded signal after reconstruction are particularly satisfactory. An experiment for the automatic recognition of isolated words has been carried using this coding system. We have designed a filtering algorithm operating on the parameters of the coding. Thus the characteristics of the formants can be derived under certain conditions which are discussed. Using these characteristics the identification of a large part of the phonemes for a given speaker was achieved. Carrying on the studies has required the development of a particular methodology of real time processing which allowed immediate evaluation of the improvement of the programs. Such processing on temporal coding of the acoustical signal is extremely powerful and could represent, used in connection with other methods an efficient tool for the automatic processing of the speech.(author) [fr

  2. Automatic recognition of spontaneous emotions in speech using acoustic and lexical features

    NARCIS (Netherlands)

    Raaijmakers, S.; Truong, K.P.

    2008-01-01

    We developed acoustic and lexical classifiers, based on a boosting algorithm, to assess the separability on arousal and valence dimensions in spontaneous emotional speech. The spontaneous emotional speech data was acquired by inviting subjects to play a first-person shooter video game. Our acoustic

  3. Quasi-closed phase forward-backward linear prediction analysis of speech for accurate formant detection and estimation.

    Science.gov (United States)

    Gowda, Dhananjaya; Airaksinen, Manu; Alku, Paavo

    2017-09-01

    Recently, a quasi-closed phase (QCP) analysis of speech signals for accurate glottal inverse filtering was proposed. However, the QCP analysis which belongs to the family of temporally weighted linear prediction (WLP) methods uses the conventional forward type of sample prediction. This may not be the best choice especially in computing WLP models with a hard-limiting weighting function. A sample selective minimization of the prediction error in WLP reduces the effective number of samples available within a given window frame. To counter this problem, a modified quasi-closed phase forward-backward (QCP-FB) analysis is proposed, wherein each sample is predicted based on its past as well as future samples thereby utilizing the available number of samples more effectively. Formant detection and estimation experiments on synthetic vowels generated using a physical modeling approach as well as natural speech utterances show that the proposed QCP-FB method yields statistically significant improvements over the conventional linear prediction and QCP methods.

  4. Subjective Quality Measurement of Speech Its Evaluation, Estimation and Applications

    CERN Document Server

    Kondo, Kazuhiro

    2012-01-01

    It is becoming crucial to accurately estimate and monitor speech quality in various ambient environments to guarantee high quality speech communication. This practical hands-on book shows speech intelligibility measurement methods so that the readers can start measuring or estimating speech intelligibility of their own system. The book also introduces subjective and objective speech quality measures, and describes in detail speech intelligibility measurement methods. It introduces a diagnostic rhyme test which uses rhyming word-pairs, and includes: An investigation into the effect of word familiarity on speech intelligibility. Speech intelligibility measurement of localized speech in virtual 3-D acoustic space using the rhyme test. Estimation of speech intelligibility using objective measures, including the ITU standard PESQ measures, and automatic speech recognizers.

  5. A new automatic blood pressure kit auscultates for accurate reading with a smartphone

    OpenAIRE

    Wu, Hongjun; Wang, Bingjian; Zhu, Xinpu; Chu, Guang; Zhang, Zhi

    2016-01-01

    Abstract The widely used oscillometric automated blood pressure (BP) monitor was continuously questioned on its accuracy. A novel BP kit named Accutension which adopted Korotkoff auscultation method was then devised. Accutension worked with a miniature microphone, a pressure sensor, and a smartphone. The BP values were automatically displayed on the smartphone screen through the installed App. Data recorded in the phone could be played back and reconfirmed after measurement. They could also b...

  6. The Suitability of Cloud-Based Speech Recognition Engines for Language Learning

    Science.gov (United States)

    Daniels, Paul; Iwago, Koji

    2017-01-01

    As online automatic speech recognition (ASR) engines become more accurate and more widely implemented with call software, it becomes important to evaluate the effectiveness and the accuracy of these recognition engines using authentic speech samples. This study investigates two of the most prominent cloud-based speech recognition engines--Apple's…

  7. Automatical and accurate segmentation of cerebral tissues in fMRI dataset with combination of image processing and deep learning

    Science.gov (United States)

    Kong, Zhenglun; Luo, Junyi; Xu, Shengpu; Li, Ting

    2018-02-01

    Image segmentation plays an important role in medical science. One application is multimodality imaging, especially the fusion of structural imaging with functional imaging, which includes CT, MRI and new types of imaging technology such as optical imaging to obtain functional images. The fusion process require precisely extracted structural information, in order to register the image to it. Here we used image enhancement, morphometry methods to extract the accurate contours of different tissues such as skull, cerebrospinal fluid (CSF), grey matter (GM) and white matter (WM) on 5 fMRI head image datasets. Then we utilized convolutional neural network to realize automatic segmentation of images in deep learning way. Such approach greatly reduced the processing time compared to manual and semi-automatic segmentation and is of great importance in improving speed and accuracy as more and more samples being learned. The contours of the borders of different tissues on all images were accurately extracted and 3D visualized. This can be used in low-level light therapy and optical simulation software such as MCVM. We obtained a precise three-dimensional distribution of brain, which offered doctors and researchers quantitative volume data and detailed morphological characterization for personal precise medicine of Cerebral atrophy/expansion. We hope this technique can bring convenience to visualization medical and personalized medicine.

  8. Automatic content linking: Speech-based just-in-time retrieval for multimedia archives

    NARCIS (Netherlands)

    Popescu-Belis, A.; Kilgour, J.; Poller, P.; Nanchen, A.; Boertjes, E.; Wit, J. de

    2010-01-01

    The Automatic Content Linking Device monitors a conversation and uses automatically recognized words to retrieve documents that are of potential use to the participants. The document set includes project related reports or emails, transcribed snippets of past meetings, and websites. Retrieval

  9. Accurate halo-galaxy mocks from automatic bias estimation and particle mesh gravity solvers

    Science.gov (United States)

    Vakili, Mohammadjavad; Kitaura, Francisco-Shu; Feng, Yu; Yepes, Gustavo; Zhao, Cheng; Chuang, Chia-Hsun; Hahn, ChangHoon

    2017-12-01

    Reliable extraction of cosmological information from clustering measurements of galaxy surveys requires estimation of the error covariance matrices of observables. The accuracy of covariance matrices is limited by our ability to generate sufficiently large number of independent mock catalogues that can describe the physics of galaxy clustering across a wide range of scales. Furthermore, galaxy mock catalogues are required to study systematics in galaxy surveys and to test analysis tools. In this investigation, we present a fast and accurate approach for generation of mock catalogues for the upcoming galaxy surveys. Our method relies on low-resolution approximate gravity solvers to simulate the large-scale dark matter field, which we then populate with haloes according to a flexible non-linear and stochastic bias model. In particular, we extend the PATCHY code with an efficient particle mesh algorithm to simulate the dark matter field (the FASTPM code), and with a robust MCMC method relying on the EMCEE code for constraining the parameters of the bias model. Using the haloes in the BigMultiDark high-resolution N-body simulation as a reference catalogue, we demonstrate that our technique can model the bivariate probability distribution function (counts-in-cells), power spectrum and bispectrum of haloes in the reference catalogue. Specifically, we show that the new ingredients permit us to reach percentage accuracy in the power spectrum up to k ∼ 0.4 h Mpc-1 (within 5 per cent up to k ∼ 0.6 h Mpc-1) with accurate bispectra improving previous results based on Lagrangian perturbation theory.

  10. User Experience of a Mobile Speaking Application with Automatic Speech Recognition for EFL Learning

    Science.gov (United States)

    Ahn, Tae youn; Lee, Sangmin-Michelle

    2016-01-01

    With the spread of mobile devices, mobile phones have enormous potential regarding their pedagogical use in language education. The goal of this study is to analyse user experience of a mobile-based learning system that is enhanced by speech recognition technology for the improvement of EFL (English as a foreign language) learners' speaking…

  11. Fast, automatic, and accurate catheter reconstruction in HDR brachytherapy using an electromagnetic 3D tracking system

    Energy Technology Data Exchange (ETDEWEB)

    Poulin, Eric; Racine, Emmanuel; Beaulieu, Luc, E-mail: Luc.Beaulieu@phy.ulaval.ca [Département de physique, de génie physique et d’optique et Centre de recherche sur le cancer de l’Université Laval, Université Laval, Québec, Québec G1V 0A6, Canada and Département de radio-oncologie et Axe Oncologie du Centre de recherche du CHU de Québec, CHU de Québec, 11 Côte du Palais, Québec, Québec G1R 2J6 (Canada); Binnekamp, Dirk [Integrated Clinical Solutions and Marketing, Philips Healthcare, Veenpluis 4-6, Best 5680 DA (Netherlands)

    2015-03-15

    Purpose: In high dose rate brachytherapy (HDR-B), current catheter reconstruction protocols are relatively slow and error prone. The purpose of this technical note is to evaluate the accuracy and the robustness of an electromagnetic (EM) tracking system for automated and real-time catheter reconstruction. Methods: For this preclinical study, a total of ten catheters were inserted in gelatin phantoms with different trajectories. Catheters were reconstructed using a 18G biopsy needle, used as an EM stylet and equipped with a miniaturized sensor, and the second generation Aurora{sup ®} Planar Field Generator from Northern Digital Inc. The Aurora EM system provides position and orientation value with precisions of 0.7 mm and 0.2°, respectively. Phantoms were also scanned using a μCT (GE Healthcare) and Philips Big Bore clinical computed tomography (CT) system with a spatial resolution of 89 μm and 2 mm, respectively. Reconstructions using the EM stylet were compared to μCT and CT. To assess the robustness of the EM reconstruction, five catheters were reconstructed twice and compared. Results: Reconstruction time for one catheter was 10 s, leading to a total reconstruction time inferior to 3 min for a typical 17-catheter implant. When compared to the μCT, the mean EM tip identification error was 0.69 ± 0.29 mm while the CT error was 1.08 ± 0.67 mm. The mean 3D distance error was found to be 0.66 ± 0.33 mm and 1.08 ± 0.72 mm for the EM and CT, respectively. EM 3D catheter trajectories were found to be more accurate. A maximum difference of less than 0.6 mm was found between successive EM reconstructions. Conclusions: The EM reconstruction was found to be more accurate and precise than the conventional methods used for catheter reconstruction in HDR-B. This approach can be applied to any type of catheters and applicators.

  12. Fast, automatic, and accurate catheter reconstruction in HDR brachytherapy using an electromagnetic 3D tracking system

    International Nuclear Information System (INIS)

    Poulin, Eric; Racine, Emmanuel; Beaulieu, Luc; Binnekamp, Dirk

    2015-01-01

    Purpose: In high dose rate brachytherapy (HDR-B), current catheter reconstruction protocols are relatively slow and error prone. The purpose of this technical note is to evaluate the accuracy and the robustness of an electromagnetic (EM) tracking system for automated and real-time catheter reconstruction. Methods: For this preclinical study, a total of ten catheters were inserted in gelatin phantoms with different trajectories. Catheters were reconstructed using a 18G biopsy needle, used as an EM stylet and equipped with a miniaturized sensor, and the second generation Aurora ® Planar Field Generator from Northern Digital Inc. The Aurora EM system provides position and orientation value with precisions of 0.7 mm and 0.2°, respectively. Phantoms were also scanned using a μCT (GE Healthcare) and Philips Big Bore clinical computed tomography (CT) system with a spatial resolution of 89 μm and 2 mm, respectively. Reconstructions using the EM stylet were compared to μCT and CT. To assess the robustness of the EM reconstruction, five catheters were reconstructed twice and compared. Results: Reconstruction time for one catheter was 10 s, leading to a total reconstruction time inferior to 3 min for a typical 17-catheter implant. When compared to the μCT, the mean EM tip identification error was 0.69 ± 0.29 mm while the CT error was 1.08 ± 0.67 mm. The mean 3D distance error was found to be 0.66 ± 0.33 mm and 1.08 ± 0.72 mm for the EM and CT, respectively. EM 3D catheter trajectories were found to be more accurate. A maximum difference of less than 0.6 mm was found between successive EM reconstructions. Conclusions: The EM reconstruction was found to be more accurate and precise than the conventional methods used for catheter reconstruction in HDR-B. This approach can be applied to any type of catheters and applicators

  13. Accurate Automatic Delineation of Heterogeneous Functional Volumes in Positron Emission Tomography for Oncology Applications

    International Nuclear Information System (INIS)

    Hatt, Mathieu; Cheze le Rest, Catherine; Descourt, Patrice; Dekker, Andre; De Ruysscher, Dirk; Oellers, Michel; Lambin, Philippe; Pradier, Olivier; Visvikis, Dimitris

    2010-01-01

    Purpose: Accurate contouring of positron emission tomography (PET) functional volumes is now considered crucial in image-guided radiotherapy and other oncology applications because the use of functional imaging allows for biological target definition. In addition, the definition of variable uptake regions within the tumor itself may facilitate dose painting for dosimetry optimization. Methods and Materials: Current state-of-the-art algorithms for functional volume segmentation use adaptive thresholding. We developed an approach called fuzzy locally adaptive Bayesian (FLAB), validated on homogeneous objects, and then improved it by allowing the use of up to three tumor classes for the delineation of inhomogeneous tumors (3-FLAB). Simulated and real tumors with histology data containing homogeneous and heterogeneous activity distributions were used to assess the algorithm's accuracy. Results: The new 3-FLAB algorithm is able to extract the overall tumor from the background tissues and delineate variable uptake regions within the tumors, with higher accuracy and robustness compared with adaptive threshold (T bckg ) and fuzzy C-means (FCM). 3-FLAB performed with a mean classification error of less than 9% ± 8% on the simulated tumors, whereas binary-only implementation led to errors of 15% ± 11%. T bckg and FCM led to mean errors of 20% ± 12% and 17% ± 14%, respectively. 3-FLAB also led to more robust estimation of the maximum diameters of tumors with histology measurements, with bckg and FCM lead to 10%, 12%, and 13%, respectively. Conclusion: These encouraging results warrant further investigation in future studies that will investigate the impact of 3-FLAB in radiotherapy treatment planning, diagnosis, and therapy response evaluation.

  14. The Use of an Autonomous Pedagogical Agent and Automatic Speech Recognition for Teaching Sight Words to Students with Autism Spectrum Disorder

    Science.gov (United States)

    Saadatzi, Mohammad Nasser; Pennington, Robert C.; Welch, Karla C.; Graham, James H.; Scott, Renee E.

    2017-01-01

    In the current study, we examined the effects of an instructional package comprised of an autonomous pedagogical agent, automatic speech recognition, and constant time delay during the instruction of reading sight words aloud to young adults with autism spectrum disorder. We used a concurrent multiple baseline across participants design to…

  15. Assessing the Performance of Automatic Speech Recognition Systems When Used by Native and Non-Native Speakers of Three Major Languages in Dictation Workflows

    DEFF Research Database (Denmark)

    Zapata, Julián; Kirkedal, Andreas Søeborg

    2015-01-01

    In this paper, we report on a two-part experiment aiming to assess and compare the performance of two types of automatic speech recognition (ASR) systems on two different computational platforms when used to augment dictation workflows. The experiment was performed with a sample of speakers...

  16. An automatic speech recognition system with speaker-independent identification support

    Science.gov (United States)

    Caranica, Alexandru; Burileanu, Corneliu

    2015-02-01

    The novelty of this work relies on the application of an open source research software toolkit (CMU Sphinx) to train, build and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to successfully decode offline voice commands on embedded hardware, such as an ARMv6 low-cost SoC, Raspberry PI. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low cost voice automation systems.

  17. Achieving Accurate Automatic Sleep Staging on Manually Pre-processed EEG Data Through Synchronization Feature Extraction and Graph Metrics.

    Science.gov (United States)

    Chriskos, Panteleimon; Frantzidis, Christos A; Gkivogkli, Polyxeni T; Bamidis, Panagiotis D; Kourtidou-Papadeli, Chrysoula

    2018-01-01

    Sleep staging, the process of assigning labels to epochs of sleep, depending on the stage of sleep they belong, is an arduous, time consuming and error prone process as the initial recordings are quite often polluted by noise from different sources. To properly analyze such data and extract clinical knowledge, noise components must be removed or alleviated. In this paper a pre-processing and subsequent sleep staging pipeline for the sleep analysis of electroencephalographic signals is described. Two novel methods of functional connectivity estimation (Synchronization Likelihood/SL and Relative Wavelet Entropy/RWE) are comparatively investigated for automatic sleep staging through manually pre-processed electroencephalographic recordings. A multi-step process that renders signals suitable for further analysis is initially described. Then, two methods that rely on extracting synchronization features from electroencephalographic recordings to achieve computerized sleep staging are proposed, based on bivariate features which provide a functional overview of the brain network, contrary to most proposed methods that rely on extracting univariate time and frequency features. Annotation of sleep epochs is achieved through the presented feature extraction methods by training classifiers, which are in turn able to accurately classify new epochs. Analysis of data from sleep experiments on a randomized, controlled bed-rest study, which was organized by the European Space Agency and was conducted in the "ENVIHAB" facility of the Institute of Aerospace Medicine at the German Aerospace Center (DLR) in Cologne, Germany attains high accuracy rates, over 90% based on ground truth that resulted from manual sleep staging by two experienced sleep experts. Therefore, it can be concluded that the above feature extraction methods are suitable for semi-automatic sleep staging.

  18. Structured Semantic Knowledge Can Emerge Automatically from Predicting Word Sequences in Child-Directed Speech

    Directory of Open Access Journals (Sweden)

    Philip A. Huebner

    2018-02-01

    Full Text Available Previous research has suggested that distributional learning mechanisms may contribute to the acquisition of semantic knowledge. However, distributional learning mechanisms, statistical learning, and contemporary “deep learning” approaches have been criticized for being incapable of learning the kind of abstract and structured knowledge that many think is required for acquisition of semantic knowledge. In this paper, we show that recurrent neural networks, trained on noisy naturalistic speech to children, do in fact learn what appears to be abstract and structured knowledge. We trained two types of recurrent neural networks (Simple Recurrent Network, and Long Short-Term Memory to predict word sequences in a 5-million-word corpus of speech directed to children ages 0–3 years old, and assessed what semantic knowledge they acquired. We found that learned internal representations are encoding various abstract grammatical and semantic features that are useful for predicting word sequences. Assessing the organization of semantic knowledge in terms of the similarity structure, we found evidence of emergent categorical and hierarchical structure in both models. We found that the Long Short-term Memory (LSTM and SRN are both learning very similar kinds of representations, but the LSTM achieved higher levels of performance on a quantitative evaluation. We also trained a non-recurrent neural network, Skip-gram, on the same input to compare our results to the state-of-the-art in machine learning. We found that Skip-gram achieves relatively similar performance to the LSTM, but is representing words more in terms of thematic compared to taxonomic relations, and we provide reasons why this might be the case. Our findings show that a learning system that derives abstract, distributed representations for the purpose of predicting sequential dependencies in naturalistic language may provide insight into emergence of many properties of the developing

  19. Structured Semantic Knowledge Can Emerge Automatically from Predicting Word Sequences in Child-Directed Speech

    Science.gov (United States)

    Huebner, Philip A.; Willits, Jon A.

    2018-01-01

    Previous research has suggested that distributional learning mechanisms may contribute to the acquisition of semantic knowledge. However, distributional learning mechanisms, statistical learning, and contemporary “deep learning” approaches have been criticized for being incapable of learning the kind of abstract and structured knowledge that many think is required for acquisition of semantic knowledge. In this paper, we show that recurrent neural networks, trained on noisy naturalistic speech to children, do in fact learn what appears to be abstract and structured knowledge. We trained two types of recurrent neural networks (Simple Recurrent Network, and Long Short-Term Memory) to predict word sequences in a 5-million-word corpus of speech directed to children ages 0–3 years old, and assessed what semantic knowledge they acquired. We found that learned internal representations are encoding various abstract grammatical and semantic features that are useful for predicting word sequences. Assessing the organization of semantic knowledge in terms of the similarity structure, we found evidence of emergent categorical and hierarchical structure in both models. We found that the Long Short-term Memory (LSTM) and SRN are both learning very similar kinds of representations, but the LSTM achieved higher levels of performance on a quantitative evaluation. We also trained a non-recurrent neural network, Skip-gram, on the same input to compare our results to the state-of-the-art in machine learning. We found that Skip-gram achieves relatively similar performance to the LSTM, but is representing words more in terms of thematic compared to taxonomic relations, and we provide reasons why this might be the case. Our findings show that a learning system that derives abstract, distributed representations for the purpose of predicting sequential dependencies in naturalistic language may provide insight into emergence of many properties of the developing semantic system. PMID

  20. A new automatic blood pressure kit auscultates for accurate reading with a smartphone: A diagnostic accuracy study.

    Science.gov (United States)

    Wu, Hongjun; Wang, Bingjian; Zhu, Xinpu; Chu, Guang; Zhang, Zhi

    2016-08-01

    The widely used oscillometric automated blood pressure (BP) monitor was continuously questioned on its accuracy. A novel BP kit named Accutension which adopted Korotkoff auscultation method was then devised. Accutension worked with a miniature microphone, a pressure sensor, and a smartphone. The BP values were automatically displayed on the smartphone screen through the installed App. Data recorded in the phone could be played back and reconfirmed after measurement. They could also be uploaded and saved to the iCloud. The accuracy and consistency of this novel electronic auscultatory sphygmomanometer was preliminarily verified here. Thirty-two subjects were included and 82 qualified readings were obtained. The mean differences ± SD for systolic and diastolic BP readings between Accutension and mercury sphygmomanometer were 0.87 ± 2.86 and -0.94 ± 2.93 mm Hg. Agreements between Accutension and mercury sphygmomanometer were highly significant for systolic (ICC = 0.993, 95% confidence interval (CI): 0.989-0.995) and diastolic (ICC = 0.987, 95% CI: 0.979-0.991). In conclusion, Accutension worked accurately based on our pilot study data. The difference was acceptable. ICC and Bland-Altman plot charts showed good agreements with manual measurements. Systolic readings of Accutension were slightly higher than those of manual measurement, while diastolic readings were slightly lower. One possible reason was that Accutension captured the first and the last korotkoff sound more sensitively than human ear during manual measurement and avoided sound missing, so that it might be more accurate than traditional mercury sphygmomanometer. By documenting and analyzing of variant tendency of BP values, Accutension helps management of hypertension and therefore contributes to the mobile heath service.

  1. Automatic generation of accurate subject-specific bone finite element models to be used in clinical studies.

    Science.gov (United States)

    Viceconti, Marco; Davinelli, Mario; Taddei, Fulvia; Cappello, Angelo

    2004-10-01

    Most of the finite element models of bones used in orthopaedic biomechanics research are based on generic anatomies. However, in many cases it would be useful to generate from CT data a separate finite element model for each subject of a study group. In a recent study a hexahedral mesh generator based on a grid projection algorithm was found very effective in terms of accuracy and automation. However, so far the use of this method has been documented only on data collected in vitro and only for long bones. The present study was aimed at verifying if this method represents a procedure for the generation of finite element models of human bones from data collected in vivo, robust, accurate, automatic and general enough to be used in clinical studies. Robustness, automation and numerical accuracy of the proposed method were assessed on five femoral CT data sets of patients affected by various pathologies. The generality of the method was verified by processing a femur, an ileum, a phalanx, a proximal femur reconstruction, and the micro-CT of a small sample of spongy bone. The method was found robust enough to cope with the variability of the five femurs, producing meshes with a numerical accuracy and a computational weight comparable to those found in vitro. Even when the method was used to process the other bones the levels of mesh conditioning remained within acceptable limits. Thus, it may be concluded that the method presents a generality sufficient to cope with almost any orthopaedic application.

  2. An efficient and accurate framework for calculating lattice thermal conductivity of solids: AFLOW—AAPL Automatic Anharmonic Phonon Library

    Science.gov (United States)

    Plata, Jose J.; Nath, Pinku; Usanmaz, Demet; Carrete, Jesús; Toher, Cormac; de Jong, Maarten; Asta, Mark; Fornari, Marco; Nardelli, Marco Buongiorno; Curtarolo, Stefano

    2017-10-01

    One of the most accurate approaches for calculating lattice thermal conductivity, , is solving the Boltzmann transport equation starting from third-order anharmonic force constants. In addition to the underlying approximations of ab-initio parameterization, two main challenges are associated with this path: high computational costs and lack of automation in the frameworks using this methodology, which affect the discovery rate of novel materials with ad-hoc properties. Here, the Automatic Anharmonic Phonon Library (AAPL) is presented. It efficiently computes interatomic force constants by making effective use of crystal symmetry analysis, it solves the Boltzmann transport equation to obtain , and allows a fully integrated operation with minimum user intervention, a rational addition to the current high-throughput accelerated materials development framework AFLOW. An "experiment vs. theory" study of the approach is shown, comparing accuracy and speed with respect to other available packages, and for materials characterized by strong electron localization and correlation. Combining AAPL with the pseudo-hybrid functional ACBN0 is possible to improve accuracy without increasing computational requirements.

  3. Automatic and accurate reconstruction of distal humerus contours through B-Spline fitting based on control polygon deformation.

    Science.gov (United States)

    Mostafavi, Kamal; Tutunea-Fatan, O Remus; Bordatchev, Evgueni V; Johnson, James A

    2014-12-01

    The strong advent of computer-assisted technologies experienced by the modern orthopedic surgery prompts for the expansion of computationally efficient techniques to be built on the broad base of computer-aided engineering tools that are readily available. However, one of the common challenges faced during the current developmental phase continues to remain the lack of reliable frameworks to allow a fast and precise conversion of the anatomical information acquired through computer tomography to a format that is acceptable to computer-aided engineering software. To address this, this study proposes an integrated and automatic framework capable to extract and then postprocess the original imaging data to a common planar and closed B-Spline representation. The core of the developed platform relies on the approximation of the discrete computer tomography data by means of an original two-step B-Spline fitting technique based on successive deformations of the control polygon. In addition to its rapidity and robustness, the developed fitting technique was validated to produce accurate representations that do not deviate by more than 0.2 mm with respect to alternate representations of the bone geometry that were obtained through different-contact-based-data acquisition or data processing methods. © IMechE 2014.

  4. Optimizing Automatic Speech Recognition for Low-Proficient Non-Native Speakers

    Directory of Open Access Journals (Sweden)

    Catia Cucchiarini

    2010-01-01

    Full Text Available Computer-Assisted Language Learning (CALL applications for improving the oral skills of low-proficient learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-native ASR is still problematic, a possible solution is to elicit constrained responses from the learners. In this paper, we describe experiments aimed at selecting utterances from lists of responses. The first experiment on utterance selection indicates that the decoding process can be improved by optimizing the language model and the acoustic models, thus reducing the utterance error rate from 29–26% to 10–8%. Since giving feedback on incorrectly recognized utterances is confusing, we verify the correctness of the utterance before providing feedback. The results of the second experiment on utterance verification indicate that combining duration-related features with a likelihood ratio (LR yield an equal error rate (EER of 10.3%, which is significantly better than the EER for the other measures in isolation.

  5. Second-language learning effects on automaticity of speech processing of Japanese phonetic contrasts: An MEG study.

    Science.gov (United States)

    Hisagi, Miwako; Shafer, Valerie L; Miyagawa, Shigeru; Kotek, Hadas; Sugawara, Ayaka; Pantazis, Dimitrios

    2016-12-01

    We examined discrimination of a second-language (L2) vowel duration contrast in English learners of Japanese (JP) with different amounts of experience using the magnetoencephalography mismatch field (MMF) component. Twelve L2 learners were tested before and after a second semester of college-level JP; half attended a regular rate course and half an accelerated course with more hours per week. Results showed no significant change in MMF for either the regular or accelerated learning group from beginning to end of the course. We also compared these groups against nine L2 learners who had completed four semesters of college-level JP. These 4-semester learners did not significantly differ from 2-semester learners, in that only a difference in hemisphere activation (interacting with time) between the two groups approached significance. These findings suggest that targeted training of L2 phonology may be necessary to allow for changes in processing of L2 speech contrasts at an early, automatic level. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. Security and Hyper-accurate Positioning Monitoring with Automatic Dependent Surveillance-Broadcast (ADS-B), Phase II

    Data.gov (United States)

    National Aeronautics and Space Administration — Lightning Ridge Technologies, LLC, working in collaboration with The Innovation Laboratory, Inc., extend Automatic Dependent Surveillance -- Broadcast (ADS-B) into a...

  7. Security and Hyper-accurate Positioning Monitoring with Automatic Dependent Surveillance-Broadcast (ADS-B), Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — Lightning Ridge Technologies, working in collaboration with The Innovation Laboratory, Inc., extend Automatic Dependent Surveillance Broadcast (ADS-B) into a safe,...

  8. A fully automatic tool to perform accurate flood mapping by merging remote sensing imagery and ancillary data

    Science.gov (United States)

    D'Addabbo, Annarita; Refice, Alberto; Lovergine, Francesco; Pasquariello, Guido

    2016-04-01

    Flooding is one of the most frequent and expansive natural hazard. High-resolution flood mapping is an essential step in the monitoring and prevention of inundation hazard, both to gain insight into the processes involved in the generation of flooding events, and from the practical point of view of the precise assessment of inundated areas. Remote sensing data are recognized to be useful in this respect, thanks to the high resolution and regular revisit schedules of state-of-the-art satellites, moreover offering a synoptic overview of the extent of flooding. In particular, Synthetic Aperture Radar (SAR) data present several favorable characteristics for flood mapping, such as their relative insensitivity to the meteorological conditions during acquisitions, as well as the possibility of acquiring independently of solar illumination, thanks to the active nature of the radar sensors [1]. However, flood scenarios are typical examples of complex situations in which different factors have to be considered to provide accurate and robust interpretation of the situation on the ground: the presence of many land cover types, each one with a particular signature in presence of flood, requires modelling the behavior of different objects in the scene in order to associate them to flood or no flood conditions [2]. Generally, the fusion of multi-temporal, multi-sensor, multi-resolution and/or multi-platform Earth observation image data, together with other ancillary information, seems to have a key role in the pursuit of a consistent interpretation of complex scenes. In the case of flooding, distance from the river, terrain elevation, hydrologic information or some combination thereof can add useful information to remote sensing data. Suitable methods, able to manage and merge different kind of data, are so particularly needed. In this work, a fully automatic tool, based on Bayesian Networks (BNs) [3] and able to perform data fusion, is presented. It supplies flood maps

  9. Automatic detection of hate speech in text: an overview of the topic and dataset annotation with hierarchical classes

    OpenAIRE

    Paula Cristina Teixeira Fortuna

    2017-01-01

    Nowadays people are using more and more social networks to communicate their opinions, share information and experiences. In social networks people have the feeling of being deindividualized and can incur more frequently in aggressive communication. In this context, it is important that government and social networks platforms have tools to detect hate speech because it is harmful to its targets. In our work we investigate the problem of detecting hate speech online. Our first goal is to make...

  10. Full automatic fiducial marker detection on coil arrays for accurate instrumentation placement during MRI guided breast interventions

    Science.gov (United States)

    Filippatos, Konstantinos; Boehler, Tobias; Geisler, Benjamin; Zachmann, Harald; Twellmann, Thorsten

    2010-02-01

    With its high sensitivity, dynamic contrast-enhanced MR imaging (DCE-MRI) of the breast is today one of the first-line tools for early detection and diagnosis of breast cancer, particularly in the dense breast of young women. However, many relevant findings are very small or occult on targeted ultrasound images or mammography, so that MRI guided biopsy is the only option for a precise histological work-up [1]. State-of-the-art software tools for computer-aided diagnosis of breast cancer in DCE-MRI data offer also means for image-based planning of biopsy interventions. One step in the MRI guided biopsy workflow is the alignment of the patient position with the preoperative MR images. In these images, the location and orientation of the coil localization unit can be inferred from a number of fiducial markers, which for this purpose have to be manually or semi-automatically detected by the user. In this study, we propose a method for precise, full-automatic localization of fiducial markers, on which basis a virtual localization unit can be subsequently placed in the image volume for the purpose of determining the parameters for needle navigation. The method is based on adaptive thresholding for separating breast tissue from background followed by rigid registration of marker templates. In an evaluation of 25 clinical cases comprising 4 different commercial coil array models and 3 different MR imaging protocols, the method yielded a sensitivity of 0.96 at a false positive rate of 0.44 markers per case. The mean distance deviation between detected fiducial centers and ground truth information that was appointed from a radiologist was 0.94mm.

  11. Accurate and precise determination of 2-25mg amounts of uranium by means of a special automatic potentiometric titration

    International Nuclear Information System (INIS)

    Slanina, J.; Bakker, F.; Groen, A.J.P.; Lingerak, W.A.

    1978-01-01

    A precise and accurate potentiometric titration of 2-25 mg of uranium is described. The uranium is reduced to U(IV) according to the method of Eberle et al. [3], and titrated with 0.05 N potassium dichromate, using a platinum indicator electrode. During the sample preparation the walls of the titration vessel are cleaned by centrifugation. To avoid overshoot of the set point a special differentiator is described, that interrupts the titration until equilibrium is reached. The precision of the method is 0.02%, the accuracy is better than 0.04% rel. Each titration takes 5 min. (orig.) [de

  12. Automatic Earthquake Shear Stress Measurement Method Developed for Accurate Time- Prediction Analysis of Forthcoming Major Earthquakes Along Shallow Active Faults

    Science.gov (United States)

    Serata, S.

    2006-12-01

    The Serata Stressmeter has been developed to measure and monitor earthquake shear stress build-up along shallow active faults. The development work made in the past 25 years has established the Stressmeter as an automatic stress measurement system to study timing of forthcoming major earthquakes in support of the current earthquake prediction studies based on statistical analysis of seismological observations. In early 1982, a series of major Man-made earthquakes (magnitude 4.5-5.0) suddenly occurred in an area over deep underground potash mine in Saskatchewan, Canada. By measuring underground stress condition of the mine, the direct cause of the earthquake was disclosed. The cause was successfully eliminated by controlling the stress condition of the mine. The Japanese government was interested in this development and the Stressmeter was introduced to the Japanese government research program for earthquake stress studies. In Japan the Stressmeter was first utilized for direct measurement of the intrinsic lateral tectonic stress gradient G. The measurement, conducted at the Mt. Fuji Underground Research Center of the Japanese government, disclosed the constant natural gradients of maximum and minimum lateral stresses in an excellent agreement with the theoretical value, i.e., G = 0.25. All the conventional methods of overcoring, hydrofracturing and deformation, which were introduced to compete with the Serata method, failed demonstrating the fundamental difficulties of the conventional methods. The intrinsic lateral stress gradient determined by the Stressmeter for the Japanese government was found to be the same with all the other measurements made by the Stressmeter in Japan. The stress measurement results obtained by the major international stress measurement work in the Hot Dry Rock Projects conducted in USA, England and Germany are found to be in good agreement with the Stressmeter results obtained in Japan. Based on this broad agreement, a solid geomechanical

  13. Fast, accurate, and robust automatic marker detection for motion correction based on oblique kV or MV projection image pairs

    International Nuclear Information System (INIS)

    Slagmolen, Pieter; Hermans, Jeroen; Maes, Frederik; Budiharto, Tom; Haustermans, Karin; Heuvel, Frank van den

    2010-01-01

    Purpose: A robust and accurate method that allows the automatic detection of fiducial markers in MV and kV projection image pairs is proposed. The method allows to automatically correct for inter or intrafraction motion. Methods: Intratreatment MV projection images are acquired during each of five treatment beams of prostate cancer patients with four implanted fiducial markers. The projection images are first preprocessed using a series of marker enhancing filters. 2D candidate marker locations are generated for each of the filtered projection images and 3D candidate marker locations are reconstructed by pairing candidates in subsequent projection images. The correct marker positions are retrieved in 3D by the minimization of a cost function that combines 2D image intensity and 3D geometric or shape information for the entire marker configuration simultaneously. This optimization problem is solved using dynamic programming such that the globally optimal configuration for all markers is always found. Translational interfraction and intrafraction prostate motion and the required patient repositioning is assessed from the position of the centroid of the detected markers in different MV image pairs. The method was validated on a phantom using CT as ground-truth and on clinical data sets of 16 patients using manual marker annotations as ground-truth. Results: The entire setup was confirmed to be accurate to around 1 mm by the phantom measurements. The reproducibility of the manual marker selection was less than 3.5 pixels in the MV images. In patient images, markers were correctly identified in at least 99% of the cases for anterior projection images and 96% of the cases for oblique projection images. The average marker detection accuracy was 1.4±1.8 pixels in the projection images. The centroid of all four reconstructed marker positions in 3D was positioned within 2 mm of the ground-truth position in 99.73% of all cases. Detecting four markers in a pair of MV images

  14. The software for automatic creation of the formal grammars used by speech recognition, computer vision, editable text conversion systems, and some new functions

    Science.gov (United States)

    Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan

    2017-02-01

    For more flexibility of environmental perception by artificial intelligence it is needed to exist the supporting software modules, which will be able to automate the creation of specific language syntax and to make a further analysis for relevant decisions based on semantic functions. According of our proposed approach, of which implementation it is possible to create the couples of formal rules of given sentences (in case of natural languages) or statements (in case of special languages) by helping of computer vision, speech recognition or editable text conversion system for further automatic improvement. In other words, we have developed an approach, by which it can be achieved to significantly improve the training process automation of artificial intelligence, which as a result will give us a higher level of self-developing skills independently from us (from users). At the base of our approach we have developed a software demo version, which includes the algorithm and software code for the entire above mentioned component's implementation (computer vision, speech recognition and editable text conversion system). The program has the ability to work in a multi - stream mode and simultaneously create a syntax based on receiving information from several sources.

  15. A fast, accurate, and automatic 2D-3D image registration for image-guided cranial radiosurgery

    International Nuclear Information System (INIS)

    Fu Dongshan; Kuduvalli, Gopinath

    2008-01-01

    The authors developed a fast and accurate two-dimensional (2D)-three-dimensional (3D) image registration method to perform precise initial patient setup and frequent detection and correction for patient movement during image-guided cranial radiosurgery treatment. In this method, an approximate geometric relationship is first established to decompose a 3D rigid transformation in the 3D patient coordinate into in-plane transformations and out-of-plane rotations in two orthogonal 2D projections. Digitally reconstructed radiographs are generated offline from a preoperative computed tomography volume prior to treatment and used as the reference for patient position. A multiphase framework is designed to register the digitally reconstructed radiographs with the x-ray images periodically acquired during patient setup and treatment. The registration in each projection is performed independently; the results in the two projections are then combined and converted to a 3D rigid transformation by 2D-3D geometric backprojection. The in-plane transformation and the out-of-plane rotation are estimated using different search methods, including multiresolution matching, steepest descent minimization, and one-dimensional search. Two similarity measures, optimized pattern intensity and sum of squared difference, are applied at different registration phases to optimize accuracy and computation speed. Various experiments on an anthropomorphic head-and-neck phantom showed that, using fiducial registration as a gold standard, the registration errors were 0.33±0.16 mm (s.d.) in overall translation and 0.29 deg. ±0.11 deg. (s.d.) in overall rotation. The total targeting errors were 0.34±0.16 mm (s.d.), 0.40±0.2 mm (s.d.), and 0.51±0.26 mm (s.d.) for the targets at the distances of 2, 6, and 10 cm from the rotation center, respectively. The computation time was less than 3 s on a computer with an Intel Pentium 3.0 GHz dual processor

  16. Multilevel Analysis in Analyzing Speech Data

    Science.gov (United States)

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  17. Specific acoustic models for spontaneous and dictated style in indonesian speech recognition

    Science.gov (United States)

    Vista, C. B.; Satriawan, C. H.; Lestari, D. P.; Widyantoro, D. H.

    2018-03-01

    The performance of an automatic speech recognition system is affected by differences in speech style between the data the model is originally trained upon and incoming speech to be recognized. In this paper, the usage of GMM-HMM acoustic models for specific speech styles is investigated. We develop two systems for the experiments; the first employs a speech style classifier to predict the speech style of incoming speech, either spontaneous or dictated, then decodes this speech using an acoustic model specifically trained for that speech style. The second system uses both acoustic models to recognise incoming speech and decides upon a final result by calculating a confidence score of decoding. Results show that training specific acoustic models for spontaneous and dictated speech styles confers a slight recognition advantage as compared to a baseline model trained on a mixture of spontaneous and dictated training data. In addition, the speech style classifier approach of the first system produced slightly more accurate results than the confidence scoring employed in the second system.

  18. Estimating patient dose from CT exams that use automatic exposure control: Development and validation of methods to accurately estimate tube current values.

    Science.gov (United States)

    McMillan, Kyle; Bostani, Maryam; Cagnon, Christopher H; Yu, Lifeng; Leng, Shuai; McCollough, Cynthia H; McNitt-Gray, Michael F

    2017-08-01

    The vast majority of body CT exams are performed with automatic exposure control (AEC), which adapts the mean tube current to the patient size and modulates the tube current either angularly, longitudinally or both. However, most radiation dose estimation tools are based on fixed tube current scans. Accurate estimates of patient dose from AEC scans require knowledge of the tube current values, which is usually unavailable. The purpose of this work was to develop and validate methods to accurately estimate the tube current values prescribed by one manufacturer's AEC system to enable accurate estimates of patient dose. Methods were developed that took into account available patient attenuation information, user selected image quality reference parameters and x-ray system limits to estimate tube current values for patient scans. Methods consistent with AAPM Report 220 were developed that used patient attenuation data that were: (a) supplied by the manufacturer in the CT localizer radiograph and (b) based on a simulated CT localizer radiograph derived from image data. For comparison, actual tube current values were extracted from the projection data of each patient. Validation of each approach was based on data collected from 40 pediatric and adult patients who received clinically indicated chest (n = 20) and abdomen/pelvis (n = 20) scans on a 64 slice multidetector row CT (Sensation 64, Siemens Healthcare, Forchheim, Germany). For each patient dataset, the following were collected with Institutional Review Board (IRB) approval: (a) projection data containing actual tube current values at each projection view, (b) CT localizer radiograph (topogram) and (c) reconstructed image data. Tube current values were estimated based on the actual topogram (actual-topo) as well as the simulated topogram based on image data (sim-topo). Each of these was compared to the actual tube current values from the patient scan. In addition, to assess the accuracy of each method in estimating

  19. High-throughput image analysis of tumor spheroids: a user-friendly software application to measure the size of spheroids automatically and accurately.

    Science.gov (United States)

    Chen, Wenjin; Wong, Chung; Vosburgh, Evan; Levine, Arnold J; Foran, David J; Xu, Eugenia Y

    2014-07-08

    The increasing number of applications of three-dimensional (3D) tumor spheroids as an in vitro model for drug discovery requires their adaptation to large-scale screening formats in every step of a drug screen, including large-scale image analysis. Currently there is no ready-to-use and free image analysis software to meet this large-scale format. Most existing methods involve manually drawing the length and width of the imaged 3D spheroids, which is a tedious and time-consuming process. This study presents a high-throughput image analysis software application - SpheroidSizer, which measures the major and minor axial length of the imaged 3D tumor spheroids automatically and accurately; calculates the volume of each individual 3D tumor spheroid; then outputs the results in two different forms in spreadsheets for easy manipulations in the subsequent data analysis. The main advantage of this software is its powerful image analysis application that is adapted for large numbers of images. It provides high-throughput computation and quality-control workflow. The estimated time to process 1,000 images is about 15 min on a minimally configured laptop, or around 1 min on a multi-core performance workstation. The graphical user interface (GUI) is also designed for easy quality control, and users can manually override the computer results. The key method used in this software is adapted from the active contour algorithm, also known as Snakes, which is especially suitable for images with uneven illumination and noisy background that often plagues automated imaging processing in high-throughput screens. The complimentary "Manual Initialize" and "Hand Draw" tools provide the flexibility to SpheroidSizer in dealing with various types of spheroids and diverse quality images. This high-throughput image analysis software remarkably reduces labor and speeds up the analysis process. Implementing this software is beneficial for 3D tumor spheroids to become a routine in vitro model

  20. Comparison of Forced-Alignment Speech Recognition and Humans for Generating Reference VAD

    DEFF Research Database (Denmark)

    Kraljevski, Ivan; Tan, Zheng-Hua; Paola Bissiri, Maria

    2015-01-01

    This present paper aims to answer the question whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. An investigation of the level of agreement between automatic/manual VAD transcriptions and the ......This present paper aims to answer the question whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. An investigation of the level of agreement between automatic/manual VAD transcriptions...... and the reference ones produced by a human expert was carried out. Thereafter, statistical analysis was employed on the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels....

  1. A novel device for automatic withdrawal and accurate calibration of 99m-technetium radiopharmaceuticals to minimise radiation exposure to nuclear medicine staff and patient

    International Nuclear Information System (INIS)

    Nazififard, M.; Mahdizadeh, S.; Meigooni, A. S.; Alavi, M.; Suh, K. Y.

    2012-01-01

    A Joint Automatic Dispenser Equipment (JADE) has been designed and fabricated for automatic withdrawal and calibration of radiopharmaceutical materials. The thermoluminescent dosemeter procedures have shown a reduction in dose to the technician's hand with this novel dose dispenser system JADE when compared with the manual withdrawal of 99m Tc. This system helps to increase the precision of calibration and to minimise the radiation dose to the hands and body of the workers. This paper describes the structure of this device, its function and user-friendliness, and its efficacy. The efficacy of this device was determined by measuring the radiation dose delivered to the hands of the nuclear medicine laboratory technician. The user-friendliness of JADE has been examined. The automatic withdrawal and calibration offered by this system reduces the dose to the technician's hand to a level below the maximum permissible dose stipulated by the international protocols. This research will serve as a backbone for future study about the safe use of ionising radiation in medicine. (authors)

  2. Speech recognition from spectral dynamics

    Indian Academy of Sciences (India)

    Some of the history of gradual infusion of the modulation spectrum concept into Automatic recognition of speech (ASR) comes next, pointing to the relationship of modulation spectrum processing to wellaccepted ASR techniques such as dynamic speech features or RelAtive SpecTrAl (RASTA) filtering. Next, the frequency ...

  3. An automatic and accurate x-ray tube focal spot/grid alignment system for mobile radiography: System description and alignment accuracy

    International Nuclear Information System (INIS)

    Gauntt, David M.; Barnes, Gary T.

    2010-01-01

    Purpose: A mobile radiography automatic grid alignment system (AGAS) has been developed by modifying a commercially available mobile unit. The objectives of this article are to describe the modifications and operation and to report on the accuracy with which the focal spot is aligned to the grid and the time required to achieve the alignment. Methods: The modifications include an optical target arm attached to the grid tunnel, a video camera attached to the collimator, a motion control system with six degrees of freedom to position the collimator and x-ray tube, and a computer to control the system. The video camera and computer determine the grid position, and then the motion control system drives the x-ray focal spot to the center of the grid focal axis. The accuracy of the alignment of the focal spot with the grid and the time required to achieve alignment were measured both in laboratory tests and in clinical use. Results: For a typical exam, the modified unit automatically aligns the focal spot with the grid in less than 10 s, with an accuracy of better than 4 mm. The results of the speed and accuracy tests in clinical use were similar to the results in laboratory tests. Comparison patient chest images are presented--one obtained with a standard mobile radiographic unit without a grid and the other obtained with the modified unit and a 15:1 grid. The 15:1 grid images demonstrate a marked improvement in image quality compared to the nongrid images with no increase in patient dose. Conclusions: The mobile radiography AGAS produces images of significantly improved quality compared to nongrid images with alignment times of less than 10 s and no increase in patient dose.

  4. Collecting and evaluating speech recognition corpora for nine Southern Bantu languages

    CSIR Research Space (South Africa)

    Badenhorst, JAC

    2009-03-01

    Full Text Available The authors describes the Lwazi corpus for automatic speech recognition (ASR), a new telephone speech corpus which includes data from nine Southern Bantu languages. Because of practical constraints, the amount of speech per language is relatively...

  5. Speech Problems

    Science.gov (United States)

    ... Staying Safe Videos for Educators Search English Español Speech Problems KidsHealth / For Teens / Speech Problems What's in ... a person's ability to speak clearly. Some Common Speech and Language Disorders Stuttering is a problem that ...

  6. Speech Recognition on Mobile Devices

    DEFF Research Database (Denmark)

    Tan, Zheng-Hua; Lindberg, Børge

    2010-01-01

    in the mobile context covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within......The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR...

  7. WE-A-17A-10: Fast, Automatic and Accurate Catheter Reconstruction in HDR Brachytherapy Using An Electromagnetic 3D Tracking System

    Energy Technology Data Exchange (ETDEWEB)

    Poulin, E; Racine, E; Beaulieu, L [CHU de Quebec - Universite Laval, Quebec, Quebec (Canada); Binnekamp, D [Integrated Clinical Solutions and Marketing, Philips Healthcare, Best, DA (Netherlands)

    2014-06-15

    Purpose: In high dose rate brachytherapy (HDR-B), actual catheter reconstruction protocols are slow and errors prompt. The purpose of this study was to evaluate the accuracy and robustness of an electromagnetic (EM) tracking system for improved catheter reconstruction in HDR-B protocols. Methods: For this proof-of-principle, a total of 10 catheters were inserted in gelatin phantoms with different trajectories. Catheters were reconstructed using a Philips-design 18G biopsy needle (used as an EM stylet) and the second generation Aurora Planar Field Generator from Northern Digital Inc. The Aurora EM system exploits alternating current technology and generates 3D points at 40 Hz. Phantoms were also scanned using a μCT (GE Healthcare) and Philips Big Bore clinical CT system with a resolution of 0.089 mm and 2 mm, respectively. Reconstructions using the EM stylet were compared to μCT and CT. To assess the robustness of the EM reconstruction, 5 catheters were reconstructed twice and compared. Results: Reconstruction time for one catheter was 10 seconds or less. This would imply that for a typical clinical implant of 17 catheters, the total reconstruction time would be less than 3 minutes. When compared to the μCT, the mean EM tip identification error was 0.69 ± 0.29 mm while the CT error was 1.08 ± 0.67 mm. The mean 3D distance error was found to be 0.92 ± 0.37 mm and 1.74 ± 1.39 mm for the EM and CT, respectively. EM 3D catheter trajectories were found to be significantly more accurate (unpaired t-test, p < 0.05). A mean difference of less than 0.5 mm was found between successive EM reconstructions. Conclusion: The EM reconstruction was found to be faster, more accurate and more robust than the conventional methods used for catheter reconstruction in HDR-B. This approach can be applied to any type of catheters and applicators. We would like to disclose that the equipments, used in this study, is coming from a collaboration with Philips Medical.

  8. WE-A-17A-10: Fast, Automatic and Accurate Catheter Reconstruction in HDR Brachytherapy Using An Electromagnetic 3D Tracking System

    International Nuclear Information System (INIS)

    Poulin, E; Racine, E; Beaulieu, L; Binnekamp, D

    2014-01-01

    Purpose: In high dose rate brachytherapy (HDR-B), actual catheter reconstruction protocols are slow and errors prompt. The purpose of this study was to evaluate the accuracy and robustness of an electromagnetic (EM) tracking system for improved catheter reconstruction in HDR-B protocols. Methods: For this proof-of-principle, a total of 10 catheters were inserted in gelatin phantoms with different trajectories. Catheters were reconstructed using a Philips-design 18G biopsy needle (used as an EM stylet) and the second generation Aurora Planar Field Generator from Northern Digital Inc. The Aurora EM system exploits alternating current technology and generates 3D points at 40 Hz. Phantoms were also scanned using a μCT (GE Healthcare) and Philips Big Bore clinical CT system with a resolution of 0.089 mm and 2 mm, respectively. Reconstructions using the EM stylet were compared to μCT and CT. To assess the robustness of the EM reconstruction, 5 catheters were reconstructed twice and compared. Results: Reconstruction time for one catheter was 10 seconds or less. This would imply that for a typical clinical implant of 17 catheters, the total reconstruction time would be less than 3 minutes. When compared to the μCT, the mean EM tip identification error was 0.69 ± 0.29 mm while the CT error was 1.08 ± 0.67 mm. The mean 3D distance error was found to be 0.92 ± 0.37 mm and 1.74 ± 1.39 mm for the EM and CT, respectively. EM 3D catheter trajectories were found to be significantly more accurate (unpaired t-test, p < 0.05). A mean difference of less than 0.5 mm was found between successive EM reconstructions. Conclusion: The EM reconstruction was found to be faster, more accurate and more robust than the conventional methods used for catheter reconstruction in HDR-B. This approach can be applied to any type of catheters and applicators. We would like to disclose that the equipments, used in this study, is coming from a collaboration with Philips Medical

  9. INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

    Directory of Open Access Journals (Sweden)

    J. SANGEETHA

    2015-02-01

    Full Text Available This paper provides an interface between the machine translation and speech synthesis system for converting English speech to Tamil text in English to Tamil speech to speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation and text to speech synthesis. Many procedures for incorporation of speech recognition and machine translation have been projected. Still speech synthesis system has not yet been measured. In this paper, we focus on integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of speech synthesis, machine translation and the integration of machine translation and speech synthesis components. Here we implement a hybrid machine translation (combination of rule based and statistical machine translation and concatenative syllable based speech synthesis technique. In order to retain the naturalness and intelligibility of synthesized speech Auto Associative Neural Network (AANN prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.

  10. Speech Compression

    Directory of Open Access Journals (Sweden)

    Jerry D. Gibson

    2016-06-01

    Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.

  11. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    Science.gov (United States)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known apriori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detection certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.

  12. A speech-controlled environmental control system for people with severe dysarthria.

    Science.gov (United States)

    Hawley, Mark S; Enderby, Pam; Green, Phil; Cunningham, Stuart; Brownsell, Simon; Carmichael, James; Parker, Mark; Hatzis, Athanassios; O'Neill, Peter; Palmer, Rebecca

    2007-06-01

    Automatic speech recognition (ASR) can provide a rapid means of controlling electronic assistive technology. Off-the-shelf ASR systems function poorly for users with severe dysarthria because of the increased variability of their articulations. We have developed a limited vocabulary speaker dependent speech recognition application which has greater tolerance to variability of speech, coupled with a computerised training package which assists dysarthric speakers to improve the consistency of their vocalisations and provides more data for recogniser training. These applications, and their implementation as the interface for a speech-controlled environmental control system (ECS), are described. The results of field trials to evaluate the training program and the speech-controlled ECS are presented. The user-training phase increased the recognition rate from 88.5% to 95.4% (p<0.001). Recognition rates were good for people with even the most severe dysarthria in everyday usage in the home (mean word recognition rate 86.9%). Speech-controlled ECS were less accurate (mean task completion accuracy 78.6% versus 94.8%) but were faster to use than switch-scanning systems, even taking into account the need to repeat unsuccessful operations (mean task completion time 7.7s versus 16.9s, p<0.001). It is concluded that a speech-controlled ECS is a viable alternative to switch-scanning systems for some people with severe dysarthria and would lead, in many cases, to more efficient control of the home.

  13. Designing speech for a recipient

    DEFF Research Database (Denmark)

    Fischer, Kerstin

    This study asks how speakers adjust their speech to their addressees, focusing on the potential roles of cognitive representations such as partner models, automatic processes such as interactive alignment, and social processes such as interactional negotiation. The nature of addressee orientation......, psycholinguistics and conversation analysis, and offers both overviews of child-directed, foreigner-directed and robot-directed speech and in-depth analyses of the processes involved in adjusting to a communication partner....

  14. Epoch-based analysis of speech signals

    Indian Academy of Sciences (India)

    on speech production characteristics, but also helps in accurate analysis of speech. .... include time delay estimation, speech enhancement from single and multi- ...... log. (. E[k]. ∑K−1 l=0. E[l]. ) ,. (7) where K is the number of samples in the ...

  15. Speech Matters

    DEFF Research Database (Denmark)

    Hasse Jørgensen, Stina

    2011-01-01

    About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011.......About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011....

  16. Speech-to-Speech Relay Service

    Science.gov (United States)

    Consumer Guide Speech to Speech Relay Service Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...

  17. Apraxia of Speech

    Science.gov (United States)

    ... Health Info » Voice, Speech, and Language Apraxia of Speech On this page: What is apraxia of speech? ... about apraxia of speech? What is apraxia of speech? Apraxia of speech (AOS)—also known as acquired ...

  18. Introductory speeches

    International Nuclear Information System (INIS)

    2001-01-01

    This CD is multimedia presentation of programme safety upgrading of Bohunice V1 NPP. This chapter consist of introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering

  19. HMM adaptation for child speech synthesis using ASR data

    CSIR Research Space (South Africa)

    Govender, N

    2015-11-01

    Full Text Available . This paper reports on a feasibility study that was conducted to determine whether it is possible to synthesize good quality child voices using child speech data that was recorded for automatic speech recognition (ASR) purposes. A text-to-speech system...

  20. Recognizing Stress Using Semantics and Modulation of Speech and Gestures

    NARCIS (Netherlands)

    Lefter, I.; Burghouts, G.J.; Rothkrantz, L.J.M.

    2016-01-01

    This paper investigates how speech and gestures convey stress, and how they can be used for automatic stress recognition. As a first step, we look into how humans use speech and gestures to convey stress. In particular, for both speech and gestures, we distinguish between stress conveyed by the

  1. Alternative Speech Communication System for Persons with Severe Speech Disorders

    Science.gov (United States)

    Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

    2009-12-01

    Assistive speech-enabled systems are proposed to help both French and English speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of the Speech Quality (PESQ) value of 5% and more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.

  2. Automatic differentiation bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Corliss, G.F. [comp.

    1992-07-01

    This is a bibliography of work related to automatic differentiation. Automatic differentiation is a technique for the fast, accurate propagation of derivative values using the chain rule. It is neither symbolic nor numeric. Automatic differentiation is a fundamental tool for scientific computation, with applications in optimization, nonlinear equations, nonlinear least squares approximation, stiff ordinary differential equation, partial differential equations, continuation methods, and sensitivity analysis. This report is an updated version of the bibliography which originally appeared in Automatic Differentiation of Algorithms: Theory, Implementation, and Application.

  3. Text as a Supplement to Speech in Young and Older Adults.

    Science.gov (United States)

    Krull, Vidya; Humes, Larry E

    2016-01-01

    The purpose of this experiment was to quantify the contribution of visual text to auditory speech recognition in background noise. Specifically, the authors tested the hypothesis that partially accurate visual text from an automatic speech recognizer could be used successfully to supplement speech understanding in difficult listening conditions in older adults, with normal or impaired hearing. The working hypotheses were based on what is known regarding audiovisual speech perception in the elderly from speechreading literature. We hypothesized that (1) combining auditory and visual text information will result in improved recognition accuracy compared with auditory or visual text information alone, (2) benefit from supplementing speech with visual text (auditory and visual enhancement) in young adults will be greater than that in older adults, and (3) individual differences in performance on perceptual measures would be associated with cognitive abilities. Fifteen young adults with normal hearing, 15 older adults with normal hearing, and 15 older adults with hearing loss participated in this study. All participants completed sentence recognition tasks in auditory-only, text-only, and combined auditory-text conditions. The auditory sentence stimuli were spectrally shaped to restore audibility for the older participants with impaired hearing. All participants also completed various cognitive measures, including measures of working memory, processing speed, verbal comprehension, perceptual and cognitive speed, processing efficiency, inhibition, and the ability to form wholes from parts. Group effects were examined for each of the perceptual and cognitive measures. Audiovisual benefit was calculated relative to performance on auditory- and visual-text only conditions. Finally, the relationship between perceptual measures and other independent measures were examined using principal-component factor analyses, followed by regression analyses. Both young and older adults

  4. Speech production in amplitude-modulated noise

    DEFF Research Database (Denmark)

    Macdonald, Ewen N; Raufer, Stefan

    2013-01-01

    The Lombard effect refers to the phenomenon where talkers automatically increase their level of speech in a noisy environment. While many studies have characterized how the Lombard effect influences different measures of speech production (e.g., F0, spectral tilt, etc.), few have investigated...... the consequences of temporally fluctuating noise. In the present study, 20 talkers produced speech in a variety of noise conditions, including both steady-state and amplitude-modulated white noise. While listening to noise over headphones, talkers produced randomly generated five word sentences. Similar...... of noisy environments and will alter their speech accordingly....

  5. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained to be the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of speech signal getting corrupted by noise, cross-talk and distortion Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand digital transmission is relatively immune to noise, cross-talk and distortion primarily because of the capability to faithfully regenerate digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link Hence from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modem requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often referred to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques that is often interchangeably used with speech coding is the term voice coding. This term is more generic in the sense that the

  6. Speech is Golden

    DEFF Research Database (Denmark)

    Juel Henrichsen, Peter

    2014-01-01

    on the supply side. The present article reports on a new public action strategy which has taken shape in the course of 2013-14. While Denmark is a small language area, our public sector is well organised and has considerable purchasing power. Across this past year, Danish local authorities have organised around......Most of the Danish municipalities are ready to begin to adopt automatic speech recognition, but at the same time remain nervous following a long series of bad business cases in the recent past. Complaints are voiced over costly licences and low service levels, typical effects of a de facto monopoly...... the speech technology challenge, they have formulated a number of joint questions and new requirements to be met by suppliers and have deliberately worked towards formulating tendering material which will allow fair competition. Public researchers have contributed to this work, including the author...

  7. Speech Alarms Pilot Study

    Science.gov (United States)

    Sandor, A.; Moses, H. R.

    2016-01-01

    asked to identify the alert as quickly and as accurately as possible. Reaction time and accuracy were measured. Participants identified speech alerts significantly faster than tone alerts. The HERA study investigated the performance of participants in a flight-like environment. Participants were instructed to complete items on a task list and respond to C&W alerts as they occurred. Reaction time and accuracy were measured to determine if the benefits of speech alarms are still present in an applied setting.

  8. Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

    Science.gov (United States)

    Schafer, Phillip B; Jin, Dezhe Z

    2014-03-01

    Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.

  9. Neural Entrainment to Speech Modulates Speech Intelligibility

    NARCIS (Netherlands)

    Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne

    2018-01-01

    Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and

  10. The speech signal segmentation algorithm using pitch synchronous analysis

    Directory of Open Access Journals (Sweden)

    Amirgaliyev Yedilkhan

    2017-03-01

    Full Text Available Parameterization of the speech signal using the algorithms of analysis synchronized with the pitch frequency is discussed. Speech parameterization is performed by the average number of zero transitions function and the signal energy function. Parameterization results are used to segment the speech signal and to isolate the segments with stable spectral characteristics. Segmentation results can be used to generate a digital voice pattern of a person or be applied in the automatic speech recognition. Stages needed for continuous speech segmentation are described.

  11. Automatic Imitation

    Science.gov (United States)

    Heyes, Cecilia

    2011-01-01

    "Automatic imitation" is a type of stimulus-response compatibility effect in which the topographical features of task-irrelevant action stimuli facilitate similar, and interfere with dissimilar, responses. This article reviews behavioral, neurophysiological, and neuroimaging research on automatic imitation, asking in what sense it is "automatic"…

  12. Speech Research

    Science.gov (United States)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: A biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; Phonetic factors in letter detection; categorical perception; Short-term recall by deaf signers of American sign language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaires; and vowel information in postvocalic frictions.

  13. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-12-01

    Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance.The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience.The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  14. B0-correction and k-means clustering for accurate and automatic identification of regions with reduced apparent diffusion coefficient (ADC) in adva nced cervical cancer at the time of brachytherapy

    DEFF Research Database (Denmark)

    Haack, Søren; Pedersen, Erik Morre; Vinding, Mads Sloth

    in dose planning of radiotherapy. This study evaluates the use of k-means clustering for automatic user independent delineation of regions of reduced apparent diffusion coefficient (ADC) and the value of B0-correction of DW-MRI for reduction of geometrical distortions during dose planning of brachytherapy...

  15. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be ""cleaned"" with digital signal processing tools before it is played out, transmitted, or stored.This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red

  16. Objective voice and speech analysis of persons with chronic hoarseness by prosodic analysis of speech samples.

    Science.gov (United States)

    Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar

    2016-10-01

    Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.

  17. End-to-end visual speech recognition with LSTMS

    NARCIS (Netherlands)

    Petridis, Stavros; Li, Zuwei; Pantic, Maja

    2017-01-01

    Traditional visual speech recognition systems consist of two stages, feature extraction and classification. Recently, several deep learning approaches have been presented which automatically extract features from the mouth images and aim to replace the feature extraction stage. However, research on

  18. Recognizing intentions in infant-directed speech: evidence for universals.

    Science.gov (United States)

    Bryant, Gregory A; Barrett, H Clark

    2007-08-01

    In all languages studied to date, distinct prosodic contours characterize different intention categories of infant-directed (ID) speech. This vocal behavior likely exists universally as a species-typical trait, but little research has examined whether listeners can accurately recognize intentions in ID speech using only vocal cues, without access to semantic information. We recorded native-English-speaking mothers producing four intention categories of utterances (prohibition, approval, comfort, and attention) as both ID and adult-directed (AD) speech, and we then presented the utterances to Shuar adults (South American hunter-horticulturalists). Shuar subjects were able to reliably distinguish ID from AD speech and were able to reliably recognize the intention categories in both types of speech, although performance was significantly better with ID speech. This is the first demonstration that adult listeners in an indigenous, nonindustrialized, and nonliterate culture can accurately infer intentions from both ID speech and AD speech in a language they do not speak.

  19. Speech-enabled Computer-aided Translation

    DEFF Research Database (Denmark)

    Mesa-Lao, Bartolomé

    2014-01-01

    The present study has surveyed post-editor trainees’ views and attitudes before and after the introduction of speech technology as a front end to a computer-aided translation workbench. The aim of the survey was (i) to identify attitudes and perceptions among post-editor trainees before performing...... a post-editing task using automatic speech recognition (ASR); and (ii) to assess the degree to which post-editors’ attitudes and expectations to the use of speech technology changed after actually using it. The survey was based on two questionnaires: the first one administered before the participants...

  20. Speech Intelligibility

    Science.gov (United States)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena like the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, benefit using hearing aids or combinations of these things.

  1. Automated speech quality monitoring tool based on perceptual evaluation

    OpenAIRE

    Vozňák, Miroslav; Rozhon, Jan

    2010-01-01

    The paper deals with a speech quality monitoring tool which we have developed in accordance with PESQ (Perceptual Evaluation of Speech Quality) and is automatically running and calculating the MOS (Mean Opinion Score). Results are stored into database and used in a research project investigating how meteorological conditions influence the speech quality in a GSM network. The meteorological station, which is located in our university campus provides information about a temperature,...

  2. Segmentation, Diarization and Speech Transcription: Surprise Data Unraveled

    NARCIS (Netherlands)

    Huijbregts, M.A.H.

    2008-01-01

    In this thesis, research on large vocabulary continuous speech recognition for unknown audio conditions is presented. For automatic speech recognition systems based on statistical methods, it is important that the conditions of the audio used for training the statistical models match the conditions

  3. Primary progressive aphasia and apraxia of speech.

    Science.gov (United States)

    Jung, Youngsin; Duffy, Joseph R; Josephs, Keith A

    2013-09-01

    Primary progressive aphasia is a neurodegenerative syndrome characterized by progressive language dysfunction. The majority of primary progressive aphasia cases can be classified into three subtypes: nonfluent/agrammatic, semantic, and logopenic variants. Each variant presents with unique clinical features, and is associated with distinctive underlying pathology and neuroimaging findings. Unlike primary progressive aphasia, apraxia of speech is a disorder that involves inaccurate production of sounds secondary to impaired planning or programming of speech movements. Primary progressive apraxia of speech is a neurodegenerative form of apraxia of speech, and it should be distinguished from primary progressive aphasia given its discrete clinicopathological presentation. Recently, there have been substantial advances in our understanding of these speech and language disorders. The clinical, neuroimaging, and histopathological features of primary progressive aphasia and apraxia of speech are reviewed in this article. The distinctions among these disorders for accurate diagnosis are increasingly important from a prognostic and therapeutic standpoint. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

  4. Use of Speech Analyses within a Mobile Application for the Assessment of Cognitive Impairment in Elderly People.

    Science.gov (United States)

    Konig, Alexandra; Satt, Aharon; Sorin, Alex; Hoory, Ran; Derreumaux, Alexandre; David, Renaud; Robert, Phillippe H

    2018-01-01

    Various types of dementia and Mild Cognitive Impairment (MCI) are manifested as irregularities in human speech and language, which have proven to be strong predictors for the disease presence and progress ion. Therefore, automatic speech analytics provided by a mobile application may be a useful tool in providing additional indicators for assessment and detection of early stage dementia and MCI. 165 participants (subjects with subjective cognitive impairment (SCI), MCI patients, Alzheimer's disease (AD) and mixed dementia (MD) patients) were recorded with a mobile application while performing several short vocal cognitive tasks during a regular consultation. These tasks included verbal fluency, picture description, counting down and a free speech task. The voice recordings were processed in two steps: in the first step, vocal markers were extracted using speech signal processing techniques; in the second, the vocal markers were tested to assess their 'power' to distinguish between SCI, MCI, AD and MD. The second step included training automatic classifiers for detecting MCI and AD, based on machine learning methods, and testing the detection accuracy. The fluency and free speech tasks obtain the highest accuracy rates of classifying AD vs. MD vs. MCI vs. SCI. Using the data, we demonstrated classification accuracy as follows: SCI vs. AD = 92% accuracy; SCI vs. MD = 92% accuracy; SCI vs. MCI = 86% accuracy and MCI vs. AD = 86%. Our results indicate the potential value of vocal analytics and the use of a mobile application for accurate automatic differentiation between SCI, MCI and AD. This tool can provide the clinician with meaningful information for assessment and monitoring of people with MCI and AD based on a non-invasive, simple and low-cost method. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  5. Hybrid model decomposition of speech and noise in a radial basis function neural model framework

    DEFF Research Database (Denmark)

    Sørensen, Helge Bjarup Dissing; Hartmann, Uwe

    1994-01-01

    The aim of the paper is to focus on a new approach to automatic speech recognition in noisy environments where the noise has either stationary or non-stationary statistical characteristics. The aim is to perform automatic recognition of speech in the presence of additive car noise. The technique...

  6. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ...-Speech Services for Individuals with Hearing and Speech Disabilities, Report and Order (Order), document...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...

  7. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    Science.gov (United States)

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.

  8. Random Deep Belief Networks for Recognizing Emotions from Speech Signals.

    Science.gov (United States)

    Wen, Guihua; Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang

    2017-01-01

    Now the human emotions can be recognized from speech signals using machine learning methods; however, they are challenged by the lower recognition accuracies in real applications due to lack of the rich representation ability. Deep belief networks (DBN) can automatically discover the multiple levels of representations in speech signals. To make full of its advantages, this paper presents an ensemble of random deep belief networks (RDBN) method for speech emotion recognition. It firstly extracts the low level features of the input speech signal and then applies them to construct lots of random subspaces. Each random subspace is then provided for DBN to yield the higher level features as the input of the classifier to output an emotion label. All outputted emotion labels are then fused through the majority voting to decide the final emotion label for the input speech signal. The conducted experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition.

  9. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    Science.gov (United States)

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    It is a tedious and subjective task to measure severity of a dysarthria by manually evaluating his/her speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signal for a certain word and distinguished speech signal for different words. As an application, it can be used to assess speech quality and forecast speech recognition rate of speech made by an individual dysarthric speaker before actual exhaustive implementation of an automatic speech recognition system for the speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations had been done by comparing its predicted recognition rates with ones predicted by the standard methods called the articulatory and intelligibility tests based on the two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting recognition rate of dysarthric speech. All experiments had been done on speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.

  10. Automatic, accurate, and reproducible segmentation of the brain and cerebro-spinal fluid in T1-weighted volume MRI scans and its application to serial cerebral and intracranial volumetry

    Science.gov (United States)

    Lemieux, Louis

    2001-07-01

    A new fully automatic algorithm for the segmentation of the brain and cerebro-spinal fluid (CSF) from T1-weighted volume MRI scans of the head was specifically developed in the context of serial intra-cranial volumetry. The method is an extension of a previously published brain extraction algorithm. The brain mask is used as a basis for CSF segmentation based on morphological operations, automatic histogram analysis and thresholding. Brain segmentation is then obtained by iterative tracking of the brain-CSF interface. Grey matter (GM), white matter (WM) and CSF volumes are calculated based on a model of intensity probability distribution that includes partial volume effects. Accuracy was assessed using a digital phantom scan. Reproducibility was assessed by segmenting pairs of scans from 20 normal subjects scanned 8 months apart and 11 patients with epilepsy scanned 3.5 years apart. Segmentation accuracy as measured by overlap was 98% for the brain and 96% for the intra-cranial tissues. The volume errors were: total brain (TBV): -1.0%, intra-cranial (ICV):0.1%, CSF: +4.8%. For repeated scans, matching resulted in improved reproducibility. In the controls, the coefficient of reliability (CR) was 1.5% for the TVB and 1.0% for the ICV. In the patients, the Cr for the ICV was 1.2%.

  11. OLIVE: Speech-Based Video Retrieval

    NARCIS (Netherlands)

    de Jong, Franciska M.G.; Gauvain, Jean-Luc; den Hartog, Jurgen; den Hartog, Jeremy; Netter, Klaus

    1999-01-01

    This paper describes the Olive project which aims to support automated indexing of video material by use of human language technologies. Olive is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which serve as the

  12. A distributed approach to speech resource collection

    CSIR Research Space (South Africa)

    Molapo, R

    2013-12-01

    Full Text Available The authors describe the integration of several tools to enable the end-to-end development of an Automatic Speech Recognition system in a typical under-resourced language. The authors analyse the data acquired by each of the tools and develop an ASR...

  13. Audiovisual Discrimination between Laughter and Speech

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audiovisual approach to distinguishing laughter from speech and we show that integrating the information from audio and video leads to an improved reliability of audiovisual approach in

  14. Childhood apraxia of speech: A survey of praxis and typical speech characteristics.

    Science.gov (United States)

    Malmenholt, Ann; Lohmander, Anette; McAllister, Anita

    2017-07-01

    The purpose of this study was to investigate current knowledge of the diagnosis childhood apraxia of speech (CAS) in Sweden and compare speech characteristics and symptoms to those of earlier survey findings in mainly English-speakers. In a web-based questionnaire 178 Swedish speech-language pathologists (SLPs) anonymously answered questions about their perception of typical speech characteristics for CAS. They graded own assessment skills and estimated clinical occurrence. The seven top speech characteristics reported as typical for children with CAS were: inconsistent speech production (85%), sequencing difficulties (71%), oro-motor deficits (63%), vowel errors (62%), voicing errors (61%), consonant cluster deletions (54%), and prosodic disturbance (53%). Motor-programming deficits described as lack of automatization of speech movements were perceived by 82%. All listed characteristics were consistent with the American Speech-Language-Hearing Association (ASHA) consensus-based features, Strand's 10-point checklist, and the diagnostic model proposed by Ozanne. The mode for clinical occurrence was 5%. Number of suspected cases of CAS in the clinical caseload was approximately one new patient/year and SLP. The results support and add to findings from studies of CAS in English-speaking children with similar speech characteristics regarded as typical. Possibly, these findings could contribute to cross-linguistic consensus on CAS characteristics.

  15. Speech disorders - children

    Science.gov (United States)

    ... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...

  16. Cortical activity patterns predict robust speech discrimination ability in noise

    Science.gov (United States)

    Shetake, Jai A.; Wolf, Jordan T.; Cheung, Ryan J.; Engineer, Crystal T.; Ram, Satyananda K.; Kilgard, Michael P.

    2012-01-01

    The neural mechanisms that support speech discrimination in noisy conditions are poorly understood. In quiet conditions, spike timing information appears to be used in the discrimination of speech sounds. In this study, we evaluated the hypothesis that spike timing is also used to distinguish between speech sounds in noisy conditions that significantly degrade neural responses to speech sounds. We tested speech sound discrimination in rats and recorded primary auditory cortex (A1) responses to speech sounds in background noise of different intensities and spectral compositions. Our behavioral results indicate that rats, like humans, are able to accurately discriminate consonant sounds even in the presence of background noise that is as loud as the speech signal. Our neural recordings confirm that speech sounds evoke degraded but detectable responses in noise. Finally, we developed a novel neural classifier that mimics behavioral discrimination. The classifier discriminates between speech sounds by comparing the A1 spatiotemporal activity patterns evoked on single trials with the average spatiotemporal patterns evoked by known sounds. Unlike classifiers in most previous studies, this classifier is not provided with the stimulus onset time. Neural activity analyzed with the use of relative spike timing was well correlated with behavioral speech discrimination in quiet and in noise. Spike timing information integrated over longer intervals was required to accurately predict rat behavioral speech discrimination in noisy conditions. The similarity of neural and behavioral discrimination of speech in noise suggests that humans and rats may employ similar brain mechanisms to solve this problem. PMID:22098331

  17. The impact of exploiting spectro-temporal context in computational speech segregation

    DEFF Research Database (Denmark)

    Bentsen, Thomas; Kressner, Abigail Anne; Dau, Torsten

    2018-01-01

    Computational speech segregation aims to automatically segregate speech from interfering noise, often by employing ideal binary mask estimation. Several studies have tried to exploit contextual information in speech to improve mask estimation accuracy by using two frequently-used strategies that (1...... for measured intelligibility. The findings may have implications for the design of speech segregation systems, and for the selection of a cost function that correlates with intelligibility....

  18. A Research of Speech Emotion Recognition Based on Deep Belief Network and SVM

    Directory of Open Access Journals (Sweden)

    Chenchen Huang

    2014-01-01

    Full Text Available Feature extraction is a very important part in speech emotion recognition, and in allusion to feature extraction in speech emotion recognition problems, this paper proposed a new method of feature extraction, using DBNs in DNN to extract emotional features in speech signal automatically. By training a 5 layers depth DBNs, to extract speech emotion feature and incorporate multiple consecutive frames to form a high dimensional feature. The features after training in DBNs were the input of nonlinear SVM classifier, and finally speech emotion recognition multiple classifier system was achieved. The speech emotion recognition rate of the system reached 86.5%, which was 7% higher than the original method.

  19. Speech Processing.

    Science.gov (United States)

    1983-05-01

    The VDE system developed had the capability of recognizing up to 248 separate words in syntactic structures. 4 The two systems described are isolated...AND SPEAKER RECOGNITION by M.J.Hunt 5 ASSESSMENT OF SPEECH SYSTEMS ’ ..- * . by R.K.Moore 6 A SURVEY OF CURRENT EQUIPMENT AND RESEARCH’ by J.S.Bridle...TECHNOLOGY IN NAVY TRAINING SYSTEMS by R.Breaux, M.Blind and R.Lynchard 10 9 I-I GENERAL REVIEW OF MILITARY APPLICATIONS OF VOICE PROCESSING DR. BRUNO

  20. Speech Recognition

    Directory of Open Access Journals (Sweden)

    Adrian Morariu

    2009-01-01

    Full Text Available This paper presents a method of speech recognition by pattern recognition techniques. Learning consists in determining the unique characteristics of a word (cepstral coefficients by eliminating those characteristics that are different from one word to another. For learning and recognition, the system will build a dictionary of words by determining the characteristics of each word to be used in the recognition. Determining the characteristics of an audio signal consists in the following steps: noise removal, sampling it, applying Hamming window, switching to frequency domain through Fourier transform, calculating the magnitude spectrum, filtering data, determining cepstral coefficients.

  1. Speech emotion recognition methods: A literature review

    Science.gov (United States)

    Basharirad, Babak; Moradhaseli, Mohammadreza

    2017-10-01

    Recently, attention of the emotional speech signals research has been boosted in human machine interfaces due to availability of high computation capability. There are many systems proposed in the literature to identify the emotional state through speech. Selection of suitable feature sets, design of a proper classifications methods and prepare an appropriate dataset are the main key issues of speech emotion recognition systems. This paper critically analyzed the current available approaches of speech emotion recognition methods based on the three evaluating parameters (feature set, classification of features, accurately usage). In addition, this paper also evaluates the performance and limitations of available methods. Furthermore, it highlights the current promising direction for improvement of speech emotion recognition systems.

  2. An analysis of machine translation and speech synthesis in speech-to-speech translation system

    OpenAIRE

    Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.

    2011-01-01

    This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...

  3. Deficits in Letter-Speech Sound Associations but Intact Visual Conflict Processing in Dyslexia: Results from a Novel ERP-Paradigm

    OpenAIRE

    Bakos, Sarolta; Landerl, Karin; Bartling, Jürgen; Schulte-Körne, Gerd; Moll, Kristina

    2017-01-01

    The reading and spelling deficits characteristic of developmental dyslexia (dyslexia) have been related to problems in phonological processing and in learning associations between letters and speech-sounds. Even when children with dyslexia have learned the letters and their corresponding speech sounds, letter-speech sound associations might still be less automatized compared to children with age-adequate literacy skills. In order to examine automaticity in letter-speech sound associations and...

  4. Speech and Language Delay

    Science.gov (United States)

    ... OTC Relief for Diarrhea Home Diseases and Conditions Speech and Language Delay Condition Speech and Language Delay Share Print Table of Contents1. ... Treatment6. Everyday Life7. Questions8. Resources What is a speech and language delay? A speech and language delay ...

  5. Analysis of Feature Extraction Methods for Speaker Dependent Speech Recognition

    Directory of Open Access Journals (Sweden)

    Gurpreet Kaur

    2017-02-01

    Full Text Available Speech recognition is about what is being said, irrespective of who is saying. Speech recognition is a growing field. Major progress is taking place on the technology of automatic speech recognition (ASR. Still, there are lots of barriers in this field in terms of recognition rate, background noise, speaker variability, speaking rate, accent etc. Speech recognition rate mainly depends on the selection of features and feature extraction methods. This paper outlines the feature extraction techniques for speaker dependent speech recognition for isolated words. A brief survey of different feature extraction techniques like Mel-Frequency Cepstral Coefficients (MFCC, Linear Predictive Coding Coefficients (LPCC, Perceptual Linear Prediction (PLP, Relative Spectra Perceptual linear Predictive (RASTA-PLP analysis are presented and evaluation is done. Speech recognition has various applications from daily use to commercial use. We have made a speaker dependent system and this system can be useful in many areas like controlling a patient vehicle using simple commands.

  6. Multi-thread Parallel Speech Recognition for Mobile Applications

    Directory of Open Access Journals (Sweden)

    LOJKA Martin

    2014-05-01

    Full Text Available In this paper, the server based solution of the multi-thread large vocabulary automatic speech recognition engine is described along with the Android OS and HTML5 practical application examples. The basic idea was to bring speech recognition available for full variety of applications for computers and especially for mobile devices. The speech recognition engine should be independent of commercial products and services (where the dictionary could not be modified. Using of third-party services could be also a security and privacy problem in specific applications, when the unsecured audio data could not be sent to uncontrolled environments (voice data transferred to servers around the globe. Using our experience with speech recognition applications, we have been able to construct a multi-thread speech recognition serverbased solution designed for simple applications interface (API to speech recognition engine modified to specific needs of particular application.

  7. The importance of accurate glacier albedo for estimates of surface mass balance on Vatnajökull: evaluating the surface energy budget in a regional climate model with automatic weather station observations

    Science.gov (United States)

    Steffensen Schmidt, Louise; Aðalgeirsdóttir, Guðfinna; Guðmundsson, Sverrir; Langen, Peter L.; Pálsson, Finnur; Mottram, Ruth; Gascoin, Simon; Björnsson, Helgi

    2017-07-01

    A simulation of the surface climate of Vatnajökull ice cap, Iceland, carried out with the regional climate model HIRHAM5 for the period 1980-2014, is used to estimate the evolution of the glacier surface mass balance (SMB). This simulation uses a new snow albedo parameterization that allows albedo to exponentially decay with time and is surface temperature dependent. The albedo scheme utilizes a new background map of the ice albedo created from observed MODIS data. The simulation is evaluated against observed daily values of weather parameters from five automatic weather stations (AWSs) from the period 2001-2014, as well as in situ SMB measurements from the period 1995-2014. The model agrees well with observations at the AWS sites, albeit with a general underestimation of the net radiation. This is due to an underestimation of the incoming radiation and a general overestimation of the albedo. The average modelled albedo is overestimated in the ablation zone, which we attribute to an overestimation of the thickness of the snow layer and not taking the surface darkening from dirt and volcanic ash deposition during dust storms and volcanic eruptions into account. A comparison with the specific summer, winter, and net mass balance for the whole of Vatnajökull (1995-2014) shows a good overall fit during the summer, with a small mass balance underestimation of 0.04 m w.e. on average, whereas the winter mass balance is overestimated by on average 0.5 m w.e. due to too large precipitation at the highest areas of the ice cap. A simple correction of the accumulation at the highest points of the glacier reduces this to 0.15 m w.e. Here, we use HIRHAM5 to simulate the evolution of the SMB of Vatnajökull for the period 1981-2014 and show that the model provides a reasonable representation of the SMB for this period. However, a major source of uncertainty in the representation of the SMB is the representation of the albedo, and processes currently not accounted for in RCMs

  8. The importance of accurate glacier albedo for estimates of surface mass balance on Vatnajökull: evaluating the surface energy budget in a regional climate model with automatic weather station observations

    Directory of Open Access Journals (Sweden)

    L. S. Schmidt

    2017-07-01

    Full Text Available A simulation of the surface climate of Vatnajökull ice cap, Iceland, carried out with the regional climate model HIRHAM5 for the period 1980–2014, is used to estimate the evolution of the glacier surface mass balance (SMB. This simulation uses a new snow albedo parameterization that allows albedo to exponentially decay with time and is surface temperature dependent. The albedo scheme utilizes a new background map of the ice albedo created from observed MODIS data. The simulation is evaluated against observed daily values of weather parameters from five automatic weather stations (AWSs from the period 2001–2014, as well as in situ SMB measurements from the period 1995–2014. The model agrees well with observations at the AWS sites, albeit with a general underestimation of the net radiation. This is due to an underestimation of the incoming radiation and a general overestimation of the albedo. The average modelled albedo is overestimated in the ablation zone, which we attribute to an overestimation of the thickness of the snow layer and not taking the surface darkening from dirt and volcanic ash deposition during dust storms and volcanic eruptions into account. A comparison with the specific summer, winter, and net mass balance for the whole of Vatnajökull (1995–2014 shows a good overall fit during the summer, with a small mass balance underestimation of 0.04 m w.e. on average, whereas the winter mass balance is overestimated by on average 0.5 m w.e. due to too large precipitation at the highest areas of the ice cap. A simple correction of the accumulation at the highest points of the glacier reduces this to 0.15 m w.e. Here, we use HIRHAM5 to simulate the evolution of the SMB of Vatnajökull for the period 1981–2014 and show that the model provides a reasonable representation of the SMB for this period. However, a major source of uncertainty in the representation of the SMB is the representation of the albedo, and processes

  9. Methods and Application of Phonetic Label Alignment in Speech Processing Tasks

    Directory of Open Access Journals (Sweden)

    M. Myslivec

    2000-12-01

    Full Text Available The paper deals with the problem of automatic phonetic segmentation ofspeech signals, namely for speech analysis and recognition purposes.Several methods and approaches are described and evaluated from thepoint of view of their accuracy. A complete instruction for creating anannotated database for training a Czech speech recognition system isprovided together with the authors' own experience. The results of thework have found practical applications, for example, in developing atool for semi-automatic speech segmentation, building alarge-vocabulary phoneme-based speech recognition system and designingan aid for learning and practicing pronunciation of words or phrases inthe native or a foreign language.

  10. The speech perception skills of children with and without speech sound disorder.

    Science.gov (United States)

    Hearnshaw, Stephanie; Baker, Elise; Munro, Natalie

    To investigate whether Australian-English speaking children with and without speech sound disorder (SSD) differ in their overall speech perception accuracy. Additionally, to investigate differences in the perception of specific phonemes and the association between speech perception and speech production skills. Twenty-five Australian-English speaking children aged 48-60 months participated in this study. The SSD group included 12 children and the typically developing (TD) group included 13 children. Children completed routine speech and language assessments in addition to an experimental Australian-English lexical and phonetic judgement task based on Rvachew's Speech Assessment and Interactive Learning System (SAILS) program (Rvachew, 2009). This task included eight words across four word-initial phonemes-/k, ɹ, ʃ, s/. Children with SSD showed significantly poorer perceptual accuracy on the lexical and phonetic judgement task compared with TD peers. The phonemes /ɹ/ and /s/ were most frequently perceived in error across both groups. Additionally, the phoneme /ɹ/ was most commonly produced in error. There was also a positive correlation between overall speech perception and speech production scores. Children with SSD perceived speech less accurately than their typically developing peers. The findings suggest that an Australian-English variation of a lexical and phonetic judgement task similar to the SAILS program is promising and worthy of a larger scale study. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Segmental intelligibility of synthetic speech produced by rule.

    Science.gov (United States)

    Logan, J S; Greene, B G; Pisoni, D B

    1989-08-01

    This paper reports the results of an investigation that employed the modified rhyme test (MRT) to measure the segmental intelligibility of synthetic speech generated automatically by rule. Synthetic speech produced by ten text-to-speech systems was studied and compared to natural speech. A variation of the standard MRT was also used to study the effects of response set size on perceptual confusions. Results indicated that the segmental intelligibility scores formed a continuum. Several systems displayed very high levels of performance that were close to or equal to scores obtained with natural speech; other systems displayed substantially worse performance compared to natural speech. The overall performance of the best system, DECtalk--Paul, was equivalent to the data obtained with natural speech for consonants in syllable-initial position. The findings from this study are discussed in terms of the use of a set of standardized procedures for measuring intelligibility of synthetic speech under controlled laboratory conditions. Recent work investigating the perception of synthetic speech under more severe conditions in which greater demands are made on the listener's processing resources is also considered. The wide range of intelligibility scores obtained in the present study demonstrates important differences in perception and suggests that not all synthetic speech is perceptually equivalent to the listener.

  12. Segmental intelligibility of synthetic speech produced by rule

    Science.gov (United States)

    Logan, John S.; Greene, Beth G.; Pisoni, David B.

    2012-01-01

    This paper reports the results of an investigation that employed the modified rhyme test (MRT) to measure the segmental intelligibility of synthetic speech generated automatically by rule. Synthetic speech produced by ten text-to-speech systems was studied and compared to natural speech. A variation of the standard MRT was also used to study the effects of response set size on perceptual confusions. Results indicated that the segmental intelligibility scores formed a continuum. Several systems displayed very high levels of performance that were close to or equal to scores obtained with natural speech; other systems displayed substantially worse performance compared to natural speech. The overall performance of the best system, DECtalk—Paul, was equivalent to the data obtained with natural speech for consonants in syllable-initial position. The findings from this study are discussed in terms of the use of a set of standardized procedures for measuring intelligibility of synthetic speech under controlled laboratory conditions. Recent work investigating the perception of synthetic speech under more severe conditions in which greater demands are made on the listener’s processing resources is also considered. The wide range of intelligibility scores obtained in the present study demonstrates important differences in perception and suggests that not all synthetic speech is perceptually equivalent to the listener. PMID:2527884

  13. The role of visual spatial attention in audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias; Tiippana, K.; Laarni, J.

    2009-01-01

    Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive b......Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre...... from each of the faces and from the voice on the auditory speech percept. We found that directing visual spatial attention towards a face increased the influence of that face on auditory perception. However, the influence of the voice on auditory perception did not change suggesting that audiovisual...... integration did not change. Visual spatial attention was also able to select between the faces when lip reading. This suggests that visual spatial attention acts at the level of visual speech perception prior to audiovisual integration and that the effect propagates through audiovisual integration...

  14. Predicting speech intelligibility in adverse conditions: evaluation of the speech-based envelope power spectrum model

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2011-01-01

    conditions by comparing predictions to measured data from [Kjems et al. (2009). J. Acoust. Soc. Am. 126 (3), 1415-1426] where speech is mixed with four different interferers, including speech-shaped noise, bottle noise, car noise, and cafe noise. The model accounts well for the differences in intelligibility......The speech-based envelope power spectrum model (sEPSM) [Jørgensen and Dau (2011). J. Acoust. Soc. Am., 130 (3), 1475–1487] estimates the envelope signal-to-noise ratio (SNRenv) of distorted speech and accurately describes the speech recognition thresholds (SRT) for normal-hearing listeners...... observed for the different interferers. None of the standardized models successfully describe these data....

  15. Filled pause refinement based on the pronunciation probability for lecture speech.

    Directory of Open Access Journals (Sweden)

    Yan-Hua Long

    Full Text Available Nowadays, although automatic speech recognition has become quite proficient in recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies remains problematic, such as spontaneous conversational and lecture speech. Filled pauses (FPs are the most frequently occurring disfluencies in this type of speech. Most recent studies have shown that FPs are widely believed to increase the error rates for state-of-the-art speech transcription, primarily because most FPs are not well annotated or provided in training data transcriptions and because of the similarities in acoustic characteristics between FPs and some common non-content words. To enhance the speech transcription system, we propose a new automatic refinement approach to detect FPs in British English lecture speech transcription. This approach combines the pronunciation probabilities for each word in the dictionary and acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, in which only imperfect training transcriptions are available. Successful results are achieved for both the development and evaluation datasets. Acoustic models trained on different styles of speech genres have been investigated with respect to FP refinement. To further validate the effectiveness of the proposed approach, speech transcription performance has also been examined using systems built on training data transcriptions with and without FP refinement.

  16. Deep Recurrent Convolutional Neural Network: Improving Performance For Speech Recognition

    OpenAIRE

    Zhang, Zewang; Sun, Zheng; Liu, Jiaqi; Chen, Jingwen; Huo, Zhao; Zhang, Xiao

    2016-01-01

    A deep learning approach has been widely applied in sequence modeling problems. In terms of automatic speech recognition (ASR), its performance has significantly been improved by increasing large speech corpus and deeper neural network. Especially, recurrent neural network and deep convolutional neural network have been applied in ASR successfully. Given the arising problem of training speed, we build a novel deep recurrent convolutional network for acoustic modeling and then apply deep resid...

  17. Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

    Directory of Open Access Journals (Sweden)

    M. Bashirpour

    2016-09-01

    Full Text Available Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC in a speech emotion recognition system. We investigate its performance in emotion recognition using clean and noisy speech materials and compare it with the performances of the well-known MFCC, LPCC, RASTA-PLP, and also TEMFCC features. Speech samples are extracted from the Berlin emotional speech database (Emo DB and Persian emotional speech database (Persian ESD which are corrupted with 4 different noise types under various SNR levels. The experiments are conducted in clean train/noisy test scenarios to simulate practical conditions with noise sources. Simulation results show that higher recognition rates are achieved for PNCC as compared with the conventional features under noisy conditions.

  18. Accurate activity recognition in a home setting

    NARCIS (Netherlands)

    van Kasteren, T.; Noulas, A.; Englebienne, G.; Kröse, B.

    2008-01-01

    A sensor system capable of automatically recognizing activities would allow many potential ubiquitous applications. In this paper, we present an easy to install sensor network and an accurate but inexpensive annotation method. A recorded dataset consisting of 28 days of sensor data and its

  19. Speech and Communication Disorders

    Science.gov (United States)

    ... to being completely unable to speak or understand speech. Causes include Hearing disorders and deafness Voice problems, ... or those caused by cleft lip or palate Speech problems like stuttering Developmental disabilities Learning disorders Autism ...

  20. THE BASIS FOR SPEECH PREVENTION

    Directory of Open Access Journals (Sweden)

    Jordan JORDANOVSKI

    1997-06-01

    Full Text Available The speech is a tool for accurate communication of ideas. When we talk about speech prevention as a practical realization of the language, we are referring to the fact that it should be comprised of the elements of the criteria as viewed from the perspective of the standards. This criteria, in the broad sense of the word, presupposes an exact realization of the thought expressed between the speaker and the recipient.The absence of this criterion catches the eye through the practical realization of the language and brings forth consequences, often hidden very deeply in the human psyche. Their outer manifestation already represents a delayed reaction of the social environment. The foundation for overcoming and standardization of this phenomenon must be the anatomy-physiological patterns of the body, accomplished through methods in concordance with the nature of the body.

  1. ACOUSTIC SPEECH RECOGNITION FOR MARATHI LANGUAGE USING SPHINX

    Directory of Open Access Journals (Sweden)

    Aman Ankit

    2016-09-01

    Full Text Available Speech recognition or speech to text processing, is a process of recognizing human speech by the computer and converting into text. In speech recognition, transcripts are created by taking recordings of speech as audio and their text transcriptions. Speech based applications which include Natural Language Processing (NLP techniques are popular and an active area of research. Input to such applications is in natural language and output is obtained in natural language. Speech recognition mostly revolves around three approaches namely Acoustic phonetic approach, Pattern recognition approach and Artificial intelligence approach. Creation of acoustic model requires a large database of speech and training algorithms. The output of an ASR system is recognition and translation of spoken language into text by computers and computerized devices. ASR today finds enormous application in tasks that require human machine interfaces like, voice dialing, and etc. Our key contribution in this paper is to create corpora for Marathi language and explore the use of Sphinx engine for automatic speech recognition

  2. Free Speech Yearbook 1978.

    Science.gov (United States)

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  3. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    Directory of Open Access Journals (Sweden)

    Heracleous Panikos

    2007-01-01

    Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible speech, but also very quietly uttered speech (nonaudible murmur. As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc. for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved for a 20 k dictation task a word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.

  4. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    Directory of Open Access Journals (Sweden)

    Hiroshi Saruwatari

    2007-01-01

    Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible speech, but also very quietly uttered speech (nonaudible murmur. As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc. for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved for a 20 k dictation task a 93.9% word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.

  5. Adapting Speech Recognition in Augmented Reality for Mobile Devices in Outdoor Environments

    OpenAIRE

    Pascoal, Rui; Ribeiro, Ricardo; Batista, Fernando; de Almeida, Ana

    2017-01-01

    This paper describes the process of integrating automatic speech recognition (ASR) into a mobile application and explores the benefits and challenges of integrating speech with augmented reality (AR) in outdoor environments. The augmented reality allows end-users to interact with the information displayed and perform tasks, while increasing the user’s perception about the real world by adding virtual information to it. Speech is the most natural way of communication: it allows hands-free inte...

  6. Unobtrusive multimodal emotion detection in adaptive interfaces: speech and facial expressions

    NARCIS (Netherlands)

    Truong, K.P.; Leeuwen, D.A. van; Neerincx, M.A.

    2007-01-01

    Two unobtrusive modalities for automatic emotion recognition are discussed: speech and facial expressions. First, an overview is given of emotion recognition studies based on a combination of speech and facial expressions. We will identify difficulties concerning data collection, data fusion, system

  7. Speech in spinocerebellar ataxia.

    Science.gov (United States)

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.

  8. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  9. Low- and high-frequency cortical brain oscillations reflect dissociable mechanisms of concurrent speech segregation in noise.

    Science.gov (United States)

    Yellamsetty, Anusha; Bidelman, Gavin M

    2018-04-01

    Parsing simultaneous speech requires listeners use pitch-guided segregation which can be affected by the signal-to-noise ratio (SNR) in the auditory scene. The interaction of these two cues may occur at multiple levels within the cortex. The aims of the current study were to assess the correspondence between oscillatory brain rhythms and determine how listeners exploit pitch and SNR cues to successfully segregate concurrent speech. We recorded electrical brain activity while participants heard double-vowel stimuli whose fundamental frequencies (F0s) differed by zero or four semitones (STs) presented in either clean or noise-degraded (+5 dB SNR) conditions. We found that behavioral identification was more accurate for vowel mixtures with larger pitch separations but F0 benefit interacted with noise. Time-frequency analysis decomposed the EEG into different spectrotemporal frequency bands. Low-frequency (θ, β) responses were elevated when speech did not contain pitch cues (0ST > 4ST) or was noisy, suggesting a correlate of increased listening effort and/or memory demands. Contrastively, γ power increments were observed for changes in both pitch (0ST > 4ST) and SNR (clean > noise), suggesting high-frequency bands carry information related to acoustic features and the quality of speech representations. Brain-behavior associations corroborated these effects; modulations in low-frequency rhythms predicted the speed of listeners' perceptual decisions with higher bands predicting identification accuracy. Results are consistent with the notion that neural oscillations reflect both automatic (pre-perceptual) and controlled (post-perceptual) mechanisms of speech processing that are largely divisible into high- and low-frequency bands of human brain rhythms. Copyright © 2018 Elsevier B.V. All rights reserved.

  10. Automatic lip reading by using multimodal visual features

    Science.gov (United States)

    Takahashi, Shohei; Ohya, Jun

    2013-12-01

    Since long time ago, speech recognition has been researched, though it does not work well in noisy places such as in the car or in the train. In addition, people with hearing-impaired or difficulties in hearing cannot receive benefits from speech recognition. To recognize the speech automatically, visual information is also important. People understand speeches from not only audio information, but also visual information such as temporal changes in the lip shape. A vision based speech recognition method could work well in noisy places, and could be useful also for people with hearing disabilities. In this paper, we propose an automatic lip-reading method for recognizing the speech by using multimodal visual information without using any audio information such as speech recognition. First, the ASM (Active Shape Model) is used to track and detect the face and lip in a video sequence. Second, the shape, optical flow and spatial frequencies of the lip features are extracted from the lip detected by ASM. Next, the extracted multimodal features are ordered chronologically so that Support Vector Machine is performed in order to learn and classify the spoken words. Experiments for classifying several words show promising results of this proposed method.

  11. Speech rate in Parkinson's disease: A controlled study.

    Science.gov (United States)

    Martínez-Sánchez, F; Meilán, J J G; Carro, J; Gómez Íñiguez, C; Millian-Morell, L; Pujante Valverde, I M; López-Alburquerque, T; López, D E

    2016-09-01

    Speech disturbances will affect most patients with Parkinson's disease (PD) over the course of the disease. The origin and severity of these symptoms are of clinical and diagnostic interest. To evaluate the clinical pattern of speech impairment in PD patients and identify significant differences in speech rate and articulation compared to control subjects. Speech rate and articulation in a reading task were measured using an automatic analytical method. A total of 39 PD patients in the 'on' state and 45 age-and sex-matched asymptomatic controls participated in the study. None of the patients experienced dyskinesias or motor fluctuations during the test. The patients with PD displayed a significant reduction in speech and articulation rates; there were no significant correlations between the studied speech parameters and patient characteristics such as L-dopa dose, duration of the disorder, age, and UPDRS III scores and Hoehn & Yahr scales. Patients with PD show a characteristic pattern of declining speech rate. These results suggest that in PD, disfluencies are the result of the movement disorder affecting the physiology of speech production systems. Copyright © 2014 Sociedad Española de Neurología. Publicado por Elsevier España, S.L.U. All rights reserved.

  12. Automatic phoneme category selectivity in the dorsal auditory stream.

    Science.gov (United States)

    Chevillet, Mark A; Jiang, Xiong; Rauschecker, Josef P; Riesenhuber, Maximilian

    2013-03-20

    Debates about motor theories of speech perception have recently been reignited by a burst of reports implicating premotor cortex (PMC) in speech perception. Often, however, these debates conflate perceptual and decision processes. Evidence that PMC activity correlates with task difficulty and subject performance suggests that PMC might be recruited, in certain cases, to facilitate category judgments about speech sounds (rather than speech perception, which involves decoding of sounds). However, it remains unclear whether PMC does, indeed, exhibit neural selectivity that is relevant for speech decisions. Further, it is unknown whether PMC activity in such cases reflects input via the dorsal or ventral auditory pathway, and whether PMC processing of speech is automatic or task-dependent. In a novel modified categorization paradigm, we presented human subjects with paired speech sounds from a phonetic continuum but diverted their attention from phoneme category using a challenging dichotic listening task. Using fMRI rapid adaptation to probe neural selectivity, we observed acoustic-phonetic selectivity in left anterior and left posterior auditory cortical regions. Conversely, we observed phoneme-category selectivity in left PMC that correlated with explicit phoneme-categorization performance measured after scanning, suggesting that PMC recruitment can account for performance on phoneme-categorization tasks. Structural equation modeling revealed connectivity from posterior, but not anterior, auditory cortex to PMC, suggesting a dorsal route for auditory input to PMC. Our results provide evidence for an account of speech processing in which the dorsal stream mediates automatic sensorimotor integration of speech and may be recruited to support speech decision tasks.

  13. Speech fluency profile on different tasks for individuals with Parkinson's disease.

    Science.gov (United States)

    Juste, Fabiola Staróbole; Andrade, Claudia Regina Furquim de

    2017-07-20

    To characterize the speech fluency profile of patients with Parkinson's disease. Study participants were 40 individuals of both genders aged 40 to 80 years divided into 2 groups: Research Group - RG (20 individuals with diagnosis of Parkinson's disease) and Control Group - CG (20 individuals with no communication or neurological disorders). For all of the participants, three speech samples involving different tasks were collected: monologue, individual reading, and automatic speech. The RG presented a significant larger number of speech disruptions, both stuttering-like and typical dysfluencies, and higher percentage of speech discontinuity in the monologue and individual reading tasks compared with the CG. Both groups presented reduced number of speech disruptions (stuttering-like and typical dysfluencies) in the automatic speech task; the groups presented similar performance in this task. Regarding speech rate, individuals in the RG presented lower number of words and syllables per minute compared with those in the CG in all speech tasks. Participants of the RG presented altered parameters of speech fluency compared with those of the CG; however, this change in fluency cannot be considered a stuttering disorder.

  14. Connected digit speech recognition system for Malayalam language

    Indian Academy of Sciences (India)

    A connected digit speech recognition is important in many applications such as automated banking system, catalogue-dialing, automatic data entry, automated banking system, etc. This paper presents an optimum speaker-independent connected digit recognizer for Malayalam language. The system employs Perceptual ...

  15. Term clouds as surrogates for user generated speech

    NARCIS (Netherlands)

    Tsagkias, M.; Larson, M.; de Rijke, M.; Myaeng, S.-H.; Oard, D.W.; Sebastiani, F.; Chua, T.-S.; Leong, M.-K.

    2008-01-01

    User generated spoken audio remains a challenge for Automatic Speech Recognition (ASR) technology and content-based audio surrogates derived from ASR-transcripts must be error robust. An investigation of the use of term clouds as surrogates for podcasts demonstrates that ASR term clouds closely

  16. Enhanced multimedia content access and exploitation using semantic speech retrieval

    NARCIS (Netherlands)

    Ordelman, Roeland J.F.; de Jong, Franciska M.G.; Larson, Martha

    Techniques for automatic annotation of spoken content making use of speech recognition technology have long been characterized as holding unrealized promise to provide access to archives inundated with undisclosed multimedia material. This paper provides an overview of techniques and trends in

  17. Automated Speech and Audio Analysis for Semantic Access to Multimedia

    NARCIS (Netherlands)

    Jong, F.M.G. de; Ordelman, R.; Huijbregts, M.

    2006-01-01

    The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content, and as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to

  18. Automated speech and audio analysis for semantic access to multimedia

    NARCIS (Netherlands)

    de Jong, Franciska M.G.; Ordelman, Roeland J.F.; Huijbregts, M.A.H.; Avrithis, Y.; Kompatsiaris, Y.; Staab, S.; O' Connor, N.E.

    2006-01-01

    The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content, and as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to

  19. Speech Alarms Pilot Study

    Science.gov (United States)

    Sandor, Aniko; Moses, Haifa

    2016-01-01

    Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.

  20. Ear, Hearing and Speech

    DEFF Research Database (Denmark)

    Poulsen, Torben

    2000-01-01

    An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)......An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)...

  1. Principles of speech coding

    CERN Document Server

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication-including voice-will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networksOffering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the

  2. Speech disorder prevention

    Directory of Open Access Journals (Sweden)

    Miladis Fornaris-Méndez

    2017-04-01

    Full Text Available Language therapy has trafficked from a medical focus until a preventive focus. However, difficulties are evidenced in the development of this last task, because he is devoted bigger space to the correction of the disorders of the language. Because the speech disorders is the dysfunction with more frequently appearance, acquires special importance the preventive work that is developed to avoid its appearance. Speech education since early age of the childhood makes work easier for prevent the appearance of speech disorders in the children. The present work has as objective to offer different activities for the prevention of the speech disorders.

  3. Ultrasound applicability in Speech Language Pathology and Audiology.

    Science.gov (United States)

    Barberena, Luciana da Silva; Brasil, Brunah de Castro; Melo, Roberta Michelon; Mezzomo, Carolina Lisbôa; Mota, Helena Bolli; Keske-Soares, Márcia

    2014-01-01

    To present recent studies that used the ultrasound in the fields of Speech Language Pathology and Audiology, which evidence possibilities of the applicability of this technique in different subareas. A bibliographic research was carried out in the PubMed database, using the keywords "ultrasonic," "speech," "phonetics," "Speech, Language and Hearing Sciences," "voice," "deglutition," and "myofunctional therapy," comprising some areas of Speech Language Pathology and Audiology Sciences. The keywords "ultrasound," "ultrasonography," "swallow," "orofacial myofunctional therapy," and "orofacial myology" were also used in the search. Studies in humans from the past 5 years were selected. In the preselection, duplicated studies, articles not fully available, and those that did not present direct relation between ultrasound and Speech Language Pathology and Audiology Sciences were discarded. The data were analyzed descriptively and classified subareas of Speech Language Pathology and Audiology Sciences. The following items were considered: purposes, participants, procedures, and results. We selected 12 articles for ultrasound versus speech/phonetics subarea, 5 for ultrasound versus voice, 1 for ultrasound versus muscles of mastication, and 10 for ultrasound versus swallow. Studies relating "ultrasound" and "Speech Language Pathology and Audiology Sciences" in the past 5 years were not found. Different studies on the use of ultrasound in Speech Language Pathology and Audiology Sciences were found. Each of them, according to its purpose, confirms new possibilities of the use of this instrument in the several subareas, aiming at a more accurate diagnosis and new evaluative and therapeutic possibilities.

  4. SPEECH DELAY IN THE PRACTICE OF A PAEDIATRICIAN AND CHILD’S NEUROLOGIST

    Directory of Open Access Journals (Sweden)

    N. N. Zavadenko

    2015-01-01

    Full Text Available The article describes the main clinical forms and causes of speech delay in children. It presents modern data on the role of neurobiological factors in the speech delay pathogenesis, including early organic damage to the central nervous system due to the pregnancy and childbirth pathology, as well as genetic mechanisms. For early and accurate diagnosis of speech disorders in children, you need to consider normal patterns of speech development. The article presents indicators of pre-speech and speech development in children and describes the screening method for determining the speech delay. The main areas of complex correction are speech therapy, psycho-pedagogical and psychotherapeutic assistance, as well as pharmaceutical treatment. The capabilities of drug therapy for dysphasia (alalia are shown. 

  5. 'What is it?' A functional MRI and SPECT study of ictal speech in a second language

    International Nuclear Information System (INIS)

    Navarro, V.; Chauvire, V.; Baulac, M.; Cohen, L.; Delmaire, Ch.; Lehericy, St.; Habert, M.O.; Footnick, R.; Pallier, Ch.; Baulac, M.; Cohen, L.

    2009-01-01

    Neuronal networks involved in second language (L2) processing vary between normal subjects. Patients with epilepsy may have ictal speech automatisms in their second language. To delineate the brain systems involved in L2 ictal speech, we combined functional MRI during bilingual tasks and ictal - inter-ictal single-photon emission computed tomography in a patient who presented L2 ictal speech productions. These analyses showed that the networks activated by the seizure and those activated by L2 processing intersected in the right hippocampus. These results may provide some insights both into the pathophysiology of ictal speech and into the brain organization for L2. (authors)

  6. Analysis of vocal signal in its amplitude - time representation. speech synthesis-by-rules

    International Nuclear Information System (INIS)

    Rodet, Xavier

    1977-01-01

    In the first part of this dissertation, the natural speech production and the resulting acoustic waveform are examined under various aspects: communication, phonetics, frequency and temporal analysis. Our own study of direct signal is compared to other researches in these different fields, and fundamental features of vocal signals are described. The second part deals with the numerous methods already used for automatic text-to-speech synthesis. In the last part, we expose the new speech synthesis-by-rule methods that we have worked out, and we present in details the structure of the real-time speech synthesiser that we have implemented on a mini-computer. (author) [fr

  7. Speech and language support: How physicians can identify and treat speech and language delays in the office setting.

    Science.gov (United States)

    Moharir, Madhavi; Barnett, Noel; Taras, Jillian; Cole, Martha; Ford-Jones, E Lee; Levin, Leo

    2014-01-01

    Failure to recognize and intervene early in speech and language delays can lead to multifaceted and potentially severe consequences for early child development and later literacy skills. While routine evaluations of speech and language during well-child visits are recommended, there is no standardized (office) approach to facilitate this. Furthermore, extensive wait times for speech and language pathology consultation represent valuable lost time for the child and family. Using speech and language expertise, and paediatric collaboration, key content for an office-based tool was developed. early and accurate identification of speech and language delays as well as children at risk for literacy challenges; appropriate referral to speech and language services when required; and teaching and, thus, empowering parents to create rich and responsive language environments at home. Using this tool, in combination with the Canadian Paediatric Society's Read, Speak, Sing and Grow Literacy Initiative, physicians will be better positioned to offer practical strategies to caregivers to enhance children's speech and language capabilities. The tool represents a strategy to evaluate speech and language delays. It depicts age-specific linguistic/phonetic milestones and suggests interventions. The tool represents a practical interim treatment while the family is waiting for formal speech and language therapy consultation.

  8. Speech and language support: How physicians can identify and treat speech and language delays in the office setting

    Science.gov (United States)

    Moharir, Madhavi; Barnett, Noel; Taras, Jillian; Cole, Martha; Ford-Jones, E Lee; Levin, Leo

    2014-01-01

    Failure to recognize and intervene early in speech and language delays can lead to multifaceted and potentially severe consequences for early child development and later literacy skills. While routine evaluations of speech and language during well-child visits are recommended, there is no standardized (office) approach to facilitate this. Furthermore, extensive wait times for speech and language pathology consultation represent valuable lost time for the child and family. Using speech and language expertise, and paediatric collaboration, key content for an office-based tool was developed. The tool aimed to help physicians achieve three main goals: early and accurate identification of speech and language delays as well as children at risk for literacy challenges; appropriate referral to speech and language services when required; and teaching and, thus, empowering parents to create rich and responsive language environments at home. Using this tool, in combination with the Canadian Paediatric Society’s Read, Speak, Sing and Grow Literacy Initiative, physicians will be better positioned to offer practical strategies to caregivers to enhance children’s speech and language capabilities. The tool represents a strategy to evaluate speech and language delays. It depicts age-specific linguistic/phonetic milestones and suggests interventions. The tool represents a practical interim treatment while the family is waiting for formal speech and language therapy consultation. PMID:24627648

  9. Collective speech acts

    NARCIS (Netherlands)

    Meijers, A.W.M.; Tsohatzidis, S.L.

    2007-01-01

    From its early development in the 1960s, speech act theory always had an individualistic orientation. It focused exclusively on speech acts performed by individual agents. Paradigmatic examples are ‘I promise that p’, ‘I order that p’, and ‘I declare that p’. There is a single speaker and a single

  10. Private Speech in Ballet

    Science.gov (United States)

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  11. Free Speech Yearbook 1980.

    Science.gov (United States)

    Kane, Peter E., Ed.

    The 11 articles in this collection deal with theoretical and practical freedom of speech issues. The topics covered are (1) the United States Supreme Court and communication theory; (2) truth, knowledge, and a democratic respect for diversity; (3) denial of freedom of speech in Jock Yablonski's campaign for the presidency of the United Mine…

  12. Illustrated Speech Anatomy.

    Science.gov (United States)

    Shearer, William M.

    Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…

  13. Free Speech. No. 38.

    Science.gov (United States)

    Kane, Peter E., Ed.

    This issue of "Free Speech" contains the following articles: "Daniel Schoor Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tome Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds…

  14. Tools for the assessment of childhood apraxia of speech.

    Science.gov (United States)

    Gubiani, Marileda Barichello; Pagliarin, Karina Carlesso; Keske-Soares, Marcia

    2015-01-01

    This study systematically reviews the literature on the main tools used to evaluate childhood apraxia of speech (CAS). The search strategy includes Scopus, PubMed, and Embase databases. Empirical studies that used tools for assessing CAS were selected. Articles were selected by two independent researchers. The search retrieved 695 articles, out of which 12 were included in the study. Five tools were identified: Verbal Motor Production Assessment for Children, Dynamic Evaluation of Motor Speech Skill, The Orofacial Praxis Test, Kaufman Speech Praxis Test for Children, and Madison Speech Assessment Protocol. There are few instruments available for CAS assessment and most of them are intended to assess praxis and/or orofacial movements, sequences of orofacial movements, articulation of syllables and phonemes, spontaneous speech, and prosody. There are some tests for assessment and diagnosis of CAS. However, few studies on this topic have been conducted at the national level, as well as protocols to assess and assist in an accurate diagnosis.

  15. Automatic Online Lecture Highlighting Based on Multimedia Analysis

    Science.gov (United States)

    Che, Xiaoyin; Yang, Haojin; Meinel, Christoph

    2018-01-01

    Textbook highlighting is widely considered to be beneficial for students. In this paper, we propose a comprehensive solution to highlight the online lecture videos in both sentence- and segment-level, just as is done with paper books. The solution is based on automatic analysis of multimedia lecture materials, such as speeches, transcripts, and…

  16. Random Deep Belief Networks for Recognizing Emotions from Speech Signals

    Directory of Open Access Journals (Sweden)

    Guihua Wen

    2017-01-01

    Full Text Available Now the human emotions can be recognized from speech signals using machine learning methods; however, they are challenged by the lower recognition accuracies in real applications due to lack of the rich representation ability. Deep belief networks (DBN can automatically discover the multiple levels of representations in speech signals. To make full of its advantages, this paper presents an ensemble of random deep belief networks (RDBN method for speech emotion recognition. It firstly extracts the low level features of the input speech signal and then applies them to construct lots of random subspaces. Each random subspace is then provided for DBN to yield the higher level features as the input of the classifier to output an emotion label. All outputted emotion labels are then fused through the majority voting to decide the final emotion label for the input speech signal. The conducted experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition.

  17. Speech recognition systems on the Cell Broadband Engine

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Y; Jones, H; Vaidya, S; Perrone, M; Tydlitat, B; Nanda, A

    2007-04-20

    In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine{trademark} (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time--a channel density that is orders-of-magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.

  18. Human phoneme recognition depending on speech-intrinsic variability.

    Science.gov (United States)

    Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger

    2010-11-01

    The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).

  19. Musician advantage for speech-on-speech perception

    NARCIS (Netherlands)

    Başkent, Deniz; Gaudrain, Etienne

    Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level

  20. Speech Production and Speech Discrimination by Hearing-Impaired Children.

    Science.gov (United States)

    Novelli-Olmstead, Tina; Ling, Daniel

    1984-01-01

    Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…

  1. A methodology of error detection: Improving speech recognition in radiology

    OpenAIRE

    Voll, Kimberly Dawn

    2006-01-01

    Automated speech recognition (ASR) in radiology report dictation demands highly accurate and robust recognition software. Despite vendor claims, current implementations are suboptimal, leading to poor accuracy, and time and money wasted on proofreading. Thus, other methods must be considered for increasing the reliability and performance of ASR before it is a viable alternative to human transcription. One such method is post-ASR error detection, used to recover from the inaccuracy of speech r...

  2. Accurate guitar tuning by cochlear implant musicians.

    Directory of Open Access Journals (Sweden)

    Thomas Lu

    Full Text Available Modern cochlear implant (CI users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to <0.5 Hz. One subject had normal contralateral hearing and produced more accurate tuning with CI than his normal ear. To understand these counterintuitive findings, we presented tones sequentially and found that tuning error was larger at ∼ 30 Hz for both subjects. A third subject, a non-musician CI user with normal contralateral hearing, showed similar trends in performance between CI and normal hearing ears but with less precision. This difference, along with electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task.

  3. Inner Speech's Relationship With Overt Speech in Poststroke Aphasia.

    Science.gov (United States)

    Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A

    2017-09-18

    Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Al, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p speech and language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.

  4. Environmental Contamination of Normal Speech.

    Science.gov (United States)

    Harley, Trevor A.

    1990-01-01

    Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…

  5. APPRECIATING SPEECH THROUGH GAMING

    Directory of Open Access Journals (Sweden)

    Mario T Carreon

    2014-06-01

    Full Text Available This paper discusses the Speech and Phoneme Recognition as an Educational Aid for the Deaf and Hearing Impaired (SPREAD application and the ongoing research on its deployment as a tool for motivating deaf and hearing impaired students to learn and appreciate speech. This application uses the Sphinx-4 voice recognition system to analyze the vocalization of the student and provide prompt feedback on their pronunciation. The packaging of the application as an interactive game aims to provide additional motivation for the deaf and hearing impaired student through visual motivation for them to learn and appreciate speech.

  6. Global Freedom of Speech

    DEFF Research Database (Denmark)

    Binderup, Lars Grassme

    2007-01-01

    , as opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal...... egalitarian reasons for free speech - reasons from overall welfare, from autonomy and from respect for the equality of citizens - it is argued that these reasons outweigh the proposed reasons for curbing culturally offensive speech. Currently controversial cases such as that of the Danish Cartoon Controversy...

  7. Automated Intelligibility Assessment of Pathological Speech Using Phonological Features

    Directory of Open Access Journals (Sweden)

    Catherine Middag

    2009-01-01

    Full Text Available It is commonly acknowledged that word or phoneme intelligibility is an important criterion in the assessment of the communication efficiency of a pathological speaker. People have therefore put a lot of effort in the design of perceptual intelligibility rating tests. These tests usually have the drawback that they employ unnatural speech material (e.g., nonsense words and that they cannot fully exclude errors due to listener bias. Therefore, there is a growing interest in the application of objective automatic speech recognition technology to automate the intelligibility assessment. Current research is headed towards the design of automated methods which can be shown to produce ratings that correspond well with those emerging from a well-designed and well-performed perceptual test. In this paper, a novel methodology that is built on previous work (Middag et al., 2008 is presented. It utilizes phonological features, automatic speech alignment based on acoustic models that were trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples. The experimental evaluation of the new system reveals that the root mean squared error of the discrepancies between perceived and computed intelligibilities can be as low as 8 on a scale of 0 to 100.

  8. Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

    Directory of Open Access Journals (Sweden)

    Petar S. Aleksic

    2002-11-01

    Full Text Available We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. The principal component analysis (PCA was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR experiments. Both single-stream and multistream hidden Markov models (HMMs were used to model the ASR system, integrate audio and visual information, and perform a relatively large vocabulary (approximately 1000 words speech recognition experiments. The experiments performed use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER by 20% to 23% relatively to audio-only speech recognition WERs, at various SNRs (0–30 dB with additive white Gaussian noise, and by 19% relatively to audio-only speech recognition WER under clean audio conditions.

  9. Spectrally accurate contour dynamics

    International Nuclear Information System (INIS)

    Van Buskirk, R.D.; Marcus, P.S.

    1994-01-01

    We present an exponentially accurate boundary integral method for calculation the equilibria and dynamics of piece-wise constant distributions of potential vorticity. The method represents contours of potential vorticity as a spectral sum and solves the Biot-Savart equation for the velocity by spectrally evaluating a desingularized contour integral. We use the technique in both an initial-value code and a newton continuation method. Our methods are tested by comparing the numerical solutions with known analytic results, and it is shown that for the same amount of computational work our spectral methods are more accurate than other contour dynamics methods currently in use

  10. One-against-all weighted dynamic time warping for language-independent and speaker-dependent speech recognition in adverse conditions.

    Directory of Open Access Journals (Sweden)

    Xianglilan Zhang

    Full Text Available Considering personal privacy and difficulty of obtaining training material for many seldom used English words and (often non-English names, language-independent (LI with lightweight speaker-dependent (SD automatic speech recognition (ASR is a promising option to solve the problem. The dynamic time warping (DTW algorithm is the state-of-the-art algorithm for small foot-print SD ASR applications with limited storage space and small vocabulary, such as voice dialing on mobile devices, menu-driven recognition, and voice control on vehicles and robotics. Even though we have successfully developed two fast and accurate DTW variations for clean speech data, speech recognition for adverse conditions is still a big challenge. In order to improve recognition accuracy in noisy environment and bad recording conditions such as too high or low volume, we introduce a novel one-against-all weighted DTW (OAWDTW. This method defines a one-against-all index (OAI for each time frame of training data and applies the OAIs to the core DTW process. Given two speech signals, OAWDTW tunes their final alignment score by using OAI in the DTW process. Our method achieves better accuracies than DTW and merge-weighted DTW (MWDTW, as 6.97% relative reduction of error rate (RRER compared with DTW and 15.91% RRER compared with MWDTW are observed in our extensive experiments on one representative SD dataset of four speakers' recordings. To the best of our knowledge, OAWDTW approach is the first weighted DTW specially designed for speech data in adverse conditions.

  11. Charisma in business speeches

    DEFF Research Database (Denmark)

    Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter

    2016-01-01

    to business speeches. Consistent with the public opinion, our findings are indicative of Steve Jobs being a more charismatic speaker than Mark Zuckerberg. Beyond previous studies, our data suggest that rhythm and emphatic accentuation are also involved in conveying charisma. Furthermore, the differences...... between Steve Jobs and Mark Zuckerberg and the investor- and customer-related sections of their speeches support the modern understanding of charisma as a gradual, multiparametric, and context-sensitive concept....

  12. Speech spectrum envelope modeling

    Czech Academy of Sciences Publication Activity Database

    Vích, Robert; Vondra, Martin

    Vol. 4775, - (2007), s. 129-137 ISSN 0302-9743. [COST Action 2102 International Workshop. Vietri sul Mare, 29.03.2007-31.03.2007] R&D Projects: GA AV ČR(CZ) 1ET301710509 Institutional research plan: CEZ:AV0Z20670512 Keywords : speech * speech processing * cepstral analysis Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering Impact factor: 0.302, year: 2005

  13. Automatic error detection in alignments for speech synthesis

    CSIR Research Space (South Africa)

    Barnard, E

    2006-11-01

    Full Text Available achieving a uniform level of conservatism across phones is a significant challenge. Our current approach therefore relies on a manual process for choosing the threshold for each class. • How much context dependence must be modelled in the choice...

  14. Memory for speech and speech for memory.

    Science.gov (United States)

    Locke, J L; Kutz, K J

    1975-03-01

    Thirty kindergarteners, 15 who substituted /w/ for /r/ and 15 with correct articulation, received two perception tests and a memory test that included /w/ and /r/ in minimally contrastive syllables. Although both groups had nearly perfect perception of the experimenter's productions of /w/ and /r/, misarticulating subjects perceived their own tape-recorded w/r productions as /w/. In the memory task these same misarticulating subjects committed significantly more /w/-/r/ confusions in unspoken recall. The discussion considers why people subvocally rehearse; a developmental period in which children do not rehearse; ways subvocalization may aid recall, including motor and acoustic encoding; an echoic store that provides additional recall support if subjects rehearse vocally, and perception of self- and other- produced phonemes by misarticulating children-including its relevance to a motor theory of perception. Evidence is presented that speech for memory can be sufficiently impaired to cause memory disorder. Conceptions that restrict speech disorder to an impairment of communication are challenged.

  15. The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility

    DEFF Research Database (Denmark)

    Bentsen, Thomas; May, Tobias; Kressner, Abigail Anne

    2018-01-01

    Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements....... A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech......, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where...

  16. Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-tonoise ratio in the envelope domain (SNRenv), which was demonstrated...... to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating...

  17. Music and Speech Perception in Children Using Sung Speech.

    Science.gov (United States)

    Nie, Yingjiu; Galvin, John J; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie

    2018-01-01

    This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet was significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.

  18. Improving Understanding of Emotional Speech Acoustic Content

    Science.gov (United States)

    Tinnemore, Anna

    Children with cochlear implants show deficits in identifying emotional intent of utterances without facial or body language cues. A known limitation to cochlear implants is the inability to accurately portray the fundamental frequency contour of speech which carries the majority of information needed to identify emotional intent. Without reliable access to the fundamental frequency, other methods of identifying vocal emotion, if identifiable, could be used to guide therapies for training children with cochlear implants to better identify vocal emotion. The current study analyzed recordings of adults speaking neutral sentences with a set array of emotions in a child-directed and adult-directed manner. The goal was to identify acoustic cues that contribute to emotion identification that may be enhanced in child-directed speech, but are also present in adult-directed speech. Results of this study showed that there were significant differences in the variation of the fundamental frequency, the variation of intensity, and the rate of speech among emotions and between intended audiences.

  19. Practical speech user interface design

    CERN Document Server

    Lewis, James R

    2010-01-01

    Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application

  20. Speech interaction strategies for a humanoid assistant

    Directory of Open Access Journals (Sweden)

    Stüker Sebastian

    2018-01-01

    Full Text Available The goal of SecondHands, a H2020 project, is to design a robot that can offer help to a maintenance technician in a proactive manner. The robot is to act as a second pair of hands that can assist the technician when he is in need of help. In order for the robot to be of real help to the technician, it needs to understand his needs and follow his commands. Interaction via speech is a crucial part of this. Due to the nature of the situation in which the interactions take place, often the technician needs to speak to the robot when under stress performing strenuous physical labor, the classical turn based interaction schemes need to be transformed into dialogue systems that perform stream processing, anticipating user intentions, correcting itself as more information become available, in order to be able to respond in a rapid manner. In order to meet these demands, we are developing low-latency streaming based automatic speech recognition systems in combination with recurrent neural network based Natural Language Understanding systems that perform slot filling and intent recognition in order for the robot to provide assistance in a rapid manner, that can be partly based on speculative classifications that are then being refined as more speech becomes available.

  1. Improving the speech intelligibility in classrooms

    Science.gov (United States)

    Lam, Choi Ling Coriolanus

    One of the major acoustical concerns in classrooms is the establishment of effective verbal communication between teachers and students. Non-optimal acoustical conditions, resulting in reduced verbal communication, can cause two main problems. First, they can lead to reduce learning efficiency. Second, they can also cause fatigue, stress, vocal strain and health problems, such as headaches and sore throats, among teachers who are forced to compensate for poor acoustical conditions by raising their voices. Besides, inadequate acoustical conditions can induce the usage of public address system. Improper usage of such amplifiers or loudspeakers can lead to impairment of students' hearing systems. The social costs of poor classroom acoustics will be large to impair the learning of children. This invisible problem has far reaching implications for learning, but is easily solved. Many researches have been carried out that they have accurately and concisely summarized the research findings on classrooms acoustics. Though, there is still a number of challenging questions remaining unanswered. Most objective indices for speech intelligibility are essentially based on studies of western languages. Even several studies of tonal languages as Mandarin have been conducted, there is much less on Cantonese. In this research, measurements have been done in unoccupied rooms to investigate the acoustical parameters and characteristics of the classrooms. The speech intelligibility tests, which based on English, Mandarin and Cantonese, and the survey were carried out on students aged from 5 years old to 22 years old. It aims to investigate the differences in intelligibility between English, Mandarin and Cantonese of the classrooms in Hong Kong. The significance on speech transmission index (STI) related to Phonetically Balanced (PB) word scores will further be developed. Together with developed empirical relationship between the speech intelligibility in classrooms with the variations

  2. Finding weak points automatically

    International Nuclear Information System (INIS)

    Archinger, P.; Wassenberg, M.

    1999-01-01

    Operators of nuclear power stations have to carry out material tests at selected components by regular intervalls. Therefore a full automaticated test, which achieves a clearly higher reproducibility, compared to part automaticated variations, would provide a solution. In addition the full automaticated test reduces the dose of radiation for the test person. (orig.) [de

  3. Digitised evaluation of speech intelligibility using vowels in maxillectomy patients.

    Science.gov (United States)

    Sumita, Y I; Hattori, M; Murase, M; Elbashti, M E; Taniguchi, H

    2018-03-01

    Among the functional disabilities that patients face following maxillectomy, speech impairment is a major factor influencing quality of life. Proper rehabilitation of speech, which may include prosthodontic and surgical treatments and speech therapy, requires accurate evaluation of speech intelligibility (SI). A simple, less time-consuming yet accurate evaluation is desirable both for maxillectomy patients and the various clinicians providing maxillofacial treatment. This study sought to determine the utility of digital acoustic analysis of vowels for the prediction of SI in maxillectomy patients, based on a comprehensive understanding of speech production in the vocal tract of maxillectomy patients and its perception. Speech samples were collected from 33 male maxillectomy patients (mean age 57.4 years) in two conditions, without and with a maxillofacial prosthesis, and formant data for the vowels /a/,/e/,/i/,/o/, and /u/ were calculated based on linear predictive coding. The frequency range of formant 2 (F2) was determined by differences between the minimum and maximum frequency. An SI test was also conducted to reveal the relationship between SI score and F2 range. Statistical analyses were applied. F2 range and SI score were significantly different between the two conditions without and with a prosthesis (both P maxillectomy. © 2017 John Wiley & Sons Ltd.

  4. Under-resourced speech recognition based on the speech manifold

    CSIR Research Space (South Africa)

    Sahraeian, R

    2015-09-01

    Full Text Available Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...

  5. A Study on Efficient Robust Speech Recognition with Stochastic Dynamic Time Warping

    OpenAIRE

    孫, 喜浩

    2014-01-01

    In recent years, great progress has been made in automatic speech recognition (ASR) system. The hidden Markov model (HMM) and dynamic time warping (DTW) are the two main algorithms which have been widely applied to ASR system. Although, HMM technique achieves higher recognition accuracy in clear speech environment and noisy environment. It needs large-set of words and realizes the algorithm more complexly.Thus, more and more researchers have focused on DTW-based ASR system.Dynamic time warpin...

  6. Intelligibility of speech of children with speech and sound disorders

    OpenAIRE

    Ivetac, Tina

    2014-01-01

    The purpose of this study is to examine speech intelligibility of children with primary speech and sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...

  7. Speech Recognition for the iCub Platform

    Directory of Open Access Journals (Sweden)

    Bertrand Higy

    2018-02-01

    Full Text Available This paper describes open source software (available at https://github.com/robotology/natural-speech to build automatic speech recognition (ASR systems and run them within the YARP platform. The toolkit is designed (i to allow non-ASR experts to easily create their own ASR system and run it on iCub and (ii to build deep learning-based models specifically addressing the main challenges an ASR system faces in the context of verbal human–iCub interactions. The toolkit mostly consists of Python, C++ code and shell scripts integrated in YARP. As additional contribution, a second codebase (written in Matlab is provided for more expert ASR users who want to experiment with bio-inspired and developmental learning-inspired ASR systems. Specifically, we provide code for two distinct kinds of speech recognition: “articulatory” and “unsupervised” speech recognition. The first is largely inspired by influential neurobiological theories of speech perception which assume speech perception to be mediated by brain motor cortex activities. Our articulatory systems have been shown to outperform strong deep learning-based baselines. The second type of recognition systems, the “unsupervised” systems, do not use any supervised information (contrary to most ASR systems, including our articulatory systems. To some extent, they mimic an infant who has to discover the basic speech units of a language by herself. In addition, we provide resources consisting of pre-trained deep learning models for ASR, and a 2.5-h speech dataset of spoken commands, the VoCub dataset, which can be used to adapt an ASR system to the typical acoustic environments in which iCub operates.

  8. Multiple Time-Instances Features of Degraded Speech for Single Ended Quality Measurement

    Directory of Open Access Journals (Sweden)

    Rajesh Kumar Dubey

    2017-01-01

    Full Text Available The use of single time-instance features, where entire speech utterance is used for feature computation, is not accurate and adequate in capturing the time localized information of short-time transient distortions and their distinction from plosive sounds of speech, particularly degraded by impulsive noise. Hence, the importance of estimating features at multiple time-instances is sought. In this, only active speech segments of degraded speech are used for features computation at multiple time-instances on per frame basis. Here, active speech means both voiced and unvoiced frames except silence. The features of different combinations of multiple contiguous active speech segments are computed and called multiple time-instances features. The joint GMM training has been done using these features along with the subjective MOS of the corresponding speech utterance to obtain the parameters of GMM. These parameters of GMM and multiple time-instances features of test speech are used to compute the objective MOS values of different combinations of multiple contiguous active speech segments. The overall objective MOS of the test speech utterance is obtained by assigning equal weight to the objective MOS values of the different combinations of multiple contiguous active speech segments. This algorithm outperforms the Recommendation ITU-T P.563 and recently published algorithms.

  9. 'What is it?' A functional MRI and SPECT study of ictal speech in a second language

    Energy Technology Data Exchange (ETDEWEB)

    Navarro, V.; Chauvire, V.; Baulac, M.; Cohen, L. [Department of Neurology, AP-HP, Hopital de la Pitie-Salpetriere, IFR 70, Paris (France); Delmaire, Ch.; Lehericy, St. [Department of Neuroradiology, AP-HP, Hopital de la Pitie-Salpetriere, IFR 70, Paris (France); Habert, M.O. [Department of Nuclear Medicine, AP-HP, Hopital de la Pitie-Salpetriere, IFR 70, Paris (France); Footnick, R.; Pallier, Ch. [INSERM, U562, CEA/DSV, IFR 49, Orsay (France); Baulac, M.; Cohen, L. [Universite Paris VI, Faculte Pitie-Salpetriere, Paris (France)

    2009-07-01

    Neuronal networks involved in second language (L2) processing vary between normal subjects. Patients with epilepsy may have ictal speech automatisms in their second language. To delineate the brain systems involved in L2 ictal speech, we combined functional MRI during bilingual tasks and ictal - inter-ictal single-photon emission computed tomography in a patient who presented L2 ictal speech productions. These analyses showed that the networks activated by the seizure and those activated by L2 processing intersected in the right hippocampus. These results may provide some insights both into the pathophysiology of ictal speech and into the brain organization for L2. (authors)

  10. Toward Speech and Nonverbal Behaviors Integration for Humanoid Robot

    Directory of Open Access Journals (Sweden)

    Wei Wang

    2012-09-01

    Full Text Available It is essential to integrate speeches and nonverbal behaviors for a humanoid robot in human-robot interaction. This paper presents an approach using multi-object genetic algorithm to match the speeches and behaviors automatically. Firstly, with humanoid robot's emotion status, we construct a hierarchical structure to link voice characteristics and nonverbal behaviors. Secondly, these behaviors corresponding to speeches are matched and integrated into an action sequence based on genetic algorithm, so the robot can consistently speak and perform emotional behaviors. Our approach takes advantage of relevant knowledge described by psychologists and nonverbal communication. And from experiment results, our ultimate goal, implementing an affective robot to act and speak with partners vividly and fluently, could be achieved.

  11. Exploring expressivity and emotion with artificial voice and speech technologies.

    Science.gov (United States)

    Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James

    2013-10-01

    Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.

  12. Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; de Jong, Franciska M.G.

    In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because

  13. Estimating spatial travel times using automatic vehicle identification data

    Science.gov (United States)

    2001-01-01

    Prepared ca. 2001. The paper describes an algorithm that was developed for estimating reliable and accurate average roadway link travel times using Automatic Vehicle Identification (AVI) data. The algorithm presented is unique in two aspects. First, ...

  14. Tackling the complexity in speech

    DEFF Research Database (Denmark)

    section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore......, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about...... the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations...

  15. Discriminating individually considerate and authoritarian leaders by speech activity cues

    OpenAIRE

    Feese, Sebastian; Muaremi, Amir; Arnrich, Bert; Tröster, Gerhard; Meyer, Bertolt; Jonas, Klaus

    2011-01-01

    Effective leadership can increase team performance, however up to now the influence of specific micro-level behavioral patterns on team performance is unclear. At the same time, current behavior observation methods in social psychology mostly rely on manual video annotations that impede research. In our work, we follow a sensor-based approach to automatically extract speech activity cues to discriminate individualized considerate from authoritarian leadership. On a subset of 35 selected...

  16. Innovative Speech Reconstructive Surgery

    OpenAIRE

    Hashem Shemshadi

    2003-01-01

    Proper speech functioning in human being, depends on the precise coordination and timing balances in a series of complex neuro nuscular movements and actions. Starting from the prime organ of energy source of expelled air from respirato y system; deliver such air to trigger vocal cords; swift changes of this phonatory episode to a comprehensible sound in RESONACE and final coordination of all head and neck structures to elicit final speech in ...

  17. The chairman's speech

    International Nuclear Information System (INIS)

    Allen, A.M.

    1986-01-01

    The paper contains a transcript of a speech by the chairman of the UKAEA, to mark the publication of the 1985/6 annual report. The topics discussed in the speech include: the Chernobyl accident and its effect on public attitudes to nuclear power, management and disposal of radioactive waste, the operation of UKAEA as a trading fund, and the UKAEA development programmes. The development programmes include work on the following: fast reactor technology, thermal reactors, reactor safety, health and safety aspects of water cooled reactors, the Joint European Torus, and under-lying research. (U.K.)

  18. High accurate time system of the Low Latitude Meridian Circle.

    Science.gov (United States)

    Yang, Jing; Wang, Feng; Li, Zhiming

    In order to obtain the high accurate time signal for the Low Latitude Meridian Circle (LLMC), a new GPS accurate time system is developed which include GPS, 1 MC frequency source and self-made clock system. The second signal of GPS is synchronously used in the clock system and information can be collected by a computer automatically. The difficulty of the cancellation of the time keeper can be overcomed by using this system.

  19. Speech intelligibility enhancement after maxillary denture treatment and its impact on quality of life.

    Science.gov (United States)

    Knipfer, Christian; Riemann, Max; Bocklet, Tobias; Noeth, Elmar; Schuster, Maria; Sokol, Biljana; Eitner, Stephan; Nkenke, Emeka; Stelzle, Florian

    2014-01-01

    Tooth loss and its prosthetic rehabilitation significantly affect speech intelligibility. However, little is known about the influence of speech deficiencies on oral health-related quality of life (OHRQoL). The aim of this study was to investigate whether speech intelligibility enhancement through prosthetic rehabilitation significantly influences OHRQoL in patients wearing complete maxillary dentures. Speech intelligibility by means of an automatic speech recognition system (ASR) was prospectively evaluated and compared with subjectively assessed Oral Health Impact Profile (OHIP) scores. Speech was recorded in 28 edentulous patients 1 week prior to the fabrication of new complete maxillary dentures and 6 months thereafter. Speech intelligibility was computed based on the word accuracy (WA) by means of an ASR and compared with a matched control group. One week before and 6 months after rehabilitation, patients assessed themselves for OHRQoL. Speech intelligibility improved significantly after 6 months. Subjects reported a significantly higher OHRQoL after maxillary rehabilitation with complete dentures. No significant correlation was found between the OHIP sum score or its subscales to the WA. Speech intelligibility enhancement achieved through the fabrication of new complete maxillary dentures might not be in the forefront of the patients' perception of their quality of life. For the improvement of OHRQoL in patients wearing complete maxillary dentures, food intake and mastication as well as freedom from pain play a more prominent role.

  20. Visualizing structures of speech expressiveness

    DEFF Research Database (Denmark)

    Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars

    2008-01-01

    Speech is both beautiful and informative. In this work, a conceptual study of the speech, through investigation of the tower of Babel, the archetypal phonemes, and a study of the reasons of uses of language is undertaken in order to create an artistic work investigating the nature of speech. The ....... The artwork is presented at the Re:New festival in May 2008....

  1. The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility.

    Science.gov (United States)

    Bentsen, Thomas; May, Tobias; Kressner, Abigail A; Dau, Torsten

    2018-01-01

    Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one. Therefore, both components play significant roles and by combining them, speech intelligibility improvements were obtained in a six-talker condition at a low signal-to-noise ratio.

  2. Superior Speech Acquisition and Robust Automatic Speech Recognition for Integrated Spacesuit Audio Systems, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — Astronauts suffer from poor dexterity of their hands due to the clumsy spacesuit gloves during Extravehicular Activity (EVA) operations and NASA has had a widely...

  3. Superior Speech Acquisition and Robust Automatic Speech Recognition for Integrated Spacesuit Audio Systems, Phase II

    Data.gov (United States)

    National Aeronautics and Space Administration — Astronauts suffer from poor dexterity of their hands due to the clumsy spacesuit gloves during Extravehicular Activity (EVA) operations and NASA has had a widely...

  4. Accurate quantum chemical calculations

    Science.gov (United States)

    Bauschlicher, Charles W., Jr.; Langhoff, Stephen R.; Taylor, Peter R.

    1989-01-01

    An important goal of quantum chemical calculations is to provide an understanding of chemical bonding and molecular electronic structure. A second goal, the prediction of energy differences to chemical accuracy, has been much harder to attain. First, the computational resources required to achieve such accuracy are very large, and second, it is not straightforward to demonstrate that an apparently accurate result, in terms of agreement with experiment, does not result from a cancellation of errors. Recent advances in electronic structure methodology, coupled with the power of vector supercomputers, have made it possible to solve a number of electronic structure problems exactly using the full configuration interaction (FCI) method within a subspace of the complete Hilbert space. These exact results can be used to benchmark approximate techniques that are applicable to a wider range of chemical and physical problems. The methodology of many-electron quantum chemistry is reviewed. Methods are considered in detail for performing FCI calculations. The application of FCI methods to several three-electron problems in molecular physics are discussed. A number of benchmark applications of FCI wave functions are described. Atomic basis sets and the development of improved methods for handling very large basis sets are discussed: these are then applied to a number of chemical and spectroscopic problems; to transition metals; and to problems involving potential energy surfaces. Although the experiences described give considerable grounds for optimism about the general ability to perform accurate calculations, there are several problems that have proved less tractable, at least with current computer resources, and these and possible solutions are discussed.

  5. Dynamic weighing for accurate fertilizer application and monitoring

    NARCIS (Netherlands)

    Bergeijk, van J.; Goense, D.; Willigenburg, van L.G.; Speelman, L.

    2001-01-01

    The mass flow of fertilizer spreaders must be calibrated for the different types of fertilizers used. To obtain accurate fertilizer application manual calibration of actual mass flow must be repeated frequently. Automatic calibration is possible by measurement of the actual mass flow, based on

  6. A theory of lexical access in speech production [target paper

    NARCIS (Netherlands)

    Levelt, W.J.M.; Roelofs, A.P.A.; Meyer, A.S.

    1999-01-01

    Preparing words in speech production is normally a fast and accurate process. We generate them two or three per second in fluent conversation; and overtly naming a clear picture of an object can easily be initiated within 600 ms after picture onset. The underlying process, however, is exceedingly

  7. Workshop: Welcoming speech

    International Nuclear Information System (INIS)

    Lummerzheim, D.

    1994-01-01

    The welcoming speech underlines the fact that any validation process starting with calculation methods and ending with studies on the long-term behaviour of a repository system can only be effected through laboratory, field and natural-analogue studies. The use of natural analogues (NA) is to secure the biosphere and to verify whether this safety really exists. (HP) [de

  8. Hearing speech in music.

    Science.gov (United States)

    Ekström, Seth-Reino; Borg, Erik

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (Ptempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (Pmusic offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  9. Hearing speech in music

    Directory of Open Access Journals (Sweden)

    Seth-Reino Ekström

    2011-01-01

    Full Text Available The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA noise and speech spectrum-filtered noise (SPN]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA. The results showed a significant effect of piano performance speed and octave (P<.01. Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01 and SPN (P<.05. Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01, but there were smaller differences between masking conditions (P<.01. It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  10. Free Speech Yearbook 1979.

    Science.gov (United States)

    Kane, Peter E., Ed.

    The seven articles in this collection deal with theoretical and practical freedom of speech issues. Topics covered are: the United States Supreme Court, motion picture censorship, and the color line; judicial decision making; the established scientific community's suppression of the ideas of Immanuel Velikovsky; the problems of avant-garde jazz,…

  11. Nobel peace speech

    Directory of Open Access Journals (Sweden)

    Joshua FRYE

    2017-07-01

    Full Text Available The Nobel Peace Prize has long been considered the premier peace prize in the world. According to Geir Lundestad, Secretary of the Nobel Committee, of the 300 some peace prizes awarded worldwide, “none is in any way as well known and as highly respected as the Nobel Peace Prize” (Lundestad, 2001. Nobel peace speech is a unique and significant international site of public discourse committed to articulating the universal grammar of peace. Spanning over 100 years of sociopolitical history on the world stage, Nobel Peace Laureates richly represent an important cross-section of domestic and international issues increasingly germane to many publics. Communication scholars’ interest in this rhetorical genre has increased in the past decade. Yet, the norm has been to analyze a single speech artifact from a prestigious or controversial winner rather than examine the collection of speeches for generic commonalities of import. In this essay, we analyze the discourse of Nobel peace speech inductively and argue that the organizing principle of the Nobel peace speech genre is the repetitive form of normative liberal principles and values that function as rhetorical topoi. These topoi include freedom and justice and appeal to the inviolable, inborn right of human beings to exercise certain political and civil liberties and the expectation of equality of protection from totalitarian and tyrannical abuses. The significance of this essay to contemporary communication theory is to expand our theoretical understanding of rhetoric’s role in the maintenance and development of an international and cross-cultural vocabulary for the grammar of peace.

  12. Automatic speech recognition (zero crossing method). Automatic recognition of isolated vowels

    International Nuclear Information System (INIS)

    Dupeyrat, Benoit

    1975-01-01

    This note describes a recognition method of isolated vowels, using a preprocessing of the vocal signal. The processing extracts the extrema of the vocal signal and the interval time separating them (Zero crossing distances of the first derivative of the signal). The recognition of vowels uses normalized histograms of the values of these intervals. The program determines a distance between the histogram of the sound to be recognized and histograms models built during a learning phase. The results processed on real time by a minicomputer, are relatively independent of the speaker, the fundamental frequency being not allowed to vary too much (i.e. speakers of the same sex). (author) [fr

  13. Automatic Photoelectric Telescope Service

    International Nuclear Information System (INIS)

    Genet, R.M.; Boyd, L.J.; Kissell, K.E.; Crawford, D.L.; Hall, D.S.; BDM Corp., McLean, VA; Kitt Peak National Observatory, Tucson, AZ; Dyer Observatory, Nashville, TN)

    1987-01-01

    Automatic observatories have the potential of gathering sizable amounts of high-quality astronomical data at low cost. The Automatic Photoelectric Telescope Service (APT Service) has realized this potential and is routinely making photometric observations of a large number of variable stars. However, without observers to provide on-site monitoring, it was necessary to incorporate special quality checks into the operation of the APT Service at its multiple automatic telescope installation on Mount Hopkins. 18 references

  14. Automatic Fiscal Stabilizers

    Directory of Open Access Journals (Sweden)

    Narcis Eduard Mitu

    2013-11-01

    Full Text Available Policies or institutions (built into an economic system that automatically tend to dampen economic cycle fluctuations in income, employment, etc., without direct government intervention. For example, in boom times, progressive income tax automatically reduces money supply as incomes and spendings rise. Similarly, in recessionary times, payment of unemployment benefits injects more money in the system and stimulates demand. Also called automatic stabilizers or built-in stabilizers.

  15. Metaheuristic applications to speech enhancement

    CERN Document Server

    Kunche, Prajna

    2016-01-01

    This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.

  16. Automatic Barometric Updates from Ground-Based Navigational Aids

    Science.gov (United States)

    1990-03-12

    ro fAutomatic Barometric Updates US Department from of Transportation Ground-Based Federal Aviation Administration Navigational Aids Office of Safety...tighter vertical spacing controls , particularly for operations near Terminal Control Areas (TCAs), Airport Radar Service Areas (ARSAs), military climb and...E.F., Ruth, J.C., and Williges, B.H. (1987). Speech Controls and Displays. In Salvendy, G., E. Handbook of Human Factors/Ergonomics, New York, John

  17. Objective Prediction of Hearing Aid Benefit Across Listener Groups Using Machine Learning: Speech Recognition Performance With Binaural Noise-Reduction Algorithms

    Science.gov (United States)

    Schädler, Marc R.; Warzybok, Anna; Kollmeier, Birger

    2018-01-01

    The simulation framework for auditory discrimination experiments (FADE) was adopted and validated to predict the individual speech-in-noise recognition performance of listeners with normal and impaired hearing with and without a given hearing-aid algorithm. FADE uses a simple automatic speech recognizer (ASR) to estimate the lowest achievable speech reception thresholds (SRTs) from simulated speech recognition experiments in an objective way, independent from any empirical reference data. Empirical data from the literature were used to evaluate the model in terms of predicted SRTs and benefits in SRT with the German matrix sentence recognition test when using eight single- and multichannel binaural noise-reduction algorithms. To allow individual predictions of SRTs in binaural conditions, the model was extended with a simple better ear approach and individualized by taking audiograms into account. In a realistic binaural cafeteria condition, FADE explained about 90% of the variance of the empirical SRTs for a group of normal-hearing listeners and predicted the corresponding benefits with a root-mean-square prediction error of 0.6 dB. This highlights the potential of the approach for the objective assessment of benefits in SRT without prior knowledge about the empirical data. The predictions for the group of listeners with impaired hearing explained 75% of the empirical variance, while the individual predictions explained less than 25%. Possibly, additional individual factors should be considered for more accurate predictions with impaired hearing. A competing talker condition clearly showed one limitation of current ASR technology, as the empirical performance with SRTs lower than −20 dB could not be predicted. PMID:29692200

  18. Objective Prediction of Hearing Aid Benefit Across Listener Groups Using Machine Learning: Speech Recognition Performance With Binaural Noise-Reduction Algorithms.

    Science.gov (United States)

    Schädler, Marc R; Warzybok, Anna; Kollmeier, Birger

    2018-01-01

    The simulation framework for auditory discrimination experiments (FADE) was adopted and validated to predict the individual speech-in-noise recognition performance of listeners with normal and impaired hearing with and without a given hearing-aid algorithm. FADE uses a simple automatic speech recognizer (ASR) to estimate the lowest achievable speech reception thresholds (SRTs) from simulated speech recognition experiments in an objective way, independent from any empirical reference data. Empirical data from the literature were used to evaluate the model in terms of predicted SRTs and benefits in SRT with the German matrix sentence recognition test when using eight single- and multichannel binaural noise-reduction algorithms. To allow individual predictions of SRTs in binaural conditions, the model was extended with a simple better ear approach and individualized by taking audiograms into account. In a realistic binaural cafeteria condition, FADE explained about 90% of the variance of the empirical SRTs for a group of normal-hearing listeners and predicted the corresponding benefits with a root-mean-square prediction error of 0.6 dB. This highlights the potential of the approach for the objective assessment of benefits in SRT without prior knowledge about the empirical data. The predictions for the group of listeners with impaired hearing explained 75% of the empirical variance, while the individual predictions explained less than 25%. Possibly, additional individual factors should be considered for more accurate predictions with impaired hearing. A competing talker condition clearly showed one limitation of current ASR technology, as the empirical performance with SRTs lower than -20 dB could not be predicted.

  19. Acoustical conditions for speech communication in active elementary school classrooms

    Science.gov (United States)

    Sato, Hiroshi; Bradley, John

    2005-04-01

    Detailed acoustical measurements were made in 34 active elementary school classrooms with typical rectangular room shape in schools near Ottawa, Canada. There was an average of 21 students in classrooms. The measurements were made to obtain accurate indications of the acoustical quality of conditions for speech communication during actual teaching activities. Mean speech and noise levels were determined from the distribution of recorded sound levels and the average speech-to-noise ratio was 11 dBA. Measured mid-frequency reverberation times (RT) during the same occupied conditions varied from 0.3 to 0.6 s, and were a little less than for the unoccupied rooms. RT values were not related to noise levels. Octave band speech and noise levels, useful-to-detrimental ratios, and Speech Transmission Index values were also determined. Key results included: (1) The average vocal effort of teachers corresponded to louder than Pearsons Raised voice level; (2) teachers increase their voice level to overcome ambient noise; (3) effective speech levels can be enhanced by up to 5 dB by early reflection energy; and (4) student activity is seen to be the dominant noise source, increasing average noise levels by up to 10 dBA during teaching activities. [Work supported by CLLRnet.

  20. Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

    Science.gov (United States)

    van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

    2007-01-01

    Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…

  1. Direct and indirect measures of speech articulator motions using low power EM sensors

    International Nuclear Information System (INIS)

    Barnes, T; Burnett, G; Gable, T; Holzrichter, J F; Ng, L

    1999-01-01

    Low power Electromagnetic (EM) Wave sensors can measure general properties of human speech articulator motions, as speech is produced. See Holzrichter, Burnett, Ng, and Lea, J.Acoust.Soc.Am. 103 (1) 622 (1998). Experiments have demonstrated extremely accurate pitch measurements ( and lt; 1 Hz per pitch cycle) and accurate onset of voiced speech. Recent measurements of pressure-induced tracheal motions enable very good spectra and amplitude estimates of a voiced excitation function. The use of the measured excitation functions and pitch synchronous processing enable the determination of each pitch cycle of an accurate transfer function and, indirectly, of the corresponding articulator motions. In addition, direct measurements have been made of EM wave reflections from articulator interfaces, including jaw, tongue, and palate, simultaneously with acoustic and glottal open/close signals. While several types of EM sensors are suitable for speech articulator measurements, the homodyne sensor has been found to provide good spatial and temporal resolution for several applications

  2. Neural Bases of Automaticity

    Science.gov (United States)

    Servant, Mathieu; Cassey, Peter; Woodman, Geoffrey F.; Logan, Gordon D.

    2018-01-01

    Automaticity allows us to perform tasks in a fast, efficient, and effortless manner after sufficient practice. Theories of automaticity propose that across practice processing transitions from being controlled by working memory to being controlled by long-term memory retrieval. Recent event-related potential (ERP) studies have sought to test this…

  3. Automatic control systems engineering

    International Nuclear Information System (INIS)

    Shin, Yun Gi

    2004-01-01

    This book gives descriptions of automatic control for electrical electronics, which indicates history of automatic control, Laplace transform, block diagram and signal flow diagram, electrometer, linearization of system, space of situation, state space analysis of electric system, sensor, hydro controlling system, stability, time response of linear dynamic system, conception of root locus, procedure to draw root locus, frequency response, and design of control system.

  4. Automatic Camera Control

    DEFF Research Database (Denmark)

    Burelli, Paolo; Preuss, Mike

    2014-01-01

    Automatically generating computer animations is a challenging and complex problem with applications in games and film production. In this paper, we investigate howto translate a shot list for a virtual scene into a series of virtual camera configurations — i.e automatically controlling the virtual...

  5. Automatic differentiation of functions

    International Nuclear Information System (INIS)

    Douglas, S.R.

    1990-06-01

    Automatic differentiation is a method of computing derivatives of functions to any order in any number of variables. The functions must be expressible as combinations of elementary functions. When evaluated at specific numerical points, the derivatives have no truncation error and are automatically found. The method is illustrated by simple examples. Source code in FORTRAN is provided

  6. Silent Speech Recognition as an Alternative Communication Device for Persons with Laryngectomy.

    Science.gov (United States)

    Meltzner, Geoffrey S; Heaton, James T; Deng, Yunbin; De Luca, Gianluca; Roy, Serge H; Kline, Joshua C

    2017-12-01

    Each year thousands of individuals require surgical removal of their larynx (voice box) due to trauma or disease, and thereby require an alternative voice source or assistive device to verbally communicate. Although natural voice is lost after laryngectomy, most muscles controlling speech articulation remain intact. Surface electromyographic (sEMG) activity of speech musculature can be recorded from the neck and face, and used for automatic speech recognition to provide speech-to-text or synthesized speech as an alternative means of communication. This is true even when speech is mouthed or spoken in a silent (subvocal) manner, making it an appropriate communication platform after laryngectomy. In this study, 8 individuals at least 6 months after total laryngectomy were recorded using 8 sEMG sensors on their face (4) and neck (4) while reading phrases constructed from a 2,500-word vocabulary. A unique set of phrases were used for training phoneme-based recognition models for each of the 39 commonly used phonemes in English, and the remaining phrases were used for testing word recognition of the models based on phoneme identification from running speech. Word error rates were on average 10.3% for the full 8-sensor set (averaging 9.5% for the top 4 participants), and 13.6% when reducing the sensor set to 4 locations per individual (n=7). This study provides a compelling proof-of-concept for sEMG-based alaryngeal speech recognition, with the strong potential to further improve recognition performance.

  7. A social feedback loop for speech development and its reduction in autism.

    Science.gov (United States)

    Warlaumont, Anne S; Richards, Jeffrey A; Gilkerson, Jill; Oller, D Kimbrough

    2014-07-01

    We analyzed the microstructure of child-adult interaction during naturalistic, daylong, automatically labeled audio recordings (13,836 hr total) of children (8- to 48-month-olds) with and without autism. We found that an adult was more likely to respond when the child's vocalization was speech related rather than not speech related. In turn, a child's vocalization was more likely to be speech related if the child's previous speech-related vocalization had received an immediate adult response rather than no response. Taken together, these results are consistent with the idea that there is a social feedback loop between child and caregiver that promotes speech development. Although this feedback loop applies in both typical development and autism, children with autism produced proportionally fewer speech-related vocalizations, and the responses they received were less contingent on whether their vocalizations were speech related. We argue that such differences will diminish the strength of the social feedback loop and have cascading effects on speech development over time. Differences related to socioeconomic status are also reported. © The Author(s) 2014.

  8. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar.

    Science.gov (United States)

    Shin, Young Hoon; Seo, Jiwon

    2016-10-29

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.

  9. Speech-Language Therapy (For Parents)

    Science.gov (United States)

    ... Staying Safe Videos for Educators Search English Español Speech-Language Therapy KidsHealth / For Parents / Speech-Language Therapy ... most kids with speech and/or language disorders. Speech Disorders, Language Disorders, and Feeding Disorders A speech ...

  10. [Improving speech comprehension using a new cochlear implant speech processor].

    Science.gov (United States)

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise.In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  11. Neurophysiology of speech differences in childhood apraxia of speech.

    Science.gov (United States)

    Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.

  12. Robust Speaker Authentication Based on Combined Speech and Voiceprint Recognition

    Science.gov (United States)

    Malcangi, Mario

    2009-08-01

    Personal authentication is becoming increasingly important in many applications that have to protect proprietary data. Passwords and personal identification numbers (PINs) prove not to be robust enough to ensure that unauthorized people do not use them. Biometric authentication technology may offer a secure, convenient, accurate solution but sometimes fails due to its intrinsically fuzzy nature. This research aims to demonstrate that combining two basic speech processing methods, voiceprint identification and speech recognition, can provide a very high degree of robustness, especially if fuzzy decision logic is used.

  13. On stylistic automatization of lexical units in various types of contexts

    Directory of Open Access Journals (Sweden)

    В В Зуева

    2009-12-01

    Full Text Available Stylistic automatization of lexical units in various types of contexts is investigated in this article. Following the works of Boguslav Havranek and other linguists of the Prague Linguistic School automatization is treated as a contextual narrowing of the meaning of a lexical unit to the level of its complete predictability in situational contexts and the lack of stylistic contradiction with other lexical units in speech.

  14. The role of automated speech and audio analysis in semantic multimedia annotation

    NARCIS (Netherlands)

    de Jong, Franciska M.G.; Ordelman, Roeland J.F.; van Hessen, Adrianus J.

    This paper overviews the various ways in which automatic speech and audio analysis can be deployed to enhance the semantic annotation of multimedia content, and as a consequence to improve the effectiveness of conceptual access tools. A number of techniques will be presented, including the alignment

  15. Comparing grapheme-based and phoneme-based speech recognition for Afrikaans

    CSIR Research Space (South Africa)

    Basson, WD

    2012-11-01

    Full Text Available This paper compares the recognition accuracy of a phoneme-based automatic speech recognition system with that of a grapheme-based system, using Afrikaans as case study. The first system is developed using a conventional pronunciation dictionary...

  16. Speech-based recognition of self-reported and observed emotion in a dimensional space

    NARCIS (Netherlands)

    Truong, Khiet Phuong; van Leeuwen, David A.; de Jong, Franciska M.G.

    2012-01-01

    The differences between self-reported and observed emotion have only marginally been investigated in the context of speech-based automatic emotion recognition. We address this issue by comparing self-reported emotion ratings to observed emotion ratings and look at how differences between these two

  17. Audiovisual discrimination between speech and laughter: Why and when visual information might help

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    Past research on automatic laughter classification/detection has focused mainly on audio-based approaches. Here we present an audiovisual approach to distinguishing laughter from speech, and we show that integrating the information from audio and video channels may lead to improved performance over

  18. Results from the Dutch speech-in-noise screening test by telephone

    NARCIS (Netherlands)

    Smits, C.H.M.; Houtgast, T.

    2005-01-01

    OBJECTIVE: The objective of the study was to implement a previously developed automatic speech-in-noise screening test by telephone (Smits, Kapteyn, & Houtgast, 2004), introduce it nationwide as a self-test, and analyze the results. DESIGN: The test was implemented on an interactive voice response

  19. Dealing with Phrase Level Co-Articulation (PLC) in speech recognition: a first approach

    NARCIS (Netherlands)

    Ordelman, Roeland J.F.; van Hessen, Adrianus J.; van Leeuwen, David A.; Robinson, Tony; Renals, Steve

    1999-01-01

    Whereas nowadays within-word co-articulation effects are usually sufficiently dealt with in automatic speech recognition, this is not always the case with phrase level co-articulation effects (PLC). This paper describes a first approach in dealing with phrase level co-articulation by applying these

  20. Speech overlap detection in a two-pass speaker diarization system

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; Leeuwen, D.A. van; Jong, F. M. G de

    2009-01-01

    In this paper we present the two-pass speaker diarization system that we developed for the NIST RT09s evaluation. In the first pass of our system a model for speech overlap detection is gen- erated automatically. This model is used in two ways to reduce the diarization errors due to overlapping

  1. Speech overlap detection in a two-pass speaker diarization system

    NARCIS (Netherlands)

    Huijbregts, M.; Leeuwen, D.A. van; Jong, F.M.G. de

    2009-01-01

    In this paper we present the two-pass speaker diarization system that we developed for the NIST RT09s evaluation. In the first pass of our system a model for speech overlap detection is generated automatically. This model is used in two ways to reduce the diarization errors due to overlapping

  2. Musical expertise and foreign speech perception.

    Science.gov (United States)

    Martínez-Montes, Eduardo; Hernández-Pérez, Heivet; Chobert, Julie; Morgado-Rodríguez, Lisbet; Suárez-Murias, Carlos; Valdés-Sosa, Pedro A; Besson, Mireille

    2013-01-01

    The aim of this experiment was to investigate the influence of musical expertise on the automatic perception of foreign syllables and harmonic sounds. Participants were Cuban students with high level of expertise in music or in visual arts and with the same level of general education and socio-economic background. We used a multi-feature Mismatch Negativity (MMN) design with sequences of either syllables in Mandarin Chinese or harmonic sounds, both comprising deviants in pitch contour, duration and Voice Onset Time (VOT) or equivalent that were either far from (Large deviants) or close to (Small deviants) the standard. For both Mandarin syllables and harmonic sounds, results were clear-cut in showing larger MMNs to pitch contour deviants in musicians than in visual artists. Results were less clear for duration and VOT deviants, possibly because of the specific characteristics of the stimuli. Results are interpreted as reflecting similar processing of pitch contour in speech and non-speech sounds. The implications of these results for understanding the influence of intense musical training from childhood to adulthood and of genetic predispositions for music on foreign language perception are discussed.

  3. Musical expertise and foreign speech perception

    Directory of Open Access Journals (Sweden)

    Eduardo eMartínez-Montes

    2013-11-01

    Full Text Available The aim of this experiment was to investigate the influence of musical expertise on the automatic perception of foreign syllables and harmonic sounds. Participants were Cuban students with high level of expertise in music or in visual arts and with the same level of general education and socio-economic background. We used a multi-feature Mismatch Negativity (MMN design with sequences of either syllables in Mandarin Chinese or harmonic sounds, both comprising deviants in pitch contour, duration and Voice Onset Time (VOT or equivalent that were either far from (Large deviants or close to (Small deviants the standard. For both Mandarin syllables and harmonic sounds, results were clear-cut in showing larger MMNs to pitch contour deviants in musicians than in visual artists. Results were less clear for duration and VOT deviants, possibly because of the specific characteristics of the stimuli. Results are interpreted as reflecting similar processing of pitch contour in speech and non-speech sounds. The implications of these results for understanding the influence of intense musical training from childhood to adulthood and of genetic predispositions for music on foreign language perception is discussed.

  4. Abortion and compelled physician speech.

    Science.gov (United States)

    Orentlicher, David

    2015-01-01

    Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading. © 2015 American Society of Law, Medicine & Ethics, Inc.

  5. Automatic control variac system for electronic accelerator

    International Nuclear Information System (INIS)

    Zhang Shuocheng; Wang Dan; Jing Lan; Qiao Weimin; Ma Yunhai

    2006-01-01

    An automatic control variac system is designed in order to satisfy the controlling requirement of the electronic accelerator developed by the Institute. Both design and operational principles, structure of the system as well as the software of industrial PC and micro controller unit are described. The interfaces of the control module are RS232 and RS485. A fiber optical interface (FOC) could be set up if an industrial FOC network is necessary, which will extend the filed of its application and make the communication of the system better. It is shown in practice that the system can adjust the variac output voltage automatically and assure the accurate and automatic control of the electronic accelerator. The system is designed in accordance with the general design principles and possesses the merits such as easy operation and maintenance, good expansibility, and low cost, thus it could also be used in other industrial branches. (authors)

  6. Automatic positioning control device for automatic control rod exchanger

    International Nuclear Information System (INIS)

    Nasu, Seiji; Sasaki, Masayoshi.

    1982-01-01

    Purpose: To attain accurate positioning for a control rod exchanger. Constitution: The present position for an automatic control rod exchanger is detected by a synchro generator. An aimed stopping position for the exchanger, a stop instruction range depending on the distantial operation delay in the control system and the inertia-running distance of the mechanical system, and a coincidence confirmation range depending on the required positioning accuracy are previously set. If there is a difference between the present position and the aimed stopping position, the automatic exchanger is caused to run toward the aimed stopping position. A stop instruction is generated upon arrival at the position within said stop instruction range, and a coincidence confirmation signal is generated upon arrival at the position within the coincidence confirmation range. Since uncertain factors such as operation delay in the control system and the inertia-running distance of the mechanical system that influence the positioning accuracy are made definite by the method of actual measurement or the like and the stop instruction range and the coincidence confirmation range are set based on the measured data, the accuracy for the positioning can be improved. (Ikeda, J.)

  7. Child vocalization composition as discriminant information for automatic autism detection.

    Science.gov (United States)

    Xu, Dongxin; Gilkerson, Jill; Richards, Jeffrey; Yapanel, Umit; Gray, Sharmi

    2009-01-01

    Early identification is crucial for young children with autism to access early intervention. The existing screens require either a parent-report questionnaire and/or direct observation by a trained practitioner. Although an automatic tool would benefit parents, clinicians and children, there is no automatic screening tool in clinical use. This study reports a fully automatic mechanism for autism detection/screening for young children. This is a direct extension of the LENA (Language ENvironment Analysis) system, which utilizes speech signal processing technology to analyze and monitor a child's natural language environment and the vocalizations/speech of the child. It is discovered that child vocalization composition contains rich discriminant information for autism detection. By applying pattern recognition and machine learning approaches to child vocalization composition data, accuracy rates of 85% to 90% in cross-validation tests for autism detection have been achieved at the equal-error-rate (EER) point on a data set with 34 children with autism, 30 language delayed children and 76 typically developing children. Due to its easy and automatic procedure, it is believed that this new tool can serve a significant role in childhood autism screening, especially in regards to population-based or universal screening.

  8. Current trends in multilingual speech processing

    Indian Academy of Sciences (India)

    2016-08-26

    ; speech-to-speech translation; language identification. ... interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers.

  9. Multimodal Speech Capture System for Speech Rehabilitation and Learning.

    Science.gov (United States)

    Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam

    2017-11-01

    Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we are presenting the multimodal speech capture system (MSCS) that records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. Collected speech modalities, tongue motion, lips gestures, and voice are visualized not only in real-time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities by a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective, and may vary from one SLP to another.

  10. Measurement of speech parameters in casual speech of dementia patients

    NARCIS (Netherlands)

    Ossewaarde, Roelant; Jonkers, Roel; Jalvingh, Fedor; Bastiaanse, Yvonne

    Measurement of speech parameters in casual speech of dementia patients Roelant Adriaan Ossewaarde1,2, Roel Jonkers1, Fedor Jalvingh1,3, Roelien Bastiaanse1 1CLCG, University of Groningen (NL); 2HU University of Applied Sciences Utrecht (NL); 33St. Marienhospital - Vechta, Geriatric Clinic Vechta

  11. Speech Perception as a Multimodal Phenomenon

    OpenAIRE

    Rosenblum, Lawrence D.

    2008-01-01

    Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal s...

  12. Automatic Payroll Deposit System.

    Science.gov (United States)

    Davidson, D. B.

    1979-01-01

    The Automatic Payroll Deposit System in Yakima, Washington's Public School District No. 7, directly transmits each employee's salary amount for each pay period to a bank or other financial institution. (Author/MLF)

  13. Automatic Test Systems Aquisition

    National Research Council Canada - National Science Library

    1994-01-01

    We are providing this final memorandum report for your information and use. This report discusses the efforts to achieve commonality in standards among the Military Departments as part of the DoD policy for automatic test systems (ATS...

  14. Automatically sweeping dual-channel boxcar integrator

    International Nuclear Information System (INIS)

    Keefe, D.J.; Patterson, D.R.

    1978-01-01

    An automatically sweeping dual-channel boxcar integrator has been developed to automate the search for a signal that repeatedly follows a trigger pulse by a constant or slowly varying time delay when that signal is completely hidden in random electrical noise and dc-offset drifts. The automatically sweeping dual-channel boxcar integrator improves the signal-to-noise ratio and eliminates dc-drift errors in the same way that a conventional dual-channel boxcar integrator does, but, in addition, automatically locates the hidden signal. When the signal is found, its time delay is displayed with 100-ns resolution, and its peak value is automatically measured and displayed. This relieves the operator of the tedious, time-consuming, and error-prone search for the signal whenever the time delay changes. The automatically sweeping boxcar integrator can also be used as a conventional dual-channel boxcar integrator. In either mode, it can repeatedly integrate a signal up to 990 times and thus make accurate measurements of the signal pulse height in the presence of random noise, dc offsets, and unsynchronized interfering signals

  15. Brand and automaticity

    OpenAIRE

    Liu, J.

    2008-01-01

    A presumption of most consumer research is that consumers endeavor to maximize the utility of their choices and are in complete control of their purchasing and consumption behavior. However, everyday life experience suggests that many of our choices are not all that reasoned or conscious. Indeed, automaticity, one facet of behavior, is indispensable to complete the portrait of consumers. Despite its importance, little attention is paid to how the automatic side of behavior can be captured and...

  16. Position automatic determination technology

    International Nuclear Information System (INIS)

    1985-10-01

    This book tells of method of position determination and characteristic, control method of position determination and point of design, point of sensor choice for position detector, position determination of digital control system, application of clutch break in high frequency position determination, automation technique of position determination, position determination by electromagnetic clutch and break, air cylinder, cam and solenoid, stop position control of automatic guide vehicle, stacker crane and automatic transfer control.

  17. Automatic intelligent cruise control

    OpenAIRE

    Stanton, NA; Young, MS

    2006-01-01

    This paper reports a study on the evaluation of automatic intelligent cruise control (AICC) from a psychological perspective. It was anticipated that AICC would have an effect upon the psychology of driving—namely, make the driver feel like they have less control, reduce the level of trust in the vehicle, make drivers less situationally aware, but might reduce the workload and make driving might less stressful. Drivers were asked to drive in a driving simulator under manual and automatic inte...

  18. Auditory Modeling for Noisy Speech Recognition

    National Research Council Canada - National Science Library

    2000-01-01

    ... digital filtering for noise cancellation which interfaces to speech recognition software. It uses auditory features in speech recognition training, and provides applications to multilingual spoken language translation...

  19. Teaching Speech Acts

    Directory of Open Access Journals (Sweden)

    Teaching Speech Acts

    2007-01-01

    Full Text Available In this paper I argue that pragmatic ability must become part of what we teach in the classroom if we are to realize the goals of communicative competence for our students. I review the research on pragmatics, especially those articles that point to the effectiveness of teaching pragmatics in an explicit manner, and those that posit methods for teaching. I also note two areas of scholarship that address classroom needs—the use of authentic data and appropriate assessment tools. The essay concludes with a summary of my own experience teaching speech acts in an advanced-level Portuguese class.

  20. Subjective and Objective Quality Assessment of Single-Channel Speech Separation Algorithms

    DEFF Research Database (Denmark)

    Mowlaee, Pejman; Saeidi, Rahim; Christensen, Mads Græsbøll

    2012-01-01

    Previous studies on performance evaluation of single-channel speech separation (SCSS) algorithms mostly focused on automatic speech recognition (ASR) accuracy as their performance measure. Assessing the separated signals by different metrics other than this has the benefit that the results...... are expected to carry on to other applications beyond ASR. In this paper, in addition to conventional speech quality metrics (PESQ and SNRloss), we also evaluate the separation systems output using different source separation metrics: blind source separation evaluation (BSS EVAL) and perceptual evaluation...... that PESQ and PEASS quality metrics predict well the subjective quality of separated signals obtained by the separation systems. From the results it is observed that the short-time objective intelligibility (STOI) measure predict the speech intelligibility results....

  1. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System.

    Science.gov (United States)

    Partila, Pavol; Voznak, Miroslav; Tovarek, Jaromir

    2015-01-01

    The impact of the classification method and features selection for the speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is wide usability in nowadays automatic voice controlled systems. Berlin database of emotional recordings was used in this experiment. Classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture model is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system due to its accuracy and efficiency.

  2. A Joint Approach for Single-Channel Speaker Identification and Speech Separation

    DEFF Research Database (Denmark)

    Mowlaee, Pejman; Saeidi, Rahim; Christensen, Mads Græsbøll

    2012-01-01

    ) accuracy, here, we report the objective and subjective results as well. The results show that the proposed system performs as well as the best of the state-of-the-art in terms of perceived quality while its performance in terms of speaker identification and automatic speech recognition results......In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification a single-channel speaker identification algorithm is proposed which provides an estimate of signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose...... a sinusoidal model-based algorithm. The speech separation algorithm consists of a double-talk/single-talk detector followed by a minimum mean square error estimator of sinusoidal parameters for finding optimal codevectors from pre-trained speaker codebooks. In evaluating the proposed system, we start from...

  3. FUSING SPEECH SIGNAL AND PALMPRINT FEATURES FOR AN SECURED AUTHENTICATION SYSTEM

    Directory of Open Access Journals (Sweden)

    P.K. Mahesh

    2011-11-01

    Full Text Available In the application of Biometric authentication, personal identification is regarded as an effective method for automatic recognition, with a high confidence, a person’s identity. Using multimodal biometric systems we typically get better performance compare to single biometric modality. This paper proposes the multimodal biometrics system for identity verification using two traits, i.e., speech signal and palmprint. Integrating the palmprint and speech information increases robustness of person authentication. The proposed system is designed for applications where the training data contains a speech signal and palmprint. It is well known that the performance of person authentication using only speech signal or palmprint is deteriorated by feature changes with time. The final decision is made by fusion at matching score level architecture in which feature vectors are created independently for query measures and are then compared to the enrolment templates, which are stored during database preparation.

  4. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

    Directory of Open Access Journals (Sweden)

    Pavol Partila

    2015-01-01

    Full Text Available The impact of the classification method and features selection for the speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is wide usability in nowadays automatic voice controlled systems. Berlin database of emotional recordings was used in this experiment. Classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture model is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system due to its accuracy and efficiency.

  5. Histogram equalization with Bayesian estimation for noise robust speech recognition.

    Science.gov (United States)

    Suh, Youngjoo; Kim, Hoirin

    2018-02-01

    The histogram equalization approach is an efficient feature normalization technique for noise robust automatic speech recognition. However, it suffers from performance degradation when some fundamental conditions are not satisfied in the test environment. To remedy these limitations of the original histogram equalization methods, class-based histogram equalization approach has been proposed. Although this approach showed substantial performance improvement under noise environments, it still suffers from performance degradation due to the overfitting problem when test data are insufficient. To address this issue, the proposed histogram equalization technique employs the Bayesian estimation method in the test cumulative distribution function estimation. It was reported in a previous study conducted on the Aurora-4 task that the proposed approach provided substantial performance gains in speech recognition systems based on the acoustic modeling of the Gaussian mixture model-hidden Markov model. In this work, the proposed approach was examined in speech recognition systems with deep neural network-hidden Markov model (DNN-HMM), the current mainstream speech recognition approach where it also showed meaningful performance improvement over the conventional maximum likelihood estimation-based method. The fusion of the proposed features with the mel-frequency cepstral coefficients provided additional performance gains in DNN-HMM systems, which otherwise suffer from performance degradation in the clean test condition.

  6. Comparison of Classification Methods for Detecting Emotion from Mandarin Speech

    Science.gov (United States)

    Pao, Tsang-Long; Chen, Yu-Te; Yeh, Jun-Heng

    It is said that technology comes out from humanity. What is humanity? The very definition of humanity is emotion. Emotion is the basis for all human expression and the underlying theme behind everything that is done, said, thought or imagined. Making computers being able to perceive and respond to human emotion, the human-computer interaction will be more natural. Several classifiers are adopted for automatically assigning an emotion category, such as anger, happiness or sadness, to a speech utterance. These classifiers were designed independently and tested on various emotional speech corpora, making it difficult to compare and evaluate their performance. In this paper, we first compared several popular classification methods and evaluated their performance by applying them to a Mandarin speech corpus consisting of five basic emotions, including anger, happiness, boredom, sadness and neutral. The extracted feature streams contain MFCC, LPCC, and LPC. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% for the 5-class emotion recognition and outperforms other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compared these classifiers by applying them to another Mandarin expressive speech corpus consisting of two emotions. The experimental results still show that the proposed WD-MKNN outperforms others.

  7. Auditory analysis for speech recognition based on physiological models

    Science.gov (United States)

    Jeon, Woojay; Juang, Biing-Hwang

    2004-05-01

    To address the limitations of traditional cepstrum or LPC based front-end processing methods for automatic speech recognition, more elaborate methods based on physiological models of the human auditory system may be used to achieve more robust speech recognition in adverse environments. For this purpose, a modified version of a model of the primary auditory cortex featuring a three dimensional mapping of auditory spectra [Wang and Shamma, IEEE Trans. Speech Audio Process. 3, 382-395 (1995)] is adopted and investigated for its use as an improved front-end processing method. The study is conducted in two ways: first, by relating the model's redundant representation to traditional spectral representations and showing that the former not only encompasses information provided by the latter, but also reveals more relevant information that makes it superior in describing the identifying features of speech signals; and second, by observing the statistical features of the representation for various classes of sound to show how different identifying features manifest themselves as specific patterns on the cortical map, thereby becoming a place-coded data set on which detection theory could be applied to simulate auditory perception and cognition.

  8. Speech Training for Inmate Rehabilitation.

    Science.gov (United States)

    Parkinson, Michael G.; Dobkins, David H.

    1982-01-01

    Using a computerized content analysis, the authors demonstrate changes in speech behaviors of prison inmates. They conclude that two to four hours of public speaking training can have only limited effect on students who live in a culture in which "prison speech" is the expected and rewarded form of behavior. (PD)

  9. Separating Underdetermined Convolutive Speech Mixtures

    DEFF Research Database (Denmark)

    Pedersen, Michael Syskind; Wang, DeLiang; Larsen, Jan

    2006-01-01

    a method for underdetermined blind source separation of convolutive mixtures. The proposed framework is applicable for separation of instantaneous as well as convolutive speech mixtures. It is possible to iteratively extract each speech signal from the mixture by combining blind source separation...

  10. Speech Prosody in Cerebellar Ataxia

    Science.gov (United States)

    Casper, Maureen A.; Raphael, Lawrence J.; Harris, Katherine S.; Geibel, Jennifer M.

    2007-01-01

    Persons with cerebellar ataxia exhibit changes in physical coordination and speech and voice production. Previously, these alterations of speech and voice production were described primarily via perceptual coordinates. In this study, the spatial-temporal properties of syllable production were examined in 12 speakers, six of whom were healthy…

  11. Delayed speech development in children: Introduction to terminology

    Directory of Open Access Journals (Sweden)

    M. Yu. Bobylova

    2017-01-01

    Full Text Available There has been recently an increase in the number of children diagnosed with delayed speech development. There is delay compensation with age, but mild deficiency often remains for life. Delayed speech development is more common in boys than in girls. Its etiology is unknown in most cases, so a child should be followed up to make an accurate diagnosis. Genetic predisposition or environmental factors frequently influence speech development. The course of its delays is various. In the history of a number of disorders (childhood disintegrative disorder, Landau–Kleffner syndrome, there is evidence for the normal development of speech to a certain period and then stops or even regresses. By way of comparison, there are generally speech developmental changes in autism even during the preverbal stage (a complex of revival fails to form; babbling is poor, low emotional, gibberish; at the same time, the baby recipes whole phrases without using them to communicate. These speech disorders are considered not only as a delay, but also as a developmental abnormality. Speech disorders in children should be diagnosed as early as possible in order to initiative corrective measures in time. In this case, a physician makes a diagnosis and a special education teacher does corrective work. The successful collaboration and mutual understanding of the specialists in these areas will determine quality of life for a child in the future. This paper focusses on the terminology and classification of delays, which are necessary for physicians and teachers to speak the same language.

  12. Methods for eliciting, annotating, and analyzing databases for child speech development.

    Science.gov (United States)

    Beckman, Mary E; Plummer, Andrew R; Munson, Benjamin; Reidy, Patrick F

    2017-09-01

    Methods from automatic speech recognition (ASR), such as segmentation and forced alignment, have facilitated the rapid annotation and analysis of very large adult speech databases and databases of caregiver-infant interaction, enabling advances in speech science that were unimaginable just a few decades ago. This paper centers on two main problems that must be addressed in order to have analogous resources for developing and exploiting databases of young children's speech. The first problem is to understand and appreciate the differences between adult and child speech that cause ASR models developed for adult speech to fail when applied to child speech. These differences include the fact that children's vocal tracts are smaller than those of adult males and also changing rapidly in size and shape over the course of development, leading to between-talker variability across age groups that dwarfs the between-talker differences between adult men and women. Moreover, children do not achieve fully adult-like speech motor control until they are young adults, and their vocabularies and phonological proficiency are developing as well, leading to considerably more within-talker variability as well as more between-talker variability. The second problem then is to determine what annotation schemas and analysis techniques can most usefully capture relevant aspects of this variability. Indeed, standard acoustic characterizations applied to child speech reveal that adult-centered annotation schemas fail to capture phenomena such as the emergence of covert contrasts in children's developing phonological systems, while also revealing children's nonuniform progression toward community speech norms as they acquire the phonological systems of their native languages. Both problems point to the need for more basic research into the growth and development of the articulatory system (as well as of the lexicon and phonological system) that is oriented explicitly toward the construction of

  13. Effects of low harmonics on tone identification in natural and vocoded speech.

    Science.gov (United States)

    Liu, Chang; Azimi, Behnam; Tahmina, Qudsia; Hu, Yi

    2012-11-01

    This study investigated the contribution of low-frequency harmonics to identifying Mandarin tones in natural and vocoded speech in quiet and noisy conditions. Results showed that low-frequency harmonics of natural speech led to highly accurate tone identification; however, for vocoded speech, low-frequency harmonics yielded lower tone identification than stimuli with full harmonics, except for tone 4. Analysis of the correlation between tone accuracy and the amplitude-F0 correlation index suggested that "more" speech contents (i.e., more harmonics) did not necessarily yield better tone recognition for vocoded speech, especially when the amplitude contour of the signals did not co-vary with the F0 contour.

  14. On speech recognition during anaesthesia

    DEFF Research Database (Denmark)

    Alapetite, Alexandre

    2007-01-01

    This PhD thesis in human-computer interfaces (informatics) studies the case of the anaesthesia record used during medical operations and the possibility to supplement it with speech recognition facilities. Problems and limitations have been identified with the traditional paper-based anaesthesia...... and inaccuracies in the anaesthesia record. Supplementing the electronic anaesthesia record interface with speech input facilities is proposed as one possible solution to a part of the problem. The testing of the various hypotheses has involved the development of a prototype of an electronic anaesthesia record...... interface with speech input facilities in Danish. The evaluation of the new interface was carried out in a full-scale anaesthesia simulator. This has been complemented by laboratory experiments on several aspects of speech recognition for this type of use, e.g. the effects of noise on speech recognition...

  15. From Gesture to Speech

    Directory of Open Access Journals (Sweden)

    Maurizio Gentilucci

    2012-11-01

    Full Text Available One of the major problems concerning the evolution of human language is to understand how sounds became associated to meaningful gestures. It has been proposed that the circuit controlling gestures and speech evolved from a circuit involved in the control of arm and mouth movements related to ingestion. This circuit contributed to the evolution of spoken language, moving from a system of communication based on arm gestures. The discovery of the mirror neurons has provided strong support for the gestural theory of speech origin because they offer a natural substrate for the embodiment of language and create a direct link between sender and receiver of a message. Behavioural studies indicate that manual gestures are linked to mouth movements used for syllable emission. Grasping with the hand selectively affected movement of inner or outer parts of the mouth according to syllable pronunciation and hand postures, in addition to hand actions, influenced the control of mouth grasp and vocalization. Gestures and words are also related to each other. It was found that when producing communicative gestures (emblems the intention to interact directly with a conspecific was transferred from gestures to words, inducing modification in voice parameters. Transfer effects of the meaning of representational gestures were found on both vocalizations and meaningful words. It has been concluded that the results of our studies suggest the existence of a system relating gesture to vocalization which was precursor of a more general system reciprocally relating gesture to word.

  16. Can blind persons accurately assess body size from the voice?

    Science.gov (United States)

    Pisanski, Katarzyna; Oleszkiewicz, Anna; Sorokowska, Agnieszka

    2016-04-01

    Vocal tract resonances provide reliable information about a speaker's body size that human listeners use for biosocial judgements as well as speech recognition. Although humans can accurately assess men's relative body size from the voice alone, how this ability is acquired remains unknown. In this study, we test the prediction that accurate voice-based size estimation is possible without prior audiovisual experience linking low frequencies to large bodies. Ninety-one healthy congenitally or early blind, late blind and sighted adults (aged 20-65) participated in the study. On the basis of vowel sounds alone, participants assessed the relative body sizes of male pairs of varying heights. Accuracy of voice-based body size assessments significantly exceeded chance and did not differ among participants who were sighted, or congenitally blind or who had lost their sight later in life. Accuracy increased significantly with relative differences in physical height between men, suggesting that both blind and sighted participants used reliable vocal cues to size (i.e. vocal tract resonances). Our findings demonstrate that prior visual experience is not necessary for accurate body size estimation. This capacity, integral to both nonverbal communication and speech perception, may be present at birth or may generalize from broader cross-modal correspondences. © 2016 The Author(s).

  17. Readability index as a design criterion for elicited imitation tasks in automatic oral proficiency assessment

    CSIR Research Space (South Africa)

    De Wet, Febe

    2011-08-01

    Full Text Available ) techniques to automatically assess oral proficiency and listen- ing comprehension is one way in which these logistical prob- lems can be obviated. Another appealing feature of automatic tests is that they provide a means to assess consistently and ob...?146, 2008. [2] F. De Wet, C. Van der Walt, and T. R. Niesler, ?Automatic assess- ment of oral language proficiency and listening comprehension,? Speech Communication, vol. 51, pp. 864?874, 2009. [3] C. R. Graham, D. Lonsdale, C. Kennington, A. Johnson...

  18. A Novel Real-Time Speech Summarizer System for the Learning of Sustainability

    Directory of Open Access Journals (Sweden)

    Hsiu-Wen Wang

    2015-04-01

    Full Text Available As the number of speech and video documents increases on the Internet and portable devices proliferate, speech summarization becomes increasingly essential. Relevant research in this domain has typically focused on broadcasts and news; however, the automatic summarization methods used in the past may not apply to other speech domains (e.g., speech in lectures. Therefore, this study explores the lecture speech domain. The features used in previous research were analyzed and suitable features were selected following experimentation; subsequently, a three-phase real-time speech summarizer for the learning of sustainability (RTSSLS was proposed. Phase One involved selecting independent features (e.g., centrality, resemblance to the title, sentence length, term frequency, and thematic words and calculating the independent feature scores; Phase Two involved calculating the dependent features, such as the position compared with the independent feature scores; and Phase Three involved comparing these feature scores to obtain weighted averages of the function-scores, determine the highest-scoring sentence, and provide a summary. In practical results, the accuracies of macro-average and micro-average for the RTSSLS were 70% and 73%, respectively. Therefore, using a RTSSLS can enable users to acquire key speech information for the learning of sustainability.

  19. Online Speech/Music Segmentation Based on the Variance Mean of Filter Bank Energy

    Directory of Open Access Journals (Sweden)

    Zdravko Kačič

    2009-01-01

    Full Text Available This paper presents a novel feature for online speech/music segmentation based on the variance mean of filter bank energy (VMFBE. The idea that encouraged the feature's construction is energy variation in a narrow frequency sub-band. The energy varies more rapidly, and to a greater extent for speech than for music. Therefore, an energy variance in such a sub-band is greater for speech than for music. The radio broadcast database and the BNSI broadcast news database were used for feature discrimination and segmentation ability evaluation. The calculation procedure of the VMFBE feature has 4 out of 6 steps in common with the MFCC feature calculation procedure. Therefore, it is a very convenient speech/music discriminator for use in real-time automatic speech recognition systems based on MFCC features, because valuable processing time can be saved, and computation load is only slightly increased. Analysis of the feature's speech/music discriminative ability shows an average error rate below 10% for radio broadcast material and it outperforms other features used for comparison, by more than 8%. The proposed feature as a stand-alone speech/music discriminator in a segmentation system achieves an overall accuracy of over 94% on radio broadcast material.

  20. Magneto encephalography (MEG: perspectives of speech areas functional mapping in human subjects

    Directory of Open Access Journals (Sweden)

    Butorina A. V.

    2012-06-01

    Full Text Available One of the main problems in clinical practice and academic research is how to localize speech zones in the human brain. Two speech areas (Broca and Wernicke areas that are responsible for language production and for understanding of written and spoken language have been known since the past century. Their location and even hemispheric lateralization have a substantial inter-individual variability, especially in neurosurgery patients. Wada test is one of the most frequently used invasive methodology for speech hemispheric lateralization in neurosurgery patients. However, besides relatively high-risk of Wada test for patient's health, it has its own limitation, e. g. low reliability of Wada-based evidence of verbal memory brain lateralization. Therefore, there is an urgent need for non-invasive, reliable methods of speech zones mapping.The current review summarizes the recent experimental evidence from magnitoencephalographic (MEG research suggesting that speech areas are included in the speech processing within the first 200 ms after the word onset. The electro-magnetic response to deviant word, mismatch negativity wave with latency of 100—200 ms, can be recorded from auditory cortex within the oddball-paradigm. We provide the arguments that basic features of this brain response, such as its automatic, pre-attentive nature, high signal to noise ratio, source localization at superior temporal sulcus, make it a promising vehicle for non-invasive MEG-based speech areas mapping in neurosurgery.

  1. Emotional prosody of task-irrelevant speech interferes with the retention of serial order.

    Science.gov (United States)

    Kattner, Florian; Ellermeier, Wolfgang

    2018-04-09

    Task-irrelevant speech and other temporally changing sounds are known to interfere with the short-term memorization of ordered verbal materials, as compared to silence or stationary sounds. It has been argued that this disruption of short-term memory (STM) may be due to (a) interference of automatically encoded acoustical fluctuations with the process of serial rehearsal or (b) attentional capture by salient task-irrelevant information. To disentangle the contributions of these 2 processes, the authors investigated whether the disruption of serial recall is due to the semantic or acoustical properties of task-irrelevant speech (Experiment 1). They found that performance was affected by the prosody (emotional intonation), but not by the semantics (word meaning), of irrelevant speech, suggesting that the disruption of serial recall is due to interference of precategorically encoded changing-state sound (with higher fluctuation strength of emotionally intonated speech). The authors further demonstrated a functional distinction between this form of distraction and attentional capture by contrasting the effect of (a) speech prosody and (b) sudden prosody deviations on both serial and nonserial STM tasks (Experiment 2). Although serial recall was again sensitive to the emotional prosody of irrelevant speech, performance on a nonserial missing-item task was unaffected by the presence of neutral or emotionally intonated speech sounds. In contrast, sudden prosody changes tended to impair performance on both tasks, suggesting an independent effect of attentional capture. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  2. Technological evaluation of gesture and speech interfaces for enabling dismounted soldier-robot dialogue

    Science.gov (United States)

    Kattoju, Ravi Kiran; Barber, Daniel J.; Abich, Julian; Harris, Jonathan

    2016-05-01

    With increasing necessity for intuitive Soldier-robot communication in military operations and advancements in interactive technologies, autonomous robots have transitioned from assistance tools to functional and operational teammates able to service an array of military operations. Despite improvements in gesture and speech recognition technologies, their effectiveness in supporting Soldier-robot communication is still uncertain. The purpose of the present study was to evaluate the performance of gesture and speech interface technologies to facilitate Soldier-robot communication during a spatial-navigation task with an autonomous robot. Gesture and speech semantically based spatial-navigation commands leveraged existing lexicons for visual and verbal communication from the U.S Army field manual for visual signaling and a previously established Squad Level Vocabulary (SLV). Speech commands were recorded by a Lapel microphone and Microsoft Kinect, and classified by commercial off-the-shelf automatic speech recognition (ASR) software. Visual signals were captured and classified using a custom wireless gesture glove and software. Participants in the experiment commanded a robot to complete a simulated ISR mission in a scaled down urban scenario by delivering a sequence of gesture and speech commands, both individually and simultaneously, to the robot. Performance and reliability of gesture and speech hardware interfaces and recognition tools were analyzed and reported. Analysis of experimental results demonstrated the employed gesture technology has significant potential for enabling bidirectional Soldier-robot team dialogue based on the high classification accuracy and minimal training required to perform gesture commands.

  3. Online Speech/Music Segmentation Based on the Variance Mean of Filter Bank Energy

    Science.gov (United States)

    Kos, Marko; Grašič, Matej; Kačič, Zdravko

    2009-12-01

    This paper presents a novel feature for online speech/music segmentation based on the variance mean of filter bank energy (VMFBE). The idea that encouraged the feature's construction is energy variation in a narrow frequency sub-band. The energy varies more rapidly, and to a greater extent for speech than for music. Therefore, an energy variance in such a sub-band is greater for speech than for music. The radio broadcast database and the BNSI broadcast news database were used for feature discrimination and segmentation ability evaluation. The calculation procedure of the VMFBE feature has 4 out of 6 steps in common with the MFCC feature calculation procedure. Therefore, it is a very convenient speech/music discriminator for use in real-time automatic speech recognition systems based on MFCC features, because valuable processing time can be saved, and computation load is only slightly increased. Analysis of the feature's speech/music discriminative ability shows an average error rate below 10% for radio broadcast material and it outperforms other features used for comparison, by more than 8%. The proposed feature as a stand-alone speech/music discriminator in a segmentation system achieves an overall accuracy of over 94% on radio broadcast material.

  4. Changes in speech production in a child with a cochlear implant: acoustic and kinematic evidence.

    Science.gov (United States)

    Goffman, Lisa; Ertmer, David J; Erdle, Christa

    2002-10-01

    A method is presented for examining change in motor patterns used to produce linguistic contrasts. In this case study, the method is applied to a child receiving new auditory input following cochlear implantation. This child experienced hearing loss at age 3 years and received a multichannel cochlear implant at age 7 years. Data collection points occurred both pre- and postimplant and included acoustic and kinematic analyses. Overall, this child's speech output was transcribed as accurate across the pre- and postimplant periods. Postimplant, with the onset of new auditory experience, acoustic durations showed a predictable maturational change, usually decreasing in duration. Conversely, the spatiotemporal stability of speech movements initially became more variable postimplantation. The auditory perturbations experienced by this child during development led to changes in the physiological underpinnings of speech production, even when speech output was perceived as accurate.

  5. Automatic River Network Extraction from LIDAR Data

    Science.gov (United States)

    Maderal, E. N.; Valcarcel, N.; Delgado, J.; Sevilla, C.; Ojeda, J. C.

    2016-06-01

    National Geographic Institute of Spain (IGN-ES) has launched a new production system for automatic river network extraction for the Geospatial Reference Information (GRI) within hydrography theme. The goal is to get an accurate and updated river network, automatically extracted as possible. For this, IGN-ES has full LiDAR coverage for the whole Spanish territory with a density of 0.5 points per square meter. To implement this work, it has been validated the technical feasibility, developed a methodology to automate each production phase: hydrological terrain models generation with 2 meter grid size and river network extraction combining hydrographic criteria (topographic network) and hydrological criteria (flow accumulation river network), and finally the production was launched. The key points of this work has been managing a big data environment, more than 160,000 Lidar data files, the infrastructure to store (up to 40 Tb between results and intermediate files), and process; using local virtualization and the Amazon Web Service (AWS), which allowed to obtain this automatic production within 6 months, it also has been important the software stability (TerraScan-TerraSolid, GlobalMapper-Blue Marble , FME-Safe, ArcGIS-Esri) and finally, the human resources managing. The results of this production has been an accurate automatic river network extraction for the whole country with a significant improvement for the altimetric component of the 3D linear vector. This article presents the technical feasibility, the production methodology, the automatic river network extraction production and its advantages over traditional vector extraction systems.

  6. AUTOMATIC RIVER NETWORK EXTRACTION FROM LIDAR DATA

    Directory of Open Access Journals (Sweden)

    E. N. Maderal

    2016-06-01

    Full Text Available National Geographic Institute of Spain (IGN-ES has launched a new production system for automatic river network extraction for the Geospatial Reference Information (GRI within hydrography theme. The goal is to get an accurate and updated river network, automatically extracted as possible. For this, IGN-ES has full LiDAR coverage for the whole Spanish territory with a density of 0.5 points per square meter. To implement this work, it has been validated the technical feasibility, developed a methodology to automate each production phase: hydrological terrain models generation with 2 meter grid size and river network extraction combining hydrographic criteria (topographic network and hydrological criteria (flow accumulation river network, and finally the production was launched. The key points of this work has been managing a big data environment, more than 160,000 Lidar data files, the infrastructure to store (up to 40 Tb between results and intermediate files, and process; using local virtualization and the Amazon Web Service (AWS, which allowed to obtain this automatic production within 6 months, it also has been important the software stability (TerraScan-TerraSolid, GlobalMapper-Blue Marble , FME-Safe, ArcGIS-Esri and finally, the human resources managing. The results of this production has been an accurate automatic river network extraction for the whole country with a significant improvement for the altimetric component of the 3D linear vector. This article presents the technical feasibility, the production methodology, the automatic river network extraction production and its advantages over traditional vector extraction systems.

  7. Stuttering Frequency, Speech Rate, Speech Naturalness, and Speech Effort During the Production of Voluntary Stuttering.

    Science.gov (United States)

    Davidow, Jason H; Grossman, Heather L; Edge, Robin L

    2018-05-01

    Voluntary stuttering techniques involve persons who stutter purposefully interjecting disfluencies into their speech. Little research has been conducted on the impact of these techniques on the speech pattern of persons who stutter. The present study examined whether changes in the frequency of voluntary stuttering accompanied changes in stuttering frequency, articulation rate, speech naturalness, and speech effort. In total, 12 persons who stutter aged 16-34 years participated. Participants read four 300-syllable passages during a control condition, and three voluntary stuttering conditions that involved attempting to produce purposeful, tension-free repetitions of initial sounds or syllables of a word for two or more repetitions (i.e., bouncing). The three voluntary stuttering conditions included bouncing on 5%, 10%, and 15% of syllables read. Friedman tests and follow-up Wilcoxon signed ranks tests were conducted for the statistical analyses. Stuttering frequency, articulation rate, and speech naturalness were significantly different between the voluntary stuttering conditions. Speech effort did not differ between the voluntary stuttering conditions. Stuttering frequency was significantly lower during the three voluntary stuttering conditions compared to the control condition, and speech effort was significantly lower during two of the three voluntary stuttering conditions compared to the control condition. Due to changes in articulation rate across the voluntary stuttering conditions, it is difficult to conclude, as has been suggested previously, that voluntary stuttering is the reason for stuttering reductions found when using voluntary stuttering techniques. Additionally, future investigations should examine different types of voluntary stuttering over an extended period of time to determine their impact on stuttering frequency, speech rate, speech naturalness, and speech effort.

  8. Discrete Model Reference Adaptive Control System for Automatic Profiling Machine

    Directory of Open Access Journals (Sweden)

    Peng Song

    2012-01-01

    Full Text Available Automatic profiling machine is a movement system that has a high degree of parameter variation and high frequency of transient process, and it requires an accurate control in time. In this paper, the discrete model reference adaptive control system of automatic profiling machine is discussed. Firstly, the model of automatic profiling machine is presented according to the parameters of DC motor. Then the design of the discrete model reference adaptive control is proposed, and the control rules are proven. The results of simulation show that adaptive control system has favorable dynamic performances.

  9. Visual Speech Fills in Both Discrimination and Identification of Non-Intact Auditory Speech in Children

    Science.gov (United States)

    Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Herve

    2018-01-01

    To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. baez) coupled to non-intact (excised onsets) auditory speech (signified…

  10. Speech enhancement using emotion dependent codebooks

    NARCIS (Netherlands)

    Naidu, D.H.R.; Srinivasan, S.

    2012-01-01

    Several speech enhancement approaches utilize trained models of clean speech data, such as codebooks, Gaussian mixtures, and hidden Markov models. These models are typically trained on neutral clean speech data, without any emotion. However, in practical scenarios, emotional speech is a common

  11. Automated Speech Rate Measurement in Dysarthria

    Science.gov (United States)

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-01-01

    Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…

  12. Is Birdsong More Like Speech or Music?

    Science.gov (United States)

    Shannon, Robert V

    2016-04-01

    Music and speech share many acoustic cues but not all are equally important. For example, harmonic pitch is essential for music but not for speech. When birds communicate is their song more like speech or music? A new study contrasting pitch and spectral patterns shows that birds perceive their song more like humans perceive speech. Copyright © 2016 Elsevier Ltd. All rights reserved.

  13. Freedom of Speech Newsletter, September, 1975.

    Science.gov (United States)

    Allen, Winfred G., Jr., Ed.

    The Freedom of Speech Newsletter is the communication medium for the Freedom of Speech Interest Group of the Western Speech Communication Association. The newsletter contains such features as a statement of concern by the National Ad Hoc Committee Against Censorship; Reticence and Free Speech, an article by James F. Vickrey discussing the subtle…

  14. Automatic Program Development

    DEFF Research Database (Denmark)

    Automatic Program Development is a tribute to Robert Paige (1947-1999), our accomplished and respected colleague, and moreover our good friend, whose untimely passing was a loss to our academic and research community. We have collected the revised, updated versions of the papers published in his...... honor in the Higher-Order and Symbolic Computation Journal in the years 2003 and 2005. Among them there are two papers by Bob: (i) a retrospective view of his research lines, and (ii) a proposal for future studies in the area of the automatic program derivation. The book also includes some papers...... by members of the IFIP Working Group 2.1 of which Bob was an active member. All papers are related to some of the research interests of Bob and, in particular, to the transformational development of programs and their algorithmic derivation from formal specifications. Automatic Program Development offers...

  15. Neurophysiological Evidence That Musical Training Influences the Recruitment of Right Hemispheric Homologues for Speech Perception

    Directory of Open Access Journals (Sweden)

    McNeel Gordon Jantzen

    2014-03-01

    Full Text Available Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Kraus & Chandrasekaran, 2010; Parbery-Clark, Skoe, & Kraus, 2009; Zendel & Alain, 2008; Musacchia, Sams, Skoe, & Kraus, 2007. Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast we predicted that in non-musicians processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time was presented using a dichotic listening paradigm. Subjects either indicated aural location for a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of right middle temporal gyrus (MTG and superior temporal gyrus (STG in musicians. The findings demonstrate recruitment of right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain.

  16. Speech recovery device

    Energy Technology Data Exchange (ETDEWEB)

    Frankle, Christen M.

    2004-04-20

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  17. Speech recovery device

    Energy Technology Data Exchange (ETDEWEB)

    Frankle, Christen M.

    2000-10-19

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  18. Steganalysis of recorded speech

    Science.gov (United States)

    Johnson, Micah K.; Lyu, Siwei; Farid, Hany

    2005-03-01

    Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.

  19. Common cues to emotion in the dynamic facial expressions of speech and song.

    Science.gov (United States)

    Livingstone, Steven R; Thompson, William F; Wanderley, Marcelo M; Palmer, Caroline

    2015-01-01

    Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech-song differences. Vocalists' jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech-song. Vocalists' emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists' facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotion judgements for voice-only singing were poorly identified, yet were accurate for all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production.

  20. Speech enhancement theory and practice

    CERN Document Server

    Loizou, Philipos C

    2013-01-01

    With the proliferation of mobile devices and hearing devices, including hearing aids and cochlear implants, there is a growing and pressing need to design algorithms that can improve speech intelligibility without sacrificing quality. Responding to this need, Speech Enhancement: Theory and Practice, Second Edition introduces readers to the basic problems of speech enhancement and the various algorithms proposed to solve these problems. Updated and expanded, this second edition of the bestselling textbook broadens its scope to include evaluation measures and enhancement algorithms aimed at impr

  1. Automatic text summarization

    CERN Document Server

    Torres Moreno, Juan Manuel

    2014-01-01

    This new textbook examines the motivations and the different algorithms for automatic document summarization (ADS). We performed a recent state of the art. The book shows the main problems of ADS, difficulties and the solutions provided by the community. It presents recent advances in ADS, as well as current applications and trends. The approaches are statistical, linguistic and symbolic. Several exemples are included in order to clarify the theoretical concepts.  The books currently available in the area of Automatic Document Summarization are not recent. Powerful algorithms have been develop

  2. Automatic Ultrasound Scanning

    DEFF Research Database (Denmark)

    Moshavegh, Ramin

    on the user adjustments on the scanner interface to optimize the scan settings. This explains the huge interest in the subject of this PhD project entitled “AUTOMATIC ULTRASOUND SCANNING”. The key goals of the project have been to develop automated techniques to minimize the unnecessary settings...... on the scanners, and to improve the computer-aided diagnosis (CAD) in ultrasound by introducing new quantitative measures. Thus, four major issues concerning automation of the medical ultrasound are addressed in this PhD project. They touch upon gain adjustments in ultrasound, automatic synthetic aperture image...

  3. Automatic NAA. Saturation activities

    International Nuclear Information System (INIS)

    Westphal, G.P.; Grass, F.; Kuhnert, M.

    2008-01-01

    A system for Automatic NAA is based on a list of specific saturation activities determined for one irradiation position at a given neutron flux and a single detector geometry. Originally compiled from measurements of standard reference materials, the list may be extended also by the calculation of saturation activities from k 0 and Q 0 factors, and f and α values of the irradiation position. A systematic improvement of the SRM approach is currently being performed by pseudo-cyclic activation analysis, to reduce counting errors. From these measurements, the list of saturation activities is recalculated in an automatic procedure. (author)

  4. Speech of people with autism: Echolalia and echolalic speech

    OpenAIRE

    Błeszyński, Jacek Jarosław

    2013-01-01

    Speech of people with autism is recognised as one of the basic diagnostic, therapeutic and theoretical problems. One of the most common symptoms of autism in children is echolalia, described here as being of different types and severity. This paper presents the results of studies into different levels of echolalia, both in normally developing children and in children diagnosed with autism, discusses the differences between simple echolalia and echolalic speech - which can be considered to b...

  5. Advocate: A Distributed Architecture for Speech-to-Speech Translation

    Science.gov (United States)

    2009-01-01

    tecture, are either wrapped natural-language processing ( NLP ) components or objects developed from scratch using the architecture’s API. GATE is...framework, we put together a demonstration Arabic -to- English speech translation system using both internally developed ( Arabic speech recognition and MT...conditions of our Arabic S2S demonstration system described earlier. Once again, the data size was varied and eighty identical requests were

  6. A new time-adaptive discrete bionic wavelet transform for enhancing speech from adverse noise environment

    Science.gov (United States)

    Palaniswamy, Sumithra; Duraisamy, Prakash; Alam, Mohammad Showkat; Yuan, Xiaohui

    2012-04-01

    Automatic speech processing systems are widely used in everyday life such as mobile communication, speech and speaker recognition, and for assisting the hearing impaired. In speech communication systems, the quality and intelligibility of speech is of utmost importance for ease and accuracy of information exchange. To obtain an intelligible speech signal and one that is more pleasant to listen, noise reduction is essential. In this paper a new Time Adaptive Discrete Bionic Wavelet Thresholding (TADBWT) scheme is proposed. The proposed technique uses Daubechies mother wavelet to achieve better enhancement of speech from additive non- stationary noises which occur in real life such as street noise and factory noise. Due to the integration of human auditory system model into the wavelet transform, bionic wavelet transform (BWT) has great potential for speech enhancement which may lead to a new path in speech processing. In the proposed technique, at first, discrete BWT is applied to noisy speech to derive TADBWT coefficients. Then the adaptive nature of the BWT is captured by introducing a time varying linear factor which updates the coefficients at each scale over time. This approach has shown better performance than the existing algorithms at lower input SNR due to modified soft level dependent thresholding on time adaptive coefficients. The objective and subjective test results confirmed the competency of the TADBWT technique. The effectiveness of the proposed technique is also evaluated for speaker recognition task under noisy environment. The recognition results show that the TADWT technique yields better performance when compared to alternate methods specifically at lower input SNR.

  7. Comprehension of synthetic speech and digitized natural speech by adults with aphasia.

    Science.gov (United States)

    Hux, Karen; Knollman-Porter, Kelly; Brown, Jessica; Wallace, Sarah E

    2017-09-01

    Using text-to-speech technology to provide simultaneous written and auditory content presentation may help compensate for chronic reading challenges if people with aphasia can understand synthetic speech output; however, inherent auditory comprehension challenges experienced by people with aphasia may make understanding synthetic speech difficult. This study's purpose was to compare the preferences and auditory comprehension accuracy of people with aphasia when listening to sentences generated with digitized natural speech, Alex synthetic speech (i.e., Macintosh platform), or David synthetic speech (i.e., Windows platform). The methodology required each of 20 participants with aphasia to select one of four images corresponding in meaning to each of 60 sentences comprising three stimulus sets. Results revealed significantly better accuracy given digitized natural speech than either synthetic speech option; however, individual participant performance analyses revealed three patterns: (a) comparable accuracy regardless of speech condition for 30% of participants, (b) comparable accuracy between digitized natural speech and one, but not both, synthetic speech option for 45% of participants, and (c) greater accuracy with digitized natural speech than with either synthetic speech option for remaining participants. Ranking and Likert-scale rating data revealed a preference for digitized natural speech and David synthetic speech over Alex synthetic speech. Results suggest many individuals with aphasia can comprehend synthetic speech options available on popular operating systems. Further examination of synthetic speech use to support reading comprehension through text-to-speech technology is thus warranted. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. Automatic coding method of the ACR Code

    International Nuclear Information System (INIS)

    Park, Kwi Ae; Ihm, Jong Sool; Ahn, Woo Hyun; Baik, Seung Kook; Choi, Han Yong; Kim, Bong Gi

    1993-01-01

    The authors developed a computer program for automatic coding of ACR(American College of Radiology) code. The automatic coding of the ACR code is essential for computerization of the data in the department of radiology. This program was written in foxbase language and has been used for automatic coding of diagnosis in the Department of Radiology, Wallace Memorial Baptist since May 1992. The ACR dictionary files consisted of 11 files, one for the organ code and the others for the pathology code. The organ code was obtained by typing organ name or code number itself among the upper and lower level codes of the selected one that were simultaneous displayed on the screen. According to the first number of the selected organ code, the corresponding pathology code file was chosen automatically. By the similar fashion of organ code selection, the proper pathologic dode was obtained. An example of obtained ACR code is '131.3661'. This procedure was reproducible regardless of the number of fields of data. Because this program was written in 'User's Defined Function' from, decoding of the stored ACR code was achieved by this same program and incorporation of this program into program in to another data processing was possible. This program had merits of simple operation, accurate and detail coding, and easy adjustment for another program. Therefore, this program can be used for automation of routine work in the department of radiology

  9. Optimal pattern synthesis for speech recognition based on principal component analysis

    Science.gov (United States)

    Korsun, O. N.; Poliyev, A. V.

    2018-02-01

    The algorithm for building an optimal pattern for the purpose of automatic speech recognition, which increases the probability of correct recognition, is developed and presented in this work. The optimal pattern forming is based on the decomposition of an initial pattern to principal components, which enables to reduce the dimension of multi-parameter optimization problem. At the next step the training samples are introduced and the optimal estimates for principal components decomposition coefficients are obtained by a numeric parameter optimization algorithm. Finally, we consider the experiment results that show the improvement in speech recognition introduced by the proposed optimization algorithm.

  10. Text-to-audiovisual speech synthesizer for children with learning disabilities.

    Science.gov (United States)

    Mendi, Engin; Bayrak, Coskun

    2013-01-01

    Learning disabilities affect the ability of children to learn, despite their having normal intelligence. Assistive tools can highly increase functional capabilities of children with learning disorders such as writing, reading, or listening. In this article, we describe a text-to-audiovisual synthesizer that can serve as an assistive tool for such children. The system automatically converts an input text to audiovisual speech, providing synchronization of the head, eye, and lip movements of the three-dimensional face model with appropriate facial expressions and word flow of the text. The proposed system can enhance speech perception and help children having learning deficits to improve their chances of success.

  11. Cliff : the automatized zipper

    NARCIS (Netherlands)

    Baharom, M.Z.; Toeters, M.J.; Delbressine, F.L.M.; Bangaru, C.; Feijs, L.M.G.

    2016-01-01

    It is our strong believe that fashion - more specifically apparel - can support us so much more in our daily life than it currently does. The Cliff project takes the opportunity to create a generic automatized zipper. It is a response to the struggle by elderly, people with physical disability, and

  12. Automatic Complexity Analysis

    DEFF Research Database (Denmark)

    Rosendahl, Mads

    1989-01-01

    One way to analyse programs is to to derive expressions for their computational behaviour. A time bound function (or worst-case complexity) gives an upper bound for the computation time as a function of the size of input. We describe a system to derive such time bounds automatically using abstract...

  13. Automatic Oscillating Turret.

    Science.gov (United States)

    1981-03-01

    Final Report: February 1978 ZAUTOMATIC OSCILLATING TURRET SYSTEM September 1980 * 6. PERFORMING 01G. REPORT NUMBER .J7. AUTHOR(S) S. CONTRACT OR GRANT...o....e.... *24 APPENDIX P-4 OSCILLATING BUMPER TURRET ...................... 25 A. DESCRIPTION 1. Turret Controls ...Other criteria requirements were: 1. Turret controls inside cab. 2. Automatic oscillation with fixed elevation to range from 20* below the horizontal to

  14. Reactor component automatic grapple

    International Nuclear Information System (INIS)

    Greenaway, P.R.

    1982-01-01

    A grapple for handling nuclear reactor components in a medium such as liquid sodium which, upon proper seating and alignment of the grapple with the component as sensed by a mechanical logic integral to the grapple, automatically seizes the component. The mechanical logic system also precludes seizure in the absence of proper seating and alignment. (author)

  15. Automatic sweep circuit

    International Nuclear Information System (INIS)

    Keefe, D.J.

    1980-01-01

    An automatically sweeping circuit for searching for an evoked response in an output signal in time with respect to a trigger input is described. Digital counters are used to activate a detector at precise intervals, and monitoring is repeated for statistical accuracy. If the response is not found then a different time window is examined until the signal is found

  16. Automatic sweep circuit

    Science.gov (United States)

    Keefe, Donald J.

    1980-01-01

    An automatically sweeping circuit for searching for an evoked response in an output signal in time with respect to a trigger input. Digital counters are used to activate a detector at precise intervals, and monitoring is repeated for statistical accuracy. If the response is not found then a different time window is examined until the signal is found.

  17. Recursive automatic classification algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Bauman, E V; Dorofeyuk, A A

    1982-03-01

    A variational statement of the automatic classification problem is given. The dependence of the form of the optimal partition surface on the form of the classification objective functional is investigated. A recursive algorithm is proposed for maximising a functional of reasonably general form. The convergence problem is analysed in connection with the proposed algorithm. 8 references.

  18. Automatic Commercial Permit Sets

    Energy Technology Data Exchange (ETDEWEB)

    Grana, Paul [Folsom Labs, Inc., San Francisco, CA (United States)

    2017-12-21

    Final report for Folsom Labs’ Solar Permit Generator project, which has successfully completed, resulting in the development and commercialization of a software toolkit within the cloud-based HelioScope software environment that enables solar engineers to automatically generate and manage draft documents for permit submission.

  19. Accurate Evaluation of Quantum Integrals

    Science.gov (United States)

    Galant, D. C.; Goorvitch, D.; Witteborn, Fred C. (Technical Monitor)

    1995-01-01

    Combining an appropriate finite difference method with Richardson's extrapolation results in a simple, highly accurate numerical method for solving a Schrodinger's equation. Important results are that error estimates are provided, and that one can extrapolate expectation values rather than the wavefunctions to obtain highly accurate expectation values. We discuss the eigenvalues, the error growth in repeated Richardson's extrapolation, and show that the expectation values calculated on a crude mesh can be extrapolated to obtain expectation values of high accuracy.

  20. Speech Mannerisms: Games Clients Play

    Science.gov (United States)

    Morgan, Lewis B.

    1978-01-01

    This article focuses on speech mannerisms often employed by clients in a helping relationship. Eight mannerisms are presented and discussed, as well as possible interpretations. Suggestions are given to help counselors respond to them. (Author)

  1. Speech recognition from spectral dynamics

    Indian Academy of Sciences (India)

    Carrier nature of speech; modulation spectrum; spectral dynamics ... the relationships between phonetic values of sounds and their short-term spectral envelopes .... the number of free parameters that need to be estimated from training data.

  2. Evaluation of speech and language assessment approaches with bilingual children.

    Science.gov (United States)

    De Lamo White, Caroline; Jin, Lixian

    2011-01-01

    British society is multicultural and multilingual, thus for many children English is not their main or only language. Speech and language therapists are required to assess accurately the speech and language skills of bilingual children if they are suspected of having a disorder. Cultural and linguistic diversity means that a more complex assessment procedure is needed and research suggests that bilingual children are at risk of misdiagnosis. Clinicians have identified a lack of suitable assessment instruments for use with this client group. This paper highlights the challenges of assessing bilingual children and reviews available speech and language assessment procedures and approaches for use with this client group. It evaluates different approaches for assessing bilingual children to identify approaches that may be more appropriate for carrying out assessments effectively. This review discusses and evaluates the efficacy of norm-referenced standardized measures, criterion-referenced measures, language-processing measures, dynamic assessment and a sociocultural approach. When all named procedures and approaches are compared, the sociocultural approach appears to hold the most promise for accurate assessment of bilingual children. Research suggests that language-processing measures are not effective indicators for identifying speech and language disorders in bilingual children, but further research is warranted. The sociocultural approach encompasses some of the other approaches discussed, including norm-referenced measures, criterion-referenced measures and dynamic assessment. The sociocultural approach enables the clinician to interpret results in the light of the child's linguistic and cultural background. In addition, combining approaches mitigates the weaknesses inherent in each approach. © 2011 Royal College of Speech and Language Therapists.

  3. National features of speech etiquette

    OpenAIRE

    Nacafova S.

    2017-01-01

    The article shows the differences between the speech etiquette of different peoples. The most important thing is to find a common language with this or that interlocutor. Knowledge of national etiquette, national character helps to learn the principles of speech of another nation. The article indicates in which cases certain forms of etiquette considered acceptable. At the same time, the rules of etiquette emphasized in the conduct of a dialogue in official meetings and for example, in the ex...

  4. Censored: Whistleblowers and impossible speech

    OpenAIRE

    Kenny, Kate

    2017-01-01

    What happens to a person who speaks out about corruption in their organization, and finds themselves excluded from their profession? In this article, I argue that whistleblowers experience exclusions because they have engaged in ‘impossible speech’, that is, a speech act considered to be unacceptable or illegitimate. Drawing on Butler’s theories of recognition and censorship, I show how norms of acceptable speech working through recruitment practices, alongside the actions of colleagues, can ...

  5. Speech Function and Speech Role in Carl Fredricksen's Dialogue on Up Movie

    OpenAIRE

    Rehana, Ridha; Silitonga, Sortha

    2013-01-01

    One aim of this article is to show through a concrete example how speech function and speech role used in movie. The illustrative example is taken from the dialogue of Up movie. Central to the analysis proper form of dialogue on Up movie that contain of speech function and speech role; i.e. statement, offer, question, command, giving, and demanding. 269 dialogue were interpreted by actor, and it was found that the use of speech function and speech role.

  6. Exploring Australian speech-language pathologists' use and perceptions ofnon-speech oral motor exercises.

    Science.gov (United States)

    Rumbach, Anna F; Rose, Tanya A; Cheah, Mynn

    2018-01-29

    To explore Australian speech-language pathologists' use of non-speech oral motor exercises, and rationales for using/not using non-speech oral motor exercises in clinical practice. A total of 124 speech-language pathologists practising in Australia, working with paediatric and/or adult clients with speech sound difficulties, completed an online survey. The majority of speech-language pathologists reported that they did not use non-speech oral motor exercises when working with paediatric or adult clients with speech sound difficulties. However, more than half of the speech-language pathologists working with adult clients who have dysarthria reported using non-speech oral motor exercises with this population. The most frequently reported rationale for using non-speech oral motor exercises in speech sound difficulty management was to improve awareness/placement of articulators. The majority of speech-language pathologists agreed there is no clear clinical or research evidence base to support non-speech oral motor exercise use with clients who have speech sound difficulties. This study provides an overview of Australian speech-language pathologists' reported use and perceptions of non-speech oral motor exercises' applicability and efficacy in treating paediatric and adult clients who have speech sound difficulties. The research findings provide speech-language pathologists with insight into how and why non-speech oral motor exercises are currently used, and adds to the knowledge base regarding Australian speech-language pathology practice of non-speech oral motor exercises in the treatment of speech sound difficulties. Implications for Rehabilitation Non-speech oral motor exercises refer to oral motor activities which do not involve speech, but involve the manipulation or stimulation of oral structures including the lips, tongue, jaw, and soft palate. Non-speech oral motor exercises are intended to improve the function (e.g., movement, strength) of oral structures. The

  7. An Approximate Approach to Automatic Kernel Selection.

    Science.gov (United States)

    Ding, Lizhong; Liao, Shizhong

    2016-02-02

    Kernel selection is a fundamental problem of kernel-based learning algorithms. In this paper, we propose an approximate approach to automatic kernel selection for regression from the perspective of kernel matrix approximation. We first introduce multilevel circulant matrices into automatic kernel selection, and develop two approximate kernel selection algorithms by exploiting the computational virtues of multilevel circulant matrices. The complexity of the proposed algorithms is quasi-linear in the number of data points. Then, we prove an approximation error bound to measure the effect of the approximation in kernel matrices by multilevel circulant matrices on the hypothesis and further show that the approximate hypothesis produced with multilevel circulant matrices converges to the accurate hypothesis produced with kernel matrices. Experimental evaluations on benchmark datasets demonstrate the effectiveness of approximate kernel selection.

  8. Automatic modulation recognition of communication signals

    CERN Document Server

    Azzouz, Elsayed Elsayed

    1996-01-01

    Automatic modulation recognition is a rapidly evolving area of signal analysis. In recent years, interest from the academic and military research institutes has focused around the research and development of modulation recognition algorithms. Any communication intelligence (COMINT) system comprises three main blocks: receiver front-end, modulation recogniser and output stage. Considerable work has been done in the area of receiver front-ends. The work at the output stage is concerned with information extraction, recording and exploitation and begins with signal demodulation, that requires accurate knowledge about the signal modulation type. There are, however, two main reasons for knowing the current modulation type of a signal; to preserve the signal information content and to decide upon the suitable counter action, such as jamming. Automatic Modulation Recognition of Communications Signals describes in depth this modulation recognition process. Drawing on several years of research, the authors provide a cr...

  9. Restoring the missing features of the corrupted speech using linear interpolation methods

    Science.gov (United States)

    Rassem, Taha H.; Makbol, Nasrin M.; Hasan, Ali Muttaleb; Zaki, Siti Syazni Mohd; Girija, P. N.

    2017-10-01

    One of the main challenges in the Automatic Speech Recognition (ASR) is the noise. The performance of the ASR system reduces significantly if the speech is corrupted by noise. In spectrogram representation of a speech signal, after deleting low Signal to Noise Ratio (SNR) elements, the incomplete spectrogram is obtained. In this case, the speech recognizer should make modifications to the spectrogram in order to restore the missing elements, which is one direction. In another direction, speech recognizer should be able to restore the missing elements due to deleting low SNR elements before performing the recognition. This is can be done using different spectrogram reconstruction methods. In this paper, the geometrical spectrogram reconstruction methods suggested by some researchers are implemented as a toolbox. In these geometrical reconstruction methods, the linear interpolation along time or frequency methods are used to predict the missing elements between adjacent observed elements in the spectrogram. Moreover, a new linear interpolation method using time and frequency together is presented. The CMU Sphinx III software is used in the experiments to test the performance of the linear interpolation reconstruction method. The experiments are done under different conditions such as different lengths of the window and different lengths of utterances. Speech corpus consists of 20 males and 20 females; each one has two different utterances are used in the experiments. As a result, 80% recognition accuracy is achieved with 25% SNR ratio.

  10. Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity

    Science.gov (United States)

    Moses, David A.; Mesgarani, Nima; Leonard, Matthew K.; Chang, Edward F.

    2016-10-01

    Objective. The superior temporal gyrus (STG) and neighboring brain regions play a key role in human language processing. Previous studies have attempted to reconstruct speech information from brain activity in the STG, but few of them incorporate the probabilistic framework and engineering methodology used in modern speech recognition systems. In this work, we describe the initial efforts toward the design of a neural speech recognition (NSR) system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes using the high gamma band power of local field potentials in the STG and neighboring cortical areas obtained via electrocorticography. Approach. The system implements a Viterbi decoder that incorporates phoneme likelihood estimates from a linear discriminant analysis model and transition probabilities from an n-gram phonemic language model. Grid searches were used in an attempt to determine optimal parameterizations of the feature vectors and Viterbi decoder. Main results. The performance of the system was significantly improved by using spatiotemporal representations of the neural activity (as opposed to purely spatial representations) and by including language modeling and Viterbi decoding in the NSR system. Significance. These results emphasize the importance of modeling the temporal dynamics of neural responses when analyzing their variations with respect to varying stimuli and demonstrate that speech recognition techniques can be successfully leveraged when decoding speech from neural signals. Guided by the results detailed in this work, further development of the NSR system could have applications in the fields of automatic speech recognition and neural prosthetics.

  11. Human and automatic speaker recognition over telecommunication channels

    CERN Document Server

    Fernández Gallardo, Laura

    2016-01-01

    This work addresses the evaluation of the human and the automatic speaker recognition performances under different channel distortions caused by bandwidth limitation, codecs, and electro-acoustic user interfaces, among other impairments. Its main contribution is the demonstration of the benefits of communication channels of extended bandwidth, together with an insight into how speaker-specific characteristics of speech are preserved through different transmissions. It provides sufficient motivation for considering speaker recognition as a criterion for the migration from narrowband to enhanced bandwidths, such as wideband and super-wideband.

  12. Swahili speech development: preliminary normative data from typically developing pre-school children in Tanzania.

    Science.gov (United States)

    Gangji, Nazneen; Pascoe, Michelle; Smouse, Mantoa

    2015-01-01

    Swahili is widely spoken in East Africa, but to date there are no culturally and linguistically appropriate materials available for speech-language therapists working in the region. The challenges are further exacerbated by the limited research available on the typical acquisition of Swahili phonology. To describe the speech development of 24 typically developing first language Swahili-speaking children between the ages of 3;0 and 5;11 years in Dar es Salaam, Tanzania. A cross-sectional design was used with six groups of four children in 6-month age bands. Single-word speech samples were obtained from each child using a set of culturally appropriate pictures designed to elicit all consonants and vowels of Swahili. Each child's speech was audio-recorded and phonetically transcribed using International Phonetic Alphabet (IPA) conventions. Children's speech development is described in terms of (1) phonetic inventory, (2) syllable structure inventory, (3) phonological processes and (4) percentage consonants correct (PCC) and percentage vowels correct (PVC). Results suggest a gradual progression in the acquisition of speech sounds and syllables between the ages of 3;0 and 5;11 years. Vowel acquisition was completed and most of the consonants acquired by age 3;0. Fricatives/z, s, h/ were later acquired at 4 years and /θ/and /r/ were the last acquired consonants at age 5;11. Older children were able to produce speech sounds more accurately and had fewer phonological processes in their speech than younger children. Common phonological processes included lateralization and sound preference substitutions. The study contributes a preliminary set of normative data on speech development of Swahili-speaking children. Findings are discussed in relation to theories of phonological development, and may be used as a basis for further normative studies with larger numbers of children and ultimately the development of a contextually relevant assessment of the phonology of Swahili

  13. Neural pathways for visual speech perception

    Directory of Open Access Journals (Sweden)

    Lynne E Bernstein

    2014-12-01

    Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1 The visual perception of speech relies on visual pathway representations of speech qua speech. (2 A proposed site of these representations, the temporal visual speech area (TVSA has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS. (3 Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

  14. Automatic indexing, compiling and classification

    International Nuclear Information System (INIS)

    Andreewsky, Alexandre; Fluhr, Christian.

    1975-06-01

    A review of the principles of automatic indexing, is followed by a comparison and summing-up of work by the authors and by a Soviet staff from the Moscou INFORM-ELECTRO Institute. The mathematical and linguistic problems of the automatic building of thesaurus and automatic classification are examined [fr

  15. Towards accurate emergency response behavior

    International Nuclear Information System (INIS)

    Sargent, T.O.

    1981-01-01

    Nuclear reactor operator emergency response behavior has persisted as a training problem through lack of information. The industry needs an accurate definition of operator behavior in adverse stress conditions, and training methods which will produce the desired behavior. Newly assembled information from fifty years of research into human behavior in both high and low stress provides a more accurate definition of appropriate operator response, and supports training methods which will produce the needed control room behavior. The research indicates that operator response in emergencies is divided into two modes, conditioned behavior and knowledge based behavior. Methods which assure accurate conditioned behavior, and provide for the recovery of knowledge based behavior, are described in detail

  16. Automatization of welding

    International Nuclear Information System (INIS)

    Iwabuchi, Masashi; Tomita, Jinji; Nishihara, Katsunori.

    1978-01-01

    Automatization of welding is one of the effective measures for securing high degree of quality of nuclear power equipment, as well as for correspondence to the environment at the site of plant. As the latest ones of the automatic welders practically used for welding of nuclear power apparatuses in factories of Toshiba and IHI, those for pipes and lining tanks are described here. The pipe welder performs the battering welding on the inside of pipe end as the so-called IGSCC countermeasure and the succeeding butt welding through the same controller. The lining tank welder is able to perform simultaneous welding of two parallel weld lines on a large thin plate lining tank. Both types of the welders are demonstrating excellent performance at the shops as well as at the plant site. (author)

  17. Automatic structural scene digitalization.

    Science.gov (United States)

    Tang, Rui; Wang, Yuhan; Cosker, Darren; Li, Wenbin

    2017-01-01

    In this paper, we present an automatic system for the analysis and labeling of structural scenes, floor plan drawings in Computer-aided Design (CAD) format. The proposed system applies a fusion strategy to detect and recognize various components of CAD floor plans, such as walls, doors, windows and other ambiguous assets. Technically, a general rule-based filter parsing method is fist adopted to extract effective information from the original floor plan. Then, an image-processing based recovery method is employed to correct information extracted in the first step. Our proposed method is fully automatic and real-time. Such analysis system provides high accuracy and is also evaluated on a public website that, on average, archives more than ten thousands effective uses per day and reaches a relatively high satisfaction rate.

  18. Automatic trend estimation

    CERN Document Server

    Vamos¸, C˘alin

    2013-01-01

    Our book introduces a method to evaluate the accuracy of trend estimation algorithms under conditions similar to those encountered in real time series processing. This method is based on Monte Carlo experiments with artificial time series numerically generated by an original algorithm. The second part of the book contains several automatic algorithms for trend estimation and time series partitioning. The source codes of the computer programs implementing these original automatic algorithms are given in the appendix and will be freely available on the web. The book contains clear statement of the conditions and the approximations under which the algorithms work, as well as the proper interpretation of their results. We illustrate the functioning of the analyzed algorithms by processing time series from astrophysics, finance, biophysics, and paleoclimatology. The numerical experiment method extensively used in our book is already in common use in computational and statistical physics.

  19. Current trends in small vocabulary speech recognition for equipment control

    Science.gov (United States)

    Doukas, Nikolaos; Bardis, Nikolaos G.

    2017-09-01

    Speech recognition systems allow human - machine communication to acquire an intuitive nature that approaches the simplicity of inter - human communication. Small vocabulary speech recognition is a subset of the overall speech recognition problem, where only a small number of words need to be recognized. Speaker independent small vocabulary recognition can find significant applications in field equipment used by military personnel. Such equipment may typically be controlled by a small number of commands that need to be given quickly and accurately, under conditions where delicate manual operations are difficult to achieve. This type of application could hence significantly benefit by the use of robust voice operated control components, as they would facilitate the interaction with their users and render it much more reliable in times of crisis. This paper presents current challenges involved in attaining efficient and robust small vocabulary speech recognition. These challenges concern feature selection, classification techniques, speaker diversity and noise effects. A state machine approach is presented that facilitates the voice guidance of different equipment in a variety of situations.

  20. On the Evaluation of the Conversational Speech Quality in Telecommunications

    Directory of Open Access Journals (Sweden)

    Vincent Barriac

    2008-04-01

    Full Text Available We propose an objective method to assess speech quality in the conversational context by taking into account the talking and listening speech qualities and the impact of delay. This approach is applied to the results of four subjective tests on the effects of echo, delay, packet loss, and noise. The dataset is divided into training and validation sets. For the training set, a multiple linear regression is applied to determine a relationship between conversational, talking, and listening speech qualities and the delay value. The multiple linear regression leads to an accurate estimation of the conversational scores with high correlation and low error between subjective and estimated scores, both on the training and validation sets. In addition, a validation is performed on the data of a subjective test found in the literature which confirms the reliability of the regression. The relationship is then applied to an objective level by replacing talking and listening subjective scores with talking and listening objective scores provided by existing objective models, fed by speech signals recorded during the subjective tests. The conversational model achieves high performance as revealed by comparison with the test results and with the existing standard methodology “E-model,” presented in the ITU-T (International Telecommunication Union Recommendation G.107.

  1. Vocabulary Facilitates Speech Perception in Children With Hearing Aids.

    Science.gov (United States)

    Klein, Kelsey E; Walker, Elizabeth A; Kirby, Benjamin; McCreery, Ryan W

    2017-08-16

    We examined the effects of vocabulary, lexical characteristics (age of acquisition and phonotactic probability), and auditory access (aided audibility and daily hearing aid [HA] use) on speech perception skills in children with HAs. Participants included 24 children with HAs and 25 children with normal hearing (NH), ages 5-12 years. Groups were matched on age, expressive and receptive vocabulary, articulation, and nonverbal working memory. Participants repeated monosyllabic words and nonwords in noise. Stimuli varied on age of acquisition, lexical frequency, and phonotactic probability. Performance in each condition was measured by the signal-to-noise ratio at which the child could accurately repeat 50% of the stimuli. Children from both groups with larger vocabularies showed better performance than children with smaller vocabularies on nonwords and late-acquired words but not early-acquired words. Overall, children with HAs showed poorer performance than children with NH. Auditory access was not associated with speech perception for the children with HAs. Children with HAs show deficits in sensitivity to phonological structure but appear to take advantage of vocabulary skills to support speech perception in the same way as children with NH. Further investigation is needed to understand the causes of the gap that exists between the overall speech perception abilities of children with HAs and children with NH.

  2. Head movements encode emotions during speech and song.

    Science.gov (United States)

    Livingstone, Steven R; Palmer, Caroline

    2016-04-01

    When speaking or singing, vocalists often move their heads in an expressive fashion, yet the influence of emotion on vocalists' head motion is unknown. Using a comparative speech/song task, we examined whether vocalists' intended emotions influence head movements and whether those movements influence the perceived emotion. In Experiment 1, vocalists were recorded with motion capture while speaking and singing each statement with different emotional intentions (very happy, happy, neutral, sad, very sad). Functional data analyses showed that head movements differed in translational and rotational displacement across emotional intentions, yet were similar across speech and song, transcending differences in F0 (varied freely in speech, fixed in song) and lexical variability. Head motion specific to emotional state occurred before and after vocalizations, as well as during sound production, confirming that some aspects of movement were not simply a by-product of sound production. In Experiment 2, observers accurately identified vocalists' intended emotion on the basis of silent, face-occluded videos of head movements during speech and song. These results provide the first evidence that head movements encode a vocalist's emotional intent and that observers decode emotional information from these movements. We discuss implications for models of head motion during vocalizations and applied outcomes in social robotics and automated emotion recognition. (c) 2016 APA, all rights reserved).

  3. High-performance speech recognition using consistency modeling

    Science.gov (United States)

    Digalakis, Vassilios; Murveit, Hy; Monaco, Peter; Neumeyer, Leo; Sankar, Ananth

    1994-12-01

    The goal of SRI's consistency modeling project is to improve the raw acoustic modeling component of SRI's DECIPHER speech recognition system and develop consistency modeling technology. Consistency modeling aims to reduce the number of improper independence assumptions used in traditional speech recognition algorithms so that the resulting speech recognition hypotheses are more self-consistent and, therefore, more accurate. At the initial stages of this effort, SRI focused on developing the appropriate base technologies for consistency modeling. We first developed the Progressive Search technology that allowed us to perform large-vocabulary continuous speech recognition (LVCSR) experiments. Since its conception and development at SRI, this technique has been adopted by most laboratories, including other ARPA contracting sites, doing research on LVSR. Another goal of the consistency modeling project is to attack difficult modeling problems, when there is a mismatch between the training and testing phases. Such mismatches may include outlier speakers, different microphones and additive noise. We were able to either develop new, or transfer and evaluate existing, technologies that adapted our baseline genonic HMM recognizer to such difficult conditions.

  4. Audiovisual integration of speech falters under high attention demands.

    Science.gov (United States)

    Alsius, Agnès; Navarra, Jordi; Campbell, Ruth; Soto-Faraco, Salvador

    2005-05-10

    One of the most commonly cited examples of human multisensory integration occurs during exposure to natural speech, when the vocal and the visual aspects of the signal are integrated in a unitary percept. Audiovisual association of facial gestures and vocal sounds has been demonstrated in nonhuman primates and in prelinguistic children, arguing for a general basis for this capacity. One critical question, however, concerns the role of attention in such multisensory integration. Although both behavioral and neurophysiological studies have converged on a preattentive conceptualization of audiovisual speech integration, this mechanism has rarely been measured under conditions of high attentional load, when the observers' attention resources are depleted. We tested the extent to which audiovisual integration was modulated by the amount of available attentional resources by measuring the observers' susceptibility to the classic McGurk illusion in a dual-task paradigm. The proportion of visually influenced responses was severely, and selectively, reduced if participants were concurrently performing an unrelated visual or auditory task. In contrast with the assumption that crossmodal speech integration is automatic, our results suggest that these multisensory binding processes are subject to attentional demands.

  5. Automatic food decisions

    DEFF Research Database (Denmark)

    Mueller Loose, Simone

    Consumers' food decisions are to a large extent shaped by automatic processes, which are either internally directed through learned habits and routines or externally influenced by context factors and visual information triggers. Innovative research methods such as eye tracking, choice experiments...... and food diaries allow us to better understand the impact of unconscious processes on consumers' food choices. Simone Mueller Loose will provide an overview of recent research insights into the effects of habit and context on consumers' food choices....

  6. Automatic LOD selection

    OpenAIRE

    Forsman, Isabelle

    2017-01-01

    In this paper a method to automatically generate transition distances for LOD, improving image stability and performance is presented. Three different methods were tested all measuring the change between two level of details using the spatial frequency. The methods were implemented as an optional pre-processing step in order to determine the transition distances from multiple view directions. During run-time both view direction based selection and the furthest distance for each direction was ...

  7. Investigation of in-vehicle speech intelligibility metrics for normal hearing and hearing impaired listeners

    Science.gov (United States)

    Samardzic, Nikolina

    The effectiveness of in-vehicle speech communication can be a good indicator of the perception of the overall vehicle quality and customer satisfaction. Currently available speech intelligibility metrics do not account in their procedures for essential parameters needed for a complete and accurate evaluation of in-vehicle speech intelligibility. These include the directivity and the distance of the talker with respect to the listener, binaural listening, hearing profile of the listener, vocal effort, and multisensory hearing. In the first part of this research the effectiveness of in-vehicle application of these metrics is investigated in a series of studies to reveal their shortcomings, including a wide range of scores resulting from each of the metrics for a given measurement configuration and vehicle operating condition. In addition, the nature of a possible correlation between the scores obtained from each metric is unknown. The metrics and the subjective perception of speech intelligibility using, for example, the same speech material have not been compared in literature. As a result, in the second part of this research, an alternative method for speech intelligibility evaluation is proposed for use in the automotive industry by utilizing a virtual reality driving environment for ultimately setting targets, including the associated statistical variability, for future in-vehicle speech intelligibility evaluation. The Speech Intelligibility Index (SII) was evaluated at the sentence Speech Receptions Threshold (sSRT) for various listening situations and hearing profiles using acoustic perception jury testing and a variety of talker and listener configurations and background noise. In addition, the effect of individual sources and transfer paths of sound in an operating vehicle to the vehicle interior sound, specifically their effect on speech intelligibility was quantified, in the framework of the newly developed speech intelligibility evaluation method. Lastly

  8. Automatic cough episode detection using a vibroacoustic sensor.

    Science.gov (United States)

    Mlynczak, Marcel; Pariaszewska, Katarzyna; Cybulski, Gerard

    2015-08-01

    Cough monitoring is an important element of the diagnostics of respiratory diseases. The European Respiratory Society recommends objective assessment of cough episodes and the search for methods of automatic analysis to make obtaining the quantitative parameters possible. The cough "events" could be classified by a microphone and a sensor that measures the vibrations of the chest. Analysis of the recorded signals consists of calculating the features vectors for selected episodes and of performing automatic classification using them. The aim of the study was to assess the accuracy of classification based on an artificial neural networks using vibroacoustic signals collected from chest. Six healthy, young men and eight healthy, young women carried out an imitated cough, hand clapping, speech and shouting. Three methods of parametrization were used to prepare the vectors of episode features - time domain, time-frequency domain and spectral modeling. We obtained the accuracy of 95% using artificial neural networks.

  9. Do We Perceive Others Better than Ourselves? A Perceptual Benefit for Noise-Vocoded Speech Produced by an Average Speaker.

    Directory of Open Access Journals (Sweden)

    William L Schuerman

    Full Text Available In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.

  10. Detection of target phonemes in spontaneous and read speech

    NARCIS (Netherlands)

    Mehta, G.; Cutler, A.

    1988-01-01

    Although spontaneous speech occurs more frequently in most listeners' experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ

  11. Automatic continuous dew point measurement in combustion gases

    Energy Technology Data Exchange (ETDEWEB)

    Fehler, D.

    1986-08-01

    Low exhaust temperatures serve to minimize energy consumption in combustion systems. This requires accurate, continuous measurement of exhaust condensation. An automatic dew point meter for continuous operation is described. The principle of measurement, the design of the measuring system, and practical aspects of operation are discussed.

  12. Automatic 3D modeling of the urban landscape

    NARCIS (Netherlands)

    Esteban, I.; Dijk, J.; Groen, F.

    2010-01-01

    In this paper we present a fully automatic system for building 3D models of urban areas at the street level. We propose a novel approach for the accurate estimation of the scale consistent camera pose given two previous images. We employ a new method for global optimization and use a novel sampling

  13. [Development of automatic urine monitoring system].

    Science.gov (United States)

    Wei, Liang; Li, Yongqin; Chen, Bihua

    2014-03-01

    An automatic urine monitoring system is presented to replace manual operation. The system is composed of the flow sensor, MSP430f149 single chip microcomputer, human-computer interaction module, LCD module, clock module and memory module. The signal of urine volume is captured when the urine flows through the flow sensor and then displayed on the LCD after data processing. The experiment results suggest that the design of the monitor provides a high stability, accurate measurement and good real-time, and meets the demand of the clinical application.

  14. Steam jet ejectors are examined automatically

    International Nuclear Information System (INIS)

    Lardiere, C.

    2013-01-01

    Steam jet ejectors are used in the nuclear industry particularly for the transfer of radioactive fluids. Their working is based on the Venturi effect and the conservation of energy. A steam ejector can be considered as a thermodynamical pump without mobile parts. The Descote enterprise manufactures a broad range of steam jet ejectors and the characterization and testing of the steam ejectors was made manually and empirically so far. A new test bench has been designed, the tests are led automatically and allow a more accurate characterization and optimization of the steam jet ejectors. (A.C.)

  15. Automatic segmentation of psoriasis lesions

    Science.gov (United States)

    Ning, Yang; Shi, Chenbo; Wang, Li; Shu, Chang

    2014-10-01

    The automatic segmentation of psoriatic lesions is widely researched these years. It is an important step in Computer-aid methods of calculating PASI for estimation of lesions. Currently those algorithms can only handle single erythema or only deal with scaling segmentation. In practice, scaling and erythema are often mixed together. In order to get the segmentation of lesions area - this paper proposes an algorithm based on Random forests with color and texture features. The algorithm has three steps. The first step, the polarized light is applied based on the skin's Tyndall-effect in the imaging to eliminate the reflection and Lab color space are used for fitting the human perception. The second step, sliding window and its sub windows are used to get textural feature and color feature. In this step, a feature of image roughness has been defined, so that scaling can be easily separated from normal skin. In the end, Random forests will be used to ensure the generalization ability of the algorithm. This algorithm can give reliable segmentation results even the image has different lighting conditions, skin types. In the data set offered by Union Hospital, more than 90% images can be segmented accurately.

  16. Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN

    Science.gov (United States)

    Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai

    2017-01-01

    Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed. PMID:28737705

  17. Development of a Low-Cost, Noninvasive, Portable Visual Speech Recognition Program.

    Science.gov (United States)

    Kohlberg, Gavriel D; Gal, Ya'akov Kobi; Lalwani, Anil K

    2016-09-01

    Loss of speech following tracheostomy and laryngectomy severely limits communication to simple gestures and facial expressions that are largely ineffective. To facilitate communication in these patients, we seek to develop a low-cost, noninvasive, portable, and simple visual speech recognition program (VSRP) to convert articulatory facial movements into speech. A Microsoft Kinect-based VSRP was developed to capture spatial coordinates of lip movements and translate them into speech. The articulatory speech movements associated with 12 sentences were used to train an artificial neural network classifier. The accuracy of the classifier was then evaluated on a separate, previously unseen set of articulatory speech movements. The VSRP was successfully implemented and tested in 5 subjects. It achieved an accuracy rate of 77.2% (65.0%-87.6% for the 5 speakers) on a 12-sentence data set. The mean time to classify an individual sentence was 2.03 milliseconds (1.91-2.16). We have demonstrated the feasibility of a low-cost, noninvasive, portable VSRP based on Kinect to accurately predict speech from articulation movements in clinically trivial time. This VSRP could be used as a novel communication device for aphonic patients. © The Author(s) 2016.

  18. Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN.

    Science.gov (United States)

    Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai; Zhou, Jiehan; Zhang, Weishan

    2017-07-24

    Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed.

  19. Selection of individual features of a speech signal using genetic algorithms

    Directory of Open Access Journals (Sweden)

    Kamil Kamiński

    2016-03-01

    Full Text Available The paper presents an automatic speaker’s recognition system, implemented in the Matlab environment, and demonstrates how to achieve and optimize various elements of the system. The main emphasis was put on features selection of a speech signal using a genetic algorithm which takes into account synergy of features. The results of optimization of selected elements of a classifier have been also shown, including the number of Gaussian distributions used to model each of the voices. In addition, for creating voice models, a universal voice model has been used.[b]Keywords[/b]: biometrics, automatic speaker recognition, genetic algorithms, feature selection

  20. Parametric Representation of the Speaker's Lips for Multimodal Sign Language and Speech Recognition

    Science.gov (United States)

    Ryumin, D.; Karpov, A. A.

    2017-05-01

    In this article, we propose a new method for parametric representation of human's lips region. The functional diagram of the method is described and implementation details with the explanation of its key stages and features are given. The results of automatic detection of the regions of interest are illustrated. A speed of the method work using several computers with different performances is reported. This universal method allows applying parametrical representation of the speaker's lipsfor the tasks of biometrics, computer vision, machine learning, and automatic recognition of face, elements of sign languages, and audio-visual speech, including lip-reading.

  1. Prosody's Contribution to Fluency: An Examination of the Theory of Automatic Information Processing

    Science.gov (United States)

    Schrauben, Julie E.

    2010-01-01

    LaBerge and Samuels' (1974) theory of automatic information processing in reading offers a model that explains how and where the processing of information occurs and the degree to which processing of information occurs. These processes are dependent upon two criteria: accurate word decoding and automatic word recognition. However, LaBerge and…

  2. Noise-robust speech triage.

    Science.gov (United States)

    Bartos, Anthony L; Cipr, Tomas; Nelson, Douglas J; Schwarz, Petr; Banowetz, John; Jerabek, Ladislav

    2018-04-01

    A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).

  3. Voice Activity Detection. Fundamentals and Speech Recognition System Robustness

    OpenAIRE

    Ramirez, J.; Gorriz, J. M.; Segura, J. C.

    2007-01-01

    This chapter has shown an overview of the main challenges in robust speech detection and a review of the state of the art and applications. VADs are frequently used in a number of applications including speech coding, speech enhancement and speech recognition. A precise VAD extracts a set of discriminative speech features from the noisy speech and formulates the decision in terms of well defined rule. The chapter has summarized three robust VAD methods that yield high speech/non-speech discri...

  4. Acoustic analysis assessment in speech pathology detection

    Directory of Open Access Journals (Sweden)

    Panek Daria

    2015-09-01

    Full Text Available Automatic detection of voice pathologies enables non-invasive, low cost and objective assessments of the presence of disorders, as well as accelerating and improving the process of diagnosis and clinical treatment given to patients. In this work, a vector made up of 28 acoustic parameters is evaluated using principal component analysis (PCA, kernel principal component analysis (kPCA and an auto-associative neural network (NLPCA in four kinds of pathology detection (hyperfunctional dysphonia, functional dysphonia, laryngitis, vocal cord paralysis using the a, i and u vowels, spoken at a high, low and normal pitch. The results indicate that the kPCA and NLPCA methods can be considered a step towards pathology detection of the vocal folds. The results show that such an approach provides acceptable results for this purpose, with the best efficiency levels of around 100%. The study brings the most commonly used approaches to speech signal processing together and leads to a comparison of the machine learning methods determining the health status of the patient

  5. Abnormal laughter-like vocalisations replacing speech in primary progressive aphasia

    Science.gov (United States)

    Rohrer, Jonathan D.; Warren, Jason D.; Rossor, Martin N.

    2009-01-01

    We describe ten patients with a clinical diagnosis of primary progressive aphasia (PPA) (pathologically confirmed in three cases) who developed abnormal laughter-like vocalisations in the context of progressive speech output impairment leading to mutism. Failure of speech output was accompanied by increasing frequency of the abnormal vocalisations until ultimately they constituted the patient's only extended utterance. The laughter-like vocalisations did not show contextual sensitivity but occurred as an automatic vocal output that replaced speech. Acoustic analysis of the vocalisations in two patients revealed abnormal motor features including variable note duration and inter-note interval, loss of temporal symmetry of laugh notes and loss of the normal decrescendo. Abnormal laughter-like vocalisations may be a hallmark of a subgroup in the PPA spectrum with impaired control and production of nonverbal vocal behaviour due to disruption of fronto-temporal networks mediating vocalisation. PMID:19435636

  6. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    Directory of Open Access Journals (Sweden)

    Santiago-Omar Caballero-Morales

    2013-01-01

    Full Text Available An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR system was built with Hidden Markov Models (HMMs, where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness. Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR’s output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of emotional state of Mexican Spanish speech.

  7. Speech Inconsistency in Children with Childhood Apraxia of Speech, Language Impairment, and Speech Delay: Depends on the Stimuli

    Science.gov (United States)

    Iuzzini-Seigel, Jenya; Hogan, Tiffany P.; Green, Jordan R.

    2017-01-01

    Purpose: The current research sought to determine (a) if speech inconsistency is a core feature of childhood apraxia of speech (CAS) or if it is driven by comorbid language impairment that affects a large subset of children with CAS and (b) if speech inconsistency is a sensitive and specific diagnostic marker that can differentiate between CAS and…

  8. Variable Span Filters for Speech Enhancement

    DEFF Research Database (Denmark)

    Jensen, Jesper Rindom; Benesty, Jacob; Christensen, Mads Græsbøll

    2016-01-01

    In this work, we consider enhancement of multichannel speech recordings. Linear filtering and subspace approaches have been considered previously for solving the problem. The current linear filtering methods, although many variants exist, have limited control of noise reduction and speech...

  9. Represented Speech in Qualitative Health Research

    DEFF Research Database (Denmark)

    Musaeus, Peter

    2017-01-01

    Represented speech refers to speech where we reference somebody. Represented speech is an important phenomenon in everyday conversation, health care communication, and qualitative research. This case will draw first from a case study on physicians’ workplace learning and second from a case study...... on nurses’ apprenticeship learning. The aim of the case is to guide the qualitative researcher to use own and others’ voices in the interview and to be sensitive to represented speech in everyday conversation. Moreover, reported speech matters to health professionals who aim to represent the voice...... of their patients. Qualitative researchers and students might learn to encourage interviewees to elaborate different voices or perspectives. Qualitative researchers working with natural speech might pay attention to how people talk and use represented speech. Finally, represented speech might be relevant...

  10. Quick Statistics about Voice, Speech, and Language

    Science.gov (United States)

    ... here Home » Health Info » Statistics and Epidemiology Quick Statistics About Voice, Speech, Language Voice, Speech, Language, and ... no 205. Hyattsville, MD: National Center for Health Statistics. 2015. Hoffman HJ, Li C-M, Losonczy K, ...

  11. A NOVEL APPROACH TO STUTTERED SPEECH CORRECTION

    Directory of Open Access Journals (Sweden)

    Alim Sabur Ajibola

    2016-06-01

    Full Text Available Stuttered speech is a dysfluency rich speech, more prevalent in males than females. It has been associated with insufficient air pressure or poor articulation, even though the root causes are more complex. The primary features include prolonged speech and repetitive speech, while some of its secondary features include, anxiety, fear, and shame. This study used LPC analysis and synthesis algorithms to reconstruct the stuttered speech. The results were evaluated using cepstral distance, Itakura-Saito distance, mean square error, and likelihood ratio. These measures implied perfect speech reconstruction quality. ASR was used for further testing, and the results showed that all the reconstructed speech samples were perfectly recognized while only three samples of the original speech were perfectly recognized.

  12. Developmental language and speech disability.

    Science.gov (United States)

    Spiel, G; Brunner, E; Allmayer, B; Pletz, A

    2001-09-01

    Speech disabilities (articulation deficits) and language disorders--expressive (vocabulary) receptive (language comprehension) are not uncommon in children. An overview of these along with a global description of the impairment of communication as well as clinical characteristics of language developmental disorders are presented in this article. The diagnostic tables, which are applied in the European and Anglo-American speech areas, ICD-10 and DSM-IV, have been explained and compared. Because of their strengths and weaknesses an alternative classification of language and speech developmental disorders is proposed, which allows a differentiation between expressive and receptive language capabilities with regard to the semantic and the morphological/syntax domains. Prevalence and comorbidity rates, psychosocial influences, biological factors and the biological social interaction have been discussed. The necessity of the use of standardized examinations is emphasised. General logopaedic treatment paradigms, specific therapy concepts and an overview of prognosis have been described.

  13. Motor Speech Phenotypes of Frontotemporal Dementia, Primary Progressive Aphasia, and Progressive Apraxia of Speech

    Science.gov (United States)

    Poole, Matthew L.; Brodtmann, Amy; Darby, David; Vogel, Adam P.

    2017-01-01

    Purpose: Our purpose was to create a comprehensive review of speech impairment in frontotemporal dementia (FTD), primary progressive aphasia (PPA), and progressive apraxia of speech in order to identify the most effective measures for diagnosis and monitoring, and to elucidate associations between speech and neuroimaging. Method: Speech and…

  14. Visual context enhanced. The joint contribution of iconic gestures and visible speech to degraded speech comprehension.

    NARCIS (Netherlands)

    Drijvers, L.; Özyürek, A.

    2017-01-01

    Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech

  15. Listeners Experience Linguistic Masking Release in Noise-Vocoded Speech-in-Speech Recognition

    Science.gov (United States)

    Viswanathan, Navin; Kokkinakis, Kostas; Williams, Brittany T.

    2018-01-01

    Purpose: The purpose of this study was to evaluate whether listeners with normal hearing perceiving noise-vocoded speech-in-speech demonstrate better intelligibility of target speech when the background speech was mismatched in language (linguistic release from masking [LRM]) and/or location (spatial release from masking [SRM]) relative to the…

  16. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    Science.gov (United States)

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  17. Visual Context Enhanced: The Joint Contribution of Iconic Gestures and Visible Speech to Degraded Speech Comprehension

    Science.gov (United States)

    Drijvers, Linda; Ozyurek, Asli

    2017-01-01

    Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Method:…

  18. An experimental Dutch keyboard-to-speech system for the speech impaired

    NARCIS (Netherlands)

    Deliege, R.J.H.

    1989-01-01

    An experimental Dutch keyboard-to-speech system has been developed to explor the possibilities and limitations of Dutch speech synthesis in a communication aid for the speech impaired. The system uses diphones and a formant synthesizer chip for speech synthesis. Input to the system is in

  19. Perceived Liveliness and Speech Comprehensibility in Aphasia: The Effects of Direct Speech in Auditory Narratives

    Science.gov (United States)

    Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike

    2014-01-01

    Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in "healthy" communication direct speech constructions contribute to the liveliness, and indirectly to the comprehensibility, of speech.…

  20. Poor Speech Perception Is Not a Core Deficit of Childhood Apraxia of Speech: Preliminary Findings

    Science.gov (United States)

    Zuk, Jennifer; Iuzzini-Seigel, Jenya; Cabbage, Kathryn; Green, Jordan R.; Hogan, Tiffany P.

    2018-01-01

    Purpose: Childhood apraxia of speech (CAS) is hypothesized to arise from deficits in speech motor planning and programming, but the influence of abnormal speech perception in CAS on these processes is debated. This study examined speech perception abilities among children with CAS with and without language impairment compared to those with…

  1. Common neural substrates support speech and non-speech vocal tract gestures.

    Science.gov (United States)

    Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M J; Poletto, Christopher J; Ludlow, Christy L

    2009-08-01

    The issue of whether speech is supported by the same neural substrates as non-speech vocal tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial activation overlap between speech and non-speech function in regions. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings posit a more general role of the previously proposed "auditory dorsal stream" in the left hemisphere--to support the production of vocal tract gestures that are not limited to speech processing.

  2. The treatment of apraxia of speech : Speech and music therapy, an innovative joint effort

    NARCIS (Netherlands)

    Hurkmans, Josephus Johannes Stephanus

    2016-01-01

    Apraxia of Speech (AoS) is a neurogenic speech disorder. A wide variety of behavioural methods have been developed to treat AoS. Various therapy programmes use musical elements to improve speech production. A unique therapy programme combining elements of speech therapy and music therapy is called

  3. Speech Perception and Short-Term Memory Deficits in Persistent Developmental Speech Disorder

    Science.gov (United States)

    Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.

    2006-01-01

    Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech…

  4. Robust Recognition of Loud and Lombard speech in the Fighter Cockpit Environment

    Science.gov (United States)

    1988-08-01

    the latter as inter-speaker variability. According to Zue [Z85j, inter-speaker variabilities can be attributed to sociolinguistic background, dialect...34 Journal of the Acoustical Society of America , Vol 50, 1971. [At74I B. S. Atal, "Linear prediction for speaker identification," Journal of the Acoustical...Society of America , Vol 55, 1974. [B771 B. Beek, E. P. Neuberg, and D. C. Hodge, "An Assessment of the Technology of Automatic Speech Recognition for

  5. Lwazi II Final Report: Increasing the impact of speech technologies in South Africa

    CSIR Research Space (South Africa)

    Calteaux, K

    2013-02-01

    Full Text Available  North-West University, Potchefstroom Campus  Department of Basic Education, National School Nutrition Unit  Thusong Service Centres (Bushbuckridge, Musina and Sterkspruit)  Senqu Municipality  Afrivet Training Services  Kokotla Junior Secondary... activities should also continue, in order to refine these technologies and improve their robustness and scalability. 5 | P a g e Acronyms API – Application programming interface ASR – Automatic speech recognition ATS – Afrivet Training Services CDW...

  6. THE ONTOGENESIS OF SPEECH DEVELOPMENT

    Directory of Open Access Journals (Sweden)

    T. E. Braudo

    2017-01-01

    Full Text Available The purpose of this article is to acquaint the specialists, working with children having developmental disorders, with age-related norms for speech development. Many well-known linguists and psychologists studied speech ontogenesis (logogenesis. Speech is a higher mental function, which integrates many functional systems. Speech development in infants during the first months after birth is ensured by the innate hearing and emerging ability to fix the gaze on the face of an adult. Innate emotional reactions are also being developed during this period, turning into nonverbal forms of communication. At about 6 months a baby starts to pronounce some syllables; at 7–9 months – repeats various sounds combinations, pronounced by adults. At 10–11 months a baby begins to react on the words, referred to him/her. The first words usually appear at an age of 1 year; this is the start of the stage of active speech development. At this time it is acceptable, if a child confuses or rearranges sounds, distorts or misses them. By the age of 1.5 years a child begins to understand abstract explanations of adults. Significant vocabulary enlargement occurs between 2 and 3 years; grammatical structures of the language are being formed during this period (a child starts to use phrases and sentences. Preschool age (3–7 y. o. is characterized by incorrect, but steadily improving pronunciation of sounds and phonemic perception. The vocabulary increases; abstract speech and retelling are being formed. Children over 7 y. o. continue to improve grammar, writing and reading skills. The described stages may not have strict age boundaries, as soon as they are dependent not only on environment, but also on the child’s mental constitution, heredity and character.

  7. Common neural substrates support speech and non-speech vocal tract gestures

    OpenAIRE

    Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M.J.; Poletto, Christopher J.; Ludlow, Christy L.

    2009-01-01

    The issue of whether speech is supported by the same neural substrates as non-speech vocal-tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, were compared to the production of speech sylla...

  8. When Is Network Lasso Accurate?

    Directory of Open Access Journals (Sweden)

    Alexander Jung

    2018-01-01

    Full Text Available The “least absolute shrinkage and selection operator” (Lasso method has been adapted recently for network-structured datasets. In particular, this network Lasso method allows to learn graph signals from a small number of noisy signal samples by using the total variation of a graph signal for regularization. While efficient and scalable implementations of the network Lasso are available, only little is known about the conditions on the underlying network structure which ensure network Lasso to be accurate. By leveraging concepts of compressed sensing, we address this gap and derive precise conditions on the underlying network topology and sampling set which guarantee the network Lasso for a particular loss function to deliver an accurate estimate of the entire underlying graph signal. We also quantify the error incurred by network Lasso in terms of two constants which reflect the connectivity of the sampled nodes.

  9. Multimicrophone Speech Dereverberation: Experimental Validation

    Directory of Open Access Journals (Sweden)

    Marc Moonen

    2007-05-01

    Full Text Available Dereverberation is required in various speech processing applications such as handsfree telephony and voice-controlled systems, especially when signals are applied that are recorded in a moderately or highly reverberant environment. In this paper, we compare a number of classical and more recently developed multimicrophone dereverberation algorithms, and validate the different algorithmic settings by means of two performance indices and a speech recognition system. It is found that some of the classical solutions obtain a moderate signal enhancement. More advanced subspace-based dereverberation techniques, on the other hand, fail to enhance the signals despite their high-computational load.

  10. Discriminative learning for speech recognition

    CERN Document Server

    He, Xiadong

    2008-01-01

    In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model. A detailed study is presented on unifying the common objective functions for discriminative learning in speech recognition, namely maximum mutual information (MMI), minimum classification error, and minimum phone/word error. The unification is presented, with rigorous mathematical analysis, in a common rational-functio

  11. Reviewing the connection between speech and obstructive sleep apnea.

    Science.gov (United States)

    Espinoza-Cuadros, Fernando; Fernández-Pozo, Rubén; Toledano, Doroteo T; Alcázar-Ramírez, José D; López-Gonzalo, Eduardo; Hernández-Gómez, Luis A

    2016-02-20

    Sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in OSA speakers has led to hypothesize the automatic analysis of speech for OSA assessment. In this paper we critically review several approaches using speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications. A large speech database including 426 male Spanish speakers suspected to suffer OSA and derived to a sleep disorders unit was used to study the clinical validity of several proposals using machine learning techniques to predict the apnea-hypopnea index (AHI) or classify individuals according to their OSA severity. AHI describes the severity of patients' condition. We first evaluate AHI prediction using state-of-the-art speaker recognition technologies: speech spectral information is modelled using supervectors or i-vectors techniques, and AHI is predicted through support vector regression (SVR). Using the same database we then critically review several OSA classification approaches previously proposed. The influence and possible interference of other clinical variables or characteristics available for our OSA population: age, height, weight, body mass index, and cervical perimeter, are also studied. The poor results obtained when estimating AHI using supervectors or i-vectors followed by SVR contrast with the positive results reported by previous research. This fact prompted us to a careful review of these approaches, also testing some reported results over our database. Several methodological limitations and deficiencies were detected that may have led to overoptimistic results. The methodological deficiencies observed after critically reviewing previous research can be relevant examples of potential pitfalls when using machine learning techniques for

  12. The Accurate Particle Tracer Code

    OpenAIRE

    Wang, Yulei; Liu, Jian; Qin, Hong; Yu, Zhi

    2016-01-01

    The Accurate Particle Tracer (APT) code is designed for large-scale particle simulations on dynamical systems. Based on a large variety of advanced geometric algorithms, APT possesses long-term numerical accuracy and stability, which are critical for solving multi-scale and non-linear problems. Under the well-designed integrated and modularized framework, APT serves as a universal platform for researchers from different fields, such as plasma physics, accelerator physics, space science, fusio...

  13. Accurate x-ray spectroscopy

    International Nuclear Information System (INIS)

    Deslattes, R.D.

    1987-01-01

    Heavy ion accelerators are the most flexible and readily accessible sources of highly charged ions. These having only one or two remaining electrons have spectra whose accurate measurement is of considerable theoretical significance. Certain features of ion production by accelerators tend to limit the accuracy which can be realized in measurement of these spectra. This report aims to provide background about spectroscopic limitations and discuss how accelerator operations may be selected to permit attaining intrinsically limited data

  14. Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

    Directory of Open Access Journals (Sweden)

    Yue Zhao

    2012-12-01

    Full Text Available Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Network and coupled HMM are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and build an accurate audio-visual speech recognition model under a no frame-independency assumption. The experiment results on Tibetan speech data from some real-world environments showed the proposed DDBN outperforms the state-of-art methods in word recognition accuracy.

  15. Automatic quantitative renal scintigraphy

    International Nuclear Information System (INIS)

    Valeyre, J.; Deltour, G.; Delisle, M.J.; Bouchard, A.

    1976-01-01

    Renal scintigraphy data may be analyzed automatically by the use of a processing system coupled to an Anger camera (TRIDAC-MULTI 8 or CINE 200). The computing sequence is as follows: normalization of the images; background noise subtraction on both images; evaluation of mercury 197 uptake by the liver and spleen; calculation of the activity fractions on each kidney with respect to the injected dose, taking into account the kidney depth and the results referred to normal values; edition of the results. Automation minimizes the scattering parameters and by its simplification is a great asset in routine work [fr

  16. AUTOMATIC FREQUENCY CONTROL SYSTEM

    Science.gov (United States)

    Hansen, C.F.; Salisbury, J.D.

    1961-01-10

    A control is described for automatically matching the frequency of a resonant cavity to that of a driving oscillator. The driving oscillator is disconnected from the cavity and a secondary oscillator is actuated in which the cavity is the frequency determining element. A low frequency is mixed with the output of the driving oscillator and the resultant lower and upper sidebands are separately derived. The frequencies of the sidebands are compared with the secondary oscillator frequency. deriving a servo control signal to adjust a tuning element in the cavity and matching the cavity frequency to that of the driving oscillator. The driving oscillator may then be connected to the cavity.

  17. Automatic dipole subtraction

    International Nuclear Information System (INIS)

    Hasegawa, K.

    2008-01-01

    The Catani-Seymour dipole subtraction is a general procedure to treat infrared divergences in real emission processes at next-to-leading order in QCD. We automatized the procedure in a computer code. The code is useful especially for the processes with many parton legs. In this talk, we first explain the algorithm of the dipole subtraction and the whole structure of our code. After that we show the results for some processes where the infrared divergences of real emission processes are subtracted. (author)

  18. Automatic programmable air ozonizer

    International Nuclear Information System (INIS)

    Gubarev, S.P.; Klosovsky, A.V.; Opaleva, G.P.; Taran, V.S.; Zolototrubova, M.I.

    2015-01-01

    In this paper we describe a compact, economical, easy to manage auto air ozonator developed at the Institute of Plasma Physics of the NSC KIPT. It is designed for sanitation, disinfection of premises and cleaning the air from foreign odors. A distinctive feature of the developed device is the generation of a given concentration of ozone, approximately 0.7 maximum allowable concentration (MAC), and automatic maintenance of a specified level. This allows people to be inside the processed premises during operation. The microprocessor controller to control the operation of the ozonator was developed

  19. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    International Nuclear Information System (INIS)

    Holzrichter, J.F.; Ng, L.C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs

  20. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    Science.gov (United States)

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  1. Relationship between Speech Intelligibility and Speech Comprehension in Babble Noise

    Science.gov (United States)

    Fontan, Lionel; Tardieu, Julien; Gaillard, Pascal; Woisard, Virginie; Ruiz, Robert

    2015-01-01

    Purpose: The authors investigated the relationship between the intelligibility and comprehension of speech presented in babble noise. Method: Forty participants listened to French imperative sentences (commands for moving objects) in a multitalker babble background for which intensity was experimentally controlled. Participants were instructed to…

  2. Spectral integration in speech and non-speech sounds

    Science.gov (United States)

    Jacewicz, Ewa

    2005-04-01

    Spectral integration (or formant averaging) was proposed in vowel perception research to account for the observation that a reduction of the intensity of one of two closely spaced formants (as in /u/) produced a predictable shift in vowel quality [Delattre et al., Word 8, 195-210 (1952)]. A related observation was reported in psychoacoustics, indicating that when the components of a two-tone periodic complex differ in amplitude and frequency, its perceived pitch is shifted toward that of the more intense tone [Helmholtz, App. XIV (1875/1948)]. Subsequent research in both fields focused on the frequency interval that separates these two spectral components, in an attempt to determine the size of the bandwidth for spectral integration to occur. This talk will review the accumulated evidence for and against spectral integration within the hypothesized limit of 3.5 Bark for static and dynamic signals in speech perception and psychoacoustics. Based on similarities in the processing of speech and non-speech sounds, it is suggested that spectral integration may reflect a general property of the auditory system. A larger frequency bandwidth, possibly close to 3.5 Bark, may be utilized in integrating acoustic information, including speech, complex signals, or sound quality of a violin.

  3. Intelligibility of synthetic speech in the presence of interfering speech

    NARCIS (Netherlands)

    Eggen, J.H.

    1989-01-01

    Standard articulation tests are not always sensitive enough to discriminate between speech samples which are of high intelligibility. One can increase the sensitivity of such tests by presenting the test materials in noise. In this way, small differences in intelligibility can be magnified into

  4. Multimedia with a speech track: searching spontaneous conversational speech

    NARCIS (Netherlands)

    Larson, Martha; Ordelman, Roeland J.F.; de Jong, Franciska M.G.; Kohler, Joachim; Kraaij, Wessel

    After two successful years at SIGIR in 2007 and 2008, the third workshop on Searching Spontaneous Conversational Speech (SSCS 2009) was held conjunction with the ACM Multimedia 2009. The goal of the SSCS series is to serve as a forum that brings together the disciplines that collaborate on spoken

  5. SPEECH ACT ANALYSIS: HOSNI MUBARAK'S SPEECHES IN PRE ...

    African Journals Online (AJOL)

    enerco

    from movements of certain organs with his (man‟s) throat and mouth…. By means ... In other words, government engages language; and how this affects the ... address the audience in a social gathering in order to have a new dawn. ..... Agbedo, C. U. Speech Act Analysis of Political discourse in the Nigerian Print Media in.

  6. Cognitive Functions in Childhood Apraxia of Speech

    Science.gov (United States)

    Nijland, Lian; Terband, Hayo; Maassen, Ben

    2015-01-01

    Purpose: Childhood apraxia of speech (CAS) is diagnosed on the basis of specific speech characteristics, in the absence of problems in hearing, intelligence, and language comprehension. This does not preclude the possibility that children with this speech disorder might demonstrate additional problems. Method: Cognitive functions were investigated…

  7. Phonetic recalibration of speech by text

    NARCIS (Netherlands)

    Keetels, M.N.; Schakel, L.; de Bonte, M.; Vroomen, J.

    2016-01-01

    Listeners adjust their phonetic categories to cope with variations in the speech signal (phonetic recalibration). Previous studies have shown that lipread speech (and word knowledge) can adjust the perception of ambiguous speech and can induce phonetic adjustments (Bertelson, Vroomen, & de Gelder in

  8. Speech and Debate as Civic Education

    Science.gov (United States)

    Hogan, J. Michael; Kurr, Jeffrey A.; Johnson, Jeremy D.; Bergmaier, Michael J.

    2016-01-01

    In light of the U.S. Senate's designation of March 15, 2016 as "National Speech and Debate Education Day" (S. Res. 398, 2016), it only seems fitting that "Communication Education" devote a special section to the role of speech and debate in civic education. Speech and debate have been at the heart of the communication…

  9. Speech Synthesis Applied to Language Teaching.

    Science.gov (United States)

    Sherwood, Bruce

    1981-01-01

    The experimental addition of speech output to computer-based Esperanto lessons using speech synthesized from text is described. Because of Esperanto's phonetic spelling and simple rhythm, it is particularly easy to describe the mechanisms of Esperanto synthesis. Attention is directed to how the text-to-speech conversion is performed and the ways…

  10. Normal Aspects of Speech, Hearing, and Language.

    Science.gov (United States)

    Minifie, Fred. D., Ed.; And Others

    This book is written as a guide to the understanding of the processes involved in human speech communication. Ten authorities contributed material to provide an introduction to the physiological aspects of speech production and reception, the acoustical aspects of speech production and transmission, the psychophysics of sound reception, the nature…

  11. Audiovisual Asynchrony Detection in Human Speech

    Science.gov (United States)

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  12. The interpersonal level in English: reported speech

    NARCIS (Netherlands)

    Keizer, E.

    2009-01-01

    The aim of this article is to describe and classify a number of different forms of English reported speech (or thought), and subsequently to analyze and represent them within the theory of FDG. First, the most prototypical forms of reported speech are discussed (direct and indirect speech);

  13. Cognitive functions in Childhood Apraxia of Speech

    NARCIS (Netherlands)

    Nijland, L.; Terband, H.; Maassen, B.

    2015-01-01

    Purpose: Childhood Apraxia of Speech (CAS) is diagnosed on the basis of specific speech characteristics, in the absence of problems in hearing, intelligence, and language comprehension. This does not preclude the possibility that children with this speech disorder might demonstrate additional

  14. Regulation of speech in multicultural societies

    NARCIS (Netherlands)

    Maussen, M.; Grillo, R.

    2015-01-01

    This book focuses on the way in which public debate and legal practice intersect when it comes to the value of free speech and the need to regulate "offensive", "blasphemous" or "hate" speech, especially, though not exclusively where such speech is thought to be offensive to members of ethnic and

  15. Theoretical Value in Teaching Freedom of Speech.

    Science.gov (United States)

    Carney, John J., Jr.

    The exercise of freedom of speech within our nation has deteriorated. A practical value in teaching free speech is the possibility of restoring a commitment to its principles by educators. What must be taught is why freedom of speech is important, why it has been compromised, and the extent to which it has been compromised. Every technological…

  16. Interventions for Speech Sound Disorders in Children

    Science.gov (United States)

    Williams, A. Lynn, Ed.; McLeod, Sharynne, Ed.; McCauley, Rebecca J., Ed.

    2010-01-01

    With detailed discussion and invaluable video footage of 23 treatment interventions for speech sound disorders (SSDs) in children, this textbook and DVD set should be part of every speech-language pathologist's professional preparation. Focusing on children with functional or motor-based speech disorders from early childhood through the early…

  17. Application of wavelets in speech processing

    CERN Document Server

    Farouk, Mohamed Hesham

    2014-01-01

    This book provides a survey on wide-spread of employing wavelets analysis  in different applications of speech processing. The author examines development and research in different application of speech processing. The book also summarizes the state of the art research on wavelet in speech processing.

  18. DEVELOPMENT AND DISORDERS OF SPEECH IN CHILDHOOD.

    Science.gov (United States)

    KARLIN, ISAAC W.; AND OTHERS

    THE GROWTH, DEVELOPMENT, AND ABNORMALITIES OF SPEECH IN CHILDHOOD ARE DESCRIBED IN THIS TEXT DESIGNED FOR PEDIATRICIANS, PSYCHOLOGISTS, EDUCATORS, MEDICAL STUDENTS, THERAPISTS, PATHOLOGISTS, AND PARENTS. THE NORMAL DEVELOPMENT OF SPEECH AND LANGUAGE IS DISCUSSED, INCLUDING THEORIES ON THE ORIGIN OF SPEECH IN MAN AND FACTORS INFLUENCING THE NORMAL…

  19. Impaired Feedforward Control and Enhanced Feedback Control of Speech in Patients with Cerebellar Degeneration.

    Science.gov (United States)

    Parrell, Benjamin; Agnew, Zarinah; Nagarajan, Srikantan; Houde, John; Ivry, Richard B

    2017-09-20

    The cerebellum has been hypothesized to form a crucial part of the speech motor control network. Evidence for this comes from patients with cerebellar damage, who exhibit a variety of speech deficits, as well as imaging studies showing cerebellar activation during speech production in healthy individuals. To date, the precise role of the cerebellum in speech motor control remains unclear, as it has been implicated in both anticipatory (feedforward) and reactive (feedback) control. Here, we assess both anticipatory and reactive aspects of speech motor control, comparing the performance of patients with cerebellar degeneration and matched controls. Experiment 1 tested feedforward control by examining speech adaptation across trials in response to a consistent perturbation of auditory feedback. Experiment 2 tested feedback control, examining online corrections in response to inconsistent perturbations of auditory feedback. Both male and female patients and controls were tested. The patients were impaired in adapting their feedforward control system relative to controls, exhibiting an attenuated anticipatory response to the perturbation. In contrast, the patients produced even larger compensatory responses than controls, suggesting an increased reliance on sensory feedback to guide speech articulation in this population. Together, these results suggest that the cerebellum is crucial for maintaining accurate feedforward control of speech, but relatively uninvolved in feedback control. SIGNIFICANCE STATEMENT Speech motor control is a complex activity that is thought to rely on both predictive, feedforward control as well as reactive, feedback control. While the cerebellum has been shown to be part of the speech motor control network, its functional contribution to feedback and feedforward control remains controversial. Here, we use real-time auditory perturbations of speech to show that patients with cerebellar degeneration are impaired in adapting feedforward control of

  20. Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem.

    Directory of Open Access Journals (Sweden)

    Cai Wingfield

    2017-09-01

    Full Text Available There is widespread interest in the relationship between the neurobiological systems supporting human cognition and emerging computational systems capable of emulating these capacities. Human speech comprehension, poorly understood as a neurobiological process, is an important case in point. Automatic Speech Recognition (ASR systems with near-human levels of performance are now available, which provide a computationally explicit solution for the recognition of words in continuous speech. This research aims to bridge the gap between speech recognition processes in humans and machines, using novel multivariate techniques to compare incremental 'machine states', generated as the ASR analysis progresses over time, to the incremental 'brain states', measured using combined electro- and magneto-encephalography (EMEG, generated as the same inputs are heard by human listeners. This direct comparison of dynamic human and machine internal states, as they respond to the same incrementally delivered sensory input, revealed a significant correspondence between neural response patterns in human superior temporal cortex and the structural properties of ASR-derived phonetic models. Spatially coherent patches in human temporal cortex responded selectively to individual phonetic features defined on the basis of machine-extracted regularities in the speech to lexicon mapping process. These results demonstrate the feasibility of relating human and ASR solutions to the problem of speech recognition, and suggest the potential for further studies relating complex neural computations in human speech comprehension to the rapidly evolving ASR systems that address the same problem domain.