WorldWideScience

Sample records for modeling speech signals

  1. Enhancement of speech signals - with a focus on voiced speech models

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie

    This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based on this model …

  2. Mathematical modeling and signal processing in speech and hearing sciences

    CERN Document Server

    Xin, Jack

    2014-01-01

    The aim of the book is to give an accessible introduction to mathematical models and signal processing methods in speech and hearing sciences for senior undergraduate and beginning graduate students with basic knowledge of linear algebra, differential equations, numerical analysis, and probability. Speech and hearing sciences are fundamental to numerous technological advances of the digital world in the past decade, from music compression in MP3 to digital hearing aids, from network-based voice-enabled services to speech interaction with mobile phones. Mathematics and computation are intimately related to these leaps and bounds. On the other hand, speech and hearing are strongly interdisciplinary areas where dissimilar scientific and engineering publications and approaches often coexist and make it difficult for newcomers to enter.

  3. Phonetic perspectives on modelling information in the speech signal

    Indian Academy of Sciences (India)

    … information available from the speech signal. Examples of variation in phonetic detail which systematically signals non-phonemic linguistic information, such as the grammatical or morphological status of a stretch of sound, are given. Other examples indicate the discourse function of the utterance. Some of these systematic …

  4. Improved Emotion Recognition Using Gaussian Mixture Model and Extreme Learning Machine in Speech and Glottal Signals

    Directory of Open Access Journals (Sweden)

    Hariharan Muthusamy

    2015-01-01

    Recently, researchers have paid escalating attention to studying the emotional state of an individual from his/her speech signals, as the speech signal is the fastest and most natural method of communication between individuals. In this work, a new feature enhancement using a Gaussian mixture model (GMM) was proposed to enhance the discriminatory power of the features extracted from speech and glottal signals. Three different emotional speech databases were utilized to gauge the proposed methods. An extreme learning machine (ELM) and a k-nearest neighbor (kNN) classifier were employed to classify the different types of emotions. Several experiments were conducted, and the results show that the proposed methods significantly improved speech emotion recognition performance compared to research works published in the literature.
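
    As an illustration of the pipeline this abstract describes (GMM-based feature modelling followed by classification), here is a minimal sketch using scikit-learn; the data arrays, number of mixture components, and the use of kNN in place of ELM are placeholder assumptions, not the authors' configuration.

```python
# Hypothetical sketch: per-class GMM log-likelihoods as enhanced features,
# classified with kNN. X_* are placeholder (n_samples, n_features) arrays of
# speech/glottal features; y_train holds emotion labels.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 12)), rng.integers(0, 4, 200)
X_test = rng.normal(size=(50, 12))

# One GMM per emotion class; its log-likelihood becomes one feature dimension.
gmms = {c: GaussianMixture(n_components=4, random_state=0).fit(X_train[y_train == c])
        for c in np.unique(y_train)}

def gmm_features(X):
    return np.column_stack([g.score_samples(X) for g in gmms.values()])

clf = KNeighborsClassifier(n_neighbors=5).fit(gmm_features(X_train), y_train)
print(clf.predict(gmm_features(X_test))[:10])  # predicted emotion labels
```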

  5. Phonetic perspectives on modelling information in the speech signal

    Indian Academy of Sciences (India)

    ... and uses formalisms that force us to recognize that every perceptual decision is context- and task-dependent. Examples of perceptually-significant phonetic detail that is neglected by standard models are discussed. Similarities between the theoretical approach recommended and current work on perception–action robots ...

  6. Model-Based Speech Signal Coding Using Optimized Temporal Decomposition for Storage and Broadcasting Applications

    Science.gov (United States)

    Athaudage, Chandranath R. N.; Bradley, Alan B.; Lech, Margaret

    2003-12-01

    A dynamic programming-based optimization strategy for a temporal decomposition (TD) model of speech and its application to low-rate speech coding in storage and broadcasting is presented. In previous work with the spectral stability-based event localizing (SBEL) TD algorithm, the event localization was performed based on a spectral stability criterion. Although this approach gave reasonably good results, there was no assurance on the optimality of the event locations. In the present work, we have optimized the event localizing task using a dynamic programming-based optimization strategy. Simulation results show that an improved TD model accuracy can be achieved. A methodology of incorporating the optimized TD algorithm within the standard MELP speech coder for the efficient compression of speech spectral information is also presented. The performance evaluation results revealed that the proposed speech coding scheme achieves 50%-60% compression of speech spectral information with negligible degradation in the decoded speech quality.
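
    The core of the optimization described above can be illustrated with a small dynamic-programming sketch: choose segment (event) boundaries in a frame sequence so that the total segment cost is minimal. The within-segment variance cost below is a placeholder for the paper's spectral distortion measure, and the frame data are synthetic.

```python
# Hedged sketch of DP-based event localization for temporal decomposition.
import numpy as np

def segment_cost(F, i, j):
    """Placeholder cost: squared deviation of frames F[i:j] from their mean."""
    seg = F[i:j]
    return float(((seg - seg.mean(axis=0)) ** 2).sum())

def dp_segment(F, K):
    """Split F into K contiguous segments with minimal total cost."""
    n = len(F)
    cost = np.full((K + 1, n + 1), np.inf)
    back = np.zeros((K + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1, i] + segment_cost(F, i, j)
                if c < cost[k, j]:
                    cost[k, j], back[k, j] = c, i
    bounds, j = [], n
    for k in range(K, 0, -1):
        bounds.append(back[k, j])
        j = back[k, j]
    return sorted(bounds)[1:]  # interior event locations

F = np.random.default_rng(1).normal(size=(60, 10))  # placeholder spectral frames
print(dp_segment(F, K=4))
```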

  7. Robust digital processing of speech signals

    CERN Document Server

    Kovacevic, Branko; Veinović, Mladen; Marković, Milan

    2017-01-01

    This book focuses on speech signal phenomena, presenting a robustification of the usual speech generation models with regard to the presumed types of excitation signals, which is equivalent to the introduction of a class of nonlinear models and the corresponding criterion functions for parameter estimation. Compared to the general class of nonlinear models, such as various neural networks, these models possess good properties of controlled complexity, the option of working in “online” mode, as well as a low information volume for efficient speech encoding and transmission. Providing comprehensive insights, the book is based on the authors’ research, which has already been published, supplemented by additional texts discussing general considerations of speech modeling, linear predictive analysis and robust parameter estimation.

  8. Prediction of speech intelligibility based on an auditory preprocessing model

    DEFF Research Database (Denmark)

    Christiansen, Claus Forup Corlin; Pedersen, Michael Syskind; Dau, Torsten

    2010-01-01

    Classical speech intelligibility models, such as the speech transmission index (STI) and the speech intelligibility index (SII) are based on calculations on the physical acoustic signals. The present study predicts speech intelligibility by combining a psychoacoustically validated model of auditory...

  9. Automatic Smoker Detection from Telephone Speech Signals

    DEFF Research Database (Denmark)

    Alavijeh, Amir Hossein Poorjam; Hesaraki, Soheila; Safavi, Saeid

    2017-01-01

    This paper proposes automatic smoking habit detection from spontaneous telephone speech signals. In this method, each utterance is modeled using i-vector and non-negative factor analysis (NFA) frameworks, which yield low-dimensional representations of utterances by applying factor analysis on Gaussian mixture model means and weights, respectively. …

  10. Modeling speech intelligibility based on the signal-to-noise envelope power ratio

    DEFF Research Database (Denmark)

    Jørgensen, Søren

    … background noise, reverberation and noise reduction processing on speech intelligibility, indicating that the model is more general than traditional modeling approaches. Moreover, the model accounts for phase distortions when it includes a mechanism that evaluates the variation of envelope power across (audio) frequency. However, because the SNRenv is based on the long-term average envelope power, the model cannot account for the greater intelligibility typically observed in fluctuating noise compared to stationary noise. To overcome this limitation, a multi-resolution version of the sEPSM is presented … distorted by reverberation or spectral subtraction. The relationship between the SNRenv-based decision metric and psychoacoustic speech intelligibility is further evaluated by generating stimuli with different SNRenv but the same overall power SNR. The results from the corresponding psychoacoustic data …
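
    The decision metric named here can be illustrated with a deliberately simplified computation: compare the low-frequency envelope power of the noisy speech with that of the noise alone. The toy signals, the single ~30 Hz modulation cutoff, and the assumption that the noise is available separately are all simplifications of the sEPSM, not the model itself.

```python
# Illustrative SNRenv-style metric (a simplification, not the full sEPSM).
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 16000
rng = np.random.default_rng(2)
speech = np.sin(2 * np.pi * 4 * np.arange(fs) / fs) * rng.normal(size=fs)  # toy 4 Hz-modulated carrier
noise = rng.normal(size=fs)

def env_power(x):
    env = np.abs(hilbert(x))                 # Hilbert envelope
    b, a = butter(2, 30 / (fs / 2))          # keep modulations below ~30 Hz
    env = filtfilt(b, a, env)
    ac = env - env.mean()
    return (ac ** 2).mean() / (env.mean() ** 2)  # normalized envelope power

snr_env = 10 * np.log10(env_power(speech + noise) / env_power(noise))
print(f"SNRenv ~ {snr_env:.1f} dB")
```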

  11. Method and apparatus for obtaining complete speech signals for speech recognition applications

    Science.gov (United States)

    Abrash, Victor (Inventor); Cesari, Federico (Inventor); Franco, Horacio (Inventor); George, Christopher (Inventor); Zheng, Jing (Inventor)

    2009-01-01

    The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
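
    The circular-buffer mechanism in this abstract is easy to sketch: frames are written continuously into a fixed-size ring, so audio preceding a push-to-talk command can be prepended to the recognizer input. The frame size and pre-roll length below are illustrative assumptions.

```python
# Hedged sketch of the circular-buffer capture idea.
from collections import deque

PREROLL_FRAMES = 50                          # ~500 ms at 10 ms per frame (assumed)
ring = deque(maxlen=PREROLL_FRAMES)          # oldest frames are overwritten

def on_audio_frame(frame: bytes):
    ring.append(frame)                       # continuous background recording

def on_start_command(live_frames):
    # Augmented signal = buffered pre-roll + frames arriving after the command.
    return b"".join(ring) + b"".join(live_frames)

on_audio_frame(b"\x00" * 320)                # placeholder 10 ms, 16-bit PCM frame
print(len(on_start_command([b"\x01" * 320])))
```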

  12. Optimal Wavelets for Speech Signal Representations

    Directory of Open Access Journals (Sweden)

    Shonda L. Walker

    2003-08-01

    It is well known that in many speech processing applications, speech signals are characterized by their voiced and unvoiced components. Voiced speech components contain a dense frequency spectrum with many harmonics. The periodic or semi-periodic nature of voiced signals lends itself to Fourier processing. Unvoiced speech contains many high-frequency components and thus resembles random noise. Several methods for voiced and unvoiced speech representations that utilize wavelet processing have been developed. These methods seek to improve the accuracy of wavelet-based speech signal representations using, among others, adaptive wavelet techniques, superwavelets (which use a linear combination of adaptive wavelets), Gaussian methods, and a multi-resolution sinusoidal transform approach. This paper addresses the relative performance of these wavelet methods and evaluates the usefulness of wavelet processing in speech signal representations. In addition, this paper also addresses some of the hardware considerations for the wavelet methods presented.
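
    As a minimal, hedged illustration of wavelet processing of a speech frame, the following uses PyWavelets; the 'db4' wavelet, decomposition depth, and random frame are placeholder choices, not those evaluated in the paper.

```python
# Sketch: multilevel wavelet decomposition of a (placeholder) speech frame.
import numpy as np
import pywt

frame = np.random.default_rng(3).normal(size=512)   # stand-in for a speech frame
coeffs = pywt.wavedec(frame, 'db4', level=4)        # approximation + 4 detail bands
energies = [float((c ** 2).sum()) for c in coeffs]
print([round(e, 1) for e in energies])              # energy per subband
```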

  13. Automatic Smoker Detection from Telephone Speech Signals

    DEFF Research Database (Denmark)

    Alavijeh, Amir Hossein Poorjam; Hesaraki, Soheila; Safavi, Saeid

    2017-01-01

    This paper proposes automatic smoking habit detection from spontaneous telephone speech signals. In this method, each utterance is modeled using i-vector and non-negative factor analysis (NFA) frameworks, which yield low-dimensional representations of utterances by applying factor analysis on Gaussian mixture model means and weights, respectively. Each framework is evaluated using different classification algorithms to detect the smoker speakers. Finally, score-level fusion of the i-vector-based and the NFA-based recognizers is considered to improve the classification accuracy. The proposed method is evaluated on telephone speech signals of speakers whose smoking habits are known, drawn from the National Institute of Standards and Technology (NIST) 2008 and 2010 Speaker Recognition Evaluation databases. Experimental results over 1194 utterances show the effectiveness of the proposed approach …
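
    The score-level fusion step mentioned above reduces, in its simplest form, to a weighted sum of the two recognizers' scores; the sketch below assumes placeholder per-utterance scores and a fusion weight tuned elsewhere.

```python
# Sketch of score-level fusion of i-vector-based and NFA-based recognizers.
import numpy as np

scores_ivec = np.array([0.8, -0.2, 0.4])   # placeholder i-vector system scores
scores_nfa = np.array([0.5, 0.1, -0.3])    # placeholder NFA system scores
alpha = 0.6                                 # fusion weight (assumed, tuned on dev data)
fused = alpha * scores_ivec + (1 - alpha) * scores_nfa
print((fused > 0).astype(int))              # e.g., 1 = smoker, 0 = non-smoker
```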

  14. Speech Communication and Signal Processing

    Indian Academy of Sciences (India)

    … on 'Auditory-like filter bank: An optimal speech processor for efficient human speech communication', Ghosh et al. argue that the auditory filter bank in the human ear is a near-optimal speech processor for efficient speech communication between human beings. They use a mutual information criterion to design the optimal filter …

  15. Acquirement and enhancement of remote speech signals

    Science.gov (United States)

    Lü, Tao; Guo, Jin; Zhang, He-yong; Yan, Chun-hui; Wang, Can-jin

    2017-07-01

    To address the challenges of non-cooperative and remote acoustic detection, an all-fiber laser Doppler vibrometer (LDV) is established. The all-fiber LDV system offers the advantages of smaller size, lightweight design and robust structure, hence it is a better fit for remote speech detection. In order to improve the performance and efficiency of the LDV for long-range hearing, speech enhancement technology based on the optimally modified log-spectral amplitude (OM-LSA) algorithm is used. The experimental results show that comprehensible speech signals within a range of 150 m can be obtained by the proposed LDV. The signal-to-noise ratio (SNR) and mean opinion score (MOS) of the LDV speech signal can be increased by 100% and 27%, respectively, by using the speech enhancement technology. This all-fiber LDV, combined with the speech enhancement technology, can meet practical demands in engineering.

  16. Speech Communication and Signal Processing

    Indian Academy of Sciences (India)

    Communicating with a machine in a natural mode such as speech brings out not only several technological challenges, but also limitations in our understanding of how people communicate so effortlessly. The key is to understand the distinction between speech processing (as is done in human communication) and speech ...

  17. Comparing Infants' Preference for Correlated Audiovisual Speech with Signal-Level Computational Models

    Science.gov (United States)

    Hollich, George; Prince, Christopher G.

    2009-01-01

    How much of infant behaviour can be accounted for by signal-level analyses of stimuli? The current paper directly compares the moment-by-moment behaviour of 8-month-old infants in an audiovisual preferential looking task with that of several computational models that use the same video stimuli as presented to the infants. One type of model…

  18. Modeling speech intelligibility in adverse conditions

    DEFF Research Database (Denmark)

    Dau, Torsten

    2012-01-01

    … by the normal as well as impaired auditory system. Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting the intelligibility of reverberant speech as well as noisy speech processed by spectral subtraction. However, the sEPSM cannot account for speech subjected to phase jitter, a condition in which the spectral structure of speech is destroyed, while the broadband temporal envelope is kept largely intact. In contrast …

  19. Modelling speech intelligibility in adverse conditions

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    Jørgensen and Dau (J Acoust Soc Am 130:1475-1487, 2011) proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting the intelligibility of reverberant speech as well as … speech subjected to phase jitter, a condition in which the spectral structure of the speech signal is strongly affected, while the broadband temporal envelope is kept largely intact. In contrast, the effects of this distortion can be predicted successfully by the spectro-temporal modulation …

  20. Modeling Speech Intelligibility in Hearing Impaired Listeners

    DEFF Research Database (Denmark)

    Scheidiger, Christoph; Jørgensen, Søren; Dau, Torsten

    2014-01-01

    Models of speech intelligibility (SI) have a long history, starting with the articulation index (AI, [17]), followed by the SI index (SII, [18]) and the speech transmission index (STI, [7]), to name only a few. However, these models fail to accurately predict SI with nonlinearly processed noisy speech, e.g., phase jitter or spectral subtraction. Recent studies predict SI for normal-hearing (NH) listeners based on a signal-to-noise ratio measure in the envelope domain (SNRenv), in the framework of the speech-based envelope power spectrum model (sEPSM, [20, 21]). These models have shown good agreement with measured data under a broad range of conditions, including stationary and modulated interferers, reverberation, and spectral subtraction. Despite the advances in modeling intelligibility in NH listeners, a broadly applicable model that can predict SI in hearing-impaired (HI) listeners is not yet available. As a first step towards such a model, this study investigates to what extent effects of hearing impairment on SI can be modeled in the sEPSM framework. Preliminary results show that, by only modeling the loss of audibility, the model cannot account for the higher speech reception …

  1. A Model for the representation of Speech Signals in Normal and Impaired Ears

    DEFF Research Database (Denmark)

    Christiansen, Thomas Ulrich

    2004-01-01

    A model of the human auditory periphery, ranging from the outer ear to the auditory nerve, was developed. The model consists of the following components: outer ear transfer function, middle ear transfer function, basilar membrane velocity, inner hair cell receptor potential, inner hair cell probability of neurotransmitter release, and auditory nerve fibre refractoriness. The model builds on previously published models; however, parameters for basilar membrane velocity and inner hair cell probability of neurotransmitter release were successfully fitted to psychophysical and physiological data. … Impaired hearing was modelled as a combination of outer- and inner hair cell loss. The percentage of dead inner hair cells was calculated based on a new computational method relating auditory nerve fibre thresholds to behavioural thresholds. Finally, a model of the entire auditory nerve fibre population …

  2. A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition

    Directory of Open Access Journals (Sweden)

    Hermus Kris

    2007-01-01

    The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible under the assumption of a low-rank model for speech and the availability of an estimate of the noise correlation matrix. We present an extensive overview of the available estimators and derive a theoretical estimator to experimentally assess an upper bound to the performance that can be achieved by any subspace-based method. Automatic speech recognition experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing essential signal information for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
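
    The subspace decomposition described here can be sketched with a rank-truncated SVD of a Hankel matrix built from a noisy frame, followed by anti-diagonal averaging; the frame length, embedding dimension and rank below are illustrative assumptions, and the paper's point is precisely that such explicit rank reduction involves trade-offs.

```python
# Illustrative signal-subspace filtering via truncated SVD of a Hankel matrix.
import numpy as np

def hankel_filter(x, L=64, rank=12):
    N = len(x)
    K = N - L + 1
    H = np.lib.stride_tricks.sliding_window_view(x, L).T      # L x K Hankel matrix
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    Hr = (U[:, :rank] * s[:rank]) @ Vt[:rank]                 # signal-subspace estimate
    # Average along anti-diagonals to map the matrix back to a time series.
    y = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        y[i:i + K] += Hr[i]
        counts[i:i + K] += 1
    return y / counts

noisy = np.random.default_rng(4).normal(size=256)             # placeholder noisy frame
print(hankel_filter(noisy)[:5])
```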

  3. Auditory Modeling for Noisy Speech Recognition

    National Research Council Canada - National Science Library

    2000-01-01

    … has used its existing technology in phonetic speech recognition, audio signal processing, and multilingual language translation to design and demonstrate an advanced audio interface for speech …

  4. Audio signal recognition for speech, music, and environmental sounds

    Science.gov (United States)

    Ellis, Daniel P. W.

    2003-10-01

    Human listeners are very good at all kinds of sound detection and identification tasks, from understanding heavily accented speech to noticing a ringing phone underneath music playing at full blast. Efforts to duplicate these abilities on computers have been particularly intense in the area of speech recognition, and it is instructive to review which approaches have proved most powerful, and which major problems still remain. The features and models developed for speech have found applications in other audio recognition tasks, including musical signal analysis, and the problems of analyzing the general "ambient" audio that might be encountered by an auditorily endowed robot. This talk will briefly review statistical pattern recognition for audio signals, giving examples in several of these domains. Particular emphasis will be given to common aspects and lessons learned.

  5. Modeling Speech Intelligibility in Hearing Impaired Listeners

    DEFF Research Database (Denmark)

    Scheidiger, Christoph; Jørgensen, Søren; Dau, Torsten

    2014-01-01

    … speech, e.g., phase jitter or spectral subtraction. Recent studies predict SI for normal-hearing (NH) listeners based on a signal-to-noise ratio measure in the envelope domain (SNRenv), in the framework of the speech-based envelope power spectrum model (sEPSM, [20, 21]). These models have shown good agreement with measured data under a broad range of conditions, including stationary and modulated interferers, reverberation, and spectral subtraction. Despite the advances in modeling intelligibility in NH listeners, a broadly applicable model that can predict SI in hearing-impaired (HI) listeners is not yet available. As a first step towards such a model, this study investigates to what extent effects of hearing impairment on SI can be modeled in the sEPSM framework. Preliminary results show that, by only modeling the loss of audibility, the model cannot account for the higher speech reception …

  6. The speech signal segmentation algorithm using pitch synchronous analysis

    Directory of Open Access Journals (Sweden)

    Amirgaliyev Yedilkhan

    2017-03-01

    Parameterization of the speech signal using analysis algorithms synchronized with the pitch frequency is discussed. Speech parameterization is performed by the average-number-of-zero-transitions function and the signal energy function. The parameterization results are used to segment the speech signal and to isolate the segments with stable spectral characteristics. The segmentation results can be used to generate a digital voice pattern of a person or be applied in automatic speech recognition. The stages needed for continuous speech segmentation are described.
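
    The two measures the abstract names, short-time energy and the zero-transition (zero-crossing) count, can be sketched per frame as follows; the frame and hop sizes and the naive mean-based thresholds are placeholder choices, not the paper's.

```python
# Sketch: framewise energy and zero-crossing rate for coarse segmentation.
import numpy as np

def frame_features(x, frame=400, hop=160):
    feats = []
    for start in range(0, len(x) - frame, hop):
        w = x[start:start + frame]
        energy = float((w ** 2).mean())
        zcr = float((np.diff(np.signbit(w).astype(int)) != 0).mean())
        feats.append((energy, zcr))
    return np.array(feats)

x = np.random.default_rng(5).normal(size=16000)     # placeholder 1 s of audio
f = frame_features(x)
stable = (f[:, 0] > f[:, 0].mean()) & (f[:, 1] < f[:, 1].mean())
print(stable[:20].astype(int))                      # crude high-energy/low-ZCR mask
```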

  7. Automatic Speech Recognition from Neural Signals: A Focused Review

    Directory of Open Access Journals (Sweden)

    Christian Herff

    2016-09-01

    Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices; they have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible because of loud environments, the need not to disturb bystanders, or an inability to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but to simply envision oneself saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution, but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data, with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques used on neural signals, we discuss the Brain-to-text system.

  8. Advancements in Bio-radar Speech Signal Detection Technology

    OpenAIRE

    Chen Fuming; Li Sheng; An Qiang; Zhang Ziqi; Wang Jianqi

    2016-01-01

    Speech signal acquisition is of great significance for human communication. Bio-radar technology has many advantages: it is noncontact, noninvasive, safe, highly directional, highly sensitive, immune to strong acoustic disturbance, and penetrating. This technology has important applications in the field of speech detection. In this paper, we first review the developmental history of speech detection technology, and then summarize the status of bio-radar speech detection technology …

  9. Advancements in Bio-radar Speech Signal Detection Technology

    Directory of Open Access Journals (Sweden)

    Chen Fuming

    2016-10-01

    Speech signal acquisition is of great significance for human communication. Bio-radar technology has many advantages: it is noncontact, noninvasive, safe, highly directional, highly sensitive, immune to strong acoustic disturbance, and penetrating. This technology has important applications in the field of speech detection. In this paper, we first review the developmental history of speech detection technology and then summarize the status of bio-radar speech detection technology. The basic principles of a bio-radar in detecting speech signals are given, and the performance of three types of bio-radar speech detection systems is compared. Finally, the potential applications of bio-radar speech signal detection technology are discussed.

  10. Speech spectrum envelope modeling

    Czech Academy of Sciences Publication Activity Database

    Vích, Robert; Vondra, Martin

    Vol. 4775 (2007), pp. 129-137. ISSN 0302-9743. [COST Action 2102 International Workshop, Vietri sul Mare, 29.03.2007-31.03.2007.] R&D Projects: GA AV ČR (CZ) 1ET301710509. Institutional research plan: CEZ:AV0Z20670512. Keywords: speech; speech processing; cepstral analysis. Subject RIV: JA - Electronics; Optoelectronics, Electrical Engineering. Impact factor: 0.302, year: 2005.

  11. Epoch-based analysis of speech signals

    Indian Academy of Sciences (India)

    Epoch sequence is useful to manipulate prosody in speech synthesis applications. Accurate estimation of epochs helps in characterizing voice quality features. Epoch extraction also helps in speech enhancement and multispeaker separation. In this tutorial article, the importance of epochs for speech analysis is discussed, ...

  12. Does Signal Degradation Affect Top-Down Processing of Speech?

    Science.gov (United States)

    Wagner, Anita; Pals, Carina; de Blecourt, Charlotte M; Sarampalis, Anastasios; Başkent, Deniz

    2016-01-01

    Speech perception is formed based on both the acoustic signal and listeners' knowledge of the world and semantic context. Access to semantic information can facilitate interpretation of degraded speech, such as speech in background noise or the speech signal transmitted via cochlear implants (CIs). This paper focuses on the latter, and investigates the time course of understanding words, and how sentential context reduces listeners' dependency on the acoustic signal for natural and degraded speech via an acoustic CI simulation. In an eye-tracking experiment we combined recordings of listeners' gaze fixations with pupillometry to capture effects of semantic information on both the time course and the effort of speech processing. Normal-hearing listeners were presented with sentences with or without a semantically constraining verb (e.g., crawl) preceding the target (baby), and their ocular responses were recorded to four pictures, including the target, a phonological competitor (bay), a semantic competitor (worm), and an unrelated distractor. The results show that in natural speech, listeners' gazes reflect their uptake of acoustic information and integration of the preceding semantic context. Degradation of the signal leads to later disambiguation of phonologically similar words and to a delay in the integration of semantic information. Complementary to this, the pupil dilation data show that early semantic integration reduces the effort of disambiguating phonologically similar words. Processing degraded speech comes with increased effort due to the impoverished nature of the signal. Delayed integration of semantic information further constrains listeners' ability to compensate for inaudible signals.

  13. The effects of noise on speech and warning signals

    Science.gov (United States)

    Suter, Alice H.

    1989-06-01

    To assess the effects of noise on speech communication, it is necessary to examine certain characteristics of the speech signal. Speech level can be measured by a variety of methods, none of which has yet been standardized, and it should be kept in mind that vocal effort increases with background noise level and with different types of activity. Noise and filtering commonly degrade the speech signal, especially as it is transmitted through communications systems. Intelligibility is also adversely affected by distance, reverberation, and monaural listening. Communication systems currently in use may cause strain and delays on the part of the listener, but there are many possibilities for improvement. Individuals who need to communicate in noise may be subject to voice disorders. Shouted speech becomes progressively less intelligible at high voice levels, but improvements can be realized when talkers use clear speech. Tolerable listening levels are lower for negative than for positive S/Ns, and comfortable listening levels should be at an S/N of at least 5 dB, and preferably above 10 dB. Popular methods to predict speech intelligibility in noise include the Articulation Index, Speech Interference Level, Speech Transmission Index, and the sound level meter's A-weighting network. This report describes these methods, discussing certain advantages and disadvantages of each, and shows their interrelations.
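
    Of the predictors listed, the Articulation Index is the simplest to sketch: per-band SNRs are clipped to a 30 dB range, rescaled, and averaged with band-importance weights. The band SNRs and weights below are illustrative placeholders, not values from the standard.

```python
# Simplified Articulation-Index-style computation (illustrative values only).
import numpy as np

band_snr_db = np.array([12.0, 6.0, 0.0, -5.0, -18.0])  # placeholder band SNRs
weights = np.array([0.15, 0.25, 0.25, 0.20, 0.15])     # placeholder importance weights
clipped = np.clip(band_snr_db, -15.0, 15.0)            # 30 dB useful dynamic range
ai = float(weights @ ((clipped + 15.0) / 30.0))        # 0 (no info) .. 1 (full info)
print(f"AI ~ {ai:.2f}")
```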

  14. Random Deep Belief Networks for Recognizing Emotions from Speech Signals

    Directory of Open Access Journals (Sweden)

    Guihua Wen

    2017-01-01

    Human emotions can now be recognized from speech signals using machine learning methods; however, these methods are challenged by lower recognition accuracies in real applications due to the lack of rich representation ability. Deep belief networks (DBNs) can automatically discover multiple levels of representation in speech signals. To make full use of their advantages, this paper presents an ensemble of random deep belief networks (RDBN) for speech emotion recognition. It first extracts low-level features of the input speech signal and then uses them to construct many random subspaces. Each random subspace is then provided to a DBN to yield higher-level features, which are used as the input of the classifier to output an emotion label. All outputted emotion labels are then fused through majority voting to decide the final emotion label for the input speech signal. Experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition.
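
    The random-subspace ensemble with majority voting described above can be sketched as follows; a small MLP stands in for the deep belief network (which scikit-learn does not provide), and the data, subspace size and ensemble size are placeholders.

```python
# Hedged sketch: random feature subspaces + per-subspace classifier + majority vote.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(6)
X, y = rng.normal(size=(300, 40)), rng.integers(0, 4, 300)  # placeholder features/labels

members = []
for seed in range(7):                     # 7 random subspaces of 16 features each
    idx = rng.choice(40, size=16, replace=False)
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                        random_state=seed).fit(X[:, idx], y)
    members.append((idx, clf))

votes = np.stack([clf.predict(X[:, idx]) for idx, clf in members])
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print(majority[:10])                      # fused emotion labels
```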

  15. Histogram Equalization to Model Adaptation for Robust Speech Recognition

    Directory of Open Access Journals (Sweden)

    Suh Youngjoo

    2010-01-01

    We propose a new model adaptation method based on the histogram equalization technique for providing robustness in noisy environments. The trained acoustic mean models of a speech recognizer are adapted into environmentally matched conditions by using the histogram equalization algorithm on a single utterance basis. For more robust speech recognition in the heavily noisy conditions, trained acoustic covariance models are efficiently adapted by the signal-to-noise ratio-dependent linear interpolation between trained covariance models and utterance-level sample covariance models. Speech recognition experiments on both the digit-based Aurora2 task and the large vocabulary-based task showed that the proposed model adaptation approach provides significant performance improvements compared to the baseline speech recognizer trained on the clean speech data.

  16. Histogram Equalization to Model Adaptation for Robust Speech Recognition

    Science.gov (United States)

    Suh, Youngjoo; Kim, Hoirin

    2010-12-01

    We propose a new model adaptation method based on the histogram equalization technique for providing robustness in noisy environments. The trained acoustic mean models of a speech recognizer are adapted into environmentally matched conditions by using the histogram equalization algorithm on a single utterance basis. For more robust speech recognition in the heavily noisy conditions, trained acoustic covariance models are efficiently adapted by the signal-to-noise ratio-dependent linear interpolation between trained covariance models and utterance-level sample covariance models. Speech recognition experiments on both the digit-based Aurora2 task and the large vocabulary-based task showed that the proposed model adaptation approach provides significant performance improvements compared to the baseline speech recognizer trained on the clean speech data.
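
    The core histogram-equalization step can be sketched as quantile matching: each test feature is mapped through its empirical CDF onto the quantiles of the training distribution. This per-dimension, per-utterance version is an illustration of the idea, not the paper's full mean/covariance adaptation.

```python
# Sketch: histogram equalization of one feature dimension via quantile matching.
import numpy as np

def histogram_equalize(test_feat, train_feat):
    ranks = np.argsort(np.argsort(test_feat))      # rank of each test sample
    quantiles = (ranks + 0.5) / len(test_feat)     # empirical CDF values
    return np.quantile(train_feat, quantiles)      # map onto training quantiles

rng = np.random.default_rng(7)
train = rng.normal(0.0, 1.0, 1000)    # clean-condition feature samples
test = rng.normal(2.0, 3.0, 200)      # mismatched noisy-utterance samples
eq = histogram_equalize(test, train)
print(round(float(eq.mean()), 2), round(float(eq.std()), 2))  # ~0.0, ~1.0 after mapping
```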

  17. Pitch Synchronous Segmentation of Speech Signals

    Data.gov (United States)

    National Aeronautics and Space Administration — The Pitch Synchronous Segmentation (PSS) method, which accelerates speech without changing its fundamental frequency, could be applied and evaluated for use at NASA …

  18. Automatic speech signal segmentation based on the innovation adaptive filter

    Directory of Open Access Journals (Sweden)

    Makowski Ryszard

    2014-06-01

    Speech segmentation is an essential stage in designing automatic speech recognition systems, and one can find several algorithms proposed in the literature. It is a difficult problem, as speech is immensely variable. The aim of the authors' studies was to design an algorithm that could be employed at the stage of automatic speech recognition. This would make it possible to avoid some problems related to speech signal parametrization. Posing the problem in such a way requires the algorithm to be capable of working in real time. The only such algorithm was proposed by Tyagi et al. (2006), and it is a modified version of Brandt's algorithm. The article presents a new algorithm for unsupervised automatic speech signal segmentation. It performs segmentation without access to information about the phonetic content of the utterances, relying exclusively on second-order statistics of the speech signal. The starting point for the proposed method is the time-varying Schur coefficients of an innovation adaptive filter. The Schur algorithm is known to be fast, precise, stable and capable of rapidly tracking changes in second-order signal statistics. A transition from one phoneme to another in the speech signal always indicates a change in signal statistics caused by vocal tract changes. In order to allow for the properties of human hearing, detection of inter-phoneme boundaries is performed based on statistics defined on the mel spectrum determined from the reflection coefficients. The paper presents the structure of the algorithm, defines its properties, lists parameter values, describes detection efficiency results, and compares them with those of another algorithm. The obtained segmentation results are satisfactory.

  19. Signal-to-Signal Ratio Independent Speaker Identification for Co-channel Speech Signals

    DEFF Research Database (Denmark)

    Saeidi, Rahim; Mowlaee, Pejman; Kinnunen, Tomi

    2010-01-01

    In this paper, we consider speaker identification for the co-channel scenario, in which a speech mixture from two speakers is recorded by one microphone only. The goal is to identify both of the speakers from their mixed signal. High recognition accuracies have already been reported when an accurately …

  20. Epoch-based analysis of speech signals

    Indian Academy of Sciences (India)

    It is shown that knowledge of epochs can be used for analysis of speech for glottal activity detection, extraction … The performance of the proposed GAD method was evaluated using the detection error trade-off (DET) curves … the oral closure is released more or less impulsively as the vocal tract moves towards a configuration …

  1. Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders

    CERN Document Server

    Baghai-Ravary, Ladan

    2013-01-01

    Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders provides a survey of methods designed to aid clinicians in the diagnosis and monitoring of speech disorders such as dysarthria and dyspraxia, with an emphasis on the signal processing techniques, statistical validity of the results presented in the literature, and the appropriateness of methods that do not require specialized equipment, rigorously controlled recording procedures or highly skilled personnel to interpret results. Such techniques offer the promise of a simple and cost-effective, yet objective, assessment of a range of medical conditions, which would be of great value to clinicians. The ideal scenario would begin with the collection of examples of the clients’ speech, either over the phone or using portable recording devices operated by non-specialist nursing staff. The recordings could then be analyzed initially to aid diagnosis of conditions, and subsequently to monitor the clients’ progress and res...

  2. Mathematical model for classification of EEG signals

    Science.gov (United States)

    Ortiz, Victor H.; Tapia, Juan J.

    2015-09-01

    A mathematical model to filter and classify brain signals from a brain-machine interface is developed. The mathematical model classifies the signals from the different lobes of the brain to differentiate the alpha, beta, gamma and theta signals, as well as the signals associated with vision, speech, and orientation. The model further eliminates noise signals that occur in the process of signal acquisition. This mathematical model can be used on different platform interfaces for the rehabilitation of physically handicapped persons.

  3. Applications of Hilbert Spectral Analysis for Speech and Sound Signals

    Science.gov (United States)

    Huang, Norden E.

    2003-01-01

    A new method for analyzing nonlinear and nonstationary data has been developed, and the natural applications are to speech and sound signals. The key part of the method is the Empirical Mode Decomposition method, with which any complicated data set can be decomposed into a finite and often small number of Intrinsic Mode Functions (IMFs). An IMF is defined as any function having the same numbers of zero-crossings and extrema, and also having symmetric envelopes defined by the local maxima and minima, respectively. The IMF also admits a well-behaved Hilbert transform. This decomposition method is adaptive and, therefore, highly efficient. Since the decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and nonstationary processes. With the Hilbert transform, the Intrinsic Mode Functions yield instantaneous frequencies as functions of time, which give sharp identifications of imbedded structures. This invention can be used to process all acoustic signals. Specifically, it can process speech signals for speech synthesis, speaker identification and verification, speech recognition, and sound signal enhancement and filtering. Additionally, the acoustic signals from machinery are essentially the way the machines talk to us; these signals, whether transmitted as sound through air or as vibration on the machines, can tell us the operating conditions of the machines. Thus, we can use the acoustic signal to diagnose machine problems.
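
    The Hilbert step of the method is easy to demonstrate: the analytic signal of a mode yields an instantaneous frequency as a function of time. In the sketch below, a toy chirp stands in for an IMF that Empirical Mode Decomposition would produce.

```python
# Sketch: instantaneous frequency of one (stand-in) IMF via the Hilbert transform.
import numpy as np
from scipy.signal import hilbert

fs = 8000
t = np.arange(fs) / fs
imf = np.cos(2 * np.pi * (100 * t + 50 * t ** 2))   # toy chirp, 100 -> 200 Hz
analytic = hilbert(imf)
phase = np.unwrap(np.angle(analytic))
inst_freq = np.diff(phase) * fs / (2 * np.pi)       # instantaneous frequency in Hz
print(round(inst_freq[200], 1), round(inst_freq[-200], 1))  # rises over time
```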

  4. Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language: Computational techniques are presented to analyze and model expressed and perceived human behavior (variedly characterized as typical, atypical, distressed, and disordered) from speech and language cues, and their applications in health, commerce, education, and beyond.

    Science.gov (United States)

    Narayanan, Shrikanth; Georgiou, Panayiotis G

    2013-02-07

    The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion.

  5. What information is necessary for speech categorization? Harnessing variability in the speech signal by integrating cues computed relative to expectations

    Science.gov (United States)

    McMurray, Bob; Jongman, Allard

    2012-01-01

    Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model: the type of information subserving this mapping. This is crucial in speech perception, where the signal is variable and context-dependent. This study assessed the informational assumptions of several models of speech categorization, in particular the number of cues that are the basis of categorization and whether these cues represent the input veridically or have undergone compensation. We collected a corpus of 2880 fricative productions (Jongman, Wayland & Wong, 2000) spanning many talker and vowel contexts and measured 24 cues for each. A subset was also presented to listeners in an 8AFC phoneme categorization task. We then trained a common classification model based on logistic regression to categorize the fricative from the cue values, and manipulated the information in the training set to contrast (1) models based on a small number of invariant cues; (2) models using all cues without compensation; and (3) models in which cues underwent compensation for contextual factors. Compensation was modeled by Computing Cues Relative to Expectations (C-CuRE), a new approach to compensation that preserves fine-grained detail in the signal. Only the compensation model achieved an accuracy similar to listeners' and showed the same effects of context. Thus, even simple categorization metrics can overcome the variability in speech when sufficient information is available and compensation schemes like C-CuRE are employed. PMID:21417542
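
    The compensation scheme named above (C-CuRE) amounts to expressing each cue relative to what is expected from context; a hedged sketch is to regress the cue on coded context factors and keep the residual. The factor coding and cue values below are synthetic placeholders, not the paper's corpus.

```python
# Sketch of computing cues relative to expectations: regress out context,
# keep the residual as the "relative" cue.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n = 500
context = rng.integers(0, 2, size=(n, 2)).astype(float)   # e.g., talker, vowel codes
cue = 3.0 * context[:, 0] - 1.5 * context[:, 1] + rng.normal(0.5, 1.0, n)

expected = LinearRegression().fit(context, cue).predict(context)
relative_cue = cue - expected         # cue expressed relative to expectations
print(round(float(relative_cue.mean()), 2), round(float(relative_cue.std()), 2))
```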

  6. Tools for signal compression applications to speech and audio coding

    CERN Document Server

    Moreau, Nicolas

    2013-01-01

    This book presents tools and algorithms required to compress/uncompress signals such as speech and music. These algorithms are largely used in mobile phones, DVD players, HDTV sets, etc. In a first, rather theoretical part, this book presents the standard tools used in compression systems: scalar and vector quantization, predictive quantization, transform quantization, and entropy coding. In particular, we show the consistency between these different tools. The second part explains how these tools are used in the latest speech and audio coders. The third part gives Matlab programs simulating t…

  7. On the Influence of Inharmonicities in Model-Based Speech Enhancement

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2013-01-01

    In relation to speech enhancement, we study the influence of modifying the harmonic signal model for voiced speech to include small perturbations in the frequencies of the harmonics. A perturbed signal model is incorporated in the nonlinear least squares method, the Capon filter and the amplitude … than the harmonic signal model at input signal-to-noise ratios above approximately -10 dB, and that they are equally good below …

  8. Subauditory Speech Recognition based on EMG/EPG Signals

    Science.gov (United States)

    Jorgensen, Charles; Lee, Diana Dee; Agabon, Shane; Lau, Sonie (Technical Monitor)

    2003-01-01

    Sub-vocal electromyogram/electropalatogram (EMG/EPG) signal classification is demonstrated as a method for silent speech recognition. Recorded electrode signals from the larynx and sublingual areas below the jaw are noise-filtered and transformed into features using complex dual quad-tree wavelet transforms. Feature sets for six sub-vocally pronounced words are trained using a trust-region scaled conjugate gradient neural network. Real-time signals for previously unseen patterns are classified into categories suitable for primitive control of graphic objects. Feature construction, recognition accuracy, and an approach for extending the technique to a variety of real-world application areas are presented.

  9. Application of automatic speech recognition to quantitative assessment of tracheoesophageal speech with different signal quality.

    Science.gov (United States)

    Haderlein, Tino; Riedhammer, Korbinian; Nöth, Elmar; Toy, Hikmet; Schuster, Maria; Eysholdt, Ulrich; Hornegger, Joachim; Rosanowski, Frank

    2009-01-01

    Tracheoesophageal voice is state-of-the-art in voice rehabilitation after laryngectomy. Intelligibility on a telephone is an important evaluation criterion as it is a crucial part of social life. An objective measure of intelligibility when talking on a telephone is desirable in the field of postlaryngectomy speech therapy and its evaluation. Based upon successful earlier studies with broadband speech, an automatic speech recognition (ASR) system was applied to 41 recordings of postlaryngectomy patients. Recordings were available in different signal qualities; quality was the crucial criterion for this study. Compared to the intelligibility rating of 5 human experts, the ASR system had a correlation coefficient of r = -0.87 and Krippendorff's alpha of 0.65 when broadband speech was processed. The rater group alone achieved alpha = 0.66. With the test recordings in telephone quality, the system reached r = -0.79 and alpha = 0.67. For medical purposes, a comprehensive diagnostic approach to (substitute) voice has to cover both subjective and objective tests. An automatic recognition system such as the one proposed in this study can be used for objective intelligibility rating with results comparable to those of human experts. This holds for broadband speech as well as for automatic evaluation via telephone. Copyright 2008 S. Karger AG, Basel.

  10. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizing …

    Science.gov (United States)

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-04-25

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  11. Detecting Parkinson's disease from sustained phonation and speech signals.

    Directory of Open Access Journals (Sweden)

    Evaldas Vaiciukynas

    This study investigates signals from sustained phonation and text-dependent speech modalities for Parkinson's disease screening. Phonation corresponds to the vowel /a/ voicing task and speech to the pronunciation of a short sentence in the Lithuanian language. Signals were recorded through two channels simultaneously, namely, acoustic cardioid (AC) and smartphone (SP) microphones. Additional modalities were obtained by splitting the speech recordings into voiced and unvoiced parts. Information in each modality is summarized by 18 well-known audio feature sets. Random forest (RF) is used as a machine learning algorithm, both for individual feature sets and for decision-level fusion. Detection performance is measured by the out-of-bag equal error rate (EER) and the cost of the log-likelihood ratio. The Essentia audio feature set was the best using the AC speech modality and the YAAFE audio feature set was the best using the SP unvoiced modality, achieving EERs of 20.30% and 25.57%, respectively. Fusion of all feature sets and modalities resulted in an EER of 19.27% for the AC and 23.00% for the SP channel. Non-linear projection of an RF-based proximity matrix into 2D space enriched medical decision support by visualization.

  12. Pronunciation Modeling for Large Vocabulary Speech Recognition

    Science.gov (United States)

    Kantor, Arthur

    2010-01-01

    The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy in automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of the units used to define the…

  13. Voice Quality Modelling for Expressive Speech Synthesis

    Directory of Open Access Journals (Sweden)

    Carlos Monzo

    2014-01-01

    This paper presents the perceptual experiments that were carried out in order to validate the methodology of transforming expressive speech styles using voice quality (VoQ) parameter modelling, along with the well-known prosody parameters (F0, duration, and energy), from a neutral style into a number of expressive ones. The main goal was to validate the usefulness of VoQ in the enhancement of expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify VoQ and prosodic parameters that were extracted from an expressive speech corpus. Perception test results indicated the improvement of the obtained expressive speech styles using VoQ modelling along with prosodic characteristics.

  14. Effects of lengthening the speech signal on auditory word discrimination in kindergartners with SLI

    NARCIS (Netherlands)

    Segers, P.C.J.; Verhoeven, L.T.W.

    2005-01-01

    In the present study, it was investigated whether kindergartners with specific language impairment (SLI) and normal-language achieving (NLA) kindergartners can benefit from slowing down the entire speech signal or part of the speech signal in a synthetic speech discrimination task. Subjects were 19 …

  15. A Method of Speech Periodicity Enhancement Using Transform-domain Signal Decomposition.

    Science.gov (United States)

    Huang, Huang; Lee, Tan; Kleijn, W Bastiaan; Kong, Ying-Yee

    2015-03-01

    Periodicity is an important property of speech signals. It is the basis of the signal's fundamental frequency and the pitch of voice, which is crucial to speech communication. This paper presents a novel framework of periodicity enhancement for noisy speech. The enhancement is applied to the linear prediction residual of speech. The residual signal goes through a constant-pitch time warping process and two sequential lapped-frequency transforms, by which the periodic component is concentrated in certain transform coefficients. By emphasizing the respective transform coefficients, periodicity enhancement of noisy residual signal is achieved. The enhanced residual signal and estimated linear prediction filter parameters are used to synthesize the output speech. An adaptive algorithm is proposed for adjusting the weights for the periodic and aperiodic components. Effectiveness of the proposed approach is demonstrated via experimental evaluation. It is observed that harmonic structure of the original speech could be properly restored to improve the perceptual quality of enhanced speech.
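
    The first stage described, obtaining the linear-prediction residual by inverse filtering, can be sketched as follows; the LPC order, toy frame, and use of librosa are illustrative assumptions, and the periodicity enhancement itself is omitted.

```python
# Sketch: LP residual by inverse filtering, and resynthesis through 1/A(z).
import numpy as np
import librosa
from scipy.signal import lfilter

fs = 16000
rng = np.random.default_rng(9)
frame = np.sin(2 * np.pi * 120 * np.arange(480) / fs) + 0.1 * rng.normal(size=480)

a = librosa.lpc(frame, order=12)        # [1, a1, ..., a12] prediction coefficients
residual = lfilter(a, [1.0], frame)     # inverse (whitening) filter -> LP residual
# After enhancing the residual, synthesis would run it back through 1/A(z):
resynth = lfilter([1.0], a, residual)
print(round(float(np.abs(frame - resynth).max()), 6))  # ~0: filters are exact inverses
```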

  16. Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language

    Science.gov (United States)

    Narayanan, Shrikanth; Georgiou, Panayiotis G.

    2013-01-01

    The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion. PMID:24039277

  17. Robust Models and Features for Speech Recognition.

    Science.gov (United States)

    1998-03-13

    and relevant Spokes of the Speaker Independent Wall Street Journal database in 1994, the Marketplace database in 1995, and the Broadcast News... also built a 64000-word vocabulary. Language models for this vocabulary were built from a combination of Wall Street Journal data available from... was made from transcribing clean read speech (Wall Street Journal task in 1994) to real-world speech (transcription of radio and TV broadcast news

  18. Speech Subvocal Signal Processing using Packet Wavelet and Neuronal Network

    Directory of Open Access Journals (Sweden)

    Luis E. Mendoza

    2013-11-01

    Full Text Available This paper presents the results obtained from the recording, processing and classification of words in the Spanish language by means of the analysis of subvocal speech signals. The processed database contains six words (forward, backward, right, left, start and stop). In this work, the signals were sensed with surface electrodes placed on the throat and acquired with a sampling frequency of 50 kHz. The signal conditioning consisted of locating the area of interest using energy analysis, and filtering using the Discrete Wavelet Transform. Finally, feature extraction was performed in the time-frequency domain using Wavelet Packets and statistical windowing techniques. The classification was carried out with a backpropagation neural network trained with 70% of the database. The correct classification rate was 75% ± 2%.
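
    A sketch of that pipeline in Python, assuming pywt and scikit-learn are available; the wavelet choice, decomposition level, network size, and the data-loading step (X_raw, y) are illustrative assumptions, not the authors' exact settings.

```python
# Sketch: wavelet-packet energy features plus a small backpropagation network.
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def wp_energy_features(signal, wavelet="db4", level=3):
    """Normalized sub-band energies from a wavelet-packet decomposition."""
    wp = pywt.WaveletPacket(signal, wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")
    energies = np.array([np.sum(n.data ** 2) for n in nodes])
    return energies / (energies.sum() + 1e-12)

# X_raw: list of 1-D EMG frames, y: word labels (hypothetical, loaded elsewhere)
# X = np.vstack([wp_energy_features(s) for s in X_raw])
# Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=0.7)
# clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(Xtr, ytr)
# print("accuracy:", clf.score(Xte, yte))
```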

  19. Mathematical pattern, smoothing and digital filtering of a speech signal

    International Nuclear Information System (INIS)

    Razzam, Mohamed Habib

    1979-01-01

    After a presentation of speech synthesis methods, characterized either by the treatment of pre-recorded natural signals or by an analog simulation of the vocal tract, we present a new synthesis method based on a mathematical model of the signal, developed from M. Rodet's method. Owing to their physiological origin, these signals are partially or totally voiced, or aleatory. For the voiced parts of a phoneme, we compute the formant curves, whose sum constitutes the wave, directly in the time domain by applying a specific envelope (operating as a time-window analysis) to a sinusoidal wave. The sinusoidal wave computation is performed at the beginning of each pseudo-period of the signal. The transition between successive periods is ensured by polynomial smoothing followed by digital filtering. For the aleatory parts, we present an aleatory computation method for the formant curves. Each signal is subjected to a melodic diagram computed in accordance with the nature of the phoneme (vowel or consonant) and its context (isolated or not). (author) [fr

  20. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  1. The application of sparse linear prediction dictionary to compressive sensing in speech signals

    Directory of Open Access Journals (Sweden)

    YOU Hanxu

    2016-04-01

    Full Text Available Applying compressive sensing (CS), which theoretically guarantees that signal sampling and signal compression can be achieved simultaneously, to audio and speech signal processing has been one of the most popular research topics in recent years. In this paper, the K-SVD algorithm was employed to learn a sparse linear prediction dictionary serving as the sparse basis of the underlying speech signals. Compressed signals were obtained by applying a random Gaussian matrix to sample the original speech frames. Orthogonal matching pursuit (OMP) and compressive sampling matching pursuit (CoSaMP) were adopted to recover the original signals from the compressed ones. A number of experiments were carried out to investigate the impact of speech frame length, compression ratio, sparse basis and reconstruction algorithm on CS performance. Results show that the sparse linear prediction dictionary improves the performance of speech signal reconstruction compared with the discrete cosine transform (DCT) matrix.
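
    For illustration, a minimal sketch of the sampling and recovery stages with a plain OMP implementation; a generic DCT dictionary stands in for the learned K-SVD dictionary, and the frame length, measurement count, and sparsity level are illustrative.

```python
# Sketch: Gaussian random sampling of a speech frame and OMP recovery over a
# DCT dictionary (the learned K-SVD dictionary would simply replace D).
import numpy as np
from scipy.fft import idct

N, M, K = 256, 96, 20                       # frame length, measurements, sparsity
D = idct(np.eye(N), axis=0, norm="ortho")   # DCT synthesis dictionary
Phi = np.random.randn(M, N) / np.sqrt(M)    # Gaussian measurement matrix
A = Phi @ D                                 # effective sensing matrix

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily build a k-term support."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1])
    x[support] = x_s
    return x

# frame = ...  (one length-N speech frame)
# y = Phi @ frame; frame_hat = D @ omp(A, y, K)
```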

  2. Algorithms and Software for Predictive and Perceptual Modeling of Speech

    CERN Document Server

    Atti, Venkatraman

    2010-01-01

    From the early pulse code modulation-based coders to some of the recent multi-rate wideband speech coding standards, the area of speech coding made several significant strides with an objective to attain high quality of speech at the lowest possible bit rate. This book presents some of the recent advances in linear prediction (LP)-based speech analysis that employ perceptual models for narrow- and wide-band speech coding. The LP analysis-synthesis framework has been successful for speech coding because it fits well the source-system paradigm for speech synthesis. Limitations associated with th

  3. Emotion Recognition of Speech Signals Based on Filter Methods

    Directory of Open Access Journals (Sweden)

    Narjes Yazdanian

    2016-10-01

    Full Text Available Speech is the basic means of communication among human beings. With the increase of interaction between humans and machines, the need for automatic dialogue without a human factor has grown. The aim of this study was to determine a set of affective features of the speech signal that characterize emotions. A system was designed that includes three main sections: feature extraction, feature selection and classification. After extraction of useful features such as mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPC), perceptual linear prediction coefficients (PLP), formant frequency, zero crossing rate, cepstral coefficients, pitch frequency, mean, jitter, shimmer, energy, minimum, maximum, amplitude and standard deviation, filter methods such as the Pearson correlation coefficient, t-test, relief and information gain were used to rank and select the features most effective for emotion recognition. The selected features were then given to the classification system as a subset of the input. In the classification stage, a multiclass support vector machine was used to classify seven types of emotion. According to the results, the relief method together with the multiclass support vector machine achieves the highest classification accuracy, with an emotion recognition rate of 93.94%.

  4. A Novel Voice Sensor for the Detection of Speech Signals

    Directory of Open Access Journals (Sweden)

    Kun-Ching Wang

    2013-12-01

    Full Text Available In order to develop a novel voice sensor to detect human voices, the use of features that are more robust to noise is an important issue. A voice sensor is also called voice activity detection (VAD). Because the formant structure inherently occurs only in the speech spectrogram (well known as a voiceprint), Wu et al. were the first to use band-spectral entropy (BSE) to describe the characteristics of voiceprints. However, the performance of VAD based on the BSE feature degrades in colored-noise (or voiceprint-like noise) environments. In order to solve this problem, we propose the two-dimensional part-band energy entropy (TD-PBEE) parameter based on two variables: the part-band partition number along the frequency index and the long-term window size along the time index, to further improve the BSE-based VAD algorithm. The two variables can efficiently represent the characteristics of voiceprints in each critical frequency band and exploit long-term information in noisy speech spectrograms, respectively. The TD-PBEE parameter can be regarded as a PBEE parameter over time. First, the strength of voiceprints can be partly enhanced by using four entropies applied to four part-bands. The four part-band energy entropies describe the voiceprints in detail. Because speech and various noises are non-stationary, long-term information processing is then used to refine the PBEE, so that voice-like noise can be distinguished from noisy speech through the concept of PBEE with long-term information. Our experiments show that the proposed feature extraction with the TD-PBEE parameter is quite insensitive to background noise. The proposed TD-PBEE-based VAD algorithm is evaluated for four types of noise and five signal-to-noise ratio (SNR) levels. We find that the accuracy of the proposed TD-PBEE-based VAD algorithm, averaged over all noises and all SNR levels, is better than that of the other VAD algorithms considered.

  5. A novel voice sensor for the detection of speech signals.

    Science.gov (United States)

    Wang, Kun-Ching

    2013-12-02

    In order to develop a novel voice sensor to detect human voices, the use of features that are more robust to noise is an important issue. A voice sensor is also called voice activity detection (VAD). Because the formant structure inherently occurs only in the speech spectrogram (well known as a voiceprint), Wu et al. were the first to use band-spectral entropy (BSE) to describe the characteristics of voiceprints. However, the performance of VAD based on the BSE feature degrades in colored-noise (or voiceprint-like noise) environments. In order to solve this problem, we propose the two-dimensional part-band energy entropy (TD-PBEE) parameter based on two variables: the part-band partition number along the frequency index and the long-term window size along the time index, to further improve the BSE-based VAD algorithm. The two variables can efficiently represent the characteristics of voiceprints in each critical frequency band and exploit long-term information in noisy speech spectrograms, respectively. The TD-PBEE parameter can be regarded as a PBEE parameter over time. First, the strength of voiceprints can be partly enhanced by using four entropies applied to four part-bands. The four part-band energy entropies describe the voiceprints in detail. Because speech and various noises are non-stationary, long-term information processing is then used to refine the PBEE, so that voice-like noise can be distinguished from noisy speech through the concept of PBEE with long-term information. Our experiments show that the proposed feature extraction with the TD-PBEE parameter is quite insensitive to background noise. The proposed TD-PBEE-based VAD algorithm is evaluated for four types of noise and five signal-to-noise ratio (SNR) levels. We find that the accuracy of the proposed TD-PBEE-based VAD algorithm, averaged over all noises and all SNR levels, is better than that of the other VAD algorithms considered.
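
    To make the baseline idea concrete, here is a minimal sketch of a band-partitioned spectral-entropy detector (the BSE/PBEE ingredient only, not the full TD-PBEE refinement); the frame size, band count, and threshold are illustrative assumptions.

```python
# Sketch: low spectral entropy within speech part-bands suggests voiced frames.
import numpy as np
from scipy.signal import stft

def bse_vad(x, fs, n_bands=4, threshold=0.85):
    f, t, Z = stft(x, fs, nperseg=512)
    P = np.abs(Z) ** 2
    bands = np.array_split(P, n_bands, axis=0)   # part-bands along frequency
    entropies = []
    for B in bands:
        p = B / (B.sum(axis=0, keepdims=True) + 1e-12)
        # Normalized entropy per frame within this part-band (0..1)
        H = -(p * np.log(p + 1e-12)).sum(axis=0) / np.log(B.shape[0])
        entropies.append(H)
    H_mean = np.mean(entropies, axis=0)          # one value per frame
    return H_mean < threshold                    # True = speech-like frame
```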

  6. Speech Denoising in White Noise Based on Signal Subspace Low-rank Plus Sparse Decomposition

    Directory of Open Access Journals (Sweden)

    yuan Shuai

    2017-01-01

    Full Text Available In this paper, a new subspace speech enhancement method using low-rank and sparse decomposition is presented. In the proposed method, we first structure the corrupted data as a Toeplitz matrix and estimate its effective rank for the underlying clean speech signal. The low-rank and sparse decomposition is then performed with the guidance of the estimated speech rank to remove the noise. Extensive experiments have been carried out under white Gaussian noise conditions, and the results show that the proposed method performs better than conventional speech enhancement methods in terms of yielding less residual noise and lower speech distortion.

  7. Ordinal models of audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2011-01-01

    ordinal models that can account for the McGurk illusion. We compare this type of model to the Fuzzy Logical Model of Perception (FLMP), in which the response categories are not ordered. While the FLMP generally fits the data better than the ordinal model, it also employs more free parameters in complex...... experiments when the number of response categories is high, as it is for speech perception in general. Testing the predictive power of the models using a form of cross-validation, we found that ordinal models perform better than the FLMP. Based on these findings we suggest that ordinal models generally have......

  8. Analysis of vocal signal in its amplitude - time representation. speech synthesis-by-rules

    International Nuclear Information System (INIS)

    Rodet, Xavier

    1977-01-01

    In the first part of this dissertation, natural speech production and the resulting acoustic waveform are examined under various aspects: communication, phonetics, frequency and temporal analysis. Our own study of the direct signal is compared to other research in these different fields, and fundamental features of vocal signals are described. The second part deals with the numerous methods already used for automatic text-to-speech synthesis. In the last part, we describe the new speech synthesis-by-rule methods that we have worked out, and we present in detail the structure of the real-time speech synthesizer that we have implemented on a minicomputer. (author) [fr

  9. Speech-To-Text Conversion STT System Using Hidden Markov Model HMM

    Directory of Open Access Journals (Sweden)

    Su Myat Mon

    2015-06-01

    Full Text Available Speech is the easiest way to communicate with each other. Speech processing is widely used in many applications such as security devices, household appliances, cellular phones, ATM machines and computers. Human-computer interfaces have been developed so that people with disabilities can communicate or interact conveniently. Speech-to-Text conversion (STT) systems have many benefits for deaf or mute people and find applications in our daily lives. Accordingly, the aim of this system is to convert input speech signals into text output for deaf or mute students in educational settings. This paper presents an approach to extract features using Mel Frequency Cepstral Coefficients (MFCC) from the speech signals of isolated spoken words, and the Hidden Markov Model (HMM) method is applied to train and test the audio files to obtain the recognized spoken word. The speech database was created using MATLAB. The original speech signals are then preprocessed, and the speech samples are converted to feature vectors, which are used as the observation sequences of the HMM recognizer. The feature vectors are analyzed in the HMM depending on the number of states.
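
    A Python sketch of the same MFCC-plus-HMM pipeline (the paper works in MATLAB), assuming librosa and hmmlearn are available; the word list, file paths, and HMM sizes are illustrative placeholders.

```python
# Sketch: isolated-word recognition with MFCC observations and one Gaussian
# HMM per word; recognition picks the word model with the highest likelihood.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def mfcc_features(path):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # frames x coeffs

# One HMM per word, trained on that word's utterances (paths are hypothetical).
models = {}
for word, paths in {"yes": ["yes1.wav"], "no": ["no1.wav"]}.items():
    feats = [mfcc_features(p) for p in paths]
    hmm = GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
    hmm.fit(np.vstack(feats), lengths=[len(f) for f in feats])
    models[word] = hmm

def recognize(path):
    obs = mfcc_features(path)
    return max(models, key=lambda w: models[w].score(obs))
```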

  10. Non-intrusive speech quality assessment in simplified e-model

    OpenAIRE

    Vozňák, Miroslav

    2012-01-01

    The E-model brings a modern approach to the computation of estimated quality, allowing for easy implementation. One of its advantages is that it can be applied in real time. The method is based on a mathematical computation model that evaluates the transmission path impairments influencing the speech signal, especially delays and packet losses. These parameters, common in an IP network, can affect speech quality dramatically. The paper deals with a proposal for a simplified E-model and its pr...
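
    For orientation, a minimal sketch of a simplified E-model computation under common default assumptions: the R-factor starts from the default base value of about 93.2, subtracts delay and packet-loss impairments, and maps to a MOS estimate (the delay term follows the Cole-Rosenbluth simplification; the G.107-style MOS mapping and the codec parameters ie and bpl are illustrative, not this paper's exact proposal).

```python
# Sketch: simplified E-model, R-factor from delay and packet loss, then MOS.
def r_factor(delay_ms, loss_pct, ie=0.0, bpl=34.0, burst_r=1.0):
    # Delay impairment (Cole-Rosenbluth approximation of Id)
    id_ = 0.024 * delay_ms + 0.11 * max(delay_ms - 177.3, 0.0)
    # Effective equipment impairment including random packet loss
    ie_eff = ie + (95.0 - ie) * loss_pct / (loss_pct / burst_r + bpl)
    return 93.2 - id_ - ie_eff

def mos(r):
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

print(mos(r_factor(delay_ms=150, loss_pct=1.0)))   # e.g. a typical VoIP path
```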

  11. A New Method to Represent Speech Signals Via Predefined Signature and Envelope Sequences

    Directory of Open Access Journals (Sweden)

    Binboga Sıddık Yarman

    2007-01-01

    Full Text Available A novel systematic procedure referred to as "SYMPES" for modeling speech signals is introduced. The structure of SYMPES is based on the creation of so-called predefined "signature S = {SR(n)}" and "envelope E = {EK(n)}" sets. These sets are speaker- and language-independent. Once the speech signals are divided into frames of selected lengths, each frame sequence Xi(n) is reconstructed by means of the mathematical form Xi(n) = Ci EK(n) SR(n). In this representation, Ci is called the gain factor, and SR(n) and EK(n) are properly assigned from the predefined signature and envelope sets, respectively. Examples are given to exhibit the implementation of SYMPES. It is shown that, for the same compression ratio or better, SYMPES yields considerably better speech quality than commercially available coders such as G.726 (ADPCM) at 16 kbps and voice-excited LPC-10E (FS1015) at 2.4 kbps.

  12. Hearing aid processing of loud speech and noise signals: Consequences for loudness perception and listening comfort

    DEFF Research Database (Denmark)

    Schmidt, Erik

    2007-01-01

    research - for example, investigations of loudness perception in hearing-impaired listeners. Most research has focused on speech and sounds at medium input levels (e.g., 60-65 dB SPL). It is well documented that for speech at conversational levels, hearing-aid users prefer the signal to be amplified......, such prescriptions are based mainly on logic, as there is limited evidence on what type of amplification is best for these input levels. The focus of the PhD project has been on hearing aid processing of loud speech and noise signals. Previous research, investigating the preferred listening levels for soft and loud......Hearing aid processing of loud speech and noise signals: Consequences for loudness perception and listening comfort. Sound processing in hearing aids is determined by the fitting rule. The fitting rule describes how the hearing aid should amplify speech and sounds in the surroundings

  13. Phoneme Compression: processing of the speech signal and effects on speech intelligibility in hearing-Impaired listeners

    NARCIS (Netherlands)

    A. Goedegebure (Andre)

    2005-01-01

    Hearing-aid users often continue to have problems with poor speech understanding in difficult acoustic conditions. Another commonly reported problem is that certain sounds become too loud, whereas other sounds are still not audible. Dynamic range compression is a signal processing

  14. Speech Motor Development in Childhood Apraxia of Speech : Generating Testable Hypotheses by Neurocomputational Modeling

    NARCIS (Netherlands)

    Terband, H.; Maassen, B.

    2010-01-01

    Childhood apraxia of speech (CAS) is a highly controversial clinical entity, with respect to both clinical signs and underlying neuromotor deficit. In the current paper, we advocate a modeling approach in which a computational neural model of speech acquisition and production is utilized in order to

  15. Speech motor development in childhood apraxia of speech: generating testable hypotheses by neurocomputational modeling.

    NARCIS (Netherlands)

    Terband, H.R.; Maassen, B.A.M.

    2010-01-01

    Childhood apraxia of speech (CAS) is a highly controversial clinical entity, with respect to both clinical signs and underlying neuromotor deficit. In the current paper, we advocate a modeling approach in which a computational neural model of speech acquisition and production is utilized in order to

  16. Application of adaptive digital signal processing to speech enhancement for the hearing impaired.

    Science.gov (United States)

    Chabries, D M; Christiansen, R W; Brey, R H; Robinette, M S; Harris, R W

    1987-01-01

    A major complaint of individuals with normal hearing and hearing impairments is a reduced ability to understand speech in a noisy environment. This paper describes the concept of adaptive noise cancelling for removing noise from corrupted speech signals. Application of adaptive digital signal processing has long been known and is described from a historical as well as technical perspective. The Widrow-Hoff LMS (least mean square) algorithm developed in 1959 forms the introduction to modern adaptive signal processing. This method uses a "primary" input which consists of the desired speech signal corrupted with noise and a second "reference" signal which is used to estimate the primary noise signal. By subtracting the adaptively filtered estimate of the noise, the desired speech signal is obtained. Recent developments in the field as they relate to noise cancellation are described. These developments include more computationally efficient algorithms as well as algorithms that exhibit improved learning performance. A second method for removing noise from speech, for use when no independent reference for the noise exists, is referred to as single channel noise suppression. Both adaptive and spectral subtraction techniques have been applied to this problem--often with the result of decreased speech intelligibility. Current techniques applied to this problem are described, including signal processing techniques that offer promise in the noise suppression application.
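
    A minimal sketch of the Widrow-Hoff LMS canceller described above: the primary channel carries speech plus noise, the reference channel carries correlated noise, and the adaptively filtered reference is subtracted so the error signal approximates the clean speech. The filter length and step size are illustrative and would need tuning for real signals.

```python
# Sketch: two-channel LMS adaptive noise cancellation.
import numpy as np

def lms_cancel(primary, reference, n_taps=32, mu=0.01):
    w = np.zeros(n_taps)                     # adaptive filter weights
    out = np.zeros_like(primary)
    for n in range(n_taps, len(primary)):
        x = reference[n - n_taps:n][::-1]    # most recent reference samples
        noise_est = w @ x                    # adaptive estimate of the noise
        e = primary[n] - noise_est           # error = cleaned speech sample
        w += 2 * mu * e * x                  # LMS weight update (Widrow-Hoff)
        out[n] = e
    return out
```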

  17. DPCM with Forward Gain-Adaptive Quantizer and Simple Switched Predictor for High Quality Speech Signals

    Directory of Open Access Journals (Sweden)

    VELIMIROVIC, L.

    2010-11-01

    Full Text Available In this article, a DPCM (Differential Pulse Code Modulation) speech coding scheme with a simple switched first-order predictor is presented. Adaptation of the quantizer to the signal variance is performed for each particular frame. Each frame is classified as high- or low-correlated based on the value of the correlation coefficient, and then the appropriate predictor coefficient and bitrate are selected. Low-correlated frames are encoded with a higher bitrate, while high-correlated frames are encoded with a lower bitrate without objectionable loss in quality. A theoretical model and experimental results are provided for the proposed algorithm.
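
    A minimal sketch of such a scheme, with the predictor coefficient switched per frame from the measured correlation and the quantizer step scaled by the frame statistics; the correlation threshold, bit allocation, and step-size rule are illustrative assumptions, not the authors' exact design.

```python
# Sketch: frame-adaptive first-order DPCM with a switched predictor.
import numpy as np

def dpcm_encode(frame, bits=4):
    rho = np.corrcoef(frame[:-1], frame[1:])[0, 1]   # frame correlation
    a = rho if rho > 0.8 else 0.0                    # switched predictor
    levels = 2 ** bits
    step = 8 * np.std(frame) / levels + 1e-12        # forward gain adaptation
    codes, recon = [], 0.0
    for s in frame:
        e = s - a * recon                            # prediction residual
        q = int(np.clip(round(e / step), -levels // 2, levels // 2 - 1))
        codes.append(q)
        recon = a * recon + q * step                 # mirror the decoder state
    return codes, a, step

def dpcm_decode(codes, a, step):
    out, recon = [], 0.0
    for q in codes:
        recon = a * recon + q * step
        out.append(recon)
    return np.array(out)
```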

  18. Modeling auditory processing and speech perception in hearing-impaired listeners

    OpenAIRE

    Jepsen, Morten Løve; Pedersen, Michael Syskind; Dau, Torsten

    2010-01-01

    A better understanding of how the human auditory system represents and analyzes sounds and how hearing impairment affects such processing is of great interest for researchers in the fields of auditory neuroscience, audiology, and speech communication as well as for applications in hearing-instrument and speech technology. In this thesis, the primary focus was on the development and evaluation of a computational model of human auditory signal-processing and perception. The model was initially ...

  19. On the Perception of Speech Sounds as Biologically Significant Signals

    Science.gov (United States)

    Pisoni, David B.

    2012-01-01

    This paper reviews some of the major evidence and arguments currently available to support the view that human speech perception may require the use of specialized neural mechanisms for perceptual analysis. Experiments using synthetically produced speech signals with adults are briefly summarized and extensions of these results to infants and other organisms are reviewed with an emphasis on detailing those aspects of speech perception that may require some need for specialized species-specific processors. Finally, some comments on the role of early experience in perceptual development are provided as an attempt to identify promising areas of new research in speech perception. PMID:399200

  20. Two Methods of Automatic Evaluation of Speech Signal Enhancement Recorded in the Open-Air MRI Environment

    Science.gov (United States)

    Přibil, Jiří; Přibilová, Anna; Frollo, Ivan

    2017-12-01

    The paper focuses on two methods for evaluating the success of speech signal enhancement recorded in an open-air magnetic resonance imager during phonation for 3D human vocal tract modeling. The first approach enables a comparison based on statistical analysis by ANOVA and hypothesis tests. The second method is based on classification by Gaussian mixture models (GMM). The performed experiments have confirmed that the proposed ANOVA and GMM classifiers for automatic evaluation of speech quality are functional and produce results fully comparable with the standard evaluation based on the listening test method.

  1. Speech Enhancement by MAP Spectral Amplitude Estimation Using a Super-Gaussian Speech Model

    Directory of Open Access Journals (Sweden)

    Lotter Thomas

    2005-01-01

    Full Text Available This contribution presents two spectral amplitude estimators for acoustical background noise suppression based on maximum a posteriori estimation and super-Gaussian statistical modelling of the speech DFT amplitudes. The probability density function of the speech spectral amplitude is modelled with a simple parametric function, which allows a high approximation accuracy for Laplace- or Gamma-distributed real and imaginary parts of the speech DFT coefficients. Also, the statistical model can be adapted to optimally fit the distribution of the speech spectral amplitudes for a specific noise reduction system. Based on the super-Gaussian statistical model, computationally efficient maximum a posteriori speech estimators are derived, which outperform the commonly applied Ephraim-Malah algorithm.

  2. Community-based model for speech therapy in Thailand: implementation.

    Science.gov (United States)

    Prathanee, Benjamas; Lorwatanapongsa, Preeya; Makarabhirom, Kalyanee; Suphawatjariyakul, Ratchanee; Thinnaithorn, Rattana; Thanwiratananich, Panida

    2010-10-01

    To establish and implement a Community-Based Model for Speech Therapy in Thailand. The development of the model was based on the principles of primary healthcare, community-based rehabilitation and institutional sharing. Workshops for speech and language pathologists (SLPs), including a "Training for Trainers" workshop and six "Smart Smile & Speech" workshops, were held: 1) a workshop training SLPs to manage speech and language problems in cleft lip and palate (CLP); 2) a workshop training healthcare providers who are not speech and language pathologists (para-SLPs) to identify speech, language and hearing problems in CLP and undertake early intervention; and 3) four speech camps for continuing education via live demonstration and practice. Standard guidelines were produced for SLPs to remedy speech and language disorders in children with CLP in Thailand, and para-SLP manuals for speech and language intervention in CLP were developed. Para-SLPs will be better equipped to identify and provide early intervention for individuals with CLP, as well as to refer children with CLP and complicated speech and language disorders to speech clinics for further management. The percentage of agreement among SLPs, audiologists and para-SLPs ranged from 50% to 93.33%, while the kappa coefficients ranged from -0.07 to 0.86. The Community-Based Model for Speech Therapy for children with CLP was an appropriate approach to addressing the lack of speech services for children with CLP in Thailand.

  3. Predicting speech intelligibility in adverse conditions: evaluation of the speech-based envelope power spectrum model

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2011-01-01

    conditions by comparing predictions to measured data from [Kjems et al. (2009). J. Acoust. Soc. Am. 126 (3), 1415-1426] where speech is mixed with four different interferers, including speech-shaped noise, bottle noise, car noise, and cafe noise. The model accounts well for the differences in intelligibility...

  4. Effects of manipulating the signal-to-noise envelope power ratio on speech intelligibility

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Decorsière, Remi Julien Blaise; Dau, Torsten

    2015-01-01

    was obtained when the noise was manipulated and mixed with the unprocessed speech, consistent with the hypothesis that SNRenv is indicative of speech intelligibility. However, discrepancies between data and predictions occurred for conditions where the speech was manipulated and the noise left untouched...... power spectrum model (sEPSM), the SNRenv was demonstrated to account for speech intelligibility data in various conditions with linearly and nonlinearly processed noisy speech, as well as for conditions with stationary and fluctuating interferers. Here, the relation between the SNRenv and speech...... intelligibility was investigated further by systematically varying the modulation power of either the speech or the noise before mixing the two components, while keeping the overall power ratio of the two components constant. A good correspondence between the data and the corresponding sEPSM predictions...

  5. Neural correlates of quality perception for complex speech signals

    CERN Document Server

    Antons, Jan-Niklas

    2015-01-01

    This book interconnects two essential disciplines to study the perception of speech: Neuroscience and Quality of Experience, which to date have rarely been used together for the purposes of research on speech quality perception. In five key experiments, the book demonstrates the application of standard clinical methods in neurophysiology on the one hand, and of methods used in fields of research concerned with speech quality perception on the other. Using this combination, the book shows that speech stimuli with different lengths and different quality impairments are accompanied by physiological reactions related to quality variations, e.g., a positive peak in an event-related potential. Furthermore, it demonstrates that – in most cases – quality impairment intensity has an impact on the intensity of physiological reactions.

  6. Compact Acoustic Models for Embedded Speech Recognition

    Directory of Open Access Journals (Sweden)

    Christophe Lévy

    2009-01-01

    Full Text Available Speech recognition applications are known to require a significant amount of resources. However, embedded speech recognition allows only a few KB of memory, a few MIPS, and a small amount of training data. In order to fit the resource constraints of embedded applications, an approach based on a semicontinuous HMM system using state-independent acoustic modelling is proposed. A transformation is computed and applied to the global model in order to obtain each HMM state-dependent probability density function, so that only the transformation parameters need to be stored. This approach is evaluated on two tasks: digit and voice-command recognition. A fast adaptation technique for the acoustic models is also proposed. In order to significantly reduce computational costs, the adaptation is performed only on the global model (using related speaker recognition adaptation techniques) with no need for state-dependent data. The whole approach results in a relative gain of more than 20% compared to a basic HMM-based system fitting the constraints.

  7. Tracking the speech signal--time-locked MEG signals during perception of ultra-fast and moderately fast speech in blind and in sighted listeners.

    Science.gov (United States)

    Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

    2013-01-01

    Blind people can learn to understand speech at ultra-high syllable rates (ca. 20 syllables/s), a capability associated with hemodynamic activation of the central-visual system. To further elucidate the neural mechanisms underlying this skill, magnetoencephalographic (MEG) measurements during listening to sentence utterances were cross-correlated with time courses derived from the speech signal (envelope, syllable onsets and pitch periodicity) to capture phase-locked MEG components (14 blind, 12 sighted subjects; speech rate=8 or 16 syllables/s, pre-defined source regions: auditory and visual cortex, inferior frontal gyrus). Blind individuals showed stronger phase locking in auditory cortex than sighted controls, and right-hemisphere visual cortex activity correlated with syllable onsets in case of ultra-fast speech. Furthermore, inferior-frontal MEG components time-locked to pitch periodicity displayed opposite lateralization effects in sighted (towards right hemisphere) and blind subjects (left). Thus, ultra-fast speech comprehension in blind individuals appears associated with changes in early signal-related processing mechanisms both within and outside the central-auditory terrain. Copyright © 2012 Elsevier Inc. All rights reserved.

  8. Cuckoo search based optimal mask generation for noise suppression and enhancement of speech signal

    Directory of Open Access Journals (Sweden)

    Anil Garg

    2015-07-01

    Full Text Available In this paper, an effective noise suppression technique for enhancement of speech signals using an optimized mask is proposed. Initially, the noisy speech signal is broken down into various time-frequency (TF) units and features are extracted by computing the Amplitude Magnitude Spectrogram (AMS). The signals are then classified into different classes based on quality ratio to generate the initial set of solutions. Subsequently, the optimal mask for each class is generated using the Cuckoo search algorithm. In the waveform synthesis stage, filtered waveforms are windowed, multiplied by the optimal mask value and summed up to obtain the enhanced target signal. The proposed technique was tested on various datasets and its performance was compared with previous techniques using SNR. The results obtained demonstrate the effectiveness of the proposed technique and its ability to suppress noise and enhance the speech signal.

  9. Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2011-01-01

    A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data.... The model estimates the speech-to-noise envelope power ratio, SNRenv, at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech...... process provides a key measure of speech intelligibility. © 2011 Acoustical Society of America....

  10. Didactic speech synthesizer – acoustic module, formants model

    OpenAIRE

    Teixeira, João Paulo; Fernandes, Anildo

    2013-01-01

    Text-to-speech synthesis is the main subject treated in this work. We present the structure of a generic text-to-speech conversion system, explain the functions of the various modules, and describe the development techniques using the formant model. The development of a didactic formant synthesizer in the Matlab environment is also described. This synthesizer is intended to support a didactic understanding of the formant model of speech production.
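
    In the spirit of that didactic tool, a minimal formant-synthesis sketch (in Python rather than Matlab): a glottal impulse train is passed through cascaded two-pole resonators. The formant frequencies and bandwidths approximate the vowel /a/ and are illustrative.

```python
# Sketch: vowel synthesis with an impulse-train source and formant resonators.
import numpy as np
from scipy.signal import lfilter

def resonator(x, f, bw, fs):
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * f / fs
    a = [1.0, -2 * r * np.cos(theta), r * r]   # two-pole formant filter
    b = [sum(a)]                               # unity gain at DC
    return lfilter(b, a, x)

fs, f0, dur = 16000, 120, 0.5
src = np.zeros(int(fs * dur))
src[::fs // f0] = 1.0                          # glottal impulse train at f0
out = src
for f, bw in [(730, 90), (1090, 110), (2440, 170)]:   # approximate /a/ formants
    out = resonator(out, f, bw, fs)
out /= np.max(np.abs(out))                     # normalize for playback
```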

  11. An effective cluster-based model for robust speech detection and speech recognition in noisy environments.

    Science.gov (United States)

    Górriz, J M; Ramírez, J; Segura, J C; Puntonet, C G

    2006-07-01

    This paper shows an accurate speech detection algorithm for improving the performance of speech recognition systems working in noisy environments. The proposed method is based on a hard-decision clustering approach where a set of prototypes is used to characterize the noisy channel. Detecting the presence of speech is enabled by a decision rule formulated in terms of an averaged distance between the observation vector and a cluster-based noise model. The algorithm benefits from using contextual information, a strategy that considers not only a single speech frame but also a neighborhood of data in order to smooth the decision function and improve speech detection robustness. The proposed scheme exhibits reduced computational cost, making it adequate for real-time applications, i.e., automated speech recognition systems. An exhaustive analysis is conducted on the AURORA 2 and AURORA 3 databases in order to assess the performance of the algorithm and to compare it to existing standard voice activity detection (VAD) methods. The results show significant improvements in detection accuracy and speech recognition rate over standard VADs such as ITU-T G.729, ETSI GSM AMR, and ETSI AFE for distributed speech recognition, and over a representative set of recently reported VAD algorithms.

  12. Statistical Model-Based Voice Activity Detection Using Spatial Cues and Log Energy for Dual-Channel Noisy Speech Recognition

    Science.gov (United States)

    Park, Ji Hun; Shin, Min Hwa; Kim, Hong Kook

    In this paper, a voice activity detection (VAD) method for dual-channel noisy speech recognition is proposed on the basis of statistical models constructed from spatial cues and log energy. In particular, the spatial cues are composed of the interaural time differences and interaural level differences of dual-channel speech signals, and the statistical models for speech presence and absence are based on a Gaussian kernel density. In order to evaluate the performance of the proposed VAD method, speech recognition is performed using only the speech signals segmented by the proposed method. The performance of the proposed VAD method is then compared with those of conventional methods such as a signal-to-noise ratio variance based method and a phase vector based method. The experiments show that the proposed VAD method outperforms the conventional methods, providing relative word error rate reductions of 19.5% and 12.2%, respectively.

  13. Development of the speech test signal in Brazilian Portuguese for real-ear measurement.

    Science.gov (United States)

    Garolla, Luciana P; Scollie, Susan D; Martinelli Iório, Maria Cecília

    2013-08-01

    Recommended practice is to verify the gain and/or output of hearing aids with speech or speech-shaped signals. The purpose of this study was to develop a speech test signal in Brazilian Portuguese that is electroacoustically similar to the international long-term average speech spectrum (ILTASS) for use in real-ear verification systems. A Brazilian Portuguese speech passage was recorded for one female talker using standardized equipment and procedures and compared to the International Speech Test Signal (ISTS). The passage consisted of simple, declarative sentences totaling 148 words. The recordings were filtered to the ILTASS and compared to the ISTS. Aided recordings were made at three test levels, for three audiograms, for both the Brazilian Portuguese passage and the ISTS. The unaided test signals were spectrally matched to within 0.5 dB. The aided evaluation revealed that the Brazilian Portuguese passage produced aided spectra that were within 1 dB on average, within about 2 dB per audiogram, and within about 3 dB per frequency for 95% of fittings. These results indicate that the Brazilian Portuguese passage developed in this study provides hearing-aid evaluations electroacoustically similar to those expected from the standard ISTS passage.

  14. Multichannel infinite clipping as a form of sampling of speech signals

    International Nuclear Information System (INIS)

    Guidarelli, G.

    1985-01-01

    A remarkable improvement in both the intelligibility and naturalness of infinitely clipped speech can be achieved by means of a multichannel system in which the speech signal is split into several band-pass channels before clipping and subsequently reconstructed by summing the clipped outputs of each channel. A possible explanation of this improvement is given, founded on the so-called zero-based representation of band-limited signals, where the zero-crossing sequence is considered a set of samples of the signal
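
    A minimal sketch of the multichannel scheme just described: band-pass filtering, per-channel infinite clipping via the sign function, and summation; the channel edges and filter order are illustrative assumptions.

```python
# Sketch: multichannel infinite clipping of a speech signal.
import numpy as np
from scipy.signal import butter, lfilter

def multichannel_clip(x, fs, edges=(100, 400, 1000, 2500, 6000)):
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        out += np.sign(lfilter(b, a, x))      # infinite clipping per channel
    return out / (len(edges) - 1)             # rescale the channel sum
```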

  15. Secure, Robust and Hybrid Watermarking for Speech Signal Using Discrete Wavelet Transform, Discrete Cosine Transform and Singular Value Decomposition

    Directory of Open Access Journals (Sweden)

    AMBIKA DORAISAMY

    2017-06-01

    Full Text Available A digital watermark is defined as inaudible data permanently embedded in a speech file for authenticating secret data. The main goal of this paper is to embed a watermark in the speech signal without any degradation. Here, hybrid watermarking is performed based on three techniques: the Discrete Cosine Transform (DCT) with Singular Value Decomposition (SVD) and the Discrete Wavelet Transform (DWT), and it is optimized by separating the speech and silent regions using a voice activity detection algorithm. The performance was evaluated based on the Peak Signal to Noise Ratio (PSNR) and Normalized Cross Correlation (NCC). The results show that the optimized method performs better than the existing algorithm and is robust against different kinds of attacks. They also show that the algorithm is efficient in terms of robustness, security and imperceptibility, and that the watermarked signal is perceptually similar to the original audio signal.

  16. Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach

    Directory of Open Access Journals (Sweden)

    W. Bastiaan Kleijn

    2005-06-01

    Full Text Available Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features that are most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation using an invertible auditory model. The auditory representation is quantized and coded. Upon decoding, it is then transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating quantization with low complexity. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method to reconstruct the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the exploitation of the inherent redundancy of the human auditory system for the purpose of multiple description (joint source-channel) coding.

  17. Role of neural network models for developing speech systems

    Indian Academy of Sciences (India)

    These prosody models are further examined for applications such as text to speech synthesis, speech recognition, speaker recognition and language identification. Neural network models in voice conversion system are explored for capturing the mapping functions between source and target speakers at source, system and ...

  18. Design of language models at various phases of Tamil speech ...

    African Journals Online (AJOL)

    This paper describes the use of language models in various phases of a Tamil speech recognition system to improve its performance. In this work, the language models are applied at various levels of speech recognition, such as the segmentation phase, the recognition phase, and the syllable- and word-level error correction phase.

  19. Bridging Automatic Speech Recognition and Psycholinguistics: Extending Shortlist to an End-to-End Model of Human Speech Recognition

    NARCIS (Netherlands)

    Scharenborg, O.E.; Bosch, L.F.M. ten; Boves, L.W.J.; Norris, D.

    2003-01-01

    This letter evaluates potential benefits of combining human speech recognition (HSR) and automatic speech recognition by building a joint model of an automatic phone recognizer (APR) and a computational model of HSR, viz. Shortlist (Norris, 1994). Experiments based on 'real-life' speech highlight

  20. Speech Signal Analysis and Pattern Recognition in Diagnosis of Dysarthria.

    Science.gov (United States)

    Thoppil, Minu George; Kumar, C Santhosh; Kumar, Anand; Amose, John

    2017-01-01

    Dysarthria refers to a group of disorders resulting from disturbances in muscular control over the speech mechanism due to damage of the central or peripheral nervous system. There is wide subjective variability in the assessment of dysarthria between different clinicians. In our study, we tried to identify patterns among types of dysarthria by acoustic analysis and to prevent intersubject variability. (1) Pattern recognition among types of dysarthria with a software tool and comparison with normal subjects. (2) To assess the severity of dysarthria with a software tool. Speech of seventy subjects was recorded, both normal subjects and dysarthric patients who attended the outpatient department or were admitted at AIMS. Speech waveforms were analyzed using Praat and the MATLAB toolkit. The pitch contour, formant variation, and speech duration of the extracted graphs were analyzed. The study population included 25 normal subjects and 45 dysarthric patients. Dysarthric subjects included 24 patients with extrapyramidal dysarthria, 14 cases of spastic dysarthria, and 7 cases of ataxic dysarthria. Analysis of pitch of the study population showed a specific pattern in each type: F0 jitter was found in spastic dysarthria, pitch breaks in ataxic dysarthria, and pitch monotonicity in extrapyramidal dysarthria. By pattern recognition, we identified 19 cases in which one or more recognized patterns coexisted. There was a significant correlation between the severity of dysarthria and formant range. Specific patterns were identified for types of dysarthria, so this software tool will help clinicians identify the types of dysarthria more reliably and could prevent intersubject variability. We also assessed the severity of dysarthria by formant range. Mixed dysarthria can be more common than clinically expected.

  1. Speech Signal Analysis and Pattern Recognition in Diagnosis of Dysarthria

    Science.gov (United States)

    Thoppil, Minu George; Kumar, C. Santhosh; Kumar, Anand; Amose, John

    2017-01-01

    Background: Dysarthria refers to a group of disorders resulting from disturbances in muscular control over the speech mechanism due to damage of the central or peripheral nervous system. There is wide subjective variability in the assessment of dysarthria between different clinicians. In our study, we tried to identify patterns among types of dysarthria by acoustic analysis and to prevent intersubject variability. Objectives: (1) Pattern recognition among types of dysarthria with a software tool and comparison with normal subjects. (2) To assess the severity of dysarthria with a software tool. Materials and Methods: Speech of seventy subjects was recorded, both normal subjects and dysarthric patients who attended the outpatient department or were admitted at AIMS. Speech waveforms were analyzed using Praat and the MATLAB toolkit. The pitch contour, formant variation, and speech duration of the extracted graphs were analyzed. Results: The study population included 25 normal subjects and 45 dysarthric patients. Dysarthric subjects included 24 patients with extrapyramidal dysarthria, 14 cases of spastic dysarthria, and 7 cases of ataxic dysarthria. Analysis of pitch of the study population showed a specific pattern in each type: F0 jitter was found in spastic dysarthria, pitch breaks in ataxic dysarthria, and pitch monotonicity in extrapyramidal dysarthria. By pattern recognition, we identified 19 cases in which one or more recognized patterns coexisted. There was a significant correlation between the severity of dysarthria and formant range. Conclusions: Specific patterns were identified for types of dysarthria, so this software tool will help clinicians identify the types of dysarthria more reliably and could prevent intersubject variability. We also assessed the severity of dysarthria by formant range. Mixed dysarthria can be more common than clinically expected. PMID:29184336

  2. Speech Signal Analysis and Pattern Recognition in Diagnosis of Dysarthria

    OpenAIRE

    Thoppil, Minu George; Kumar, C. Santhosh; Kumar, Anand; Amose, John

    2017-01-01

    Background: Dysarthria refers to a group of disorders resulting from disturbances in muscular control over the speech mechanism due to damage of central or peripheral nervous system. There is wide subjective variability in assessment of dysarthria between different clinicians. In our study, we tried to identify a pattern among types of dysarthria by acoustic analysis and to prevent intersubject variability. Objectives: (1) Pattern recognition among types of dysarthria with software tool and t...

  3. The Effects of the Active Hypoxia to the Speech Signal Inharmonicity

    Directory of Open Access Journals (Sweden)

    Z. N. Milivojevic

    2014-06-01

    Full Text Available When people climb a mountain, they are exposed to a decreased oxygen concentration in the tissue, which is commonly called active hypoxia. This paper addresses the problem of acute hypoxia affecting the speech signal at altitudes up to 2500 m. For the experiment, a speech signal database containing the articulation of vowels was recorded at different altitudes. The speech signals were processed by an originally developed algorithm that extracts the fundamental frequency and the inharmonicity coefficient, which were then analyzed in order to derive the effects of acute hypoxia. The results showed that the level of hypoxia can be determined from the change in the inharmonicity coefficient; accordingly, the degree of hypoxia can be estimated.

  4. Decoding Speech With Integrated Hybrid Signals Recorded From the Human Ventral Motor Cortex

    Directory of Open Access Journals (Sweden)

    Kenji Ibayashi

    2018-04-01

    Full Text Available Restoration of speech communication for locked-in patients by means of brain computer interfaces (BCIs) is currently an important area of active research. Among the neural signals obtained from intracranial recordings, single/multi-unit activity (SUA/MUA), local field potential (LFP), and electrocorticography (ECoG) are good candidates for an input signal for BCIs. However, the question of which signal or which combination of the three signal modalities is best suited for decoding speech production remains unverified. In order to record SUA, LFP, and ECoG simultaneously from a highly localized area of human ventral sensorimotor cortex (vSMC), we fabricated an electrode, 7 by 13 mm in size, containing sparsely arranged microneedle and conventional macro contacts. We determined which signal modality is the most capable of decoding speech production, and tested whether the combination of these signals could improve the decoding accuracy of spoken phonemes. Feature vectors were constructed from the spike frequency obtained from SUAs and the event-related spectral perturbation derived from ECoG and LFP signals, then input to the decoder. The results showed that the decoding accuracy for five spoken vowels was highest when features from multiple signals were combined and optimized for each subject, reaching 59% when averaged across all six subjects. This result suggests that multi-scale signals convey complementary information for speech articulation. The current study demonstrated that simultaneous recording of multi-scale neuronal activities could raise decoding accuracy even though the recording area is limited to a small portion of cortex, which is advantageous for future implementation of speech-assisting BCIs.

  5. Applications of sub-audible speech recognition based upon electromyographic signals

    Science.gov (United States)

    Jorgensen, C. Charles (Inventor); Betts, Bradley J. (Inventor)

    2009-01-01

    Method and system for generating electromyographic or sub-audible signals ("SAWPs") and for transmitting and recognizing the SAWPs that represent the original words and/or phrases. The SAWPs may be generated in an environment that interferes excessively with normal speech or that requires stealth communications, and may be transmitted using encoded, enciphered or otherwise transformed signals that are less subject to signal distortion or degradation in the ambient environment.

  6. Modelling context in automatic speech recognition

    NARCIS (Netherlands)

    Wiggers, P.

    2008-01-01

    Speech is at the core of human communication. Speaking and listening come so naturally to us that we do not have to think about them at all. The underlying cognitive processes are very rapid and almost completely subconscious. It is hard, if not impossible, not to understand speech. For computers, on the

  7. Subspace-Based Noise Reduction for Speech Signals via Diagonal and Triangular Matrix Decompositions

    DEFF Research Database (Denmark)

    Hansen, Per Christian; Jensen, Søren Holdt

    2007-01-01

    We survey the definitions and use of rank-revealing matrix decompositions in single-channel noise reduction algorithms for speech signals. Our algorithms are based on the rank-reduction paradigm and, in particular, signal subspace techniques. The focus is on practical working algorithms, using both...... diagonal (eigenvalue and singular value) decompositions and rank-revealing triangular decompositions (ULV, URV, VSV, ULLV and ULLIV). In addition we show how the subspace-based algorithms can be evaluated and compared by means of simple FIR filter interpretations. The algorithms are illustrated...... with working Matlab code and applications in speech processing....

  8. Subspace-Based Noise Reduction for Speech Signals via Diagonal and Triangular Matrix Decompositions

    DEFF Research Database (Denmark)

    Hansen, Per Christian; Jensen, Søren Holdt

    We survey the definitions and use of rank-revealing matrix decompositions in single-channel noise reduction algorithms for speech signals. Our algorithms are based on the rank-reduction paradigm and, in particular, signal subspace techniques. The focus is on practical working algorithms, using both...... diagonal (eigenvalue and singular value) decompositions and rank-revealing triangular decompositions (ULV, URV, VSV, ULLV and ULLIV). In addition we show how the subspace-based algorithms can be evaluated and compared by means of simple FIR filter interpretations. The algorithms are illustrated...... with working Matlab code and applications in speech processing....
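
    As a concrete instance of the rank-reduction paradigm surveyed above (using a plain SVD rather than the triangular decompositions): embed the noisy frame in a Hankel matrix, truncate the small singular values, and average the anti-diagonals back into a signal. The embedding size and rank are illustrative assumptions.

```python
# Sketch: SVD-based signal-subspace denoising of one speech frame.
import numpy as np
from scipy.linalg import hankel

def subspace_denoise(frame, rank=20, n_rows=64):
    # Hankel embedding: H[i, j] = frame[i + j] (frame longer than n_rows)
    H = hankel(frame[:n_rows], frame[n_rows - 1:])
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]      # rank-k approximation
    out = np.zeros(len(frame))
    counts = np.zeros(len(frame))
    for i in range(H_low.shape[0]):                   # anti-diagonal averaging
        for j in range(H_low.shape[1]):
            out[i + j] += H_low[i, j]
            counts[i + j] += 1
    return out / counts
```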

  9. Predicting the effect of spectral subtraction on the speech recognition threshold based on the signal-to-noise ratio in the envelope domain

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2011-01-01

    . The SRT was measured in five normal-hearing listeners in six conditions of spectral subtraction. The results showed an increase in SRT after processing, i.e. decreased speech intelligibility, in contrast to what is predicted by the Speech Transmission Index (STI). Here, another approach is proposed......, denoted the speech-based envelope power spectrum model (sEPSM), which predicts the intelligibility based on the signal-to-noise ratio in the envelope domain. In contrast to the STI, the sEPSM is sensitive to the increased amount of noise envelope power as a consequence of the spectral subtraction
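
    For reference, a minimal sketch of the kind of magnitude spectral subtraction whose effect on the SRT is at issue here; the noise estimate from leading frames, the over-subtraction factor, and the spectral floor are illustrative choices.

```python
# Sketch: basic magnitude spectral subtraction via STFT processing.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(x, fs, n_noise_frames=10, alpha=2.0, floor=0.05):
    f, t, Z = stft(x, fs, nperseg=512)
    mag, phase = np.abs(Z), np.angle(Z)
    noise = mag[:, :n_noise_frames].mean(axis=1, keepdims=True)  # noise estimate
    clean = np.maximum(mag - alpha * noise, floor * mag)         # spectral floor
    _, y = istft(clean * np.exp(1j * phase), fs, nperseg=512)
    return y
```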

  10. Extension of ITU-T Recommendation P.862 PESQ towards Measuring Speech Intelligibility with Vocoders

    National Research Council Canada - National Science Library

    Beerends, John G; van Wijngaarden, Sander; van Buuren, Ronald

    2005-01-01

    ... signal through a psycho-acoustic model and a model of human quality comparison (cognitive model). Within NATO, testing of low bit rate speech codecs is more focused on speech intelligibility than on speech quality...

  11. Signal Processing Methods for Removing the Effects of Whole Body Vibration upon Speech

    Science.gov (United States)

    Bitner, Rachel M.; Begault, Durand R.

    2014-01-01

    Humans may be exposed to whole-body vibration in environments where clear speech communications are crucial, particularly during the launch phases of space flight and in high-performance aircraft. Prior research has shown that high levels of vibration cause a decrease in speech intelligibility. However, the effects of whole-body vibration upon speech are not well understood, and no attempt has been made to restore speech distorted by whole-body vibration. In this paper, a model for speech under whole-body vibration is proposed and a method to remove its effect is described. The method described reduces the perceptual effects of vibration, yields higher ASR accuracy scores, and may significantly improve intelligibility. Possible applications include incorporation within communication systems to improve radio communications in environments such as spaceflight, aviation, or off-road vehicle operations.

  12. Feature Compensation Employing Multiple Environmental Models for Robust In-Vehicle Speech Recognition

    Science.gov (United States)

    Kim, Wooil; Hansen, John H. L.

    An effective feature compensation method is developed for reliable speech recognition in real-life in-vehicle environments. The CU-Move corpus, used for evaluation, contains a range of speech and noise signals collected for a number of speakers under actual driving conditions. PCGMM-based feature compensation, considered in this paper, utilizes parallel model combination to generate a noise-corrupted speech model by combining the clean speech and noise models. In order to address unknown time-varying background noise, an interpolation method over multiple environmental models is employed. To alleviate the computational expense of multiple models, an Environment Transition Model is employed, motivated by the Noise Language Model used in Environmental Sniffing. An environment-dependent mixture-sharing scheme is proposed and shown to be more effective in reducing the computational complexity; a smaller environmental model set is determined by the environment transition model for mixture sharing. The proposed scheme is evaluated on the connected single digits portion of the CU-Move database using the Aurora2 evaluation toolkit. Experimental results indicate that our feature compensation method is effective for improving speech recognition in real-life in-vehicle conditions. A reduction of 73.10% in the computational requirements was obtained by employing the environment-dependent mixture-sharing scheme with only a slight change in recognition performance. This demonstrates that the proposed method is effective in maintaining the distinctive characteristics among the different environmental models, even when selecting a large number of Gaussian components for mixture sharing.
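
    To make the parallel model combination step concrete, the following is a minimal sketch of the standard log-normal (log-add) approximation in the log-spectral domain; the function name and the diagonal-covariance, static-feature assumptions are illustrative, not details taken from the paper.

```python
import numpy as np

def pmc_combine(clean_mean, clean_var, noise_mean, noise_var):
    """Combine clean-speech and noise Gaussians in the log-spectral domain
    via the log-normal approximation of parallel model combination (PMC).
    All inputs are per-dimension means/variances of a diagonal model."""
    # Moments of the corresponding log-normal variables in the linear domain.
    lin_clean_mean = np.exp(clean_mean + 0.5 * clean_var)
    lin_noise_mean = np.exp(noise_mean + 0.5 * noise_var)
    lin_clean_var = lin_clean_mean ** 2 * (np.exp(clean_var) - 1.0)
    lin_noise_var = lin_noise_mean ** 2 * (np.exp(noise_var) - 1.0)

    # Speech and noise are additive and independent in the linear domain.
    lin_mean = lin_clean_mean + lin_noise_mean
    lin_var = lin_clean_var + lin_noise_var

    # Map the combined moments back to the log-spectral domain.
    var = np.log(1.0 + lin_var / lin_mean ** 2)
    mean = np.log(lin_mean) - 0.5 * var
    return mean, var
```

    An interpolated multi-environment model, as described in the abstract, would then be a weighted combination of such noise-corrupted Gaussians, one set per environmental noise model.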

  13. Open Quotient Measurements Based on Multiscale Product of Speech Signal Wavelet Transform

    Directory of Open Access Journals (Sweden)

    Aïcha Bouzid

    2007-01-01

    Full Text Available This paper describes a multiscale product method (MPM) for open quotient measurement in voiced speech. The method is based on determining the glottal closing and opening instants. The proposed approach consists of taking the product of the wavelet transforms of the speech signal at different scales in order to enhance edge detection and parameter estimation. We show that the proposed method is effective and robust for detecting speech singularities. Accurate estimation of glottal closing instants (GCIs) and opening instants (GOIs) is important in a wide range of speech processing tasks. In this paper, accurate estimation of GCIs and GOIs is used to measure the local open quotient (Oq), which is the ratio of the open time to the pitch period. The multiscale product operates automatically on the speech signal; the reference electroglottogram (EGG) signal is used for performance evaluation. The rate of correct GCI detection is 95.5% and that of GOI is 76%. The pitch period relative error is 2.6% and the open phase relative error is 5.6%. The relative error measured on the open quotient reaches 3% for the whole Keele database.
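
    A rough sketch of the multiscale product idea, with derivative-of-Gaussian filters standing in for the wavelet at each dyadic scale; the scales, the peak threshold, and the maximum expected f0 are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

def multiscale_product(speech, scales=(1, 2, 4)):
    """Product of smoothed-derivative responses at dyadic scales: edges
    (glottal closures/openings) reinforce across scales, noise does not."""
    responses = [gaussian_filter1d(speech, sigma=s, order=1) for s in scales]
    return np.prod(responses, axis=0)

def detect_gci(speech, fs, max_f0=400.0):
    """Candidate glottal closure instants: prominent extrema of the
    multiscale product, at most one per shortest expected pitch period."""
    mp = np.abs(multiscale_product(speech))
    peaks, _ = find_peaks(mp, distance=int(fs / max_f0),
                          height=0.1 * mp.max())
    return peaks
```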

  14. ADAPTIVE LEARNING OF HIDDEN MARKOV MODELS FOR EMOTIONAL SPEECH

    Directory of Open Access Journals (Sweden)

    A. V. Tkachenia

    2014-01-01

    Full Text Available An on-line unsupervised algorithm for estimating the hidden Markov model (HMM) parameters is presented. The problem of adapting hidden Markov models to emotional speech is solved. To increase the reliability of the estimated HMM parameters, a mechanism of forgetting and updating is proposed. A functional block diagram of the adaptation algorithm is provided, together with results showing improved emotional speech recognition.
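
    The abstract does not spell out the forgetting mechanism; one plausible instance is an exponential-forgetting update of the Gaussian emission parameters, sketched below (the learning rate rho and the exact form of the update are assumptions).

```python
import numpy as np

def update_emission(mu, var, x, rho=0.05):
    """Exponential forgetting: old statistics decay geometrically while the
    new observation x is blended in. In a full on-line HMM adaptation, rho
    would additionally be scaled by the state-occupancy posterior."""
    mu_new = (1.0 - rho) * mu + rho * x
    var_new = (1.0 - rho) * var + rho * (x - mu_new) ** 2
    return mu_new, var_new
```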

  15. Acceptance Noise Level: Effects of the Speech Signal, Babble, and Listener Language

    Science.gov (United States)

    Shi, Lu-Feng; Azcona, Gabrielly; Buten, Lupe

    2015-01-01

    Purpose: The acceptable noise level (ANL) measure has gained much research/clinical interest in recent years. The present study examined how the characteristics of the speech signal and the babble used in the measure may affect the ANL in listeners with different native languages. Method: Fifteen English monolingual, 16 Russian-English bilingual,…

  16. Spotting social signals in conversational speech over IP : A deep learning perspective

    NARCIS (Netherlands)

    Brueckner, Raymond; Schmitt, Maximilian; Pantic, Maja; Schuller, Björn

    2017-01-01

    The automatic detection and classification of social signals is an important task, given the fundamental role nonverbal behavioral cues play in human communication. We present the first cross-lingual study on the detection of laughter and fillers in conversational and spontaneous speech collected

  17. Feature Fusion Algorithm for Multimodal Emotion Recognition from Speech and Facial Expression Signal

    Directory of Open Access Journals (Sweden)

    Han Zhiyan

    2016-01-01

    Full Text Available In order to overcome the limitations of single-mode emotion recognition, this paper describes a novel multimodal emotion recognition algorithm that takes the speech signal and the facial expression signal as its research subjects. First, the speech signal features and facial expression signal features are fused, sample sets are obtained by sampling with replacement, and classifiers are trained with a BP neural network (BPNN). Second, the difference between two classifiers is measured by a double error difference selection strategy. Finally, the final recognition result is obtained by the majority voting rule. Experiments show the method improves the accuracy of emotion recognition by giving full play to the advantages of decision-level fusion and feature-level fusion, bringing the whole fusion process closer to human emotion recognition, with a recognition rate of 90.4%.
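
    Sampling with replacement plus majority voting amounts to a bagged ensemble. The sketch below illustrates that pattern with scikit-learn MLPs standing in for the paper's BP neural networks; it omits the double error difference selection step, and the layer sizes and ensemble size are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_bagged_bpnn(fused_features, labels, n_models=5, seed=0):
    """Train MLPs on bootstrap resamples of the fused speech plus
    facial-expression feature vectors (labels are integer-coded)."""
    rng = np.random.default_rng(seed)
    models, n = [], len(labels)
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # sampling with replacement
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
        clf.fit(fused_features[idx], labels[idx])
        models.append(clf)
    return models

def majority_vote(models, x):
    """Final decision by the majority voting rule over the ensemble."""
    votes = np.array([m.predict(x) for m in models])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```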

  18. Inferring imagined speech using EEG signals: a new approach using Riemannian manifold features

    Science.gov (United States)

    Nguyen, Chuong H.; Karavas, George K.; Artemiadis, Panagiotis

    2018-02-01

    Objective. In this paper, we investigate the suitability of imagined speech for brain-computer interface (BCI) applications. Approach. A novel method based on covariance matrix descriptors, which lie on a Riemannian manifold, and the relevance vector machines classifier is proposed. The method is applied to electroencephalographic (EEG) signals and tested in multiple subjects. Main results. The method is shown to outperform other approaches in the field with respect to accuracy and robustness. The algorithm is validated on various categories of speech, such as imagined pronunciation of vowels, short words and long words. The classification accuracy of our methodology is in all cases significantly above chance level, reaching a maximum of 70% for cases where we classify three words and 95% for cases of two words. Significance. The results reveal certain aspects that may affect the success of speech imagery classification from EEG signals, such as sound, meaning and word complexity. This can potentially extend the capability of utilizing speech imagery in future BCI applications. The dataset of speech imagery collected from a total of 15 subjects is also published.
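
    A hedged sketch of the covariance-descriptor front end: trial covariances are mapped to a flat space via the matrix logarithm, one common way to exploit the Riemannian geometry of symmetric positive-definite matrices, after which an ordinary vector classifier can be trained. A linear SVM stands in here for the paper's relevance vector machine.

```python
import numpy as np
from scipy.linalg import logm
from sklearn.svm import SVC

def covariance_descriptor(trial):
    """Spatial covariance of one EEG trial (channels x samples)."""
    x = trial - trial.mean(axis=1, keepdims=True)
    return (x @ x.T) / (x.shape[1] - 1)

def log_euclidean_features(trials):
    """Matrix-log map to a flat (tangent) space; the upper triangle
    suffices because the result is symmetric."""
    feats = []
    for t in trials:
        L = np.real(logm(covariance_descriptor(t)))  # drop numerical residue
        feats.append(L[np.triu_indices_from(L)])
    return np.array(feats)

# Usage sketch (SVC as a stand-in for a relevance vector machine):
# clf = SVC(kernel='linear').fit(log_euclidean_features(train_trials), y_train)
```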

  19. Improved single-channel speech separation using sinusoidal modeling

    DEFF Research Database (Denmark)

    Mowlaee, Pejman; Christensen, Mads Græsbøll; Jensen, Søren Holdt

    2010-01-01

    ) and Wiener filter (softmask) approaches, the proposed approach works independently of pitch estimates. Furthermore, it is observed that it can achieve acceptable perceptual speech quality with less cross-talk at different signal-to-signal ratios while bringing down the complexity by replacing STFT...

  20. The Effects of Noise on Speech Recognition in Cochlear Implant Subjects: Predictions and Analysis Using Acoustic Models

    Directory of Open Access Journals (Sweden)

    Leslie M. Collins

    2005-11-01

    Full Text Available Cochlear implants can provide partial restoration of hearing, even with limited spectral resolution and loss of fine temporal structure, to severely deafened individuals. Studies have indicated that background noise has significant deleterious effects on the speech recognition performance of cochlear implant patients. This study investigates the effects of noise on speech recognition using acoustic models of two cochlear implant speech processors and several predictive signal-processing-based analyses. The results of a listening test for vowel and consonant recognition in noise are presented and analyzed using the rate of phonemic feature transmission for each acoustic model. Three methods for predicting patterns of consonant and vowel confusion that are based on signal processing techniques calculating a quantitative difference between speech tokens are developed and tested using the listening test results. Results of the listening test and confusion predictions are discussed in terms of comparisons between acoustic models and confusion prediction performance.

  1. An interactive model of auditory-motor speech perception.

    Science.gov (United States)

    Liebenthal, Einat; Möttönen, Riikka

    2017-12-18

    Mounting evidence indicates a role in perceptual decoding of speech for the dorsal auditory stream connecting between temporal auditory and frontal-parietal articulatory areas. The activation time course in auditory, somatosensory and motor regions during speech processing is seldom taken into account in models of speech perception. We critically review the literature with a focus on temporal information, and contrast between three alternative models of auditory-motor speech processing: parallel, hierarchical, and interactive. We argue that electrophysiological and transcranial magnetic stimulation studies support the interactive model. The findings reveal that auditory and somatomotor areas are engaged almost simultaneously, before 100 ms. There is also evidence of early interactions between auditory and motor areas. We propose a new interactive model of auditory-motor speech perception in which auditory and articulatory somatomotor areas are connected from early stages of speech processing. We also discuss how attention and other factors can affect the timing and strength of auditory-motor interactions and propose directions for future research. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. A speech processing study using an acoustic model of a multiple-channel cochlear implant

    Science.gov (United States)

    Xu, Ying

    1998-10-01

    A cochlear implant is an electronic device designed to provide sound information for adults and children who have bilateral profound hearing loss. The task of representing speech signals as electrical stimuli is central to the design and performance of cochlear implants. Studies have shown that the current speech- processing strategies provide significant benefits to cochlear implant users. However, the evaluation and development of speech-processing strategies have been complicated by hardware limitations and large variability in user performance. To alleviate these problems, an acoustic model of a cochlear implant with the SPEAK strategy is implemented in this study, in which a set of acoustic stimuli whose psychophysical characteristics are as close as possible to those produced by a cochlear implant are presented on normal-hearing subjects. To test the effectiveness and feasibility of this acoustic model, a psychophysical experiment was conducted to match the performance of a normal-hearing listener using model- processed signals to that of a cochlear implant user. Good agreement was found between an implanted patient and an age-matched normal-hearing subject in a dynamic signal discrimination experiment, indicating that this acoustic model is a reasonably good approximation of a cochlear implant with the SPEAK strategy. The acoustic model was then used to examine the potential of the SPEAK strategy in terms of its temporal and frequency encoding of speech. It was hypothesized that better temporal and frequency encoding of speech can be accomplished by higher stimulation rates and a larger number of activated channels. Vowel and consonant recognition tests were conducted on normal-hearing subjects using speech tokens processed by the acoustic model, with different combinations of stimulation rate and number of activated channels. The results showed that vowel recognition was best at 600 pps and 8 activated channels, but further increases in stimulation rate and

  3. Predicting binaural speech intelligibility using the signal-to-noise ratio in the envelope power spectrum domain

    DEFF Research Database (Denmark)

    Chabot-Leclerc, Alexandre; MacDonald, Ewen; Dau, Torsten

    2016-01-01

    . The model was validated against three data sets from the literature, which covered the following effects: the number of maskers, the masker types [speech-shaped noise (SSN), speech-modulated SSN, babble, and reversed speech], the masker(s) azimuths, reverberation on the target and masker, and the interaural...

  4. Nonlinear adaptive models for speech enhancement algorithms

    Czech Academy of Sciences Publication Activity Database

    Koula, Ivan; Zezula, R.

    2007-01-01

    Vol. 42, No. 1 (2007), pp. 138-145 ISSN 1738-6438 R&D Projects: GA ČR(CZ) GA102/06/1233; GA AV ČR(CZ) 1ET301710509 Institutional research plan: CEZ:AV0Z20670512 Keywords: noise measurement * speech enhancement * neural nets Subject RIV: JA - Electronics; Optoelectronics, Electrical Engineering

  5. Speech perception of sine-wave signals by children with cochlear implants.

    Science.gov (United States)

    Nittrouer, Susan; Kuess, Jamie; Lowenstein, Joanna H

    2015-05-01

    Children need to discover linguistically meaningful structures in the acoustic speech signal. Being attentive to recurring, time-varying formant patterns helps in that process. However, that kind of acoustic structure may not be available to children with cochlear implants (CIs), thus hindering development. The major goal of this study was to examine whether children with CIs are as sensitive to time-varying formant structure as children with normal hearing (NH) by asking them to recognize sine-wave speech. The same materials were presented as speech in noise, as well, to evaluate whether any group differences might simply reflect general perceptual deficits on the part of children with CIs. Vocabulary knowledge, phonemic awareness, and "top-down" language effects were all also assessed. Finally, treatment factors were examined as possible predictors of outcomes. Results showed that children with CIs were as accurate as children with NH at recognizing sine-wave speech, but poorer at recognizing speech in noise. Phonemic awareness was related to that recognition. Top-down effects were similar across groups. Having had a period of bimodal stimulation near the time of receiving a first CI facilitated these effects. Results suggest that children with CIs have access to the important time-varying structure of vocal-tract formants.

  6. Sub-Audible Speech Recognition Based upon Electromyographic Signals

    Science.gov (United States)

    Jorgensen, Charles C. (Inventor); Lee, Diana D. (Inventor); Agabon, Shane T. (Inventor)

    2012-01-01

    Method and system for processing and identifying a sub-audible signal formed by a source of sub-audible sounds. Sequences of samples of sub-audible sound patterns ("SASPs") for known words/phrases in a selected database are received for overlapping time intervals, and Signal Processing Transforms ("SPTs") are formed for each sample, as part of a matrix of entry values. The matrix is decomposed into contiguous, non-overlapping two-dimensional cells of entries, and neural net analysis is applied to estimate reference sets of weight coefficients that provide sums with optimal matches to reference sets of values. The reference sets of weight coefficients are used to determine a correspondence between a new (unknown) word/phrase and a word/phrase in the database.

  7. Bridging automatic speech recognition and psycholinguistics: Extending Shortlist to an end-to-end model of human speech recognition (L)

    Science.gov (United States)

    Scharenborg, Odette; ten Bosch, Louis; Boves, Lou; Norris, Dennis

    2003-12-01

    This letter evaluates potential benefits of combining human speech recognition (HSR) and automatic speech recognition by building a joint model of an automatic phone recognizer (APR) and a computational model of HSR, viz., Shortlist [Norris, Cognition 52, 189-234 (1994)]. Experiments based on ``real-life'' speech highlight critical limitations posed by some of the simplifying assumptions made in models of human speech recognition. These limitations could be overcome by avoiding hard phone decisions at the output side of the APR, and by using a match between the input and the internal lexicon that flexibly copes with deviations from canonical phonemic representations.

  8. The application of bionic wavelet transform to speech signal processing in cochlear implants using neural network simulations.

    Science.gov (United States)

    Yao, Jun; Zhang, Yuan-Ting

    2002-11-01

    Cochlear implants (CIs) restore partial hearing to people with severe to profound sensorineural deafness; but there is still a marked performance gap in speech recognition between those who have received cochlear implant and people with a normal hearing capability. One of the factors that may lead to this performance gap is the inadequate signal processing method used in CIs. This paper investigates the application of an improved signal-processing method called bionic wavelet transform (BWT). This method is based upon the auditory model and allows for signal processing. Comparing the neural network simulations on the same experimental materials processed by wavelet transform (WT) and BWT, the application of BWT to speech signal processing in CI has a number of advantages, including: improvement in recognition rates for both consonants and vowels, reduction of the number of required channels, reduction of the average stimulation duration for words, and high noise tolerance. Consonant recognition results in 15 normal hearing subjects show that the BWT produces significantly better performance than the WT (t = -4.36276, p = 0.00065). The BWT has great potential to reduce the performance gap between CI listeners and people with a normal hearing capability in the future.

  9. Audio visual speech source separation via improved context dependent association model

    Science.gov (United States)

    Kazemi, Alireza; Boostani, Reza; Sobhanmanesh, Fariborz

    2014-12-01

    In this paper, we exploit the non-linear relation between a speech source and its associated lip video as a source of extra information to propose an improved audio-visual speech source separation (AVSS) algorithm. The audio-visual association is modeled using a neural associator which estimates the visual lip parameters from a temporal context of acoustic observation frames. We define an objective function based on mean square error (MSE) measure between estimated and target visual parameters. This function is minimized for estimation of the de-mixing vector/filters to separate the relevant source from linear instantaneous or time-domain convolutive mixtures. We have also proposed a hybrid criterion which uses AV coherency together with kurtosis as a non-Gaussianity measure. Experimental results are presented and compared in terms of visually relevant speech detection accuracy and output signal-to-interference ratio (SIR) of source separation. The suggested audio-visual model significantly improves relevant speech classification accuracy compared to existing GMM-based model and the proposed AVSS algorithm improves the speech separation quality compared to reference ICA- and AVSS-based methods.

  10. Collaboration and abstract representations: towards predictive models based on raw speech and eye-tracking data

    OpenAIRE

    Nüssli, Marc-Antoine; Jermann, Patrick; Sangin, Mirweis; Dillenbourg, Pierre

    2009-01-01

    This study aims to explore the possibility of using machine learning techniques to build predictive models of performance in collaborative induction tasks. More specifically, we explored how signal-level data, such as eye-gaze data and raw speech, may be used to build such models. The results show that such low-level features do have some potential to predict performance in such tasks. Implications for the design of future applications are briefly discussed.

  11. Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Hogden, J.

    1996-11-05

    The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.

  12. Neural Spike-Train Analyses of the Speech-Based Envelope Power Spectrum Model

    Directory of Open Access Journals (Sweden)

    Varsha H. Rallapalli

    2016-10-01

    Full Text Available Diagnosing and treating hearing impairment is challenging because people with similar degrees of sensorineural hearing loss (SNHL) often have different speech-recognition abilities. The speech-based envelope power spectrum model (sEPSM) has demonstrated that the signal-to-noise ratio (SNRENV) from a modulation filter bank provides a robust speech-intelligibility measure across a wider range of degraded conditions than many long-standing models. In the sEPSM, noise (N) is assumed to: (a) reduce S + N envelope power by filling in dips within clean speech (S) and (b) introduce an envelope noise floor from intrinsic fluctuations in the noise itself. While the promise of SNRENV has been demonstrated for normal-hearing listeners, it has not been thoroughly extended to hearing-impaired listeners because of limited physiological knowledge of how SNHL affects speech-in-noise envelope coding relative to noise alone. Here, envelope coding to speech-in-noise stimuli was quantified from auditory-nerve model spike trains using shuffled correlograms, which were analyzed in the modulation-frequency domain to compute modulation-band estimates of neural SNRENV. Preliminary spike-train analyses show strong similarities to the sEPSM, demonstrating feasibility of neural SNRENV computations. Results suggest that individual differences can occur based on differential degrees of outer- and inner-hair-cell dysfunction in listeners currently diagnosed into the single audiological SNHL category. The predicted acoustic-SNR dependence in individual differences suggests that the SNR-dependent rate of susceptibility could be an important metric in diagnosing individual differences. Future measurements of the neural SNRENV in animal studies with various forms of SNHL will provide valuable insight for understanding individual differences in speech-in-noise intelligibility.

  13. Enhancing Speech Recognition Using Improved Particle Swarm Optimization Based Hidden Markov Model

    Directory of Open Access Journals (Sweden)

    Lokesh Selvaraj

    2014-01-01

    Full Text Available Enhancing speech recognition is the primary intention of this work. In this paper a novel speech recognition method based on vector quantization and improved particle swarm optimization (IPSO) is suggested. The suggested methodology contains four stages, namely, (i) denoising, (ii) feature mining, (iii) vector quantization, and (iv) an IPSO-based hidden Markov model (HMM) technique (IP-HMM). At first, the speech signals are denoised using a median filter. Next, characteristics such as peak, pitch spectrum, Mel frequency cepstral coefficients (MFCC), mean, standard deviation, and minimum and maximum of the signal are extracted from the denoised signal. Following that, to accomplish the training process, the extracted characteristics are given to genetic-algorithm-based codebook generation in vector quantization. The initial populations for the genetic algorithm are created by selecting random code vectors from the training set for the codebooks, and IP-HMM performs the recognition. Novelty enters through the crossover genetic operation. The proposed speech recognition technique offers 97.14% accuracy.

  14. Multiscale Signal Analysis and Modeling

    CERN Document Server

    Zayed, Ahmed

    2013-01-01

    Multiscale Signal Analysis and Modeling presents recent advances in multiscale analysis and modeling using wavelets and other systems. This book also presents applications in digital signal processing using sampling theory and techniques from various function spaces, filter design, feature extraction and classification, signal and image representation/transmission, coding, nonparametric statistical signal processing, and statistical learning theory. This book also: discusses recently developed signal modeling techniques, such as the multiscale method for complex time series modeling, multiscale positive density estimations, Bayesian shrinkage strategies, and algorithms for data-adaptive statistics; introduces new sampling algorithms for multidimensional signal processing; provides comprehensive coverage of wavelets with presentations on waveform design and modeling, wavelet analysis of ECG signals, and wavelet filters; reviews feature extraction and classification algorithms for multiscale signal and image proce...

  15. Identification of Nonlinear Controls in a Developmental Model of Motor Speech in Hearing and Deaf Children.

    Science.gov (United States)

    Waldron, Manjula B.

    1982-01-01

    A quantitative model of speech development is proposed based on observations of normal hearing and congenitally deaf children. Nonlinear controls used during the development of suprasegmental and segmental aspects of speech are identified. Linguistic components of speech are ignored. The importance of the associative cortex in speech-motor control…

  16. Direct classification of all American English phonemes using signals from functional speech motor cortex

    Science.gov (United States)

    Mugler, Emily M.; Patton, James L.; Flint, Robert D.; Wright, Zachary A.; Schuele, Stephan U.; Rosenow, Joshua; Shih, Jerry J.; Krusienski, Dean J.; Slutzky, Marc W.

    2014-06-01

    Objective. Although brain-computer interfaces (BCIs) can be used in several different ways to restore communication, communicative BCI has not approached the rate or efficiency of natural human speech. Electrocorticography (ECoG) has precise spatiotemporal resolution that enables recording of brain activity distributed over a wide area of cortex, such as during speech production. In this study, we sought to decode elements of speech production using ECoG. Approach. We investigated words that contain the entire set of phonemes in the general American accent using ECoG with four subjects. Using a linear classifier, we evaluated the degree to which individual phonemes within each word could be correctly identified from cortical signal. Main results. We classified phonemes with up to 36% accuracy when classifying all phonemes and up to 63% accuracy for a single phoneme. Further, misclassified phonemes follow articulation organization described in phonology literature, aiding classification of whole words. Precise temporal alignment to phoneme onset was crucial for classification success. Significance. We identified specific spatiotemporal features that aid classification, which could guide future applications. Word identification was equivalent to information transfer rates as high as 3.0 bits/s (33.6 words/min), supporting pursuit of speech articulation for BCI control.

  17. Neural Entrainment to Speech Modulates Speech Intelligibility

    OpenAIRE

    Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne

    2018-01-01

    Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and acoustic speech signal, listening task, and speech intelligibility have been observed repeatedly. However, a methodological bottleneck has so far prevented clarifying whether speech-brain entrainme...

  18. Segment-Based Acoustic Models for Continuous Speech Recognition

    Science.gov (United States)

    1993-04-05

  19. Shortlist B: A Bayesian model of continuous speech recognition

    NARCIS (Netherlands)

    Norris, D.; McQueen, J.M.

    2008-01-01

    A Bayesian model of continuous speech recognition is presented. It is based on Shortlist (D. Norris, 1994; D. Norris, J. M. McQueen, A. Cutler, & S. Butterfield, 1997) and shares many of its key assumptions: parallel competitive evaluation of multiple lexical hypotheses, phonologically abstract

  20. Shortlist B: A Bayesian Model of Continuous Speech Recognition

    Science.gov (United States)

    Norris, Dennis; McQueen, James M.

    2008-01-01

    A Bayesian model of continuous speech recognition is presented. It is based on Shortlist (D. Norris, 1994; D. Norris, J. M. McQueen, A. Cutler, & S. Butterfield, 1997) and shares many of its key assumptions: parallel competitive evaluation of multiple lexical hypotheses, phonologically abstract prelexical and lexical representations, a feedforward…

  1. A Block-Based Linear MMSE Noise Reduction with a High Temporal Resolution Modeling of the Speech Excitation

    DEFF Research Database (Denmark)

    Li, Chunjian; Andersen, S. V.

    2005-01-01

    A comprehensive linear minimum mean squared error (LMMSE) approach for parametric speech enhancement is developed. The proposed algorithms aim at joint LMMSE estimation of signal power spectra and phase spectra, as well as exploitation of correlation between spectral components. The major cause...... of this interfrequency correlation is shown to be the prominent temporal power localization in the excitation of voiced speech. LMMSE estimators in time domain and frequency domain are first formulated. To obtain the joint estimator, we model the spectral signal covariance matrix as a full covariance matrix instead...

  2. Syntheses by rules of the speech signal in its amplitude-time representation - melody study - phonetic, translation program

    International Nuclear Information System (INIS)

    Santamarina, Carole

    1975-01-01

    The present paper deals with real-time speech synthesis implemented on a minicomputer. A first program translates the orthographic text into a string of phonetic codes, which is then processed by the synthesis program itself. The method used, a synthesis by rules, directly computes the speech signal in its amplitude-time representation. Emphasis has been put on special cases (diphthongs, 'e muet', consonant-consonant transitions) and on the implementation of rhythm and melody. (author) [fr

  3. Words in Puddles of Sound: Modelling Psycholinguistic Effects in Speech Segmentation

    Science.gov (United States)

    Monaghan, Padraic; Christiansen, Morten H.

    2010-01-01

    There are numerous models of how speech segmentation may proceed in infants acquiring their first language. We present a framework for considering the relative merits and limitations of these various approaches. We then present a model of speech segmentation that aims to reveal important sources of information for speech segmentation, and to…

  4. Prediction of speech masking release for fluctuating interferers based on the envelope power signal-to-noise ratio

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2012-01-01

    EPSM are compared to data from Kjems et al. [(2009). J. Acoust. Soc. Am. 126 (3), 1415-1426] where speech is mixed with four different interferers, including speech-shaped noise, bottle noise, car noise, and a highly non-stationary cafe noise. The model accounts well for the differences in intelligibility observed...

  5. Specific acoustic models for spontaneous and dictated style in indonesian speech recognition

    Science.gov (United States)

    Vista, C. B.; Satriawan, C. H.; Lestari, D. P.; Widyantoro, D. H.

    2018-03-01

    The performance of an automatic speech recognition system is affected by differences in speech style between the data the model is originally trained upon and incoming speech to be recognized. In this paper, the usage of GMM-HMM acoustic models for specific speech styles is investigated. We develop two systems for the experiments; the first employs a speech style classifier to predict the speech style of incoming speech, either spontaneous or dictated, then decodes this speech using an acoustic model specifically trained for that speech style. The second system uses both acoustic models to recognise incoming speech and decides upon a final result by calculating a confidence score of decoding. Results show that training specific acoustic models for spontaneous and dictated speech styles confers a slight recognition advantage as compared to a baseline model trained on a mixture of spontaneous and dictated training data. In addition, the speech style classifier approach of the first system produced slightly more accurate results than the confidence scoring employed in the second system.

  6. The Combined Effect of Signal Strength and Background Traffic Load on Speech Quality in IEEE 802.11 WLAN

    Directory of Open Access Journals (Sweden)

    P. Pocta

    2011-04-01

    Full Text Available This paper deals with measurements of the combined effect of signal strength and background traffic load on speech quality in IEEE 802.11 WLAN. The ITU-T G.729AB encoding scheme is deployed in this study and the Distributed Internet Traffic Generator (D-ITG) is used for the purpose of background traffic generation. The speech quality and background traffic load are assessed by means of the PESQ algorithm and the Wireshark network analyzer, respectively. The results show that background traffic load has a somewhat higher impact on speech quality than signal strength when both effects are present together. Moreover, background traffic load also partially masks the impact of signal strength. The reasons for these findings are discussed in detail. The results also suggest some implications for designers of wireless networks providing VoIP service.

  7. Shortlist B: A Bayesian model of continuous speech recognition

    OpenAIRE

    Norris, D.; McQueen, J.

    2008-01-01

    A Bayesian model of continuous speech recognition is presented. It is based on Shortlist ( D. Norris, 1994; D. Norris, J. M. McQueen, A. Cutler, & S. Butterfield, 1997) and shares many of its key assumptions: parallel competitive evaluation of multiple lexical hypotheses, phonologically abstract prelexical and lexical representations, a feedforward architecture with no online feedback, and a lexical segmentation algorithm based on the viability of chunks of the input as possible words. Shortl...

  8. Speech Compression

    Directory of Open Access Journals (Sweden)

    Jerry D. Gibson

    2016-06-01

    Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.

  9. The impact of compression of speech signal, background noise and acoustic disturbances on the effectiveness of speaker identification

    Science.gov (United States)

    Kamiński, K.; Dobrowolski, A. P.

    2017-04-01

    The paper presents the architecture and the results of optimization of selected elements of an Automatic Speaker Recognition (ASR) system that uses Gaussian Mixture Models (GMM) in the classification process. Optimization was performed on the selection of individual features, using a genetic algorithm, and on the parameters of the Gaussian distributions used to describe individual voices. The developed system was tested to evaluate the impact of different compression methods, used among others in landline, mobile, and VoIP telephony systems, on the effectiveness of speaker identification. Results are also presented on the effectiveness of speaker identification at specific levels of noise in the speech signal and in the presence of other disturbances that can occur during phone calls, which makes it possible to delimit the range of applications of the presented ASR system.
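
    A minimal sketch of the GMM identification core of such a system: one mixture model per enrolled speaker, with the decision made by maximum average frame log-likelihood. The component count and covariance type are assumptions, and the paper's genetic feature selection stage is omitted.

```python
from sklearn.mixture import GaussianMixture

def train_speaker_models(features_by_speaker, n_components=16):
    """Fit one diagonal-covariance GMM per speaker on that speaker's
    feature frames (e.g., cepstral vectors)."""
    return {spk: GaussianMixture(n_components, covariance_type='diag').fit(X)
            for spk, X in features_by_speaker.items()}

def identify(models, test_features):
    """Return the speaker whose GMM scores the test utterance highest."""
    scores = {spk: gmm.score(test_features) for spk, gmm in models.items()}
    return max(scores, key=scores.get)
```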

  10. A multi-resolution envelope-power based model for speech intelligibility

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Ewert, Stephan D.; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM) presented by Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] estimates the envelope power signal-to-noise ratio (SNRenv) after modulation-frequency selective processing. Changes in this metric were shown to account well...... to conditions with stationary interferers, due to the long-term integration of the envelope power, and cannot account for increased intelligibility typically obtained with fluctuating maskers. Here, a multi-resolution version of the sEPSM is presented where the SNRenv is estimated in temporal segments...... with a modulation-filter dependent duration. The multi-resolution sEPSM is demonstrated to account for intelligibility obtained in conditions with stationary and fluctuating interferers, and noisy speech distorted by reverberation or spectral subtraction. The results support the hypothesis that the SNRenv...
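
    A toy illustration of the multi-resolution SNRenv idea: the envelope power of the noisy speech and of the noise alone is compared in one modulation band, in segments whose duration is the inverse of the modulation center frequency. The real sEPSM adds a peripheral (gammatone) filterbank and a full modulation filterbank, which this sketch omits; all parameters are illustrative.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def envelope(x, fs, cutoff=150.0):
    """Low-pass-filtered Hilbert envelope."""
    sos = butter(2, cutoff, btype='lowpass', fs=fs, output='sos')
    return sosfiltfilt(sos, np.abs(hilbert(x)))

def snr_env(noisy_speech, noise, fs, mod_fc=4.0):
    """Segment-wise envelope-domain SNR in one modulation band; the
    segment length scales with 1/mod_fc (the multi-resolution idea)."""
    band = [mod_fc / np.sqrt(2), mod_fc * np.sqrt(2)]  # one-octave band
    sos = butter(2, band, btype='bandpass', fs=fs, output='sos')
    env_sn = sosfiltfilt(sos, envelope(noisy_speech, fs))
    env_n = sosfiltfilt(sos, envelope(noise, fs))
    seg = max(1, int(fs / mod_fc))
    snrs = []
    for i in range(0, len(env_sn) - seg + 1, seg):
        p_sn = np.mean(env_sn[i:i + seg] ** 2)  # S+N envelope power
        p_n = np.mean(env_n[i:i + seg] ** 2)    # noise-alone envelope power
        snrs.append(max(p_sn - p_n, 1e-10) / max(p_n, 1e-10))
    return np.array(snrs)
```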

  11. Subspace-Based Noise Reduction for Speech Signals via Diagonal and Triangular Matrix Decompositions: Survey and Analysis

    Directory of Open Access Journals (Sweden)

    Søren Holdt Jensen

    2007-01-01

    Full Text Available We survey the definitions and use of rank-revealing matrix decompositions in single-channel noise reduction algorithms for speech signals. Our algorithms are based on the rank-reduction paradigm and, in particular, signal subspace techniques. The focus is on practical working algorithms, using both diagonal (eigenvalue and singular value) decompositions and rank-revealing triangular decompositions (ULV, URV, VSV, ULLV, and ULLIV). In addition, we show how the subspace-based algorithms can be analyzed and compared by means of simple FIR filter interpretations. The algorithms are illustrated with working Matlab code and applications in speech processing.
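
    The survey illustrates its algorithms with working Matlab code; as a language-neutral taste of the simplest family member, here is a plain least-squares truncated-SVD version operating on a Hankel embedding of the noisy frame (frame length and rank are arbitrary choices, not values from the survey).

```python
import numpy as np

def subspace_denoise(noisy, frame_len=64, rank=14):
    """Embed the signal in a Hankel-structured matrix, keep the `rank`
    largest singular components (the signal subspace), and reconstruct
    by averaging along anti-diagonals."""
    n = len(noisy)
    H = np.lib.stride_tricks.sliding_window_view(noisy, frame_len)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    s[rank:] = 0.0                     # discard the noise subspace
    Hk = (U * s) @ Vt
    out, counts = np.zeros(n), np.zeros(n)
    for i in range(H.shape[0]):        # anti-diagonal averaging
        out[i:i + frame_len] += Hk[i]
        counts[i:i + frame_len] += 1
    return out / counts
```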

  12. Study of wavelet packet energy entropy for emotion classification in speech and glottal signals

    Science.gov (United States)

    He, Ling; Lech, Margaret; Zhang, Jing; Ren, Xiaomei; Deng, Lihua

    2013-07-01

    Automatic speech emotion recognition has important applications in human-machine communication. The majority of current research in this area is focused on finding optimal feature parameters. In recent studies, several glottal features were examined as potential cues for emotion differentiation. In this study, a new type of feature parameter is proposed, which calculates the energy entropy of values within selected wavelet packet frequency bands. The modeling and classification tasks are conducted using the classical GMM algorithm. The experiments use two data sets: the Speech Under Simulated Emotion (SUSE) data set annotated with three different emotions (angry, neutral and soft) and the Berlin Emotional Speech (BES) database annotated with seven different emotions (angry, bored, disgust, fear, happy, sad and neutral). The average classification accuracy achieved for the SUSE data (74%-76%) is significantly higher than the accuracy achieved for the BES data (51%-54%). In both cases, the accuracy was significantly higher than the respective random guessing levels (33% for SUSE and 14.3% for BES).
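
    A compact sketch of the proposed feature, assuming PyWavelets is available; the wavelet, the decomposition depth, and the use of all terminal bands (rather than the paper's selected bands) are assumptions.

```python
import numpy as np
import pywt

def wp_energy_entropy(frame, wavelet='db4', level=4):
    """Shannon entropy of the normalized energies in the terminal
    wavelet-packet bands of one analysis frame."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, maxlevel=level)
    energies = np.array([np.sum(node.data ** 2)
                         for node in wp.get_level(level, order='natural')])
    p = energies / (energies.sum() + 1e-12)  # energy distribution over bands
    return -np.sum(p * np.log2(p + 1e-12))
```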

  13. Models of calcium signalling

    CERN Document Server

    Dupont, Geneviève; Kirk, Vivien; Sneyd, James

    2016-01-01

    This book discusses the ways in which mathematical, computational, and modelling methods can be used to help understand the dynamics of intracellular calcium. The concentration of free intracellular calcium is vital for controlling a wide range of cellular processes, and is thus of great physiological importance. However, because of the complex ways in which the calcium concentration varies, it is also of great mathematical interest. This book presents the general modelling theory as well as a large number of specific case examples, to show how mathematical modelling can interact with experimental approaches, in an interdisciplinary and multifaceted approach to the study of an important physiological control mechanism. Geneviève Dupont is FNRS Research Director at the Unit of Theoretical Chronobiology of the Université Libre de Bruxelles; Martin Falcke is head of the Mathematical Cell Physiology group at the Max Delbrück Center for Molecular Medicine, Berlin; Vivien Kirk is an Associate Professor in the Depar...

  14. Dimension-based quality modeling of transmitted speech

    CERN Document Server

    Wältermann, Marcel

    2013-01-01

    In this book, speech transmission quality is modeled on the basis of perceptual dimensions. The author identifies those dimensions that are relevant for today's public-switched and packet-based telecommunication systems, regarding the complete transmission path from the mouth of the speaker to the ear of the listener. Both narrowband (300-3400 Hz) and wideband (50-7000 Hz) speech transmission are taken into account. A new analytical assessment method is presented that allows the dimensions to be rated by non-expert listeners in a direct way. Due to the efficiency of the test method, a relatively large number of stimuli can be assessed in auditory tests. The test method is applied in two auditory experiments. The book gives evidence that this test method provides meaningful and reliable results. The resulting dimension scores, together with the respective overall quality ratings, form the basis for a new parametric model for the quality estimation of transmitted speech based on the perceptual dimensions. I...

  15. Modeling consonant-vowel coarticulation for articulatory speech synthesis.

    Directory of Open Access Journals (Sweden)

    Peter Birkholz

    Full Text Available A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically-optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied for the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%. This demonstrates the potential of the approach for highly intelligible articulatory speech synthesis.
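
    One plausible numerical reading of the weighting scheme (the authors' exact mapping may differ): solve for sum-to-one weights that express the context vowel in the subspace spanned by the corner vowels, then blend the consonant's three reference shapes with those weights. All variable names are illustrative.

```python
import numpy as np

def corner_vowel_weights(vowel_target, corner_targets):
    """Weights for a context vowel in the /a/, /i/, /u/ subspace, solved
    by least squares with a sum-to-one constraint appended as an extra row.
    corner_targets is a 3 x d array of corner-vowel parameter vectors."""
    A = np.vstack([corner_targets.T, np.ones(3)])
    b = np.concatenate([vowel_target, [1.0]])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

def consonant_shape(reference_shapes, weights):
    """Context-dependent consonant target as the weighted average of the
    three acoustically optimized reference shapes."""
    return np.tensordot(weights, reference_shapes, axes=1)
```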

  16. Refining a model of hearing impairment using speech psychophysics

    DEFF Research Database (Denmark)

    Jepsen, Morten Løve; Dau, Torsten; Ghitza, Oded

    2014-01-01

    The premise of this study is that models of hearing, in general, and of individual hearing impairment, in particular, can be improved by using speech test results as an integral part of the modeling process. A conceptual iterative procedure is presented which, for an individual, considers measures...... of sensitivity, cochlear compression, and phonetic confusions using the Diagnostic Rhyme Test (DRT) framework. The suggested approach is exemplified by presenting data from three hearing-impaired listeners and results obtained with models of the hearing impairment of the individuals. The work reveals...

  17. Modelling vocal anatomy's significant effect on speech

    NARCIS (Netherlands)

    de Boer, B.

    2010-01-01

    This paper investigates the effect of larynx position on the articulatory abilities of a humanlike vocal tract. Previous work has investigated models that were built to resemble the anatomy of existing species or fossil ancestors. This has led to conflicting conclusions about the relation between

  18. The TRACE Model of Speech Perception.

    Science.gov (United States)

    1984-11-01

  19. Prediction of speech masking release for fluctuating interferers based on the envelope power signal-to-noise ratio

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2012-01-01

    to conditions with stationary interferers due to the long-term estimation of the envelope power and cannot account for the well-known phenomenon of speech masking release. Here, a short-term version of the sEPSM is described [Jørgensen and Dau, 2012, in preparation], which estimates the SNRenv in short temporal...... segments. Predictions obtained with the short-term sEPSM are compared to data from Kjems et al. [(2009). J. Acoust. Soc. Am. 126 (3), 1415-1426] where speech is mixed with four different interferers, including speech-shaped noise, bottle noise, car noise, and a highly non-stationary cafe noise. The model...

  20. Evaluation of Short-Term Cepstral Based Features for Detection of Parkinson’s Disease Severity Levels through Speech signals

    Science.gov (United States)

    Oung, Qi Wei; Nisha Basah, Shafriza; Muthusamy, Hariharan; Vijean, Vikneswaran; Lee, Hoileong

    2018-03-01

    Parkinson’s disease (PD) is a progressive neurodegenerative disease known as a motor system syndrome, caused by the death of dopamine-generating cells in a region of the human midbrain. PD normally affects people over 60 years of age and at present affects a large part of the worldwide population. Recently, much research has examined the connection between PD and speech disorders. Studies have revealed that speech signals may be a suitable biomarker for distinguishing people with Parkinson’s (PWP) from healthy subjects. Therefore, early diagnosis of PD through speech signals can be considered for this aim. In this research, speech data are acquired, with speech behaviour as the biomarker, for differentiating PD severity levels (mild and moderate) from healthy subjects. The feature extraction algorithms applied are Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), and Weighted Linear Prediction Cepstral Coefficients (WLPCC). For classification, two types of classifiers are used: k-Nearest Neighbour (KNN) and Probabilistic Neural Network (PNN). The experimental results demonstrate that the PNN and KNN classifiers achieve best average classification performances of 92.63% and 88.56%, respectively, under 10-fold cross-validation. These techniques thus have the potential to become promising tools for PD detection.
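
    A minimal sketch of one such pipeline: MFCC statistics plus a KNN classifier under 10-fold cross-validation. librosa is assumed for feature extraction, and the paths, labels, and hyperparameters are placeholders.

```python
import numpy as np
import librosa  # assumed available for MFCC extraction
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def mfcc_features(path, n_mfcc=13):
    """Mean and standard deviation of the MFCCs over one recording."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Placeholder usage (0 = healthy, 1 = mild PD, 2 = moderate PD):
# X = np.array([mfcc_features(p) for p in paths])
# knn = KNeighborsClassifier(n_neighbors=5)
# print(cross_val_score(knn, X, labels, cv=10).mean())  # 10-fold accuracy
```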

  1. Speech Signal and Facial Image Processing for Obstructive Sleep Apnea Assessment

    Directory of Open Access Journals (Sweden)

    Fernando Espinoza-Cuadros

    2015-01-01

    Full Text Available Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). OSA is generally diagnosed through a costly procedure requiring an overnight stay of the patient at the hospital. This has led to proposing less costly procedures based on the analysis of patients’ facial images and voice recordings to help in OSA detection and severity assessment. In this paper we investigate the use of both image and speech processing to estimate the apnea-hypopnea index, AHI (which describes the severity of the condition), over a population of 285 male Spanish subjects suspected to suffer from OSA and referred to a Sleep Disorders Unit. Photographs and voice recordings were collected in a supervised but not highly controlled way, trying to test a scenario close to an OSA assessment application running on a mobile device (i.e., smartphones or tablets). Spectral information in speech utterances is modeled by a state-of-the-art low-dimensional acoustic representation, called i-vector. A set of local craniofacial features related to OSA are extracted from images after detecting facial landmarks using Active Appearance Models (AAMs). Support vector regression (SVR) is applied on facial features and i-vectors to estimate the AHI.
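
    The final regression step reduces to a few lines; the feature concatenation mirrors the abstract, while the kernel and regularization constant below are assumptions rather than the paper's settings.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def fit_ahi_regressor(i_vectors, facial_features, ahi):
    """Regress the apnea-hypopnea index on concatenated speech i-vectors
    and craniofacial measurements with support vector regression."""
    X = np.hstack([i_vectors, facial_features])
    model = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=10.0))
    return model.fit(X, ahi)
```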

  2. Degraded neural and behavioral processing of speech sounds in a rat model of Rett syndrome.

    Science.gov (United States)

    Engineer, Crystal T; Rahebi, Kimiya C; Borland, Michael S; Buell, Elizabeth P; Centanni, Tracy M; Fink, Melyssa K; Im, Kwok W; Wilson, Linda G; Kilgard, Michael P

    2015-11-01

    Individuals with Rett syndrome have greatly impaired speech and language abilities. Auditory brainstem responses to sounds are normal, but cortical responses are highly abnormal. In this study, we used the novel rat Mecp2 knockout model of Rett syndrome to document the neural and behavioral processing of speech sounds. We hypothesized that both speech discrimination ability and the neural response to speech sounds would be impaired in Mecp2 rats. We expected that extensive speech training would improve speech discrimination ability and the cortical response to speech sounds. Our results reveal that speech responses across all four auditory cortex fields of Mecp2 rats were hyperexcitable, responded slower, and were less able to follow rapidly presented sounds. While Mecp2 rats could accurately perform consonant and vowel discrimination tasks in quiet, they were significantly impaired at speech sound discrimination in background noise. Extensive speech training improved discrimination ability. Training shifted cortical responses in both Mecp2 and control rats to favor the onset of speech sounds. While training increased the response to low frequency sounds in control rats, the opposite occurred in Mecp2 rats. Although neural coding and plasticity are abnormal in the rat model of Rett syndrome, extensive therapy appears to be effective. These findings may help to explain some aspects of communication deficits in Rett syndrome and suggest that extensive rehabilitation therapy might prove beneficial. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. Mathematical Modelling Plant Signalling Networks

    KAUST Repository

    Muraro, D.

    2013-01-01

    During the last two decades, molecular genetic studies and the completion of the sequencing of the Arabidopsis thaliana genome have increased knowledge of hormonal regulation in plants. These signal transduction pathways act in concert through gene regulatory and signalling networks whose main components have begun to be elucidated. Our understanding of the resulting cellular processes is hindered by the complex, and sometimes counter-intuitive, dynamics of the networks, which may be interconnected through feedback controls and cross-regulation. Mathematical modelling provides a valuable tool to investigate such dynamics and to perform in silico experiments that may not be easily carried out in a laboratory. In this article, we firstly review general methods for modelling gene and signalling networks and their application in plants. We then describe specific models of hormonal perception and cross-talk in plants. This mathematical analysis of sub-cellular molecular mechanisms paves the way for more comprehensive modelling studies of hormonal transport and signalling in a multi-scale setting. © EDP Sciences, 2013.

  4. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    Science.gov (United States)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.

  5. Least 1-Norm Pole-Zero Modeling with Sparse Deconvolution for Speech Analysis

    DEFF Research Database (Denmark)

    Shi, Liming; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2017-01-01

    In this paper, we present a speech analysis method based on sparse pole-zero modeling of speech. Instead of using the all-pole model to approximate the speech production filter, a pole-zero model is used for the combined effect of the vocal tract, radiation at the lips, and the glottal pulse shape......-zero, linear prediction and sparse linear prediction methods, experimental results show that the proposed speech analysis method has lower spectral distortion, higher reconstruction SNR and sparser residuals....

  6. The Cortical Organization of Speech Processing: Feedback Control and Predictive Coding in the Context of a Dual-Stream Model

    Science.gov (United States)

    Hickok, Gregory

    2012-01-01

    Speech recognition is an active process that involves some form of predictive coding. This statement is relatively uncontroversial. What is less clear is the source of the prediction. The dual-stream model of speech processing suggests that there are two possible sources of predictive coding in speech perception: the motor speech system and the…

  7. Comparison of Linear Prediction Models for Audio Signals

    Directory of Open Access Journals (Sweden)

    2009-03-01

    While linear prediction (LP) has become immensely popular in speech modeling, it does not seem to provide a good approach for modeling audio signals. This is somewhat surprising, since a tonal signal consisting of a number of sinusoids can be perfectly predicted based on an (all-pole) LP model with a model order that is twice the number of sinusoids. We provide an explanation why this result cannot simply be extrapolated to LP of audio signals. If noise is taken into account in the tonal signal model, a low-order all-pole model appears to be appropriate only when the tonal components are uniformly distributed in the Nyquist interval. Based on this observation, different alternatives to the conventional LP model can be suggested. Either the model should be changed to a pole-zero, a high-order all-pole, or a pitch prediction model, or the conventional LP model should be preceded by an appropriate frequency transform, such as a frequency warping or downsampling. By comparing these alternative LP models to the conventional LP model in terms of frequency estimation accuracy, residual spectral flatness, and perceptual frequency resolution, we obtain several new and promising approaches to LP-based audio modeling.
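
    The claim about sinusoids is easy to verify numerically. The sketch below is a minimal demonstration under stated assumptions (least-squares, covariance-method LP on an arbitrary two-sinusoid signal, not any setup from the article): an all-pole predictor of order 2K drives the residual of K sinusoids to round-off level.

    ```python
    import numpy as np

    # A sum of K real sinusoids obeys an exact order-2K linear recursion, so an
    # all-pole predictor of order 2K cancels it perfectly. Signal values here
    # are arbitrary illustrations.
    n = np.arange(1024)
    x = np.sin(0.2 * np.pi * n) + 0.5 * np.sin(0.34 * np.pi * n)  # K = 2 sinusoids

    order = 4  # 2K
    # Design matrix: column k holds x[t-k] for t = order .. N-1
    X = np.column_stack([x[order - k: len(x) - k] for k in range(1, order + 1)])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)  # predictor coefficients

    residual = y - X @ a
    print(np.max(np.abs(residual)))  # round-off level (~1e-12): prediction is exact
    ```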

  8. A Novel Method for Classifying Body Mass Index on the Basis of Speech Signals for Future Clinical Applications: A Pilot Study

    Directory of Open Access Journals (Sweden)

    Bum Ju Lee

    2013-01-01

    Obesity is a serious public health problem because of the risk factors for diseases and psychological problems. The focus of this study is to diagnose a patient's BMI (body mass index) status without weight and height measurements, for use in future clinical applications. In this paper, we first propose a method for classifying normal and overweight individuals using only speech signals. We also perform a statistical analysis of the features from speech signals. Based on 1830 subjects, the accuracy and AUC (area under the ROC curve) of age- and gender-specific classifications ranged from 60.4 to 73.8% and from 0.628 to 0.738, respectively. We identified several features that were significantly different between normal and overweight subjects (P < 0.05). We also found compact and discriminatory feature subsets for building models for diagnosing normal or overweight individuals through wrapper-based feature subset selection. Our results showed that predicting BMI status is possible using a combination of speech features, even though significant features are rare and weak in age- and gender-specific groups, and that the classification accuracy with feature selection was higher than without feature selection. Our method has the potential to be used in future clinical applications such as automatic BMI diagnosis in telemedicine or remote healthcare.

  9. DEVELOPING MODIFIED SCAFFOLDING MODEL TO ELICIT LEARNERS' SPEECH PRODUCTION

    Directory of Open Access Journals (Sweden)

    Inti Englishtina

    2017-04-01

    children's English speech production. It is aimed at describing what the teachers need in eliciting their students' speech production; how a scaffolding model should be developed to elicit the children's speech production; and how effective the scaffolding model is in eliciting the children's speech production. The subjects of the study are teachers and students of a kindergarten at Mondial School Semarang, Indonesia. Preliminary research was conducted to describe what the teachers need to elicit their students' speech production. Referring to the needs analysis, a scaffolding model was developed to elicit the children's speech production. To assess the effectiveness of the model, a try-out of the developed model was carried out. Based on the result of the try-out, a final model was developed. The findings of the preliminary research suggest that Mondial School kindergarten teachers need a scaffolding model to elicit their students' speech production. Referring to these findings, a scaffolding model based on the speech functions proposed by Celce-Murcia et al. (1997) was developed. To assess the effectiveness of the model, the initial model was tried out. Based on the result of the try-out, the final scaffolding model was developed. This study concludes that kindergarten teachers of Mondial School need a scaffolding model to elicit their children's English speech production. Based on the needs analysis, a Modified Scaffolding Model was developed. Referring to the results of the try-out steps, it is reasonable to argue that this Scaffolding Model is effective in eliciting English speech production of kindergarten students of Mondial School. As teachers are used to helping learners to bridge a cognitive

  10. The Linear Model Research on Tibetan Six-Character Poetry's Respiratory Signal

    Science.gov (United States)

    Yonghong, Li; Yangrui, Yang; Lei, Guo; Hongzhi, Yu

    In this paper, we studied the respiratory signal of Tibetan six-character poems during reading from a physiological perspective. Main contents include: 1) we selected 40 representative Tibetan six-character, four-line poems from "The Love-songs of the 6th Dalai Lama Tshang-yang Gya-tsho" and recorded speech sounds, voice and respiratory signals; 2) we designed a set of respiratory signal parameters for the study of poetry; 3) we extracted the relevant parameters of the poetry respiratory signal using a well-established respiratory signal processing platform; 4) we studied the types of breathing patterns and established a linear model of the poetry respiratory signal.

  11. Model-based Sparse Component Analysis for Multiparty Distant Speech Recognition

    OpenAIRE

    Asaei, Afsaneh

    2013-01-01

    This research takes place in the general context of improving the performance of Distant Speech Recognition (DSR) systems, tackling reverberation and the recognition of overlapping speech. Perceptual modeling indicates that sparse representation exists in the auditory cortex. The present project thus builds upon the hypothesis that incorporating this information in DSR front-end processing could improve speech recognition performance in realistic conditions including overlap and reverbera...

  12. Phonological representations are unconsciously used when processing complex, non-speech signals.

    Directory of Open Access Journals (Sweden)

    Mahan Azadpour

    Neuroimaging studies of speech processing increasingly rely on artificial speech-like sounds whose perceptual status as speech or non-speech is assigned by simple subjective judgments; brain activation patterns are interpreted according to these status assignments. The naïve perceptual status of one such stimulus, spectrally-rotated speech (not consciously perceived as speech by naïve subjects), was evaluated in discrimination and forced identification experiments. Discrimination of variation in spectrally-rotated syllables in one group of naïve subjects was strongly related to the pattern of similarities in phonological identification of the same stimuli provided by a second, independent group of naïve subjects, suggesting either (1) that naïve rotated syllable perception involves phonetic-like processing, or (2) that perception is solely based on physical acoustic similarity, and similar sounds are given similar phonetic identities. Analysis of acoustic (Euclidean distances of center frequency values of formants) and phonetic similarities in the perception of the vowel portions of the rotated syllables revealed that discrimination was significantly and independently influenced by both acoustic and phonological information. We conclude that simple subjective assessments of artificial speech-like sounds can be misleading, as perception of such sounds may initially and unconsciously utilize speech-like, phonological processing.

  13. Auditory-motor interactions in pediatric motor speech disorders: neurocomputational modeling of disordered development.

    Science.gov (United States)

    Terband, H; Maassen, B; Guenther, F H; Brumberg, J

    2014-01-01

    Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between neurological deficits in auditory and motor processes using computational modeling with the DIVA model. In a series of computer simulations, we investigated the effect of a motor processing deficit alone (MPD), and the effect of a motor processing deficit in combination with an auditory processing deficit (MPD+APD), on the trajectory and endpoint of speech motor development in the DIVA model. Simulation results showed that a motor programming deficit predominantly leads to deterioration on the phonological level (phonemic mappings) when auditory self-monitoring is intact, and on the systemic level (systemic mapping) if auditory self-monitoring is impaired. These findings suggest a close relation between the quality of auditory self-monitoring and the involvement of phonological vs. motor processes in children with pediatric motor speech disorders. It is suggested that MPD+APD might be involved in typically apraxic speech output disorders and MPD in pediatric motor speech disorders that also have a phonological component. Possibilities to verify these hypotheses using empirical data collected from human subjects are discussed. The reader will be able to: (1) identify the difficulties in studying disordered speech motor development; (2) describe the differences in speech motor characteristics between SSD and subtype CAS; (3) describe the different types of learning that occur in the sensory-motor system during babbling and early speech acquisition; (4) identify the neural control subsystems involved in speech production; (5) describe the potential role of auditory self-monitoring in developmental speech disorders.

  14. End-to-End Neural Segmental Models for Speech Recognition

    Science.gov (United States)

    Tang, Hao; Lu, Liang; Kong, Lingpeng; Gimpel, Kevin; Livescu, Karen; Dyer, Chris; Smith, Noah A.; Renals, Steve

    2017-12-01

    Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time. Neural segmental models are segmental models that use neural network-based weight functions. Neural segmental models have achieved competitive results for speech recognition, and their end-to-end training has been explored in several studies. In this work, we review neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder. We study end-to-end segmental models with different weight functions, including ones based on frame-level neural classifiers and on segmental recurrent neural networks. We study how reducing the search space size impacts performance under different weight functions. We also compare several loss functions for end-to-end training. Finally, we explore training approaches, including multi-stage vs. end-to-end training and multitask training that combines segmental and frame-level losses.

  15. A Block-Based Linear MMSE Noise Reduction with a High Temporal Resolution Modeling of the Speech Excitation

    DEFF Research Database (Denmark)

    Li, Chunjian; Andersen, S. V.

    2005-01-01

    A comprehensive linear minimum mean squared error (LMMSE) approach for parametric speech enhancement is developed. The proposed algorithms aim at joint LMMSE estimation of signal power spectra and phase spectra, as well as exploitation of correlation between spectral components. The major cause...... of this interfrequency correlation is shown to be the prominent temporal power localization in the excitation of voiced speech. LMMSE estimators in the time domain and frequency domain are first formulated. To obtain the joint estimator, we model the spectral signal covariance matrix as a full covariance matrix instead...... coefficients, and the excitation matrix is built from estimates of the instantaneous power of the excitation sequence. A decision-directed power spectral subtraction method and a modified multipulse linear predictive coding (MPLPC) method are used in these estimations, respectively. The spectral domain

  16. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence, the end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques and is often used interchangeably with speech coding is the term voice coding. This term is more generic in the sense that the

  17. Reanalyzing neurocognitive data on the role of the motor system in speech perception within COSMO, a Bayesian perceptuo-motor model of speech communication.

    Science.gov (United States)

    Barnaud, Marie-Lou; Bessière, Pierre; Diard, Julien; Schwartz, Jean-Luc

    2017-12-11

    While neurocognitive data provide clear evidence for the involvement of the motor system in speech perception, its precise role and the way motor information is involved in perceptual decision remain unclear. In this paper, we discuss some recent experimental results in light of COSMO, a Bayesian perceptuo-motor model of speech communication. COSMO enables us to model both speech perception and speech production with probability distributions relating phonological units with sensory and motor variables. Speech perception is conceived as a sensory-motor architecture combining an auditory and a motor decoder thanks to a Bayesian fusion process. We propose the sketch of a neuroanatomical architecture for COSMO, and we capitalize on properties of the auditory vs. motor decoders to address three neurocognitive studies of the literature. Altogether, this computational study reinforces functional arguments supporting the role of a motor decoding branch in the speech perception process.
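
    As a hedged illustration of the fusion idea described above (not the actual COSMO implementation), the sketch below combines an auditory and a motor likelihood over a small discrete phoneme set by Bayesian multiplication; all distributions and values are invented for the example.

    ```python
    import numpy as np

    # Hypothetical setup: 3 phonological units, one 1-D sensory variable. The
    # auditory and motor branches each supply a likelihood p(s | phi); fusion
    # multiplies them with the prior. Means and variances are illustrative.
    phonemes = ["ba", "da", "ga"]
    prior = np.array([1 / 3, 1 / 3, 1 / 3])

    def gaussian(s, mu, sigma):
        return np.exp(-0.5 * ((s - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    aud_mu, aud_sigma = np.array([-1.0, 0.0, 1.0]), 0.5  # auditory decoder (sharp)
    mot_mu, mot_sigma = np.array([-0.8, 0.1, 0.9]), 0.8  # motor decoder (broader)

    s = 0.2  # observed sensory input
    posterior = prior * gaussian(s, aud_mu, aud_sigma) * gaussian(s, mot_mu, mot_sigma)
    posterior /= posterior.sum()
    print(dict(zip(phonemes, posterior.round(3))))
    ```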

  18. An introduction to statistical parametric speech synthesis

    Indian Academy of Sciences (India)

    in speech synthesis by listening tests, depends critically on choosing an appropriate configuration. The two most important aspects of this configuration are the parameterization of the speech signal (the 'observations' of the model, in HMM terminology) and the choice of modelling unit. Since the modelling unit is typically ...

  19. Auditory-model based robust feature selection for speech recognition.

    Science.gov (United States)

    Koniaris, Christos; Kuropatwinski, Marcin; Kleijn, W Bastiaan

    2010-02-01

    It is shown that robust dimension-reduction of a feature set for speech recognition can be based on a model of the human auditory system. Whereas conventional methods optimize classification performance, the proposed method exploits knowledge implicit in the auditory periphery, inheriting its robustness. Features are selected to maximize the similarity of the Euclidean geometry of the feature domain and the perceptual domain. Recognition experiments using mel-frequency cepstral coefficients (MFCCs) confirm the effectiveness of the approach, which does not require labeled training data. For noisy data the method outperforms commonly used discriminant-analysis based dimension-reduction methods that rely on labeling. The results indicate that selecting MFCCs in their natural order results in subsets with good performance.
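
    The selection criterion described above matches feature-space geometry to perceptual geometry. The sketch below is one plausible greedy realization of that idea, with correlation of pairwise distances standing in for the paper's similarity measure; the feature matrix and perceptual distances are assumed inputs, not the paper's data.

    ```python
    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import pearsonr

    def select_features(feats, perc_dist, n_keep):
        """Greedy selection: grow the subset whose Euclidean pairwise distances
        best correlate with given perceptual distances (condensed form).

        feats:     (n_frames, n_features) candidate features, e.g. MFCCs
        perc_dist: condensed pairwise distances from an auditory model
        """
        chosen, remaining = [], list(range(feats.shape[1]))
        while len(chosen) < n_keep:
            scores = [pearsonr(pdist(feats[:, chosen + [f]]), perc_dist)[0]
                      for f in remaining]
            best = remaining[int(np.argmax(scores))]
            chosen.append(best)
            remaining.remove(best)
        return chosen

    # Synthetic check: "perceptual" distances built from features 0-2 only.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(40, 13))              # e.g. 13 MFCCs over 40 frames
    perc = pdist(feats[:, :3])
    print(select_features(feats, perc, n_keep=3))  # typically recovers {0, 1, 2}
    ```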

  20. Impaired motor inhibition in adults who stutter - evidence from speech-free stop-signal reaction time tasks.

    Science.gov (United States)

    Markett, Sebastian; Bleek, Benjamin; Reuter, Martin; Prüss, Holger; Richardt, Kirsten; Müller, Thilo; Yaruss, J Scott; Montag, Christian

    2016-10-01

    Idiopathic stuttering is a fluency disorder characterized by impairments during speech production. Deficits in the motor control circuits of the basal ganglia have been implicated in idiopathic stuttering, but it is unclear how these impairments relate to the disorder. Previous work has indicated a possible deficiency in motor inhibition in children who stutter. To extend these findings to adults, we designed two experiments to probe executive motor control in people who stutter using manual reaction time tasks that do not rely on speech production. We used two versions of the stop-signal reaction time task, a measure of inhibitory motor control that has been shown to rely on the basal ganglia circuits. We show increased stop-signal reaction times in two independent samples of adults who stutter compared to age- and sex-matched control groups. Additional measures involved simple reaction time measurements and a task-switching task, where no group difference was detected. Results indicate a deficiency in inhibitory motor control in people who stutter in a task that does not rely on overt speech production and cannot be explained by general deficits in executive control or speeded motor execution. This finding establishes the stop-signal reaction time as a possible target for future experimental and neuroimaging studies on fluency disorders and is a further step towards unraveling the contribution of motor control deficits to idiopathic stuttering.
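
    For readers unfamiliar with the paradigm, stop-signal reaction time cannot be observed directly and is usually estimated. The sketch below shows the standard integration-method estimate with invented numbers; it is not the specific analysis pipeline of this study.

    ```python
    import numpy as np

    def ssrt_integration(go_rts, p_respond_given_stop, mean_ssd):
        """Stop-signal reaction time via the standard integration method.

        go_rts:                reaction times on go trials (ms)
        p_respond_given_stop:  proportion of failed inhibitions on stop trials
        mean_ssd:              mean stop-signal delay (ms)
        """
        # SSRT = nth quantile of the go-RT distribution minus mean SSD,
        # where n equals the probability of responding on stop trials.
        nth_rt = np.quantile(go_rts, p_respond_given_stop)
        return nth_rt - mean_ssd

    # Illustrative numbers (not from the study):
    rng = np.random.default_rng(0)
    go_rts = rng.normal(450, 80, size=200)
    print(ssrt_integration(go_rts, p_respond_given_stop=0.5, mean_ssd=220))  # ~230 ms
    ```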

  1. New Results on Single-Channel Speech Separation Using Sinusoidal Modeling

    DEFF Research Database (Denmark)

    Mowlaee, Pejman; Christensen, Mads Græsbøll; Jensen, Søren Holdt

    2011-01-01

    We present new results on single-channel speech separation and suggest a new separation approach to improve the speech quality of separated signals from an observed mixture. The key idea is to derive a mixture estimator based on sinusoidal parameters. The proposed estimator is aimed at finding...... the proposed method over other methods are confirmed by employing perceptual evaluation of speech quality (PESQ) as an objective measure and a MUSHRA listening test as a subjective evaluation for both speaker-dependent and gender-dependent scenarios....

  2. Relationship between speech recognition in noise and sparseness.

    Science.gov (United States)

    Li, Guoping; Lutman, Mark E; Wang, Shouyan; Bleeck, Stefan

    2012-02-01

    Established methods for predicting speech recognition in noise require knowledge of clean speech signals, placing limitations on their application. The study evaluates an alternative approach based on characteristics of noisy speech, specifically its sparseness as represented by the statistic kurtosis. Experiments 1 and 2 involved acoustic analysis of vowel-consonant-vowel (VCV) syllables in babble noise, comparing kurtosis, glimpsing areas, and extended speech intelligibility index (ESII) of noisy speech signals with one another and with pre-existing speech recognition scores. Experiment 3 manipulated kurtosis of VCV syllables and investigated effects on speech recognition scores in normal-hearing listeners. Pre-existing speech recognition data for Experiments 1 and 2; seven normal-hearing participants for Experiment 3. Experiments 1 and 2 demonstrated that kurtosis calculated in the time-domain from noisy speech is highly correlated (r > 0.98) with established prediction models: glimpsing and ESII. All three measures predicted speech recognition scores well. The final experiment showed a clear monotonic relationship between speech recognition scores and kurtosis. Speech recognition performance in noise is closely related to the sparseness (kurtosis) of the noisy speech signal, at least for the types of speech and noise used here and for listeners with normal hearing.
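
    Since kurtosis is the study's central statistic, a minimal sketch of a time-domain, frame-wise kurtosis measure is given below; the frame length, hop size, and synthetic test signals are illustrative assumptions, not the study's settings.

    ```python
    import numpy as np
    from scipy.stats import kurtosis

    def framewise_kurtosis(x, frame_len=400, hop=200):
        """Mean excess kurtosis over short frames of a (noisy) speech waveform."""
        frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len, hop)]
        return float(np.mean([kurtosis(f) for f in frames]))

    # Sparser signals score higher: compare heavy-tailed vs Gaussian samples.
    rng = np.random.default_rng(1)
    print(framewise_kurtosis(rng.laplace(size=16000)))  # ~3 (heavy-tailed, sparse)
    print(framewise_kurtosis(rng.normal(size=16000)))   # ~0 (Gaussian baseline)
    ```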

  3. Perceptual centres in speech - an acoustic analysis

    Science.gov (United States)

    Scott, Sophie Kerttu

    Perceptual centres, or P-centres, represent the perceptual moments of occurrence of acoustic signals - the 'beat' of a sound. P-centres underlie the perception and production of rhythm in perceptually regular speech sequences. P-centres have been modelled both in speech and non-speech (music) domains. The three aims of this thesis were (a) to test current P-centre models to determine which best accounted for the experimental data; (b) to identify a candidate parameter to map P-centres onto (a local approach), as opposed to the previous global models, which rely upon the whole signal to determine the P-centre; and (c) to develop a model of P-centre location which could be applied to speech and non-speech signals. The first aim was investigated by a series of experiments in which (a) speech from different speakers was investigated to determine whether different models could account for variation between speakers; (b) the amplitude-time plot of a speech signal was rendered to determine whether this affects the P-centre of the signal; and (c) the amplitude at the offset of a speech signal was increased to determine whether this alters P-centres in the production and perception of speech. The second aim was carried out by (a) manipulating the rise time of different speech signals to determine whether the P-centre was affected, and whether the type of speech sound ramped affected the P-centre shift; (b) manipulating the rise time and decay time of a synthetic vowel to determine whether the onset alteration had more effect on the P-centre than the offset manipulation; and (c) varying the duration of a vowel to determine whether this affected the P-centre if other attributes (amplitude, spectral content) were held constant. The third aim - modelling P-centres - was based on these results. The Frequency-dependent Amplitude Increase Model of P-centre location (FAIM) was developed using a modelling protocol, the APU GammaTone Filterbank and the speech from different speakers. The P-centres of the stimuli corpus were highly predicted by attributes of

  4. Computational Neural Modeling of Speech Motor Control in Childhood Apraxia of Speech (CAS)

    Science.gov (United States)

    Terband, Hayo; Maassen, Ben; Guenther, Frank H.; Brumberg, Jonathan

    2009-01-01

    Purpose: Childhood apraxia of speech (CAS) has been associated with a wide variety of diagnostic descriptions and has been shown to involve different symptoms during successive stages of development. In the present study, the authors attempted to associate the symptoms of CAS in a particular developmental stage with particular…

  5. Computational neural modeling of speech motor control in childhood apraxia of speech (CAS).

    NARCIS (Netherlands)

    Terband, H.R.; Maassen, B.A.M.; Guenther, F.H.; Brumberg, J.

    2009-01-01

    PURPOSE: Childhood apraxia of speech (CAS) has been associated with a wide variety of diagnostic descriptions and has been shown to involve different symptoms during successive stages of development. In the present study, the authors attempted to associate the symptoms of CAS in a particular

  6. Perceptual organization of speech signals by children with and without dyslexia.

    Science.gov (United States)

    Nittrouer, Susan; Lowenstein, Joanna H

    2013-08-01

    Developmental dyslexia is a condition in which children encounter difficulty learning to read in spite of adequate instruction. Although considerable effort has been expended trying to identify the source of the problem, no single solution has been agreed upon. The current study explored a new hypothesis, that developmental dyslexia may be due to faulty perceptual organization of linguistically relevant sensory input. To test that idea, sentence-length speech signals were processed to create either sine-wave or noise-vocoded analogs. Seventy children between 8 and 11 years of age, with and without dyslexia participated. Children with dyslexia were selected to have phonological awareness deficits, although those without such deficits were retained in the study. The processed sentences were presented for recognition, and measures of reading, phonological awareness, and expressive vocabulary were collected. Results showed that children with dyslexia, regardless of phonological subtype, had poorer recognition scores than children without dyslexia for both kinds of degraded sentences. Older children with dyslexia recognized the sine-wave sentences better than younger children with dyslexia, but no such effect of age was found for the vocoded materials. Recognition scores were used as predictor variables in regression analyses with reading, phonological awareness, and vocabulary measures used as dependent variables. Scores for both sorts of sentence materials were strong predictors of performance on all three dependent measures when all children were included, but only performance for the sine-wave materials explained significant proportions of variance when only children with dyslexia were included. Finally, matching young, typical readers with older children with dyslexia on reading abilities did not mitigate the group difference in recognition of vocoded sentences. Conclusions were that children with dyslexia have difficulty organizing linguistically relevant sensory

  7. Perceptual organization of speech signals by children with and without dyslexia

    Science.gov (United States)

    Nittrouer, Susan; Lowenstein, Joanna H.

    2013-01-01

    Developmental dyslexia is a condition in which children encounter difficulty learning to read in spite of adequate instruction. Although considerable effort has been expended trying to identify the source of the problem, no single solution has been agreed upon. The current study explored a new hypothesis, that developmental dyslexia may be due to faulty perceptual organization of linguistically relevant sensory input. To test that idea, sentence-length speech signals were processed to create either sine-wave or noise-vocoded analogs. Seventy children between 8 and 11 years of age, with and without dyslexia participated. Children with dyslexia were selected to have phonological awareness deficits, although those without such deficits were retained in the study. The processed sentences were presented for recognition, and measures of reading, phonological awareness, and expressive vocabulary were collected. Results showed that children with dyslexia, regardless of phonological subtype, had poorer recognition scores than children without dyslexia for both kinds of degraded sentences. Older children with dyslexia recognized the sine-wave sentences better than younger children with dyslexia, but no such effect of age was found for the vocoded materials. Recognition scores were used as predictor variables in regression analyses with reading, phonological awareness, and vocabulary measures used as dependent variables. Scores for both sorts of sentence materials were strong predictors of performance on all three dependent measures when all children were included, but only performance for the sine-wave materials explained significant proportions of variance when only children with dyslexia were included. Finally, matching young, typical readers with older children with dyslexia on reading abilities did not mitigate the group difference in recognition of vocoded sentences. Conclusions were that children with dyslexia have difficulty organizing linguistically relevant sensory

  8. Academic Freedom in Classroom Speech: A Heuristic Model for U.S. Catholic Higher Education

    Science.gov (United States)

    Jacobs, Richard M.

    2010-01-01

    As the nation's Catholic universities and colleges continually clarify their identity, this article examines academic freedom in classroom speech, offering a heuristic model for use as board members, academic administrators, and faculty leaders discuss, evaluate, and judge allegations of misconduct in classroom speech. Focusing upon the practice…

  9. Speech-Language Pathologist and General Educator Collaboration: A Model for Tier 2 Service Delivery

    Science.gov (United States)

    Watson, Gina D.; Bellon-Harn, Monica L.

    2014-01-01

    Tier 2 supplemental instruction within a response to intervention framework provides a unique opportunity for developing partnerships between speech-language pathologists and classroom teachers. Speech-language pathologists may participate in Tier 2 instruction via a consultative or collaborative service delivery model depending on district needs.…

  10. Model-Based Synthesis of Visual Speech Movements from 3D Video

    Directory of Open Access Journals (Sweden)

    Edge, James D.

    2009-01-01

    We describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. Speech lip movements are captured using a 3D stereo face capture system and split up into phonetic units. A dynamic parameterisation of this data is constructed which maintains the relationship between lip shapes and velocities; within this parameterisation a model of how lips move is built and is used in the animation of visual speech movements from speech audio input. The mapping from audio parameters to lip movements is disambiguated by selecting only the most similar stored phonetic units to the target utterance during synthesis. By combining properties of model-based synthesis (e.g., HMMs, neural nets) with unit selection we improve the quality of our speech synthesis.

  11. Role of neural network models for developing speech systems

    Indian Academy of Sciences (India)

    The MOS of these listening tests are given in columns 4 and 5 of table 5. The MOS of the quality of the speech without incorporating the prosody have been observed to be low compared to the speech synthesized by incorporating the prosody. The significance of the differences in the pairs of the MOS for intelligibility and ...

  12. Bistability in biochemical signaling models.

    Science.gov (United States)

    Sobie, Eric A

    2011-09-20

    This Teaching Resource provides lecture notes, slides, and a student assignment for a two-part lecture on the principles underlying bistability in biochemical signaling networks, which are illustrated with examples from the literature. The lectures cover analog, or graded, versus digital, all-or-none, responses in cells, with examples from different types of biological processes requiring each. Rate-balance plots are introduced as a method for determining whether generic one-variable systems exhibit one or several stable steady states. Bifurcation diagrams are presented as a more general method for detecting the presence of bistability in biochemical signaling networks. The examples include an artificial toggle switch, the lac operon in bacteria, and the mitogen-activated protein kinase cascade in both Xenopus oocytes and mammalian cells. The second part of the lecture links the concepts of bistability more closely to the mathematical tools provided by dynamical systems analysis. The examples from the first part of the lecture are analyzed with phase-plane techniques and bifurcation analysis, using the scientific programming language MATLAB. Using these programs as a template, the assignment requires the students to implement a model from the literature and analyze the stability of this model's steady states.
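
    As a small companion to the rate-balance idea described above (written in Python rather than the lecture's MATLAB, with invented parameter values), the sketch below locates the steady states of a generic one-variable positive-feedback system by finding where production and removal rates cross.

    ```python
    import numpy as np

    # Rate-balance sketch for a generic one-variable positive-feedback system:
    # production = basal rate + Hill-type feedback; removal = linear degradation.
    # Steady states sit where the two curves cross; sign changes of their
    # difference locate them numerically. All parameter values are illustrative.
    def production(x, basal=0.05, vmax=1.0, K=0.5, n=4):
        return basal + vmax * x**n / (K**n + x**n)

    def removal(x, k=1.0):
        return k * x

    x = np.linspace(0, 1.5, 100000)
    diff = production(x) - removal(x)
    crossings = x[np.where(np.sign(diff[:-1]) != np.sign(diff[1:]))[0]]
    print(crossings)  # three crossings: two stable states flanking one unstable
    ```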

  13. Bayesian model of categorical effects in L1 and L2 speech perception

    Science.gov (United States)

    Kronrod, Yakov

    In this dissertation I present a model that captures categorical effects in both first language (L1) and second language (L2) speech perception. In L1 perception, categorical effects range between extremely strong for consonants to nearly continuous perception of vowels. I treat the problem of speech perception as a statistical inference problem and by quantifying categoricity I obtain a unified model of both strong and weak categorical effects. In this optimal inference mechanism, the listener uses their knowledge of categories and the acoustics of the signal to infer the intended productions of the speaker. The model splits up speech variability into meaningful category variance and perceptual noise variance. The ratio of these two variances, which I call Tau, directly correlates with the degree of categorical effects for a given phoneme or continuum. By fitting the model to behavioral data from different phonemes, I show how a single parametric quantitative variation can lead to the different degrees of categorical effects seen in perception experiments with different phonemes. In L2 perception, L1 categories have been shown to exert an effect on how L2 sounds are identified and how well the listener is able to discriminate them. Various models have been developed to relate the state of L1 categories with both the initial and eventual ability to process the L2. These models largely lacked a formalized metric to measure perceptual distance, a means of making a-priori predictions of behavior for a new contrast, and a way of describing non-discrete gradient effects. In the second part of my dissertation, I apply the same computational model that I used to unify L1 categorical effects to examining L2 perception. I show that we can use the model to make the same type of predictions as other SLA models, but also provide a quantitative framework while formalizing all measures of similarity and bias. Further, I show how using this model to consider L2 learners at
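
    One common formalization of this inference (in the spirit of the perceptual-magnet literature; the dissertation's exact parameterization of Tau may differ) is a posterior-mean estimate that shrinks the stimulus toward the category mean, with the variance ratio controlling the shrinkage:

    ```python
    import numpy as np

    def percept(S, mu_c, var_c, var_noise):
        """Posterior-mean estimate of the intended production given a noisy token.

        With a Gaussian category N(mu_c, var_c) and perceptual noise var_noise,
        the optimal estimate shrinks the stimulus toward the category mean:
            E[T | S] = tau * S + (1 - tau) * mu_c,  tau = var_c / (var_c + var_noise)
        Small tau -> strong categorical pull (consonant-like); tau near 1 ->
        nearly continuous perception (vowel-like).
        """
        tau = var_c / (var_c + var_noise)
        return tau * S + (1 - tau) * mu_c

    S = np.linspace(-2, 2, 5)
    print(percept(S, mu_c=0.0, var_c=1.0, var_noise=0.1))  # mild warping, tau~0.91
    print(percept(S, mu_c=0.0, var_c=1.0, var_noise=4.0))  # strong pull,  tau=0.2
    ```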

  14. Sparsity in Linear Predictive Coding of Speech

    OpenAIRE

    Giacobello, Daniele

    2010-01-01

    This thesis deals with developing improved techniques for speech coding based on the recent developments in sparse signal representation. In particular, this work is motivated by the need to address some of the limitations of the well-known linear prediction (LP) model currently applied in many modern speech coders. In the first part of the thesis, we provide an overview of Sparse Linear Prediction, a set of speech processing tools created by introducing sparsity constraints into the LP fr...

  15. Speech act theory in support of idealised warning models | Carstens ...

    African Journals Online (AJOL)

    In applied communication studies warnings (as components of instructional texts) are often characterised in terms of criteria for effectiveness. An idealised model for warnings includes the following elements: a signal word or label appropriate to the level of hazard; a hazard statement; references to the consequences of ...

  16. Speech-like orofacial oscillations in stump-tailed macaque (Macaca arctoides) facial and vocal signals.

    Science.gov (United States)

    Toyoda, Aru; Maruhashi, Tamaki; Malaivijitnond, Suchinda; Koda, Hiroki

    2017-10-01

    Speech is unique to humans and characterized by facial actions involving ∼5 Hz oscillations of lip, mouth or jaw movements. Lip-smacking, a facial display of primates characterized by oscillatory actions involving the vertical opening and closing of the jaw and lips, exhibits stable 5-Hz oscillation patterns, matching that of speech, suggesting that lip-smacking is a precursor of speech. We tested whether facial or vocal actions exhibiting the same rate of oscillation are found in a wide range of facial or vocal displays in various social contexts, showing diversity among species. We observed facial and vocal actions of wild stump-tailed macaques (Macaca arctoides), and selected video clips including facial displays (teeth chattering; TC), panting calls, and feeding. Ten open-to-open mouth durations during TC and feeding and five amplitude peak-to-peak durations in panting were analyzed. The facial display (TC) and vocalization (panting) oscillated at 5.74 ± 1.19 and 6.71 ± 2.91 Hz, respectively, similar to the reported lip-smacking of long-tailed macaques and the speech of humans. These results indicate a common mechanism for the central pattern generator underlying orofacial movements, which would later evolve into speech. Similar oscillations in panting, which evolved from muscular control different from that of the orofacial actions, suggest a sensory foundation for perceptual saliency particular to 5-Hz rhythms in macaques. This supports the pre-adaptation hypothesis of speech evolution, which states that a central pattern generator for 5-Hz facial oscillation and a perceptual background tuned to 5-Hz actions existed in the common ancestors of macaques and humans before the emergence of speech.

  17. Hidden Hearing Loss and Computational Models of the Auditory Pathway: Predicting Speech Intelligibility Decline

    Science.gov (United States)

    2016-11-28

    Christopher J. Smalt ... to utilize computational models of the auditory periphery and auditory cortex to study the effect of low spontaneous rate ANF loss on the cortical representation of speech intelligibility in noise. The auditory-periphery model of Zilany et al. (JASA 2009, 2014) is used to make predictions of

  18. Speech sound discrimination training improves auditory cortex responses in a rat model of autism

    Directory of Open Access Journals (Sweden)

    Crystal T Engineer

    2014-08-01

    Children with autism often have language impairments and degraded cortical responses to speech. Extensive behavioral interventions can improve language outcomes and cortical responses. Prenatal exposure to the antiepileptic drug valproic acid (VPA) increases the risk for autism and language impairment. Prenatal exposure to VPA also causes weaker and delayed auditory cortex responses in rats. In this study, we document speech sound discrimination ability in VPA exposed rats and the effect of extensive speech training on auditory cortex responses. VPA exposed rats were significantly impaired at consonant, but not vowel, discrimination. Extensive speech training resulted in both stronger and faster anterior auditory field responses compared to untrained VPA exposed rats, and restored responses to control levels. This neural response improvement generalized to non-trained sounds. The rodent VPA model of autism may be used to improve the understanding of speech processing in autism and contribute to improving language outcomes.

  19. Prediction of Speech Recognition in Cochlear Implant Users by Adapting Auditory Models to Psychophysical Data

    Directory of Open Access Journals (Sweden)

    Svante Stadler

    2009-01-01

    Users of cochlear implants (CIs) vary widely in their ability to recognize speech in noisy conditions. There are many factors that may influence their performance. We have investigated to what degree it can be explained by the users' ability to discriminate spectral shapes. A speech recognition task has been simulated using both a simple and a complex model of CI hearing. The models were individualized by adapting their parameters to fit the results of a spectral discrimination test. The predicted speech recognition performance was compared to experimental results, and they were significantly correlated. The presented framework may be used to simulate the effects of changing the CI encoding strategy.

  20. Blind estimation of the number of speech source in reverberant multisource scenarios based on binaural signals

    DEFF Research Database (Denmark)

    May, Tobias; van de Par, Steven

    2012-01-01

    In this paper we present a new approach for estimating the number of active speech sources in the presence of interfering noise sources and reverberation. First, a binaural front-end is used to detect the spatial positions of all active sound sources, resulting in a binary mask for each candidate...... position. Then, each candidate position is characterized by a set of features. In addition to exploiting the overall spectral shape, a new set of mask-based features is proposed which aims at characterizing the pattern of the estimated binary mask. The decision stage for detecting a speech source is based...... on a support vector machine (SVM) classifier. A systematic analysis shows that the proposed algorithm is able to blindly determine the number and the corresponding spatial positions of speech sources in multisource scenarios and generalizes well to unknown acoustic conditions...

  1. Phonetic perspectives on modelling information in the speech signal

    Indian Academy of Sciences (India)

    Centre for Music and Science, Faculty of Music, University of Cambridge, Cambridge, CB3 9DP ...... and hence it follows that phonetic detail must be represented in memory and in abstract linguistic structure. ..... than any other level of structure: recall the point made in section 1.2d that t in tap has more in common with p in ...

  2. Emotional recognition from the speech signal for a virtual education agent

    Science.gov (United States)

    Tickle, A.; Raghu, S.; Elshaw, M.

    2013-06-01

    This paper explores the extraction of features from the speech wave to perform intelligent emotion recognition. A feature extraction tool (openSMILE) was used to obtain a baseline set of 998 acoustic features from a set of emotional speech recordings from a microphone. The initial features were reduced to the most important ones so that recognition of emotions using a supervised neural network could be performed. Given that the future use of virtual education agents lies in making the agents more interactive, developing agents with the capability to recognise and adapt to the emotional state of humans is an important step.

  3. Contribution to automatic speech recognition. Analysis of the direct acoustical signal. Recognition of isolated words and phoneme identification

    International Nuclear Information System (INIS)

    Dupeyrat, Benoit

    1981-01-01

    This report deals with the acoustical-phonetic step of the automatic recognition of speech. The parameters used are the extrema of the acoustical signal (coded in amplitude and duration). This coding method, the properties of which are described, is simple and well adapted to digital processing. The quality and the intelligibility of the coded signal after reconstruction are particularly satisfactory. An experiment in the automatic recognition of isolated words has been carried out using this coding system. We have designed a filtering algorithm operating on the parameters of the coding. Thus the characteristics of the formants can be derived under certain conditions, which are discussed. Using these characteristics, the identification of a large part of the phonemes for a given speaker was achieved. Carrying on the studies required the development of a particular methodology of real-time processing which allowed immediate evaluation of the improvement of the programs. Such processing on temporal coding of the acoustical signal is extremely powerful and could represent, used in connection with other methods, an efficient tool for the automatic processing of speech. (author) [fr]
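
    As a rough illustration of the extrema-based coding described above (the report's exact scheme is not reproduced here), the sketch below represents a waveform by the amplitudes of its successive local extrema and the durations between them.

    ```python
    import numpy as np

    def extrema_code(x):
        """Code a signal as its successive local extrema: (amplitude, duration)
        pairs, where duration is the sample count since the previous extremum."""
        d = np.diff(x)
        # Indices where the derivative changes sign -> local maxima/minima
        idx = np.where(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0] + 1
        amplitudes = x[idx]
        durations = np.diff(np.r_[0, idx])
        return idx, amplitudes, durations

    # Toy signal: an amplitude-modulated 200 Hz tone at 8 kHz sampling.
    fs = 8000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 200 * t) * (1 + 0.5 * np.sin(2 * np.pi * 5 * t))
    idx, amp, dur = extrema_code(x)
    print(len(x), "samples ->", len(idx), "(amplitude, duration) pairs")
    ```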

  4. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red

  5. Making Faces - State-Space Models Applied to Multi-Modal Signal Processing

    DEFF Research Database (Denmark)

    Lehn-Schiøler, Tue

    2005-01-01

    The two main focus areas of this thesis are State-Space Models and multi-modal signal processing. The general State-Space Model is investigated and an addition to the class of sequential sampling methods is proposed. This new algorithm is denoted as the Parzen Particle Filter. Furthermore...... optimizer can be applied to speed up convergence. The linear version of the State-Space Model, the Kalman Filter, is applied to multi-modal signal processing. It is demonstrated how a State-Space Model can be used to map from speech to lip movements. Besides the State-Space Model and the multi-modal
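
    For readers who want the mechanics of the Kalman-filter mapping mentioned above, the sketch below shows one generic predict/update step; in the thesis the model matrices would be learned from synchronized audio and video, whereas here they are placeholders.

    ```python
    import numpy as np

    def kalman_step(x, P, z, A, C, Q, R):
        """One predict/update step of a linear Kalman filter.

        Hidden state x (e.g. lip-shape parameters) evolves as x' = A x + w;
        observation z (e.g. audio features) is z = C x + v, with w~N(0,Q),
        v~N(0,R). All matrices are placeholders, not values from the thesis.
        """
        # Predict
        x_pred = A @ x
        P_pred = A @ P @ A.T + Q
        # Update
        S = C @ P_pred @ C.T + R              # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
        x_new = x_pred + K @ (z - C @ x_pred)
        P_new = (np.eye(len(x)) - K @ C) @ P_pred
        return x_new, P_new

    # Tiny run with arbitrary dimensions: 4 lip parameters, 12 audio features.
    rng = np.random.default_rng(3)
    x, P = np.zeros(4), np.eye(4)
    A, C = np.eye(4), rng.normal(size=(12, 4))
    Q, R = 0.01 * np.eye(4), 0.1 * np.eye(12)
    x, P = kalman_step(x, P, z=C @ np.ones(4), A=A, C=C, Q=Q, R=R)
    print(x)
    ```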

  6. The DYX2 locus and neurochemical signaling genes contribute to speech sound disorder and related neurocognitive domains.

    Science.gov (United States)

    Eicher, J D; Stein, C M; Deng, F; Ciesla, A A; Powers, N R; Boada, R; Smith, S D; Pennington, B F; Iyengar, S K; Lewis, B A; Gruen, J R

    2015-04-01

    A major milestone of child development is the acquisition and use of speech and language. Communication disorders, including speech sound disorder (SSD), can impair a child's academic, social and behavioral development. Speech sound disorder is a complex, polygenic trait with a substantial genetic component. However, specific genes that contribute to SSD remain largely unknown. To identify associated genes, we assessed the association of the DYX2 dyslexia risk locus and markers in neurochemical signaling genes (e.g., nicotinic and dopaminergic) with SSD and related endophenotypes. We first performed separate primary associations in two independent samples - Cleveland SSD (210 affected and 257 unaffected individuals in 127 families) and Denver SSD (113 affected individuals and 106 unaffected individuals in 85 families) - and then combined results by meta-analysis. DYX2 markers, specifically those in the 3' untranslated region of DCDC2 (P = 1.43 × 10^-4), showed the strongest associations with phonological awareness. We also observed suggestive associations of the dopaminergic-related genes ANKK1 (P = 1.02 × 10^-2) and DRD2 (P = 9.22 × 10^-3) and the nicotinic-related genes CHRNA3 (P = 2.51 × 10^-3) and BDNF (P = 8.14 × 10^-3) with case-control status and articulation. Our results further implicate variation in putative regulatory regions in the DYX2 locus, particularly in DCDC2, influencing language and cognitive traits. The results also support previous studies implicating variation in dopaminergic and nicotinic neural signaling influencing human communication and cognitive development. Our findings expand the literature showing genetic factors (e.g., DYX2) contributing to multiple related, yet distinct neurocognitive domains (e.g., dyslexia, language impairment, and SSD). How these factors interactively yield different neurocognitive and language-related outcomes remains to be elucidated.

  7. Role of neural network models for developing speech systems

    Indian Academy of Sciences (India)

    Syllable identity: A syllable is a combination of segments of consonants (C) and vowels (V). In this study, syllables with more than four segments (Cs or Vs) are ignored since the number of such syllables present ...... 7.1 Database. Speech data are collected from five different geographical regions (central, eastern, western, ...

  8. Optimal speech motor control and token-to-token variability: a Bayesian modeling approach.

    Science.gov (United States)

    Patri, Jean-François; Diard, Julien; Perrier, Pascal

    2015-12-01

    The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.
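
    A toy illustration of the paper's central move (hedged: a one-dimensional caricature with an invented cost function, not the GEPPETO or vocal-tract model) is to turn the cost into a probability distribution and sample plans from it, which yields token-to-token variability by construction.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Instead of minimizing a cost, define p(motor command | goal) ~ exp(-cost)
    # and sample from it, so repeated plans naturally vary token to token.
    # The forward model and cost terms below are made up for the example.
    def cost(u, target=1.0, effort_weight=0.3):
        acoustic_error = (np.tanh(u) - 0.7 * target) ** 2  # toy forward model
        effort = effort_weight * u ** 2
        return acoustic_error + effort

    u_grid = np.linspace(-3, 3, 601)
    p = np.exp(-cost(u_grid) / 0.05)          # "temperature" sets variability
    p /= p.sum()
    tokens = rng.choice(u_grid, size=5, p=p)  # five simulated "productions"
    print(tokens)                             # similar but not identical plans
    ```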

  9. A computer model of auditory efferent suppression: implications for the recognition of speech in noise.

    Science.gov (United States)

    Brown, Guy J; Ferry, Robert T; Meddis, Ray

    2010-02-01

    The neural mechanisms underlying the ability of human listeners to recognize speech in the presence of background noise are still imperfectly understood. However, there is mounting evidence that the medial olivocochlear system plays an important role, via efferents that exert a suppressive effect on the response of the basilar membrane. The current paper presents a computer modeling study that investigates the possible role of this activity on speech intelligibility in noise. A model of auditory efferent processing [Ferry, R. T., and Meddis, R. (2007). J. Acoust. Soc. Am. 122, 3519-3526] is used to provide acoustic features for a statistical automatic speech recognition system, thus allowing the effects of efferent activity on speech intelligibility to be quantified. Performance of the "basic" model (without efferent activity) on a connected digit recognition task is good when the speech is uncorrupted by noise but falls when noise is present. However, recognition performance is much improved when efferent activity is applied. Furthermore, optimal performance is obtained when the amount of efferent activity is proportional to the noise level. The results obtained are consistent with the suggestion that efferent suppression causes a "release from adaptation" in the auditory-nerve response to noisy speech, which enhances its intelligibility.

  10. Predicting automatic speech recognition performance over communication channels from instrumental speech quality and intelligibility scores

    NARCIS (Netherlands)

    Gallardo, L.F.; Möller, S.; Beerends, J.

    2017-01-01

    The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility

  11. Standardization of Speech Corpus

    Directory of Open Access Journals (Sweden)

    Ai-jun Li

    2007-12-01

    Speech corpus is the basis for analyzing the characteristics of speech signals and developing speech synthesis and recognition systems. In China, almost all speech research and development affiliations are developing their own speech corpora. We have so many different kinds of Chinese speech corpora that it is important to be able to conveniently share these speech corpora to avoid wasting time and money and to make research work more efficient. The primary goal of this research is to find a standard scheme which can make corpora be established more efficiently and be used or shared more easily. A huge speech corpus covering 10 regional accents of Chinese, RASC863 (a Regional Accent Speech Corpus funded by the National 863 Project), will be exemplified to illuminate the standardization of speech corpus production.

  12. Cost-Efficient Development of Acoustic Models for Speech Recognition of Related Languages

    Directory of Open Access Journals (Sweden)

    J. Nouza

    2013-09-01

    When adapting an existing speech recognition system to a new language, major development costs are associated with the creation of an appropriate acoustic model (AM). For its training, a certain amount of recorded and annotated speech is required. In this paper, we show that not only the annotation process, but also the process of speech acquisition, can be automated to minimize the need for human and expert work. We demonstrate the proposed methodology on the Croatian language, for which the target AM has been built via cross-lingual adaptation of a Czech AM in two ways: (a) using the commercially available GlobalPhone database, and (b) by automatic speech data mining from the HRT radio archive. The latter approach is cost-free, yet it yields comparable or better results in LVCSR experiments conducted on 3 Croatian test sets.

  13. Effect of Simultaneous Bilingualism on Speech Intelligibility across Different Masker Types, Modalities, and Signal-to-Noise Ratios in School-Age Children.

    Science.gov (United States)

    Reetzke, Rachel; Lam, Boji Pak-Wing; Xie, Zilong; Sheng, Li; Chandrasekaran, Bharath

    2016-01-01

    Recognizing speech in adverse listening conditions is a significant cognitive, perceptual, and linguistic challenge, especially for children. Prior studies have yielded mixed results on the impact of bilingualism on speech perception in noise. Methodological variations across studies make it difficult to converge on a conclusion regarding the effect of bilingualism on speech-in-noise performance. Moreover, there is a dearth of speech-in-noise evidence for bilingual children who learn two languages simultaneously. The aim of the present study was to examine the extent to which various adverse listening conditions modulate differences in speech-in-noise performance between monolingual and simultaneous bilingual children. To that end, sentence recognition was assessed in twenty-four school-aged children (12 monolinguals; 12 simultaneous bilinguals, age of English acquisition ≤ 3 yrs.). We implemented a comprehensive speech-in-noise battery to examine recognition of English sentences across different modalities (audio-only, audiovisual), masker types (steady-state pink noise, two-talker babble), and a range of signal-to-noise ratios (SNRs; 0 to -16 dB). Results revealed no difference in performance between monolingual and simultaneous bilingual children across each combination of modality, masker, and SNR. Our findings suggest that when English age of acquisition and socioeconomic status are similar between groups, monolingual and bilingual children exhibit comparable speech-in-noise performance across a range of conditions analogous to everyday listening environments.

  14. Bridging computational approaches to speech production: The semantic–lexical–auditory–motor model (SLAM)

    Science.gov (United States)

    Hickok, Gregory

    2017-01-01

    Speech production is studied from both psycholinguistic and motor-control perspectives, with little interaction between the approaches. We assessed the explanatory value of integrating psycholinguistic and motor-control concepts for theories of speech production. By augmenting a popular psycholinguistic model of lexical retrieval with a motor-control-inspired architecture, we created a new computational model to explain speech errors in the context of aphasia. Comparing the model fits to picture-naming data from 255 aphasic patients, we found that our new model improves fits for a theoretically predictable subtype of aphasia: conduction. We discovered that the improved fits for this group were a result of strong auditory-lexical feedback activation, combined with weaker auditory-motor feedforward activation, leading to increased competition from phonologically related neighbors during lexical selection. We discuss the implications of our findings with respect to other extant models of lexical retrieval. PMID:26223468

  15. Bridging computational approaches to speech production: The semantic-lexical-auditory-motor model (SLAM).

    Science.gov (United States)

    Walker, Grant M; Hickok, Gregory

    2016-04-01

    Speech production is studied from both psycholinguistic and motor-control perspectives, with little interaction between the approaches. We assessed the explanatory value of integrating psycholinguistic and motor-control concepts for theories of speech production. By augmenting a popular psycholinguistic model of lexical retrieval with a motor-control-inspired architecture, we created a new computational model to explain speech errors in the context of aphasia. Comparing the model fits to picture-naming data from 255 aphasic patients, we found that our new model improves fits for a theoretically predictable subtype of aphasia: conduction. We discovered that the improved fits for this group were a result of strong auditory-lexical feedback activation, combined with weaker auditory-motor feedforward activation, leading to increased competition from phonologically related neighbors during lexical selection. We discuss the implications of our findings with respect to other extant models of lexical retrieval.

  16. Mu suppression as an index of sensorimotor contributions to speech processing: evidence from continuous EEG signals.

    Science.gov (United States)

    Cuellar, Megan; Bowers, Andrew; Harkrider, Ashley W; Wilson, Matthew; Saltuklaroglu, Tim

    2012-08-01

    Mu rhythm suppression is an index of sensorimotor activity during the processing of sensory stimuli. The two present studies investigate the extent to which this measure is sensitive to differences in acoustic processing. In both studies, participants were required to listen to 90-second acoustic stimulus clips with their eyes closed and identify predetermined targets. Experimental conditions were designed to vary the acoustic processing demands. Mu suppression was measured continuously across central electrodes (C3, Cz, and C4). Ten adult females participated in the first study, in which the target was a pseudoword presented in three conditions (identification, discrimination, discrimination in noise). Mu suppression was strongest and reached significance relative to baseline only in the discrimination-in-noise task at C3 (indicative of left-hemisphere sensorimotor activity) when measured in a 10–12 Hz bandwidth. Thirteen adult females participated in the second study, which measured mu suppression to acoustic stimuli with 'segmentation' (i.e., separating a parsed stimulus into individual components) versus non-segmentation requirements in both speech and tone discrimination conditions. Significantly greater overall suppression to speech relative to tone tasks was found in the 10–12 Hz bandwidth. Further, suppression relative to baseline was significant only at C3 during the speech discrimination with segmentation task. Taken together, the findings indicate that mu rhythm suppression in acoustic processing is sensitive to dorsal stream processing; more specifically, it is sensitive to (1) increases in overall processing demands and (2) processing of linguistic versus non-linguistic information. Copyright © 2012 Elsevier B.V. All rights reserved.

  17. An articulatorily constrained, maximum entropy approach to speech recognition and speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Hogden, J.

    1996-12-31

    Hidden Markov models (HMMs) are among the most popular tools for performing computer speech recognition. One of the primary reasons that HMMs typically outperform other speech recognition techniques is that the parameters used for recognition are determined by the data, not by preconceived notions of what the parameters should be. This makes HMMs better able to deal with intra- and inter-speaker variability despite the limited knowledge of how speech signals vary and despite the often limited ability to correctly formulate rules describing variability and invariance in speech. In fact, it is often the case that when HMM parameter values are constrained using the limited knowledge of speech, recognition performance decreases. However, the structure of an HMM has little in common with the mechanisms underlying speech production. Here, the author argues that by using probabilistic models that more accurately embody the process of speech production, he can create models that have all the advantages of HMMs, but that should more accurately capture the statistical properties of real speech samples--presumably leading to more accurate speech recognition. The model he will discuss uses the fact that speech articulators move smoothly and continuously. Before discussing how to use articulatory constraints, he will give a brief description of HMMs. This will allow him to highlight the similarities and differences between HMMs and the proposed technique.
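
    For readers unfamiliar with the HMM machinery this abstract takes as its baseline, the sketch below (illustrative only; the state, transition, and emission values are invented) shows the scaled forward recursion used to score an observation sequence against a discrete-observation HMM.

        import numpy as np

        def forward_log_likelihood(pi, A, B, obs):
            """log P(obs | model) via the scaled forward recursion.

            pi : (S,) initial state probabilities
            A  : (S, S) transitions, A[i, j] = P(state j at t+1 | state i at t)
            B  : (S, V) emissions, B[s, v] = P(symbol v | state s)
            obs: sequence of integer observation symbols
            """
            alpha = pi * B[:, obs[0]]
            log_lik = 0.0
            for t in range(len(obs)):
                if t > 0:
                    alpha = (alpha @ A) * B[:, obs[t]]
                scale = alpha.sum()      # rescaling avoids numerical underflow
                log_lik += np.log(scale)
                alpha /= scale
            return log_lik

        # Toy usage: 2 states, binary observation alphabet
        pi = np.array([0.6, 0.4])
        A = np.array([[0.7, 0.3], [0.4, 0.6]])
        B = np.array([[0.9, 0.1], [0.2, 0.8]])
        print(forward_log_likelihood(pi, A, B, [0, 1, 1, 0]))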

  18. Degraded speech sound processing in a rat model of fragile X syndrome.

    Science.gov (United States)

    Engineer, Crystal T; Centanni, Tracy M; Im, Kwok W; Rahebi, Kimiya C; Buell, Elizabeth P; Kilgard, Michael P

    2014-05-20

    Fragile X syndrome is the most common inherited form of intellectual disability and the leading genetic cause of autism. Impaired phonological processing in fragile X syndrome interferes with the development of language skills. Although auditory cortex responses are known to be abnormal in fragile X syndrome, it is not clear how these differences impact speech sound processing. This study provides the first evidence that the cortical representation of speech sounds is impaired in Fmr1 knockout rats, despite normal speech discrimination behavior. Evoked potentials and spiking activity in response to speech sounds, noise burst trains, and tones were significantly degraded in primary auditory cortex, anterior auditory field and the ventral auditory field. Neurometric analysis of speech evoked activity using a pattern classifier confirmed that activity in these fields contains significantly less information about speech sound identity in Fmr1 knockout rats compared to control rats. Responses were normal in the posterior auditory field, which is associated with sound localization. The greatest impairment was observed in the ventral auditory field, which is related to emotional regulation. Dysfunction in the ventral auditory field may contribute to poor emotional regulation in fragile X syndrome and may help explain the observation that later auditory evoked responses are more disturbed in fragile X syndrome compared to earlier responses. Rodent models of fragile X syndrome are likely to prove useful for understanding the biological basis of fragile X syndrome and for testing candidate therapies. Copyright © 2014 Elsevier B.V. All rights reserved.

  19. WORD BASED TAMIL SPEECH RECOGNITION USING TEMPORAL FEATURE BASED SEGMENTATION

    Directory of Open Access Journals (Sweden)

    A. Akila

    2015-05-01

    Full Text Available Speech recognition systems require segmentation of the speech waveform into fundamental acoustic units. Segmentation is the process of decomposing the speech signal into smaller units, i.e., breaking the continuous stream of sound into basic units like words, phonemes, or syllables that can be recognized. It can be performed using wavelets, fuzzy methods, Artificial Neural Networks, or Hidden Markov Models, and it can also be used to distinguish different types of audio signals in large amounts of audio data, a task often referred to as audio classification. Speech segmentation algorithms fall into two categories, blind segmentation and aided segmentation, depending on whether the algorithm uses prior knowledge of the data to process the speech. The major issues with connected speech recognition algorithms are that the vocabulary grows with the variation in word combinations in connected speech, and that the complexity of finding the best match for a given test pattern increases accordingly. To overcome these issues, the connected speech has to be segmented into words using attributes of the speech. A methodology using the temporal feature Short-Term Energy is proposed and compared with an existing algorithm, the Dynamic Thresholding segmentation algorithm, which uses a spectrogram image of the connected speech for segmentation.
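
    As a rough illustration of the temporal feature this study builds on, the following sketch (a minimal version of the general idea, not the paper's algorithm; the frame sizes and relative threshold are arbitrary choices) computes short-term energy and returns high-energy runs as candidate word segments.

        import numpy as np

        def short_term_energy(x, frame_len=400, hop=160):
            """Frame energies (defaults ~25 ms frames, 10 ms hop at 16 kHz)."""
            frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len, hop)]
            return np.array([np.sum(np.asarray(f, dtype=float) ** 2) for f in frames])

        def segment_words(x, frame_len=400, hop=160, rel_thresh=0.05):
            """Return (start, end) sample indices of high-energy (speech) runs."""
            e = short_term_energy(x, frame_len, hop)
            active = e > rel_thresh * e.max()        # simple relative energy threshold
            segments, start = [], None
            for i, is_on in enumerate(active):
                if is_on and start is None:
                    start = i
                elif not is_on and start is not None:
                    segments.append((start * hop, i * hop + frame_len))
                    start = None
            if start is not None:
                segments.append((start * hop, len(x)))
            return segments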

  20. Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training.

    Science.gov (United States)

    Narayanan, Arun; Wang, DeLiang

    2015-01-01

    Although deep neural network (DNN) acoustic models are known to be inherently noise robust, especially with matched training and testing data, the use of speech separation as a frontend and for deriving alternative feature representations has been shown to improve performance in challenging environments. We first present a supervised speech separation system that significantly improves automatic speech recognition (ASR) performance in realistic noise conditions. The system performs separation via ratio time-frequency masking; the ideal ratio mask (IRM) is estimated using DNNs. We then propose a framework that unifies separation and acoustic modeling via joint adaptive training. Since the modules for acoustic modeling and speech separation are implemented using DNNs, unification is done by introducing additional hidden layers with fixed weights and appropriate network architecture. On the CHiME-2 medium-large vocabulary ASR task, and with log mel spectral features as input to the acoustic model, an independently trained ratio masking frontend improves word error rates by 10.9% (relative) compared to the noisy baseline. In comparison, the jointly trained system improves performance by 14.4%. We also experiment with alternative feature representations to augment the standard log mel features, like the noise and speech estimates obtained from the separation module, and the standard feature set used for IRM estimation. Our best system obtains a word error rate of 15.4% (absolute), an improvement of 4.6 percentage points over the next best result on this corpus.
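
    The ideal ratio mask used as a DNN training target in this line of work is commonly defined as below; the snippet shows only the definition (with a typical exponent beta = 0.5), whereas the paper's contribution is estimating the mask from noisy features with a DNN.

        import numpy as np

        def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
            """IRM(t, f) = (S^2 / (S^2 + N^2))**beta for magnitude spectrograms S and N."""
            s2, n2 = speech_mag ** 2, noise_mag ** 2
            return (s2 / (s2 + n2 + 1e-10)) ** beta

        def apply_mask(noisy_mag, mask):
            """Masked magnitudes; recombine with the noisy phase for resynthesis."""
            return noisy_mag * mask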

  1. Toward A Dual-Learning Systems Model of Speech Category Learning

    Directory of Open Access Journals (Sweden)

    Bharath eChandrasekaran

    2014-07-01

    Full Text Available More than two decades of work in vision posits the existence of dual-learning systems of category learning. The reflective system uses working memory to develop and test rules for classifying in an explicit fashion, while the reflexive system operates by implicitly associating perception with actions that lead to reinforcement. Dual-learning systems models hypothesize that in learning natural categories, learners initially use the reflective system and, with practice, transfer control to the reflexive system. The role of reflective and reflexive systems in auditory category learning and more specifically in speech category learning has not been systematically examined. In this article we describe a neurobiologically-constrained dual-learning systems theoretical framework that is currently being developed in speech category learning and review recent applications of this framework. Using behavioral and computational modeling approaches, we provide evidence that speech category learning is predominantly mediated by the reflexive learning system. In one application, we explore the effects of normal aging on non-speech and speech category learning. We find an age-related deficit in reflective-optimal but not reflexive-optimal auditory category learning. Prominently, we find a large age-related deficit in speech learning. The computational modeling suggests that older adults are less likely to transition from simple, reflective, uni-dimensional rules to more complex, reflexive, multi-dimensional rules. In a second application we summarize a recent study examining auditory category learning in individuals with elevated depressive symptoms. We find a deficit in reflective-optimal and an enhancement in reflexive-optimal auditory category learning. Interestingly, individuals with elevated depressive symptoms also show an advantage in learning speech categories. We end with a brief summary and description of a number of future directions.

  2. Modeling speech imitation and ecological learning of auditory-motor maps

    Directory of Open Access Journals (Sweden)

    Claudia eCanevari

    2013-06-01

    Full Text Available Classical models of speech consider an antero-posterior distinction between perceptive and productive functions. However, the selective alteration of neural activity in speech motor centers, via transcranial magnetic stimulation, was shown to affect speech discrimination. On the automatic speech recognition (ASR) side, the recognition systems have classically relied solely on acoustic data, achieving rather good performance in optimal listening conditions. The main limitations of current ASR are mainly evident in the realistic use of such systems. These limitations can be partly reduced by using normalization strategies that minimize inter-speaker variability by either explicitly removing speakers’ peculiarities or adapting different speakers to a reference model. In this paper we aim at modeling a motor-based imitation learning mechanism in ASR. We tested the utility of a speaker normalization strategy that uses motor representations of speech and compare it with strategies that ignore the motor domain. Specifically, we first trained a regressor through state-of-the-art machine learning techniques to build an auditory-motor mapping, in a sense mimicking a human learner that tries to reproduce utterances produced by other speakers. This auditory-motor mapping maps the speech acoustics of a speaker into the motor plans of a reference speaker. Since, during recognition, only speech acoustics are available, the mapping is necessary to recover motor information. Subsequently, in a phone classification task, we tested the system on either one of the speakers that was used during training or a new one. Results show that in both cases the motor-based speaker normalization strategy almost always outperforms all other strategies where only acoustics is taken into account.

  3. Encoding of phonology in a recurrent neural model of grounded speech

    NARCIS (Netherlands)

    Alishahi, Afra; Barking, Marie; Chrupala, Grzegorz; Levy, Roger; Specia, Lucia

    2017-01-01

    We study the representation and encoding of phonemes in a recurrent neural network model of grounded speech. We use a model which processes images and their spoken descriptions, and projects the visual and auditory representations into the same semantic space. We perform a number of analyses on how

  4. An Introduction to Item Response Theory and Rasch Models for Speech-Language Pathologists

    Science.gov (United States)

    Baylor, Carolyn; Hula, William; Donovan, Neila J.; Doyle, Patrick J.; Kendall, Diane; Yorkston, Kathryn

    2011-01-01

    Purpose: To present a primarily conceptual introduction to item response theory (IRT) and Rasch models for speech-language pathologists (SLPs). Method: This tutorial introduces SLPs to basic concepts and terminology related to IRT as well as the most common IRT models. The article then continues with an overview of how instruments are developed…

  5. Discrete dynamic modeling of cellular signaling networks.

    Science.gov (United States)

    Albert, Réka; Wang, Rui-Sheng

    2009-01-01

    Understanding signal transduction in cellular systems is a central issue in systems biology. Numerous experiments from different laboratories generate an abundance of individual components and causal interactions mediating environmental and developmental signals. However, for many signal transduction systems there is insufficient information on the overall structure and the molecular mechanisms involved in the signaling network. Moreover, lack of kinetic and temporal information makes it difficult to construct quantitative models of signal transduction pathways. Discrete dynamic modeling, combined with network analysis, provides an effective way to integrate fragmentary knowledge of regulatory interactions into a predictive mathematical model which is able to describe the time evolution of the system without the requirement for kinetic parameters. This chapter introduces the fundamental concepts of discrete dynamic modeling, particularly focusing on Boolean dynamic models. We describe this method step-by-step in the context of cellular signaling networks. Several variants of Boolean dynamic models including threshold Boolean networks and piecewise linear systems are also covered, followed by two examples of successful application of discrete dynamic modeling in cell biology.
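
    As a toy illustration of the Boolean modeling style the chapter introduces, the sketch below (the nodes and rules are invented, not taken from the chapter) iterates a synchronous update rule for a three-node signaling motif; without any kinetic parameters, the model still produces a qualitative time evolution, here a sustained oscillation.

        # Synchronous Boolean update for a 3-node toy signaling motif.
        def update(state):
            ligand, kinase, response = state
            return (
                ligand,                   # external input, held constant
                ligand and not response,  # kinase: activated by ligand, shut off by feedback
                kinase,                   # response: follows the kinase with a one-step delay
            )

        state = (True, False, False)
        for step in range(8):
            print(step, state)
            state = update(state)  # with this rule set the motif settles into a limit cycle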

  6. MASCOTTE: analytical model of eddy current signals

    International Nuclear Information System (INIS)

    Delsarte, G.; Levy, R.

    1992-01-01

    Tube examination is a major application of the eddy current technique in the nuclear and petrochemical industries. Since such examination configurations are particularly well suited to analytical modelling, a physical model was developed to run on portable computers. It includes simple approximations made possible by the actual conditions of the examinations. The eddy current signal is described by an analytical formulation that takes into account the tube dimensions, the sensor design, the physical characteristics of the defect, and the examination parameters. Moreover, the model makes it possible to match real signals with simulated signals.

  7. A Novel DBN Feature Fusion Model for Cross-Corpus Speech Emotion Recognition

    Directory of Open Access Journals (Sweden)

    Zou Cairong

    2016-01-01

    Full Text Available Feature fusion from separate sources is a key technical difficulty in cross-corpus speech emotion recognition. The purpose of this paper is to use the emotional information hidden in the speech spectrogram as image features, based on Deep Belief Nets (DBN) from Deep Learning, and then fuse them with traditional emotion features. First, based on a spectrogram analysis by the STB/Itti model, new spectrogram features are extracted from the color, brightness, and orientation channels, respectively; then two alternative DBN models fuse the traditional and spectrogram features, which increases the scale of the feature subset and its ability to characterize emotion. In experiments on the ABC database and Chinese corpora, the new feature subset distinctly improves the cross-corpus recognition result, by 8.8%, compared with traditional speech emotion features. The proposed method provides a new idea for feature fusion in emotion recognition.

  8. Modeling the temporal dynamics of distinctive feature landmark detectors for speech recognition.

    Science.gov (United States)

    Jansen, Aren; Niyogi, Partha

    2008-09-01

    This paper elaborates on a computational model for speech recognition that is inspired by several interrelated strands of research in phonology, acoustic phonetics, speech perception, and neuroscience. The goals are twofold: (i) to explore frameworks for recognition that may provide a viable alternative to the current hidden Markov model (HMM) based speech recognition systems and (ii) to provide a computational platform that will facilitate engaging, quantifying, and testing various theories in the scientific traditions in phonetics, psychology, and neuroscience. This motivation leads to an approach that constructs a hierarchically structured point process representation based on distinctive feature landmark detectors and probabilistically integrates the firing patterns of these detectors to decode a phonological sequence. The accuracy of a broad class recognizer based on this framework is competitive with equivalent HMM-based systems. Various avenues for future development of the presented methodology are outlined.

  9. Language modeling for automatic speech recognition of inflective languages an applications-oriented approach using lexical data

    CERN Document Server

    Donaj, Gregor

    2017-01-01

    This book covers language modeling and automatic speech recognition for inflective languages (e.g. Slavic languages), which represent roughly half of the languages spoken in Europe. These languages do not perform as well as English in speech recognition systems and it is therefore harder to develop an application with sufficient quality for the end user. The authors describe the most important language features for the development of a speech recognition system. This is then presented through the analysis of errors in the system and the development of language models and their inclusion in speech recognition systems, which specifically address the errors that are relevant for targeted applications. The error analysis is done with regard to morphological characteristics of the word in the recognized sentences. The book is oriented towards speech recognition with large vocabularies and continuous and even spontaneous speech. Today such applications work with a rather small number of languages compared to the nu...

  10. A predictive model for diagnosing stroke-related apraxia of speech.

    Science.gov (United States)

    Ballard, Kirrie J; Azizi, Lamiae; Duffy, Joseph R; McNeil, Malcolm R; Halaki, Mark; O'Dwyer, Nicholas; Layfield, Claire; Scholl, Dominique I; Vogel, Adam P; Robin, Donald A

    2016-01-29

    Diagnosis of the speech motor planning/programming disorder, apraxia of speech (AOS), has proven challenging, largely due to its common co-occurrence with the language-based impairment of aphasia. Currently, diagnosis is based on perceptually identifying and rating the severity of several speech features. It is not known whether all, or a subset of the features, are required for a positive diagnosis. The purpose of this study was to assess predictor variables for the presence of AOS after left-hemisphere stroke, with the goal of increasing diagnostic objectivity and efficiency. This population-based case-control study involved a sample of 72 cases, using the outcome measure of expert judgment on presence of AOS and including a large number of independently collected candidate predictors representing behavioral measures of linguistic, cognitive, nonspeech oral motor, and speech motor ability. We constructed a predictive model using multiple imputation to deal with missing data, the Least Absolute Shrinkage and Selection Operator (Lasso) technique to select the most relevant predictors, and bootstrapping to check the model stability and quantify the optimism of the developed model. Two measures were sufficient to distinguish between participants with AOS plus aphasia and those with aphasia alone: (1) a measure of speech errors with words of increasing length, and (2) a measure of relative vowel duration in three-syllable words with a weak-strong stress pattern (e.g., banana, potato). The model has high discriminative ability to distinguish between cases with and without AOS (c-index=0.93) and good agreement between observed and predicted probabilities (calibration slope=0.94). Some caution is warranted, given the relatively small sample specific to left-hemisphere stroke, and the limitations of imputing missing data. These two speech measures are straightforward to collect and analyse, facilitating use in research and clinical settings.
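
    A hedged sketch of the modeling strategy described, L1-penalized (Lasso) logistic regression followed by a bootstrap check of selection stability, using scikit-learn on synthetic stand-in data; the predictors, labels, and penalty strength here are placeholders, not the study's measures.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.utils import resample

        rng = np.random.default_rng(0)
        X = rng.normal(size=(72, 20))        # 72 cases, 20 candidate predictors (synthetic)
        y = (X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=72)) > 0   # stand-in AOS labels

        lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
        print("selected predictors:", np.flatnonzero(lasso.coef_[0]))

        # Bootstrap: how often is each predictor retained across resamples?
        counts = np.zeros(X.shape[1])
        for _ in range(200):
            Xb, yb = resample(X, y)
            fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(Xb, yb)
            counts += fit.coef_[0] != 0
        print("selection frequency:", counts / 200)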

  11. A Comparison of Speech Sound Intervention Delivered by Telepractice and Side-by-Side Service Delivery Models

    Science.gov (United States)

    Grogan-Johnson, Sue; Schmidt, Anna Marie; Schenker, Jason; Alvares, Robin; Rowan, Lynne E.; Taylor, Jacquelyn

    2013-01-01

    Telepractice has the potential to provide greater access to speech-language intervention services for children with communication impairments. Substantiation of this delivery model is necessary for telepractice to become an accepted alternative delivery model. This study investigated the progress made by school-age children with speech sound…

  12. Mental imagery of speech and movement implicates the dynamics of internal forward models

    Directory of Open Access Journals (Sweden)

    Xing eTian

    2010-10-01

    Full Text Available The classical concept of efference copies in the context of internal forward models has stimulated productive research in cognitive science and neuroscience. There are compelling reasons to argue for such a mechanism, but finding direct evidence in the human brain remains difficult. Here we investigate the dynamics of internal forward models from an unconventional angle: mental imagery, assessed while recording high temporal resolution neuronal activity using magnetoencephalography (MEG). We compare two overt and covert tasks; our covert, mental imagery tasks are unconfounded by overt input/output demands – but in turn necessitate the development of appropriate multi-dimensional topographic analyses. Finger tapping (studies 1-2) and speech experiments (studies 3-5) provide temporally constrained results that implicate the estimation of an efference copy. We suggest that one internal forward model over parietal cortex subserves the kinesthetic feeling in motor imagery. Secondly, observed auditory neural activity ~170 ms after motor estimation in the speech experiments (studies 3-5) demonstrates the anticipated auditory consequences of planned motor commands in a second internal forward model in imagery of speech production. Our results provide neurophysiological evidence from the human brain in favor of internal forward models deploying efference copies in somatosensory and auditory cortex, in finger tapping and speech production tasks, respectively, and also suggest the dynamics and sequential updating structure of internal forward models.

  13. A Hybrid Acoustic and Pronunciation Model Adaptation Approach for Non-native Speech Recognition

    Science.gov (United States)

    Oh, Yoo Rhee; Kim, Hong Kook

    In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or the triphone-modeling level, depending on the level at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts pronunciation models and acoustic models by accommodating the pronunciation variants in the pronunciation dictionary and by clustering the states of triphone acoustic models using the acoustic variants, respectively. On the other hand, the triphone-modeling level hybrid method initially adapts pronunciation models in the same way as in the state-tying level hybrid method; however, for the acoustic model adaptation, the triphone acoustic models are then re-estimated based on the adapted pronunciation models and the states of the re-estimated triphone acoustic models are clustered using the acoustic variants. Korean-spoken English speech recognition experiments show that ASR systems employing the state-tying and triphone-modeling level adaptation methods can reduce the average word error rates (WERs) by a relative 17.1% and 22.1% for non-native speech, respectively, when compared to a baseline ASR system.

  14. A Neuro-Linguistic Model for Speech Recognition in Tone Language

    African Journals Online (AJOL)

    The primary aim of this work is to develop a speech recognition system that exploits a computational paradigm with learning ability and the inherent robustness and parallelism of ANNs, coupled with the capability of fuzzy logic to model vagueness, handle uncertainty, and support human reasoning. This research ...

  15. A Test of Some Models of Hemispheric Speech Organization in the Left- and Right-Handed.

    Science.gov (United States)

    Satz, Paul

    1979-01-01

    A new method generates specific predictions concerning the expected frequencies of aphasia after unilateral injury to the brain in the left- and right-handed. These predictions are then compared with the observed data for all known studies between 1935 and 1973 to derive the best-fitting model of hemispheric speech lateralization in the left- and…

  16. A discourse model of affect for text-to-speech synthesis

    CSIR Research Space (South Africa)

    Schlunz, GI

    2013-12-01

    Full Text Available This paper introduces a model of affect to improve prosody in text-to-speech synthesis. It operates on the discourse level of text to predict the underlying linguistic factors that contribute towards emotional appraisal, rather than any particular...

  17. Randomized Controlled Trial of Video Self-Modeling Following Speech Restructuring Treatment for Stuttering

    Science.gov (United States)

    Cream, Angela; O'Brian, Sue; Jones, Mark; Block, Susan; Harrison, Elisabeth; Lincoln, Michelle; Hewat, Sally; Packman, Ann; Menzies, Ross; Onslow, Mark

    2010-01-01

    Purpose: In this study, the authors investigated the efficacy of video self-modeling (VSM) following speech restructuring treatment to improve the maintenance of treatment effects. Method: The design was an open-plan, parallel-group, randomized controlled trial. Participants were 89 adults and adolescents who undertook intensive speech…

  18. Developmental Variables and Speech-Language in a Special Education Intervention Model.

    Science.gov (United States)

    Cruz, Maria del C.; Ayala, Myrna

    Case studies of eight children with speech and language impairments are presented in a review of the intervention efforts at the Demonstration Center for Preschool Special Education (DCPSE) in Puerto Rico. Five components of the intervention model are examined: social medical history, intelligence, motor development, socio-emotional development,…

  19. GAUSSIAN MIXTURE MODELS FOR ADAPTATION OF DEEP NEURAL NETWORK ACOUSTIC MODELS IN AUTOMATIC SPEECH RECOGNITION SYSTEMS

    Directory of Open Access Journals (Sweden)

    Natalia A. Tomashenko

    2016-11-01

    Full Text Available Subject of Research. We study speaker adaptation of deep neural network (DNN) acoustic models in automatic speech recognition systems. The aim of speaker adaptation techniques is to improve the accuracy of the speech recognition system for a particular speaker. Method. A novel method for training and adaptation of deep neural network acoustic models has been developed. It is based on using an auxiliary GMM (Gaussian Mixture Model) and GMM-derived (GMMD) features. The principal advantage of the proposed GMMD features is the possibility of performing the adaptation of a DNN through the adaptation of the auxiliary GMM. In the proposed approach any method for the adaptation of the auxiliary GMM can be used; hence, it provides a universal way of transferring adaptation algorithms developed for GMMs to DNN adaptation. Main Results. The effectiveness of the proposed approach was shown by means of one of the most common adaptation algorithms for GMM models, MAP (Maximum A Posteriori) adaptation. Different ways of integrating the proposed approach into a state-of-the-art DNN architecture have been proposed and explored. An analysis of the choice of the type of the auxiliary GMM model is given. Experimental results on the TED-LIUM corpus demonstrate that, in an unsupervised adaptation mode, the proposed adaptation technique can provide approximately an 11–18% relative word error rate (WER) reduction on different adaptation sets, compared to the speaker-independent DNN system built on conventional features, and a 3–6% relative WER reduction compared to the SAT-DNN trained on fMLLR-adapted features.
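
    A simplified reading of the GMMD idea is sketched below: fit an auxiliary GMM on acoustic frames and use its per-frame component posteriors as the feature vector fed to the DNN, so that adapting the GMM indirectly adapts the DNN. The feature choice (posteriors rather than the authors' exact GMM-derived log-likelihood features) and all sizes are illustrative assumptions.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        def gmmd_features(train_frames, frames, n_components=8, seed=0):
            """Per-frame posteriors of an auxiliary GMM, used as DNN input features."""
            gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                                  random_state=seed).fit(train_frames)
            # Adapting this auxiliary GMM (e.g., by MAP) shifts the features and
            # thereby adapts the DNN that consumes them -- the key property above.
            return gmm.predict_proba(frames)

        X = np.random.randn(1000, 13)        # stand-in acoustic frames (e.g., MFCCs)
        print(gmmd_features(X, X).shape)     # (1000, 8)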

  20. Development of simple fixed linear predictors for use in speech ...

    African Journals Online (AJOL)

    Development of simple fixed linear predictors for use in speech compression. ... A very popular method used for compression is Linear Prediction Coding (LPC), based on the Linear Prediction Model. The development of simple ... Various speech signals are used to test the performance of both filters within a DPCM system.
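
    For concreteness, the sketch below implements the core of the linear prediction model referred to here: autocorrelation analysis followed by the Levinson-Durbin recursion, yielding coefficients that predict each sample from the preceding ones (a generic textbook formulation, not the article's specific fixed predictors).

        import numpy as np

        def lpc(x, order):
            """Autocorrelation-method LPC via the Levinson-Durbin recursion.

            Returns a[1..order] with x[n] ~= sum_k a[k] * x[n - k], plus residual energy.
            """
            x = np.asarray(x, dtype=float)
            r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
            a = np.zeros(order + 1)          # a[0] is unused
            err = r[0]
            for i in range(1, order + 1):
                k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
                a[1:i] = a[1:i] - k * a[1:i][::-1]
                a[i] = k
                err *= 1.0 - k * k
            return a[1:], err

        # A pure sinusoid is perfectly predicted by a 2nd-order model:
        # x[n] = 2*cos(w)*x[n-1] - x[n-2]
        coef, res = lpc(np.sin(0.1 * np.arange(400)), order=2)
        print(coef)   # approximately [1.990, -1.0]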

  1. A Posterior Union Model with Applications to Robust Speech and Speaker Recognition

    Directory of Open Access Journals (Sweden)

    Lin Jie

    2006-01-01

    Full Text Available This paper investigates speech and speaker recognition involving partial feature corruption, assuming unknown, time-varying noise characteristics. The probabilistic union model is extended from a conditional-probability formulation to a posterior-probability formulation as an improved solution to the problem. The new formulation allows the order of the model to be optimized for every single frame, thereby enhancing the capability of the model for dealing with nonstationary noise corruption. The new formulation also allows the model to be readily incorporated into a Gaussian mixture model (GMM) for speaker recognition. Experiments have been conducted on two databases: TIDIGITS and SPIDRE, for speech recognition and speaker identification. Both databases are subject to unknown, time-varying band-selective corruption. The results have demonstrated the improved robustness for the new model.

  2. Improving Language Models in Speech-Based Human-Machine Interaction

    Directory of Open Access Journals (Sweden)

    Raquel Justo

    2013-02-01

    Full Text Available This work focuses on speech-based human-machine interaction. Specifically, a Spoken Dialogue System (SDS) that could be integrated into a robot is considered. Since Automatic Speech Recognition is one of the most sensitive tasks that must be confronted in such systems, the goal of this work is to improve the results obtained by this specific module. To do so, a hierarchical Language Model (LM) is considered. Different series of experiments were carried out using the proposed models over different corpora and tasks. The results obtained show that these models provide greater accuracy in the recognition task. Additionally, the influence of the Acoustic Modelling (AM) on the improvement percentage of the Language Models has also been explored. Finally, hierarchical Language Models have been successfully employed in a language understanding task, as shown in an additional series of experiments.

  3. Treatment Model in Children with Speech Disorders and Its Therapeutic Efficiency

    Directory of Open Access Journals (Sweden)

    Barberena, Luciana

    2014-05-01

    Full Text Available Introduction Speech articulation disorders affect the intelligibility of speech. Studies on therapeutic models show the effectiveness of communication treatment. Objective To analyze the progress achieved by treatment with the ABAB—Withdrawal and Multiple Probes Model in children with different degrees of phonological disorder. Methods The diagnosis of speech articulation disorder was determined by speech and hearing evaluation and complementary tests. The subjects of this research were eight children, with an average age of 5:5 (years:months). The children were distributed into four groups according to the degree of the phonological disorder, based on the percentage of correct consonants, as follows: severe, moderate to severe, mild to moderate, and mild. The phonological treatment applied was the ABAB—Withdrawal and Multiple Probes Model. The development of the therapy by generalization was observed through the comparison of two analyses, contrastive and distinctive features, at the moments of evaluation and reevaluation. Results The following types of generalization were found: to items not used in the treatment (other words), to another position in the word, within a sound class, to other classes of sounds, and to another syllable structure. Conclusion The different types of generalization studied showed the expansion of production and proper use of therapy-trained targets in other contexts or untrained environments. Therefore, the analysis of generalizations proved to be an important criterion for measuring therapeutic efficacy.

  4. Functional connectivity in the dorsal stream and between bilateral auditory-related cortical areas differentially contribute to speech decoding depending on spectro-temporal signal integrity and performance.

    Science.gov (United States)

    Elmer, Stefan; Kühnis, Jürg; Rauch, Piyush; Abolfazl Valizadeh, Seyed; Jäncke, Lutz

    2017-11-01

    Speech processing relies on the interdependence between auditory perception, sensorimotor integration, and verbal memory functions. Functional and structural connectivity between bilateral auditory-related cortical areas (ARCAs) facilitates spectro-temporal analyses, whereas the dynamic interplay between ARCAs and Broca's area (i.e., dorsal pathway) contributes to verbal memory functions, articulation, and sound-to-motor mapping. However, it remains unclear whether these two neural circuits are preferentially driven by spectral or temporal acoustic information, and whether their recruitment is predictive of speech perception performance and learning. Therefore, we evaluated EEG-based intracranial (eLORETA) functional connectivity (lagged coherence) in both pathways (i.e., between bilateral ARCAs and in the dorsal stream) while good- (GPs, N = 12) and poor performers (PPs, N = 13) learned to decode natural pseudowords (CLEAN) or comparable items (speech-noise chimeras) manipulated in the envelope (ENV) or in the fine-structure (FS). Learning to decode degraded speech was generally associated with increased functional connectivity in the theta, alpha, and beta frequency range in both circuits. Furthermore, GPs exhibited increased connectivity in the left dorsal stream compared to PPs, but only during the FS condition and in the theta frequency band. These results suggest that both pathways contribute to the decoding of spectro-temporally degraded speech by increasing the communication between brain regions involved in perceptual analyses and verbal memory functions. Moreover, the left-hemispheric recruitment of the dorsal stream in GPs during the FS condition points to a contribution of this pathway to articulatory-based memory processes that are dependent on the temporal integrity of the speech signal. These results enable a better understanding of the neural circuits underlying word learning as a function of temporal and spectral signal integrity and performance.

  5. Comparison of Speech-in-Noise and Localization Benefits in Unilateral Hearing Loss Subjects Using Contralateral Routing of Signal Hearing Aids or Bone-Anchored Implants.

    Science.gov (United States)

    Snapp, Hillary A; Holt, Fred D; Liu, Xuezhong; Rajguru, Suhrud M

    2017-01-01

    To compare the benefit of wireless contralateral routing of signal (CROS) technology to bone-anchored implant (BAI) technology in monaural listeners. Prospective, single-subject. Tertiary academic referral center. Adult English-speaking subjects using either a CROS hearing aid or BAI as treatment for unilateral severe-profound hearing loss. Aided performance utilizing the subject's BAI or CROS hearing device. Outcome measures included speech-in-noise perception using the QuickSIN (Etymotic Research, Elk Grove Village, IL, 2001) speech-in-noise test and localization ability using narrowband and broadband stimuli. Performance was measured in the unaided and aided conditions and compared with normal-hearing controls. Subjective outcome measures included the Speech Spatial and Qualities hearing scale and the Glasgow Hearing Aid Benefit Profile. A significant improvement in speech-in-noise performance for monaural listeners was found for both CROS and BAI hearing aid users. No significant difference was observed between treatment groups for subjective measures of post-treatment residual disability or satisfaction. Our data demonstrate that both CROS and BAI systems provide significant benefit for monaural listeners. There is no significant difference between CROS or BAI systems for objective measures of speech-in-noise performance. CROS and BAI hearing devices do not provide any localization benefit in the horizontal plane for monaural listeners and there is no significant difference between systems.

  6. The early maximum likelihood estimation model of audiovisual integration in speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2015-01-01

    Speech perception is facilitated by seeing the articulatory mouth movements of the talker. This is due to perceptual audiovisual integration, which also causes the McGurk−MacDonald illusion, and for which a comprehensive computational account is still lacking. Decades of research have largely... The study applies the early maximum likelihood estimation (MLE) model of audiovisual integration to speech perception, along with three model variations. In early MLE, integration is based on a continuous internal representation before categorization, which can make the model more parsimonious by imposing constraints that reflect experimental designs. The study also shows that cross-validation can evaluate models of audiovisual integration based on typical data sets, taking both goodness-of-fit and model flexibility into account. All models were tested on a published data set previously used for testing the FLMP. Cross-validation favored the early MLE, while more conventional error measures...
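
    The central assumption of MLE integration can be stated in a few lines: each modality provides a noisy internal estimate, and the fused estimate weights each cue by its inverse variance. The numeric illustration below is a generic statement of that principle, not code or values from the study.

        def mle_fusion(x_a, var_a, x_v, var_v):
            """Fuse auditory and visual estimates under independent Gaussian noise."""
            w_a = (1 / var_a) / (1 / var_a + 1 / var_v)   # inverse-variance weighting
            x_av = w_a * x_a + (1 - w_a) * x_v
            var_av = 1 / (1 / var_a + 1 / var_v)          # fused variance always shrinks
            return x_av, var_av

        # E.g., an auditory cue at -1 and a sharper visual cue at +1 on some
        # internal phonetic continuum yield a fused estimate pulled toward vision.
        print(mle_fusion(x_a=-1.0, var_a=1.0, x_v=1.0, var_v=0.5))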

  7. Study of the vocal signal in the amplitude-time representation. Speech segmentation and recognition algorithms

    International Nuclear Information System (INIS)

    Baudry, Marc

    1978-01-01

    This dissertation presents an acoustic and phonetic study of the vocal signal. The complex pattern of the signal is segmented into simple sub-patterns, and each of these sub-patterns may itself be segmented into still simpler, lower-level patterns. Pattern recognition techniques facilitate both this segmentation and the definition of the structural relations between the sub-patterns. In particular, we have developed syntactic techniques in which context-sensitive rewriting rules are controlled by predicates over parameters evaluated on the sub-patterns themselves. This allows a pure syntactic analysis to be generalized by adding semantic information. The system presented performs a pre-classification and partial identification of the phonemes, as well as accurate detection of each pitch period. The voice signal is analysed directly in the amplitude-time representation. The system has been implemented on a mini-computer and runs in real time. (author) [fr

  8. Patterns of flavor signals in supersymmetric models

    Energy Technology Data Exchange (ETDEWEB)

    Goto, T. [KEK National High Energy Physics, Tsukuba (Japan)]|[Kyoto Univ. (Japan). YITP; Okada, Y. [KEK National High Energy Physics, Tsukuba (Japan)]|[Graduate Univ. for Advanced Studies, Tsukuba (Japan). Dept. of Particle and Nucelar Physics; Shindou, T. [Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany)]|[International School for Advanced Studies, Trieste (Italy); Tanaka, M. [Osaka Univ., Toyonaka (Japan). Dept. of Physics

    2007-11-15

    Quark and lepton flavor signals are studied in four supersymmetric models, namely the minimal supergravity model, the minimal supersymmetric standard model with right-handed neutrinos, SU(5) supersymmetric grand unified theory with right-handed neutrinos and the minimal supersymmetric standard model with U(2) flavor symmetry. We calculate b→s(d) transition observables in B_d and B_s decays, taking the constraint from the B_s-anti-B_s mixing recently observed at Tevatron into account. We also calculate lepton flavor violating processes μ→eγ, τ→μγ and τ→eγ for the models with right-handed neutrinos. We investigate possibilities to distinguish the flavor structure of the supersymmetry breaking sector with use of patterns of various flavor signals which are expected to be measured in experiments such as MEG, LHCb and a future Super B Factory. (orig.)

  9. Patterns of flavor signals in supersymmetric models

    International Nuclear Information System (INIS)

    Goto, T.; Tanaka, M.

    2007-11-01

    Quark and lepton flavor signals are studied in four supersymmetric models, namely the minimal supergravity model, the minimal supersymmetric standard model with right-handed neutrinos, SU(5) supersymmetric grand unified theory with right-handed neutrinos and the minimal supersymmetric standard model with U(2) flavor symmetry. We calculate b→s(d) transition observables in B_d and B_s decays, taking the constraint from the B_s-anti-B_s mixing recently observed at Tevatron into account. We also calculate lepton flavor violating processes μ→eγ, τ→μγ and τ→eγ for the models with right-handed neutrinos. We investigate possibilities to distinguish the flavor structure of the supersymmetry breaking sector with use of patterns of various flavor signals which are expected to be measured in experiments such as MEG, LHCb and a future Super B Factory. (orig.)

  10. Clinical and MRI models predicting amyloid deposition in progressive aphasia and apraxia of speech

    Directory of Open Access Journals (Sweden)

    Jennifer L. Whitwell

    2016-01-01

    Full Text Available Beta-amyloid (Aβ) deposition can be observed in primary progressive aphasia (PPA) and progressive apraxia of speech (PAOS). While it is typically associated with logopenic PPA, there are exceptions that make predicting Aβ status challenging based on clinical diagnosis alone. We aimed to determine whether MRI regional volumes or clinical data could help predict Aβ deposition. One hundred and thirty-nine PPA (n = 97; 15 agrammatic, 53 logopenic, 13 semantic and 16 unclassified) and PAOS (n = 42) subjects were prospectively recruited into a cross-sectional study and underwent speech/language assessments, 3.0 T MRI and C11-Pittsburgh Compound B PET. The presence of Aβ was determined using a 1.5 SUVR cut-point. Atlas-based parcellation was used to calculate gray matter volumes of 42 regions-of-interest across the brain. Penalized binary logistic regression was utilized to determine what combination of MRI regions, and what combination of speech and language tests, best predicts Aβ(+) status. The optimal MRI model and optimal clinical model both performed comparably in their ability to accurately classify subjects according to Aβ status. MRI accurately classified 81% of subjects using 14 regions. Small left superior temporal and inferior parietal volumes and large left Broca's area volumes were particularly predictive of Aβ(+) status. Clinical scores accurately classified 83% of subjects using 12 tests. Phonological errors and repetition deficits, and absence of agrammatism and motor speech deficits were particularly predictive of Aβ(+) status. In comparison, clinical diagnosis was able to accurately classify 89% of subjects. However, the MRI model performed well in predicting Aβ deposition in unclassified PPA. Clinical diagnosis provides optimum prediction of Aβ status at the group level, although regional MRI measurements and speech and language testing also performed well and could have advantages in predicting Aβ status in unclassified PPA subjects.

  11. Clinical and MRI models predicting amyloid deposition in progressive aphasia and apraxia of speech.

    Science.gov (United States)

    Whitwell, Jennifer L; Weigand, Stephen D; Duffy, Joseph R; Strand, Edythe A; Machulda, Mary M; Senjem, Matthew L; Gunter, Jeffrey L; Lowe, Val J; Jack, Clifford R; Josephs, Keith A

    2016-01-01

    Beta-amyloid (Aβ) deposition can be observed in primary progressive aphasia (PPA) and progressive apraxia of speech (PAOS). While it is typically associated with logopenic PPA, there are exceptions that make predicting Aβ status challenging based on clinical diagnosis alone. We aimed to determine whether MRI regional volumes or clinical data could help predict Aβ deposition. One hundred and thirty-nine PPA (n = 97; 15 agrammatic, 53 logopenic, 13 semantic and 16 unclassified) and PAOS (n = 42) subjects were prospectively recruited into a cross-sectional study and underwent speech/language assessments, 3.0 T MRI and C11-Pittsburgh Compound B PET. The presence of Aβ was determined using a 1.5 SUVR cut-point. Atlas-based parcellation was used to calculate gray matter volumes of 42 regions-of-interest across the brain. Penalized binary logistic regression was utilized to determine what combination of MRI regions, and what combination of speech and language tests, best predicts Aβ (+) status. The optimal MRI model and optimal clinical model both performed comparably in their ability to accurately classify subjects according to Aβ status. MRI accurately classified 81% of subjects using 14 regions. Small left superior temporal and inferior parietal volumes and large left Broca's area volumes were particularly predictive of Aβ (+) status. Clinical scores accurately classified 83% of subjects using 12 tests. Phonological errors and repetition deficits, and absence of agrammatism and motor speech deficits were particularly predictive of Aβ (+) status. In comparison, clinical diagnosis was able to accurately classify 89% of subjects. However, the MRI model performed well in predicting Aβ deposition in unclassified PPA. Clinical diagnosis provides optimum prediction of Aβ status at the group level, although regional MRI measurements and speech and language testing also performed well and could have advantages in predicting Aβ status in unclassified PPA subjects.

  12. Computer models of vocal tract evolution: an overview and critique

    NARCIS (Netherlands)

    de Boer, B.; Fitch, W. T.

    2010-01-01

    Human speech has been investigated with computer models since the invention of digital computers, and models of the evolution of speech first appeared in the late 1960s and early 1970s. Speech science and computer models have a long shared history because speech is a physical signal and can be

  13. Multilevel Analysis in Analyzing Speech Data

    Science.gov (United States)

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  14. Signal modelling in speaker recognition | Sharma | Botswana ...

    African Journals Online (AJOL)

    This paper introduces recent advancements in speaker recognition systems. It incorporates the background concepts and emphasizes the signal model design techniques most commonly used in state-of-the-art speaker recognition systems. An analytical description is provided of procedures applied in speaker ...

  15. Stochastic models of intracellular calcium signals

    Energy Technology Data Exchange (ETDEWEB)

    Rüdiger, Sten, E-mail: sten.ruediger@physik.hu-berlin.de

    2014-01-10

    Cellular signaling operates in a noisy environment shaped by low molecular concentrations and cellular heterogeneity. For calcium release through intracellular channels–one of the most important cellular signaling mechanisms–feedback by liberated calcium endows fluctuations with critical functions in signal generation and formation. In this review it is first described, under which general conditions the environment makes stochasticity relevant, and which conditions allow approximating or deterministic equations. This analysis provides a framework, in which one can deduce an efficient hybrid description combining stochastic and deterministic evolution laws. Within the hybrid approach, Markov chains model gating of channels, while the concentrations of calcium and calcium binding molecules (buffers) are described by reaction–diffusion equations. The article further focuses on the spatial representation of subcellular calcium domains related to intracellular calcium channels. It presents analysis for single channels and clusters of channels and reviews the effects of buffers on the calcium release. For clustered channels, we discuss the application and validity of coarse-graining as well as approaches based on continuous gating variables (Fokker–Planck and chemical Langevin equations). Comparison with recent experiments substantiates the stochastic and spatial approach, identifies minimal requirements for a realistic modeling, and facilitates an understanding of collective channel behavior. At the end of the review, implications of stochastic and local modeling for the generation and properties of cell-wide release and the integration of calcium dynamics into cellular signaling models are discussed.
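
    A toy hybrid-style simulation in the spirit of the review is sketched below: a two-state Markov chain models stochastic channel gating, while the calcium concentration follows a deterministic reaction update between gating events. All rates and concentrations are invented for illustration, and diffusion is omitted.

        import numpy as np

        rng = np.random.default_rng(1)
        dt = 1e-3                       # time step in seconds
        k_open, k_close = 5.0, 20.0     # gating rates (1/s), illustrative only
        influx, decay = 50.0, 10.0      # calcium influx (uM/s) and clearance (1/s)
        ca, state = 0.1, 0              # [Ca2+] in uM; channel state: 0 closed, 1 open

        trace = []
        for _ in range(5000):           # 5 s of simulated time
            rate = k_open if state == 0 else k_close
            if rng.random() < rate * dt:            # stochastic Markov gating step
                state = 1 - state
            # Deterministic reaction part of the hybrid description (no diffusion here)
            ca += dt * (influx * state - decay * (ca - 0.1))
            trace.append(ca)
        print(f"mean [Ca2+] = {np.mean(trace):.3f} uM")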

  16. The speech-based envelope power spectrum model (sEPSM) family: Development, achievements, and current challenges

    DEFF Research Database (Denmark)

    Relano-Iborra, Helia; Chabot-Leclerc, Alexandre; Scheidiger, Christoph

    2017-01-01

    Intelligibility models provide insights regarding the effects of target speech characteristics, transmission channels and/or auditory processing on the speech perception performance of listeners. In 2011, Jørgensen and Dau proposed the speech-based envelope power spectrum model [sEPSM, Jørgensen ...]. Later studies have extended the predictive power of the original model to a broad range of conditions. This contribution presents the most recent developments within the sEPSM “family”: (i) a binaural extension, the B-sEPSM [Chabot-Leclerc et al. (2016). J. Acoust. Soc. Am. 140(1), 192-205], which combines better...

  17. Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering

    Directory of Open Access Journals (Sweden)

    M. H. Savoji

    2014-09-01

    Full Text Available Gaussian Mixture Models (GMMs) of the power spectral densities of speech and noise are used with explicit Bayesian estimation in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD) or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined systems of equations whose solutions lead to the first estimates of the speech and noise power spectra. The noise source is also identified and the input SNR estimated in this first step. These first estimates are then refined using approximate but explicit MMSE and MAP estimation formulations. The refined estimates are then used in a Wiener filter to reduce the noise and enhance the noisy speech. The proposed schemes show good results. Nevertheless, it is shown that the MAP explicit solution, introduced here for the first time, reduces the computation time to less than one third, with a slightly higher improvement in SNR and PESQ score and less distortion, in comparison to the MMSE solution.
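
    The final Wiener filtering stage of such a scheme reduces, per frequency bin, to the gain shown below; this sketch assumes the speech and noise power spectra have already been estimated (which is where the paper's GMM/Bayesian machinery comes in) and applies a conventional gain floor.

        import numpy as np

        def wiener_gain(speech_psd, noise_psd, floor=0.05):
            """Per-bin Wiener gain H(f) = S(f)/(S(f)+N(f)), floored to limit musical noise."""
            h = speech_psd / (speech_psd + noise_psd + 1e-12)
            return np.maximum(h, floor)

        def enhance_frame(noisy_stft_frame, speech_psd, noise_psd):
            """Apply the gain to one complex STFT frame; the noisy phase is kept unchanged."""
            return wiener_gain(speech_psd, noise_psd) * noisy_stft_frame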

  18. Emergence of language structures from exposure to visually grounded speech signal

    NARCIS (Netherlands)

    Chrupala, Grzegorz; Alishahi, Afra; Gelderloos, Lieke

    2017-01-01

    A variety of computational models can learn meanings of words and sentences from exposure to word sequences coupled with the perceptual context in which they occur. More recently, neural network models have been applied to more naturalistic and more challenging versions of this problem: for example

  19. Shortlist: A Connectionist Model of Continuous Speech Recognition.

    Science.gov (United States)

    Norris, Dennis

    1994-01-01

    The Shortlist model is presented, which incorporates the desirable properties of earlier models of back-propagation networks with recurrent connections that successfully model many aspects of human spoken word recognition. The new model is entirely bottom-up and can readily perform simulations with vocabularies of tens of thousands of words. (DR)

  20. Speech quality estimation of voice over internet protocol codec using a packet loss impairment model.

    Science.gov (United States)

    Lee, Min-Ki; Kang, Hong-Goo

    2013-11-01

    This letter proposes a degradation and cognition model to estimate the speech quality impairment caused by the packet loss concealment (PLC) algorithm implemented in the SILK speech codec. Based on the fact that the quality degradation caused by packet loss is highly related to the PLC algorithm, the impact of quality degradation for various types of previous and lost packet classes is analyzed. Then, the PLC effects on the proposed class types are measured by the class-conditional expectation of the degradation scores. Finally, the cognition module is derived to estimate the total quality degradation on a mean opinion score (MOS) scale. When assessed for correlation with subjective test results, the correlation coefficient of the encoder-based class model is 0.93, and that of the decoder-based model is 0.87.

  1. Modeling High-Dimensional Multichannel Brain Signals

    KAUST Repository

    Hu, Lechuan

    2017-12-12

    Our goal is to model and measure functional and effective (directional) connectivity in multichannel brain physiological signals (e.g., electroencephalograms, local field potentials). The difficulties in analyzing these data come mainly from two aspects: first, there are major statistical and computational challenges for modeling and analyzing high-dimensional multichannel brain signals; second, there is no set of universally agreed measures for characterizing connectivity. To model multichannel brain signals, our approach is to fit a vector autoregressive (VAR) model with potentially high lag order so that complex lead-lag temporal dynamics between the channels can be captured. Estimates of the VAR model will be obtained by our proposed hybrid LASSLE (LASSO + LSE) method which combines regularization (to control for sparsity) and least squares estimation (to improve bias and mean-squared error). Then we employ some measures of connectivity but put an emphasis on partial directed coherence (PDC) which can capture the directional connectivity between channels. PDC is a frequency-specific measure that explains the extent to which the present oscillatory activity in a sender channel influences the future oscillatory activity in a specific receiver channel relative to all possible receivers in the network. The proposed modeling approach provided key insights into potential functional relationships among simultaneously recorded sites during performance of a complex memory task. Specifically, this novel method was successful in quantifying patterns of effective connectivity across electrode locations, and in capturing how these patterns varied across trial epochs and trial types.

  2. Modeling high dimensional multichannel brain signals

    KAUST Repository

    Hu, Lechuan

    2017-03-27

    In this paper, our goal is to model functional and effective (directional) connectivity in a network of multichannel brain physiological signals (e.g., electroencephalograms, local field potentials). The primary challenges here are twofold: first, there are major statistical and computational difficulties for modeling and analyzing high-dimensional multichannel brain signals; second, there is no set of universally agreed measures for characterizing connectivity. To model multichannel brain signals, our approach is to fit a vector autoregressive (VAR) model with sufficiently high order so that complex lead-lag temporal dynamics between the channels can be accurately characterized. However, such a model contains a large number of parameters. Thus, we will estimate the high-dimensional VAR parameter space by our proposed hybrid LASSLE method (LASSO + LSE), which imposes regularization in the first step (to control for sparsity) and constrained least squares estimation in the second step (to improve the bias and mean-squared error of the estimator). Then, to characterize connectivity between channels in a brain network, we will use various measures but put an emphasis on partial directed coherence (PDC) in order to capture directional connectivity between channels. PDC is a directed frequency-specific measure that explains the extent to which the present oscillatory activity in a sender channel influences the future oscillatory activity in a specific receiver channel relative to all possible receivers in the network. Using the proposed modeling approach, we have achieved some insights into learning in a rat engaged in a non-spatial memory task.
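
    Given the estimated VAR coefficient matrices (the LASSLE estimation step is not shown here), PDC follows directly from the frequency response of the model. A minimal sketch using the standard definition of Baccalá and Sameshima:

    ```python
    import numpy as np

    def pdc(A, n_freqs=64):
        """Partial directed coherence from VAR(p) coefficient matrices.

        A: array of shape (p, d, d); A[k] is the lag-(k+1) coefficient matrix.
        Returns an (n_freqs, d, d) array whose [f, i, j] entry measures the
        influence of channel j on channel i at normalized frequency f.
        """
        p, d, _ = A.shape
        out = np.empty((n_freqs, d, d))
        for fi, f in enumerate(np.linspace(0.0, 0.5, n_freqs)):
            Abar = np.eye(d, dtype=complex)
            for k in range(p):
                Abar -= A[k] * np.exp(-2j * np.pi * f * (k + 1))
            mag = np.abs(Abar)
            out[fi] = mag / np.sqrt((mag**2).sum(axis=0, keepdims=True))
        return out

    # Toy usage with a two-channel VAR(1) model: channel 1 drives channel 2
    A = np.array([[[0.5, 0.0], [0.3, 0.4]]])
    print(pdc(A)[0])  # PDC matrix at frequency 0
    ```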

  3. Mapping Speech Spectra from Throat Microphone to Close-Speaking Microphone: A Neural Network Approach

    Directory of Open Access Journals (Sweden)

    B. Yegnanarayana

    2007-01-01

    Speech recorded from a throat microphone is robust to the surrounding noise, but sounds unnatural, unlike speech recorded from a close-speaking microphone. This paper addresses the issue of improving the perceptual quality of throat microphone speech by mapping the speech spectra from the throat microphone to the close-speaking microphone. A neural network model is used to capture the speaker-dependent functional relationship between the feature vectors (cepstral coefficients) of the two speech signals. A method is proposed to ensure the stability of the all-pole synthesis filter. Objective evaluations indicate the effectiveness of the proposed mapping scheme. The advantage of this method is that the model gives a smooth estimate of the spectra of the close-speaking microphone speech. No distortions are perceived in the reconstructed speech. This mapping technique is also used for bandwidth extension of telephone speech.
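
    The heart of such a scheme is a regression network trained on time-aligned cepstral vectors from parallel recordings. A minimal sketch (the file names, network size, and training setup are hypothetical; feature extraction and the stabilization of the synthesis filter are not shown):

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Aligned (n_frames, n_coeffs) cepstral features from parallel recordings;
    # the file names below are hypothetical placeholders.
    throat_ceps = np.load("throat_cepstra.npy")
    close_ceps = np.load("close_cepstra.npy")

    # Speaker-dependent mapping from throat-mic to close-mic cepstra
    mapper = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500)
    mapper.fit(throat_ceps, close_ceps)

    # Smooth spectral estimates for synthesis of the enhanced speech
    estimated_close_ceps = mapper.predict(throat_ceps)
    ```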

  4. Bandwidth Extension of Telephone Speech Aided by Data Embedding

    Directory of Open Access Journals (Sweden)

    David Malah

    2007-01-01

    A system for bandwidth extension of telephone speech, aided by data embedding, is presented. The proposed system uses the transmitted analog narrowband speech signal as a carrier of the side information needed to carry out the bandwidth extension. The upper band of the wideband speech is reconstructed at the receiving end from two components: a synthetic wideband excitation signal, generated from the narrowband telephone speech, and a wideband spectral envelope, parametrically represented and transmitted as embedded data in the telephone speech. We propose a novel data embedding scheme in which the scalar Costa scheme is combined with an auditory masking model, allowing high-rate transparent embedding while maintaining a low bit error rate. The signal is transformed to the frequency domain via the discrete Hartley transform (DHT) and is partitioned into subbands. Data is embedded in an adaptively chosen subset of subbands by modifying the DHT coefficients. In our simulations, high-quality wideband speech was obtained from speech transmitted over a telephone line (characterized by spectral magnitude distortion, dispersion, and noise), in which side information data is transparently embedded at a rate of 600 information bits/second and with a bit error rate of approximately 3⋅10⁻⁴. In a listening test, the reconstructed wideband speech was preferred (at different degrees) over conventional telephone speech in 92.5% of the test utterances.

  5. Bandwidth Extension of Telephone Speech Aided by Data Embedding

    Directory of Open Access Journals (Sweden)

    Sagi Ariel

    2007-01-01

    A system for bandwidth extension of telephone speech, aided by data embedding, is presented. The proposed system uses the transmitted analog narrowband speech signal as a carrier of the side information needed to carry out the bandwidth extension. The upper band of the wideband speech is reconstructed at the receiving end from two components: a synthetic wideband excitation signal, generated from the narrowband telephone speech, and a wideband spectral envelope, parametrically represented and transmitted as embedded data in the telephone speech. We propose a novel data embedding scheme in which the scalar Costa scheme is combined with an auditory masking model, allowing high-rate transparent embedding while maintaining a low bit error rate. The signal is transformed to the frequency domain via the discrete Hartley transform (DHT) and is partitioned into subbands. Data is embedded in an adaptively chosen subset of subbands by modifying the DHT coefficients. In our simulations, high-quality wideband speech was obtained from speech transmitted over a telephone line (characterized by spectral magnitude distortion, dispersion, and noise), in which side information data is transparently embedded at a rate of 600 information bits/second and with a bit error rate of approximately 3⋅10⁻⁴. In a listening test, the reconstructed wideband speech was preferred (at different degrees) over conventional telephone speech in 92.5% of the test utterances.
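
    In its simplest form, the scalar Costa scheme amounts to quantization index modulation of transform coefficients: each bit selects one of two interleaved quantizer lattices. A toy sketch of embedding and extraction in DHT coefficients (the adaptive subband selection and the auditory masking model are omitted, and the step size is purely illustrative):

    ```python
    import numpy as np

    def dht(x):
        """Discrete Hartley transform via the FFT: Re(X) - Im(X)."""
        X = np.fft.fft(x)
        return X.real - X.imag

    def idht(H):
        """The DHT is self-inverse up to a factor 1/N."""
        return dht(H) / len(H)

    def embed_bits(frame, bits, step=0.5):
        """Quantize the first len(bits) DHT coefficients onto a lattice
        shifted by half a step when the bit is 1 (toy scalar Costa scheme)."""
        H = dht(frame)
        for i, b in enumerate(bits):
            offset = step * b / 2.0
            H[i] = np.round((H[i] - offset) / step) * step + offset
        return idht(H)

    def extract_bits(frame, n_bits, step=0.5):
        """Decide, per coefficient, which shifted lattice is closer."""
        H = dht(frame)
        bits = []
        for i in range(n_bits):
            d0 = np.abs(H[i] - np.round(H[i] / step) * step)
            d1 = np.abs(H[i] - step / 2
                        - np.round((H[i] - step / 2) / step) * step)
            bits.append(0 if d0 <= d1 else 1)
        return bits

    frame = np.random.default_rng(2).standard_normal(256)
    marked = embed_bits(frame, [1, 0, 1, 1])
    print(extract_bits(marked, 4))  # -> [1, 0, 1, 1]
    ```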

  6. Hybrid model decomposition of speech and noise in a radial basis function neural model framework

    DEFF Research Database (Denmark)

    Sørensen, Helge Bjarup Dissing; Hartmann, Uwe

    1994-01-01

    The aim of the paper is to focus on a new approach to automatic speech recognition in noisy environments where the noise has either stationary or non-stationary statistical characteristics. The aim is to perform automatic recognition of speech in the presence of additive car noise. The technique...

  7. Speaker specificity in speech perception: the importance of what is and is not in the signal

    Science.gov (United States)

    Dahan, Delphine; Scarborough, Rebecca A.

    2005-09-01

    In some American English dialects, /ae/ before /g/ (but not before /k/) raises to a vowel approaching [E], in effect reducing phonetic overlap between (e.g.) "bag" and "back." Here, participants saw four written words on a computer screen (e.g., "bag," "back," "dog," "dock") and heard a spoken word. Their task was to indicate which word they heard. Participants' eye movements to the written words were recorded. Participants in the "ae-raising" group heard identity-spliced "bag"-like words containing the raised vowel [E]; participants in the "control" group heard cross-spliced "bag"-like words containing standard [ae]. Acoustically identical "back"-like words were subsequently presented to both groups. The ae-raising-group participants identified "back"-like words faster and more accurately, and made fewer fixations to the competitor "bag," than control-group participants did. Thus, exposure to ae-raised realizations of "bag" facilitated the identification of "back" because of the reduced fit between the input and the altered representation of the competing hypothesis "bag." This demonstrates that listeners evaluate the spoken input with respect to what is, but also what is not, in the signal, and that this evaluation involves speaker-specific representations. [Work supported by NSF Human and Social Dynamics 0433567.]

  8. The Role of Empirical Evidence in Modeling Speech Segmentation

    Science.gov (United States)

    Phillips, Lawrence

    2015-01-01

    Choosing specific implementational details is one of the most important aspects of creating and evaluating a model. In order to properly model cognitive processes, choices for these details must be made based on empirical research. Unfortunately, modelers are often forced to make decisions in the absence of relevant data. My work investigates the…

  9. Robust Transmission of Speech LSFs Using Hidden Markov Model-Based Multiple Description Index Assignments

    Directory of Open Access Journals (Sweden)

    Rondeau Paul

    2008-01-01

    Speech coding techniques capable of generating encoded representations which are robust against channel losses play an important role in enabling reliable voice communication over packet networks and mobile wireless systems. In this paper, we investigate the use of multiple description index assignments (MDIAs) for loss-tolerant transmission of line spectral frequency (LSF) coefficients, typically generated by state-of-the-art speech coders. We propose a simulated annealing-based approach for optimizing MDIAs for Markov-model-based decoders which exploit inter- and intraframe correlations in LSF coefficients to reconstruct the quantized LSFs from coded bit streams corrupted by channel losses. Experimental results are presented which compare the performance of a number of novel LSF transmission schemes. These results clearly demonstrate that Markov-model-based decoders, when used in conjunction with optimized MDIAs, can yield average spectral distortion much lower than that produced by methods such as interleaving/interpolation, commonly used to combat packet losses.

  10. Robust Transmission of Speech LSFs Using Hidden Markov Model-Based Multiple Description Index Assignments

    Directory of Open Access Journals (Sweden)

    Pradeepa Yahampath

    2008-03-01

    Speech coding techniques capable of generating encoded representations which are robust against channel losses play an important role in enabling reliable voice communication over packet networks and mobile wireless systems. In this paper, we investigate the use of multiple description index assignments (MDIAs) for loss-tolerant transmission of line spectral frequency (LSF) coefficients, typically generated by state-of-the-art speech coders. We propose a simulated annealing-based approach for optimizing MDIAs for Markov-model-based decoders which exploit inter- and intraframe correlations in LSF coefficients to reconstruct the quantized LSFs from coded bit streams corrupted by channel losses. Experimental results are presented which compare the performance of a number of novel LSF transmission schemes. These results clearly demonstrate that Markov-model-based decoders, when used in conjunction with optimized MDIAs, can yield average spectral distortion much lower than that produced by methods such as interleaving/interpolation, commonly used to combat packet losses.
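
    The optimizer itself is a standard simulated annealing loop over permutations of the codeword-to-index mapping. A sketch with an abstract cost function (in the paper the cost would be the expected spectral distortion of the Markov-model-based decoder under the channel loss model; the toy cost below exists only so the example runs end to end):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def anneal(cost, assignment, n_iter=20_000, t0=1.0, t1=1e-3):
        """Simulated annealing over an index assignment (a permutation)."""
        current, c_cur = assignment.copy(), cost(assignment)
        best, c_best = current.copy(), c_cur
        for i in range(n_iter):
            temp = t0 * (t1 / t0) ** (i / n_iter)   # geometric cooling
            cand = current.copy()
            a, b = rng.integers(len(cand), size=2)
            cand[a], cand[b] = cand[b], cand[a]     # swap two codewords
            c_new = cost(cand)
            if c_new < c_cur or rng.random() < np.exp((c_cur - c_new) / temp):
                current, c_cur = cand, c_new
                if c_cur < c_best:
                    best, c_best = current.copy(), c_cur
        return best

    # Toy stand-in cost so the sketch is runnable; not a distortion model
    toy_cost = lambda a: float(np.sum(np.abs(a - np.arange(len(a)))))
    print(anneal(toy_cost, rng.permutation(16)))
    ```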

  11. The Multi-Structural Model of Speech and Language Development in the Aspect of Holistic Approach

    OpenAIRE

    Tomele, Gundega

    2015-01-01

    The article analyses theories of the child's language acquisition and development process (psychological nativism, cognitivism, interactionism, behaviorism) and concludes that the various models of language acquisition arising from these theories depend on the language development stage and its representative factors: the dominant neural processes, language acquisition strategies, and the outcomes in the context of language development. Speech and language development and their interconnecti...

  12. Multilingual Techniques for Low Resource Automatic Speech Recognition

    Science.gov (United States)

    2016-05-20

  13. Speech intelligibility and recall of first and second language words heard at different signal-to-noise ratios

    Directory of Open Access Journals (Sweden)

    Staffan Hygge

    2015-09-01

    Free recall of spoken words in Swedish (the native tongue) and English was assessed in two signal-to-noise ratio (SNR) conditions (+3 and +12 dB), with and without half of the heard words being repeated back orally directly after presentation (shadowing; speech intelligibility, SI). A total of 24 wordlists with 12 words each were presented in English and in Swedish to Swedish-speaking college students. Pre-experimental measures of working memory capacity (OSPAN) were taken. A basic hypothesis was that the recall of the words would be impaired when the encoding of the words required more processing resources, thereby depleting working memory resources. This would be the case when the SNR was low or when the language was English. A low SNR was also expected to impair SI, but we wanted to compare the sizes of the SNR effects on SI and recall. A low score on working memory capacity was expected to further add to the negative effects of SNR and language on both SI and recall. The results indicated that SNR had strong effects on both SI and recall, but also that the effect size was larger for recall than for SI. Language had a main effect on recall, but not on SI. The shadowing procedure had different effects on recall of the early and late parts of the word lists. Working memory capacity was unimportant for the effects on SI and recall. Thus, recall appears to be a more sensitive indicator than SI of the acoustics of learning, which has implications for building codes and recommendations concerning classrooms and other workplaces where both hearing and learning are important.

  14. Toward a Model of Pediatric Speech Sound Disorders (SSD) for Differential Diagnosis and Therapy Planning

    NARCIS (Netherlands)

    Terband, Hayo; Maassen, Bernardus; Maas, Edwin; van Lieshout, Pascal; Maassen, Ben; Terband, Hayo

    2016-01-01

    The classification and differentiation of pediatric speech sound disorders (SSD) is one of the main questions in the field of speech and language pathology. Terms for classifying childhood SSD and motor speech disorders (MSD) refer to speech production processes, and a variety of methods of...

  15. Logic integer programming models for signaling networks.

    Science.gov (United States)

    Haus, Utz-Uwe; Niermann, Kathrin; Truemper, Klaus; Weismantel, Robert

    2009-05-01

    We propose a static and a dynamic approach to model biological signaling networks, and show how each can be used to answer relevant biological questions. For this, we use the two different mathematical tools of Propositional Logic and Integer Programming. The power of discrete mathematics for handling qualitative as well as quantitative data has so far not been exploited in molecular biology, which is mostly driven by experimental research, relying on first-order or statistical models. The arising logic statements and integer programs are analyzed and can be solved with standard software. For a restricted class of problems the logic models reduce to a polynomial-time solvable satisfiability problem. Additionally, a more dynamic model enables enumeration of possible time resolutions in poly-logarithmic time. Computational experiments are included.
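
    As a concrete illustration of how logic statements become integer programs (a textbook encoding, not necessarily the authors' exact formulation), an activation rule stating that a target z is active exactly when both upstream species x and y are active can be written as linear 0-1 constraints:

    ```latex
    % z = x AND y as linear constraints over binary variables;
    % analogous constraints encode OR and NOT gates.
    \begin{align*}
      z &\le x, & z &\le y, & z &\ge x + y - 1, & x,\, y,\, z &\in \{0,1\}.
    \end{align*}
    ```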

  16. Ear, Hearing and Speech

    DEFF Research Database (Denmark)

    Poulsen, Torben

    2000-01-01

    An introduction is given to the anatomy and function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal, and speech intelligibility. The lecture note is written for the course Fundamentals of Acoustics and Noise Control (51001)...

  17. Enhanced Modified Bark Spectral Distortion (EMBSD): An objective speech quality measure based on audible distortion and cognition model

    Science.gov (United States)

    Yang, Wonho

    The Speech Processing Lab at Temple University developed an objective speech quality measure called the Modified Bark Spectral Distortion (MBSD). The MBSD uses auditory perception models derived from psychoacoustic studies. The MBSD measure extends the Bark Spectral Distortion (BSD) method by incorporating a noise masking threshold to differentiate audible/inaudible distortions. The performance of the MBSD was comparable to that of the ITU-T Recommendation P.861 for various coding distortions. Based on experiments with Time Division Multiple Access (TDMA) data that contain distortions encountered in real network applications, modifications have been made to the MBSD algorithm. These are: use of the first 15 loudness components, normalization of loudness vectors, deletion of the spreading function in the noise masking threshold calculation, and use of a new cognition model based on postmasking effects. The Enhanced MBSD (EMBSD) shows significant improvement over the MBSD for TDMA data. Also, the performance of the EMBSD is better than that of the ITU-T Recommendation P.861 and Measuring Normalizing Blocks (MNB) measures for TDMA data. The performance of the EMBSD was compared to various other objective speech quality measures with speech data including a wide range of distortion conditions. The EMBSD showed clear improvement over the MBSD and had a correlation coefficient of 0.89 for the conditions of MNRUs, codecs, tandem cases, bit errors, and frame erasures. Mean Opinion Score (MOS) has been used to evaluate objective speech quality measures. Recognizing the procedural difference between the MOS test and current objective speech quality measures, it is proposed that current objective speech quality measures should be evaluated with Degradation Mean Opinion Score (DMOS). The Pearson product-moment correlation coefficient has been the main performance parameter for evaluation of objective speech quality measures. The Standard Error of the Estimates (SEE...

  18. Matrix sentence intelligibility prediction using an automatic speech recognition system.

    Science.gov (United States)

    Schädler, Marc René; Warzybok, Anna; Hochmuth, Sabine; Kollmeier, Birger

    2015-01-01

    The feasibility of predicting the outcome of the German matrix sentence test for different types of stationary background noise using an automatic speech recognition (ASR) system was studied. Speech reception thresholds (SRT) of 50% intelligibility were predicted in seven noise conditions. The ASR system used Mel-frequency cepstral coefficients as a front-end and employed whole-word Hidden Markov models on the back-end side. The ASR system was trained and tested with noisy matrix sentences on a broad range of signal-to-noise ratios. The ASR-based predictions were compared to data from the literature (Hochmuth et al., 2015) obtained with 10 native German listeners with normal hearing, and to predictions of the speech intelligibility index (SII). The ASR-based predictions showed a high and significant correlation (R² = 0.95, p < 0.001) [...] speech and noise signals. Minimal assumptions were made about human speech processing beyond what is already incorporated in a reference-free ordinary ASR system.

  19. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    Directory of Open Access Journals (Sweden)

    Santiago-Omar Caballero-Morales

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists of the phoneme-level acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and for the emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, the emotional state of a spoken sentence is estimated by counting the number of emotion-specific vowels found in the ASR’s output for the sentence. With this approach, an accuracy of 87–100% was achieved for the recognition of the emotional state of Mexican Spanish speech.
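
    The final decision rule is simply a majority vote over emotion-tagged vowel symbols in the recognizer output. A minimal sketch (the suffix convention for emotion-specific vowel models, e.g. "a_ang" for an anger-specific /a/, is a hypothetical notation, not the paper's):

    ```python
    from collections import Counter

    def estimate_emotion(phoneme_sequence):
        """Pick the emotion whose vowel models occur most often in the
        ASR output; plain phonemes (no '_' tag) are ignored."""
        counts = Counter(p.split("_")[1] for p in phoneme_sequence if "_" in p)
        return counts.most_common(1)[0][0] if counts else "neutral"

    print(estimate_emotion(["b", "a_ang", "t", "e_ang", "o_neu"]))  # -> "ang"
    ```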

  20. The temporal representation of speech in a nonlinear model of the guinea pig cochlea

    Science.gov (United States)

    Holmes, Stephen D.; Sumner, Christian J.; O'Mard, Lowel P.; Meddis, Ray

    2004-12-01

    The temporal representation of speechlike stimuli in the auditory-nerve output of a guinea pig cochlea model is described. The model consists of a bank of dual resonance nonlinear filters that simulate the vibratory response of the basilar membrane followed by a model of the inner hair cell/auditory nerve complex. The model is evaluated by comparing its output with published physiological auditory nerve data in response to single and double vowels. The evaluation includes analyses of individual fibers, as well as ensemble responses over a wide range of best frequencies. In all cases the model response closely follows the patterns in the physiological data, particularly the tendency for the temporal firing pattern of each fiber to represent the frequency of a nearby formant of the speech sound. In the model this behavior is largely a consequence of filter shapes; nonlinear filtering has only a small contribution at low frequencies. The guinea pig cochlear model produces a useful simulation of the measured physiological response to simple speech sounds and is therefore suitable for use in more advanced applications including attempts to generalize these principles to the response of the human auditory system, both normal and impaired.

  1. Unifying Speech and Language in a Developmentally Sensitive Model of Production.

    Science.gov (United States)

    Redford, Melissa A

    2015-11-01

    Speaking is an intentional activity. It is also a complex motor skill; one that exhibits protracted development and the fully automatic character of an overlearned behavior. Together these observations suggest an analogy with skilled behavior in the non-language domain. This analogy is used here to argue for a model of production that is grounded in the activity of speaking and structured during language acquisition. The focus is on the plan that controls the execution of fluent speech; specifically, on the units that are activated during the production of an intonational phrase. These units are schemas: temporally structured sequences of remembered actions and their sensory outcomes. Schemas are activated and inhibited via associated goals, which are linked to specific meanings. Schemas may fuse together over developmental time with repeated use to form larger units, thereby affecting the relative timing of sequential action in participating schemas. In this way, the hierarchical structure of the speech plan and ensuing rhythm patterns of speech are a product of development. Individual schemas may also become differentiated during development, but only if subsequences are associated with meaning. The necessary association of action and meaning gives rise to assumptions about the primacy of certain linguistic forms in the production process. Overall, schema representations connect usage-based theories of language to the action of speaking.

  2. Speech recognition from spectral dynamics

    Indian Academy of Sciences (India)

    Information is carried in changes of a signal. The paper starts with revisiting Dudley's concept of the carrier nature of speech. It points to its close connection to modulation spectra of speech and argues against short-term spectral envelopes as dominant carriers of the linguistic information in speech. The history of ...

  3. Principles of speech coding

    CERN Document Server

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication, including voice, will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. The book outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networks. Offering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the...

  4. Blind Separation of Acoustic Signals Combining SIMO-Model-Based Independent Component Analysis and Binary Masking

    Directory of Open Access Journals (Sweden)

    Hiekata Takashi

    2006-01-01

    A new two-stage blind source separation (BSS) method for convolutive mixtures of speech is proposed, in which a single-input multiple-output (SIMO) model-based independent component analysis (ICA) and a new SIMO-model-based binary masking are combined. SIMO-model-based ICA enables us to separate the mixed signals, not into monaural source signals, but into SIMO-model-based signals from independent sources in their original form at the microphones. Thus, the separated signals of SIMO-model-based ICA can maintain the spatial qualities of each sound source. Owing to this attractive property, our novel SIMO-model-based binary masking can be applied to efficiently remove the residual interference components after SIMO-model-based ICA. The experimental results reveal that the separation performance can be considerably improved by the proposed method compared with that achieved by conventional BSS methods. In addition, the real-time implementation of the proposed BSS is illustrated.
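
    The second-stage masking can be sketched as a comparison of time-frequency magnitudes between two separated streams, keeping each cell in the stream where it dominates. This is a simplification of the paper's SIMO-model-based masking (which operates on the stereo SIMO outputs per source); here we assume two single-channel ICA outputs:

    ```python
    import numpy as np
    from scipy.signal import stft, istft

    def binary_mask(sep1, sep2, fs, nperseg=1024):
        """Suppress residual interference after ICA by keeping each
        time-frequency cell only in the stream where it is dominant."""
        _, _, X1 = stft(sep1, fs, nperseg=nperseg)
        _, _, X2 = stft(sep2, fs, nperseg=nperseg)
        mask = np.abs(X1) > np.abs(X2)
        _, y1 = istft(X1 * mask, fs, nperseg=nperseg)
        _, y2 = istft(X2 * ~mask, fs, nperseg=nperseg)
        return y1, y2
    ```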

  5. What drives the perceptual change resulting from speech motor adaptation? Evaluation of hypotheses in a Bayesian modeling framework

    Science.gov (United States)

    Perrier, Pascal; Schwartz, Jean-Luc; Diard, Julien

    2018-01-01

    Shifts in perceptual boundaries resulting from speech motor learning induced by perturbations of the auditory feedback were taken as evidence for the involvement of motor functions in auditory speech perception. Beyond this general statement, the precise mechanisms underlying this involvement are not yet fully understood. In this paper we propose a quantitative evaluation of some hypotheses concerning the motor and auditory updates that could result from motor learning, in the context of various assumptions about the roles of the auditory and somatosensory pathways in speech perception. This analysis was made possible thanks to the use of a Bayesian model that implements these hypotheses by expressing the relationships between speech production and speech perception in a joint probability distribution. The evaluation focuses on how the hypotheses can (1) predict the location of perceptual boundary shifts once the perturbation has been removed, (2) account for the magnitude of the compensation in presence of the perturbation, and (3) describe the correlation between these two behavioral characteristics. Experimental findings about changes in speech perception following adaptation to auditory feedback perturbations serve as reference. Simulations suggest that they are compatible with a framework in which motor adaptation updates both the auditory-motor internal model and the auditory characterization of the perturbed phoneme, and where perception involves both auditory and somatosensory pathways. PMID:29357357

  6. Speech processing in mobile environments

    CERN Document Server

    Rao, K Sreenivasa

    2014-01-01

    This book focuses on speech processing in the presence of low-bit rate coding and varying background environments. The methods presented in the book exploit the speech events which are robust in noisy environments. Accurate estimation of these crucial events will be useful for carrying out various speech tasks such as speech recognition, speaker recognition and speech rate modification in mobile environments. The authors provide insights into designing and developing robust methods to process the speech in mobile environments. Covering temporal and spectral enhancement methods to minimize the effect of noise and examining methods and models on speech and speaker recognition applications in mobile environments.

  7. A comparison between the first-fit settings of two multichannel digital signal-processing strategies: music quality ratings and speech-in-noise scores.

    Science.gov (United States)

    Higgins, Paul; Searchfield, Grant; Coad, Gavin

    2012-06-01

    The aim of this study was to determine which level-dependent hearing aid digital signal-processing strategy (DSP) participants preferred when listening to music and/or performing a speech-in-noise task. Two receiver-in-the-ear hearing aids were compared: one using 32-channel adaptive dynamic range optimization (ADRO) and the other wide dynamic range compression (WDRC) incorporating dual fast (4 channel) and slow (15 channel) processing. The manufacturers' first-fit settings based on participants' audiograms were used in both cases. Results were obtained from 18 participants on a quick speech-in-noise (QuickSIN; Killion, Niquette, Gudmundsen, Revit, & Banerjee, 2004) task and for 3 music listening conditions (classical, jazz, and rock). Participants preferred the quality of music and performed better at the QuickSIN task using the hearing aids with ADRO processing. A potential reason for the better performance of the ADRO hearing aids was less fluctuation in output with change in sound dynamics. ADRO processing has advantages for both music quality and speech recognition in noise over the multichannel WDRC processing that was used in the study. Further evaluations of which DSP aspects contribute to listener preference are required.

  8. Assessment of the Speech Intelligibility Performance of Post Lingual Cochlear Implant Users at Different Signal-to-Noise Ratios Using the Turkish Matrix Test

    Directory of Open Access Journals (Sweden)

    Zahra Polat

    2016-10-01

    Full Text Available Background: Spoken word recognition and speech perception tests in quiet are being used as a routine in assessment of the benefit which children and adult cochlear implant users receive from their devices. Cochlear implant users generally demonstrate high level performances in these test materials as they are able to achieve high level speech perception ability in quiet situations. Although these test materials provide valuable information regarding Cochlear Implant (CI users’ performances in optimal listening conditions, they do not give realistic information regarding performances in adverse listening conditions, which is the case in the everyday environment. Aims: The aim of this study was to assess the speech intelligibility performance of post lingual CI users in the presence of noise at different signal-to-noise ratio with the Matrix Test developed for Turkish language. Study Design: Cross-sectional study. Methods: The thirty post lingual implant user adult subjects, who had been using implants for a minimum of one year, were evaluated with Turkish Matrix test. Subjects’ speech intelligibility was measured using the adaptive and non-adaptive Matrix Test in quiet and noisy environments. Results: The results of the study show a correlation between Pure Tone Average (PTA values of the subjects and Matrix test Speech Reception Threshold (SRT values in the quiet. Hence, it is possible to asses PTA values of CI users using the Matrix Test also. However, no correlations were found between Matrix SRT values in the quiet and Matrix SRT values in noise. Similarly, the correlation between PTA values and intelligibility scores in noise was also not significant. Therefore, it may not be possible to assess the intelligibility performance of CI users using test batteries performed in quiet conditions. Conclusion: The Matrix Test can be used to assess the benefit of CI users from their systems in everyday life, since it is possible to perform

  9. Effects of pain on vowel production – Towards a new way of pain-level estimation based on acoustic speech-signal analyses

    DEFF Research Database (Denmark)

    Salinas-Ranneberg, Melissa; Niebuhr, Oliver; Kunz, Miriam

    2017-01-01

    ... step into a line of research whose long-term goal is to automatically detect and measure (at a new level of detail and with ubiquitous technical devices) a patient's pain level from changes in the acoustic source and filter characteristics of the speech signal. Our first study focused on prosodic source characteristics and was based on 50 German speakers who immersed their hands in water tanks with temperatures from 40° to 47 °C. A multi-parametric acoustic analysis of sustained vowel productions showed an increase of mean F0 and mean acoustic-energy level for the painful 47 °C condition...

  10. A dynamical model of hierarchical selection and coordination in speech planning.

    Directory of Open Access Journals (Sweden)

    Sam Tilsen

    Studies of the control of complex sequential movements have dissociated two aspects of movement planning: control over the sequential selection of movement plans, and control over the precise timing of movement execution. This distinction is particularly relevant in the production of speech: utterances contain sequentially ordered words and syllables, but articulatory movements are often executed in a non-sequential, overlapping manner with precisely coordinated relative timing. This study presents a hybrid dynamical model in which competitive activation controls the selection of movement plans and coupled oscillatory systems govern coordination. The model departs from previous approaches by ascribing an important role to competitive selection of articulatory plans within a syllable. Numerical simulations show that the model reproduces a variety of speech production phenomena, such as effects of preparation and utterance composition on reaction time, and asymmetries in patterns of articulatory timing associated with onsets and codas. The model furthermore provides a unified understanding of a diverse group of phonetic and phonological phenomena which have not previously been related.
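
    The coordination side of such a model can be illustrated with two coupled phase oscillators that settle into a stable relative phase, which is one common way relative timing of gestures is handled in this framework. A minimal sketch with illustrative parameters (the competitive selection dynamics are not shown):

    ```python
    import numpy as np

    # Two coupled phase oscillators relax toward a target relative phase;
    # all parameter values below are illustrative only.
    omega = np.array([2 * np.pi * 5.0, 2 * np.pi * 5.0])  # natural frequencies
    coupling = 1.5
    target = np.pi / 2              # desired relative phase between gestures
    phase = np.array([0.0, 1.0])
    dt = 1e-3
    for _ in range(5000):           # 5 s of simulated time
        rel = phase[1] - phase[0]
        phase[0] += dt * (omega[0] + coupling * np.sin(rel - target))
        phase[1] += dt * (omega[1] - coupling * np.sin(rel - target))
    print((phase[1] - phase[0]) % (2 * np.pi))  # approaches `target`
    ```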

  11. Animal Models of Speech and Vocal Communication Deficits Associated With Psychiatric Disorders.

    Science.gov (United States)

    Konopka, Genevieve; Roberts, Todd F

    2016-01-01

    Disruptions in speech, language, and vocal communication are hallmarks of several neuropsychiatric disorders, most notably autism spectrum disorders. Historically, the use of animal models to dissect molecular pathways and connect them to behavioral endophenotypes in cognitive disorders has proven to be an effective approach for developing and testing disease-relevant therapeutics. The unique aspects of human language compared with vocal behaviors in other animals make such an approach potentially more challenging. However, the study of vocal learning in species with analogous brain circuits to humans may provide entry points for understanding this human-specific phenotype and diseases. We review animal models of vocal learning and vocal communication and specifically link phenotypes of psychiatric disorders to relevant model systems. Evolutionary constraints in the organization of neural circuits and synaptic plasticity result in similarities in the brain mechanisms for vocal learning and vocal communication. Comparative approaches and careful consideration of the behavioral limitations among different animal models can provide critical avenues for dissecting the molecular pathways underlying cognitive disorders that disrupt speech, language, and vocal communication.

  12. High-order hidden Markov model for piecewise linear processes and applications to speech recognition.

    Science.gov (United States)

    Lee, Lee-Min; Jean, Fu-Rong

    2016-08-01

    The hidden Markov models have been widely applied to systems with sequential data. However, the conditional independence of the state outputs will limit the output of a hidden Markov model to be a piecewise constant random sequence, which is not a good approximation for many real processes. In this paper, a high-order hidden Markov model for piecewise linear processes is proposed to better approximate the behavior of a real process. A parameter estimation method based on the expectation-maximization algorithm was derived for the proposed model. Experiments on speech recognition of noisy Mandarin digits were conducted to examine the effectiveness of the proposed method. Experimental results show that the proposed method can reduce the recognition error rate compared to a baseline hidden Markov model.

  13. Speech Recognition

    Directory of Open Access Journals (Sweden)

    Adrian Morariu

    2009-01-01

    This paper presents a method of speech recognition using pattern recognition techniques. Learning consists in determining the unique characteristics of a word (cepstral coefficients) by eliminating those characteristics that differ from one word to another. For learning and recognition, the system builds a dictionary of words by determining the characteristics of each word to be used in the recognition. Determining the characteristics of an audio signal consists of the following steps: noise removal, sampling, applying a Hamming window, switching to the frequency domain through the Fourier transform, calculating the magnitude spectrum, filtering the data, and determining the cepstral coefficients.
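
    A minimal sketch of the per-frame feature chain described above, assuming noise removal has already been performed and using a plain (non-mel) cepstrum: window, FFT, log magnitude, and an inverse transform via the DCT, keeping only the first few coefficients:

    ```python
    import numpy as np
    from scipy.fft import dct

    def frame_cepstrum(frame, n_ceps=13):
        """Cepstral coefficients for one speech frame (a 1-D numpy array)."""
        windowed = frame * np.hamming(len(frame))        # Hamming window
        spectrum = np.abs(np.fft.rfft(windowed))         # magnitude spectrum
        log_mag = np.log(spectrum + 1e-10)               # avoid log(0)
        return dct(log_mag, norm="ortho")[:n_ceps]       # low-order cepstrum
    ```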

  14. Cross-Modal Interactions during Perception of Audiovisual Speech and Nonspeech Signals: An fMRI Study

    Science.gov (United States)

    Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

    2011-01-01

    During speech communication, visual information may interact with the auditory system at various processing stages. Most noteworthy, recent magnetoencephalography (MEG) data provided first evidence for early and preattentive phonetic/phonological encoding of the visual data stream--prior to its fusion with auditory phonological features [Hertrich,…

  15. Speech Problems

    Science.gov (United States)

    ... Staying Safe Videos for Educators Search English Español Speech Problems KidsHealth / For Teens / Speech Problems What's in ... a person's ability to speak clearly. Some Common Speech and Language Disorders Stuttering is a problem that ...

  16. How does language model size affect speech recognition accuracy for the Turkish language?

    Directory of Open Access Journals (Sweden)

    Behnam ASEFİSARAY

    2016-05-01

    In this paper we investigate the effect of Language Model (LM) size on Speech Recognition (SR) accuracy. We also provide details of our approach for obtaining the LM for Turkish. Since the LM is obtained by statistical processing of raw text, we expect that increasing the size of the data available for training the LM will improve SR accuracy. Since this study is based on the recognition of Turkish, which is a highly agglutinative language, it is important to find the appropriate size for the training data. The minimum required data size is expected to be much higher than the data needed to train a language model for a language with a low level of agglutination such as English. In the experiments we also tried to adjust the Language Model Weight (LMW) and Active Token Count (ATC) parameters of the LM, as these are expected to differ for a highly agglutinative language. We show that increasing the training data size to an appropriate level improves the recognition accuracy, whereas changes to LMW and ATC did not have a positive effect on Turkish speech recognition accuracy.

  17. Automatic Speech Recognition Using Template Model for Man-Machine Interface

    OpenAIRE

    Mishra, Neema; Shrawankar, Urmila; Thakare, V M

    2013-01-01

    Speech is a natural form of communication for human beings, and computers with the ability to understand speech and speak with a human voice are expected to contribute to the development of more natural man-machine interfaces. Computers with this kind of ability are gradually becoming a reality through the evolution of speech recognition technologies. Speech is an important mode of interaction with computers. In this paper, feature extraction is implemented using the well-known Mel-Frequenc...

  18. Auditory-Motor Interactions in Pediatric Motor Speech Disorders: Neurocomputational Modeling of Disordered Development

    NARCIS (Netherlands)

    Terband, H.R.; Maassen, B.A.M.; Guenther, F.H.; Brumberg, J.

    2014-01-01

    Background/Purpose: Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between

  19. Auditory-motor interactions in pediatric motor speech disorders: Neurocomputational modeling of disordered development

    NARCIS (Netherlands)

    Terband, H.; Maassen, B.; Guenther, F. H.; Brumberg, J.

    2014-01-01

    BACKGROUND/PURPOSE: Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between

  20. Methods and models for quantitative assessment of speech intelligibility in cross-language communication

    NARCIS (Netherlands)

    Wijngaarden, S.J. van; Steeneken, H.J.M.; Houtgast, T.

    2001-01-01

    To deal with the effects of nonnative speech communication on speech intelligibility, one must know the magnitude of these effects. To measure this magnitude, suitable test methods must be available. Many of the methods used in cross-language speech communication research are not very suitable for

  1. Modeling of surface myoelectric signals--Part II: Model-based signal interpretation.

    Science.gov (United States)

    Merletti, R; Roy, S H; Kupa, E; Roatta, S; Granata, A

    1999-07-01

    Experimental electromyogram (EMG) data from the human biceps brachii were simulated using the model described in [10] of this work. A multichannel linear electrode array, spanning the length of the biceps, was used to detect monopolar and bipolar signals, from which double differential signals were computed, during either voluntary or electrically elicited isometric contractions. For relatively low-level voluntary contractions (10%-30% of maximum force), individual firings of three to four different motor units were identified and their waveforms were closely approximated by the model. Motor unit parameters such as depth, size, fiber orientation and length, location of innervation and tendonous zones, propagation velocity, and source width were estimated using the model. Two applications of the model are described. The first analyzes the effects of electrode rotation with respect to the muscle fiber direction and shows the possibility of conduction velocity (CV) over- and under-estimation. The second focuses on the myoelectric manifestations of fatigue during a sustained electrically elicited contraction and the interrelationship between muscle fiber CV, spectral and amplitude variables, and the length of the depolarization zone. It is concluded that a) surface EMG detection using an electrode array, when combined with a model of signal propagation, provides a useful method for understanding the physiological and anatomical determinants of EMG waveform characteristics and b) the model provides a way for the interpretation of fatigue plots.

  2. An optimal speech processor for efficient human speech ...

    Indian Academy of Sciences (India)

    ... above, the speech signal is recorded at 21739 Hz for English subjects and 20000 Hz for Cantonese and Georgian subjects. We downsampled the speech signals to 16 kHz for our analysis. Using these parallel acoustic and articulatory data from Cantonese and Georgian, we will be able to examine our communication ...

  3. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    Science.gov (United States)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It offers a fast rate of data/text entry, a small overall size, and light weight. In addition, this design frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when faced with constraints on computational resources.
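
    The first stage of such a front-end is typically a beamformer. A minimal delay-and-sum sketch (the actual system's multichannel noise reduction is more elaborate; the per-microphone steering delays are assumed known, e.g. from the array geometry):

    ```python
    import numpy as np

    def delay_and_sum(channels, fs, delays):
        """Delay-and-sum beamformer for distant speech acquisition.

        channels: (n_mics, n_samples) array of microphone signals.
        delays:   per-mic steering delays in seconds; applying the phase
                  exp(+2j*pi*f*tau) advances a channel by tau, aligning
                  the wavefronts before averaging.
        """
        n_mics, n = channels.shape
        freqs = np.fft.rfftfreq(n, 1.0 / fs)
        acc = np.zeros(len(freqs), dtype=complex)
        for m in range(n_mics):
            acc += np.fft.rfft(channels[m]) * np.exp(2j * np.pi * freqs * delays[m])
        return np.fft.irfft(acc / n_mics, n)
    ```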

  4. A frequency-selective feedback model of auditory efferent suppression and its implications for the recognition of speech in noise.

    Science.gov (United States)

    Clark, Nicholas R; Brown, Guy J; Jürgens, Tim; Meddis, Ray

    2012-09-01

    The potential contribution of the peripheral auditory efferent system to our understanding of speech in a background of competing noise was studied using a computer model of the auditory periphery and assessed using an automatic speech recognition system. A previous study had shown that a fixed efferent attenuation applied to all channels of a multi-channel model could improve the recognition of connected digit triplets in noise [G. J. Brown, R. T. Ferry, and R. Meddis, J. Acoust. Soc. Am. 127, 943-954 (2010)]. In the current study an anatomically justified feedback loop was used to automatically regulate separate attenuation values for each auditory channel. This arrangement resulted in a further enhancement of speech recognition over fixed-attenuation conditions. Comparisons between multi-talker babble and pink noise interference conditions suggest that the benefit originates from the model's ability to modify the amount of suppression in each channel separately according to the spectral shape of the interfering sounds.

  5. The role of the motor system in discriminating normal and degraded speech sounds.

    Science.gov (United States)

    D'Ausilio, Alessandro; Bufalari, Ilaria; Salmas, Paola; Fadiga, Luciano

    2012-07-01

    Listening to speech recruits a network of fronto-temporo-parietal cortical areas. Classical models consider anterior, motor, sites involved in speech production whereas posterior sites involved in comprehension. This functional segregation is more and more challenged by action-perception theories suggesting that brain circuits for speech articulation and speech perception are functionally interdependent. Recent studies report that speech listening elicits motor activities analogous to production. However, the motor system could be crucially recruited only under certain conditions that make speech discrimination hard. Here, by using event-related double-pulse transcranial magnetic stimulation (TMS) on lips and tongue motor areas, we show data suggesting that the motor system may play a role in noisy, but crucially not in noise-free environments, for the discrimination of speech signals.

  6. Quantitative insight into models of Hedgehog signal transduction.

    Science.gov (United States)

    Farzan, Shohreh F; Ogden, Stacey K; Robbins, David J

    2010-01-01

    The Hedgehog (Hh) signaling pathway is an essential regulator of embryonic development and a key factor in carcinogenesis.(1,2) Hh, a secreted morphogen, activates intracellular signaling events via downstream effector proteins, which translate the signal to regulate target gene transcription.(3,4) In a recent publication, we quantitatively compared two commonly accepted models of Hh signal transduction.(5) Each model requires a different ratio of signaling components to be feasible. Thus, we hypothesized that knowing the steady-state ratio of core signaling components might allow us to distinguish between models. We reported vast differences in the molar concentrations of endogenous effectors of Hh signaling, with Smo present in limiting concentrations.(5) This extra view summarizes the implications of this endogenous ratio in relation to current models of Hh signaling and places our results in the context of recent work describing the involvement of guanine nucleotide binding protein Galphai and Cos2 motility.

  7. A parametric framework for modelling of bioelectrical signals

    CERN Document Server

    Mughal, Yar Muhammad

    2016-01-01

    This book examines non-invasive, electrical-based methods for disease diagnosis and assessment of heart function. In particular, a formalized signal model is proposed since this offers several advantages over methods that rely on measured data alone. By using a formalized representation, the parameters of the signal model can be easily manipulated and/or modified, thus providing mechanisms that allow researchers to reproduce and control such signals. In addition, having such a formalized signal model makes it possible to develop computer tools that can be used for manipulating and understanding how signal changes result from various heart conditions, as well as for generating input signals for experimenting with and evaluating the performance of e.g. signal extraction methods. The work focuses on bioelectrical information, particularly electrical bio-impedance (EBI). Once the EBI has been measured, the corresponding signals have to be modelled for analysis. This requires a structured approach in order to move...

  8. Fundamental Frequency and Direction-of-Arrival Estimation for Multichannel Speech Enhancement

    DEFF Research Database (Denmark)

    Karimian-Azari, Sam

    ... their time differences, which eventually may further reduce the effects of noise. This thesis introduces a number of principles and methods to estimate periodic signals in noisy environments with application to multichannel speech enhancement. We propose model-based signal enhancement concerning the model ... estimate the speech signal of interest from the noisy signals using a priori knowledge about it. A human speech signal is broadband and consists of both voiced and unvoiced parts. The voiced part is quasi-periodic with a time-varying fundamental frequency (or pitch, as it is commonly referred to). We ... of periodic signals. Therefore, the parameters of the model must be estimated in advance. The signal of interest is often contaminated by different types of noise that may render many estimation methods suboptimal due to an incorrect white Gaussian noise assumption. We therefore propose robust estimators ...
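
    A crude single-channel illustration of model-based pitch estimation is harmonic summation: pick the fundamental frequency whose first few harmonics collect the most spectral power. A sketch under a white-noise assumption (the thesis's estimators are considerably more refined, handling colored noise and multichannel data; here we also assume the sampling rate is high enough that n_harm * f_hi stays below the Nyquist frequency):

    ```python
    import numpy as np

    def harmonic_summation_pitch(x, fs, f_lo=80.0, f_hi=400.0, n_harm=5):
        """Estimate a fundamental frequency by summing spectral power
        at the candidate's first n_harm harmonics."""
        spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
        freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
        candidates = np.arange(f_lo, f_hi, 1.0)
        scores = [
            sum(spec[np.argmin(np.abs(freqs - h * f0))]
                for h in range(1, n_harm + 1))
            for f0 in candidates
        ]
        return candidates[int(np.argmax(scores))]

    # Toy usage: a 150 Hz harmonic signal in light noise
    t = np.arange(8000) / 8000.0
    x = sum(np.sin(2 * np.pi * 150 * h * t) / h for h in range(1, 4))
    print(harmonic_summation_pitch(x + 0.1 * np.random.randn(len(t)), 8000))
    ```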

  9. Human behavior state profile mapping based on recalibrated speech affective space model.

    Science.gov (United States)

    Kamaruddin, N; Wahab, A

    2012-01-01

    People typically associate health with only physical health. However, health is also interconnected with mental and emotional health. People who are emotionally healthy are in control of their behaviors and experience better quality of life. Hence, understanding human behavior is very important in ensuring a complete understanding of one's holistic health. In this paper, we attempt to map human behavior state (HBS) profiles onto a recalibrated speech affective space model (rSASM). Such an approach is derived from the hypotheses that: 1) behavior is influenced by emotion; 2) emotion can be quantified through speech; 3) emotion is dynamic and changes over time; and 4) emotion conveyance is conditioned by culture. Empirical results illustrate that the proposed approach can complement other types of behavior analysis in such a way that it offers more explanatory components from the perspective of emotion primitives (valence and arousal). Four different driving HBS profiles, namely distracted, laughing, sleepy and normal, are mapped onto the rSASM to visualize the correlation between HBS and emotion. This approach can be incorporated into future behavior analysis to achieve better performance.

  10. Time Series Neural Network Model for Part-of-Speech Tagging Indonesian Language

    Science.gov (United States)

    Tanadi, Theo

    2018-03-01

    Part-of-speech tagging (POS tagging) is an important part of natural language processing. Many methods have been used for this task, including neural networks. This paper models a neural network that performs POS tagging. A time series neural network is modelled to solve the problems that a basic neural network faces when attempting POS tagging. To enable the neural network to take text data as input, the text is first clustered using Brown clustering, resulting in a binary dictionary that the neural network can use. To further improve accuracy, other features such as the POS tag, suffix, and affix of the previous words are also fed to the neural network.
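
    To make the feature scheme concrete, here is a minimal sketch, assuming an invented toy vocabulary, hypothetical 4-bit cluster codes standing in for the Brown-clustering output, and a single softmax layer in place of the paper's time series network:

    ```python
    # Minimal sketch of a feature-based POS tagger in the spirit of the
    # abstract: a binary "cluster code" per word (standing in for Brown
    # clustering) plus the previous word's tag feed a softmax classifier.
    # All vocabulary, codes, and training data below are invented.
    import numpy as np

    TAGS = ["NOUN", "VERB", "DET"]
    START = len(TAGS)                      # pseudo-tag marking sentence start
    CLUSTER = {"the": [0, 0, 0, 1], "cat": [0, 1, 1, 0], "dog": [0, 1, 1, 1],
               "sleeps": [1, 0, 0, 1], "runs": [1, 0, 1, 0]}

    def features(word, prev_tag):
        prev = np.zeros(len(TAGS) + 1)     # one-hot previous tag (or START)
        prev[prev_tag] = 1.0
        return np.concatenate([CLUSTER[word], prev])

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # Toy training corpus: (words, tag indices into TAGS).
    data = [(["the", "cat", "sleeps"], [2, 0, 1]),
            (["the", "dog", "runs"], [2, 0, 1])]

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(len(TAGS), 4 + len(TAGS) + 1))

    for _ in range(300):                   # plain stochastic gradient descent
        for words, tags in data:
            prev = START
            for w, t in zip(words, tags):
                x = features(w, prev)
                p = softmax(W @ x)
                W -= 0.1 * np.outer(p - np.eye(len(TAGS))[t], x)
                prev = t                   # teacher forcing during training

    prev, out = START, []
    for w in ["the", "dog", "sleeps"]:     # greedy left-to-right tagging
        t = int(np.argmax(W @ features(w, prev)))
        out.append(TAGS[t]); prev = t
    print(out)                             # expected: ['DET', 'NOUN', 'VERB']
    ```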

  11. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech and music

    Directory of Open Access Journals (Sweden)

    Hwee Ling Lee

    2014-08-01

    Full Text Available This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practice fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practice was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and, to a marginally significant degree, natural speech.

  12. From where to what: a neuroanatomically based evolutionary model of the emergence of speech in humans [version 2; referees: 1 approved, 2 approved with reservations

    Directory of Open Access Journals (Sweden)

    Oren Poliva

    2016-01-01

    Full Text Available In the brain of primates, the auditory cortex connects with the frontal lobe via the temporal pole (auditory ventral stream; AVS) and via the inferior parietal lobe (auditory dorsal stream; ADS). The AVS is responsible for sound recognition, and the ADS for sound-localization, voice detection and integration of calls with faces. I propose that the primary role of the ADS in non-human primates is the detection and response to contact calls. These calls are exchanged between tribe members (e.g., mother-offspring) and are used for monitoring location. Detection of contact calls occurs by the ADS identifying a voice, localizing it, and verifying that the corresponding face is out of sight. Once a contact call is detected, the primate produces a contact call in return via descending connections from the frontal lobe to a network of limbic and brainstem regions. Because the ADS of present day humans also performs speech production, I further propose an evolutionary course for the transition from contact call exchange to an early form of speech. In accordance with this model, structural changes to the ADS endowed early members of the genus Homo with partial vocal control. This development was beneficial as it enabled offspring to modify their contact calls with intonations for signaling high or low levels of distress to their mother. Eventually, individuals were capable of participating in yes-no question-answer conversations. In these conversations the offspring emitted a low-level distress call for inquiring about the safety of objects (e.g., food), and his/her mother responded with a high- or low-level distress call to signal approval or disapproval of the interaction. Gradually, the ADS and its connections with brainstem motor regions became more robust and vocal control became more volitional. Speech emerged once vocal control was sufficient for inventing novel calls.

  13. Why do I hear but not understand? Stochastic undersampling as a model of degraded neural encoding of speech

    Directory of Open Access Journals (Sweden)

    Enrique A Lopez-Poveda

    2014-10-01

    Full Text Available Hearing impairment is a serious disease with increasing prevalence. It is defined based on increased audiometric thresholds but increased thresholds are only partly responsible for the greater difficulty understanding speech in noisy environments experienced by some older listeners or by hearing-impaired listeners. Identifying the additional factors and mechanisms that impair intelligibility is fundamental to understanding hearing impairment but these factors remain uncertain. Traditionally, these additional factors have been sought in the way the speech spectrum is encoded in the pattern of impaired mechanical cochlear responses. Recent studies, however, are steering the focus toward impaired encoding of the speech waveform in the auditory nerve. In our recent work, we gave evidence that a significant factor might be the loss of afferent auditory nerve fibers, a pathology that comes with aging or noise overexposure. Our approach was based on a signal-processing analogy whereby the auditory nerve may be regarded as a stochastic sampler of the sound waveform and deafferentation may be described in terms of waveform undersampling. We showed that stochastic undersampling simultaneously degrades the encoding of soft and rapid waveform features, and that this degrades speech intelligibility in noise more than in quiet without significant increases in audiometric thresholds. Here, we review our recent work in a broader context and argue that the stochastic undersampling analogy may be extended to study the perceptual consequences of various different hearing pathologies and their treatment.
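
    The signal-processing analogy lends itself to a small simulation. Below is a minimal sketch, assuming each surviving fiber contributes a fixed number of random sample times and reconstruction is simple linear interpolation; the paper's actual auditory-nerve model is more detailed:

    ```python
    # Minimal sketch of the stochastic-undersampling analogy: the auditory
    # nerve is treated as a pool of fibers that each sample the waveform at
    # random instants; losing fibers (deafferentation) leaves fewer samples
    # and degrades reconstruction of soft/rapid features. Illustrative only.
    import numpy as np

    rng = np.random.default_rng(1)
    fs = 16000
    t = np.arange(0, 0.05, 1 / fs)
    # An amplitude-modulated tone stands in for a speech-like waveform.
    x = np.sin(2 * np.pi * 220 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 30 * t))

    def reconstruct(signal, n_fibers, samples_per_fiber=40):
        """Pool all fibers' random sample times, then interpolate between them."""
        idx = np.unique(rng.choice(len(signal), n_fibers * samples_per_fiber))
        return np.interp(np.arange(len(signal)), idx, signal[idx])

    for n_fibers in (200, 20, 2):          # healthy -> increasingly deafferented
        err = np.mean((x - reconstruct(x, n_fibers)) ** 2)
        print(f"{n_fibers:3d} fibers -> reconstruction MSE {err:.5f}")
    ```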

  14. Neuronal basis of speech comprehension.

    Science.gov (United States)

    Specht, Karsten

    2014-01-01

    Verbal communication does not rely only on the simple perception of auditory signals. It is rather a parallel and integrative processing of linguistic and non-linguistic information, involving temporal and frontal areas in particular. This review describes the inherent complexity of auditory speech comprehension from a functional-neuroanatomical perspective. The review is divided into two parts. In the first part, structural and functional asymmetry of language-relevant structures will be discussed. The second part of the review discusses recent neuroimaging studies, which coherently demonstrate that speech comprehension processes rely on a hierarchical network involving the temporal, parietal, and frontal lobes. Further, the results support the dual-stream model for speech comprehension, with a dorsal stream for auditory-motor integration, and a ventral stream for extracting meaning but also the processing of sentences and narratives. Specific patterns of functional asymmetry between the left and right hemisphere can also be demonstrated. The review article concludes with a discussion on interactions between the dorsal and ventral streams, particularly the involvement of motor-related areas in speech perception processes, and outlines some remaining unresolved issues. This article is part of a Special Issue entitled Human Auditory Neuroimaging. Copyright © 2013 Elsevier B.V. All rights reserved.

  15. A model with nonzero rise time for AE signals

    Indian Academy of Sciences (India)

    Acoustic emission (AE) signals are conventionally modelled as damped or decaying sinusoidal functions. A major drawback of this model is its negligible or zero rise time. This paper proposes an alternative model, which provides for the rising part of the signal without sacrificing the analytical tractability and simplicity of the ...

  16. Signal Processing in the Linear Statistical Model

    Science.gov (United States)

    1994-11-04

    Covariance Bounds," Proc 07th Asilomar Conf on Signals, Systems, and Computers, Pacific Grove, CA (November 1993). [MuS91] C. T. Mullis and L. L. Scharf..."Transforms," Proc Asilomar Conf on Signals, Systems, and Computers, Asilomar, CA (November 1991). [SpS94] M. Spurbeck and L. L. Scharf, "Least Squares...McWhorter and L. L. Scharf, "Multiwindow Estimators of Correlation," Proc 28th Annual Asilomar Conf on Signals, Systems, and Computers, Asilomar, CA

  17. Speech Intelligibility Prediction Based on Mutual Information

    DEFF Research Database (Denmark)

    Jensen, Jesper; Taal, Cees H.

    2014-01-01

    a minimum mean-square error (mmse) estimator based on the noisy/processed amplitude. The proposed model predicts that speech intelligibility cannot be improved by any processing of noisy critical-band amplitudes. Furthermore, the proposed intelligibility predictor performs well ( ρ > 0.95) in predicting......This paper deals with the problem of predicting the average intelligibility of noisy and potentially processed speech signals, as observed by a group of normal hearing listeners. We propose a model which performs this prediction based on the hypothesis that intelligibility is monotonically related...... to the mutual information between critical-band amplitude envelopes of the clean signal and the corresponding noisy/processed signal. The resulting intelligibility predictor turns out to be a simple function of the mean-square error (mse) that arises when estimating a clean critical-band amplitude using...
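
    As a rough illustration of envelope-domain intelligibility prediction, the sketch below replaces the paper's mutual-information/mmse machinery with a simple per-band envelope correlation in the same spirit; the band edges, frame sizes, and test signal are all invented:

    ```python
    # Minimal sketch of an envelope-based intelligibility proxy: compare
    # critical-band-like amplitude envelopes of clean and noisy signals.
    # This uses per-band correlation, not the paper's MI estimator.
    import numpy as np

    def band_envelopes(x, n_bands=8, frame=256, hop=128):
        frames = np.array([x[i:i + frame] * np.hanning(frame)
                           for i in range(0, len(x) - frame, hop)])
        mag = np.abs(np.fft.rfft(frames, axis=1))
        edges = np.linspace(0, mag.shape[1], n_bands + 1, dtype=int)
        return np.array([mag[:, a:b].sum(axis=1)
                         for a, b in zip(edges[:-1], edges[1:])])

    def intelligibility_proxy(clean, noisy):
        ec, en = band_envelopes(clean), band_envelopes(noisy)
        rhos = [np.corrcoef(c, n)[0, 1] for c, n in zip(ec, en)]
        return float(np.mean(rhos))     # higher -> predicted more intelligible

    rng = np.random.default_rng(0)
    fs = 8000
    t = np.arange(0, 1.0, 1 / fs)
    # A 4 Hz amplitude-modulated tone stands in for a speech signal.
    clean = np.sin(2 * np.pi * 300 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))
    for snr_db in (20, 0, -10):
        noise = rng.normal(size=t.size)
        noise *= np.linalg.norm(clean) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
        print(snr_db, "dB SNR ->", round(intelligibility_proxy(clean, clean + noise), 3))
    ```

    As expected for such a proxy, the score decreases monotonically as the SNR drops.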

  18. Statistical Challenges in Modeling Big Brain Signals

    KAUST Repository

    Yu, Zhaoxia

    2017-11-01

    Brain signal data are inherently big: massive in amount, complex in structure, and high in dimensions. These characteristics impose great challenges for statistical inference and learning. Here we review several key challenges, discuss possible solutions, and highlight future research directions.

  19. A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

    Directory of Open Access Journals (Sweden)

    Dansereau Richard M

    2007-01-01

    Full Text Available We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signal's vocal-tract-related filters. Then, the mean vectors of the PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.

  20. A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

    Directory of Open Access Journals (Sweden)

    Mohammad H. Radfar

    2006-11-01

    Full Text Available We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signal's vocal-tract-related filters. Then, the mean vectors of the PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.
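
    Both records rest on the classical source/filter decomposition of speech. Below is a minimal sketch of the standard homomorphic (cepstral) version of that decomposition, not the papers' maximum likelihood estimator; the frame length, filter, and pitch are illustrative:

    ```python
    # Minimal sketch of source/filter separation via the real cepstrum:
    # low-time liftering recovers a smooth vocal-tract log envelope, while
    # the high-time part carries the (pitched) excitation.
    import numpy as np

    fs, N = 8000, 256                       # 8 kHz, one 32 ms frame
    excitation = (np.arange(N) % 80 == 0).astype(float)   # 100 Hz pulse train
    n = np.arange(64)
    h = np.exp(-n / 8.0) * np.cos(2 * np.pi * 800 * n / fs)  # toy vocal-tract filter
    frame = np.convolve(excitation, h)[:N] * np.hamming(N)

    log_spec = np.log(np.abs(np.fft.rfft(frame)) + 1e-9)
    cep = np.fft.irfft(log_spec)            # real cepstrum, length N
    lifter = np.zeros(N); lifter[:20] = 1; lifter[-19:] = 1
    envelope = np.fft.rfft(cep * lifter).real   # smooth vocal-tract log envelope
    residual = log_spec - envelope              # excitation-dominated remainder

    period = np.argmax(cep[40:128]) + 40    # dominant high-time cepstral peak
    print("estimated pitch period:", period, "samples (expect ~80)")
    ```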

  1. Integrating Music Therapy Services and Speech-Language Therapy Services for Children with Severe Communication Impairments: A Co-Treatment Model

    Science.gov (United States)

    Geist, Kamile; McCarthy, John; Rodgers-Smith, Amy; Porter, Jessica

    2008-01-01

    The literature offers little documentation of how music therapy can be integrated with speech-language therapy services for children with communication delay. In this article, a collaborative model with procedures, experiences, and communication outcomes of integrating music therapy with the existing speech-language services is given. Using…

  2. Modelling the Architecture of Phonetic Plans: Evidence from Apraxia of Speech

    Science.gov (United States)

    Ziegler, Wolfram

    2009-01-01

    In theories of spoken language production, the gestural code prescribing the movements of the speech organs is usually viewed as a linear string of holistic, encapsulated, hard-wired, phonetic plans, e.g., of the size of phonemes or syllables. Interactions between phonetic units on the surface of overt speech are commonly attributed to either the…

  3. Building a Model of Support for Preschool Children with Speech and Language Disorders

    Science.gov (United States)

    Robertson, Natalie; Ohi, Sarah

    2016-01-01

    Speech and language disorders impede young children's abilities to communicate and are often associated with a number of behavioural problems arising in the preschool classroom. This paper reports a small-scale study that investigated 23 Australian educators' and 7 Speech Pathologists' experiences in working with three- to five-year-old children…

  4. A Computational Model of Word Segmentation from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events

    Science.gov (United States)

    Rasanen, Okko

    2011-01-01

    Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…

  5. The Relative Weight of Statistical and Prosodic Cues in Speech Segmentation: A Matter of Language-(In)dependency and of Signal Quality

    Directory of Open Access Journals (Sweden)

    Tânia Fernandes

    2011-06-01

    Full Text Available In an artificial language setting, we investigated the relative weight of statistical cues (transitional probabilities, TPs) in comparison to two prosodic cues, Intonational Phrases (IPs, a language-independent cue) and lexical stress (a language-dependent cue). The signal quality was also manipulated through white-noise superimposition. Both IPs and TPs were highly resilient to physical degradation of the signal. An overall performance gain was found when these cues were congruent, but when they were incongruent IPs prevailed over TPs (Experiment 1). After ensuring that duration is treated by Portuguese listeners as a correlate of lexical stress (Experiment 2A), the role of lexical stress and TPs in segmentation was evaluated in Experiment 2B. Lexical stress effects only emerged with a physically degraded signal, constraining the extraction of TP-words to the ones supported by both TPs and IPs. Speech segmentation does not seem to be the product of one preponderant cue acting as a filter on the outputs of another, lower-weighted cue. Instead, it mainly depends on the listening conditions, and on the weighting of the cues according to their role in a particular language.

  6. Tactile Modulation of Emotional Speech Samples

    Directory of Open Access Journals (Sweden)

    Katri Salminen

    2012-01-01

    Full Text Available Traditionally, only speech communicates emotions via mobile phone. However, in daily communication the sense of touch mediates emotional information during conversation. The present aim was to study whether tactile stimulation affects emotional ratings of speech when measured on scales of pleasantness, arousal, approachability, and dominance. In Experiment 1 participants rated speech-only and speech-tactile stimuli. The tactile signal mimicked the amplitude changes of the speech. In Experiment 2 the aim was to study whether the way the tactile signal was produced affected the ratings. The tactile signal either mimicked the amplitude changes of the speech sample in question, or the amplitude changes of another speech sample. Also, concurrent static vibration was included. The results showed that the speech-tactile stimuli were rated as more arousing and dominant than the speech-only stimuli. The speech-only stimuli were rated as more approachable than the speech-tactile stimuli, but only in Experiment 1. Variations in tactile stimulation also affected the ratings. When the tactile stimulation was static vibration, the speech-tactile stimuli were rated as more arousing than when the concurrent tactile stimulation mimicked the speech samples. The results suggest that tactile stimulation offers new ways of modulating and enriching the interpretation of speech.

  7. Evaluating quantitative and conceptual models of speech production: how does SLAM fare?

    Science.gov (United States)

    Walker, Grant M; Hickok, Gregory

    2016-04-01

    In a previous publication, we presented a new computational model called SLAM (Walker & Hickok, Psychonomic Bulletin & Review, doi: 10.3758/s13423-015-0903), based on the hierarchical state feedback control (HSFC) theory (Hickok, Nature Reviews Neuroscience, 13(2), 135-145, 2012). In his commentary, Goldrick (Psychonomic Bulletin & Review, doi: 10.3758/s13423-015-0946-9) claims that SLAM does not represent a theoretical advancement, because it cannot be distinguished from an alternative lexical + postlexical (LPL) theory proposed by Goldrick and Rapp (Cognition, 102(2), 219-260, 2007). First, we point out that SLAM implements a portion of a conceptual model (HSFC) that encompasses LPL. Second, we show that SLAM accounts for a lexical bias present in sound-related errors that LPL does not explain. Third, we show that SLAM's explanatory advantage is not a result of approximating the architectural or computational assumptions of LPL, since an implemented version of LPL fails to provide the same fit improvements as SLAM. Finally, we show that incorporating a mechanism that violates some core theoretical assumptions of LPL (making it more like SLAM in terms of interactivity) allows the model to capture some of the same effects as SLAM. SLAM therefore provides new modeling constraints regarding interactions among processing levels, while also elaborating on the structure of the phonological level. We view this as evidence that an integration of psycholinguistic, neuroscience, and motor control approaches to speech production is feasible and may lead to substantial new insights.

  8. Improving traffic signal management and operations: a basic service model.

    Science.gov (United States)

    2009-12-01

    This report provides a guide for achieving a basic service model for traffic signal management and : operations. The basic service model is based on simply stated and defensible operational objectives : that consider the staffing level, expertise and...

  9. Efficient ECG Signal Compression Using Adaptive Heart Model

    National Research Council Canada - National Science Library

    Szilagyi, S

    2001-01-01

    This paper presents an adaptive, heart-model-based electrocardiography (ECG) compression method. After conventional pre-filtering, the waves from the signal are localized and the model's parameters are determined...

  10. Speech Matters

    DEFF Research Database (Denmark)

    Hasse Jørgensen, Stina

    2011-01-01

    About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavilion, the Venice Biennale 2011.

  11. Group delay functions and its applications in speech technology

    Indian Academy of Sciences (India)

    respectively. The features that characterize the digital filter can be obtained directly from the spectrum or can be derived from the model. The source information may be derived by passing the speech signal through the inverse of the model system. The source and system become additive as in homomorphic processing, i.e., ...

  12. Speech-to-Speech Relay Service

    Science.gov (United States)

    Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...

  13. Sinusoidal Representation of Acoustic Signals

    Science.gov (United States)

    Honda, Masaaki

    Sinusoidal representation of acoustic signals has been an important tool in speech and music processing tasks such as signal analysis, synthesis, and time-scale or pitch modification. It is applicable to arbitrary signals, which is an important advantage over other signal representations such as physical modeling of acoustic signals. In sinusoidal representation, acoustic signals are composed as sums of sinusoids (sine waves) with different amplitudes, frequencies and phases, based on the time-dependent short-time Fourier transform (STFT). This article describes the principles of acoustic signal analysis/synthesis based on the sinusoidal representation, with a focus on sine waves with rapidly varying frequency.
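
    A minimal single-frame sketch of sinusoidal analysis/synthesis is given below, assuming on-bin test frequencies and omitting the peak interpolation and frame-to-frame partial tracking a full system would use:

    ```python
    # Minimal sketch of one frame of a sinusoidal model: pick the strongest
    # local STFT peaks and resynthesise the frame as a sum of sinusoids with
    # the measured amplitudes, frequencies, and phases.
    import numpy as np

    def sine_model_frame(x, fs, n_peaks=4):
        win = np.hanning(len(x))
        spec = np.fft.rfft(x * win)
        mags = np.abs(spec)
        # Local spectral maxima, keeping only the strongest few.
        is_peak = (mags[1:-1] > mags[:-2]) & (mags[1:-1] > mags[2:])
        cand = np.where(is_peak)[0] + 1
        peaks = cand[np.argsort(mags[cand])[-n_peaks:]]
        t = np.arange(len(x)) / fs
        y = np.zeros(len(x))
        for k in peaks:                     # one sinusoid per picked peak
            amp = 2 * mags[k] / win.sum()
            y += amp * np.cos(2 * np.pi * (k * fs / len(x)) * t + np.angle(spec[k]))
        return y

    fs = 8000
    t = np.arange(256) / fs                 # on-bin test frequencies (bin = 31.25 Hz)
    x = 0.8 * np.sin(2 * np.pi * 437.5 * t) + 0.3 * np.sin(2 * np.pi * 875 * t)
    y = sine_model_frame(x, fs)
    print("frame SNR (dB):", round(10 * np.log10(np.sum(x**2) / np.sum((x - y)**2)), 1))
    ```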

  14. Automated speech understanding: the next generation

    Science.gov (United States)

    Picone, J.; Ebel, W. J.; Deshmukh, N.

    1995-04-01

    Modern speech understanding systems merge interdisciplinary technologies from Signal Processing, Pattern Recognition, Natural Language, and Linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in Digital Signal Processing (DSP). Once a field dominated by vector-oriented processors and linear algebra-based mathematics, the current generation of DSP-based systems relies on sophisticated statistical models implemented using a complex software paradigm. Such systems are now capable of understanding continuous speech input for vocabularies of several thousand words in operational environments. The current generation of deployed systems, based on small vocabularies of isolated words, will soon be replaced by a new technology offering natural language access to vast information resources such as the Internet, and providing completely automated voice interfaces for mundane tasks such as travel planning and directory assistance.

  15. Signals and Systems in Biomedical Engineering Signal Processing and Physiological Systems Modeling

    CERN Document Server

    Devasahayam, Suresh R

    2013-01-01

    The use of digital signal processing is ubiquitous in the field of physiology and biomedical engineering. The application of such mathematical and computational tools requires a formal or explicit understanding of physiology. Formal models and analytical techniques are interlinked in physiology as in any other field. This book takes a unitary approach to physiological systems, beginning with signal measurement and acquisition, followed by signal processing, linear systems modelling, and computer simulations. The signal processing techniques range across filtering, spectral analysis and wavelet analysis. Emphasis is placed on fundamental understanding of the concepts as well as solving numerical problems. Graphs and analogies are used extensively to supplement the mathematics. Detailed models of nerve and muscle at the cellular and systemic levels provide examples for the mathematical methods and computer simulations. Several of the models are sufficiently sophisticated to be of value in understanding real wor...

  16. Processing on weak electric signals by the autoregressive model

    Science.gov (United States)

    Ding, Jinli; Zhao, Jiayin; Wang, Lanzhou; Li, Qiao

    2008-10-01

    An autoregressive (AR) model of the weak electric signals in two plants was set up for the first time. The AR model forecasts 10 future values of the weak electric signals well. The work constructs a standard set of AR model coefficients for the plant electric signal and the environmental factors, which can be used as reference settings for an intelligent auto-control system that exploits the adaptive characteristics of plants to achieve energy savings in agricultural production.
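
    A minimal sketch of the forecasting recipe described, least-squares AR fitting followed by a 10-step-ahead prediction, run on a synthetic stand-in since the plant recordings are not available:

    ```python
    # Minimal sketch of AR-model forecasting: fit AR coefficients by least
    # squares (Yule-Walker would also do) and roll the model forward to
    # predict the next 10 values of a slowly varying signal.
    import numpy as np

    def fit_ar(x, p):
        X = np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])
        a, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
        return a                            # a[k] multiplies x[t-1-k]

    def forecast(x, a, steps=10):
        hist = list(x)
        for _ in range(steps):
            hist.append(np.dot(a, hist[-1:-len(a) - 1:-1]))
        return hist[-steps:]

    rng = np.random.default_rng(2)
    t = np.arange(400)
    signal = np.sin(2 * np.pi * t / 50) + 0.05 * rng.normal(size=t.size)
    a = fit_ar(signal, p=8)
    print(np.round(forecast(signal, a), 3))  # the 10 forecast values
    ```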

  17. MPD model for radar echo signal of hypersonic targets

    Directory of Open Access Journals (Sweden)

    Xu Xuefei

    2014-08-01

    Full Text Available The stop-and-go (SAG) model is typically used for the echo signal received by a radar using linear frequency modulation pulse compression. In this study, the authors demonstrate that this model is not applicable to hypersonic targets. Instead of the SAG model, they present a more realistic echo signal model, the moving-in-pulse duration (MPD) model, for hypersonic targets. Following that, they evaluate the performance of pulse compression under the SAG and MPD models by theoretical analysis and simulations. They found that the pulse compression gain increases by 3 dB when using the MPD model compared with the SAG model in typical cases.

  18. Speech perception as categorization.

    Science.gov (United States)

    Holt, Lori L; Lotto, Andrew J

    2010-07-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition.

  19. Separating Underdetermined Convolutive Speech Mixtures

    DEFF Research Database (Denmark)

    Pedersen, Michael Syskind; Wang, DeLiang; Larsen, Jan

    2006-01-01

    a method for underdetermined blind source separation of convolutive mixtures. The proposed framework is applicable for separation of instantaneous as well as convolutive speech mixtures. It is possible to iteratively extract each speech signal from the mixture by combining blind source separation...

  20. Introductory speeches

    International Nuclear Information System (INIS)

    2001-01-01

    This CD is a multimedia presentation of the safety upgrading programme of the Bohunice V1 NPP. This chapter consists of an introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak Electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear Power Plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; (4) Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering

  1. A model with nonzero rise time for AE signals

    Indian Academy of Sciences (India)

    models, which, while retaining these merits, can also incorporate rise time. We present such a model in the following. 2. Proposed model. The decaying sinusoidal model of (1) can be described in terms of communication terminology as the envelope function A0 exp(−αt) amplitude modulating the sinusoidal signal sin(2πf0t).
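
    A minimal sketch contrasting the classical decaying-sinusoid AE model with a double-exponential envelope that adds a nonzero rise time; the exact parametrisation and parameter values used in the paper may differ:

    ```python
    # Minimal sketch: classical decaying-sinusoid AE model vs. a
    # double-exponential envelope with a finite rise time (beta > alpha).
    import numpy as np

    fs, f0 = 1e6, 150e3                 # 1 MHz sampling; 150 kHz AE resonance
    t = np.arange(0, 200e-6, 1 / fs)
    A0, alpha, beta = 1.0, 3e4, 3e5     # illustrative envelope constants

    classic = A0 * np.exp(-alpha * t) * np.sin(2 * np.pi * f0 * t)
    risen = A0 * (np.exp(-alpha * t) - np.exp(-beta * t)) * np.sin(2 * np.pi * f0 * t)

    # The classical model peaks almost immediately; the rise-time model
    # peaks several microseconds later, as real AE bursts do.
    print(f"decaying sinusoid peaks at {t[np.argmax(np.abs(classic))] * 1e6:.1f} us")
    print(f"rise-time model peaks at {t[np.argmax(np.abs(risen))] * 1e6:.1f} us")
    ```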

  2. Speech Algorithm Optimization at 16 KBPS.

    Science.gov (United States)

    1980-09-30

    9. M. D. Paez and T. H. Glisson, "Minimum Mean Squared-Error Quantization in Speech PCM and DPCM Systems," IEEE Trans. Communications, Vol. COM-20..." IEEE Trans. Acoustics, Speech and Signal Processing, Vol. ASSP-27, June 1979. 13. N. S. Jayant, "Digital Coding of Speech Waveforms: PCM, DPCM, and DM

  3. Speech Segmentation Using Bayesian Autoregressive Changepoint Detector

    Directory of Open Access Journals (Sweden)

    P. Sovka

    1998-12-01

    Full Text Available This paper is devoted to the study of the Bayesian autoregressive changepoint detector (BCD) and its use for speech segmentation. Results of applying the detector to autoregressive signals as well as to real speech are given. The basic properties of the BCD are described and discussed. A novel two-step algorithm consisting of cepstral analysis and BCD for automatic speech segmentation is suggested.
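
    In the same spirit, though much cruder than the Bayesian evidence computed by BCD, a changepoint in an autoregressive signal can be located by fitting separate AR models left and right of each candidate split and scoring the total residual variance; everything below is an illustrative stand-in:

    ```python
    # Minimal sketch of AR changepoint detection: the best split is the one
    # that minimises the combined residual variance of two separate AR fits.
    import numpy as np

    def ar_residual_var(x, p=2):
        X = np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])
        a, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
        return np.var(x[p:] - X @ a)

    rng = np.random.default_rng(3)
    # Synthetic signal whose AR dynamics change at sample 300.
    x1 = np.sin(2 * np.pi * 0.02 * np.arange(300)) + 0.1 * rng.normal(size=300)
    x2 = np.sin(2 * np.pi * 0.10 * np.arange(300)) + 0.1 * rng.normal(size=300)
    x = np.concatenate([x1, x2])

    scores = {k: ar_residual_var(x[:k]) + ar_residual_var(x[k:])
              for k in range(50, len(x) - 50, 10)}
    print("estimated changepoint:", min(scores, key=scores.get))  # expect ~300
    ```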

  4. Quantitative modelling in cognitive ergonomics: predicting signals passed at danger.

    Science.gov (United States)

    Moray, Neville; Groeger, John; Stanton, Neville

    2017-02-01

    This paper shows how to combine field observations, experimental data and mathematical modelling to produce quantitative explanations and predictions of complex events in human-machine interaction. As an example, we consider a major railway accident. In 1999, a commuter train passed a red signal near Ladbroke Grove, UK, into the path of an express. We use the Public Inquiry Report, 'black box' data, and accident and engineering reports to construct a case history of the accident. We show how to combine field data with mathematical modelling to estimate the probability that the driver observed and identified the state of the signals, and checked their status. Our methodology can explain the SPAD ('Signal Passed At Danger'), generate recommendations about signal design and placement, and provide quantitative guidance for the design of safer railway systems' speed limits and the location of signals. Practitioner Summary: Detailed ergonomic analysis of railway signals and rail infrastructure reveals problems of signal identification at this location. A record of driver eye movements measures attention, from which a quantitative model for signal placement and permitted speeds can be derived. The paper is an example of how to combine field data, basic research and mathematical modelling to solve ergonomic design problems.

  5. A simple statistical signal loss model for deep underground garage

    DEFF Research Database (Denmark)

    Nguyen, Huan Cong; Gimenez, Lucas Chavarria; Kovacs, Istvan

    2016-01-01

    In this paper we address the channel modeling aspects for a deep-indoor scenario with extreme coverage conditions in terms of signal losses, namely underground garage areas. We provide an in-depth analysis in terms of path loss (gain) and large-scale signal shadowing, and propose a simple...
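
    A minimal sketch of the kind of statistical model proposed: a log-distance path-loss law plus log-normal shadowing. The exponent, intercept, and shadowing spread below are illustrative placeholders, not the measured garage values:

    ```python
    # Minimal sketch of a log-distance path-loss model with log-normal
    # shadowing: PL(d) = PL0 + 10*n*log10(d) + X, with X ~ N(0, sigma^2).
    import numpy as np

    def signal_loss_db(d_m, n=3.5, pl0_db=40.0, sigma_db=6.0, rng=None):
        """Path loss in dB at distance d_m (metres); parameters are illustrative."""
        rng = rng or np.random.default_rng()
        shadowing = rng.normal(0.0, sigma_db, size=np.shape(d_m))
        return pl0_db + 10.0 * n * np.log10(d_m) + shadowing

    d = np.array([5.0, 20.0, 80.0])
    print(np.round(signal_loss_db(d, rng=np.random.default_rng(0)), 1))
    ```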

  6. Modelling and simulation of signal transductions in an apoptosis ...

    Indian Academy of Sciences (India)

    2006-12-12

    This paper first presents basic Petri net components representing molecular interactions and mechanisms of signalling pathways, and introduces a method to construct a Petri net model of a signalling pathway with these components. Then a simulation method of determining the delay time of transitions, ...

  7. The complementary roles of auditory and motor information evaluated in a Bayesian perceptuo-motor model of speech perception.

    Science.gov (United States)

    Laurent, Raphaël; Barnaud, Marie-Lou; Schwartz, Jean-Luc; Bessière, Pierre; Diard, Julien

    2017-10-01

    There is a consensus concerning the view that both auditory and motor representations intervene in the perceptual processing of speech units. However, the question of the functional role of each of these systems remains seldom addressed and poorly understood. We capitalized on the formal framework of Bayesian Programming to develop COSMO (Communicating Objects using Sensory-Motor Operations), an integrative model that allows principled comparisons of purely motor or purely auditory implementations of a speech perception task and tests the gain of efficiency provided by their Bayesian fusion. Here, we show 3 main results: (a) In a set of precisely defined "perfect conditions," auditory and motor theories of speech perception are indistinguishable; (b) When a learning process that mimics speech development is introduced into COSMO, it departs from these perfect conditions. Then auditory recognition becomes more efficient than motor recognition in dealing with learned stimuli, while motor recognition is more efficient in adverse conditions. We interpret this result as a general "auditory-narrowband versus motor-wideband" property; and (c) Simulations of plosive-vowel syllable recognition reveal possible cues from motor recognition for the invariant specification of the place of plosive articulation in context that are lacking in the auditory pathway. This provides COSMO with a second property, where auditory cues would be more efficient for vowel decoding and motor cues for plosive articulation decoding. These simulations provide several predictions, which are in good agreement with experimental data and suggest that there is natural complementarity between auditory and motor processing within a perceptuo-motor theory of speech perception. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
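
    The Bayesian fusion at the heart of this comparison reduces, under a conditional-independence assumption, to multiplying auditory and motor likelihoods; the sketch below uses invented numbers purely for illustration:

    ```python
    # Minimal sketch of Bayesian fusion of auditory and motor information:
    # with conditionally independent branches, the posterior over phonemes
    # is proportional to the product of the two likelihoods and the prior.
    import numpy as np

    phonemes = ["/b/", "/d/", "/g/"]
    p_auditory = np.array([0.70, 0.20, 0.10])   # P(stimulus | phoneme), auditory
    p_motor = np.array([0.40, 0.50, 0.10])      # P(stimulus | phoneme), motor
    prior = np.array([1 / 3, 1 / 3, 1 / 3])

    fused = p_auditory * p_motor * prior
    fused /= fused.sum()                        # normalise to a posterior
    for ph, p in zip(phonemes, fused):
        print(ph, round(float(p), 3))
    ```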

  8. [Improving speech comprehension using a new cochlear implant speech processor].

    Science.gov (United States)

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise. In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significantly improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  9. [The application of cybernetic modeling methods for the forensic medical personality identification based on the voice and sounding speech characteristics].

    Science.gov (United States)

    Kaganov, A Sh; Kir'yanov, P A

    2015-01-01

    The objective of the present publication was to discuss the possibility of applying cybernetic modeling methods to overcome the apparent discrepancy between two kinds of speech records, viz. the initial ones (e.g., obtained in the course of special investigative activities) and the voice prints obtained from the persons subjected to criminalistic examination. The paper is based on literature sources and on the materials of original criminalistic examinations performed by the authors.

  10. Speech-specific audiovisual perception affects identification but not detection of speech

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual as evidenced by the McGurk effect in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate...... this issue, Tuomainen et al. (2005) used sine-wave speech stimuli created from three time-varying sine waves tracking the formants of a natural speech signal. Naïve observers tend not to recognize sine wave speech as speech but become able to decode its phonetic content when informed of the speech......-like nature of the signal. The sine-wave speech was dubbed onto congruent and incongruent video of a talking face. Tuomainen et al. found that the McGurk effect did not occur for naïve observers, but did occur when observers were informed. This indicates that the McGurk illusion is due to a mechanism...

  11. Model-based Bayesian signal extraction algorithm for peripheral nerves

    Science.gov (United States)

    Eggers, Thomas E.; Dweiri, Yazan M.; McCallum, Grant A.; Durand, Dominique M.

    2017-10-01

    Objective. Multi-channel cuff electrodes have recently been investigated for extracting fascicular-level motor commands from mixed neural recordings. Such signals could provide volitional, intuitive control over a robotic prosthesis for amputee patients. Recent work has demonstrated success in extracting these signals in acute and chronic preparations using spatial filtering techniques. These extracted signals, however, had low signal-to-noise ratios, which limited their utility to binary classification. In this work a new algorithm is proposed which combines previous source localization approaches to create a model-based method which operates in real time. Approach. To validate this algorithm, a saline benchtop setup was created to allow the precise placement of artificial sources within a cuff and interference sources outside the cuff. The artificial source was taken from five seconds of chronic neural activity to replicate realistic recordings. The proposed algorithm, hybrid Bayesian signal extraction (HBSE), is then compared to previous algorithms, beamforming and a Bayesian spatial filtering method, on this test data. An example chronic neural recording is also analyzed with all three algorithms. Main results. The proposed algorithm improved the signal-to-noise and signal-to-interference ratios of extracted test signals two- to threefold, and increased the correlation coefficient between the original and recovered signals by 10–20%. These improvements translated to the chronic recording example and increased the calculated bit rate between the recovered signals and the recorded motor activity. Significance. HBSE significantly outperforms previous algorithms in extracting realistic neural signals, even in the presence of external noise sources. These results demonstrate the feasibility of extracting dynamic motor signals from a multi-fascicled intact nerve trunk, which in turn could extract motor command signals from an amputee for the end goal of

  12. Two-Microphone Separation of Speech Mixtures

    DEFF Research Database (Denmark)

    Pedersen, Michael Syskind; Wang, DeLiang; Larsen, Jan

    2008-01-01

    Separation of speech mixtures, often referred to as the cocktail party problem, has been studied for decades. In many source separation tasks, the separation method is limited by the assumption of at least as many sensors as sources. Further, many methods require that the number of signals within...... combined, independent component analysis (ICA) and binary time–frequency (T–F) masking. By estimating binary masks from the outputs of an ICA algorithm, it is possible in an iterative way to extract basis speech signals from a convolutive mixture. The basis signals are afterwards improved by grouping...... similar signals. Using two microphones, we can separate, in principle, an arbitrary number of mixed speech signals. We show separation results for mixtures with as many as seven speech signals under instantaneous conditions. We also show that the proposed method is applicable to segregate speech signals...

  13. THE SIGNAL APPROACH TO MODELLING THE BALANCE OF PAYMENT CRISIS

    Directory of Open Access Journals (Sweden)

    O. Chernyak

    2016-12-01

    Full Text Available The paper presents a synthesis of theoretical models of balance-of-payments crises and investigates the most effective ways to model such crises in Ukraine. For the mathematical formalization of balance-of-payments crises, a comparative analysis of the effectiveness of different methods of calculating the Exchange Market Pressure Index was performed. A set of indicators that signal a growing likelihood of a balance-of-payments crisis was defined using the signal approach. With the help of a minimization function, threshold values of the indicators were selected, the crossing of which signals an increased probability of a balance-of-payments crisis.

  14. Automatic Emotion Recognition in Speech: Possibilities and Significance

    Directory of Open Access Journals (Sweden)

    Milana Bojanić

    2009-12-01

    Full Text Available Automatic speech recognition and spoken language understanding are crucial steps towards natural human-machine interaction. The main task of the speech communication process is the recognition of the word sequence, but the recognition of prosody, emotion and stress tags may be of particular importance as well. This paper discusses the possibilities of recognizing emotion from the speech signal in order to improve ASR, and also provides an analysis of acoustic features that can be used for the detection of the speaker's emotion and stress. The paper also provides a short overview of emotion and stress classification techniques. The importance and place of emotional speech recognition are shown in the domain of human-computer interactive systems and the transaction communication model. Directions for future work are given at the end of this work.

  15. Semiconductor Modeling For Simulating Signal, Power, and Electromagnetic Integrity

    CERN Document Server

    Leventhal, Roy

    2006-01-01

    Assists engineers in designing high-speed circuits. The emphasis is on semiconductor modeling, with PCB transmission line effects, equipment enclosure effects, and other modeling issues discussed as needed. This text addresses practical considerations, including process variation, model accuracy, validation and verification, and signal integrity.

  16. An accurate and simple large signal model of HEMT

    DEFF Research Database (Denmark)

    Liu, Qing

    1989-01-01

    A large-signal model of discrete HEMTs (high-electron-mobility transistors) has been developed. It is simple and suitable for SPICE simulation of hybrid digital ICs. The model parameters are extracted by using computer programs and data provided by the manufacturer. Based on this model, a hybrid...

  17. Small Signal Modeling of Wind Farms

    DEFF Research Database (Denmark)

    Ebrahimzadeh, Esmaeil; Blaabjerg, Frede; Wang, Xiongfei

    2017-01-01

    In large power electronic systems like a wind farm, the mutual interactions between the control systems of the power converters can lead to various stability and power quality problems. In order to predict the system dynamic behavior, this paper presents an approach to model a wind farm as a Multi......-Input Multi-Output (MIMO) dynamic system, where the current control loops with Phase-Locked Loops (PLLs) are linearized around an operating point. Each sub-module of the wind farm is modeled as a 2×2 admittance matrix in dq-domain and all are combined together by using a dq nodal admittance matrix....... The frequency and damping of the oscillatory modes are calculated by finding the poles of the introduced MIMO matrix. Time-domain simulation results obtained from a 400-MW wind farm are used to verify the effectiveness of the presented model....

  18. A computational model of human auditory signal processing and perception

    DEFF Research Database (Denmark)

    Jepsen, Morten Løve; Ewert, Stephan D.; Dau, Torsten

    2008-01-01

    A model of computational auditory signal-processing and perception that accounts for various aspects of simultaneous and nonsimultaneous masking in human listeners is presented. The model is based on the modulation filterbank model described by Dau et al. [J. Acoust. Soc. Am. 102, 2892 (1997...... discrimination with pure tones and broadband noise, tone-in-noise detection, spectral masking with narrow-band signals and maskers, forward masking with tone signals and tone or noise maskers, and amplitude-modulation detection with narrow- and wideband noise carriers. The model can account for most of the key...... properties of the data and is more powerful than the original model. The model might be useful as a front end in technical applications....

  19. A compartmental model for computer virus propagation with kill signals

    Science.gov (United States)

    Ren, Jianguo; Xu, Yonghong

    2017-11-01

    Research in the area of kill signals for the prevention of computer viruses is of significant importance for computer users. Kill signals allow computer users to take precautions beforehand. In this paper, a computer virus propagation model based on kill signals, called the SEIR-KS model, is formulated and the full dynamics of the proposed model are theoretically analyzed. An epidemic threshold is obtained and the existence and uniqueness of the virus equilibrium are investigated. It is proved that the virus-free equilibrium and the virus equilibrium are locally and globally asymptotically stable by applying the Routh-Hurwitz criterion and a Lyapunov functional approach. Results of numerical simulations are provided that verify the theoretical results. The applicability of the proposed model has been validated with the following observations: (1) the density of infected nodes in the proposed model drops to approximately 75% of that of the model in the related literature; and (2) a higher density of kill signals is conducive to the inhibition of virus diffusion.
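
    A minimal sketch of a compartmental virus model with kill signals, integrated with simple Euler steps; the coupling terms and rate constants below are illustrative guesses, not the SEIR-KS equations of the paper:

    ```python
    # Minimal sketch of a compartmental model with kill signals: S, E, I, R
    # densities plus a kill-signal level K that moves susceptible nodes
    # directly to the recovered class. All rates are invented.
    import numpy as np

    def step(state, dt, beta=0.5, eps=0.3, gamma=0.2, kappa=0.4):
        S, E, I, R, K = state
        new_inf = beta * S * I             # susceptible -> exposed
        protected = kappa * S * K          # kill signals protect susceptibles
        dS = -new_inf - protected
        dE = new_inf - eps * E
        dI = eps * E - gamma * I
        dR = gamma * I + protected
        dK = gamma * I                     # cured nodes emit kill signals
        return state + dt * np.array([dS, dE, dI, dR, dK])

    state = np.array([0.95, 0.0, 0.05, 0.0, 0.0])
    for _ in range(int(30 / 0.01)):        # 30 time units of Euler integration
        state = step(state, 0.01)
    print("final densities S,E,I,R,K:", np.round(state, 3))
    ```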

  20. Vibration Signal Forecasting on Rotating Machinery by means of Signal Decomposition and Neurofuzzy Modeling

    Directory of Open Access Journals (Sweden)

    Daniel Zurita-Millán

    2016-01-01

    Full Text Available Vibration monitoring plays a key role in industrial machinery reliability since it allows enhancing the performance of the machinery under supervision through the detection of failure modes. Thus, vibration monitoring schemes that give information regarding future condition, that is, prognosis approaches, are of growing interest for the scientific and industrial communities. This work proposes a vibration signal prognosis methodology, applied to a rotating electromechanical system and its associated kinematic chain. The method combines the adaptability of neurofuzzy modeling with a signal decomposition strategy to model the patterns of the vibration signal under different fault scenarios. The model tuning is performed by means of Genetic Algorithms along with a correlation-based interval selection procedure. The performance and effectiveness of the proposed method are validated experimentally with an electromechanical test bench containing a kinematic chain. The results of the study indicate the suitability of the method for vibration forecasting in complex electromechanical systems and their associated kinematic chains.

  1. Relation Between Listening Effort and Speech Intelligibility in Noise.

    Science.gov (United States)

    Krueger, Melanie; Schulte, Michael; Zokoll, Melanie A; Wagener, Kirsten C; Meis, Markus; Brand, Thomas; Holube, Inga

    2017-10-12

    Subjective ratings of listening effort might be applicable to estimate hearing difficulties at positive signal-to-noise ratios (SNRs) at which speech intelligibility scores are near 100%. Hence, ratings of listening effort were compared with speech intelligibility scores at different SNRs, and the benefit of hearing aids was evaluated. Two groups of listeners, 1 with normal hearing and 1 with hearing impairment, performed adaptive speech intelligibility and adaptive listening effort tests (Adaptive Categorical Listening Effort Scaling; Krueger, Schulte, Brand, & Holube, 2017) with sentences of the Oldenburg Sentence Test (Wagener, Brand, & Kollmeier, 1999a, 1999b; Wagener, Kühnel, & Kollmeier, 1999) in 4 different maskers. Model functions were fitted to the data to estimate the speech reception threshold and listening effort ratings for extreme effort and no effort. Listeners with hearing impairment showed higher rated listening effort compared with listeners with normal hearing. For listeners with hearing impairment, the rating extreme effort, which corresponds to negative SNRs, was more correlated to the speech reception threshold than the rating no effort, which corresponds to positive SNRs. A benefit of hearing aids on speech intelligibility was only verifiable at negative SNRs, whereas the effect on listening effort showed high individual differences mainly at positive SNRs. The adaptive procedure for rating subjective listening effort yields information beyond using speech intelligibility to estimate hearing difficulties and to evaluate hearing aids.

  2. Discrete dynamic modeling of T cell survival signaling networks

    Science.gov (United States)

    Zhang, Ranran

    2009-03-01

    Biochemistry-based frameworks are often not applicable for the modeling of heterogeneous regulatory systems that are sparsely documented in terms of quantitative information. As an alternative, qualitative models assuming a small set of discrete states are gaining acceptance. This talk will present a discrete dynamic model of the signaling network responsible for the survival and long-term competence of cytotoxic T cells in the blood cancer T-LGL leukemia. We integrated the signaling pathways involved in normal T cell activation and the known deregulations of survival signaling in leukemic T-LGL, and formulated the regulation of each network element as a Boolean (logic) rule. Our model suggests that the persistence of two signals is sufficient to reproduce all known deregulations in leukemic T-LGL. It also indicates the nodes whose inactivity is necessary and sufficient for the reversal of the T-LGL state. We have experimentally validated several model predictions, including: (i) Inhibiting PDGF signaling induces apoptosis in leukemic T-LGL. (ii) Sphingosine kinase 1 and NFκB are essential for the long-term survival of T cells in T-LGL leukemia. (iii) T box expressed in T cells (T-bet) is constitutively activated in the T-LGL state. The model has identified potential therapeutic targets for T-LGL leukemia and can be used for generating long-term competent CTL necessary for tumor and cancer vaccine development. The success of this model, and of other discrete dynamic models, suggests that the organization of signaling networks has a determining role in their dynamics. Reference: R. Zhang, M. V. Shah, J. Yang, S. B. Nyland, X. Liu, J. K. Yun, R. Albert, T. P. Loughran, Jr., Network Model of Survival Signaling in LGL Leukemia, PNAS 105, 16308-16313 (2008).
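
    A minimal sketch of the Boolean modelling style described, with three invented nodes and rules (the real T-LGL network has dozens); synchronous updates reach a steady "survival" state in which NFkB stays on and apoptosis stays off:

    ```python
    # Minimal sketch of a Boolean (logic) network: each node's next state is
    # a logic function of the current states, updated synchronously.
    # The three nodes and their rules are invented for illustration.
    def update(state):
        s = dict(state)
        s["NFkB"] = state["PDGF"] or state["NFkB"]        # self-sustaining once on
        s["Apoptosis"] = not state["NFkB"]                # NFkB blocks apoptosis
        s["PDGF"] = state["PDGF"] and not state["Apoptosis"]
        return s

    state = {"PDGF": True, "NFkB": False, "Apoptosis": False}
    for _ in range(10):                                   # iterate to a fixed point
        state = update(state)
    print(state)   # NFkB stays True, Apoptosis stays False: a 'survival' state
    ```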

  3. Audiovisual integration in speech perception: a multi-stage process

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2011-01-01

    Integration of speech signals from ear and eye is a well-known feature of speech perception. This is evidenced by the McGurk illusion in which visual speech alters auditory speech perception and by the advantage observed in auditory speech detection when a visual signal is present. Here we...... investigate whether the integration of auditory and visual speech observed in these two audiovisual integration effects are specific traits of speech perception. We further ask whether audiovisual integration is undertaken in a single processing stage or multiple processing stages....

  4. Speech-specificity of two audiovisual integration effects

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2010-01-01

    Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers...

  5. Specialization in audiovisual speech perception: a replication study

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    participated in the experiment, which consisted of 3 conditions. In the non-speech condition, observers were trained and tested in their ability to categorize sine wave speech tokens in arbitrary categories. The natural speech condition was similar but used natural speech signals and observers categorized...

  6. Age-Related Differences in Lexical Access Relate to Speech Recognition in Noise

    Science.gov (United States)

    Carroll, Rebecca; Warzybok, Anna; Kollmeier, Birger; Ruigendijk, Esther

    2016-01-01

    Vocabulary size has been suggested as a useful measure of “verbal abilities” that correlates with speech recognition scores. Knowing more words is linked to better speech recognition. How vocabulary knowledge translates to general speech recognition mechanisms, how these mechanisms relate to offline speech recognition scores, and how they may be modulated by acoustical distortion or age, is less clear. Age-related differences in linguistic measures may predict age-related differences in speech recognition in noise performance. We hypothesized that speech recognition performance can be predicted by the efficiency of lexical access, which refers to the speed with which a given word can be searched and accessed relative to the size of the mental lexicon. We tested speech recognition in a clinical German sentence-in-noise test at two signal-to-noise ratios (SNRs), in 22 younger (18–35 years) and 22 older (60–78 years) listeners with normal hearing. We also assessed receptive vocabulary, lexical access time, verbal working memory, and hearing thresholds as measures of individual differences. Age group, SNR level, vocabulary size, and lexical access time were significant predictors of individual speech recognition scores, but working memory and hearing threshold were not. Interestingly, longer accessing times were correlated with better speech recognition scores. Hierarchical regression models for each subset of age group and SNR showed very similar patterns: the combination of vocabulary size and lexical access time contributed most to speech recognition performance; only for the younger group at the better SNR (yielding about 85% correct speech recognition) did vocabulary size alone predict performance. Our data suggest that successful speech recognition in noise is mainly modulated by the efficiency of lexical access. This suggests that older adults’ poorer performance in the speech recognition task may have arisen from reduced efficiency in lexical access

  7. Network modeling reveals prevalent negative regulatory relationships between signaling sectors in Arabidopsis immune signaling.

    Directory of Open Access Journals (Sweden)

    Masanao Sato

    Full Text Available Biological signaling processes may be mediated by complex networks in which network components and network sectors interact with each other in complex ways. Studies of complex networks benefit from approaches in which the roles of individual components are considered in the context of the network. The plant immune signaling network, which controls inducible responses to pathogen attack, is such a complex network. We studied the Arabidopsis immune signaling network upon challenge with a strain of the bacterial pathogen Pseudomonas syringae expressing the effector protein AvrRpt2 (Pto DC3000 AvrRpt2). This bacterial strain feeds multiple inputs into the signaling network, allowing many parts of the network to be activated at once. mRNA profiles for 571 immune response genes of 22 Arabidopsis immunity mutants and wild type were collected 6 hours after inoculation with Pto DC3000 AvrRpt2. The mRNA profiles were analyzed as detailed descriptions of changes in the network state resulting from the genetic perturbations. Regulatory relationships among the genes corresponding to the mutations were inferred by recursively applying a non-linear dimensionality reduction procedure to the mRNA profile data. The resulting static network model accurately predicted 23 of 25 regulatory relationships reported in the literature, suggesting that predictions of novel regulatory relationships are also accurate. The network model revealed two striking features: (i) the components of the network are highly interconnected; and (ii) negative regulatory relationships are common between signaling sectors. Complex regulatory relationships, including a novel negative regulatory relationship between the early microbe-associated molecular pattern-triggered signaling sectors and the salicylic acid sector, were further validated. We propose that prevalent negative regulatory relationships among the signaling sectors make the plant immune signaling network a "sector

  8. Development of a quick speech-in-noise test for measuring signal-to-noise ratio loss in normal-hearing and hearing-impaired listeners.

    Science.gov (United States)

    Killion, Mead C; Niquette, Patricia A; Gudmundsen, Gail I; Revit, Lawrence J; Banerjee, Shilpi

    2004-10-01

    This paper describes a shortened and improved version of the Speech in Noise (SIN) Test (Etymotic Research, 1993). In the first two of four experiments, the level of a female talker relative to that of four-talker babble was adjusted sentence by sentence to produce 50% correct scores for normal-hearing subjects. In the second two experiments, those sentences-in-babble that produced either lack of equivalence or high across-subject variability in scores were discarded. These experiments produced 12 equivalent lists, each containing six sentences, with one sentence at each adjusted signal-to-noise ratio of 25, 20, 15, 10, 5, and 0 dB. Six additional lists were also made equivalent when the scores of particular pairs were averaged. The final lists comprise the "QuickSIN" test that measures the SNR a listener requires to understand 50% of key words in sentences in a background of babble. The standard deviation of single-list scores is 1.4 dB SNR for hearing-impaired subjects, based on test-retest data. A single QuickSIN list takes approximately one minute to administer and provides an estimate of SNR loss accurate to +/-2.7 dB at the 95% confidence level.
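
    As a small illustration of what the SNR-50 measured by such a test means, the sketch below linearly interpolates the 50%-correct point from per-sentence scores at the six list SNRs. The scores are made up and the interpolation is generic; this is not the published QuickSIN scoring rule.

    ```python
    import numpy as np

    # Generic sketch: estimate the SNR giving 50% of key words correct by linear
    # interpolation across the six list SNRs. Hypothetical scores; NOT the
    # official QuickSIN scoring procedure.
    snrs = np.array([25, 20, 15, 10, 5, 0])        # dB SNR, one sentence per level
    correct = np.array([5, 5, 4, 3, 1, 0]) / 5.0   # fraction of 5 key words correct

    # np.interp needs ascending x values, so reverse both arrays
    snr50 = np.interp(0.5, correct[::-1], snrs[::-1])
    print(f"Estimated SNR-50: {snr50:.2f} dB")
    ```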

  9. Implementation of a Tour Guide Robot System Using RFID Technology and Viterbi Algorithm-Based HMM for Speech Recognition

    Directory of Open Access Journals (Sweden)

    Neng-Sheng Pai

    2014-01-01

    Full Text Available This paper applied speech recognition and RFID technologies to develop an omni-directional mobile robot with voice control and guide introduction functions. For speech recognition, the speech signals were captured by short-time processing. The speaker first recorded the isolated words for the robot to create a speech database of specific speakers. After pre-processing of this speech database, the feature parameters of cepstrum and delta-cepstrum were obtained using linear predictive coefficients (LPC). Then, a Hidden Markov Model (HMM) was used for model training on the speech database, and the Viterbi algorithm was used to find an optimal state sequence as the reference sample for speech recognition. The trained reference models were put into the industrial computer on the robot platform, and the user entered the isolated words to be tested. After the same processing, the model whose Viterbi path gave the maximum total probability was taken as the recognition result. Finally, the speech recognition and RFID systems were tested in an actual environment to prove their feasibility and stability, and implemented in the omni-directional mobile robot.
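
    The Viterbi decoding step described above can be written compactly. The two-state model below is a toy with hypothetical parameters, not the robot's trained word models; it shows the max-product recursion and the backtracking used to recover the optimal state sequence.

    ```python
    import numpy as np

    # Minimal log-domain Viterbi decoder for a discrete-observation HMM, as used
    # above to find the optimal state sequence. Model parameters are hypothetical.
    def viterbi(obs, log_pi, log_A, log_B):
        """obs: observation indices; returns (best log-prob, state path)."""
        n_states = log_pi.shape[0]
        T = len(obs)
        delta = np.full((T, n_states), -np.inf)   # best log-prob ending in each state
        psi = np.zeros((T, n_states), dtype=int)  # backpointers
        delta[0] = log_pi + log_B[:, obs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + log_A          # (from-state, to-state)
            psi[t] = np.argmax(scores, axis=0)
            delta[t] = scores[psi[t], np.arange(n_states)] + log_B[:, obs[t]]
        path = [int(np.argmax(delta[-1]))]
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t][path[-1]]))
        return float(delta[-1].max()), path[::-1]

    # Toy 2-state model (hypothetical parameters)
    log_pi = np.log([0.6, 0.4])
    log_A = np.log([[0.7, 0.3], [0.4, 0.6]])
    log_B = np.log([[0.9, 0.1], [0.2, 0.8]])
    print(viterbi([0, 0, 1, 1], log_pi, log_A, log_B))
    ```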

  10. Towards an auditory account of speech rhythm: application of a model of the auditory 'primal sketch' to two multi-language corpora.

    Science.gov (United States)

    Lee, Christopher S; Todd, Neil P McAngus

    2004-10-01

    The world's languages display important differences in their rhythmic organization; most particularly, different languages seem to privilege different phonological units (mora, syllable, or stress foot) as their basic rhythmic unit. There is now considerable evidence that such differences have important consequences for crucial aspects of language acquisition and processing. Several questions remain, however, as to what exactly characterizes the rhythmic differences, how they are manifested at an auditory/acoustic level and how listeners, whether adult native speakers or young infants, process rhythmic information. In this paper it is proposed that the crucial determinant of rhythmic organization is the variability in the auditory prominence of phonetic events. In order to test this auditory prominence hypothesis, an auditory model is run on two multi-language data-sets, the first consisting of matched pairs of English and French sentences, and the second consisting of French, Italian, English and Dutch sentences. The model is based on a theory of the auditory primal sketch, and generates a primitive representation of an acoustic signal (the rhythmogram) which yields a crude segmentation of the speech signal and assigns prominence values to the obtained sequence of events. Its performance is compared with that of several recently proposed phonetic measures of vocalic and consonantal variability.
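
    A crude stand-in for this kind of prominence analysis can be sketched in a few lines: rectify the waveform, smooth it into an amplitude envelope, and treat local envelope maxima as events whose heights serve as prominence values. The actual rhythmogram model is considerably richer; this only illustrates the shape of the pipeline.

    ```python
    import numpy as np

    # Toy envelope-based event/prominence extraction. This is a simplification
    # of the auditory 'primal sketch' idea, not the published model.
    def envelope_events(x, fs, win_ms=50.0):
        win = int(fs * win_ms / 1000)
        env = np.convolve(np.abs(x), np.ones(win) / win, mode="same")  # moving-average envelope
        peaks = [i for i in range(1, len(env) - 1)
                 if env[i] > env[i - 1] and env[i] >= env[i + 1]]
        return [(i / fs, env[i]) for i in peaks]  # (event time in s, prominence value)

    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 150 * t) * (1 + np.sin(2 * np.pi * 3 * t))  # 3 Hz "syllable" rhythm
    print(envelope_events(x, fs)[:5])
    ```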

  11. Segmenting Words from Natural Speech: Subsegmental Variation in Segmental Cues

    Science.gov (United States)

    Rytting, C. Anton; Brew, Chris; Fosler-Lussier, Eric

    2010-01-01

    Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We…

  12. An experimental comparison of modelling techniques for speaker ...

    Indian Academy of Sciences (India)

    Feature extraction involves extracting speaker-specific features from the speech signal at reduced data rate. The extracted features are further combined using modelling techniques to generate speaker models. The speaker models are then tested using the features extracted from the test speech signal. The improvement in ...

  13. Small Signal Audiosusceptibility Model for Series Resonant Converter

    OpenAIRE

    G., Subhash Joshi T.; John, Vinod

    2018-01-01

    Models that accurately predict the output voltage ripple magnitude are essential for applications with stringent performance targets for it. The impact of dc input ripple on the output ripple of a Series Resonant Converter (SRC) is analysed in this paper using a discrete-domain exact discretization modelling method. A novel discrete state-space model along with a small-signal model for the SRC considering 3 state variables is presented. The audiosusceptibility (AS) transfer function which relates the i...

  14. A computational model of human auditory signal processing and perception

    OpenAIRE

    Jepsen, Morten Løve; Ewert, Stephan D.; Dau, Torsten

    2008-01-01

    A model of computational auditory signal-processing and perception that accounts for various aspects of simultaneous and nonsimultaneous masking in human listeners is presented. The model is based on the modulation filterbank model described by Dau et al. [J. Acoust. Soc. Am. 102, 2892 (1997)] but includes major changes at the peripheral and more central stages of processing. The model contains outer- and middle-ear transformations, a nonlinear basilar-membrane processing stage, a hair-cell t...

  15. The role of across-frequency envelope processing for speech intelligibility

    DEFF Research Database (Denmark)

    Chabot-Leclerc, Alexandre; Jørgensen, Søren; Dau, Torsten

    2013-01-01

    speech intelligibility models, the spectro-temporal modulation index (STMI; Elhilali et al., 2003) and the speech-based envelope power spectrum model (sEPSM; Jørgensen and Dau, 2011) were evaluated in conditions of noisy speech subjected to reverberation, and to nonlinear distortions through either...... stage, as assumed in the STMI, together with the metric based on the envelope power signal-to-noise ratio, as assumed in the sEPSM, are required to account for all three conditions. However, a simple across audio-frequency mechanism combined with a purely temporal modulation filterbank is assumed...

  16. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    Science.gov (United States)

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  17. Mixed-signal instrumentation for large-signal device characterization and modelling

    NARCIS (Netherlands)

    Marchetti, M.

    2013-01-01

    This thesis concentrates on the development of advanced large-signal measurement and characterization tools to support technology development, model extraction and validation, and power amplifier (PA) designs that address the newly introduced third and fourth generation (3G and 4G) wideband

  18. Detection of visual signals by rats: A computational model

    Science.gov (United States)

    We applied a neural network model of classical conditioning proposed by Schmajuk, Lam, and Gray (1996) to visual signal detection and discrimination tasks designed to assess sustained attention in rats (Bushnell, 1999). The model describes the animals’ expectation of receiving fo...

  19. Speech Development

    Science.gov (United States)

    ... are placed in the mouth, much like an orthodontic retainer. The two most common types are 1) the speech bulb and 2) the palatal lift. The speech bulb is designed to partially close off the space between the soft palate and the throat. The palatal lift appliance serves to lift the soft palate to a ...

  20. Beyond production: Brain responses during speech perception in adults who stutter

    Directory of Open Access Journals (Sweden)

    Tali Halag-Milo

    2016-01-01

    Full Text Available Developmental stuttering is a speech disorder that disrupts the ability to produce speech fluently. While stuttering is typically diagnosed based on one's behavior during speech production, some models suggest that it involves more central representations of language, and thus may affect language perception as well. Here we tested the hypothesis that developmental stuttering implicates neural systems involved in language perception, in a task that manipulates comprehensibility without an overt speech production component. We used functional magnetic resonance imaging to measure blood oxygenation level dependent (BOLD) signals in adults who do and do not stutter, while they were engaged in an incidental speech perception task. We found that speech perception evokes stronger activation in adults who stutter (AWS) compared to controls, specifically in the right inferior frontal gyrus (RIFG) and in left Heschl's gyrus (LHG). Significant differences were additionally found in the lateralization of response in the inferior frontal cortex: AWS showed bilateral inferior frontal activity, while controls showed a left-lateralized pattern of activation. These findings suggest that developmental stuttering is associated with an imbalanced neural network for speech processing, which is not limited to speech production, but also affects cortical responses during speech perception.

  1. Compressed Sensing Adaptive Speech Characteristics Research

    Directory of Open Access Journals (Sweden)

    Long Tao

    2014-09-01

    Full Text Available The sparsity of speech signals is utilized in the DCT domain. According to the characteristics of the voice, which may be separated into voiced and unvoiced segments, an adaptive-measurement speech recovery method based on compressed sensing is proposed in this paper. First, the observation points are distributed based on the ratio of voiced energy in the entire speech segment. Then the speech segment is divided into frames; if a frame is unvoiced speech, its number of measurements is allocated according to its zero-crossing and energy rates. If the frame is voiced speech, its number of measurements is allocated according to its energy. The experimental results show that speech recovered with this method is superior to that obtained by applying compressed sensing directly.
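
    The allocation step described above can be sketched as follows. The zero-crossing-weighted rule for unvoiced frames is an assumption made for illustration, not the paper's exact formula.

    ```python
    import numpy as np

    # Sketch of adaptive measurement allocation across frames: give each frame a
    # share of a total measurement budget according to its energy (voiced) or a
    # zero-crossing-discounted energy (unvoiced). Weighting rule is assumed.
    def allocate_measurements(frames, total_m, zcr_threshold=0.3):
        energies = np.array([np.sum(f ** 2) for f in frames])
        zcrs = np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])
        weights = np.where(zcrs > zcr_threshold,      # treat high-ZCR frames as unvoiced
                           energies * (1 - zcrs),
                           energies)
        weights = weights / weights.sum()
        return np.maximum(1, (weights * total_m).astype(int))  # >= 1 measurement per frame

    rng = np.random.default_rng(0)
    frames = [np.sin(np.linspace(0, 20, 160)),   # voiced-like frame
              rng.standard_normal(160) * 0.1]    # noisy, unvoiced-like frame
    print(allocate_measurements(frames, total_m=100))
    ```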

  2. Novel Techniques for Dialectal Arabic Speech Recognition

    CERN Document Server

    Elmahdy, Mohamed; Minker, Wolfgang

    2012-01-01

    Novel Techniques for Dialectal Arabic Speech describes approaches to improve automatic speech recognition for dialectal Arabic. Since speech resources for dialectal Arabic speech recognition are very sparse, the authors describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic speech recognition, while assuming that MSA is always a second language for all Arabic speakers. In this book, Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect. ECA is the first ranked Arabic dialect in terms of number of speakers, and a high quality ECA speech corpus with accurate phonetic transcription has been collected. MSA acoustic models were trained using news broadcast speech. In order to cross-lingually use MSA in dialectal Arabic speech recognition, the authors have normalized the phoneme sets for MSA and ECA. After this normalization, they have applied state-of-the-art acoustic model adaptation techniques like Maximum Likelihood Linear Regression (MLLR) and M...

  3. Neural bases of accented speech perception

    Directory of Open Access Journals (Sweden)

    Patti Adank

    2015-10-01

    Full Text Available The recognition of unfamiliar regional and foreign accents represents a challenging task for the speech perception system (Adank, Evans, Stuart-Smith, & Scott, 2009; Floccia, Goslin, Girard, & Konopczynski, 2006). Despite the frequency with which we encounter such accents, the neural mechanisms supporting successful perception of accented speech are poorly understood. Nonetheless, candidate neural substrates involved in processing speech in challenging listening conditions, including accented speech, are beginning to be identified. This review will outline neural bases associated with perception of accented speech in the light of current models of speech perception, and compare these data to brain areas associated with processing other speech distortions. We will subsequently evaluate competing models of speech processing with regards to neural processing of accented speech. See Cristia et al. (2012) for an in-depth overview of behavioural aspects of accent processing.

  4. Corrected Four-Sphere Head Model for EEG Signals

    Directory of Open Access Journals (Sweden)

    Solveig Næss

    2017-10-01

    Full Text Available The EEG signal is generated by electrical brain cell activity, often described in terms of current dipoles. By applying EEG forward models we can compute the contribution from such dipoles to the electrical potential recorded by EEG electrodes. Forward models are key both for generating understanding and intuition about the neural origin of EEG signals and for inverse modeling, i.e., the estimation of the underlying dipole sources from recorded EEG signals. Different models of varying complexity and biological detail are used in the field. One such analytical model is the four-sphere model, which assumes a four-layered spherical head where the layers represent brain tissue, cerebrospinal fluid (CSF), skull, and scalp, respectively. While conceptually clear, the mathematical expression for the electric potentials in the four-sphere model is cumbersome, and we observed that the formulas presented in the literature contain errors. Here, we derive and present the correct analytical formulas with a detailed derivation. A useful application of the analytical four-sphere model is that it can serve as ground truth to test the accuracy of numerical schemes such as the Finite Element Method (FEM). We performed FEM simulations of the four-sphere head model and showed that they were consistent with the corrected analytical formulas. For future reference we provide scripts for computing EEG potentials with the four-sphere model, both by means of the correct analytical formulas and numerical FEM simulations.

  5. HP Memristor mathematical model for periodic signals and DC

    KAUST Repository

    Radwan, Ahmed G.

    2012-07-28

    In this paper mathematical models of the HP Memristor for DC and periodic signal inputs are provided. The need for a rigid model for the Memristor using conventional current and voltage quantities is essential for the development of many promising Memristor applications. Unlike previous works, which focus on the sinusoidal input waveform, we derived rules for any periodic signals in general in terms of voltage and current. Square and triangle waveforms are studied explicitly, extending the formulas to any general square wave. The limiting conditions for saturation are also provided in the case of either DC or periodic signals. The derived equations are compared to the SPICE model of the Memristor, showing a perfect match.
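
    For comparison with such closed-form results, the standard HP memristor equations are easy to integrate numerically. The sketch below uses the usual linear-drift model, dw/dt = μ·Ron/D·i and M = Ron·w/D + Roff·(1 − w/D), with illustrative parameter values under a square-wave drive; it is a sanity-check simulation, not the paper's derivation.

    ```python
    import numpy as np

    # Euler integration of the linear-drift HP memristor model under a 50 Hz
    # square-wave voltage. Parameter values are illustrative.
    Ron, Roff, D, mu = 100.0, 16e3, 10e-9, 1e-14    # ohm, ohm, m, m^2/(V*s)
    fs, T = 1e6, 0.02                               # sample rate (Hz), duration (s)
    t = np.arange(0, T, 1 / fs)
    v = np.where(np.sin(2 * np.pi * 50 * t) >= 0, 1.0, -1.0)  # +/-1 V square wave

    w = 0.1 * D                                     # initial doped-region width
    for k in range(len(t)):
        M = Ron * w / D + Roff * (1 - w / D)        # memristance for current state
        i = v[k] / M
        w = np.clip(w + mu * Ron / D * i / fs, 0.0, D)  # drift step with hard saturation

    print(f"memristance at end: {Ron * w / D + Roff * (1 - w / D):.0f} ohm")
    ```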

  6. Road Impedance Model Study under the Control of Intersection Signal

    Directory of Open Access Journals (Sweden)

    Yunlin Luo

    2015-01-01

    Full Text Available The road traffic impedance model is a difficult and critical issue in urban traffic assignment and route guidance. The paper takes a signalized intersection as the research object. On the basis of traditional traffic wave theory, including the implementation of the traffic wave model and the analysis of vehicles' gathering and dissipating, the road traffic impedance model is developed by determining the basic travel time and the waiting delay time. Numerical example results have proved that the proposed model achieves better calculation performance than the existing model, especially in flat hours. The values of mean absolute percentage error (MAPE) and mean absolute deviation (MAD) are reduced by 3.78% and 2.62 s, respectively. This shows that the proposed model is feasible and applicable for road traffic impedance under intersection signal control.
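
    For a sense of what the waiting-delay component of such an impedance model looks like, the classic two-term Webster formula is the usual baseline for delay at an undersaturated signal. The sketch below implements that textbook formula, not the model proposed in the paper.

    ```python
    # Classic two-term Webster delay (s/vehicle) at a signalized intersection,
    # a standard way to model the waiting-delay component discussed above.
    def webster_delay(C, g, q, s):
        """C: cycle length (s), g: effective green (s),
        q: arrival flow (veh/s), s: saturation flow (veh/s)."""
        lam = g / C                 # effective green ratio
        x = q / (lam * s)           # degree of saturation (must be < 1)
        uniform = C * (1 - lam) ** 2 / (2 * (1 - lam * x))
        random = x ** 2 / (2 * q * (1 - x))
        return uniform + random

    # 90 s cycle, 40 s green, 600 veh/h arrivals, 1800 veh/h saturation flow
    print(f"{webster_delay(90, 40, 600 / 3600, 1800 / 3600):.1f} s/veh")
    ```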

  7. Detailed signal model of coherent wind measurement lidar

    Science.gov (United States)

    Ma, Yuechao; Li, Sining; Lu, Wei

    2016-11-01

    Lidar, short for light detection and ranging, is a tool for measuring useful information about the atmosphere. In recent years, more and more attention has been paid to wind measurement by lidar, because accurate wind information is useful not only in weather reporting but also for the safety of airplanes. In this paper, a more detailed signal model of a wind measurement lidar is proposed. It includes the laser transmitting part, which describes the spectral broadening, the laser attenuation in the atmosphere, the backscattering signal, and the detected signal. A Voigt profile is used to describe the broadening of the transmitted laser spectrum, which covers the most common situation, the convolution of different broadening line shapes. The laser attenuation includes scattering and absorption. We use a Rayleigh scattering model and a partially-Correlated quadratic-Velocity-Dependent Hard-Collision (pCqSDHC) model to describe molecular scattering and absorption. When calculating particle scattering and absorption, a Gaussian particle model is used to describe the particle shapes. Because of the Doppler effect between the laser and the atmosphere, the wind velocity can be calculated from the backscattered signal. A two-parameter Weibull distribution is then used to describe the wind field for use in future work. Together, these elements define the signal model of the coherent wind measurement lidar, and simulations are given in MATLAB. This signal model describes the system more accurately and in more detail, so that subsequent work will be easier and more efficient.
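
    The Voigt profile mentioned above is the convolution of a Gaussian (Doppler) broadening term with a Lorentzian (collisional) one. Where a closed form is not needed, it can be generated by direct numerical convolution, as in the sketch below; the widths are arbitrary illustration values.

    ```python
    import numpy as np

    # Voigt line shape by numerical convolution of a Gaussian and a Lorentzian.
    # sigma and gamma are illustrative widths, not fitted lidar parameters.
    def voigt_numeric(nu, sigma, gamma):
        d_nu = nu[1] - nu[0]
        gauss = np.exp(-nu ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
        lorentz = gamma / (np.pi * (nu ** 2 + gamma ** 2))
        v = np.convolve(gauss, lorentz, mode="same") * d_nu  # discrete convolution integral
        return v / (v.sum() * d_nu)                          # renormalize to unit area

    nu = np.linspace(-5, 5, 2001)   # detuning axis (arbitrary units)
    profile = voigt_numeric(nu, sigma=0.5, gamma=0.3)
    print(f"peak value: {profile.max():.3f}")
    ```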

  8. Synchronous Modeling of Modular Avionics Architectures using the SIGNAL Language

    OpenAIRE

    Gamatié , Abdoulaye; Gautier , Thierry

    2002-01-01

    This document presents a study on the modeling of architecture components for avionics applications. We consider the avionics standard ARINC 653 specifications as a basis, as well as the synchronous language SIGNAL to describe the modeling. A library of APEX object models (partition, process, communication and synchronization services, etc.) has been implemented. This should make it possible to describe distributed real-time applications using POLYCHRONY, so as to access formal tools and techniques for ar...

  9. The development of sensorimotor influences in the audiovisual speech domain: some critical questions

    OpenAIRE

    Guellaï, Bahia; Streri, Arlette; Yeung, H. Henny

    2014-01-01

    Speech researchers have long been interested in how auditory and visual speech signals are integrated, and recent work has revived interest in the role of speech production with respect to this process. Here, we discuss these issues from a developmental perspective. Because speech perception abilities typically outstrip speech production abilities in infancy and childhood, it is unclear how speech-like movements could influence audiovisual speech perception in development. While work on t...

  10. Salient phonetic features of Indian languages in speech technology

    Indian Academy of Sciences (India)

    Abstract. The speech signal is the basic study and analysis material in speech technology as well as phonetics. To form meaningful chunks of language, the speech signal should have dynamically varying spectral characteristics, sometimes varying within a stretch of a few milliseconds. Phonetics groups these temporally varying ...

  11. Multistage audiovisual integration of speech: dissociating identification and detection

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2011-01-01

    signal. Here we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers...

  12. Sensorimotor Interactions in Speech Learning

    Directory of Open Access Journals (Sweden)

    Douglas M Shiller

    2011-10-01

    Full Text Available Auditory input is essential for normal speech development and plays a key role in speech production throughout the life span. In traditional models, auditory input plays two critical roles: 1) establishing the acoustic correlates of speech sounds that serve, in part, as the targets of speech production, and 2) as a source of feedback about a talker's own speech outcomes. This talk will focus on both of these roles, describing a series of studies that examine the capacity of children and adults to adapt to real-time manipulations of auditory feedback during speech production. In one study, we examined sensory and motor adaptation to a manipulation of auditory feedback during production of the fricative "s". In contrast to prior accounts, adaptive changes were observed not only in speech motor output but also in subjects' perception of the sound. In a second study, speech adaptation was examined following a period of auditory–perceptual training targeting the perception of vowels. The perceptual training was found to systematically improve subjects' motor adaptation response to altered auditory feedback during speech production. The results of both studies support the idea that perceptual and motor processes are tightly coupled in speech production learning, and that the degree and nature of this coupling may change with development.

  13. New Modeling Approaches to Investigate Cell Signaling in Radiation Response

    Science.gov (United States)

    Plante, Ianik; Cucinotta, Francis A.; Ponomarev, Artem L.

    2011-01-01

    Ionizing radiation damages individual cells and tissues leading to harmful biological effects. Among many radiation-induced lesions, DNA double-strand breaks (DSB) are considered the key precursors of most early and late effects [1] leading to direct mutation or aberrant signal transduction processes. In response to damage, a flow of information is communicated to cells not directly hit by the radiation through signal transduction pathways [2]. Non-targeted effects (NTE), which include bystander effects and genomic instability in the progeny of irradiated cells and tissues, may be particularly important for space radiation risk assessment [1], because astronauts are exposed to a low fluence of heavy ions and only a small fraction of cells are traversed by an ion. NTE may also have important consequences for clinical radiotherapy [3]. In recent years, new simulation tools and modeling approaches have become available to study the tissue response to radiation. The simulation of signal transduction pathways requires many elements such as detailed track structure calculations, a tissue or cell culture model, knowledge of biochemical pathways and Brownian Dynamics (BD) propagators of the signaling molecules in their micro-environment. Recently, the Monte-Carlo simulation code of radiation track structure RITRACKS was used for micro- and nano-dosimetry calculations [4]. RITRACKS will be used to calculate the fraction of cells traversed by an ion and delta-rays and the energy deposited in cells in a tissue model. RITRACKS also simulates the formation of chemical species by the radiolysis of water [5], notably the .OH radical. This molecule is implicated in DNA damage and in the activation of the transforming growth factor beta (TGF), a signaling molecule involved in NTE. BD algorithms for a particle near a membrane comprising receptors were also developed and will be used to simulate trajectories of signaling molecules in the micro-environment and characterize autocrine

  14. A computational model of human auditory signal processing and perception.

    Science.gov (United States)

    Jepsen, Morten L; Ewert, Stephan D; Dau, Torsten

    2008-07-01

    A model of computational auditory signal-processing and perception that accounts for various aspects of simultaneous and nonsimultaneous masking in human listeners is presented. The model is based on the modulation filterbank model described by Dau et al. [J. Acoust. Soc. Am. 102, 2892 (1997)] but includes major changes at the peripheral and more central stages of processing. The model contains outer- and middle-ear transformations, a nonlinear basilar-membrane processing stage, a hair-cell transduction stage, a squaring expansion, an adaptation stage, a 150-Hz lowpass modulation filter, a bandpass modulation filterbank, a constant-variance internal noise, and an optimal detector stage. The model was evaluated in experimental conditions that reflect, to a different degree, effects of compression as well as spectral and temporal resolution in auditory processing. The experiments include intensity discrimination with pure tones and broadband noise, tone-in-noise detection, spectral masking with narrow-band signals and maskers, forward masking with tone signals and tone or noise maskers, and amplitude-modulation detection with narrow- and wideband noise carriers. The model can account for most of the key properties of the data and is more powerful than the original model. The model might be useful as a front end in technical applications.

  15. SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support

    Directory of Open Access Journals (Sweden)

    Giampiero Salvi

    2009-01-01

    Full Text Available This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling).

  16. Representation of speech variability.

    Science.gov (United States)

    Bent, Tessa; Holt, Rachael F

    2017-07-01

    Speech signals provide both linguistic information (e.g., words and sentences) as well as information about the speaker who produced the message (i.e., social-indexical information). Listeners store highly detailed representations of these speech signals, which are simultaneously indexed with linguistic and social category membership. A variety of methodologies (forced-choice categorization, rating, and free classification) have shed light on listeners' cognitive-perceptual representations of the social-indexical information present in the speech signal. Specifically, listeners can accurately identify some talker characteristics, including native language status, approximate age, sex, and gender. Additionally, listeners have sensitivity to other speaker characteristics, such as sexual orientation, regional dialect, native language for non-native speakers, race, and ethnicity, but listeners tend to be less accurate or more variable at categorizing or rating speakers based on these constructs. However, studies have not necessarily incorporated more recent conceptions of these constructs (e.g., separating listeners' perceptions of race vs ethnicity) or speakers who do not fit squarely into specific categories (e.g., for sex perception, intersex individuals; for gender perception, genderqueer speakers; for race perception, multiracial speakers). Additional research on how the intersections of social-indexical categories influence speech perception is also needed. As the field moves forward, scholars from a variety of disciplines should be incorporated into investigations of how listeners extract and represent facets of personal identity from speech. Further, the impact of these representations on our interactions with one another in contexts outside of the laboratory should continue to be explored. WIREs Cogn Sci 2017, 8:e1434. doi: 10.1002/wcs.1434

  17. Modelling discontinuous well log signal to identify lithological ...

    Indian Academy of Sciences (India)

    In this paper, we have proposed a new wavelet transform-based algorithm to model the abrupt discontinuous changes from well log data by taking care of nonstationary characteristics of the signal. Prior to applying the algorithm on the geophysical well data, we analyzed the distribution of wavelet coefficients using synthetic ...

  18. Psychotic speech: a neurolinguistic perspective.

    Science.gov (United States)

    Anand, A; Wales, R J

    1994-06-01

    The existence of an aphasia-like language disorder in psychotic speech has been the subject of much debate. This paper argues that a discrete language disorder could be an important cause of the disturbance seen in psychotic speech. A review is presented of classical clinical descriptions and experimental studies that have explored the similarities between psychotic language impairment and aphasic speech. The paper proposes neurolinguistic tasks which may be used in future studies to elicit subtle language impairments in psychotic speech. The usefulness of a neurolinguistic model for further research in the aetiology and treatment of psychosis is discussed.

  19. Modelling and Analysis of Biochemical Signalling Pathway Cross-talk

    Directory of Open Access Journals (Sweden)

    Robin Donaldson

    2010-02-01

    Full Text Available Signalling pathways are abstractions that help life scientists structure the coordination of cellular activity. Cross-talk between pathways accounts for many of the complex behaviours exhibited by signalling pathways and is often critical in producing the correct signal-response relationship. Formal models of signalling pathways and cross-talk in particular can aid understanding and drive experimentation. We define an approach to modelling based on the concept that a pathway is the (synchronising) parallel composition of instances of generic modules (with internal and external labels). Pathways are then composed by (synchronising) parallel composition and renaming; different types of cross-talk result from different combinations of synchronisation and renaming. We define a number of generic modules in PRISM and five types of cross-talk: signal flow, substrate availability, receptor function, gene expression and intracellular communication. We show that Continuous Stochastic Logic properties can both detect and distinguish the types of cross-talk. The approach is illustrated with small examples and an analysis of the cross-talk between the TGF-b/BMP, WNT and MAPK pathways.

  20. Regulation of Wnt signaling by nociceptive input in animal models

    Directory of Open Access Journals (Sweden)

    Shi Yuqiang

    2012-06-01

    Full Text Available Abstract Background Central sensitization-associated synaptic plasticity in the spinal cord dorsal horn (SCDH) critically contributes to the development of chronic pain, but understanding of the underlying molecular pathways is still incomplete. Emerging evidence suggests that Wnt signaling plays a crucial role in regulation of synaptic plasticity. Little is known about the potential function of the Wnt signaling cascades in chronic pain development. Results Fluorescent immunostaining results indicate that β-catenin, an essential protein in the canonical Wnt signaling pathway, is expressed in the superficial layers of the mouse SCDH with enrichment at synapses in lamina II. In addition, Wnt3a, a prototypic Wnt ligand that activates the canonical pathway, is also enriched in the superficial layers. Immunoblotting analysis indicates that both Wnt3a and β-catenin are up-regulated in the SCDH of various mouse pain models created by hind-paw injection of capsaicin, intrathecal (i.t.) injection of HIV-gp120 protein or spinal nerve ligation (SNL). Furthermore, Wnt5a, a prototypic Wnt ligand for non-canonical pathways, and its receptor Ror2 are also up-regulated in the SCDH of these models. Conclusion Our results suggest that Wnt signaling pathways are regulated by nociceptive input. The activation of Wnt signaling may regulate the expression of spinal central sensitization during the development of acute and chronic pain.

  1. Low-dimensional recurrent neural network-based Kalman filter for speech enhancement.

    Science.gov (United States)

    Xia, Youshen; Wang, Jun

    2015-07-01

    This paper proposes a new recurrent neural network-based Kalman filter for speech enhancement, based on a noise-constrained least squares estimate. The parameters of the speech signal, modeled as an autoregressive process, are first estimated by using the proposed recurrent neural network, and the speech signal is then recovered by Kalman filtering. The proposed recurrent neural network is globally asymptotically stable with respect to the noise-constrained estimate. Because the noise-constrained estimate is robust against non-Gaussian noise, the proposed recurrent neural network-based speech enhancement algorithm can minimize the estimation error of the Kalman filter parameters in non-Gaussian noise. Furthermore, having a low-dimensional model feature, the proposed neural network-based speech enhancement algorithm is much faster than two existing recurrent neural network-based speech enhancement algorithms. Simulation results show that the proposed recurrent neural network-based speech enhancement algorithm achieves good performance with fast computation and effective noise reduction. Copyright © 2015 Elsevier Ltd. All rights reserved.
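
    The Kalman-filtering stage of such a system is straightforward once the AR parameters are available. The sketch below assumes the AR(p) coefficients and noise variances are already known (the paper's contribution, the recurrent-network estimator, is not reproduced here); it simply runs the standard predict/update recursion on a companion-form state-space model.

    ```python
    import numpy as np

    # Kalman-filter denoising of a signal with a known AR(p) model:
    # s[t] = sum_k a_k * s[t-k] + w[t], observed as y[t] = s[t] + v[t].
    def kalman_ar_denoise(y, a, q_var, r_var):
        p = len(a)
        F = np.zeros((p, p)); F[0, :] = a; F[1:, :-1] = np.eye(p - 1)  # companion form
        H = np.zeros((1, p)); H[0, 0] = 1.0
        Q = np.zeros((p, p)); Q[0, 0] = q_var
        x = np.zeros((p, 1)); P = np.eye(p)
        out = np.empty(len(y))
        for t, yt in enumerate(y):
            x = F @ x; P = F @ P @ F.T + Q            # predict
            S = H @ P @ H.T + r_var
            K = P @ H.T / S                           # Kalman gain
            x = x + K * (yt - H @ x)                  # update with innovation
            P = (np.eye(p) - K @ H) @ P
            out[t] = x[0, 0]
        return out

    rng = np.random.default_rng(1)
    n = 2000
    s = np.zeros(n)
    for t in range(2, n):                             # AR(2) "speech-like" signal
        s[t] = 1.5 * s[t - 1] - 0.8 * s[t - 2] + rng.standard_normal() * 0.1
    y = s + rng.standard_normal(n) * 0.5              # add white observation noise
    s_hat = kalman_ar_denoise(y, a=[1.5, -0.8], q_var=0.01, r_var=0.25)
    print(f"noisy MSE {np.mean((y - s) ** 2):.3f} -> filtered MSE {np.mean((s_hat - s) ** 2):.3f}")
    ```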

  2. Multimicrophone Speech Dereverberation: Experimental Validation

    Directory of Open Access Journals (Sweden)

    Marc Moonen

    2007-05-01

    Full Text Available Dereverberation is required in various speech processing applications such as handsfree telephony and voice-controlled systems, especially when the signals to be processed were recorded in a moderately or highly reverberant environment. In this paper, we compare a number of classical and more recently developed multimicrophone dereverberation algorithms, and validate the different algorithmic settings by means of two performance indices and a speech recognition system. It is found that some of the classical solutions obtain a moderate signal enhancement. More advanced subspace-based dereverberation techniques, on the other hand, fail to enhance the signals despite their high computational load.

  3. Adaptive redundant speech transmission over wireless multimedia sensor networks based on estimation of perceived speech quality.

    Science.gov (United States)

    Kang, Jin Ah; Kim, Hong Kook

    2011-01-01

    An adaptive redundant speech transmission (ARST) approach to improve the perceived speech quality (PSQ) of speech streaming applications over wireless multimedia sensor networks (WMSNs) is proposed in this paper. The proposed approach estimates the PSQ as well as the packet loss rate (PLR) from the received speech data. Subsequently, it decides whether the transmission of redundant speech data (RSD) is required in order to assist a speech decoder to reconstruct lost speech signals for high PLRs. According to the decision, the proposed ARST approach controls the RSD transmission, then it optimizes the bitrate of speech coding to encode the current speech data (CSD) and RSD bitstream in order to maintain the speech quality under packet loss conditions. The effectiveness of the proposed ARST approach is then demonstrated using the adaptive multirate-narrowband (AMR-NB) speech codec and ITU-T Recommendation P.563 as a scalable speech codec and the PSQ estimation, respectively. It is shown from the experiments that a speech streaming application employing the proposed ARST approach significantly improves speech quality under packet loss conditions in WMSNs.

  4. Modeling, estimation and optimal filtration in signal processing

    CERN Document Server

    Najim, Mohamed

    2010-01-01

    The purpose of this book is to provide graduate students and practitioners with traditional methods and more recent results for model-based approaches in signal processing.Firstly, discrete-time linear models such as AR, MA and ARMA models, their properties and their limitations are introduced. In addition, sinusoidal models are addressed.Secondly, estimation approaches based on least squares methods and instrumental variable techniques are presented.Finally, the book deals with optimal filters, i.e. Wiener and Kalman filtering, and adaptive filters such as the RLS, the LMS and the

  5. Signalling network construction for modelling plant defence response.

    Directory of Open Access Journals (Sweden)

    Dragana Miljkovic

    Full Text Available Plant defence signalling response against various pathogens, including viruses, is a complex phenomenon. In a resistant interaction, a plant cell perceives the pathogen signal, transduces it within the cell and performs a reprogramming of the cell metabolism leading to the pathogen replication arrest. This work focuses on signalling pathways crucial for the plant defence response, i.e., the salicylic acid, jasmonic acid and ethylene signal transduction pathways, in the Arabidopsis thaliana model plant. The initial signalling network topology was constructed manually by defining the representation formalism, encoding the information from public databases and literature, and composing a pathway diagram. The manually constructed network structure consists of 175 components and 387 reactions. In order to complement the network topology with possibly missing relations, a new approach to automated information extraction from biological literature was developed. This approach, named Bio3graph, allows for automated extraction of biological relations from the literature, resulting in a set of (component1, reaction, component2) triplets and composing a graph structure which can be visualised, compared to the manually constructed topology and examined by the experts. Using a plant defence response vocabulary of components and reaction types, Bio3graph was applied to a set of 9,586 relevant full text articles, resulting in 137 newly detected reactions between the components. Finally, the manually constructed topology and the new reactions were merged to form a network structure consisting of 175 components and 524 reactions. The resulting pathway diagram of plant defence signalling represents a valuable source for further computational modelling and interpretation of omics data. The developed Bio3graph approach, implemented as an executable language processing and graph visualisation workflow, is publically available at http://ropot.ijs.si/bio3graph/ and can be

  6. Improved Methods for Pitch Synchronous Linear Prediction Analysis of Speech

    OpenAIRE

    劉, 麗清

    2015-01-01

    Linear prediction (LP) analysis has been applied to speech systems over the last few decades. The LP technique is well-suited for speech analysis due to its ability to model the speech production process approximately. Hence LP analysis has been widely used for speech enhancement, low-bit-rate speech coding in cellular telephony, speech recognition, characteristic parameter extraction (vocal tract resonance frequencies, fundamental frequency called pitch) and so on. However, the performance of the co...
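
    As context for such refinements, the baseline autocorrelation-method LP analysis can be written compactly with the Levinson-Durbin recursion. The sketch below is textbook LPC, not the thesis's improved pitch-synchronous method; the root-angle readout at the end shows how resonance frequencies fall out of the prediction polynomial.

    ```python
    import numpy as np

    # Autocorrelation-method LPC via the Levinson-Durbin recursion.
    def lpc(x, order):
        r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
        a = np.zeros(order + 1); a[0] = 1.0
        err = r[0]
        for m in range(1, order + 1):
            k = -(r[m] + np.dot(a[1:m], r[m - 1:0:-1])) / err  # reflection coefficient
            a[1:m] += k * a[m - 1:0:-1]
            a[m] = k
            err *= (1 - k * k)
        return a, err  # prediction polynomial A(z) and residual energy

    fs = 8000
    t = np.arange(400) / fs
    x = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
    a, err = lpc(x * np.hamming(len(x)), order=8)
    roots = [z for z in np.roots(a) if z.imag > 0]
    print("resonance-like peaks (Hz):", sorted(np.angle(z) * fs / (2 * np.pi) for z in roots))
    ```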

  7. YIN, a fundamental frequency estimator for speech and music

    Science.gov (United States)

    de Cheveigné, Alain; Kawahara, Hideki

    2002-04-01

    An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. There is no upper limit on the frequency search range, so the algorithm is suited for high-pitched voices and music. The algorithm is relatively simple and may be implemented efficiently and with low latency, and it involves few parameters that must be tuned. It is based on a signal model (periodic signal) that may be extended in several ways to handle various forms of aperiodicity that occur in particular applications. Finally, interesting parallels may be drawn with models of auditory processing.
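
    The core of the published algorithm is the difference function with cumulative-mean normalization and an absolute threshold. The sketch below is a compact rendition of those steps, omitting the paper's parabolic interpolation and best-local-estimate refinements.

    ```python
    import numpy as np

    # Compact YIN-style F0 estimator: difference function, cumulative-mean
    # normalization, absolute threshold with a walk-down to the local minimum.
    def yin_f0(x, fs, fmin=50.0, fmax=500.0, threshold=0.1):
        tau_max = int(fs / fmin)
        w = len(x) - tau_max
        d = np.array([np.sum((x[:w] - x[tau:tau + w]) ** 2) for tau in range(tau_max)])
        dprime = np.ones(tau_max)
        cum = np.cumsum(d[1:])
        dprime[1:] = d[1:] * np.arange(1, tau_max) / np.maximum(cum, 1e-12)
        tau_min = int(fs / fmax)
        for tau in range(tau_min, tau_max):
            if dprime[tau] < threshold:                     # first dip under threshold
                while tau + 1 < tau_max and dprime[tau + 1] < dprime[tau]:
                    tau += 1                                # descend to the local minimum
                return fs / tau
        return fs / (tau_min + int(np.argmin(dprime[tau_min:])))  # fallback: global minimum

    fs = 8000
    t = np.arange(2048) / fs
    print(f"{yin_f0(np.sin(2 * np.pi * 210 * t), fs):.1f} Hz")  # expect ~210 Hz
    ```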

  8. A Joint Approach for Single-Channel Speaker Identification and Speech Separation

    DEFF Research Database (Denmark)

    Mowlaee, Pejman; Saeidi, Rahim; Christensen, Mads Græsbøll

    2012-01-01

    a sinusoidal model-based algorithm. The speech separation algorithm consists of a double-talk/single-talk detector followed by a minimum mean square error estimator of sinusoidal parameters for finding optimal codevectors from pre-trained speaker codebooks. In evaluating the proposed system, we start from......In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification a single-channel speaker identification algorithm is proposed which provides an estimate of signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose...... a situation where we have prior information of codebook indices, speaker identities and SSR-level, and then, by relaxing these assumptions one by one, we demonstrate the efficiency of the proposed fully blind system. In contrast to previous studies that mostly focus on automatic speech recognition (ASR...

  9. Mouth-clicks used by blind expert human echolocators - signal description and model based signal synthesis.

    Directory of Open Access Journals (Sweden)

    Lore Thaler

    2017-08-01

    Full Text Available Echolocation is the ability to use sound-echoes to infer spatial information about the environment. Some blind people have developed extraordinary proficiency in echolocation using mouth-clicks. The first step of human biosonar is the transmission (mouth click) and subsequent reception of the resultant sound through the ear. Existing head-related transfer function (HRTF) databases provide descriptions of reception of the resultant sound. For the current report, we collected a large database of click emissions with three blind people expertly trained in echolocation, which allowed us to perform unprecedented analyses. Specifically, the current report provides the first ever description of the spatial distribution (i.e., beam pattern) of human expert echolocation transmissions, as well as spectro-temporal descriptions at a level of detail not available before. Our data show that transmission levels are fairly constant within a 60° cone emanating from the mouth, but levels drop gradually at further angles, more than for speech. In terms of spectro-temporal features, our data show that emissions are consistently very brief (~3 ms duration) with peak frequencies 2-4 kHz, but with energy also at 10 kHz. This differs from previous reports of durations 3-15 ms and peak frequencies 2-8 kHz, which were based on less detailed measurements. Based on our measurements we propose to model transmissions as a sum of monotones modulated by a decaying exponential, with angular attenuation by a modified cardioid. We provide model parameters for each echolocator. These results are a step towards developing computational models of human biosonar. For example, in bats, spatial and spectro-temporal features of emissions have been used to derive and test model-based hypotheses about behaviour. The data we present here suggest similar research opportunities within the context of human echolocation. Relatedly, the data are a basis to develop synthetic models of human echolocation
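
    The proposed model form (a sum of monotones under a decaying exponential, attenuated by a cardioid over angle) is simple to synthesize. The frequencies, decay constant, and cardioid parameters below are illustrative placeholders, not the per-echolocator values reported in the paper.

    ```python
    import numpy as np

    # Synthetic mouth-click following the model form proposed above.
    fs = 44100
    t = np.arange(int(0.003 * fs)) / fs                    # ~3 ms click
    freqs = [2500.0, 3500.0, 10000.0]                      # Hz (assumed components)
    click = sum(np.sin(2 * np.pi * f * t) for f in freqs) * np.exp(-t / 0.0006)

    def cardioid_gain(theta_deg, k=0.5):
        """Modified cardioid: gain 1 on-axis, falling off gradually with angle."""
        theta = np.radians(theta_deg)
        return (1 - k) + k * np.cos(theta)

    for angle in (0, 30, 60, 90):
        level = 20 * np.log10(cardioid_gain(angle))
        print(f"{angle:3d} deg: {level:5.1f} dB re on-axis")
    ```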

  10. Neural entrainment to speech modulates speech intelligibility

    NARCIS (Netherlands)

    Riecke, Lars; Formisano, Elia; Sorger, Bettina; Başkent, Deniz; Gaudrain, Etienne

    2018-01-01

    Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and

  11. Neural Entrainment to Speech Modulates Speech Intelligibility

    NARCIS (Netherlands)

    Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne

    2018-01-01

    Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and

  12. Classroom acoustics design for speakers’ comfort and speech intelligibility

    DEFF Research Database (Denmark)

    Garcia, David Pelegrin; Rasmussen, Birgit; Brunskog, Jonas

    2014-01-01

    Current European regulatory requirements or guidelines for reverberation time in classrooms have the goal of enhancing speech intelligibility for students and reducing noise levels in classrooms. At the same time, school teachers suffer frequently from voice problems due to high vocal load...... experienced at work. With the aim of improving teachers’ working conditions, this paper proposes adjustments to current regulatory requirements on classroom acoustics in Europe from novel insights on classroom acoustics design that meet simultaneously criteria of vocal comfort for teachers and speech...... are combined with a model of speech intelligibility based on the useful-to-detrimental ratio and empirical models of signal-to-noise ratio in classrooms in order to derive classroom acoustic guidelines, taking into account physical volume restrictions linked to the number of students present in a classroom...

  13. Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

    Science.gov (United States)

    Ye, Sherry

    2015-01-01

    NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.

  14. The role of auditory spectro-temporal modulation filtering and the decision metric for speech intelligibility prediction

    DEFF Research Database (Denmark)

    Chabot-Leclerc, Alexandre; Jørgensen, Søren; Dau, Torsten

    2014-01-01

    by comparing predictions from models based on the signal-to-noise envelope power ratio, SNRenv, and the modulation transfer function, MTF. The models were evaluated in conditions of noisy speech (1) subjected to reverberation, (2) distorted by phase jitter, or (3) processed by noise reduction via spectral...... with a measure of across (audio) frequency variability at the output of the auditory preprocessing. A complex spectro-temporal modulation filterbank might therefore not be required for speech intelligibility prediction....

  15. Neural pathways for visual speech perception

    Directory of Open Access Journals (Sweden)

    Lynne E Bernstein

    2014-12-01

    Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

  16. Evaluating the benefit of recorded early reflections from a classroom for speech intelligibility

    Science.gov (United States)

    Larsen, Jeffery B.

    Recent standards for classroom acoustics recommend achieving low levels of reverberation to provide suitable conditions for speech communication (ANSI, 2002; ASHA, 1995). Another viewpoint recommends optimizing classroom acoustics to emphasize early reflections and reduce later-arriving reflections (Boothroyd, 2004; Bradley, Sato, & Picard, 2003). The idea of emphasizing early reflections is based in the useful-to-detrimental ratio (UDR) model of speech intelligibility in rooms (Lochner & Burger, 1964). The UDR model predicts that listeners integrate energy from early reflections to improve the signal-to-noise ratio (SNR) of the direct speech signal. However, both early and more recent studies of early reflections and speech intelligibility have used simulated reflections that may not accurately represent the effects of real early reflections on the speech intelligibility of listeners. Is speech intelligibility performance enhanced by the presence of real early reflections in noisy classroom environments? The effect of actual early reflections on speech intelligibility was evaluated by recording a binaural room impulse response (BRIR) with a K.E.M.A.R. in a college classroom. From the BRIR, five listening conditions were created with varying amounts of early reflections. Young adult listeners with normal hearing participated in a fixed-SNR word intelligibility task and a variable-SNR task to test whether speech intelligibility in competing noise was improved when recorded early reflections were present, as compared with direct speech alone. Mean speech intelligibility performance gains or SNR benefits were not observed with recorded early reflections. When simulated early reflections were included, improved speech understanding was observed for simulated reflections but not with real reflections. Spectral, temporal, and phonemic analyses were performed to investigate acoustic differences between recorded and simulated reflections. Spectral distortions in the recorded reflections may have ...
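
    For orientation, here is a minimal sketch of the useful-to-detrimental ratio computation this argument rests on. The 50-ms early/late boundary, the absence of Lochner & Burger's perceptual weighting of early reflections, and the toy impulse response are all assumptions of the sketch, not the study's method.

    ```python
    # UDR sketch: early impulse-response energy counts as "useful",
    # late energy plus background noise as "detrimental".
    import numpy as np

    def udr_db(impulse_response, fs, noise_power=0.0, early_ms=50.0):
        k = int(fs * early_ms / 1000.0)
        early = np.sum(impulse_response[:k] ** 2)   # direct sound + early reflections
        late = np.sum(impulse_response[k:] ** 2)    # late reverberant tail
        return 10.0 * np.log10(early / (late + noise_power + 1e-12))

    fs = 16000
    t = np.arange(int(0.5 * fs)) / fs
    rir = np.exp(-t / 0.12) * np.random.randn(t.size)  # toy exponentially decaying RIR
    rir[0] = 1.0                                       # direct path
    print(f"UDR ~ {udr_db(rir, fs, noise_power=0.01):.1f} dB")
    ```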

  17. Automatic Voice Pathology Detection With Running Speech by Using Estimation of Auditory Spectrum and Cepstral Coefficients Based on the All-Pole Model.

    Science.gov (United States)

    Ali, Zulfiqar; Elamvazuthi, Irraivan; Alsulaiman, Mansour; Muhammad, Ghulam

    2016-11-01

    Automatic voice pathology detection using sustained vowels has been widely explored. Because of the stationary nature of the speech waveform, pathology detection with a sustained vowel is a comparatively easier task than that using running speech. Some disorder detection systems with running speech have also been developed, although most of them are based on voice activity detection (VAD), which is itself a challenging task. Pathology detection with running speech needs more investigation, and systems with good accuracy (ACC) are required. Furthermore, pathology classification systems with running speech have not received any attention from the research community. In this article, automatic pathology detection and classification systems are developed using text-dependent running speech without adding a VAD module. A set of three psychophysical aspects of hearing (critical band spectral estimation, the equal-loudness hearing curve, and the intensity-loudness power law of hearing) is used to estimate the auditory spectrum. The auditory spectrum and all-pole models of the auditory spectrum are computed, analyzed, and used in a Gaussian mixture model for an automatic decision. In experiments using the Massachusetts Eye & Ear Infirmary database, an ACC of 99.56% is obtained for pathology detection, and an ACC of 93.33% is obtained for the pathology classification system. The results of the proposed systems outperform the existing running-speech-based systems. The developed system can effectively be used in voice pathology detection and classification systems, and the proposed features can visually differentiate between normal and pathological samples. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
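
    The overall shape of such a pipeline (per-frame all-pole features scored by class-conditional Gaussian mixture models) can be sketched as follows. This is a hedged stand-in, not the authors' system: plain LPC features replace their auditory-spectrum-based features, and toy signals replace the Massachusetts Eye & Ear Infirmary data.

    ```python
    # All-pole (LPC) features per frame, one GMM per class, maximum-likelihood decision.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def lpc_coeffs(frame, order=10):
        """All-pole model coefficients via the Levinson-Durbin recursion."""
        r = np.correlate(frame, frame, mode="full")[frame.size - 1:][:order + 1]
        a = np.zeros(order + 1); a[0] = 1.0; e = r[0]
        for i in range(1, order + 1):
            k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / e
            a_prev = a.copy()
            for j in range(1, i):
                a[j] = a_prev[j] + k * a_prev[i - j]
            a[i] = k
            e *= (1.0 - k * k)
        return a[1:]

    def frame_features(signal, fs, frame_ms=30, order=10):
        step = int(fs * frame_ms / 1000)
        frames = [signal[i:i + step] for i in range(0, len(signal) - step, step)]
        return np.array([lpc_coeffs(f * np.hanning(step), order) for f in frames])

    # Toy two-class decision on synthetic "voices".
    fs = 16000
    normal = np.sin(2 * np.pi * 120 * np.arange(2 * fs) / fs) + 0.05 * np.random.randn(2 * fs)
    pathological = normal + 0.4 * np.random.randn(2 * fs)   # crude "noisy voice" stand-in
    gmm_n = GaussianMixture(4, covariance_type="diag").fit(frame_features(normal, fs))
    gmm_p = GaussianMixture(4, covariance_type="diag").fit(frame_features(pathological, fs))
    test = frame_features(pathological, fs)
    print("pathological" if gmm_p.score(test) > gmm_n.score(test) else "normal")
    ```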

  18. Simulating Children's Retrieval Errors in Picture-Naming: A Test of Foygel and Dell's (2000) Semantic/Phonological Model of Speech Production

    Science.gov (United States)

    Budd, Mary-Jane; Hanley, J. Richard; Griffiths, Yvonne

    2011-01-01

    This study investigated whether Foygel and Dell's (2000) interactive two-step model of speech production could simulate the number and type of errors made in picture-naming by 68 children of elementary-school age. Results showed that the model provided a satisfactory simulation of the mean error profile of children aged five, six, seven, eight and…

  19. Stable 1-Norm Error Minimization Based Linear Predictors for Speech Modeling

    DEFF Research Database (Denmark)

    Giacobello, Daniele; Christensen, Mads Græsbøll; Jensen, Tobias Lindstrøm

    2014-01-01

    saturations when this is used to synthesize speech. In this paper, we introduce two new methods to obtain intrinsically stable predictors with the 1-norm minimization. The first method is based on constraining the roots of the predictor to lie within the unit circle by reducing the numerical range...... of the shift operator associated with the particular prediction problem considered. The second method uses the alternative Cauchy bound to impose a convex constraint on the predictor in the 1-norm error minimization. These methods are compared with two existing methods: the Burg method, based on the 1-norm...
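
    As background, a 1-norm linear predictor can be obtained with a standard linear program, after which stability can be checked from the predictor's roots. The sketch below illustrates the problem the paper addresses (unconstrained 1-norm predictors may come out unstable); it does not implement the paper's numerical-range or Cauchy-bound constraints, and the root-reflection fallback mentioned in the comment is a common fix-up, not their method.

    ```python
    # 1-norm linear prediction: minimize sum_n |x_n - a^T [x_{n-1}..x_{n-K}]| via an LP.
    import numpy as np
    from scipy.optimize import linprog

    def lp_1norm(x, order):
        N = len(x) - order
        X = np.array([x[i:i + order][::-1] for i in range(N)])   # lagged samples
        y = x[order:]
        # Variables z = [a (order), t (N)]; minimize sum(t) s.t. |y - X a| <= t.
        c = np.concatenate([np.zeros(order), np.ones(N)])
        A = np.block([[X, -np.eye(N)], [-X, -np.eye(N)]])
        b = np.concatenate([y, -y])
        res = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * (order + N))
        return res.x[:order]

    def is_stable(a):
        # Prediction-error filter A(z) = 1 - sum_k a_k z^{-k}.
        roots = np.roots(np.concatenate([[1.0], -a]))
        return np.all(np.abs(roots) < 1.0)

    x = np.sin(2 * np.pi * 0.07 * np.arange(400)) + 0.05 * np.random.randn(400)
    a = lp_1norm(x, order=8)
    print("stable" if is_stable(a) else "unstable: reflect roots inside the unit circle")
    ```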

  20. A logical model provides insights into T cell receptor signaling.

    Directory of Open Access Journals (Sweden)

    Julio Saez-Rodriguez

    2007-08-01

    Cellular decisions are determined by complex molecular interaction networks. Large-scale signaling networks are currently being reconstructed, but the kinetic parameters and quantitative data that would allow for dynamic modeling are still scarce. Therefore, computational studies based upon the structure of these networks are of great interest. Here, a methodology relying on a logical formalism is applied to the functional analysis of the complex signaling network governing the activation of T cells via the T cell receptor, the CD4/CD8 co-receptors, and the accessory signaling receptor CD28. Our large-scale Boolean model, which comprises 94 nodes and 123 interactions and is based upon well-established qualitative knowledge from primary T cells, reveals important structural features (e.g., feedback loops and network-wide dependencies) and recapitulates the global behavior of this network for an array of published data on T cell activation in wild-type and knock-out conditions. More importantly, the model predicted unexpected signaling events after antibody-mediated perturbation of CD28 and after genetic knockout of the kinase Fyn that were subsequently experimentally validated. Finally, we show that the logical model reveals key elements and potential failure modes in network functioning and provides candidates for missing links. In summary, our large-scale logical model for T cell activation proved to be a promising in silico tool, and it inspires immunologists to ask new questions. We think that it holds valuable potential in foreseeing the effects of drugs and network modifications.

  1. Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2016-01-01

    In this paper, single channel speech enhancement in the time domain is considered. We address the problem of modelling non-stationary speech by describing the voiced speech parts by a harmonic linear chirp model instead of using the traditional harmonic model. This means that the speech signal...... covariance matrix estimate is obtained. We suggest using this estimate in combination with other filters such as the Wiener filter. The performance of the Wiener filter and LCMV filter are compared using the APES noise covariance matrix estimate and a power spectral density (PSD) based noise covariance...... matrix estimate. It is shown that the APES covariance matrix works well in combination with the Wiener filter, and the PSD based covariance matrix works well in combination with the LCMV filter....
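
    The harmonic linear chirp model itself is easy to write down: every harmonic of a base frequency f0 shares a common linear sweep rate k, so the instantaneous phase of harmonic l is 2πl(f0·t + k·t²/2). A minimal sketch with invented amplitudes and parameter values:

    ```python
    # Harmonic linear chirp signal; k = 0 recovers the traditional harmonic model.
    import numpy as np

    def harmonic_chirp(f0, k, amps, fs, dur):
        t = np.arange(int(fs * dur)) / fs
        phase = 2 * np.pi * (f0 * t + 0.5 * k * t ** 2)     # common chirped phase
        return sum(a * np.cos((l + 1) * phase) for l, a in enumerate(amps)), t

    fs = 8000
    sig, t = harmonic_chirp(f0=150.0, k=40.0, amps=[1.0, 0.6, 0.3], fs=fs, dur=0.2)
    # The fundamental sweeps from 150 Hz to 150 + 40 * 0.2 = 158 Hz over the segment.
    ```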

  2. Speech recognition employing biologically plausible receptive fields

    DEFF Research Database (Denmark)

    Fereczkowski, Michal; Bothe, Hans-Heinrich

    2011-01-01

    The main idea of the project is to build a widely speaker-independent, biologically motivated automatic speech recognition (ASR) system. The two main differences between our approach and current state-of-the-art ASRs are that i) the features used here are based on the responses of neuronlike...... Model-based adaptation procedures. Two databases are used: TI46 for discrete speech and a subset of the TIMIT database collected from speakers belonging to the New York dialect region. Each of a selection of 10 sentences is uttered once by each of 35 speakers. The major differences between the two data...... sets initiate the development and comparison of two distinct ASRs within the project, which will be presented in the following. Employing a reduced sampling frequency and bandwidth of the signals, the ASR algorithm reaches, and goes beyond, recognition results known from humans.

  3. Development of binaural speech transmission index

    NARCIS (Netherlands)

    Wijngaarden, S.J. van; Drullman, R.

    2006-01-01

    Although the speech transmission index (STI) is a well-accepted and standardized method for objective prediction of speech intelligibility in a wide range of environments and applications, it is essentially a monaural model. Advantages of binaural hearing to the intelligibility of speech are ...

  4. Examining speech perception in noise and cognitive functions in the elderly.

    Science.gov (United States)

    Meister, Hartmut; Schreitmüller, Stefan; Grugel, Linda; Beutner, Dirk; Walger, Martin; Meister, Ingo

    2013-12-01

    The purpose of this study was to investigate the relationship of cognitive functions (i.e., working memory [WM]) and speech recognition against different background maskers in older individuals. Speech reception thresholds (SRTs) were determined using a matrix-sentence test. Unmodulated noise, modulated noise (International Collegium for Rehabilitative Audiology [ICRA] noise 5-250), and speech fragments (International Speech Test Signal [ISTS]) were used as background maskers. Verbal WM was assessed using the Verbal Learning and Memory Test (VLMT; Helmstaedter & Durwen, 1990). Measurements were conducted with 14 normal-hearing older individuals and a control group of 12 normal-hearing young listeners. Despite their normal hearing ability, the young listeners outperformed the older individuals with all background maskers. These differences were largest for the modulated maskers. SRTs were significantly correlated with the scores of the VLMT. A linear regression model also included WM as the only significant predictor variable. The results support the assumption that WM plays an important role in speech understanding and that it may affect results obtained using speech audiometry. Thus, an individual's WM capacity should be considered in aural diagnosis and rehabilitation. The VLMT proved to be a clinically applicable test for WM. Further cognitive functions important for speech understanding are currently being investigated within the SAKoLA (Sprachaudiometrie und kognitive Leistungen im Alter [Speech Audiometry and Cognitive Functions in the Elderly]) project.

  5. Broadening the "ports of entry" for speech-language pathologists: a relational and reflective model for clinical supervision.

    Science.gov (United States)

    Geller, Elaine; Foley, Gilbert M

    2009-02-01

    To offer a framework for clinical supervision in speech-language pathology that embeds a mental health perspective within the study of communication sciences and disorders. Key mental health constructs are examined as to how they are applied in traditional versus relational and reflective supervision models. Comparisons between traditional and relational and reflective approaches are outlined, with reference to each mental health construct and the developmental level of the supervisee. Three stages of supervisee development are proposed based on research from various disciplines, including nursing, psychology, speech-language pathology, social work, and education. Each developmental stage is characterized by shifts or changes in the supervisee's underlying assumptions, beliefs, and patterns of behavior. This article makes the case that both the cognitive and affective dimensions of the supervisor-supervisee relationship need to be addressed without minimizing the necessary development of discipline-specific expertise. The developmental stages outlined in this paradigm can be used to understand supervisees' patterns of change and growth over time, as well as to create optimal learning environments that match their developmental level and knowledge base.

  6. Expanding the "ports of entry" for speech-language pathologists: a relational and reflective model for clinical practice.

    Science.gov (United States)

    Geller, Elaine; Foley, Gilbert M

    2009-02-01

    To outline an expanded framework for clinical practice in speech-language pathology. This framework broadens the focus on discipline-specific knowledge and infuses mental health constructs within the study of communication sciences and disorders, with the objective of expanding the potential "ports or points of entry" (D. Stern, 1995) for clinical intervention with young children who are language impaired. Specific mental health constructs are highlighted in this article. These include relationship-based learning, attachment theory, working dyadically (the client is the child and parent), reflective practice, transference-countertransference, and the use of self. Each construct is explored as to the way it has been applied in traditional and contemporary models of clinical practice. The underlying premise in this framework is that working from a relationally based and reflective perspective augments change and growth in both client and parent(s). The challenge is for speech-language pathologists to embed mental health constructs within their discipline-specific expertise. This leads to paying attention to both observable aspects of clients' behaviors as well as their internal affective states.

  7. Sources of Variability in Consonant Perception and Implications for Speech Perception Modeling

    DEFF Research Database (Denmark)

    Zaar, Johannes; Dau, Torsten

    2016-01-01

    The present study investigated the influence of various sources of response variability in consonant perception. A distinction was made between source-induced variability and receiver-related variability. The former refers to perceptual differences induced by differences in the speech ... to the considered sources of variability using a measure of the perceptual distance between responses. The largest effect was found across different CVs. For stimuli of the same phonetic identity, the speech-induced variability across and within talkers and the across-listener variability were substantial ...

  8. Acoustic signal characterization of a ball milling machine model

    International Nuclear Information System (INIS)

    Andrade-Romero, J Alexis; Romero, Jesus F A; Amestegui, Mauricio

    2011-01-01

    The Los Angeles machine is used both in mining processes and for standardized testing of the strength of materials. As the present work focuses on the latter application, an improvement in the estimation procedure for the resistance percentage of small-size coarse aggregate is presented. More precisely, a pattern identification strategy for the vibratory signal is proposed to estimate the resistance percentage, using a simplified chaotic model and the continuous wavelet transform.

  9. Apraxia of Speech

    Science.gov (United States)

    Apraxia of speech (AOS)—also known as acquired ...

  10. Nonlinear signal processing using neural networks: Prediction and system modelling

    Energy Technology Data Exchange (ETDEWEB)

    Lapedes, A.; Farber, R.

    1987-06-01

    The backpropagation learning algorithm for neural networks is developed into a formalism for nonlinear signal processing. We illustrate the method by selecting two common topics in signal processing, prediction and system modelling, and show that nonlinear applications can be handled extremely well by using neural networks. The formalism is a natural, nonlinear extension of the linear Least Mean Squares algorithm commonly used in adaptive signal processing. Simulations are presented that document the additional performance achieved by using nonlinear neural networks. First, we demonstrate that the formalism may be used to predict points in a highly chaotic time series with orders of magnitude increase in accuracy over conventional methods including the Linear Predictive Method and the Gabor-Volterra-Wiener Polynomial Method. Deterministic chaos is thought to be involved in many physical situations including the onset of turbulence in fluids, chemical reactions and plasma physics. Secondly, we demonstrate the use of the formalism in nonlinear system modelling by providing a graphic example in which it is clear that the neural network has accurately modelled the nonlinear transfer function. It is interesting to note that the formalism provides explicit, analytic, global, approximations to the nonlinear maps underlying the various time series. Furthermore, the neural net seems to be extremely parsimonious in its requirements for data points from the time series. We show that the neural net is able to perform well because it globally approximates the relevant maps by performing a kind of generalized mode decomposition of the maps. 24 refs., 13 figs.
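
    A present-day re-creation of the prediction experiment's flavor, under assumptions: a small scikit-learn MLP (rather than the report's backpropagation network) predicting the logistic map (rather than the report's chaotic series), compared against a linear predictor.

    ```python
    # Next-point prediction of a chaotic series: neural network vs. linear model.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.linear_model import LinearRegression

    # Chaotic data: logistic map x_{n+1} = 4 x_n (1 - x_n).
    x = np.empty(2000); x[0] = 0.3
    for n in range(1999):
        x[n + 1] = 4.0 * x[n] * (1.0 - x[n])

    lag = 3
    X = np.column_stack([x[i:len(x) - lag + i] for i in range(lag)])
    y = x[lag:]
    X_tr, X_te, y_tr, y_te = X[:1500], X[1500:], y[:1500], y[1500:]

    mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0).fit(X_tr, y_tr)
    lin = LinearRegression().fit(X_tr, y_tr)
    print(f"MLP    RMSE: {np.sqrt(np.mean((mlp.predict(X_te) - y_te) ** 2)):.4f}")
    print(f"Linear RMSE: {np.sqrt(np.mean((lin.predict(X_te) - y_te) ** 2)):.4f}")
    ```

    On this map the nonlinear model typically beats the linear predictor by a wide margin, which is the qualitative point of the original comparison.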

  11. Stochastic Model of Traffic Jam and Traffic Signal Control

    Science.gov (United States)

    Shin, Ji-Sun; Cui, Cheng-You; Lee, Tae-Hong; Lee, Hee-Hyol

    Traffic signal control is an effective method of relieving traffic jams, and forecasting traffic density is known to be an important part of Intelligent Transportation Systems (ITS). Several traffic signal control methods are known, such as the random walk method, the neural network method, the Bayesian network method, and so on. In this paper, we propose a new traffic signal control method that uses a predicted distribution of the traffic jam based on a Dynamic Bayesian Network model. First, a forecasting model is built to predict the probabilistic distribution of the traffic jam during each period of the traffic lights; the Dynamic Bayesian Network is used as the forecasting model to predict the probabilistic distribution of the density of the traffic jam. From measurements at two crossing points in each cycle, the inflow and outflow in each direction and the number of standing vehicles in the former cycle are obtained, and the number of standing vehicles at the k-th cycle is calculated synchronously. Next, the probabilistic distribution of the density of standing vehicles in each cycle is predicted using the Dynamic Bayesian Network constructed for the traffic jam. A control rule is then deduced that adjusts the split and the cycle to increase the probability that the number of standing vehicles lies between a lower limit and a ceiling. Simulation results using actual traffic data from Kitakyushu city show the effectiveness of the method.
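
    The prediction step of such a model can be sketched with a discrete distribution over queue lengths. The sketch below assumes Poisson inflow and a fixed per-cycle outflow capacity, and omits the measurement update and network structure of the paper's Dynamic Bayesian Network.

    ```python
    # Propagate a probability distribution over standing-vehicle counts to the next cycle.
    import numpy as np
    from scipy.stats import poisson

    MAX_Q = 60

    def predict_queue(prior, mean_inflow, outflow_capacity):
        post = np.zeros(MAX_Q + 1)
        for q, p_q in enumerate(prior):
            if p_q == 0.0:
                continue
            for arr in range(MAX_Q + 1):
                nxt = min(max(q + arr - outflow_capacity, 0), MAX_Q)
                post[nxt] += p_q * poisson.pmf(arr, mean_inflow)
        return post / post.sum()

    prior = np.zeros(MAX_Q + 1); prior[20] = 1.0        # 20 standing vehicles now
    dist = predict_queue(prior, mean_inflow=12.0, outflow_capacity=10)
    print(f"P(queue > 20 next cycle) = {dist[21:].sum():.3f}")
    # A controller could lengthen the green split whenever this probability
    # exceeds a chosen ceiling.
    ```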

  12. Expectations and speech intelligibility.

    Science.gov (United States)

    Babel, Molly; Russell, Jamie

    2015-05-01

    Socio-indexical cues and paralinguistic information are often beneficial to speech processing as this information assists listeners in parsing the speech stream. Associations that particular populations speak in a certain speech style can, however, make it such that socio-indexical cues have a cost. In this study, native speakers of Canadian English who identify as Chinese Canadian and White Canadian read sentences that were presented to listeners in noise. Half of the sentences were presented with a visual-prime in the form of a photo of the speaker and half were presented in control trials with fixation crosses. Sentences produced by Chinese Canadians showed an intelligibility cost in the face-prime condition, whereas sentences produced by White Canadians did not. In an accentedness rating task, listeners rated White Canadians as less accented in the face-prime trials, but Chinese Canadians showed no such change in perceived accentedness. These results suggest a misalignment between an expected and an observed speech signal for the face-prime trials, which indicates that social information about a speaker can trigger linguistic associations that come with processing benefits and costs.

  13. MOTORCYCLE CRASH PREDICTION MODEL FOR NON-SIGNALIZED INTERSECTIONS

    Directory of Open Access Journals (Sweden)

    S. HARNEN

    2003-01-01

    This paper attempts to develop a prediction model for motorcycle crashes at non-signalized intersections on urban roads in Malaysia. The generalized linear modeling approach was used to develop the model. The final model revealed that an increase in motorcycle and non-motorcycle flows entering an intersection is associated with an increase in motorcycle crashes. Non-motorcycle flow on the major road had the greatest effect on the probability of motorcycle crashes. Approach speed, lane width, number of lanes, shoulder width, and land use were also found to be significant in explaining motorcycle crashes. The model should assist traffic engineers in deciding on the need for appropriate intersection treatments specifically designed for non-exclusive motorcycle lane facilities.
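
    The generalized linear modeling approach described here is commonly realized as a Poisson regression with a log link. A hedged sketch on synthetic data follows; the variable names, coefficients, and data are invented for illustration and are not the paper's fitted model.

    ```python
    # Poisson GLM for crash counts: log(E[crashes]) = b0 + b1*log(moto) + b2*log(car) + b3*speed.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200
    moto_flow = rng.uniform(200, 2000, n)      # motorcycles entering per day
    car_flow = rng.uniform(1000, 10000, n)     # non-motorcycles on major road per day
    speed = rng.uniform(30, 70, n)             # approach speed, km/h
    lam = np.exp(-8 + 0.6 * np.log(moto_flow) + 0.5 * np.log(car_flow) + 0.01 * speed)
    crashes = rng.poisson(lam)                 # synthetic crash counts

    X = sm.add_constant(np.column_stack([np.log(moto_flow), np.log(car_flow), speed]))
    model = sm.GLM(crashes, X, family=sm.families.Poisson()).fit()
    print(model.summary())
    ```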

  14. Speech identity conversion

    Czech Academy of Sciences Publication Activity Database

    Vondra, Martin; Vích, Robert

    Vol. 3445, - (2005), s. 421-426 ISSN 0302-9743. [International Summer School on Neural Nets "E. R. Caianiello". Course: Nonlinear Speech Modeling and Applications /9./. Vietri sul Mare, 13.09.2004-18.09.2004] R&D Projects: GA ČR(CZ) GA102/04/1097; GA ČR(CZ) GA102/02/0124; GA MŠk(CZ) OC 277.001 Institutional research plan: CEZ:AV0Z2067918 Keywords : speech synthesis * computer science Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering Impact factor: 0.402, year: 2005

  15. Volume conductor model of transcutaneous electrical stimulation with kilohertz signals

    Science.gov (United States)

    Medina, Leonel E.; Grill, Warren M.

    2014-12-01

    Objective. Incorporating high-frequency components in transcutaneous electrical stimulation (TES) waveforms may make it possible to stimulate deeper nerve fibers since the impedance of tissue declines with increasing frequency. However, the mechanisms of high-frequency TES remain largely unexplored. We investigated the properties of TES with frequencies beyond those typically used in neural stimulation. Approach. We implemented a multilayer volume conductor model including dispersion and capacitive effects, coupled to a cable model of a nerve fiber. We simulated voltage- and current-controlled transcutaneous stimulation, and quantified the effects of frequency on the distribution of potentials and fiber excitation. We also quantified the effects of a novel transdermal amplitude modulated signal (TAMS) consisting of a non-zero offset sinusoidal carrier modulated by a square-pulse train. Main results. The model revealed that high-frequency signals generated larger potentials at depth than did low frequencies, but this did not translate into lower stimulation thresholds. Both TAMS and conventional rectangular pulses activated more superficial fibers in addition to the deeper, target fibers, and at no frequency did we observe an inversion of the strength-distance relationship. Current regulated stimulation was more strongly influenced by fiber depth, whereas voltage regulated stimulation was more strongly influenced by skin thickness. Finally, our model reproduced the threshold-frequency relationship of experimentally measured motor thresholds. Significance. The model may be used for prediction of motor thresholds in TES, and contributes to the understanding of high-frequency TES.
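
    The TAMS waveform as described (a non-zero offset sinusoidal carrier modulated by a square-pulse train) can be generated directly; the carrier frequency, pulse rate, and pulse width below are illustrative assumptions, not the study's parameters.

    ```python
    # Transdermal amplitude modulated signal (TAMS) sketch.
    import numpy as np

    def tams(fs=1_000_000, dur=0.01, f_carrier=100_000.0,
             pulse_rate=100.0, pulse_width=200e-6, offset=0.5):
        t = np.arange(int(fs * dur)) / fs
        carrier = offset + (1.0 - offset) * np.sin(2 * np.pi * f_carrier * t)
        gate = (t % (1.0 / pulse_rate)) < pulse_width   # square-pulse train
        return t, carrier * gate

    t, sig = tams()
    print(f"duty cycle: {np.mean(sig != 0):.3f}")
    ```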

  16. Modeling Guidelines for Code Generation in the Railway Signaling Context

    Science.gov (United States)

    Ferrari, Alessio; Bacherini, Stefano; Fantechi, Alessandro; Zingoni, Niccolo

    2009-01-01

    Modeling guidelines constitute one of the fundamental cornerstones for Model Based Development. Their relevance is essential when dealing with code generation in the safety-critical domain. This article presents the experience of a railway signaling systems manufacturer on this issue. Introduction of Model-Based Development (MBD) and code generation in the industrial safety-critical sector created a crucial paradigm shift in the development process of dependable systems. While traditional software development focuses on the code, with MBD practices the focus shifts to model abstractions. The change has fundamental implications for safety-critical systems, which still need to guarantee a high degree of confidence also at code level. Usage of the Simulink/Stateflow platform for modeling, which is a de facto standard in control software development, does not ensure by itself production of high-quality dependable code. This issue has been addressed by companies through the definition of modeling rules imposing restrictions on the usage of design tool components, in order to enable production of qualified code. The MAAB Control Algorithm Modeling Guidelines (MathWorks Automotive Advisory Board) [3] is a well-established set of publicly available rules for modeling with Simulink/Stateflow. This set of recommendations has been developed by a group of OEMs and suppliers of the automotive sector with the objective of enforcing and easing the usage of the MathWorks tools within the automotive industry. The guidelines were published in 2001 and afterwards revised in 2007 in order to integrate some additional rules developed by the Japanese division of MAAB [5]. The scope of the current edition of the guidelines ranges from model maintainability and readability to code generation issues. The rules are conceived as a reference baseline and therefore they need to be tailored to comply with the characteristics of each industrial context. Customization of these ...

  17. An exploratory study on the driving method of speech synthesis based on the human eye reading imaging data

    Science.gov (United States)

    Gao, Pei-pei; Liu, Feng

    2016-10-01

    With the development of information technology and artificial intelligence, speech synthesis plays a significant role in the field of human-computer interaction. However, one main problem of current speech synthesis techniques is a lack of naturalness and expressiveness, so that synthesized speech does not yet approach the standard of natural language; another is that human-computer interaction based on speech synthesis is too monotonous to realize a mechanism of subjective user drive. This paper reviews the historical development of speech synthesis and summarizes the general processing pipeline of the technique, pointing out that the prosody generation module is an important part of speech synthesis. On the basis of further research, a new human-computer interaction method is introduced that uses the reader's eye activity during reading to control and drive prosody generation, enriching the forms of synthesis. Based on eye-gaze data extraction, a speech synthesis method driven in real time by the eye-movement signal is proposed that can express the real speech rhythm of the speaker: while the reader silently reads a corpus, reading information such as the gaze duration per prosodic unit is captured and used to establish a hierarchical prosodic duration model that determines the duration parameters of the synthesized speech. Finally, the feasibility of the above method is verified by analysis.

  18. A Segmented Signal Progression Model for the Modern Streetcar System

    Directory of Open Access Journals (Sweden)

    Baojie Wang

    2015-01-01

    This paper develops a segmented signal progression model for the modern streetcar system. The new method has the following features: (1) the control concept is based on the assumption of a single streetcar line operating along an arterial under a constant headway, with no bandwidth demand for streetcar signal progression; (2) the control unit is defined as a coordinated intersection group associated with several streetcar stations, and the control joints must be streetcar stations; (3) the objective function is built to ensure that the two-way streetcar arrival times fall within the available time of the streetcar phase; (4) the available time of the streetcar phase is determined by timing schemes, intersection structures, track locations, streetcar speeds, and vehicular accelerations; (5) the streetcar running speed is constant, separately for the upstream and downstream routes; and (6) the streetcar dwell time is preset according to the historical data distribution or charging demand. The proposed method is experimentally examined in the Hexi New City Streetcar Project in Nanjing, China. The experimental results show how the streetcar system operation and the progression affect transit and vehicular traffic. The proposed segmented signal progression design yields promising outcomes in terms of ensuring high streetcar system efficiency while minimizing negative impacts on transit and vehicular traffic.

  19. Timing in audiovisual speech perception: A mini review and new psychophysical data.

    Science.gov (United States)

    Venezia, Jonathan H; Thurman, Steven M; Matchin, William; George, Sahara E; Hickok, Gregory

    2016-02-01

    Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (~35 % identification of /apa/ compared to ~5 % in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.

  20. Mathematical modeling of gonadotropin-releasing hormone signaling.

    Science.gov (United States)

    Pratap, Amitesh; Garner, Kathryn L; Voliotis, Margaritis; Tsaneva-Atanasova, Krasimira; McArdle, Craig A

    2017-07-05

    Gonadotropin-releasing hormone (GnRH) acts via G-protein coupled receptors on pituitary gonadotropes to control reproduction. These are Gq-coupled receptors that mediate acute effects of GnRH on the exocytotic secretion of luteinizing hormone (LH) and follicle-stimulating hormone (FSH), as well as the chronic regulation of their synthesis. GnRH is secreted in short pulses and GnRH effects on its target cells are dependent upon the dynamics of these pulses. Here we overview GnRH receptors and their signaling network, placing emphasis on pulsatile signaling, and how mechanistic mathematical models and an information theoretic approach have helped further this field. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  1. An extended car-following model at signalized intersections

    Science.gov (United States)

    Yu, Shaowei; Shi, Zhongke

    2014-08-01

    To better simulate car-following behavior when the traffic light is red, car-following data for three successive cars at a signalized intersection in Jinan, China, were collected using a newly proposed data acquisition method and then analyzed to select the input variables of the extended car-following model. An extended car-following model considering two leading cars' accelerations was proposed, calibrated, and verified with field data on the basis of the full velocity difference model, and a comparative model used for comparative research was also proposed and calibrated in the light of the GM model. The results indicate that the extended car-following model fits the measured data well, and that its fitting precision is superior to that of the comparative model, with the mean absolute error reduced by 22.83%. Finally, a theoretical car-following model considering multiple leading cars' accelerations was put forward, which is potentially applicable to vehicle automation systems and vehicle safety early-warning systems; linear stability analysis and numerical simulations were then conducted to analyze some observed physical features of realistic traffic.

  2. Speech Enhancement

    DEFF Research Database (Denmark)

    Benesty, Jacob; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    and their performance bounded and assessed in terms of noise reduction and speech distortion. The book shows how various filter designs can be obtained in this framework, including the maximum SNR, Wiener, LCMV, and MVDR filters, and how these can be applied in various contexts, like in single-channel and multichannel...

  3. Speech Intelligibility

    Science.gov (United States)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering, and diagnostics in order to quantify very different phenomena like the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, the benefit of using hearing aids, or combinations of these things.

  4. Speech masking and cancelling and voice obscuration

    Science.gov (United States)

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor is used to measure a user's speech and then broadcasts an obscuring acoustic signal, diminishing the user's vocal acoustic output intensity and/or distorting the voice sounds, making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate to or contacting a user's neck or head skin tissue for sensing speech production information.

  5. Phonemic Characteristics of Apraxia of Speech Resulting from Subcortical Hemorrhage

    Science.gov (United States)

    Peach, Richard K.; Tonkovich, John D.

    2004-01-01

    Reports describing subcortical apraxia of speech (AOS) have received little consideration in the development of recent speech processing models because the speech characteristics of patients with this diagnosis have not been described precisely. We describe a case of AOS with aphasia secondary to basal ganglia hemorrhage. Speech-language symptoms…

  6. Speech perception in the presence of other sounds

    Science.gov (United States)

    Darwin, C. J.

    2005-04-01

    The human listener's remarkable ability to recognize speech when it is mixed with other sounds presents a challenge both to models of speech perception and to approaches to speech recognition. This talk will review some of the work on how human listeners can perceive speech in sound mixtures and will try to indicate areas that might be particularly fruitful for future research.

  7. 78 FR 49717 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities ...

  8. Signal analysis of accelerometry data using gravity-based modeling

    Science.gov (United States)

    Davey, Neil P.; James, Daniel A.; Anderson, Megan E.

    2004-03-01

    Triaxial accelerometers have been used to measure human movement parameters in swimming. Interpretation of the data is difficult due to interference sources, including the interaction of external bodies. In this investigation the authors developed a model to simulate the physical movement of the lower back. Theoretical accelerometry outputs were derived, thus giving an ideal, or noiseless, dataset. An experimental data collection apparatus was developed by adapting a system to the aquatic environment for the investigation of swimming. Model data were compared against recorded data and showed strong correlation. Comparison of recorded and modeled data can be used to identify changes in body movement; this is especially useful when cyclic patterns are present in the activity. The strong correlation between data sets allowed the development of signal processing algorithms for swimming stroke analysis, developed first on the pure noiseless data set and then applied to performance data. Video analysis was also used to validate the study results and has shown potential to provide acceptable results.

  9. Technical foundations of TANDEM-STRAIGHT, a speech analysis ...

    Indian Academy of Sciences (India)

    Keywords: speech analysis; fundamental frequency; speech synthesis; consistent sampling; periodic signals. Abstract: This article presents comprehensive technical information about STRAIGHT and TANDEM-STRAIGHT, a widely used speech modification tool and its successor. They share the same concept: the periodic excitation ...

  10. Using the PLUM procedure of SPSS to fit unequal variance and generalized signal detection models.

    Science.gov (United States)

    DeCarlo, Lawrence T

    2003-02-01

    The recent addition of a procedure in SPSS for the analysis of ordinal regression models offers a simple means for researchers to fit the unequal variance normal signal detection model and other extended signal detection models. The present article shows how to implement the analysis and how to interpret the SPSS output. Examples of fitting the unequal variance normal model and other generalized signal detection models are given. The approach offers a convenient means for applying signal detection theory to a variety of research.
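
    For readers without SPSS, the same unequal variance normal model can also be fit by direct maximum likelihood. A sketch with invented rating-experiment counts follows; it reproduces the model, not the PLUM procedure itself.

    ```python
    # Unequal-variance SDT: noise ~ N(0, 1), signal ~ N(d, s), ordered rating criteria.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    noise_counts = np.array([174, 172, 104, 92, 41, 8])    # ratings 1..6, noise trials
    signal_counts = np.array([46, 57, 66, 101, 154, 173])  # ratings 1..6, signal trials

    def neg_log_lik(params):
        d, log_s = params[0], params[1]
        crit = np.concatenate([[-np.inf], np.sort(params[2:]), [np.inf]])
        p_noise = np.diff(norm.cdf(crit))
        p_signal = np.diff(norm.cdf((crit - d) / np.exp(log_s)))
        return -(noise_counts @ np.log(p_noise + 1e-12)
                 + signal_counts @ np.log(p_signal + 1e-12))

    x0 = np.concatenate([[1.0, 0.0], np.linspace(-1, 1, 5)])  # d, log s, 5 criteria
    fit = minimize(neg_log_lik, x0, method="Nelder-Mead", options={"maxiter": 20000})
    d_hat, s_hat = fit.x[0], np.exp(fit.x[1])
    print(f"d = {d_hat:.2f}, signal SD = {s_hat:.2f}")
    ```

    A fitted signal standard deviation different from 1 is exactly what distinguishes the unequal variance model from an ordinary ordinal probit fit.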

  11. Metabolic networks: a signal-oriented approach to cellular models.

    Science.gov (United States)

    Lengeler, J W

    2000-01-01

    Complete genomes, far advanced proteomes, and even 'metabolomes' are available for at least a few organisms, e.g., Escherichia coli. Systematic functional analyses of such complete data sets will produce a wealth of information and promise an understanding of the dynamics of complex biological networks and perhaps even of entire living organisms. Such complete and holistic descriptions of biological systems, however, will increasingly require a quantitative analysis and the help of mathematical models for simulating whole systems. In particular, new procedures are required that allow a meaningful reduction of the information derived from complex systems that will consequently be used in the modeling process. In this review the biological elements of such a modeling procedure will be described. In a first step, complex living systems must be structured into well-defined and clearly delimited functional units, the elements of which have a common physiological goal, belong to a single genetic unit, and respond to the signals of a signal transduction system that senses changes in physiological states of the organism. These functional units occur at each level of complexity and more complex units originate by grouping several lower level elements into a single, more complex unit. To each complexity level corresponds a global regulator that is epistatic over lower level regulators. After its structuring into modules (functional units), a biological system is converted in a second step into mathematical submodels that by progressive combination can also be assembled into more aggregated model structures. Such a simplification of a cell (an organism) reduces its complexity to a level amenable to present modeling capacities. The universal biochemistry, however, promises a set of rules valid for modeling biological systems, from unicellular microorganisms and cells, to multicellular organisms and to populations.

  12. Multinomial Logit Model of Pedestrian Crossing Behaviors at Signalized Intersections

    Directory of Open Access Journals (Sweden)

    Zhu-Ping Zhou

    2013-01-01

    Pedestrian crashes, which make up a large proportion of road casualties, are more likely to occur at signalized intersections in China. This paper studies the different pedestrian behaviors of regular users, late starters, sneakers, and partial sneakers. Behavior information was observed manually in the field study. After that, the survey team distributed a questionnaire to the same participants who had been observed, to acquire detailed demographic and socioeconomic characteristics as well as attitude and preference indicators. In total, 1878 pedestrians were surveyed at 16 signalized intersections in Nanjing. First, correlation analysis is performed to analyze each factor's effect. Then, five latent variables, including safety, conformity, comfort, flexibility, and fastness, are obtained by structural equation modeling (SEM). Moreover, based on the results of the SEM, a multinomial logit model with latent variables is developed to describe how the factors influence pedestrians' behavior. Finally, some conclusions are drawn from the model: (1) for the choice of being late starters, arrival time, the presence of oncoming cars, and crosswalk length are the most important factors; (2) gender has the most significant effect on pedestrians' being sneakers; and (3) age is the most important factor when pedestrians choose to be partial sneakers.
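
    The multinomial logit stage (without the SEM-derived latent variables) can be sketched with statsmodels; the covariates, coefficients, and data below are invented for illustration and do not reproduce the paper's estimates.

    ```python
    # Multinomial logit of pedestrian crossing-behavior class on observed covariates.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 500
    gender = rng.integers(0, 2, n)                 # 1 = male
    age = rng.uniform(18, 75, n)
    crosswalk_len = rng.uniform(10, 40, n)         # metres
    # Classes: 0 regular user, 1 late starter, 2 sneaker, 3 partial sneaker.
    util = np.column_stack([np.zeros(n),
                            0.04 * crosswalk_len - 1.5,
                            1.2 * gender - 2.0,
                            -0.03 * age + 0.5])
    prob = np.exp(util) / np.exp(util).sum(axis=1, keepdims=True)
    choice = np.array([rng.choice(4, p=p) for p in prob])

    X = sm.add_constant(np.column_stack([gender, age, crosswalk_len]))
    fit = sm.MNLogit(choice, X).fit(disp=False)
    print(fit.summary())
    ```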

  13. A causal test of the motor theory of speech perception: a case of impaired speech production and spared speech perception.

    Science.gov (United States)

    Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z

    2015-01-01

    The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.

  14. Left dorsal speech stream components and their contribution to phonological processing.

    Science.gov (United States)

    Murakami, Takenobu; Kell, Christian A; Restle, Julia; Ugawa, Yoshikazu; Ziemann, Ulf

    2015-01-28

    Models propose an auditory-motor mapping via a left-hemispheric dorsal speech-processing stream, yet its detailed contributions to speech perception and production are unclear. Using fMRI-navigated repetitive transcranial magnetic stimulation (rTMS), we virtually lesioned left dorsal stream components in healthy human subjects and probed the consequences on speech-related facilitation of articulatory motor cortex (M1) excitability, as indexed by increases in motor-evoked potential (MEP) amplitude of a lip muscle, and on speech processing performance in phonological tests. Speech-related MEP facilitation was disrupted by rTMS of the posterior superior temporal sulcus (pSTS), the sylvian parieto-temporal region (SPT), and by double-knock-out but not individual lesioning of the pars opercularis of the inferior frontal gyrus (pIFG) and the dorsal premotor cortex (dPMC), and not by rTMS of the ventral speech-processing stream or an occipital control site. rTMS of the dorsal stream, but not of the ventral stream or the occipital control site, caused deficits specifically in the processing of fast transients of the acoustic speech signal. Performance of syllable and pseudoword repetition correlated with speech-related MEP facilitation, and this relation was abolished with rTMS of pSTS, SPT, and pIFG. Findings provide direct evidence that auditory-motor mapping in the left dorsal stream causes reliable and specific speech-related MEP facilitation in left articulatory M1. The left dorsal stream targets the articulatory M1 through pSTS and SPT, which constitute essential posterior input regions, and in parallel via frontal pathways through pIFG and dPMC. Finally, engagement of the left dorsal stream is necessary for processing of fast transients in the auditory signal. Copyright © 2015 the authors.

  15. Internal modeling of upcoming speech: A causal role of the right posterior cerebellum in non-motor aspects of language production.

    Science.gov (United States)

    Runnqvist, Elin; Bonnard, Mireille; Gauvin, Hanna S; Attarian, Shahram; Trébuchon, Agnès; Hartsuiker, Robert J; Alario, F-Xavier

    2016-08-01

    Some language processing theories propose that, just as for other somatic actions, self-monitoring of language production is achieved through internal modeling. The cerebellum is the proposed center of such internal modeling in motor control, and the right cerebellum has been linked to an increasing number of language functions, including predictive processing during comprehension. Relating these findings, we tested whether the right posterior cerebellum has a causal role in self-monitoring of speech errors. Participants received 1 Hz repetitive transcranial magnetic stimulation for 15 min to lobules Crus I and II in the right hemisphere, and, in counterbalanced orders, to the contralateral area in the left cerebellar hemisphere (control) in order to induce a temporary inactivation of one of these zones. Immediately afterwards, they engaged in a speech production task priming the production of speech errors. Language production was impaired after right compared to left hemisphere stimulation, a finding that provides evidence for a causal role of the cerebellum during language production. We interpreted this role in terms of internal modeling of upcoming speech through a verbal working memory process used to prevent errors. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Predicting the perceived sound quality of frequency-compressed speech.

    Directory of Open Access Journals (Sweden)

    Rainer Huber

    The performance of objective speech and audio quality measures for the prediction of the perceived quality of frequency-compressed speech in hearing aids is investigated in this paper. A number of existing quality measures have been applied to speech signals processed by a hearing aid, which compresses speech spectra along frequency in order to make information contained in higher frequencies audible for listeners with severe high-frequency hearing loss. Quality measures were compared with subjective ratings obtained from normal hearing and hearing impaired children and adults in an earlier study. High correlations were achieved with quality measures computed by quality models that are based on the auditory model of Dau et al., namely, the measure PSM, computed by the quality model PEMO-Q; the measure qc, computed by the quality model proposed by Hansen and Kollmeier; and the linear subcomponent of the HASQI. For the prediction of quality ratings by hearing impaired listeners, extensions of some models incorporating hearing loss were implemented and shown to achieve improved prediction accuracy. Results indicate that these objective quality measures can potentially serve as tools for assisting in initial setting of frequency compression parameters.

  17. An integrated approach to improving noisy speech perception

    Science.gov (United States)

    Koval, Serguei; Stolbov, Mikhail; Smirnova, Natalia; Khitrov, Mikhail

    2002-05-01

    For a number of practical purposes and tasks, experts have to decode speech recordings of very poor quality. A combination of techniques is proposed to improve the intelligibility and quality of distorted speech messages and thus facilitate their comprehension. Along with the application of noise cancellation and speech signal enhancement techniques removing and/or reducing various kinds of distortions and interference (primarily unmasking and normalization in the time and frequency domains), the approach incorporates optimal expert listening tactics based on selective listening, nonstandard binaural listening, accounting for short-term and long-term human ear adaptation to noisy speech, as well as some methods of speech signal enhancement to support speech decoding during listening. The approach integrating the suggested techniques ensures high-quality ultimate results and has successfully been applied by Speech Technology Center experts and by numerous other users, mainly forensic institutions, to perform noisy speech record decoding for courts, law enforcement and emergency services, accident investigation bodies, etc.

  18. The relationship between the intelligibility of time-compressed speech and speech in noise in young and elderly listeners

    NARCIS (Netherlands)

    Versfeld, Niek J.; Dreschler, Wouter A.

    2002-01-01

    A conventional measure to determine the ability to understand speech in noisy backgrounds is the so-called speech reception threshold (SRT) for sentences. It yields the signal-to-noise ratio (in dB) for which half of the sentences are correctly perceived. The SRT defines to what degree speech must ...

  19. Small- and large-signal modeling of InP HBTs in transferred-substrate technology

    DEFF Research Database (Denmark)

    Johansen, Tom Keinicke; Rudolph, Matthias; Jensen, Thomas

    2014-01-01

    a direct parameter extraction methodology dedicated to III–V based HBTs. It is shown that the modeling of measured S-parameters can be improved in the millimeter-wave frequency range by augmenting the small-signal model with a description of AC current crowding. The extracted elements of the small......-signal model structure are employed as a starting point for the extraction of a large-signal model. The developed large-signal model for the TS-HBTs accurately predicts the DC over temperature and small-signal performance over bias as well as the large-signal performance at millimeter-wave frequencies....

  20. Speech perception in noise with a harmonic complex excited vocoder.

    Science.gov (United States)

    Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Ihlefeld, Antje; Litovsky, Ruth Y

    2014-04-01

    A cochlear implant (CI) presents band-pass-filtered acoustic envelope information by modulating current pulse train levels. Similarly, a vocoder presents envelope information by modulating an acoustic carrier. By studying how normal hearing (NH) listeners are able to understand degraded speech signals with a vocoder, the parameters that best simulate electric hearing and factors that might contribute to the NH-CI performance difference may be better understood. A vocoder with harmonic complex carriers (fundamental frequency, f0 = 100 Hz) was used to study the effect of carrier phase dispersion on speech envelopes and intelligibility. The starting phases of the harmonic components were randomly dispersed to varying degrees prior to carrier filtering and modulation. NH listeners were tested on recognition of a closed set of vocoded words in background noise. Two sets of synthesis filters simulated different amounts of current spread in CIs. Results showed that the speech vocoded with carriers whose starting phases were maximally dispersed was the most intelligible. Superior speech understanding may have been a result of the flattening of the dispersed-phase carrier's intrinsic temporal envelopes produced by the large number of interacting components in the high-frequency channels. Cross-correlogram analyses of auditory nerve model simulations confirmed that randomly dispersing the carrier's component starting phases resulted in better neural envelope representation. However, neural metrics extracted from these analyses were not found to accurately predict speech recognition scores for all vocoded speech conditions. It is possible that central speech understanding mechanisms are insensitive to the envelope-fine structure dichotomy exploited by vocoders.
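
    The carrier manipulation at the heart of this study is straightforward to reproduce in outline. The sketch below builds a 100-Hz harmonic complex with a controllable amount of starting-phase dispersion and reports how peaky its Hilbert envelope is; the full vocoder chain (analysis filterbank, envelope extraction, modulation, synthesis filters) is omitted, and all parameter values are assumptions.

    ```python
    # Harmonic complex carrier with dispersed component starting phases.
    import numpy as np
    from scipy.signal import hilbert

    def harmonic_carrier(f0, fs, dur, n_harm, dispersion):
        """dispersion in [0, 1]: 0 = cosine phases, 1 = fully random phases."""
        rng = np.random.default_rng(0)
        t = np.arange(int(fs * dur)) / fs
        phases = dispersion * rng.uniform(0, 2 * np.pi, n_harm)
        return sum(np.cos(2 * np.pi * f0 * (k + 1) * t + phases[k])
                   for k in range(n_harm))

    fs = 16000
    for disp in (0.0, 1.0):
        c = harmonic_carrier(100.0, fs, 0.1, n_harm=40, dispersion=disp)
        env = np.abs(hilbert(c))
        print(f"dispersion={disp}: envelope peak/mean = {env.max() / env.mean():.1f}")
    ```

    Full dispersion flattens the carrier's intrinsic temporal envelope, which is the property the study links to better envelope representation and intelligibility.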

  1. Acquiring neural signals for developing a perception and cognition model

    Science.gov (United States)

    Li, Wei; Li, Yunyi; Chen, Genshe; Shen, Dan; Blasch, Erik; Pham, Khanh; Lynch, Robert

    2012-06-01

    The understanding of how humans process information, determine salience, and combine seemingly unrelated information is essential to automated processing of large amounts of information that is partially relevant, or of unknown relevance. Recent neurological science research in human perception, and in information science regarding context-based modeling, provides us with a theoretical basis for using a bottom-up approach to automating the management of large amounts of information in ways directly useful for human operators. However, integration of human intelligence into a game-theoretic framework for dynamic and adaptive decision support needs a perception and cognition model. For the purpose of cognitive modeling, we present a brain-computer-interface (BCI) based humanoid robot system that acquires brainwaves during human mental activities of imagining a humanoid robot-walking behavior. We use the neural signals to investigate relationships between complex humanoid robot behaviors and human mental activities for developing the perception and cognition model. The BCI system consists of a data acquisition unit with an electroencephalograph (EEG), a humanoid robot, and a charge-coupled device (CCD) camera. An EEG electrode cup acquires brainwaves from the skin surface on the scalp. The humanoid robot has 20 degrees of freedom (DOFs): 12 DOFs located on the hips, knees, and ankles for humanoid robot walking; 6 DOFs on the shoulders and arms for arm motion; and 2 DOFs for head yaw and pitch motion. The CCD camera takes video clips of the human subject's hand postures to identify mental activities that are correlated with the robot-walking behaviors.

  2. Speech Enhancement Based on Compressed Sensing Technology

    Directory of Open Access Journals (Sweden)

    Huiyan Xu

    2014-10-01

    Full Text Available Compressed sensing (CS) is a sampling approach that exploits signal sparsity, and it can effectively extract the information contained in a signal. This paper presents a new method for enhancing noisy speech based on the CS process. The algorithm exploits the sparsity of speech in the discrete fast Fourier transform (FFT) domain; a complex-domain observation matrix is designed, compressed measurements of the noisy speech are denoised by soft thresholding, and the speech signal is sparsely reconstructed with the Sparse Reconstruction by Separable Approximation (SpaRSA) algorithm to restore it, improving the speech enhancement. Experimental results show that the algorithm performs denoising, compression, and reconstruction of the noisy signal, that the SNR margin is greatly improved, and that background noise is suppressed more effectively.
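
    The core shrinkage step of such l1-based sparse reconstruction methods can be sketched compactly (Python/NumPy; this is a simplified stand-in of my own, without the paper's observation matrix or the full iterative SpaRSA pipeline):

        import numpy as np

        def soft_threshold(x, tau):
            # Complex soft thresholding: shrink coefficient magnitudes
            # toward zero (the proximal step at the heart of SpaRSA).
            mag = np.maximum(np.abs(x), 1e-12)
            return np.where(mag > tau, (1 - tau / mag) * x, 0.0)

        def fft_denoise(noisy, tau):
            # Sparsify in the FFT domain, shrink, and invert.
            X = np.fft.rfft(noisy)
            return np.fft.irfft(soft_threshold(X, tau), n=len(noisy))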

  3. Psychophysics of Complex Auditory and Speech Stimuli

    National Research Council Canada - National Science Library

    Pastore, Richard

    1996-01-01

    The supported research provides a careful examination of the many different interrelated factors, processes, and constructs important to the perception by humans of complex acoustic signals, including speech and music...

  4. Linear collider signal of anomaly mediated supersymmetry breaking model

    International Nuclear Information System (INIS)

    Ghosh Dilip Kumar; Kundu, Anirban; Roy, Probir; Roy, Sourov

    2001-01-01

    Though the minimal model of anomaly mediated supersymmetry breaking has been significantly constrained by recent experimental and theoretical work, there are still allowed regions of the parameter space for moderate to large values of tan β. We show that these regions will be comprehensively probed in a √s = 1 TeV e⁺e⁻ linear collider. Diagnostic signals to this end are studied by zeroing in on a unique and distinct feature of a large class of models in this genre: a neutral wino-like Lightest Supersymmetric Particle closely degenerate in mass with a wino-like chargino. The pair production processes e⁺e⁻ → ẽ_L⁺ẽ_L⁻, ẽ_R⁺ẽ_R⁻, ẽ_L±ẽ_R∓, ν̃ anti-ν̃, χ̃₁⁰χ̃₂⁰, χ̃₂⁰χ̃₂⁰ are all considered at √s = 1 TeV corresponding to the proposed TESLA linear collider in two natural categories of mass ordering in the sparticle spectra. The signals analysed comprise multiple combinations of fast charged leptons (any of which can act as the trigger) plus displaced vertices X_D (any of which can be identified by a heavy ionizing track terminating in the detector) and/or associated soft pions with characteristic momentum distributions. (author)

  5. Hierarchic stochastic modelling applied to intracellular Ca(2+) signals.

    Directory of Open Access Journals (Sweden)

    Gregor Moenke

    Full Text Available Important biological processes like cell signalling and gene expression have noisy components and are very complex at the same time. Mathematical analysis of such systems has often been limited to the study of isolated subsystems, or approximations are used that are difficult to justify. Here we extend a recently published method (Thurley and Falcke, PNAS 2011) which is formulated in observable system configurations instead of molecular transitions. This reduces the number of system states by several orders of magnitude and avoids fitting of kinetic parameters. The method is applied to Ca(2+) signalling. Ca(2+) is a ubiquitous second messenger transmitting information by stochastic sequences of concentration spikes, which arise by coupling of subcellular Ca(2+) release events (puffs). We derive analytical expressions for a mechanistic Ca(2+) model, based on recent data from live cell imaging, and calculate Ca(2+) spike statistics in dependence on cellular parameters like stimulus strength or number of Ca(2+) channels. The new approach substantiates a generic Ca(2+) model, which is a very convenient way to simulate Ca(2+) spike sequences with correct spiking statistics.

  6. Mathematical model with autoregressive process for electrocardiogram signals

    Science.gov (United States)

    Evaristo, Ronaldo M.; Batista, Antonio M.; Viana, Ricardo L.; Iarosz, Kelly C.; Szezech, José D., Jr.; Godoy, Moacir F. de

    2018-04-01

    The cardiovascular system is composed of the heart, blood and blood vessels. Regarding the heart, cardiac conditions are assessed with the electrocardiogram, a noninvasive medical procedure. In this work, we introduce an autoregressive process into a mathematical model based on coupled differential equations in order to obtain the tachograms and the electrocardiogram signals of young adults with normal heartbeats. Our results are compared with experimental tachograms by means of Poincaré plots and detrended fluctuation analysis. We verify that the results from the model with the autoregressive process show good agreement with experimental tachogram measures generated by the electrical activity of the heartbeat. With the tachogram we build the electrocardiogram by means of coupled differential equations.
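
    The autoregressive ingredient can be pictured with a toy beat-to-beat simulation (Python/NumPy; the AR(2) coefficients and noise level below are illustrative values of my own, not the paper's fitted parameters):

        import numpy as np

        def simulate_tachogram(n_beats, mean_rr=0.8, a=(0.5, -0.2),
                               noise_sd=0.02, seed=0):
            # Toy AR(2) model of RR intervals (seconds) around a mean.
            rng = np.random.default_rng(seed)
            rr = np.full(n_beats, mean_rr)
            for k in range(2, n_beats):
                rr[k] = (mean_rr
                         + a[0] * (rr[k-1] - mean_rr)
                         + a[1] * (rr[k-2] - mean_rr)
                         + rng.normal(0.0, noise_sd))
            return rr

        rr = simulate_tachogram(500)
        # A Poincaré plot is then simply rr[:-1] against rr[1:].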

  7. Large-Signal DG-MOSFET Modelling for RFID Rectification

    Directory of Open Access Journals (Sweden)

    R. Rodríguez

    2016-01-01

    Full Text Available This paper analyses the capability of undoped DG-MOSFETs for the operation of rectifiers for RFIDs and Wireless Power Transmission (WPT) at microwave frequencies. For this purpose, a large-signal compact model has been developed and implemented in Verilog-A. The model has been numerically validated with a device simulator (Sentaurus). It is found that the number of stages needed to achieve optimal rectifier performance is lower than that required with conventional MOSFETs. In addition, the DC output voltage could be increased by using appropriate mid-gap gate metals, such as TiN. The minor impact of short channel effects (SCEs) on rectification is also pointed out.

  8. Development of a System for Automatic Recognition of Speech

    Directory of Open Access Journals (Sweden)

    Roman Jarina

    2003-01-01

    Full Text Available The article gives a review of research on processing and automatic recognition of speech signals (ARR) at the Department of Telecommunications of the Faculty of Electrical Engineering, University of Žilina. Ongoing research is oriented to speech parametrization using 2-dimensional cepstral analysis, and to the application of HMMs and neural networks for speech recognition in the Slovak language. The article summarizes achieved results and outlines the future orientation of our research in automatic speech recognition.

  9. Speech enhancement on smartphone voice recording

    International Nuclear Information System (INIS)

    Atmaja, Bagus Tris; Farid, Mifta Nur; Arifianto, Dhany

    2016-01-01

    Speech enhancement is a challenging task in audio signal processing: enhancing the quality of a target speech signal while suppressing other noises. Early on, speech enhancement algorithms grew rapidly, from spectral subtraction, Wiener filtering, and the spectral amplitude MMSE estimator to non-negative matrix factorization (NMF). The smartphone, a revolutionary device, is now used in all aspects of life, including journalism, both personally and professionally. Although many smartphones have two microphones (main and rear), only the main microphone is widely used for voice recording, which is why the NMF algorithm is widely used for this purpose of speech enhancement. This paper evaluates speech enhancement on smartphone voice recordings using the algorithms mentioned previously. We also extend the NMF algorithm to Kullback-Leibler NMF with supervised separation. The last algorithm shows improved results compared to the others by spectrogram and PESQ score evaluation. (paper)
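
    Supervised KL-NMF separation of the kind described can be sketched with standard multiplicative updates (Python/NumPy; the function names, dictionary shapes, and Wiener-like masking step are assumptions of mine, not the paper's exact formulation):

        import numpy as np

        def kl_nmf_activations(V, W, n_iter=200, eps=1e-10):
            # Multiplicative updates for KL-divergence NMF with the
            # dictionary W held fixed (supervised separation): V ~ W @ H.
            rng = np.random.default_rng(0)
            H = np.abs(rng.standard_normal((W.shape[1], V.shape[1]))) + eps
            ones = np.ones_like(V)
            for _ in range(n_iter):
                WH = W @ H + eps
                H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
            return H

        def separate_speech(V_noisy, W_speech, W_noise):
            # W_speech / W_noise: bases trained offline on magnitude
            # spectrograms of clean speech and of noise, respectively.
            W = np.hstack([W_speech, W_noise])
            H = kl_nmf_activations(V_noisy, W)
            k = W_speech.shape[1]
            V_s = W_speech @ H[:k]
            # Wiener-like mask applied to the noisy magnitude spectrogram.
            return V_s / (W @ H + 1e-10) * V_noisy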

  10. Segregation of Unvoiced Speech from Nonspeech Interference

    Science.gov (United States)

    2007-01-01

    ideal binary mask and its psychoacoustical support. As an illustration, Figure 1(a) shows a T-F representation of the waveform signal in Figure 1... speech signal that is periodic (harmonic) or quasi-periodic. In English, voiced speech includes all vowels, approximants, nasals, and certain stops... analysis concludes that unvoiced phonemes account for 21.0% of the total phoneme usage. For spoken English, French et al. (1930; see also Fletcher

  11. Prosodic Contrasts in Ironic Speech

    Science.gov (United States)

    Bryant, Gregory A.

    2010-01-01

    Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…

  12. Computational neuroanatomy of speech production.

    Science.gov (United States)

    Hickok, Gregory

    2012-01-05

    Speech production has been studied predominantly from within two traditions, psycholinguistics and motor control. These traditions have rarely interacted, and the resulting chasm between these approaches seems to reflect a level of analysis difference: whereas motor control is concerned with lower-level articulatory control, psycholinguistics focuses on higher-level linguistic processing. However, closer examination of both approaches reveals a substantial convergence of ideas. The goal of this article is to integrate psycholinguistic and motor control approaches to speech production. The result of this synthesis is a neuroanatomically grounded, hierarchical state feedback control model of speech production.

  13. Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity

    Science.gov (United States)

    Moses, David A.; Mesgarani, Nima; Leonard, Matthew K.; Chang, Edward F.

    2016-10-01

    Objective. The superior temporal gyrus (STG) and neighboring brain regions play a key role in human language processing. Previous studies have attempted to reconstruct speech information from brain activity in the STG, but few of them incorporate the probabilistic framework and engineering methodology used in modern speech recognition systems. In this work, we describe the initial efforts toward the design of a neural speech recognition (NSR) system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes using the high gamma band power of local field potentials in the STG and neighboring cortical areas obtained via electrocorticography. Approach. The system implements a Viterbi decoder that incorporates phoneme likelihood estimates from a linear discriminant analysis model and transition probabilities from an n-gram phonemic language model. Grid searches were used in an attempt to determine optimal parameterizations of the feature vectors and Viterbi decoder. Main results. The performance of the system was significantly improved by using spatiotemporal representations of the neural activity (as opposed to purely spatial representations) and by including language modeling and Viterbi decoding in the NSR system. Significance. These results emphasize the importance of modeling the temporal dynamics of neural responses when analyzing their variations with respect to varying stimuli and demonstrate that speech recognition techniques can be successfully leveraged when decoding speech from neural signals. Guided by the results detailed in this work, further development of the NSR system could have applications in the fields of automatic speech recognition and neural prosthetics.
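
    The decoding backbone the abstract describes, frame-wise phoneme likelihoods combined with n-gram transition probabilities in a Viterbi search, can be sketched as follows (Python/NumPy; the array shapes and names are mine, and in the real system the likelihoods come from an LDA model over high-gamma features):

        import numpy as np

        def viterbi(log_lik, log_trans, log_prior):
            # log_lik: (T, S) per-frame phoneme log-likelihoods
            # log_trans: (S, S) bigram log transition probabilities
            # log_prior: (S,) initial log probabilities
            T, S = log_lik.shape
            delta = log_prior + log_lik[0]
            back = np.zeros((T, S), dtype=int)
            for t in range(1, T):
                scores = delta[:, None] + log_trans    # (from, to)
                back[t] = scores.argmax(axis=0)
                delta = scores.max(axis=0) + log_lik[t]
            path = [int(delta.argmax())]
            for t in range(T - 1, 0, -1):
                path.append(back[t, path[-1]])
            return path[::-1]   # most likely phoneme state sequence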

  14. Optimal sinusoidal modelling of gear mesh vibration signals for gear diagnosis and prognosis

    Science.gov (United States)

    Man, Zhihong; Wang, Wenyi; Khoo, Suiyang; Yin, Juliang

    2012-11-01

    In this paper, the synchronous signal average of gear mesh vibration signals is modelled with multiple modulated sinusoidal representations. The signal model parameters are optimised against the measured signal averages using batch least-squares learning. With the optimal signal model, all components of a gear mesh vibration signal, including the amplitude modulations, the phase modulations, and the impulse vibration component induced by gear tooth cracking, are identified and analysed, giving insight into gear tooth crack development and propagation. In particular, the energy distribution of the impulse vibration signal, extracted from the optimal signal model, provides sufficient information for monitoring and diagnosing the evolution of the tooth cracking process, leading to the prognosis of gear tooth cracking. The new methodologies for gear mesh signal modelling and for the diagnosis of gear tooth fault development and propagation are validated with a set of rig test data, showing excellent performance.
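
    The least-squares fitting step can be illustrated compactly (Python/NumPy; the frequency set and variable names are illustrative, and in practice the frequencies would be the gear-mesh frequency plus its sidebands and harmonics):

        import numpy as np

        def fit_sinusoids(y, t, freqs):
            # Least-squares fit of
            #   y(t) ~ c0 + sum_k a_k cos(2*pi*f_k*t) + b_k sin(2*pi*f_k*t)
            cols = [np.ones_like(t)]
            for f in freqs:
                cols += [np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)]
            A = np.column_stack(cols)
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            return coef, A @ coef   # parameters and model reconstruction

    The residual between the measured average and the reconstruction would then carry the impulsive component attributed to tooth cracking.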

  15. Hybrid metaheuristic approaches to the expectation maximization for estimation of the hidden Markov model for signal modeling.

    Science.gov (United States)

    Huda, Shamsul; Yearwood, John; Togneri, Roberto

    2014-10-01

    The expectation maximization (EM) algorithm is the standard training algorithm for the hidden Markov model (HMM). However, EM faces a local convergence problem in HMM estimation. This paper attempts to overcome this problem of EM and proposes hybrid metaheuristic approaches to EM for HMM. In our earlier research, a hybrid of a constraint-based evolutionary learning approach and EM (CEL-EM) improved HMM estimation. In this paper, we propose a hybrid simulated annealing stochastic version of EM (SASEM) that combines simulated annealing (SA) with EM. The novelty of our approach is that we develop a mathematical reformulation of HMM estimation by introducing a stochastic step between the EM steps, and combine SA with EM to provide better control over the acceptance of stochastic and EM steps for better HMM estimation. We also extend our earlier work and propose a second hybrid, a combination of an evolutionary algorithm (EA) and the proposed SASEM (EA-SASEM). The proposed EA-SASEM uses the best constraint-based EA strategies from CEL-EM and the stochastic reformulation of HMM. The complementary properties of EA and SA, together with the stochastic reformulation of SASEM, give EA-SASEM sufficient potential to find better HMM estimates. To the best of our knowledge, this type of hybridization and mathematical reformulation has not been explored in the context of EM and HMM training. The proposed approaches have been evaluated through comprehensive experiments to justify their effectiveness in signal modeling using the TIMIT speech corpus. Experimental results show that the proposed approaches obtain higher recognition accuracies than both the EM algorithm and CEL-EM.
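
    The structure of inserting a stochastic step between EM steps can be sketched generically (Python/NumPy; the Gaussian likelihood and damped update below are toy stand-ins of my own, whereas the paper's setting would use the HMM forward algorithm for the likelihood and Baum-Welch re-estimation for the EM step):

        import numpy as np

        rng = np.random.default_rng(1)

        def log_lik(theta, data):
            # Toy stand-in: Gaussian log-likelihood with (mean, std) = theta.
            mu, sigma = theta[0], abs(theta[1]) + 1e-6
            return float(np.sum(-0.5 * ((data - mu) / sigma) ** 2
                                - np.log(sigma)))

        def em_step(theta, data):
            # Toy stand-in for one EM update: a damped step toward the
            # maximum-likelihood estimate.
            target = np.array([data.mean(), data.std()])
            return theta + 0.5 * (target - theta)

        def sasem(data, theta0, n_iter=50, temp=1.0, cooling=0.9, step=0.1):
            # SASEM-style skeleton: a Metropolis-accepted stochastic
            # perturbation inserted between EM steps, with annealing.
            theta = np.asarray(theta0, dtype=float)
            for _ in range(n_iter):
                cand = theta + rng.normal(0.0, step, size=theta.shape)
                dll = log_lik(cand, data) - log_lik(theta, data)
                if dll > 0 or rng.random() < np.exp(dll / temp):
                    theta = cand                 # accept stochastic move
                theta = em_step(theta, data)     # deterministic EM move
                temp *= cooling                  # annealing schedule
            return theta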

  16. Modelling noninvasively measured cerebral signals during a hypoxemia challenge: steps towards individualised modelling.

    Directory of Open Access Journals (Sweden)

    Beth Jelfs

    Full Text Available Noninvasive approaches to measuring cerebral circulation and metabolism are crucial to furthering our understanding of brain function. These approaches also have considerable potential for clinical use "at the bedside". However, a highly nontrivial precondition for routine use of such methods is the robust physiological interpretation of the data. In this paper, we explore the ability of a previously developed model of brain circulation and metabolism to explain and predict quantitatively the responses of physiological signals. The five signals, all measured noninvasively during hypoxemia in healthy volunteers, include four signals measured using near-infrared spectroscopy (NIRS) along with middle cerebral artery blood flow measured using transcranial Doppler flowmetry. We show that optimising the model using partial data from an individual can increase its predictive power, thus aiding the interpretation of NIRS signals in individuals. At the same time, such optimisation can also help refine model parametrisation and provide confidence intervals on model parameters. Discrepancies between model and data which persist despite model optimisation are used to flag up important questions concerning the underlying physiology, and the reliability and physiological meaning of the signals.

  17. Imaging speech production using fMRI.

    Science.gov (United States)

    Gracco, Vincent L; Tremblay, Pascale; Pike, Bruce

    2005-05-15

    Human speech is a well-learned, sensorimotor, and ecological behavior ideal for the study of neural processes and brain-behavior relations. With the advent of modern neuroimaging techniques such as positron emission tomography (PET) and functional magnetic resonance imaging (fMRI), the potential for investigating neural mechanisms of speech motor control, speech motor disorders, and speech motor development has increased. However, a practical issue has limited the application of fMRI to issues in spoken language production and other related behaviors (singing, swallowing). Producing these behaviors during volume acquisition introduces motion-induced signal changes that confound the activation signals of interest. A number of approaches, ranging from signal processing to using silent or covert speech, have attempted to remove or prevent the effects of motion-induced artefact. However, these approaches are flawed for a variety of reasons. An alternative approach, that has only recently been applied to study single-word production, uses pauses in volume acquisition during the production of natural speech motion. Here we present some representative data illustrating the problems associated with motion artefacts, and some qualitative results acquired from subjects producing short sentences and orofacial nonspeech movements in the scanner. Using pauses or silent intervals in volume acquisition and block designs, individual subjects show robust activation without motion-induced signal artefact. This approach is an efficient method for studying the neural basis of spoken language production and the effects of speech and language disorders using fMRI.

  18. Channel modeling, signal processing and coding for perpendicular magnetic recording

    Science.gov (United States)

    Wu, Zheng

    With the increasing areal density in magnetic recording systems, perpendicular recording has replaced longitudinal recording to overcome the superparamagnetic limit. Studies on perpendicular recording channels including aspects of channel modeling, signal processing and coding techniques are presented in this dissertation. To optimize a high density perpendicular magnetic recording system, one needs to know the tradeoffs between various components of the system including the read/write transducers, the magnetic medium, and the read channel. We extend the work by Chaichanavong on the parameter optimization for systems via design curves. Different signal processing and coding techniques are studied. Information-theoretic tools are utilized to determine the acceptable region for the channel parameters when optimal detection and linear coding techniques are used. Our results show that a considerable gain can be achieved by the optimal detection and coding techniques. The read-write process in perpendicular magnetic recording channels includes a number of nonlinear effects. Nonlinear transition shift (NLTS) is one of them. The signal distortion induced by NLTS can be reduced by write precompensation during data recording. We numerically evaluate the effect of NLTS on the read-back signal and examine the effectiveness of several write precompensation schemes in combating NLTS in a channel characterized by both transition jitter noise and additive white Gaussian electronics noise. We also present an analytical method to estimate the bit-error-rate and use it to help determine the optimal write precompensation values in multi-level precompensation schemes. We propose a mean-adjusted pattern-dependent noise predictive (PDNP) detection algorithm for use on the channel with NLTS. We show that this detector can offer significant improvements in bit-error-rate (BER) compared to conventional Viterbi and PDNP detectors. Moreover, the system performance can be further improved by

  19. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities... amends telecommunications relay services (TRS) mandatory minimum standards applicable to Speech- to...

  20. Underwater Signal Modeling for Subsurface Classification Using Computational Intelligence.

    Science.gov (United States)

    Setayeshi, Saeed

    In this thesis, a method for modeling underwater layered media (UWLM) is proposed, and a simple nonlinear structure implementing the model is designed, based on the behaviour of its characteristics and on the propagation of acoustic signals in the media, accounting for attenuation effects. The model's response to acoustic input is used to test the ability of artificial intelligence classifiers. Neural network models, the basic principles of the back-propagation algorithm, and the Hopfield model of associative memories are reviewed; they are employed, using min-max amplitude ranges of reflected UWLM signals shaped by attenuation effects, to define classes of the synthetic data, detect peak features, and estimate parameters of the media. It has been found that there is a correlation between the number of layers in the media and the optimum number of nodes in the hidden layer of the neural networks. Integrating the results of the neural networks that classify and detect UWLM acoustic signals based on attenuation effects, to establish the correspondence between peak points and decay values, provides a powerful tool for UWLM identification. The methods appear to have applications in replacing the original system, and in parameter estimation and output prediction for system identification by the proposed networks. The results of computerized simulation of UWLM modeling in conjunction with the proposed neural network training process are given. Fuzzy sets, an idea that allows inexact concepts to be represented and manipulated, the fuzzy min-max pattern classification method, and the learning and recall algorithms for implementing fuzzy neural networks are also explained in this thesis. A fuzzy neural network that uses peak amplitude ranges to define classes is proposed and evaluated for UWLM pattern recognition. It is demonstrated to be able to classify the layered media data sets, and can distinguish between the peak points

  1. Source Separation via Spectral Masking for Speech Recognition Systems

    Directory of Open Access Journals (Sweden)

    Gustavo Fernandes Rodrigues

    2012-12-01

    Full Text Available In this paper we present an insight into the use of spectral masking techniques in the time-frequency domain as a preprocessing step for speech signal recognition. Speech recognition systems have their performance negatively affected in noisy environments or in the presence of other speech signals. The limits of these masking techniques for different levels of the signal-to-noise ratio are discussed. We show the robustness of the spectral masking techniques against four types of noise: white, pink, brown, and human speech noise (babble noise). The main contribution of this work is to analyze the performance limits of recognition systems using spectral masking. We obtain an increase of 18% in the speech hit rate when the speech signals were corrupted by other speech signals or babble noise, at signal-to-noise ratios of approximately 1, 10, and 20 dB. On the other hand, applying the ideal binary masks to mixtures corrupted by white, pink, and brown noise results in an average gain of 9% in the speech hit rate, at the same signal-to-noise ratios. The experimental results suggest that the spectral masking techniques are more suitable for the case of babble noise, which is produced by human speech, than for white, pink, and brown noise.
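
    An ideal binary mask of the kind used in these experiments admits a compact sketch (Python/SciPy; the 0 dB local criterion and parameter names are illustrative, and note that the ideal mask requires access to the premixed speech and noise signals):

        import numpy as np
        from scipy.signal import stft, istft

        def ideal_binary_mask(speech, noise, fs, lc_db=0.0, nperseg=512):
            # Keep T-F cells where the local speech-to-noise ratio
            # exceeds the criterion lc_db; zero out the rest.
            _, _, S = stft(speech, fs, nperseg=nperseg)
            _, _, N = stft(noise, fs, nperseg=nperseg)
            snr_db = (20 * np.log10(np.abs(S) + 1e-12)
                      - 20 * np.log10(np.abs(N) + 1e-12))
            mask = (snr_db > lc_db).astype(float)
            # By linearity, S + N is the STFT of the mixture.
            _, masked = istft((S + N) * mask, fs, nperseg=nperseg)
            return masked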

  2. Expressive facial animation synthesis by learning speech coarticulation and expression spaces.

    Science.gov (United States)

    Deng, Zhigang; Neumann, Ulrich; Lewis, J P; Kim, Tae-Yong; Bulut, Murtaza; Narayanan, Shrikanth

    2006-01-01

    Synthesizing expressive facial animation is a very challenging topic within the graphics community. In this paper, we present an expressive facial animation synthesis system enabled by automated learning from facial motion capture data. Accurate 3D motions of the markers on the face of a human subject are captured while he/she recites a predesigned corpus, with specific spoken and visual expressions. We present a novel motion capture mining technique that "learns" speech coarticulation models for diphones and triphones from the recorded data. A Phoneme-Independent Expression Eigenspace (PIEES) that encloses the dynamic expression signals is constructed by motion signal processing (phoneme-based time-warping and subtraction) and Principal Component Analysis (PCA) reduction. New expressive facial animations are synthesized as follows: First, the learned coarticulation models are concatenated to synthesize neutral visual speech according to novel speech input, then a texture-synthesis-based approach is used to generate a novel dynamic expression signal from the PIEES model, and finally the synthesized expression signal is blended with the synthesized neutral visual speech to create the final expressive facial animation. Our experiments demonstrate that the system can effectively synthesize realistic expressive facial animation.
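
    The eigenspace construction can be pictured with a small PCA sketch (Python/scikit-learn; the array shapes and dimensionality below are invented for illustration, standing in for stacked 3D marker coordinates after the phoneme-based time-warping and subtraction the abstract describes):

        import numpy as np
        from sklearn.decomposition import PCA

        # Hypothetical data: rows are time frames, columns are stacked
        # 3D marker coordinates of the residual expression signal.
        rng = np.random.default_rng(0)
        expression_signals = rng.standard_normal((1000, 90))

        pca = PCA(n_components=10)   # low-dimensional expression eigenspace
        codes = pca.fit_transform(expression_signals)
        reconstructed = pca.inverse_transform(codes)

    New expression signals would then be synthesized in the low-dimensional code space and mapped back through the inverse transform before blending with the neutral visual speech.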

  3. Speech Intelligibility Evaluation for Mobile Phones

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Cubick, Jens; Dau, Torsten

    2015-01-01

    In the development process of modern telecommunication systems, such as mobile phones, it is common practice to use computer models to objectively evaluate the transmission quality of the system, instead of time-consuming perceptual listening tests. Such models have typically focused on the quality...... of the transmitted speech, while little or no attention has been provided to speech intelligibility. The present study investigated to what extent three state-of-the art speech intelligibility models could predict the intelligibility of noisy speech transmitted through mobile phones. Sentences from the Danish...... Dantale II speech material were mixed with three different kinds of background noise, transmitted through three different mobile phones, and recorded at the receiver via a local network simulator. The speech intelligibility of the transmitted sentences was assessed by six normal-hearing listeners...

  4. Improved Open-Microphone Speech Recognition

    Science.gov (United States)

    Abrash, Victor

    2002-12-01

    dialog manager extra flexibility to recognize the signal with no audio gaps between recognition requests, as well as to rerecognize portions of the signal, or to rerecognize speech with different grammars, acoustic models, recognizers, start times, and so on. SRI expects that this new open-mic functionality will enable NASA to develop better error-correction mechanisms for spoken dialog systems, and may also enable new interaction strategies.

  5. Human neuromagnetic steady-state responses to amplitude-modulated tones, speech, and music.

    Science.gov (United States)

    Lamminmäki, Satu; Parkkonen, Lauri; Hari, Riitta

    2014-01-01

    Auditory steady-state responses that can be elicited by various periodic sounds inform about subcortical and early cortical auditory processing. Steady-state responses to amplitude-modulated pure tones have been used to scrutinize binaural interaction by frequency-tagging the two ears' inputs at different frequencies. Unlike pure tones, speech and music are physically very complex, as they include many frequency components, pauses, and large temporal variations. To examine the utility of magnetoencephalographic (MEG) steady-state fields (SSFs) in the study of early cortical processing of complex natural sounds, the authors tested the extent to which amplitude-modulated speech and music can elicit reliable SSFs. MEG responses were recorded to 90-s-long binaural tones, speech, and music, amplitude-modulated at 41.1 Hz at four different depths (25, 50, 75, and 100%). The subjects were 11 healthy, normal-hearing adults. MEG signals were averaged in phase with the modulation frequency, and the sources of the resulting SSFs were modeled by current dipoles. After the MEG recording, intelligibility of the speech, musical quality of the music stimuli, naturalness of music and speech stimuli, and the perceived deterioration caused by the modulation were evaluated on visual analog scales. The perceived quality of the stimuli decreased as a function of increasing modulation depth, more strongly for music than speech; yet, all subjects considered the speech intelligible even at the 100% modulation. SSFs were the strongest to tones and the weakest to speech stimuli; the amplitudes increased with increasing modulation depth for all stimuli. SSFs to tones were reliably detectable at all modulation depths (in all subjects in the right hemisphere, in 9 subjects in the left hemisphere) and to music stimuli at 50 to 100% depths, whereas speech usually elicited clear SSFs only at 100% depth. The hemispheric balance of SSFs was toward the right hemisphere for tones and speech, whereas
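
    Extracting a steady-state response at the tagging frequency amounts to projecting the recording onto that frequency, as in the single-channel sketch below (Python/NumPy; this toy version of my own ignores the epoch averaging and multichannel dipole modeling used in the actual study):

        import numpy as np

        def tagged_amplitude(x, fs, f_tag=41.1):
            # Amplitude and phase of the response at the tag frequency:
            # the single-bin Fourier coefficient of the signal.
            n = len(x)
            t = np.arange(n) / fs
            c = np.sum(x * np.exp(-2j * np.pi * f_tag * t)) * 2 / n
            return np.abs(c), np.angle(c)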

  6. Toddlers' recognition of noise-vocoded speech.

    Science.gov (United States)

    Newman, Rochelle; Chatterjee, Monita

    2013-01-01

    Despite their remarkable clinical success, cochlear-implant listeners today still receive spectrally degraded information. Much research has examined normally hearing adult listeners' ability to interpret spectrally degraded signals, primarily using noise-vocoded speech to simulate cochlear implant processing. Far less research has explored infants' and toddlers' ability to interpret spectrally degraded signals, despite the fact that children in this age range are frequently implanted. This study examines 27-month-old typically developing toddlers' recognition of noise-vocoded speech in a language-guided looking study. Children saw two images on each trial and heard a voice instructing them to look at one item ("Find the cat!"). Full-spectrum sentences or their noise-vocoded versions were presented with varying numbers of spectral channels. Toddlers showed equivalent proportions of looking to the target object with full-speech and 24- or 8-channel noise-vocoded speech; they failed to look appropriately with 2-channel noise-vocoded speech and showed variable performance with 4-channel noise-vocoded speech. Despite accurate looking performance for speech with at least eight channels, children were slower to respond appropriately as the number of channels decreased. These results indicate that 2-yr-olds have developed the ability to interpret vocoded speech, even without practice, but that doing so requires additional processing. These findings have important implications for pediatric cochlear implantation.

  7. Speech perception as an active cognitive process

    Directory of Open Access Journals (Sweden)

    Shannon Heald

    2014-03-01

    Full Text Available One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided, but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, whether through masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions, including descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine the cognitive resources recruited during perception, including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated, either through augmentation or

  8. Speech Entrainment Compensates for Broca's Area Damage

    Science.gov (United States)

    Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris

    2015-01-01

    Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response to SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed that damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443

  9. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    Science.gov (United States)

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess the speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signals for a given word and distinct speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate of an individual dysarthric speaker before the exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
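
    The abstract does not give the formula for Ψ, but the consistency/distinction idea can be pictured with a purely hypothetical distance-based sketch (Python/NumPy; the ratio below is my own stand-in, not the paper's definition):

        import numpy as np

        def clarity_sketch(features_by_word):
            # features_by_word: dict mapping each word to an
            # (n_utterances, n_features) array of per-utterance features.
            # Consistency: average within-word spread around the centroid.
            # Distinction: average distance between word centroids.
            centroids = {w: f.mean(axis=0) for w, f in features_by_word.items()}
            within = np.mean([np.linalg.norm(f - centroids[w], axis=1).mean()
                              for w, f in features_by_word.items()])
            words = list(centroids)
            between = np.mean([np.linalg.norm(centroids[a] - centroids[b])
                               for i, a in enumerate(words)
                               for b in words[i + 1:]])
            return between / (within + 1e-12)   # higher = clearer speech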

  10. Analysis and Generation of Logical Signals for Discrete Events Behavioral Modeling

    OpenAIRE

    Campos-Rebelo, Rogério; Costa, Anikó; Gomes, Luis

    2015-01-01

    This paper presents a proposal for structuring logical signals for discrete events behavioral modeling. Graphical formalisms will be used to illustrate applicability of the proposed techniques. Input logical signals are generated based on the analysis of physical signals coming from the environment and other input logical signals. Their analysis is introduced in the flow of the input signal analysis, defined as pre-processing when using a mode...

  11. Voice Activity Detection for Speech Enhancement Applications

    Directory of Open Access Journals (Sweden)

    E. Verteletskaya

    2010-01-01

    Full Text Available This paper describes a study of noise-robust voice activity detection (VAD) utilizing the periodicity of the signal, the full-band signal energy, and the high-band to low-band signal energy ratio. Conventional VADs are sensitive to variably noisy environments, especially at low SNR, and also tend to cut off unvoiced regions of speech and to produce randomly oscillating output decisions. To overcome these problems, the proposed algorithm first identifies voiced regions of speech and then differentiates unvoiced regions from silence or background noise using the energy ratio and total signal energy. The performance of the proposed VAD algorithm is tested on real speech signals. Comparisons confirm that the proposed VAD algorithm outperforms the conventional VAD algorithms, especially in the presence of background noise.
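
    A toy frame-wise detector in this spirit can be sketched as follows (Python/NumPy; the thresholds are illustrative, and the paper's high-band to low-band energy ratio test is omitted for brevity):

        import numpy as np

        def simple_vad(x, fs, frame_ms=25, e_thresh=1e-3, p_thresh=0.4):
            # Flag voiced frames by the normalized autocorrelation peak
            # in the pitch range; admit remaining frames as unvoiced
            # speech if their full-band energy is high.
            n = int(fs * frame_ms / 1000)
            lag_lo, lag_hi = int(fs / 400), int(fs / 70)   # 70-400 Hz pitch
            out = []
            for i in range(0, len(x) - n + 1, n):
                frame = x[i:i + n] - x[i:i + n].mean()
                energy = float(np.mean(frame ** 2))
                ac = np.correlate(frame, frame, mode='full')[n - 1:]
                period = ac[lag_lo:lag_hi].max() / ac[0] if ac[0] > 0 else 0.0
                out.append(period > p_thresh or energy > e_thresh)
            return np.array(out)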

  12. Automatic transcription of continuous speech into syllable-like units ...

    Indian Academy of Sciences (India)

    The focus of this paper is to automatically segment and label continuous speech signals into syllable-like units for Indian languages. In this approach, the continuous speech signal is first automatically segmented into syllable-like units using a group delay based algorithm. Similar syllable segments are then grouped.

  13. Numerical modelling of the pump-to-signal relative intensity noise ...

    Indian Academy of Sciences (India)

    ... pump depletion as ... is considered as detrimental in communication systems as it deteriorates the signal wave and affects the device ... In this paper, a comprehensive numerical model for investigating the pump-to-signal RIN transfer in 2-P ...

  14. Interdependent processing and encoding of speech and concurrent background noise.

    Science.gov (United States)

    Cooper, Angela; Brouwer, Susanne; Bradlow, Ann R

    2015-05-01

    Speech processing can often take place in adverse listening conditions that involve the mixing of speech and background noise. In this study, we investigated processing dependencies between background noise and indexical speech features, using a speeded classification paradigm (Garner, 1974; Exp. 1), and whether background noise is encoded and represented in memory for spoken words in a continuous recognition memory paradigm (Exp. 2). Whether or not the noise spectrally overlapped with the speech signal was also manipulated. The results of Experiment 1 indicated that background noise and indexical features of speech (gender, talker identity) cannot be completely segregated during processing, even when the two auditory streams are spectrally nonoverlapping. Perceptual interference was asymmetric, whereby irrelevant indexical feature variation in the speech signal slowed noise classification to a greater extent than irrelevant noise variation slowed speech classification. This asymmetry may stem from the fact that speech features have greater functional relevance to listeners, and are thus more difficult to selectively ignore than background noise. Experiment 2 revealed that a recognition cost for words embedded in different types of background noise on the first and second occurrences only emerged when the noise and the speech signal were spectrally overlapping. Together, these data suggest integral processing of speech and background noise, modulated by the level of processing and the spectral separation of the speech and noise.

  15. Recent advances in nonlinear speech processing

    CERN Document Server

    Faundez-Zanuy, Marcos; Esposito, Antonietta; Cordasco, Gennaro; Drugman, Thomas; Solé-Casals, Jordi; Morabito, Francesco

    2016-01-01

    This book presents recent advances in nonlinear speech processing that go beyond nonlinear techniques alone. It shows how heuristic and psychological models of human interaction can be exploited to succeed in implementations of socially believable VUIs and applications for human health and psychological support. The book takes into account the multifunctional role of speech and what is "outside of the box" (see Björn Schuller's foreword). To this aim, the book is organized in 6 sections, each collecting a small number of short chapters reporting advances "inside" and "outside" themes related to nonlinear speech research. The themes emphasize theoretical and practical issues for modelling socially believable speech interfaces, ranging from efforts to capture the nature of sound changes in linguistic contexts and the timing nature of speech, to labors to identify and detect speech features that help in the diagnosis of psychological and neuronal disease, to attempts to improve the effectiveness and performa...

  16. Sensorimotor Representation of Speech Perception. Cross-Decoding of Place of Articulation Features during Selective Attention to Syllables in 7T fMRI

    NARCIS (Netherlands)

    Archila-Meléndez, Mario E.; Valente, Giancarlo; Correia, Joao M.; Rouhl, Rob P. W.; van Kranen-Mastenbroek, Vivianne H.; Jansma, Bernadette M.

    2018-01-01

    Sensorimotor integration, the translation between acoustic signals and motoric programs, may constitute a crucial mechanism for speech. During speech perception, the acoustic-motoric translations include the recruitment of cortical areas for the representation of speech articulatory features, such

  17. Modeling auditory processing and speech perception in hearing-impaired listeners

    DEFF Research Database (Denmark)

    Jepsen, Morten Løve

    in the inner ear, or cochlea. The model was shown to account for various aspects of spectro-temporal processing and perception in tasks of intensity discrimination, tone-in-noise detection, forward masking, spectral masking and amplitude modulation detection. Secondly, a series of experiments was performed... -output functions, frequency selectivity, intensity discrimination limens, and effects of simultaneous- and forward masking. Part of the measured data was used to adjust the parameters of the stages in the model that simulate cochlear processing. The remaining data were used to evaluate the fitted models... It was shown that most observations in the measured consonant discrimination error patterns were predicted by the model, although error rates were systematically underestimated by the model for a few particular acoustic-phonetic features. These results reflect a relation between basic auditory processing deficits

  18. Studying overt word reading and speech production with event-related fMRI: a method for detecting, assessing, and correcting articulation-induced signal changes and for measuring onset time and duration of articulation.

    Science.gov (United States)

    Huang, Jie; Francis, Andrea P; Carr, Thomas H

    2008-01-01

    A quantitative method is introduced for detecting and correcting artifactual signal changes in BOLD time series data arising from the magnetic field warping caused by motion of the articulatory apparatus when speaking aloud, with extensions to detection of subvocal articulatory activity during silent reading. Whole-head images allow the large, spike-like signal changes from the moving tongue and other components of the articulatory apparatus to be detected and localized in time, providing a measure of the time of vocalization onset, the vocalization duration, and also an estimate of the magnitude and shape of the signal change resulting from motion. Data from brain voxels are then examined during the vocalization period, and statistical outliers corresponding to contamination from articulatory motion are removed and replaced by linear interpolation from adjacent, uncontaminated data points. This quantitative approach to cleansing brain time series data of articulatory-motion-induced artifact is combined with a pre-scanning training regimen that reduces gross head movement during reading aloud to the levels observed during reading silently, which can be corrected with available image registration techniques. The combination of quantitative analysis of articulatory motion artifacts and pre-scanning training makes possible a much wider range of tasks involving overt speech than are currently being used in fMRI studies of language and cognition, as well as characterization of subvocal movements of the articulatory apparatus that are relevant to theories of reading skill, verbal rehearsal in working memory, and problem solving.
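
    The detect-and-interpolate step lends itself to a compact sketch (Python/NumPy; the z-score criterion and function arguments are simplifications of mine, not the authors' exact outlier test):

        import numpy as np

        def despike_timeseries(ts, vocal_on, vocal_off, z_thresh=3.0):
            # Replace articulation-contaminated outliers in a voxel time
            # series by linear interpolation, restricted to the
            # vocalization window [vocal_on, vocal_off) of volume indices.
            ts = ts.astype(float).copy()
            seg = ts[vocal_on:vocal_off]
            z = (seg - seg.mean()) / (seg.std() + 1e-12)
            bad = np.flatnonzero(np.abs(z) > z_thresh) + vocal_on
            good = np.setdiff1d(np.arange(len(ts)), bad)
            ts[bad] = np.interp(bad, good, ts[good])
            return ts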

  19. Non-fluent speech following stroke is caused by impaired efference copy.

    Science.gov (United States)

    Feenaughty, Lynda; Basilakos, Alexandra; Bonilha, Leonardo; den Ouden, Dirk-Bart; Rorden, Chris; Stark, Brielle; Fridriksson, Julius

    2017-09-01

    Efference copy is a cognitive mechanism argued to be critical for initiating and monitoring speech; however, the extent to which breakdown of efference copy mechanisms impacts speech production is unclear. This study examined the best mechanistic predictors of non-fluent speech among 88 stroke survivors. Objective speech fluency measures were subjected to a principal component analysis (PCA). The primary PCA factor was then entered into a multiple stepwise linear regression analysis as the dependent variable, with a set of independent mechanistic variables. Participants' ability to mimic audio-visual speech ("speech entrainment response") was the best independent predictor of non-fluent speech. We suggest that this "speech entrainment" factor reflects the integrity of internal monitoring (i.e., efference copy) of speech production, which affects speech initiation and maintenance. Results support models of normal speech production and suggest that therapy focused on speech initiation and maintenance may improve speech fluency for individuals with chronic non-fluent aphasia post stroke.

  20. Measures of metacognition on signal-detection theoretic models.

    Science.gov (United States)

    Barrett, Adam B; Dienes, Zoltan; Seth, Anil K

    2013-12-01

    Analyzing metacognition, specifically knowledge of the accuracy of internal perceptual, memorial, or other knowledge states, is vital for many strands of psychology, including determining the accuracy of feelings of knowing and discriminating conscious from unconscious cognition. Quantifying metacognitive sensitivity is, however, more challenging than quantifying basic stimulus sensitivity. Under popular signal-detection theory (SDT) models for stimulus classification tasks, approaches based on Type II receiver-operating characteristic (ROC) curves or Type II d-prime risk confounding metacognition with response biases in either the Type I (classification) or Type II (metacognitive) tasks. A new approach introduces meta-d': the Type I d-prime that would have led to the observed Type II data had the subject used all the Type I information. Here, we (a) further establish the inconsistency of the Type II d-prime and ROC approaches with new explicit analyses of the standard SDT model and (b) analyze, for the first time, the behavior of meta-d' under nontrivial scenarios, such as when metacognitive judgments utilize enhanced or degraded versions of the Type I evidence. Analytically, meta-d' values typically reflect the underlying model well and are stable under changes in decision criteria; however, in relatively extreme cases, meta-d' can become unstable. We explore the bias and variance of in-sample measurements of meta-d' and supply MATLAB code for estimation in general cases. Our results support meta-d' as a useful measure of metacognition and provide rigorous methodology for its application. Our recommendations are useful for any researchers interested in assessing metacognitive accuracy.
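
    For the underlying Type I quantities, a minimal sketch (Python/SciPy; the rates below are invented, and real analyses clamp hit and false-alarm rates away from 0 and 1) contrasts the confound-prone Type II d-prime with the idea behind meta-d':

        import numpy as np
        from scipy.stats import norm

        def d_prime(hit_rate, fa_rate):
            # Type I sensitivity under the equal-variance SDT model.
            return norm.ppf(hit_rate) - norm.ppf(fa_rate)

        # Hypothetical Type II rates: "hits" = high confidence when
        # correct, "false alarms" = high confidence when incorrect.
        type2_hit, type2_fa = 0.70, 0.35
        naive_type2_dprime = d_prime(type2_hit, type2_fa)  # confound-prone
        # meta-d' instead searches for the Type I d-prime that, passed
        # through the SDT model and the observer's confidence criteria,
        # would reproduce the observed Type II data; see the authors'
        # supplied code for the full estimation procedure.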