WorldWideScience

Sample records for modeling speech signals

  1. Enhancement of speech signals - with a focus on voiced speech models

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie

    This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based...... on this model. The basic model used in this thesis is the harmonic model which is a commonly used model for describing the voiced part of the speech signal. We show that it can be beneficial to extend the model to take inharmonicities or the non-stationarity of speech into account. Extending the model...

  2. A speech production model including the nasal Cavity: A novel approach to articulatory analysis of speech signals

    DEFF Research Database (Denmark)

    Olesen, Morten

    In order to obtain articulatory analysis of speech production the model is improved. the standard model, as used in LPC analysis, to a large extent only models the acoustic properties of speech signal as opposed to articulatory modelling of the speech production. In spite of this the LPC model...... is by far the most widely used model in speech technology....

  3. Mathematical modeling and signal processing in speech and hearing sciences

    CERN Document Server

    Xin, Jack

    2014-01-01

    The aim of the book is to give an accessible introduction of mathematical models and signal processing methods in speech and hearing sciences for senior undergraduate and beginning graduate students with basic knowledge of linear algebra, differential equations, numerical analysis, and probability. Speech and hearing sciences are fundamental to numerous technological advances of the digital world in the past decade, from music compression in MP3 to digital hearing aids, from network based voice enabled services to speech interaction with mobile phones. Mathematics and computation are intimately related to these leaps and bounds. On the other hand, speech and hearing are strongly interdisciplinary areas where dissimilar scientific and engineering publications and approaches often coexist and make it difficult for newcomers to enter.

  4. Phonetic perspectives on modelling information in the speech signal

    Indian Academy of Sciences (India)

    Centre for Music and Science, Faculty of Music, University of Cambridge,. Cambridge .... However, to develop systems that can han- .... 1.2a Phonemes are not clearly identifiable in movement or in the acoustic speech signal: As ..... while the speaker role-played the part of a mother at a child's athletics meeting where the.

  5. Modeling speech intelligibility based on the signal-to-noise envelope power ratio

    DEFF Research Database (Denmark)

    Jørgensen, Søren

    of modulation frequency selectivity in the auditory processing of sound with a decision metric for intelligibility that is based on the signal-to-noise envelope power ratio (SNRenv). The proposed speech-based envelope power spectrum model (sEPSM) is demonstrated to account for the effects of stationary...... through three commercially available mobile phones. The model successfully accounts for the performance across the phones in conditions with a stationary speech-shaped background noise, whereas deviations were observed in conditions with “Traffic” and “Pub” noise. Overall, the results of this thesis...

  6. Model-Based Speech Signal Coding Using Optimized Temporal Decomposition for Storage and Broadcasting Applications

    Science.gov (United States)

    Athaudage, Chandranath R. N.; Bradley, Alan B.; Lech, Margaret

    2003-12-01

    A dynamic programming-based optimization strategy for a temporal decomposition (TD) model of speech and its application to low-rate speech coding in storage and broadcasting is presented. In previous work with the spectral stability-based event localizing (SBEL) TD algorithm, the event localization was performed based on a spectral stability criterion. Although this approach gave reasonably good results, there was no assurance on the optimality of the event locations. In the present work, we have optimized the event localizing task using a dynamic programming-based optimization strategy. Simulation results show that an improved TD model accuracy can be achieved. A methodology of incorporating the optimized TD algorithm within the standard MELP speech coder for the efficient compression of speech spectral information is also presented. The performance evaluation results revealed that the proposed speech coding scheme achieves 50%-60% compression of speech spectral information with negligible degradation in the decoded speech quality.

  7. Robust digital processing of speech signals

    CERN Document Server

    Kovacevic, Branko; Veinović, Mladen; Marković, Milan

    2017-01-01

    This book focuses on speech signal phenomena, presenting a robustification of the usual speech generation models with regard to the presumed types of excitation signals, which is equivalent to the introduction of a class of nonlinear models and the corresponding criterion functions for parameter estimation. Compared to the general class of nonlinear models, such as various neural networks, these models possess good properties of controlled complexity, the option of working in “online” mode, as well as a low information volume for efficient speech encoding and transmission. Providing comprehensive insights, the book is based on the authors’ research, which has already been published, supplemented by additional texts discussing general considerations of speech modeling, linear predictive analysis and robust parameter estimation.

  8. Automatic Smoker Detection from Telephone Speech Signals

    DEFF Research Database (Denmark)

    Poorjam, Amir Hossein; Hesaraki, Soheila; Safavi, Saeid

    2017-01-01

    This paper proposes an automatic smoking habit detection from spontaneous telephone speech signals. In this method, each utterance is modeled using i-vector and non-negative factor analysis (NFA) frameworks, which yield low-dimensional representation of utterances by applying factor analysis...... method is evaluated on telephone speech signals of speakers whose smoking habits are known drawn from the National Institute of Standards and Technology (NIST) 2008 and 2010 Speaker Recognition Evaluation databases. Experimental results over 1194 utterances show the effectiveness of the proposed approach...... for the automatic smoking habit detection task....

  9. Negative blood oxygen level dependent signals during speech comprehension.

    Science.gov (United States)

    Rodriguez Moreno, Diana; Schiff, Nicholas D; Hirsch, Joy

    2015-05-01

    Speech comprehension studies have generally focused on the isolation and function of regions with positive blood oxygen level dependent (BOLD) signals with respect to a resting baseline. Although regions with negative BOLD signals in comparison to a resting baseline have been reported in language-related tasks, their relationship to regions of positive signals is not fully appreciated. Based on the emerging notion that the negative signals may represent an active function in language tasks, the authors test the hypothesis that negative BOLD signals during receptive language are more associated with comprehension than content-free versions of the same stimuli. Regions associated with comprehension of speech were isolated by comparing responses to passive listening to natural speech to two incomprehensible versions of the same speech: one that was digitally time reversed and one that was muffled by removal of high frequencies. The signal polarity was determined by comparing the BOLD signal during each speech condition to the BOLD signal during a resting baseline. As expected, stimulation-induced positive signals relative to resting baseline were observed in the canonical language areas with varying signal amplitudes for each condition. Negative BOLD responses relative to resting baseline were observed primarily in frontoparietal regions and were specific to the natural speech condition. However, the BOLD signal remained indistinguishable from baseline for the unintelligible speech conditions. Variations in connectivity between brain regions with positive and negative signals were also specifically related to the comprehension of natural speech. These observations of anticorrelated signals related to speech comprehension are consistent with emerging models of cooperative roles represented by BOLD signals of opposite polarity.

  10. A speech production model including the nasal Cavity

    DEFF Research Database (Denmark)

    Olesen, Morten

    In order to obtain articulatory analysis of speech production the model is improved. the standard model, as used in LPC analysis, to a large extent only models the acoustic properties of speech signal as opposed to articulatory modelling of the speech production. In spite of this the LPC model...... is by far the most widely used model in speech technology....

  11. Acquirement and enhancement of remote speech signals

    Science.gov (United States)

    Lü, Tao; Guo, Jin; Zhang, He-yong; Yan, Chun-hui; Wang, Can-jin

    2017-07-01

    To address the challenges of non-cooperative and remote acoustic detection, an all-fiber laser Doppler vibrometer (LDV) is established. The all-fiber LDV system can offer the advantages of smaller size, lightweight design and robust structure, hence it is a better fit for remote speech detection. In order to improve the performance and the efficiency of LDV for long-range hearing, the speech enhancement technology based on optimally modified log-spectral amplitude (OM-LSA) algorithm is used. The experimental results show that the comprehensible speech signals within the range of 150 m can be obtained by the proposed LDV. The signal-to-noise ratio ( SNR) and mean opinion score ( MOS) of the LDV speech signal can be increased by 100% and 27%, respectively, by using the speech enhancement technology. This all-fiber LDV, which combines the speech enhancement technology, can meet the practical demand in engineering.

  12. Effects of human fatigue on speech signals

    Science.gov (United States)

    Stamoulis, Catherine

    2004-05-01

    Cognitive performance may be significantly affected by fatigue. In the case of critical personnel, such as pilots, monitoring human fatigue is essential to ensure safety and success of a given operation. One of the modalities that may be used for this purpose is speech, which is sensitive to respiratory changes and increased muscle tension of vocal cords, induced by fatigue. Age, gender, vocal tract length, physical and emotional state may significantly alter speech intensity, duration, rhythm, and spectral characteristics. In addition to changes in speech rhythm, fatigue may also affect the quality of speech, such as articulation. In a noisy environment, detecting fatigue-related changes in speech signals, particularly subtle changes at the onset of fatigue, may be difficult. Therefore, in a performance-monitoring system, speech parameters which are significantly affected by fatigue need to be identified and extracted from input signals. For this purpose, a series of experiments was performed under slowly varying cognitive load conditions and at different times of the day. The results of the data analysis are presented here.

  13. Hidden Markov models in automatic speech recognition

    Science.gov (United States)

    Wrzoskowicz, Adam

    1993-11-01

    This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMM's designed to operate in a noisy environment. The author also describes a language for human-robot communications which was defined as a complex multilevel network from an HMM model of speech sounds geared towards Polish inflections. The author also added mandatory lexical and syntactic rules to the system for its communications vocabulary.

  14. A Model for the representation of Speech Signals in Normal and Impaired Ears

    DEFF Research Database (Denmark)

    Christiansen, Thomas Ulrich

    2004-01-01

    hearing was modelled as a combination of outer- and inner hair cell loss. The percentage of dead inner hair cells was calculated based on a new computational method relating auditory nerve fibre thresholds to behavioural thresholds. Finally, a model of the entire auditory nerve fibre population......A model of human auditory periphery, ranging from the outer ear to the auditory nerve, was developed. The model consists of the following components: outer ear transfer function, middle ear transfer function, basilar membrane velocity, inner hair cell receptor potential, inner hair cell probability...... of neurotransmitter release and auditory nerve fibre refractoriness. The model builds on previously published models, however, parameters for basilar membrane velocity and inner hair cell probability of neurotransmitter release were successfully fitted to model data from psychophysical and physiological data...

  15. Modeling speech intelligibility in adverse conditions

    DEFF Research Database (Denmark)

    Dau, Torsten

    2012-01-01

    ) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting...... understanding speech when more than one person is talking, even when reduced audibility has been fully compensated for by a hearing aid. The reasons for these difficulties are not well understood. This presentation highlights recent concepts of the monaural and binaural signal processing strategies employed...... by the normal as well as impaired auditory system. Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII...

  16. Modelling speech intelligibility in adverse conditions

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    Jørgensen and Dau (J Acoust Soc Am 130:1475-1487, 2011) proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech...... subjected to phase jitter, a condition in which the spectral structure of the intelligibility of speech signal is strongly affected, while the broadband temporal envelope is kept largely intact. In contrast, the effects of this distortion can be predicted -successfully by the spectro-temporal modulation...... suggest that the SNRenv might reflect a powerful decision metric, while some explicit across-frequency analysis seems crucial in some conditions. How such across-frequency analysis is "realized" in the auditory system remains unresolved....

  17. The speech signal segmentation algorithm using pitch synchronous analysis

    Directory of Open Access Journals (Sweden)

    Amirgaliyev Yedilkhan

    2017-03-01

    Full Text Available Parameterization of the speech signal using the algorithms of analysis synchronized with the pitch frequency is discussed. Speech parameterization is performed by the average number of zero transitions function and the signal energy function. Parameterization results are used to segment the speech signal and to isolate the segments with stable spectral characteristics. Segmentation results can be used to generate a digital voice pattern of a person or be applied in the automatic speech recognition. Stages needed for continuous speech segmentation are described.

  18. Automatic Speech Recognition from Neural Signals: A Focused Review

    Directory of Open Access Journals (Sweden)

    Christian Herff

    2016-09-01

    Full Text Available Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to either loud environments, bothering bystanders or incapabilities to produce speech (i.e.~patients suffering from locked-in syndrome. For these reasons it would be highly desirable to not speak but to simply envision oneself to say words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people.This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefor better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography. As a first example of Automatic Speech Recognition techniques used from neural signals, we discuss the emph{Brain-to-text} system.

  19. Modeling Speech Intelligibility in Hearing Impaired Listeners

    DEFF Research Database (Denmark)

    Scheidiger, Christoph; Jørgensen, Søren; Dau, Torsten

    2014-01-01

    speech, e.g. phase jitter or spectral subtraction. Recent studies predict SI for normal-hearing (NH) listeners based on a signal-to-noise ratio measure in the envelope domain (SNRenv), in the framework of the speech-based envelope power spectrum model (sEPSM, [20, 21]). These models have shown good...... agreement with measured data under a broad range of conditions, including stationary and modulated interferers, reverberation, and spectral subtraction. Despite the advances in modeling intelligibility in NH listeners, a broadly applicable model that can predict SI in hearing-impaired (HI) listeners...... is not yet available. As a firrst step towards such a model, this study investigates to what extent eects of hearing impairment on SI can be modeled in the sEPSM framework. Preliminary results show that, by only modeling the loss of audibility, the model cannot account for the higher speech reception...

  20. Random Deep Belief Networks for Recognizing Emotions from Speech Signals.

    Science.gov (United States)

    Wen, Guihua; Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang

    2017-01-01

    Now the human emotions can be recognized from speech signals using machine learning methods; however, they are challenged by the lower recognition accuracies in real applications due to lack of the rich representation ability. Deep belief networks (DBN) can automatically discover the multiple levels of representations in speech signals. To make full of its advantages, this paper presents an ensemble of random deep belief networks (RDBN) method for speech emotion recognition. It firstly extracts the low level features of the input speech signal and then applies them to construct lots of random subspaces. Each random subspace is then provided for DBN to yield the higher level features as the input of the classifier to output an emotion label. All outputted emotion labels are then fused through the majority voting to decide the final emotion label for the input speech signal. The conducted experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition.

  1. Speech spectrum envelope modeling

    Czech Academy of Sciences Publication Activity Database

    Vích, Robert; Vondra, Martin

    Vol. 4775, - (2007), s. 129-137 ISSN 0302-9743. [COST Action 2102 International Workshop. Vietri sul Mare, 29.03.2007-31.03.2007] R&D Projects: GA AV ČR(CZ) 1ET301710509 Institutional research plan: CEZ:AV0Z20670512 Keywords : speech * speech processing * cepstral analysis Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering Impact factor: 0.302, year: 2005

  2. Start/End Delays of Voiced and Unvoiced Speech Signals

    Energy Technology Data Exchange (ETDEWEB)

    Herrnstein, A

    1999-09-24

    Recent experiments using low power EM-radar like sensors (e.g, GEMs) have demonstrated a new method for measuring vocal fold activity and the onset times of voiced speech, as vocal fold contact begins to take place. Similarly the end time of a voiced speech segment can be measured. Secondly it appears that in most normal uses of American English speech, unvoiced-speech segments directly precede or directly follow voiced-speech segments. For many applications, it is useful to know typical duration times of these unvoiced speech segments. A corpus, assembled earlier of spoken ''Timit'' words, phrases, and sentences and recorded using simultaneously measured acoustic and EM-sensor glottal signals, from 16 male speakers, was used for this study. By inspecting the onset (or end) of unvoiced speech, using the acoustic signal, and the onset (or end) of voiced speech using the EM sensor signal, the average duration times for unvoiced segments preceding onset of vocalization were found to be 300ms, and for following segments, 500ms. An unvoiced speech period is then defined in time, first by using the onset of the EM-sensed glottal signal, as the onset-time marker for the voiced speech segment and end marker for the unvoiced segment. Then, by subtracting 300ms from the onset time mark of voicing, the unvoiced speech segment start time is found. Similarly, the times for a following unvoiced speech segment can be found. While data of this nature have proven to be useful for work in our laboratory, a great deal of additional work remains to validate such data for use with general populations of users. These procedures have been useful for applying optimal processing algorithms over time segments of unvoiced, voiced, and non-speech acoustic signals. For example, these data appear to be of use in speaker validation, in vocoding, and in denoising algorithms.

  3. Epoch-based analysis of speech signals

    Indian Academy of Sciences (India)

    on speech production characteristics, but also helps in accurate analysis of speech. .... include time delay estimation, speech enhancement from single and multi- ...... log. (. E[k]. ∑K−1 l=0. E[l]. ) ,. (7) where K is the number of samples in the ...

  4. Robust signal selection for lineair prediction analysis of voiced speech

    NARCIS (Netherlands)

    Ma, C.; Kamp, Y.; Willems, L.F.

    1993-01-01

    This paper investigates a weighted LPC analysis of voiced speech. In view of the speech production model, the weighting function is either chosen to be the short-time energy function of the preemphasized speech sample sequence with certain delays or is obtained by thresholding the short-time energy

  5. A homology sound-based algorithm for speech signal interference

    Science.gov (United States)

    Jiang, Yi-jiao; Chen, Hou-jin; Li, Ju-peng; Zhang, Zhan-song

    2015-12-01

    Aiming at secure analog speech communication, a homology sound-based algorithm for speech signal interference is proposed in this paper. We first split speech signal into phonetic fragments by a short-term energy method and establish an interference noise cache library with the phonetic fragments. Then we implement the homology sound interference by mixing the randomly selected interferential fragments and the original speech in real time. The computer simulation results indicated that the interference produced by this algorithm has advantages of real time, randomness, and high correlation with the original signal, comparing with the traditional noise interference methods such as white noise interference. After further studies, the proposed algorithm may be readily used in secure speech communication.

  6. Predicting speech intelligibility in adverse conditions: evaluation of the speech-based envelope power spectrum model

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2011-01-01

    conditions by comparing predictions to measured data from [Kjems et al. (2009). J. Acoust. Soc. Am. 126 (3), 1415-1426] where speech is mixed with four different interferers, including speech-shaped noise, bottle noise, car noise, and cafe noise. The model accounts well for the differences in intelligibility......The speech-based envelope power spectrum model (sEPSM) [Jørgensen and Dau (2011). J. Acoust. Soc. Am., 130 (3), 1475–1487] estimates the envelope signal-to-noise ratio (SNRenv) of distorted speech and accurately describes the speech recognition thresholds (SRT) for normal-hearing listeners...... observed for the different interferers. None of the standardized models successfully describe these data....

  7. Pitch Synchronous Segmentation of Speech Signals

    Data.gov (United States)

    National Aeronautics and Space Administration — The Pitch Synchronous Segmentation (PSS) that accelerates speech without changing its fundamental frequency method could be applied and evaluated for use at NASA....

  8. Random Deep Belief Networks for Recognizing Emotions from Speech Signals

    Directory of Open Access Journals (Sweden)

    Guihua Wen

    2017-01-01

    Full Text Available Now the human emotions can be recognized from speech signals using machine learning methods; however, they are challenged by the lower recognition accuracies in real applications due to lack of the rich representation ability. Deep belief networks (DBN can automatically discover the multiple levels of representations in speech signals. To make full of its advantages, this paper presents an ensemble of random deep belief networks (RDBN method for speech emotion recognition. It firstly extracts the low level features of the input speech signal and then applies them to construct lots of random subspaces. Each random subspace is then provided for DBN to yield the higher level features as the input of the classifier to output an emotion label. All outputted emotion labels are then fused through the majority voting to decide the final emotion label for the input speech signal. The conducted experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition.

  9. Histogram Equalization to Model Adaptation for Robust Speech Recognition

    Directory of Open Access Journals (Sweden)

    Suh Youngjoo

    2010-01-01

    Full Text Available We propose a new model adaptation method based on the histogram equalization technique for providing robustness in noisy environments. The trained acoustic mean models of a speech recognizer are adapted into environmentally matched conditions by using the histogram equalization algorithm on a single utterance basis. For more robust speech recognition in the heavily noisy conditions, trained acoustic covariance models are efficiently adapted by the signal-to-noise ratio-dependent linear interpolation between trained covariance models and utterance-level sample covariance models. Speech recognition experiments on both the digit-based Aurora2 task and the large vocabulary-based task showed that the proposed model adaptation approach provides significant performance improvements compared to the baseline speech recognizer trained on the clean speech data.

  10. What Information Is Necessary for Speech Categorization? Harnessing Variability in the Speech Signal by Integrating Cues Computed Relative to Expectations

    Science.gov (United States)

    McMurray, Bob; Jongman, Allard

    2011-01-01

    Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the…

  11. Automatic speech signal segmentation based on the innovation adaptive filter

    Directory of Open Access Journals (Sweden)

    Makowski Ryszard

    2014-06-01

    Full Text Available Speech segmentation is an essential stage in designing automatic speech recognition systems and one can find several algorithms proposed in the literature. It is a difficult problem, as speech is immensely variable. The aim of the authors’ studies was to design an algorithm that could be employed at the stage of automatic speech recognition. This would make it possible to avoid some problems related to speech signal parametrization. Posing the problem in such a way requires the algorithm to be capable of working in real time. The only such algorithm was proposed by Tyagi et al., (2006, and it is a modified version of Brandt’s algorithm. The article presents a new algorithm for unsupervised automatic speech signal segmentation. It performs segmentation without access to information about the phonetic content of the utterances, relying exclusively on second-order statistics of a speech signal. The starting point for the proposed method is time-varying Schur coefficients of an innovation adaptive filter. The Schur algorithm is known to be fast, precise, stable and capable of rapidly tracking changes in second order signal statistics. A transfer from one phoneme to another in the speech signal always indicates a change in signal statistics caused by vocal track changes. In order to allow for the properties of human hearing, detection of inter-phoneme boundaries is performed based on statistics defined on the mel spectrum determined from the reflection coefficients. The paper presents the structure of the algorithm, defines its properties, lists parameter values, describes detection efficiency results, and compares them with those for another algorithm. The obtained segmentation results, are satisfactory.

  12. Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders

    CERN Document Server

    Baghai-Ravary, Ladan

    2013-01-01

    Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders provides a survey of methods designed to aid clinicians in the diagnosis and monitoring of speech disorders such as dysarthria and dyspraxia, with an emphasis on the signal processing techniques, statistical validity of the results presented in the literature, and the appropriateness of methods that do not require specialized equipment, rigorously controlled recording procedures or highly skilled personnel to interpret results. Such techniques offer the promise of a simple and cost-effective, yet objective, assessment of a range of medical conditions, which would be of great value to clinicians. The ideal scenario would begin with the collection of examples of the clients’ speech, either over the phone or using portable recording devices operated by non-specialist nursing staff. The recordings could then be analyzed initially to aid diagnosis of conditions, and subsequently to monitor the clients’ progress and res...

  13. Signal-to-Signal Ratio Independent Speaker Identification for Co-channel Speech Signals

    DEFF Research Database (Denmark)

    Saeidi, Rahim; Mowlaee, Pejman; Kinnunen, Tomi

    2010-01-01

    In this paper, we consider speaker identification for the co-channel scenario in which speech mixture from speakers is recorded by one microphone only. The goal is to identify both of the speakers from their mixed signal. High recognition accuracies have already been reported when an accurately...

  14. Applications of Hilbert Spectral Analysis for Speech and Sound Signals

    Science.gov (United States)

    Huang, Norden E.

    2003-01-01

    A new method for analyzing nonlinear and nonstationary data has been developed, and the natural applications are to speech and sound signals. The key part of the method is the Empirical Mode Decomposition method with which any complicated data set can be decomposed into a finite and often small number of Intrinsic Mode Functions (IMF). An IMF is defined as any function having the same numbers of zero-crossing and extrema, and also having symmetric envelopes defined by the local maxima and minima respectively. The IMF also admits well-behaved Hilbert transform. This decomposition method is adaptive, and, therefore, highly efficient. Since the decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and nonstationary processes. With the Hilbert transform, the Intrinsic Mode Functions yield instantaneous frequencies as functions of time, which give sharp identifications of imbedded structures. This method invention can be used to process all acoustic signals. Specifically, it can process the speech signals for Speech synthesis, Speaker identification and verification, Speech recognition, and Sound signal enhancement and filtering. Additionally, as the acoustical signals from machinery are essentially the way the machines are talking to us. Therefore, the acoustical signals, from the machines, either from sound through air or vibration on the machines, can tell us the operating conditions of the machines. Thus, we can use the acoustic signal to diagnosis the problems of machines.

  15. Monitoring the Simultaneous Presentation of Multiple Spatialized Speech Signals in the Free Field

    National Research Council Canada - National Science Library

    Nelson, W. T; Bolia, Robert S; Ericson, Mark A; McKinley, Richard L

    1998-01-01

    .... Factorial combinations of three variables, including the number of localized speech signals, the location of the speech signals along the horizontal plane, and the sex of the talker were employed...

  16. A variational EM method for pole-zero modeling of speech with mixed block sparse and Gaussian excitation

    DEFF Research Database (Denmark)

    Shi, Liming; Nielsen, Jesper Kjær; Jensen, Jesper Rindom

    2017-01-01

    The modeling of speech can be used for speech synthesis and speech recognition. We present a speech analysis method based on pole-zero modeling of speech with mixed block sparse and Gaussian excitation. By using a pole-zero model, instead of the all-pole model, a better spectral fitting can...... be expected. Moreover, motivated by the block sparse glottal flow excitation during voiced speech and the white noise excitation for unvoiced speech, we model the excitation sequence as a combination of block sparse signals and white noise. A variational EM (VEM) method is proposed for estimating...... in reconstructing of the block sparse excitation....

  17. Tools for signal compression applications to speech and audio coding

    CERN Document Server

    Moreau, Nicolas

    2013-01-01

    This book presents tools and algorithms required to compress/uncompress signals such as speech and music. These algorithms are largely used in mobile phones, DVD players, HDTV sets, etc. In a first rather theoretical part, this book presents the standard tools used in compression systems: scalar and vector quantization, predictive quantization, transform quantization, entropy coding. In particular we show the consistency between these different tools. The second part explains how these tools are used in the latest speech and audio coders. The third part gives Matlab programs simulating t

  18. On the Influence of Inharmonicities in Model-Based Speech Enhancement

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2013-01-01

    In relation to speech enhancement, we study the influence of modifying the harmonic signal model for voiced speech to include small perturbations in the frequencies of the harmonics. A perturbed signal model is incorporated in the nonlinear least squares method, the Capon filter and the amplitude...

  19. Detecting Parkinson's disease from sustained phonation and speech signals.

    Directory of Open Access Journals (Sweden)

    Evaldas Vaiciukynas

    Full Text Available This study investigates signals from sustained phonation and text-dependent speech modalities for Parkinson's disease screening. Phonation corresponds to the vowel /a/ voicing task and speech to the pronunciation of a short sentence in Lithuanian language. Signals were recorded through two channels simultaneously, namely, acoustic cardioid (AC and smart phone (SP microphones. Additional modalities were obtained by splitting speech recording into voiced and unvoiced parts. Information in each modality is summarized by 18 well-known audio feature sets. Random forest (RF is used as a machine learning algorithm, both for individual feature sets and for decision-level fusion. Detection performance is measured by the out-of-bag equal error rate (EER and the cost of log-likelihood-ratio. Essentia audio feature set was the best using the AC speech modality and YAAFE audio feature set was the best using the SP unvoiced modality, achieving EER of 20.30% and 25.57%, respectively. Fusion of all feature sets and modalities resulted in EER of 19.27% for the AC and 23.00% for the SP channel. Non-linear projection of a RF-based proximity matrix into the 2D space enriched medical decision support by visualization.

  20. Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language

    Science.gov (United States)

    Narayanan, Shrikanth; Georgiou, Panayiotis G.

    2013-01-01

    The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion. PMID:24039277

  1. Modeling binaural signal detection

    NARCIS (Netherlands)

    Breebaart, D.J.

    2001-01-01

    With the advent of multimedia technology and powerful signal processing systems, audio processing and reproduction has gained renewed interest. Examples of products that have been developed are audio coding algorithms to efficiently store and transmit music and speech, or audio reproduction systems

  2. An ALE meta-analysis on the audiovisual integration of speech signals.

    Science.gov (United States)

    Erickson, Laura C; Heeg, Elizabeth; Rauschecker, Josef P; Turkeltaub, Peter E

    2014-11-01

    The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain (1) equivalent, complementary signals (validating AV speech) or (2) inconsistent, different signals (conflicting AV speech). This simple framework may allow the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation metaanalysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining "conflicting" versus "validating" AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sublexical to sentence). Colocalization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal stream areas likely involved in the resolution of conflicting sensory signals. Copyright © 2014 Wiley Periodicals, Inc.

  3. Modeling the Development of Audiovisual Cue Integration in Speech Perception.

    Science.gov (United States)

    Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

    2017-03-21

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.

  4. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  5. Fundamental Frequency Estimation of the Speech Signal Compressed by MP3 Algorithm Using PCC Interpolation

    Directory of Open Access Journals (Sweden)

    MILIVOJEVIC, Z. N.

    2010-02-01

    Full Text Available In this paper the fundamental frequency estimation results of the MP3 modeled speech signal are analyzed. The estimation of the fundamental frequency was performed by the Picking-Peaks algorithm with the implemented Parametric Cubic Convolution (PCC interpolation. The efficiency of PCC was tested for Catmull-Rom, Greville and Greville two-parametric kernel. Depending on MSE, a window that gives optimal results was chosen.

  6. Mathematical pattern, smoothing and digital filtering of a speech signal

    International Nuclear Information System (INIS)

    Razzam, Mohamed Habib

    1979-01-01

    After presentation of speech synthesis methods, characterized by a treatment of pre-recorded natural signals, or by an analog simulation of vocal tract, we present a new synthesis method especially based on a mathematical pattern of the signal, as a development of M. RODET's method. For their physiological origin, these signals are partially or totally voiced, or aleatory. For the phoneme voiced parts, we compute the formant curves, the sum of which constitute the wave, directly in time-domain by applying a specific envelope (operating as a time-window analysis) to a sinusoidal wave, The sinusoidal wave computation is made at the beginning of each signal's pseudo-period. The transition from successive periods is assured by a polynomial smoothing followed by a digital filtering. For the aleatory parts, we present an aleatory computation method of formant curves. Each signal is subjected to a melodic diagrams computed in accordance with the nature of the phoneme (vowel or consonant) and its context (isolated or not). (author) [fr

  7. The application of sparse linear prediction dictionary to compressive sensing in speech signals

    Directory of Open Access Journals (Sweden)

    YOU Hanxu

    2016-04-01

    Full Text Available Appling compressive sensing (CS,which theoretically guarantees that signal sampling and signal compression can be achieved simultaneously,into audio and speech signal processing is one of the most popular research topics in recent years.In this paper,K-SVD algorithm was employed to learn a sparse linear prediction dictionary regarding as the sparse basis of underlying speech signals.Compressed signals was obtained by applying random Gaussian matrix to sample original speech frames.Orthogonal matching pursuit (OMP and compressive sampling matching pursuit (CoSaMP were adopted to recovery original signals from compressed one.Numbers of experiments were carried out to investigate the impact of speech frames length,compression ratios,sparse basis and reconstruction algorithms on CS performance.Results show that sparse linear prediction dictionary can advance the performance of speech signals reconstruction compared with discrete cosine transform (DCT matrix.

  8. Emotion Recognition of Speech Signals Based on Filter Methods

    Directory of Open Access Journals (Sweden)

    Narjes Yazdanian

    2016-10-01

    Full Text Available Speech is the basic mean of communication among human beings.With the increase of transaction between human and machine, necessity of automatic dialogue and removing human factor has been considered. The aim of this study was to determine a set of affective features the speech signal is based on emotions. In this study system was designs that include three mains sections, features extraction, features selection and classification. After extraction of useful features such as, mel frequency cepstral coefficient (MFCC, linear prediction cepstral coefficients (LPC, perceptive linear prediction coefficients (PLP, ferment frequency, zero crossing rate, cepstral coefficients and pitch frequency, Mean, Jitter, Shimmer, Energy, Minimum, Maximum, Amplitude, Standard Deviation, at a later stage with filter methods such as Pearson Correlation Coefficient, t-test, relief and information gain, we came up with a method to rank and select effective features in emotion recognition. Then Result, are given to the classification system as a subset of input. In this classification stage, multi support vector machine are used to classify seven type of emotion. According to the results, that method of relief, together with multi support vector machine, has the most classification accuracy with emotion recognition rate of 93.94%.

  9. A Novel Voice Sensor for the Detection of Speech Signals

    Directory of Open Access Journals (Sweden)

    Kun-Ching Wang

    2013-12-01

    Full Text Available In order to develop a novel voice sensor to detect human voices, the use of features which are more robust to noise is an important issue. Voice sensor is also called voice activity detection (VAD. Due to that the inherent nature of the formant structure only occurred on the speech spectrogram (well-known as voiceprint, Wu et al. were the first to use band-spectral entropy (BSE to describe the characteristics of voiceprints. However, the performance of VAD based on BSE feature was degraded in colored noise (or voiceprint-like noise environments. In order to solve this problem, we propose the two-dimensional part-band energy entropy (TD-PBEE parameter based on two variables: part-band partition number upon frequency index and long-term window size upon time index to further improve the BSE-based VAD algorithm. The two variables can efficiently represent the characteristics of voiceprints on each critical frequency band and use long-term information for noisy speech spectrograms, respectively. The TD-PBEE parameter can be regarded as a PBEE parameter over time. First, the strength of voiceprints can be partly enhanced by using four entropies applied to four part-bands. We can use the four part-band energy entropies for describing the voiceprints in detail. Due to the characteristics of non-stationary for speech and various noises, we will then use long-term information processing to refine the PBEE, so the voice-like noise can be distinguished from noisy speech through the concept of PBEE with long-term information. Our experiments show that the proposed feature extraction with the TD-PBEE parameter is quite insensitive to background noise. The proposed TD-PBEE-based VAD algorithm is evaluated for four types of noises and five signal-to-noise ratio (SNR levels. We find that the accuracy of the proposed TD-PBEE-based VAD algorithm averaged over all noises and all SNR levels is better than that of other considered VAD algorithms.

  10. Subspace-Based Noise Reduction for Speech Signals via Diagonal and Triangular Matrix Decompositions

    DEFF Research Database (Denmark)

    Hansen, Per Christian; Jensen, Søren Holdt

    2007-01-01

    We survey the definitions and use of rank-revealing matrix decompositions in single-channel noise reduction algorithms for speech signals. Our algorithms are based on the rank-reduction paradigm and, in particular, signal subspace techniques. The focus is on practical working algorithms, using both...... with working Matlab code and applications in speech processing....

  11. Humanistic Speech Education to Create Leadership Models.

    Science.gov (United States)

    Oka, Beverley Jeanne

    A theoretical framework based primarily on the humanistic psychology of Abraham Maslow is used in developing a humanistic approach to speech education. The holistic view of human learning and behavior, inherent in this approach, is seen to be compatible with a model of effective leadership. Specific applications of this approach to speech…

  12. Ordinal models of audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2011-01-01

    Audiovisual information is integrated in speech perception. One manifestation of this is the McGurk illusion in which watching the articulating face alters the auditory phonetic percept. Understanding this phenomenon fully requires a computational model with predictive power. Here, we describe...

  13. Auditory analysis for speech recognition based on physiological models

    Science.gov (United States)

    Jeon, Woojay; Juang, Biing-Hwang

    2004-05-01

    To address the limitations of traditional cepstrum or LPC based front-end processing methods for automatic speech recognition, more elaborate methods based on physiological models of the human auditory system may be used to achieve more robust speech recognition in adverse environments. For this purpose, a modified version of a model of the primary auditory cortex featuring a three dimensional mapping of auditory spectra [Wang and Shamma, IEEE Trans. Speech Audio Process. 3, 382-395 (1995)] is adopted and investigated for its use as an improved front-end processing method. The study is conducted in two ways: first, by relating the model's redundant representation to traditional spectral representations and showing that the former not only encompasses information provided by the latter, but also reveals more relevant information that makes it superior in describing the identifying features of speech signals; and second, by observing the statistical features of the representation for various classes of sound to show how different identifying features manifest themselves as specific patterns on the cortical map, thereby becoming a place-coded data set on which detection theory could be applied to simulate auditory perception and cognition.

  14. Speech Denoising in White Noise Based on Signal Subspace Low-rank Plus Sparse Decomposition

    Directory of Open Access Journals (Sweden)

    yuan Shuai

    2017-01-01

    Full Text Available In this paper, a new subspace speech enhancement method using low-rank and sparse decomposition is presented. In the proposed method, we firstly structure the corrupted data as a Toeplitz matrix and estimate its effective rank for the underlying human speech signal. Then the low-rank and sparse decomposition is performed with the guidance of speech rank value to remove the noise. Extensive experiments have been carried out in white Gaussian noise condition, and experimental results show the proposed method performs better than conventional speech enhancement methods, in terms of yielding less residual noise and lower speech distortion.

  15. Selection of individual features of a speech signal using genetic algorithms

    Directory of Open Access Journals (Sweden)

    Kamil Kamiński

    2016-03-01

    Full Text Available The paper presents an automatic speaker’s recognition system, implemented in the Matlab environment, and demonstrates how to achieve and optimize various elements of the system. The main emphasis was put on features selection of a speech signal using a genetic algorithm which takes into account synergy of features. The results of optimization of selected elements of a classifier have been also shown, including the number of Gaussian distributions used to model each of the voices. In addition, for creating voice models, a universal voice model has been used.[b]Keywords[/b]: biometrics, automatic speaker recognition, genetic algorithms, feature selection

  16. A Network Model of Observation and Imitation of Speech

    Science.gov (United States)

    Mashal, Nira; Solodkin, Ana; Dick, Anthony Steven; Chen, E. Elinor; Small, Steven L.

    2012-01-01

    Much evidence has now accumulated demonstrating and quantifying the extent of shared regional brain activation for observation and execution of speech. However, the nature of the actual networks that implement these functions, i.e., both the brain regions and the connections among them, and the similarities and differences across these networks has not been elucidated. The current study aims to characterize formally a network for observation and imitation of syllables in the healthy adult brain and to compare their structure and effective connectivity. Eleven healthy participants observed or imitated audiovisual syllables spoken by a human actor. We constructed four structural equation models to characterize the networks for observation and imitation in each of the two hemispheres. Our results show that the network models for observation and imitation comprise the same essential structure but differ in important ways from each other (in both hemispheres) based on connectivity. In particular, our results show that the connections from posterior superior temporal gyrus and sulcus to ventral premotor, ventral premotor to dorsal premotor, and dorsal premotor to primary motor cortex in the left hemisphere are stronger during imitation than during observation. The first two connections are implicated in a putative dorsal stream of speech perception, thought to involve translating auditory speech signals into motor representations. Thus, the current results suggest that flow of information during imitation, starting at the posterior superior temporal cortex and ending in the motor cortex, enhances input to the motor cortex in the service of speech execution. PMID:22470360

  17. Role of the middle ear muscle apparatus in mechanisms of speech signal discrimination

    Science.gov (United States)

    Moroz, B. S.; Bazarov, V. G.; Sachenko, S. V.

    1980-01-01

    A method of impedance reflexometry was used to examine 101 students with hearing impairment in order to clarify the interrelation between speech discrimination and the state of the middle ear muscles. Ability to discriminate speech signals depends to some extent on the functional state of intraaural muscles. Speech discrimination was greatly impaired in the absence of stapedial muscle acoustic reflex, in the presence of low thresholds of stimulation and in very small values of reflex amplitude increase. Discrimination was not impeded in positive AR, high values of relative thresholds and normal increase of reflex amplitude in response to speech signals with augmenting intensity.

  18. Analysis of vocal signal in its amplitude - time representation. speech synthesis-by-rules

    International Nuclear Information System (INIS)

    Rodet, Xavier

    1977-01-01

    In the first part of this dissertation, the natural speech production and the resulting acoustic waveform are examined under various aspects: communication, phonetics, frequency and temporal analysis. Our own study of direct signal is compared to other researches in these different fields, and fundamental features of vocal signals are described. The second part deals with the numerous methods already used for automatic text-to-speech synthesis. In the last part, we expose the new speech synthesis-by-rule methods that we have worked out, and we present in details the structure of the real-time speech synthesiser that we have implemented on a mini-computer. (author) [fr

  19. Speech-To-Text Conversion STT System Using Hidden Markov Model HMM

    Directory of Open Access Journals (Sweden)

    Su Myat Mon

    2015-06-01

    Full Text Available Abstract Speech is an easiest way to communicate with each other. Speech processing is widely used in many applications like security devices household appliances cellular phones ATM machines and computers. The human computer interface has been developed to communicate or interact conveniently for one who is suffering from some kind of disabilities. Speech-to-Text Conversion STT systems have a lot of benefits for the deaf or dumb people and find their applications in our daily lives. In the same way the aim of the system is to convert the input speech signals into the text output for the deaf or dumb students in the educational fields. This paper presents an approach to extract features by using Mel Frequency Cepstral Coefficients MFCC from the speech signals of isolated spoken words. And Hidden Markov Model HMM method is applied to train and test the audio files to get the recognized spoken word. The speech database is created by using MATLAB.Then the original speech signals are preprocessed and these speech samples are extracted to the feature vectors which are used as the observation sequences of the Hidden Markov Model HMM recognizer. The feature vectors are analyzed in the HMM depending on the number of states.

  20. A New Method to Represent Speech Signals Via Predefined Signature and Envelope Sequences

    Directory of Open Access Journals (Sweden)

    Binboga Sıddık Yarman

    2007-01-01

    Full Text Available A novel systematic procedure referred to as “SYMPES” to model speech signals is introduced. The structure of SYMPES is based on the creation of the so-called predefined “signature S={SR(n} and envelope E={EK(n}” sets. These sets are speaker and language independent. Once the speech signals are divided into frames with selected lengths, then each frame sequence Xi(n is reconstructed by means of the mathematical form Xi(n=CiEK(nSR(n. In this representation, Ci is called the gain factor, SR(n and EK(n are properly assigned from the predefined signature and envelope sets, respectively. Examples are given to exhibit the implementation of SYMPES. It is shown that for the same compression ratio or better, SYMPES yields considerably better speech quality over the commercially available coders such as G.726 (ADPCM at 16 kbps and voice excited LPC-10E (FS1015 at 2.4 kbps.

  1. Non-intrusive speech quality assessment in simplified e-model

    OpenAIRE

    Vozňák, Miroslav

    2012-01-01

    The E-model brings a modern approach to the computation of estimated quality, allowing for easy implementation. One of its advantages is that it can be applied in real time. The method is based on a mathematical computation model evaluating transmission path impairments influencing speech signal, especially delays and packet losses. These parameters, common in an IP network, can affect speech quality dramatically. The paper deals with a proposal for a simplified E-model and its pr...

  2. Auditory Modeling for Noisy Speech Recognition

    National Research Council Canada - National Science Library

    2000-01-01

    ... digital filtering for noise cancellation which interfaces to speech recognition software. It uses auditory features in speech recognition training, and provides applications to multilingual spoken language translation...

  3. Phoneme Compression: processing of the speech signal and effects on speech intelligibility in hearing-Impaired listeners

    NARCIS (Netherlands)

    A. Goedegebure (Andre)

    2005-01-01

    textabstractHearing-aid users often continue to have problems with poor speech understanding in difficult acoustical conditions. Another generally accounted problem is that certain sounds become too loud whereas other sounds are still not audible. Dynamic range compression is a signal processing

  4. Speech Motor Development in Childhood Apraxia of Speech : Generating Testable Hypotheses by Neurocomputational Modeling

    NARCIS (Netherlands)

    Terband, H.; Maassen, B.

    2010-01-01

    Childhood apraxia of speech (CAS) is a highly controversial clinical entity, with respect to both clinical signs and underlying neuromotor deficit. In the current paper, we advocate a modeling approach in which a computational neural model of speech acquisition and production is utilized in order to

  5. Speech motor development in childhood apraxia of speech: generating testable hypotheses by neurocomputational modeling.

    NARCIS (Netherlands)

    Terband, H.R.; Maassen, B.A.M.

    2010-01-01

    Childhood apraxia of speech (CAS) is a highly controversial clinical entity, with respect to both clinical signs and underlying neuromotor deficit. In the current paper, we advocate a modeling approach in which a computational neural model of speech acquisition and production is utilized in order to

  6. Application of adaptive digital signal processing to speech enhancement for the hearing impaired.

    Science.gov (United States)

    Chabries, D M; Christiansen, R W; Brey, R H; Robinette, M S; Harris, R W

    1987-01-01

    A major complaint of individuals with normal hearing and hearing impairments is a reduced ability to understand speech in a noisy environment. This paper describes the concept of adaptive noise cancelling for removing noise from corrupted speech signals. Application of adaptive digital signal processing has long been known and is described from a historical as well as technical perspective. The Widrow-Hoff LMS (least mean square) algorithm developed in 1959 forms the introduction to modern adaptive signal processing. This method uses a "primary" input which consists of the desired speech signal corrupted with noise and a second "reference" signal which is used to estimate the primary noise signal. By subtracting the adaptively filtered estimate of the noise, the desired speech signal is obtained. Recent developments in the field as they relate to noise cancellation are described. These developments include more computationally efficient algorithms as well as algorithms that exhibit improved learning performance. A second method for removing noise from speech, for use when no independent reference for the noise exists, is referred to as single channel noise suppression. Both adaptive and spectral subtraction techniques have been applied to this problem--often with the result of decreased speech intelligibility. Current techniques applied to this problem are described, including signal processing techniques that offer promise in the noise suppression application.

  7. On the Perception of Speech Sounds as Biologically Significant Signals1,2

    Science.gov (United States)

    Pisoni, David B.

    2012-01-01

    This paper reviews some of the major evidence and arguments currently available to support the view that human speech perception may require the use of specialized neural mechanisms for perceptual analysis. Experiments using synthetically produced speech signals with adults are briefly summarized and extensions of these results to infants and other organisms are reviewed with an emphasis towards detailing those aspects of speech perception that may require some need for specialized species-specific processors. Finally, some comments on the role of early experience in perceptual development are provided as an attempt to identify promising areas of new research in speech perception. PMID:399200

  8. Two Methods of Automatic Evaluation of Speech Signal Enhancement Recorded in the Open-Air MRI Environment

    Science.gov (United States)

    Přibil, Jiří; Přibilová, Anna; Frollo, Ivan

    2017-12-01

    The paper focuses on two methods of evaluation of successfulness of speech signal enhancement recorded in the open-air magnetic resonance imager during phonation for the 3D human vocal tract modeling. The first approach enables to obtain a comparison based on statistical analysis by ANOVA and hypothesis tests. The second method is based on classification by Gaussian mixture models (GMM). The performed experiments have confirmed that the proposed ANOVA and GMM classifiers for automatic evaluation of the speech quality are functional and produce fully comparable results with the standard evaluation based on the listening test method.

  9. Speech Enhancement by MAP Spectral Amplitude Estimation Using a Super-Gaussian Speech Model

    Directory of Open Access Journals (Sweden)

    Lotter Thomas

    2005-01-01

    Full Text Available This contribution presents two spectral amplitude estimators for acoustical background noise suppression based on maximum a posteriori estimation and super-Gaussian statistical modelling of the speech DFT amplitudes. The probability density function of the speech spectral amplitude is modelled with a simple parametric function, which allows a high approximation accuracy for Laplace- or Gamma-distributed real and imaginary parts of the speech DFT coefficients. Also, the statistical model can be adapted to optimally fit the distribution of the speech spectral amplitudes for a specific noise reduction system. Based on the super-Gaussian statistical model, computationally efficient maximum a posteriori speech estimators are derived, which outperform the commonly applied Ephraim-Malah algorithm.

  10. Neural correlates of quality perception for complex speech signals

    CERN Document Server

    Antons, Jan-Niklas

    2015-01-01

    This book interconnects two essential disciplines to study the perception of speech: Neuroscience and Quality of Experience, which to date have rarely been used together for the purposes of research on speech quality perception. In five key experiments, the book demonstrates the application of standard clinical methods in neurophysiology on the one hand, and of methods used in fields of research concerned with speech quality perception on the other. Using this combination, the book shows that speech stimuli with different lengths and different quality impairments are accompanied by physiological reactions related to quality variations, e.g., a positive peak in an event-related potential. Furthermore, it demonstrates that – in most cases – quality impairment intensity has an impact on the intensity of physiological reactions.

  11. Predicting the effect of spectral subtraction on the speech recognition threshold based on the signal-to-noise ratio in the envelope domain

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2011-01-01

    rarely been evaluated perceptually in terms of speech intelligibility. This study analyzed the effects of the spectral subtraction strategy proposed by Berouti at al. [ICASSP 4 (1979), 208-211] on the speech recognition threshold (SRT) obtained with sentences presented in stationary speech-shaped noise....... The SRT was measured in five normal-hearing listeners in six conditions of spectral subtraction. The results showed an increase of the SRT after processing, i.e. a decreased speech intelligibility, in contrast to what is predicted by the Speech Transmission Index (STI). Here, another approach is proposed......, denoted the speech-based envelope power spectrum model (sEPSM) which predicts the intelligibility based on the signal-to-noise ratio in the envelope domain. In contrast to the STI, the sEPSM is sensitive to the increased amount of the noise envelope power as a consequence of the spectral subtraction...

  12. Compact Acoustic Models for Embedded Speech Recognition

    Directory of Open Access Journals (Sweden)

    Lévy Christophe

    2009-01-01

    Full Text Available Speech recognition applications are known to require a significant amount of resources. However, embedded speech recognition only authorizes few KB of memory, few MIPS, and small amount of training data. In order to fit the resource constraints of embedded applications, an approach based on a semicontinuous HMM system using state-independent acoustic modelling is proposed. A transformation is computed and applied to the global model in order to obtain each HMM state-dependent probability density functions, authorizing to store only the transformation parameters. This approach is evaluated on two tasks: digit and voice-command recognition. A fast adaptation technique of acoustic models is also proposed. In order to significantly reduce computational costs, the adaptation is performed only on the global model (using related speaker recognition adaptation techniques with no need for state-dependent data. The whole approach results in a relative gain of more than 20% compared to a basic HMM-based system fitting the constraints.

  13. [Modeling developmental aspects of sensorimotor control of speech production].

    Science.gov (United States)

    Kröger, B J; Birkholz, P; Neuschaefer-Rube, C

    2007-05-01

    Detailed knowledge of the neurophysiology of speech acquisition is important for understanding the developmental aspects of speech perception and production and for understanding developmental disorders of speech perception and production. A computer implemented neural model of sensorimotor control of speech production was developed. The model is capable of demonstrating the neural functions of different cortical areas during speech production in detail. (i) Two sensory and two motor maps or neural representations and the appertaining neural mappings or projections establish the sensorimotor feedback control system. These maps and mappings are already formed and trained during the prelinguistic phase of speech acquisition. (ii) The feedforward sensorimotor control system comprises the lexical map (representations of sounds, syllables, and words of the first language) and the mappings from lexical to sensory and to motor maps. The training of the appertaining mappings form the linguistic phase of speech acquisition. (iii) Three prelinguistic learning phases--i. e. silent mouthing, quasi stationary vocalic articulation, and realisation of articulatory protogestures--can be defined on the basis of our simulation studies using the computational neural model. These learning phases can be associated with temporal phases of prelinguistic speech acquisition obtained from natural data. The neural model illuminates the detailed function of specific cortical areas during speech production. In particular it can be shown that developmental disorders of speech production may result from a delayed or incorrect process within one of the prelinguistic learning phases defined by the neural model.

  14. FUSING SPEECH SIGNAL AND PALMPRINT FEATURES FOR AN SECURED AUTHENTICATION SYSTEM

    Directory of Open Access Journals (Sweden)

    P.K. Mahesh

    2011-11-01

    Full Text Available In the application of Biometric authentication, personal identification is regarded as an effective method for automatic recognition, with a high confidence, a person’s identity. Using multimodal biometric systems we typically get better performance compare to single biometric modality. This paper proposes the multimodal biometrics system for identity verification using two traits, i.e., speech signal and palmprint. Integrating the palmprint and speech information increases robustness of person authentication. The proposed system is designed for applications where the training data contains a speech signal and palmprint. It is well known that the performance of person authentication using only speech signal or palmprint is deteriorated by feature changes with time. The final decision is made by fusion at matching score level architecture in which feature vectors are created independently for query measures and are then compared to the enrolment templates, which are stored during database preparation.

  15. Didactic speech synthesizer – acoustic module, formants model

    OpenAIRE

    Teixeira, João Paulo; Fernandes, Anildo

    2013-01-01

    Text-to-speech synthesis is the main subject treated in this work. It will be presented the constitution of a generic text-to-speech system conversion, explained the functions of the various modules and described the development techniques using the formants model. The development of a didactic formant synthesiser under Matlab environment will also be described. This didactic synthesiser is intended for a didactic understanding of the formant model of speech production.

  16. Model based Binaural Enhancement of Voiced and Unvoiced Speech

    DEFF Research Database (Denmark)

    Kavalekalam, Mathew Shaji; Christensen, Mads Græsbøll; Boldt, Jesper B.

    2017-01-01

    This paper deals with the enhancement of speech in presence of non-stationary babble noise. A binaural speech enhancement framework is proposed which takes into account both the voiced and unvoiced speech production model. The usage of this model in enhancement requires the Short term predictor...... (STP) parameters and the pitch information to be estimated. This paper uses a codebook based approach for estimating the STP parameters and a parametric binaural method is proposed for estimating the pitch parameters. Improvements in objective score are shown when using the voicedunvoiced speech model...

  17. Multichannel infinite clipping as a form of sampling of speech signals

    International Nuclear Information System (INIS)

    Guidarelli, G.

    1985-01-01

    A remarkable improvement of both intelligibility and naturalness of infinitely clipped speech can be achieved by means of a multichannel system in which the speech signal is splitted into several band-pass channels before the clipping and successively reconstructed by summing up the clipped outputs of each channel. A possible explanation of such an improvement is given, founded on the so-called zero-based representation of band limited signals where the zero-crossings sequence is considered as a set of samples of the signal

  18. Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2011-01-01

    A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The ...... process provides a key measure of speech intelligibility. © 2011 Acoustical Society of America.......A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data....... The model estimates the speech-to-noise envelope power ratio, SNR env, at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech...

  19. Microscopic prediction of speech recognition for listeners with normal hearing in noise using an auditory model.

    Science.gov (United States)

    Jürgens, Tim; Brand, Thomas

    2009-11-01

    This study compares the phoneme recognition performance in speech-shaped noise of a microscopic model for speech recognition with the performance of normal-hearing listeners. "Microscopic" is defined in terms of this model twofold. First, the speech recognition rate is predicted on a phoneme-by-phoneme basis. Second, microscopic modeling means that the signal waveforms to be recognized are processed by mimicking elementary parts of human's auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703-1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing and a simple dynamic-time-warp speech recognizer. The model is evaluated while presenting nonsense speech in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed. The influence of different perceptual distance measures and of the model's a-priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training of the recognizer and testing. The best model performance is yielded by distance measures which focus mainly on small perceptual distances and neglect outliers.

  20. Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach

    Directory of Open Access Journals (Sweden)

    W. Bastiaan Kleijn

    2005-06-01

    Full Text Available Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features that are most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation using an invertible auditory model. The auditory representation is quantized and coded. Upon decoding, it is then transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating quantization with low complexity. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method to reconstruct the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the exploitation of the inherent redundancy of the human auditory system for the purpose of multiple description (joint source-channel coding.

  1. Modeling auditory processing and speech perception in hearing-impaired listeners

    DEFF Research Database (Denmark)

    Jepsen, Morten Løve

    in a diagnostic rhyme test. The framework was constructed such that discrimination errors originating from the front-end and the back-end were separated. The front-end was fitted to individual listeners with cochlear hearing loss according to non-speech data, and speech data were obtained in the same listeners......A better understanding of how the human auditory system represents and analyzes sounds and how hearing impairment affects such processing is of great interest for researchers in the fields of auditory neuroscience, audiology, and speech communication as well as for applications in hearing......-instrument and speech technology. In this thesis, the primary focus was on the development and evaluation of a computational model of human auditory signal-processing and perception. The model was initially designed to simulate the normal-hearing auditory system with particular focus on the nonlinear processing...

  2. Decoding Speech With Integrated Hybrid Signals Recorded From the Human Ventral Motor Cortex

    Directory of Open Access Journals (Sweden)

    Kenji Ibayashi

    2018-04-01

    Full Text Available Restoration of speech communication for locked-in patients by means of brain computer interfaces (BCIs is currently an important area of active research. Among the neural signals obtained from intracranial recordings, single/multi-unit activity (SUA/MUA, local field potential (LFP, and electrocorticography (ECoG are good candidates for an input signal for BCIs. However, the question of which signal or which combination of the three signal modalities is best suited for decoding speech production remains unverified. In order to record SUA, LFP, and ECoG simultaneously from a highly localized area of human ventral sensorimotor cortex (vSMC, we fabricated an electrode the size of which was 7 by 13 mm containing sparsely arranged microneedle and conventional macro contacts. We determined which signal modality is the most capable of decoding speech production, and tested if the combination of these signals could improve the decoding accuracy of spoken phonemes. Feature vectors were constructed from spike frequency obtained from SUAs and event-related spectral perturbation derived from ECoG and LFP signals, then input to the decoder. The results showed that the decoding accuracy for five spoken vowels was highest when features from multiple signals were combined and optimized for each subject, and reached 59% when averaged across all six subjects. This result suggests that multi-scale signals convey complementary information for speech articulation. The current study demonstrated that simultaneous recording of multi-scale neuronal activities could raise decoding accuracy even though the recording area is limited to a small portion of cortex, which is advantageous for future implementation of speech-assisting BCIs.

  3. Subspace-Based Noise Reduction for Speech Signals via Diagonal and Triangular Matrix Decompositions

    DEFF Research Database (Denmark)

    Hansen, Per Christian; Jensen, Søren Holdt

    We survey the definitions and use of rank-revealing matrix decompositions in single-channel noise reduction algorithms for speech signals. Our algorithms are based on the rank-reduction paradigm and, in particular, signal subspace techniques. The focus is on practical working algorithms, using both...... diagonal (eigenvalue and singular value) decompositions and rank-revealing triangular decompositions (ULV, URV, VSV, ULLV and ULLIV). In addition we show how the subspace-based algorithms can be evaluated and compared by means of simple FIR filter interpretations. The algorithms are illustrated...... with working Matlab code and applications in speech processing....

  4. JNDS of interaural time delay (ITD) of selected frequency bands in speech and music signals

    Science.gov (United States)

    Aliphas, Avner; Colburn, H. Steven; Ghitza, Oded

    2002-05-01

    JNDS of interaural time delay (ITD) of selected frequency bands in the presence of other frequency bands have been reported for noiseband stimuli [Zurek (1985); Trahiotis and Bernstein (1990)]. Similar measurements will be reported for speech and music signals. When stimuli are synthesized with bandpass/band-stop operations, performance with complex stimuli are similar to noisebands (JNDS in tens or hundreds of microseconds); however, the resulting waveforms, when viewed through a model of the auditory periphery, show distortions (irregularities in phase and level) at the boundaries of the target band of frequencies. An alternate synthesis method based upon group-delay filtering operations does not show these distortions and is being used for the current measurements. Preliminary measurements indicate that when music stimuli are created using the new techniques, JNDS of ITDs are increased significantly compared to previous studies, with values on the order of milliseconds.

  5. Cognitive Bias for Learning Speech Sounds From a Continuous Signal Space Seems Nonlinguistic

    Directory of Open Access Journals (Sweden)

    Sabine van der Ham

    2015-10-01

    Full Text Available When learning language, humans have a tendency to produce more extreme distributions of speech sounds than those observed most frequently: In rapid, casual speech, vowel sounds are centralized, yet cross-linguistically, peripheral vowels occur almost universally. We investigate whether adults’ generalization behavior reveals selective pressure for communication when they learn skewed distributions of speech-like sounds from a continuous signal space. The domain-specific hypothesis predicts that the emergence of sound categories is driven by a cognitive bias to make these categories maximally distinct, resulting in more skewed distributions in participants’ reproductions. However, our participants showed more centered distributions, which goes against this hypothesis, indicating that there are no strong innate linguistic biases that affect learning these speech-like sounds. The centralization behavior can be explained by a lack of communicative pressure to maintain categories.

  6. Signal Processing Methods for Removing the Effects of Whole Body Vibration upon Speech

    Science.gov (United States)

    Bitner, Rachel M.; Begault, Durand R.

    2014-01-01

    Humans may be exposed to whole-body vibration in environments where clear speech communications are crucial, particularly during the launch phases of space flight and in high-performance aircraft. Prior research has shown that high levels of vibration cause a decrease in speech intelligibility. However, the effects of whole-body vibration upon speech are not well understood, and no attempt has been made to restore speech distorted by whole-body vibration. In this paper, a model for speech under whole-body vibration is proposed and a method to remove its effect is described. The method described reduces the perceptual effects of vibration, yields higher ASR accuracy scores, and may significantly improve intelligibility. Possible applications include incorporation within communication systems to improve radio-communication systems in environments such a spaceflight, aviation, or off-road vehicle operations.

  7. Speech Enhancement by Classification of Noisy Signals Decomposed Using NMF and Wiener Filtering

    DEFF Research Database (Denmark)

    Fakhry, Mahmoud; Poorjam, Amir Hossein; Christensen, Mads Græsbøll

    2018-01-01

    are identified in the cepstral domain using the trained classifier. We apply unsupervised NMF followed by Wiener filtering for the decomposition, and use a support vector machine trained on the mel-frequency cepstral coefficients of the parts of training speech and noise signals for the classification...

  8. Feature Fusion Algorithm for Multimodal Emotion Recognition from Speech and Facial Expression Signal

    Directory of Open Access Journals (Sweden)

    Han Zhiyan

    2016-01-01

    Full Text Available In order to overcome the limitation of single mode emotion recognition. This paper describes a novel multimodal emotion recognition algorithm, and takes speech signal and facial expression signal as the research subjects. First, fuse the speech signal feature and facial expression signal feature, get sample sets by putting back sampling, and then get classifiers by BP neural network (BPNN. Second, measure the difference between two classifiers by double error difference selection strategy. Finally, get the final recognition result by the majority voting rule. Experiments show the method improves the accuracy of emotion recognition by giving full play to the advantages of decision level fusion and feature level fusion, and makes the whole fusion process close to human emotion recognition more, with a recognition rate 90.4%.

  9. Inferring imagined speech using EEG signals: a new approach using Riemannian manifold features

    Science.gov (United States)

    Nguyen, Chuong H.; Karavas, George K.; Artemiadis, Panagiotis

    2018-02-01

    Objective. In this paper, we investigate the suitability of imagined speech for brain-computer interface (BCI) applications. Approach. A novel method based on covariance matrix descriptors, which lie in Riemannian manifold, and the relevance vector machines classifier is proposed. The method is applied on electroencephalographic (EEG) signals and tested in multiple subjects. Main results. The method is shown to outperform other approaches in the field with respect to accuracy and robustness. The algorithm is validated on various categories of speech, such as imagined pronunciation of vowels, short words and long words. The classification accuracy of our methodology is in all cases significantly above chance level, reaching a maximum of 70% for cases where we classify three words and 95% for cases of two words. Significance. The results reveal certain aspects that may affect the success of speech imagery classification from EEG signals, such as sound, meaning and word complexity. This can potentially extend the capability of utilizing speech imagery in future BCI applications. The dataset of speech imagery collected from total 15 subjects is also published.

  10. Hearing aid processing of loud speech and noise signals: Consequences for loudness perception and listening comfort

    DEFF Research Database (Denmark)

    Schmidt, Erik

    2007-01-01

    sounds, has found that both normal-hearing and hearing-impaired listeners prefer loud sounds to be closer to the most comfortable loudness-level, than suggested by common non-linear fitting rules. During this project, two listening experiments were carried out. In the first experiment, hearing aid users......Hearing aid processing of loud speech and noise signals: Consequences for loudness perception and listening comfort. Sound processing in hearing aids is determined by the fitting rule. The fitting rule describes how the hearing aid should amplify speech and sounds in the surroundings......, such that they become audible again for the hearing impaired person. The general goal is to place all sounds within the hearing aid users’ audible range, such that speech intelligibility and listening comfort become as good as possible. Amplification strategies in hearing aids are in many cases based on empirical...

  11. A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments

    Directory of Open Access Journals (Sweden)

    Jing Mi

    2016-09-01

    Full Text Available Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model.

  12. A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments.

    Science.gov (United States)

    Mi, Jing; Colburn, H Steven

    2016-10-03

    Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model. © The Author(s) 2016.

  13. A speech processing study using an acoustic model of a multiple-channel cochlear implant

    Science.gov (United States)

    Xu, Ying

    1998-10-01

    A cochlear implant is an electronic device designed to provide sound information for adults and children who have bilateral profound hearing loss. The task of representing speech signals as electrical stimuli is central to the design and performance of cochlear implants. Studies have shown that the current speech- processing strategies provide significant benefits to cochlear implant users. However, the evaluation and development of speech-processing strategies have been complicated by hardware limitations and large variability in user performance. To alleviate these problems, an acoustic model of a cochlear implant with the SPEAK strategy is implemented in this study, in which a set of acoustic stimuli whose psychophysical characteristics are as close as possible to those produced by a cochlear implant are presented on normal-hearing subjects. To test the effectiveness and feasibility of this acoustic model, a psychophysical experiment was conducted to match the performance of a normal-hearing listener using model- processed signals to that of a cochlear implant user. Good agreement was found between an implanted patient and an age-matched normal-hearing subject in a dynamic signal discrimination experiment, indicating that this acoustic model is a reasonably good approximation of a cochlear implant with the SPEAK strategy. The acoustic model was then used to examine the potential of the SPEAK strategy in terms of its temporal and frequency encoding of speech. It was hypothesized that better temporal and frequency encoding of speech can be accomplished by higher stimulation rates and a larger number of activated channels. Vowel and consonant recognition tests were conducted on normal-hearing subjects using speech tokens processed by the acoustic model, with different combinations of stimulation rate and number of activated channels. The results showed that vowel recognition was best at 600 pps and 8 activated channels, but further increases in stimulation rate and

  14. Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-tonoise ratio in the envelope domain (SNRenv), which was demonstrated...... to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating...

  15. Sub-Audible Speech Recognition Based upon Electromyographic Signals

    Science.gov (United States)

    Jorgensen, Charles C. (Inventor); Lee, Diana D. (Inventor); Agabon, Shane T. (Inventor)

    2012-01-01

    Method and system for processing and identifying a sub-audible signal formed by a source of sub-audible sounds. Sequences of samples of sub-audible sound patterns ("SASPs") for known words/phrases in a selected database are received for overlapping time intervals, and Signal Processing Transforms ("SPTs") are formed for each sample, as part of a matrix of entry values. The matrix is decomposed into contiguous, non-overlapping two-dimensional cells of entries, and neural net analysis is applied to estimate reference sets of weight coefficients that provide sums with optimal matches to reference sets of values. The reference sets of weight coefficients are used to determine a correspondence between a new (unknown) word/phrase and a word/phrase in the database.

  16. Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Hogden, J.

    1996-11-05

    The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.

  17. Neural Spike-Train Analyses of the Speech-Based Envelope Power Spectrum Model

    Science.gov (United States)

    Rallapalli, Varsha H.

    2016-01-01

    Diagnosing and treating hearing impairment is challenging because people with similar degrees of sensorineural hearing loss (SNHL) often have different speech-recognition abilities. The speech-based envelope power spectrum model (sEPSM) has demonstrated that the signal-to-noise ratio (SNRENV) from a modulation filter bank provides a robust speech-intelligibility measure across a wider range of degraded conditions than many long-standing models. In the sEPSM, noise (N) is assumed to: (a) reduce S + N envelope power by filling in dips within clean speech (S) and (b) introduce an envelope noise floor from intrinsic fluctuations in the noise itself. While the promise of SNRENV has been demonstrated for normal-hearing listeners, it has not been thoroughly extended to hearing-impaired listeners because of limited physiological knowledge of how SNHL affects speech-in-noise envelope coding relative to noise alone. Here, envelope coding to speech-in-noise stimuli was quantified from auditory-nerve model spike trains using shuffled correlograms, which were analyzed in the modulation-frequency domain to compute modulation-band estimates of neural SNRENV. Preliminary spike-train analyses show strong similarities to the sEPSM, demonstrating feasibility of neural SNRENV computations. Results suggest that individual differences can occur based on differential degrees of outer- and inner-hair-cell dysfunction in listeners currently diagnosed into the single audiological SNHL category. The predicted acoustic-SNR dependence in individual differences suggests that the SNR-dependent rate of susceptibility could be an important metric in diagnosing individual differences. Future measurements of the neural SNRENV in animal studies with various forms of SNHL will provide valuable insight for understanding individual differences in speech-in-noise intelligibility.

  18. Cross-modal enhancement of speech detection in young and older adults: does signal content matter?

    Science.gov (United States)

    Tye-Murray, Nancy; Spehar, Brent; Myerson, Joel; Sommers, Mitchell S; Hale, Sandra

    2011-01-01

    The purpose of the present study was to examine the effects of age and visual content on cross-modal enhancement of auditory speech detection. Visual content consisted of three clearly distinct types of visual information: an unaltered video clip of a talker's face, a low-contrast version of the same clip, and a mouth-like Lissajous figure. It was hypothesized that both young and older adults would exhibit reduced enhancement as visual content diverged from the original clip of the talker's face, but that the decrease would be greater for older participants. Nineteen young adults and 19 older adults were asked to detect a single spoken syllable (/ba/) in speech-shaped noise, and the level of the signal was adaptively varied to establish the signal-to-noise ratio (SNR) at threshold. There was an auditory-only baseline condition and three audiovisual conditions in which the syllable was accompanied by one of the three visual signals (the unaltered clip of the talker's face, the low-contrast version of that clip, or the Lissajous figure). For each audiovisual condition, the SNR at threshold was compared with the SNR at threshold for the auditory-only condition to measure the amount of cross-modal enhancement. Young adults exhibited significant cross-modal enhancement with all three types of visual stimuli, with the greatest amount of enhancement observed for the unaltered clip of the talker's face. Older adults, in contrast, exhibited significant cross-modal enhancement only with the unaltered face. Results of this study suggest that visual signal content affects cross-modal enhancement of speech detection in both young and older adults. They also support a hypothesized age-related deficit in processing low-contrast visual speech stimuli, even in older adults with normal contrast sensitivity.

  19. Speech emotion recognition based on statistical pitch model

    Institute of Scientific and Technical Information of China (English)

    WANG Zhiping; ZHAO Li; ZOU Cairong

    2006-01-01

    A modified Parzen-window method, which keep high resolution in low frequencies and keep smoothness in high frequencies, is proposed to obtain statistical model. Then, a gender classification method utilizing the statistical model is proposed, which have a 98% accuracy of gender classification while long sentence is dealt with. By separation the male voice and female voice, the mean and standard deviation of speech training samples with different emotion are used to create the corresponding emotion models. Then the Bhattacharyya distance between the test sample and statistical models of pitch, are utilized for emotion recognition in speech.The normalization of pitch for the male voice and female voice are also considered, in order to illustrate them into a uniform space. Finally, the speech emotion recognition experiment based on K Nearest Neighbor shows that, the correct rate of 81% is achieved, where it is only 73.85%if the traditional parameters are utilized.

  20. Enhancing Speech Recognition Using Improved Particle Swarm Optimization Based Hidden Markov Model

    Directory of Open Access Journals (Sweden)

    Lokesh Selvaraj

    2014-01-01

    Full Text Available Enhancing speech recognition is the primary intention of this work. In this paper a novel speech recognition method based on vector quantization and improved particle swarm optimization (IPSO is suggested. The suggested methodology contains four stages, namely, (i denoising, (ii feature mining (iii, vector quantization, and (iv IPSO based hidden Markov model (HMM technique (IP-HMM. At first, the speech signals are denoised using median filter. Next, characteristics such as peak, pitch spectrum, Mel frequency Cepstral coefficients (MFCC, mean, standard deviation, and minimum and maximum of the signal are extorted from the denoised signal. Following that, to accomplish the training process, the extracted characteristics are given to genetic algorithm based codebook generation in vector quantization. The initial populations are created by selecting random code vectors from the training set for the codebooks for the genetic algorithm process and IP-HMM helps in doing the recognition. At this point the creativeness will be done in terms of one of the genetic operation crossovers. The proposed speech recognition technique offers 97.14% accuracy.

  1. Teaching autistic children conversational speech using video modeling.

    Science.gov (United States)

    Charlop, M H; Milstein, J P

    1989-01-01

    We assessed the effects of video modeling on acquisition and generalization of conversational skills among autistic children. Three autistic boys observed videotaped conversations consisting of two people discussing specific toys. When criterion for learning was met, generalization of conversational skills was assessed with untrained topics of conversation; new stimuli (toys); unfamiliar persons, siblings, and autistic peers; and other settings. The results indicated that the children learned through video modeling, generalized their conversational skills, and maintained conversational speech over a 15-month period. Video modeling shows much promise as a rapid and effective procedure for teaching complex verbal skills such as conversational speech. PMID:2793634

  2. Adaptive filtration of speech signals in the presence of correlated noise with random variation of probabilistic characteristics

    OpenAIRE

    M. O. Partala; S. Ya. Zhuk

    2007-01-01

    On the base of mixed Markoff process in discrete time optimal and quasioptimal algorithms is designed for adaptive filtration of speech signals in the presence of correlated noise with random variation of probabilistic characteristics.

  3. Prediction of speech intelligibility based on an auditory preprocessing model

    DEFF Research Database (Denmark)

    Christiansen, Claus Forup Corlin; Pedersen, Michael Syskind; Dau, Torsten

    2010-01-01

    in noise experiment was used for training and an ideal binary mask experiment was used for evaluation. All three models were able to capture the trends in the speech in noise training data well, but the proposed model provides a better prediction of the binary mask test data, particularly when the binary...... masks degenerate to a noise vocoder....

  4. High-performance speech recognition using consistency modeling

    Science.gov (United States)

    Digalakis, Vassilios; Murveit, Hy; Monaco, Peter; Neumeyer, Leo; Sankar, Ananth

    1994-12-01

    The goal of SRI's consistency modeling project is to improve the raw acoustic modeling component of SRI's DECIPHER speech recognition system and develop consistency modeling technology. Consistency modeling aims to reduce the number of improper independence assumptions used in traditional speech recognition algorithms so that the resulting speech recognition hypotheses are more self-consistent and, therefore, more accurate. At the initial stages of this effort, SRI focused on developing the appropriate base technologies for consistency modeling. We first developed the Progressive Search technology that allowed us to perform large-vocabulary continuous speech recognition (LVCSR) experiments. Since its conception and development at SRI, this technique has been adopted by most laboratories, including other ARPA contracting sites, doing research on LVSR. Another goal of the consistency modeling project is to attack difficult modeling problems, when there is a mismatch between the training and testing phases. Such mismatches may include outlier speakers, different microphones and additive noise. We were able to either develop new, or transfer and evaluate existing, technologies that adapted our baseline genonic HMM recognizer to such difficult conditions.

  5. Speech-discrimination scores modeled as a binomial variable.

    Science.gov (United States)

    Thornton, A R; Raffin, M J

    1978-09-01

    Many studies have reported variability data for tests of speech discrimination, and the disparate results of these studies have not been given a simple explanation. Arguments over the relative merits of 25- vs 50-word tests have ignored the basic mathematical properties inherent in the use of percentage scores. The present study models performance on clinical tests of speech discrimination as a binomial variable. A binomial model was developed, and some of its characteristics were tested against data from 4120 scores obtained on the CID Auditory Test W-22. A table for determining significant deviations between scores was generated and compared to observed differences in half-list scores for the W-22 tests. Good agreement was found between predicted and observed values. Implications of the binomial characteristics of speech-discrimination scores are discussed.

  6. Modeling Sluggishness in Binaural Unmasking of Speech for Maskers With Time-Varying Interaural Phase Differences.

    Science.gov (United States)

    Hauth, Christopher F; Brand, Thomas

    2018-01-01

    In studies investigating binaural processing in human listeners, relatively long and task-dependent time constants of a binaural window ranging from 10 ms to 250 ms have been observed. Such time constants are often thought to reflect "binaural sluggishness." In this study, the effect of binaural sluggishness on binaural unmasking of speech in stationary speech-shaped noise is investigated in 10 listeners with normal hearing. In order to design a masking signal with temporally varying binaural cues, the interaural phase difference of the noise was modulated sinusoidally with frequencies ranging from 0.25 Hz to 64 Hz. The lowest, that is the best, speech reception thresholds (SRTs) were observed for the lowest modulation frequency. SRTs increased with increasing modulation frequency up to 4 Hz. For higher modulation frequencies, SRTs remained constant in the range of 1 dB to 1.5 dB below the SRT determined in the diotic situation. The outcome of the experiment was simulated using a short-term binaural speech intelligibility model, which combines an equalization-cancellation (EC) model with the speech intelligibility index. This model segments the incoming signal into 23.2-ms time frames in order to predict release from masking in modulated noises. In order to predict the results from this study, the model required a further time constant applied to the EC mechanism representing binaural sluggishness. The best agreement with perceptual data was achieved using a temporal window of 200 ms in the EC mechanism.

  7. Multiscale Signal Analysis and Modeling

    CERN Document Server

    Zayed, Ahmed

    2013-01-01

    Multiscale Signal Analysis and Modeling presents recent advances in multiscale analysis and modeling using wavelets and other systems. This book also presents applications in digital signal processing using sampling theory and techniques from various function spaces, filter design, feature extraction and classification, signal and image representation/transmission, coding, nonparametric statistical signal processing, and statistical learning theory. This book also: Discusses recently developed signal modeling techniques, such as the multiscale method for complex time series modeling, multiscale positive density estimations, Bayesian Shrinkage Strategies, and algorithms for data adaptive statistics Introduces new sampling algorithms for multidimensional signal processing Provides comprehensive coverage of wavelets with presentations on waveform design and modeling, wavelet analysis of ECG signals and wavelet filters Reviews features extraction and classification algorithms for multiscale signal and image proce...

  8. Speech Silicon: An FPGA Architecture for Real-Time Hidden Markov-Model-Based Speech Recognition

    Directory of Open Access Journals (Sweden)

    Schuster Jeffrey

    2006-01-01

    Full Text Available This paper examines the design of an FPGA-based system-on-a-chip capable of performing continuous speech recognition on medium sized vocabularies in real time. Through the creation of three dedicated pipelines, one for each of the major operations in the system, we were able to maximize the throughput of the system while simultaneously minimizing the number of pipeline stalls in the system. Further, by implementing a token-passing scheme between the later stages of the system, the complexity of the control was greatly reduced and the amount of active data present in the system at any time was minimized. Additionally, through in-depth analysis of the SPHINX 3 large vocabulary continuous speech recognition engine, we were able to design models that could be efficiently benchmarked against a known software platform. These results, combined with the ability to reprogram the system for different recognition tasks, serve to create a system capable of performing real-time speech recognition in a vast array of environments.

  9. Speech Silicon: An FPGA Architecture for Real-Time Hidden Markov-Model-Based Speech Recognition

    Directory of Open Access Journals (Sweden)

    Alex K. Jones

    2006-11-01

    Full Text Available This paper examines the design of an FPGA-based system-on-a-chip capable of performing continuous speech recognition on medium sized vocabularies in real time. Through the creation of three dedicated pipelines, one for each of the major operations in the system, we were able to maximize the throughput of the system while simultaneously minimizing the number of pipeline stalls in the system. Further, by implementing a token-passing scheme between the later stages of the system, the complexity of the control was greatly reduced and the amount of active data present in the system at any time was minimized. Additionally, through in-depth analysis of the SPHINX 3 large vocabulary continuous speech recognition engine, we were able to design models that could be efficiently benchmarked against a known software platform. These results, combined with the ability to reprogram the system for different recognition tasks, serve to create a system capable of performing real-time speech recognition in a vast array of environments.

  10. The Effectiveness of Clear Speech as a Masker

    Science.gov (United States)

    Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

    2010-01-01

    Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

  11. Filtering the Unknown: Speech Activity Detection in Heterogeneous Video Collections

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; Wooters, Chuck; Ordelman, Roeland J.F.

    2007-01-01

    In this paper we discuss the speech activity detection system that we used for detecting speech regions in the Dutch TRECVID video collection. The system is designed to filter non-speech like music or sound effects out of the signal without the use of predefined non-speech models. Because the system

  12. Least 1-Norm Pole-Zero Modeling with Sparse Deconvolution for Speech Analysis

    DEFF Research Database (Denmark)

    Shi, Liming; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2017-01-01

    In this paper, we present a speech analysis method based on sparse pole-zero modeling of speech. Instead of using the all-pole model to approximate the speech production filter, a pole-zero model is used for the combined effect of the vocal tract; radiation at the lips and the glottal pulse shape...

  13. Speech perception at positive signal-to-noise ratios using adaptive adjustment of time compression.

    Science.gov (United States)

    Schlueter, Anne; Brand, Thomas; Lemke, Ulrike; Nitzschner, Stefan; Kollmeier, Birger; Holube, Inga

    2015-11-01

    Positive signal-to-noise ratios (SNRs) characterize listening situations most relevant for hearing-impaired listeners in daily life and should therefore be considered when evaluating hearing aid algorithms. For this, a speech-in-noise test was developed and evaluated, in which the background noise is presented at fixed positive SNRs and the speech rate (i.e., the time compression of the speech material) is adaptively adjusted. In total, 29 younger and 12 older normal-hearing, as well as 24 older hearing-impaired listeners took part in repeated measurements. Younger normal-hearing and older hearing-impaired listeners conducted one of two adaptive methods which differed in adaptive procedure and step size. Analysis of the measurements with regard to list length and estimation strategy for thresholds resulted in a practical method measuring the time compression for 50% recognition. This method uses time-compression adjustment and step sizes according to Versfeld and Dreschler [(2002). J. Acoust. Soc. Am. 111, 401-408], with sentence scoring, lists of 30 sentences, and a maximum likelihood method for threshold estimation. Evaluation of the procedure showed that older participants obtained higher test-retest reliability compared to younger participants. Depending on the group of listeners, one or two lists are required for training prior to data collection.

  14. Speech Perception With Combined Electric-Acoustic Stimulation: A Simulation and Model Comparison.

    Science.gov (United States)

    Rader, Tobias; Adel, Youssef; Fastl, Hugo; Baumann, Uwe

    2015-01-01

    The aim of this study is to simulate speech perception with combined electric-acoustic stimulation (EAS), verify the advantage of combined stimulation in normal-hearing (NH) subjects, and then compare it with cochlear implant (CI) and EAS user results from the authors' previous study. Furthermore, an automatic speech recognition (ASR) system was built to examine the impact of low-frequency information and is proposed as an applied model to study different hypotheses of the combined-stimulation advantage. Signal-detection-theory (SDT) models were applied to assess predictions of subject performance without the need to assume any synergistic effects. Speech perception was tested using a closed-set matrix test (Oldenburg sentence test), and its speech material was processed to simulate CI and EAS hearing. A total of 43 NH subjects and a customized ASR system were tested. CI hearing was simulated by an aurally adequate signal spectrum analysis and representation, the part-tone-time-pattern, which was vocoded at 12 center frequencies according to the MED-EL DUET speech processor. Residual acoustic hearing was simulated by low-pass (LP)-filtered speech with cutoff frequencies 200 and 500 Hz for NH subjects and in the range from 100 to 500 Hz for the ASR system. Speech reception thresholds were determined in amplitude-modulated noise and in pseudocontinuous noise. Previously proposed SDT models were lastly applied to predict NH subject performance with EAS simulations. NH subjects tested with EAS simulations demonstrated the combined-stimulation advantage. Increasing the LP cutoff frequency from 200 to 500 Hz significantly improved speech reception thresholds in both noise conditions. In continuous noise, CI and EAS users showed generally better performance than NH subjects tested with simulations. In modulated noise, performance was comparable except for the EAS at cutoff frequency 500 Hz where NH subject performance was superior. The ASR system showed similar behavior

  15. Syntheses by rules of the speech signal in its amplitude-time representation - melody study - phonetic, translation program

    International Nuclear Information System (INIS)

    Santamarina, Carole

    1975-01-01

    The present paper deals with the real-time speech synthesis implemented on a minicomputer. A first program translates the orthographic text into a string of phonetic codes, which is then processed by the synthesis program itself. The method used, a synthesis by rules, directly computes the speech signal in its amplitude-time representation. Emphasis has been put on special cases (diphthongs, 'e muet', consonant-consonant transition) and the implementation of the rhythm and of the melody. (author) [fr

  16. Mapping from Speech to Images Using Continuous State Space Models

    DEFF Research Database (Denmark)

    Lehn-Schiøler, Tue; Hansen, Lars Kai; Larsen, Jan

    2005-01-01

    In this paper a system that transforms speech waveforms to animated faces are proposed. The system relies on continuous state space models to perform the mapping, this makes it possible to ensure video with no sudden jumps and allows continuous control of the parameters in 'face space...... a subjective point of view the model is able to construct an image sequence from an unknown noisy speech sequence even though the number of training examples are limited.......'. The performance of the system is critically dependent on the number of hidden variables, with too few variables the model cannot represent data, and with too many overfitting is noticed. Simulations are performed on recordings of 3-5 sec.\\$\\backslash\\$ video sequences with sentences from the Timit database. From...

  17. Specific acoustic models for spontaneous and dictated style in indonesian speech recognition

    Science.gov (United States)

    Vista, C. B.; Satriawan, C. H.; Lestari, D. P.; Widyantoro, D. H.

    2018-03-01

    The performance of an automatic speech recognition system is affected by differences in speech style between the data the model is originally trained upon and incoming speech to be recognized. In this paper, the usage of GMM-HMM acoustic models for specific speech styles is investigated. We develop two systems for the experiments; the first employs a speech style classifier to predict the speech style of incoming speech, either spontaneous or dictated, then decodes this speech using an acoustic model specifically trained for that speech style. The second system uses both acoustic models to recognise incoming speech and decides upon a final result by calculating a confidence score of decoding. Results show that training specific acoustic models for spontaneous and dictated speech styles confers a slight recognition advantage as compared to a baseline model trained on a mixture of spontaneous and dictated training data. In addition, the speech style classifier approach of the first system produced slightly more accurate results than the confidence scoring employed in the second system.

  18. Speech Compression

    Directory of Open Access Journals (Sweden)

    Jerry D. Gibson

    2016-06-01

    Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.

  19. Mathematical Modelling Plant Signalling Networks

    KAUST Repository

    Muraro, D.; Byrne, H.M.; King, J.R.; Bennett, M.J.

    2013-01-01

    methods for modelling gene and signalling networks and their application in plants. We then describe specific models of hormonal perception and cross-talk in plants. This mathematical analysis of sub-cellular molecular mechanisms paves the way for more

  20. The Combined Effect of Signal Strength and Background Traffic Load on Speech Quality in IEEE 802.11 WLAN

    Directory of Open Access Journals (Sweden)

    P. Pocta

    2011-04-01

    Full Text Available This paper deals with measurements of the combined effect of signal strength and background traffic load on speech quality in IEEE 802.11 WLAN. The ITU-T G.729AB encoding scheme is deployed in this study and the Distributed Internet Traffic Generator (D-ITG is used for the purpose of background traffic generation. The speech quality and background traffic load are assessed by means of the accomplished PESQ algorithm and Wireshark network analyzer, respectively. The results show that background traffic load has a bit higher impact on speech quality than signal strength when both effects are available together. Moreover, background traffic load also partially masks the impact of signal strength. The reasons for those findings are particularly discussed. The results also suggest some implications for designers of wireless networks providing VoIP service.

  1. Rational speech act models of pragmatic reasoning in reference games

    OpenAIRE

    Frank, Michael

    2016-01-01

    Human communication is almost always ambiguous, but it typically takes place in a context where this ambiguity can be resolved. A key part of this process of disambiguation comes from pragmatic reasoning about alternative messages that a speaker could have said in that context. Following previous work, we describe pragmatic inference as recursive reasoning – in which listeners reason about speakers and vice versa – using a “rational speech act” (RSA) model. We then systematically test the par...

  2. Study of wavelet packet energy entropy for emotion classification in speech and glottal signals

    Science.gov (United States)

    He, Ling; Lech, Margaret; Zhang, Jing; Ren, Xiaomei; Deng, Lihua

    2013-07-01

    The automatic speech emotion recognition has important applications in human-machine communication. Majority of current research in this area is focused on finding optimal feature parameters. In recent studies, several glottal features were examined as potential cues for emotion differentiation. In this study, a new type of feature parameter is proposed, which calculates energy entropy on values within selected Wavelet Packet frequency bands. The modeling and classification tasks are conducted using the classical GMM algorithm. The experiments use two data sets: the Speech Under Simulated Emotion (SUSE) data set annotated with three different emotions (angry, neutral and soft) and Berlin Emotional Speech (BES) database annotated with seven different emotions (angry, bored, disgust, fear, happy, sad and neutral). The average classification accuracy achieved for the SUSE data (74%-76%) is significantly higher than the accuracy achieved for the BES data (51%-54%). In both cases, the accuracy was significantly higher than the respective random guessing levels (33% for SUSE and 14.3% for BES).

  3. A multi-resolution envelope-power based model for speech intelligibility

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Ewert, Stephan D.; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM) presented by Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] estimates the envelope power signal-to-noise ratio (SNRenv) after modulation-frequency selective processing. Changes in this metric were shown to account well...... to conditions with stationary interferers, due to the long-term integration of the envelope power, and cannot account for increased intelligibility typically obtained with fluctuating maskers. Here, a multi-resolution version of the sEPSM is presented where the SNRenv is estimated in temporal segments...... with a modulation-filter dependent duration. The multi-resolution sEPSM is demonstrated to account for intelligibility obtained in conditions with stationary and fluctuating interferers, and noisy speech distorted by reverberation or spectral subtraction. The results support the hypothesis that the SNRenv...

  4. Acceptance noise level: effects of the speech signal, babble, and listener language.

    Science.gov (United States)

    Shi, Lu-Feng; Azcona, Gabrielly; Buten, Lupe

    2015-04-01

    The acceptable noise level (ANL) measure has gained much research/clinical interest in recent years. The present study examined how the characteristics of the speech signal and the babble used in the measure may affect the ANL in listeners with different native languages. Fifteen English monolingual, 16 Russian-English bilingual, and 24 Spanish-English bilingual listeners participated. The ANL was obtained in eight conditions varying in the language of the signal (English and Spanish), language of the babble (English and Spanish), and number of talkers in the babble (4 and 12). Test conditions were randomized across listeners. The ANL for each condition was based on a minimum of two trials. Russian-English bilinguals yielded higher ANLs than other listeners; the intergroup difference of 4-5 dB was statistically and clinically significant. Spanish signals yielded significantly higher ANLs than English signals, but this difference of 0.5 dB was clinically negligible. The language and composition of the babble had a significant effect on Russian-English bilinguals, who yielded higher ANLs with the Spanish than English 12-talker babble. The above findings do not fully support the notion that the ANL is language- and population-independent. Clinicians should be aware of possible effects on ANL measures due to listeners' linguistic/cultural background.

  5. Modelling vocal anatomy's significant effect on speech

    NARCIS (Netherlands)

    de Boer, B.

    2010-01-01

    This paper investigates the effect of larynx position on the articulatory abilities of a humanlike vocal tract. Previous work has investigated models that were built to resemble the anatomy of existing species or fossil ancestors. This has led to conflicting conclusions about the relation between

  6. Comparing models of the combined-stimulation advantage for speech recognition.

    Science.gov (United States)

    Micheyl, Christophe; Oxenham, Andrew J

    2012-05-01

    The "combined-stimulation advantage" refers to an improvement in speech recognition when cochlear-implant or vocoded stimulation is supplemented by low-frequency acoustic information. Previous studies have been interpreted as evidence for "super-additive" or "synergistic" effects in the combination of low-frequency and electric or vocoded speech information by human listeners. However, this conclusion was based on predictions of performance obtained using a suboptimal high-threshold model of information combination. The present study shows that a different model, based on Gaussian signal detection theory, can predict surprisingly large combined-stimulation advantages, even when performance with either information source alone is close to chance, without involving any synergistic interaction. A reanalysis of published data using this model reveals that previous results, which have been interpreted as evidence for super-additive effects in perception of combined speech stimuli, are actually consistent with a more parsimonious explanation, according to which the combined-stimulation advantage reflects an optimal combination of two independent sources of information. The present results do not rule out the possible existence of synergistic effects in combined stimulation; however, they emphasize the possibility that the combined-stimulation advantages observed in some studies can be explained simply by non-interactive combination of two information sources.

  7. A Block-Based Linear MMSE Noise Reduction with a High Temporal Resolution Modeling of the Speech Excitation

    DEFF Research Database (Denmark)

    Li, Chunjian; Andersen, S. V.

    2005-01-01

    A comprehensive linear minimum mean squared error (LMMSE) approach for parametric speech enhancement is developed. The proposed algorithms aim at joint LMMSE estimation of signal power spectra and phase spectra, as well as exploitation of correlation between spectral components. The major cause...... of this interfrequency correlation is shown to be the prominent temporal power localization in the excitation of voiced speech. LMMSE estimators in time domain and frequency domain are first formulated. To obtain the joint estimator, we model the spectral signal covariance matrix as a full covariancematrix instead...... of a diagonal covariance matrix as is the case in the Wiener filter derived under the quasi-stationarity assumption. To accomplish this, we decompose the signal covariance matrix into a synthesis filter matrix and an excitation matrix. The synthesis filter matrix is built from estimates of the all-pole model...

  8. The design and validation of a hybrid digital-signal-processing plug-in for traditional cochlear implant speech processors.

    Science.gov (United States)

    Hajiaghababa, Fatemeh; Marateb, Hamid R; Kermani, Saeed

    2018-06-01

    Cochlear implants (CIs) are electronic devices restoring partial hearing to deaf individuals with profound hearing loss. In this paper, a new plug-in for traditional IIR filter-banks (FBs) is presented for cochlear implants based on wavelet neural networks (WNNs). Having provided such a plug-in for commercially available CIs, it is possible not only to use available hardware in the market but also to optimize their performance compared with the-state-of-the-art. An online database of Dutch diphone perception was used in our study. The weights of the WNNs were tuned using particle swarm optimization (PSO) on a training set (speech-shaped noise (SSN) of 2 dB SNR), while its performance was assessed on a test set in terms of objective and composite measures in the hold-out validation framework. The cost function was defined based on the combination of mean square error (MSE), short‑time objective intelligibility (STOI) criteria on the training set. Variety of performance indices were used including segmental signal- to -noise ratio (SNRseg), MSE, STOI, log-likelihood ratio (LLR), weighted spectral slope (WSS), and composite measures C sig , C bak and C ovl . Meanwhile, the following CI speech processing techniques were used for comparison: traditional FBs, dual resonance nonlinear (DRNL) and simple dual path nonlinear (SPDN) models. The average SNRseg, MSE, and LLR values for the WNN in the entire data set were 2.496 ± 2.794, 0.086 ± 0.025 and 2.323 ± 0.281, respectively. The proposed method significantly improved MSE, SNR, SNRseg, LLR, C sig C bak and C ovl compared with the other three methods (repeated-measures analysis of variance (ANOVA); P < 0.05). The average running time of the proposed algorithm (written in Matlab R2013a) on the training and test sets for each consonant or vowel on an Intel dual-core 2.10 GHz CPU with 2GB of RAM was 9.91 ± 0.87 (s) and 0.19 ± 0.01 (s), respectively. The proposed algorithm is accurate and

  9. Evaluation of Short-Term Cepstral Based Features for Detection of Parkinson’s Disease Severity Levels through Speech signals

    Science.gov (United States)

    Oung, Qi Wei; Nisha Basah, Shafriza; Muthusamy, Hariharan; Vijean, Vikneswaran; Lee, Hoileong

    2018-03-01

    Parkinson’s disease (PD) is one type of progressive neurodegenerative disease known as motor system syndrome, which is due to the death of dopamine-generating cells, a region of the human midbrain. PD normally affects people over 60 years of age, which at present has influenced a huge part of worldwide population. Lately, many researches have shown interest into the connection between PD and speech disorders. Researches have revealed that speech signals may be a suitable biomarker for distinguishing between people with Parkinson’s (PWP) from healthy subjects. Therefore, early diagnosis of PD through the speech signals can be considered for this aim. In this research, the speech data are acquired based on speech behaviour as the biomarker for differentiating PD severity levels (mild and moderate) from healthy subjects. Feature extraction algorithms applied are Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), and Weighted Linear Prediction Cepstral Coefficients (WLPCC). For classification, two types of classifiers are used: k-Nearest Neighbour (KNN) and Probabilistic Neural Network (PNN). The experimental results demonstrated that PNN classifier and KNN classifier achieve the best average classification performance of 92.63% and 88.56% respectively through 10-fold cross-validation measures. Favourably, the suggested techniques have the possibilities of becoming a new choice of promising tools for the PD detection with tremendous performance.

  10. Models of calcium signalling

    CERN Document Server

    Dupont, Geneviève; Kirk, Vivien; Sneyd, James

    2016-01-01

    This book discusses the ways in which mathematical, computational, and modelling methods can be used to help understand the dynamics of intracellular calcium. The concentration of free intracellular calcium is vital for controlling a wide range of cellular processes, and is thus of great physiological importance. However, because of the complex ways in which the calcium concentration varies, it is also of great mathematical interest.This book presents the general modelling theory as well as a large number of specific case examples, to show how mathematical modelling can interact with experimental approaches, in an interdisciplinary and multifaceted approach to the study of an important physiological control mechanism. Geneviève Dupont is FNRS Research Director at the Unit of Theoretical Chronobiology of the Université Libre de Bruxelles;Martin Falcke is head of the Mathematical Cell Physiology group at the Max Delbrück Center for Molecular Medicine, Berlin;Vivien Kirk is an Associate Professor in the Depar...

  11. Speech Signal and Facial Image Processing for Obstructive Sleep Apnea Assessment

    Directory of Open Access Journals (Sweden)

    Fernando Espinoza-Cuadros

    2015-01-01

    Full Text Available Obstructive sleep apnea (OSA is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA. OSA is generally diagnosed through a costly procedure requiring an overnight stay of the patient at the hospital. This has led to proposing less costly procedures based on the analysis of patients’ facial images and voice recordings to help in OSA detection and severity assessment. In this paper we investigate the use of both image and speech processing to estimate the apnea-hypopnea index, AHI (which describes the severity of the condition, over a population of 285 male Spanish subjects suspected to suffer from OSA and referred to a Sleep Disorders Unit. Photographs and voice recordings were collected in a supervised but not highly controlled way trying to test a scenario close to an OSA assessment application running on a mobile device (i.e., smartphones or tablets. Spectral information in speech utterances is modeled by a state-of-the-art low-dimensional acoustic representation, called i-vector. A set of local craniofacial features related to OSA are extracted from images after detecting facial landmarks using Active Appearance Models (AAMs. Support vector regression (SVR is applied on facial features and i-vectors to estimate the AHI.

  12. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    Science.gov (United States)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known apriori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detection certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.

  13. Hidden Hearing Loss and Computational Models of the Auditory Pathway: Predicting Speech Intelligibility Decline

    Science.gov (United States)

    2016-11-28

    Title: Hidden Hearing Loss and Computational Models of the Auditory Pathway: Predicting Speech Intelligibility Decline Christopher J. Smalt...representation of speech intelligibility in noise. The auditory-periphery model of Zilany et al. (JASA 2009,2014) is used to make predictions of...auditory nerve (AN) responses to speech stimuli under a variety of difficult listening conditions. The resulting cochlear neurogram, a spectrogram

  14. The Cortical Organization of Speech Processing: Feedback Control and Predictive Coding the Context of a Dual-Stream Model

    Science.gov (United States)

    Hickok, Gregory

    2012-01-01

    Speech recognition is an active process that involves some form of predictive coding. This statement is relatively uncontroversial. What is less clear is the source of the prediction. The dual-stream model of speech processing suggests that there are two possible sources of predictive coding in speech perception: the motor speech system and the…

  15. Particle swarm optimization based feature enhancement and feature selection for improved emotion recognition in speech and glottal signals.

    Science.gov (United States)

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In the recent years, many research works have been published using speech related features for speech emotion recognition, however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstralcoefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and its glottal waveforms(GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature.

  16. A New Bigram-PLSA Language Model for Speech Recognition

    Directory of Open Access Journals (Sweden)

    Bahrani Mohammad

    2010-01-01

    Full Text Available A novel method for combining bigram model and Probabilistic Latent Semantic Analysis (PLSA is introduced for language modeling. The motivation behind this idea is the relaxation of the "bag of words" assumption fundamentally present in latent topic models including the PLSA model. An EM-based parameter estimation technique for the proposed model is presented in this paper. Previous attempts to incorporate word order in the PLSA model are surveyed and compared with our new proposed model both in theory and by experimental evaluation. Perplexity measure is employed to compare the effectiveness of recently introduced models with the new proposed model. Furthermore, experiments are designed and carried out on continuous speech recognition (CSR tasks using word error rate (WER as the evaluation criterion. The superiority of the new bigram-PLSA model over Nie et al.'s bigram-PLSA and simple PLSA models is demonstrated in the results of our experiments. Experiments on BLLIP WSJ corpus show about 12% reduction in perplexity and 2.8% WER improvement compared to Nie et al.'s bigram-PLSA model.

  17. Mathematical Modelling Plant Signalling Networks

    KAUST Repository

    Muraro, D.

    2013-01-01

    During the last two decades, molecular genetic studies and the completion of the sequencing of the Arabidopsis thaliana genome have increased knowledge of hormonal regulation in plants. These signal transduction pathways act in concert through gene regulatory and signalling networks whose main components have begun to be elucidated. Our understanding of the resulting cellular processes is hindered by the complex, and sometimes counter-intuitive, dynamics of the networks, which may be interconnected through feedback controls and cross-regulation. Mathematical modelling provides a valuable tool to investigate such dynamics and to perform in silico experiments that may not be easily carried out in a laboratory. In this article, we firstly review general methods for modelling gene and signalling networks and their application in plants. We then describe specific models of hormonal perception and cross-talk in plants. This mathematical analysis of sub-cellular molecular mechanisms paves the way for more comprehensive modelling studies of hormonal transport and signalling in a multi-scale setting. © EDP Sciences, 2013.

  18. Diabetes: Models, Signals and control

    Science.gov (United States)

    Cobelli, C.

    2010-07-01

    Diabetes and its complications impose significant economic consequences on individuals, families, health systems, and countries. The control of diabetes is an interdisciplinary endeavor, which includes significant components of modeling, signal processing and control. Models: first, I will discuss the minimal (coarse) models which describe the key components of the system functionality and are capable of measuring crucial processes of glucose metabolism and insulin control in health and diabetes; then, the maximal (fine-grain) models which include comprehensively all available knowledge about system functionality and are capable to simulate the glucose-insulin system in diabetes, thus making it possible to create simulation scenarios whereby cost effective experiments can be conducted in silico to assess the efficacy of various treatment strategies - in particular I will focus on the first in silico simulation model accepted by FDA as a substitute to animal trials in the quest for optimal diabetes control. Signals: I will review metabolic monitoring, with a particular emphasis on the new continuous glucose sensors, on the crucial role of models to enhance the interpretation of their time-series signals, and on the opportunities that they present for automation of diabetes control. Control: I will review control strategies that have been successfully employed in vivo or in silico, presenting a promise for the development of a future artificial pancreas and, in particular, I will discuss a modular architecture for building closed-loop control systems, including insulin delivery and patient safety supervision layers.

  19. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained to be the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of speech signal getting corrupted by noise, cross-talk and distortion Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand digital transmission is relatively immune to noise, cross-talk and distortion primarily because of the capability to faithfully regenerate digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link Hence from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modem requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often referred to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques that is often interchangeably used with speech coding is the term voice coding. This term is more generic in the sense that the

  20. Auditory-motor interactions in pediatric motor speech disorders: neurocomputational modeling of disordered development.

    Science.gov (United States)

    Terband, H; Maassen, B; Guenther, F H; Brumberg, J

    2014-01-01

    Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between neurological deficits in auditory and motor processes using computational modeling with the DIVA model. In a series of computer simulations, we investigated the effect of a motor processing deficit alone (MPD), and the effect of a motor processing deficit in combination with an auditory processing deficit (MPD+APD) on the trajectory and endpoint of speech motor development in the DIVA model. Simulation results showed that a motor programming deficit predominantly leads to deterioration on the phonological level (phonemic mappings) when auditory self-monitoring is intact, and on the systemic level (systemic mapping) if auditory self-monitoring is impaired. These findings suggest a close relation between quality of auditory self-monitoring and the involvement of phonological vs. motor processes in children with pediatric motor speech disorders. It is suggested that MPD+APD might be involved in typically apraxic speech output disorders and MPD in pediatric motor speech disorders that also have a phonological component. Possibilities to verify these hypotheses using empirical data collected from human subjects are discussed. The reader will be able to: (1) identify the difficulties in studying disordered speech motor development; (2) describe the differences in speech motor characteristics between SSD and subtype CAS; (3) describe the different types of learning that occur in the sensory-motor system during babbling and early speech acquisition; (4) identify the neural control subsystems involved in speech production; (5) describe the potential role of auditory self-monitoring in developmental speech disorders. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. Predicting binaural speech intelligibility using the signal-to-noise ratio in the envelope power spectrum domain

    DEFF Research Database (Denmark)

    Chabot-Leclerc, Alexandre; MacDonald, Ewen; Dau, Torsten

    2016-01-01

    time difference of the target and masker. The Pearson correlation coefficient between the simulated speech reception thresholds and the data across all experiments was 0.91. A model version that considered only BE processing performed similarly (correlation coefficient of 0.86) to the complete model...

  2. Comparison of Linear Prediction Models for Audio Signals

    Directory of Open Access Journals (Sweden)

    2009-03-01

    Full Text Available While linear prediction (LP has become immensely popular in speech modeling, it does not seem to provide a good approach for modeling audio signals. This is somewhat surprising, since a tonal signal consisting of a number of sinusoids can be perfectly predicted based on an (all-pole LP model with a model order that is twice the number of sinusoids. We provide an explanation why this result cannot simply be extrapolated to LP of audio signals. If noise is taken into account in the tonal signal model, a low-order all-pole model appears to be only appropriate when the tonal components are uniformly distributed in the Nyquist interval. Based on this observation, different alternatives to the conventional LP model can be suggested. Either the model should be changed to a pole-zero, a high-order all-pole, or a pitch prediction model, or the conventional LP model should be preceded by an appropriate frequency transform, such as a frequency warping or downsampling. By comparing these alternative LP models to the conventional LP model in terms of frequency estimation accuracy, residual spectral flatness, and perceptual frequency resolution, we obtain several new and promising approaches to LP-based audio modeling.

  3. Computational Neural Modeling of Speech Motor Control in Childhood Apraxia of Speech (CAS)

    Science.gov (United States)

    Terband, Hayo; Maassen, Ben; Guenther, Frank H.; Brumberg, Jonathan

    2009-01-01

    Purpose: Childhood apraxia of speech (CAS) has been associated with a wide variety of diagnostic descriptions and has been shown to involve different symptoms during successive stages of development. In the present study, the authors attempted to associate the symptoms of CAS in a particular developmental stage with particular…

  4. Computational neural modeling of speech motor control in childhood apraxia of speech (CAS).

    NARCIS (Netherlands)

    Terband, H.R.; Maassen, B.A.M.; Guenther, F.H.; Brumberg, J.

    2009-01-01

    PURPOSE: Childhood apraxia of speech (CAS) has been associated with a wide variety of diagnostic descriptions and has been shown to involve different symptoms during successive stages of development. In the present study, the authors attempted to associate the symptoms of CAS in a particular

  5. Segment-based acoustic models for continuous speech recognition

    Science.gov (United States)

    Ostendorf, Mari; Rohlicek, J. R.

    1993-07-01

    This research aims to develop new and more accurate stochastic models for speaker-independent continuous speech recognition, by extending previous work in segment-based modeling and by introducing a new hierarchical approach to representing intra-utterance statistical dependencies. These techniques, which are more costly than traditional approaches because of the large search space associated with higher order models, are made feasible through rescoring a set of HMM-generated N-best sentence hypotheses. We expect these different modeling techniques to result in improved recognition performance over that achieved by current systems, which handle only frame-based observations and assume that these observations are independent given an underlying state sequence. In the fourth quarter of the project, we have completed the following: (1) ported our recognition system to the Wall Street Journal task, a standard task in the ARPA community; (2) developed an initial dependency-tree model of intra-utterance observation correlation; and (3) implemented baseline language model estimation software. Our initial results on the Wall Street Journal task are quite good and represent significantly improved performance over most HMM systems reporting on the Nov. 1992 5k vocabulary test set.

  6. Enhancement of Non-Stationary Speech using Harmonic Chirp Filters

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2015-01-01

    In this paper, the issue of single channel speech enhancement of non-stationary voiced speech is addressed. The non-stationarity of speech is well known, but state of the art speech enhancement methods assume stationarity within frames of 20–30 ms. We derive optimal distortionless filters that take...... the non-stationarity nature of voiced speech into account via linear constraints. This is facilitated by imposing a harmonic chirp model on the speech signal. As an implicit part of the filter design, the noise statistics are also estimated based on the observed signal and parameters of the harmonic chirp...... model. Simulations on real speech show that the chirp based filters perform better than their harmonic counterparts. Further, it is seen that the gain of using the chirp model increases when the estimated chirp parameter is big corresponding to periods in the signal where the instantaneous fundamental...

  7. Perceptual organization of speech signals by children with and without dyslexia

    Science.gov (United States)

    Nittrouer, Susan; Lowenstein, Joanna H.

    2013-01-01

    Developmental dyslexia is a condition in which children encounter difficulty learning to read in spite of adequate instruction. Although considerable effort has been expended trying to identify the source of the problem, no single solution has been agreed upon. The current study explored a new hypothesis, that developmental dyslexia may be due to faulty perceptual organization of linguistically relevant sensory input. To test that idea, sentence-length speech signals were processed to create either sine-wave or noise-vocoded analogs. Seventy children between 8 and 11 years of age, with and without dyslexia participated. Children with dyslexia were selected to have phonological awareness deficits, although those without such deficits were retained in the study. The processed sentences were presented for recognition, and measures of reading, phonological awareness, and expressive vocabulary were collected. Results showed that children with dyslexia, regardless of phonological subtype, had poorer recognition scores than children without dyslexia for both kinds of degraded sentences. Older children with dyslexia recognized the sine-wave sentences better than younger children with dyslexia, but no such effect of age was found for the vocoded materials. Recognition scores were used as predictor variables in regression analyses with reading, phonological awareness, and vocabulary measures used as dependent variables. Scores for both sorts of sentence materials were strong predictors of performance on all three dependent measures when all children were included, but only performance for the sine-wave materials explained significant proportions of variance when only children with dyslexia were included. Finally, matching young, typical readers with older children with dyslexia on reading abilities did not mitigate the group difference in recognition of vocoded sentences. Conclusions were that children with dyslexia have difficulty organizing linguistically relevant sensory

  8. Academic Freedom in Classroom Speech: A Heuristic Model for U.S. Catholic Higher Education

    Science.gov (United States)

    Jacobs, Richard M.

    2010-01-01

    As the nation's Catholic universities and colleges continually clarify their identity, this article examines academic freedom in classroom speech, offering a heuristic model for use as board members, academic administrators, and faculty leaders discuss, evaluate, and judge allegations of misconduct in classroom speech. Focusing upon the practice…

  9. Model-Based Synthesis of Visual Speech Movements from 3D Video

    Directory of Open Access Journals (Sweden)

    Edge JamesD

    2009-01-01

    Full Text Available We describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. Speech lip movements are captured using a 3D stereo face capture system and split up into phonetic units. A dynamic parameterisation of this data is constructed which maintains the relationship between lip shapes and velocities; within this parameterisation a model of how lips move is built and is used in the animation of visual speech movements from speech audio input. The mapping from audio parameters to lip movements is disambiguated by selecting only the most similar stored phonetic units to the target utterance during synthesis. By combining properties of model-based synthesis (e.g., HMMs, neural nets with unit selection we improve the quality of our speech synthesis.

  10. New Results on Single-Channel Speech Separation Using Sinusoidal Modeling

    DEFF Research Database (Denmark)

    Mowlaee, Pejman; Christensen, Mads Græsbøll; Jensen, Søren Holdt

    2011-01-01

    We present new results on single-channel speech separation and suggest a new separation approach to improve the speech quality of separated signals from an observed mix- ture. The key idea is to derive a mixture estimator based on sinusoidal parameters. The proposed estimator is aimed at finding...... mixture estimator used in binary masks and the Wiener filtering approach, it is observed that the proposed method achieves an acceptable perceptual speech quality with less cross- talk at different signal-to-signal ratios. Moreover, the method is independent of pitch estimates and reduces the computational...... complexity of the separation by replacing the short-time Fourier transform (STFT) feature vectors of high dimensionality with sinusoidal feature vectors. We report separation results for the proposed method and compare them with respect to other benchmark methods. The improvements made by applying...

  11. Using EEG and stimulus context to probe the modelling of auditory-visual speech.

    Science.gov (United States)

    Paris, Tim; Kim, Jeesun; Davis, Chris

    2016-02-01

    We investigated whether internal models of the relationship between lip movements and corresponding speech sounds [Auditory-Visual (AV) speech] could be updated via experience. AV associations were indexed by early and late event related potentials (ERPs) and by oscillatory power and phase locking. Different AV experience was produced via a context manipulation. Participants were presented with valid (the conventional pairing) and invalid AV speech items in either a 'reliable' context (80% AVvalid items) or an 'unreliable' context (80% AVinvalid items). The results showed that for the reliable context, there was N1 facilitation for AV compared to auditory only speech. This N1 facilitation was not affected by AV validity. Later ERPs showed a difference in amplitude between valid and invalid AV speech and there was significant enhancement of power for valid versus invalid AV speech. These response patterns did not change over the context manipulation, suggesting that the internal models of AV speech were not updated by experience. The results also showed that the facilitation of N1 responses did not vary as a function of the salience of visual speech (as previously reported); in post-hoc analyses, it appeared instead that N1 facilitation varied according to the relative time of the acoustic onset, suggesting for AV events N1 may be more sensitive to the relationship of AV timing than form. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.

  12. Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2016-01-01

    In this paper, single channel speech enhancement in the time domain is considered. We address the problem of modelling non-stationary speech by describing the voiced speech parts by a harmonic linear chirp model instead of using the traditional harmonic model. This means that the speech signal...... through simulations on synthetic and speech signals, that the chirp versions of the filters perform better than their harmonic counterparts in terms of output signal-to-noise ratio (SNR) and signal reduction factor. For synthetic signals, the output SNR for the harmonic chirp APES based filter...... is increased 3 dB compared to the harmonic APES based filter at an input SNR of 10 dB, and at the same time the signal reduction factor is decreased. For speech signals, the increase is 1.5 dB along with a decrease in the signal reduction factor of 0.7. As an implicit part of the APES filter, a noise...

  13. Speech-like orofacial oscillations in stump-tailed macaque (Macaca arctoides) facial and vocal signals.

    Science.gov (United States)

    Toyoda, Aru; Maruhashi, Tamaki; Malaivijitnond, Suchinda; Koda, Hiroki

    2017-10-01

    Speech is unique to humans and characterized by facial actions of ∼5 Hz oscillations of lip, mouth or jaw movements. Lip-smacking, a facial display of primates characterized by oscillatory actions involving the vertical opening and closing of the jaw and lips, exhibits stable 5-Hz oscillation patterns, matching that of speech, suggesting that lip-smacking is a precursor of speech. We tested if facial or vocal actions exhibiting the same rate of oscillation are found in wide forms of facial or vocal displays in various social contexts, exhibiting diversity among species. We observed facial and vocal actions of wild stump-tailed macaques (Macaca arctoides), and selected video clips including facial displays (teeth chattering; TC), panting calls, and feeding. Ten open-to-open mouth durations during TC and feeding and five amplitude peak-to-peak durations in panting were analyzed. Facial display (TC) and vocalization (panting) oscillated within 5.74 ± 1.19 and 6.71 ± 2.91 Hz, respectively, similar to the reported lip-smacking of long-tailed macaques and the speech of humans. These results indicated a common mechanism for the central pattern generator underlying orofacial movements, which would evolve to speech. Similar oscillations in panting, which evolved from different muscular control than the orofacial action, suggested the sensory foundations for perceptual saliency particular to 5-Hz rhythms in macaques. This supports the pre-adaptation hypothesis of speech evolution, which states a central pattern generator for 5-Hz facial oscillation and perceptual background tuned to 5-Hz actions existed in common ancestors of macaques and humans, before the emergence of speech. © 2017 Wiley Periodicals, Inc.

  14. Modeling Driving Performance Using In-Vehicle Speech Data From a Naturalistic Driving Study.

    Science.gov (United States)

    Kuo, Jonny; Charlton, Judith L; Koppel, Sjaan; Rudin-Brown, Christina M; Cross, Suzanne

    2016-09-01

    We aimed to (a) describe the development and application of an automated approach for processing in-vehicle speech data from a naturalistic driving study (NDS), (b) examine the influence of child passenger presence on driving performance, and (c) model this relationship using in-vehicle speech data. Parent drivers frequently engage in child-related secondary behaviors, but the impact on driving performance is unknown. Applying automated speech-processing techniques to NDS audio data would facilitate the analysis of in-vehicle driver-child interactions and their influence on driving performance. Speech activity detection and speaker diarization algorithms were applied to audio data from a Melbourne-based NDS involving 42 families. Multilevel models were developed to evaluate the effect of speech activity and the presence of child passengers on driving performance. Speech activity was significantly associated with velocity and steering angle variability. Child passenger presence alone was not associated with changes in driving performance. However, speech activity in the presence of two child passengers was associated with the most variability in driving performance. The effects of in-vehicle speech on driving performance in the presence of child passengers appear to be heterogeneous, and multiple factors may need to be considered in evaluating their impact. This goal can potentially be achieved within large-scale NDS through the automated processing of observational data, including speech. Speech-processing algorithms enable new perspectives on driving performance to be gained from existing NDS data, and variables that were once labor-intensive to process can be readily utilized in future research. © 2016, Human Factors and Ergonomics Society.

  15. Blind estimation of the number of speech source in reverberant multisource scenarios based on binaural signals

    DEFF Research Database (Denmark)

    May, Tobias; van de Par, Steven

    2012-01-01

    In this paper we present a new approach for estimating the number of active speech sources in the presence of interfering noise sources and reverberation. First, a binaural front-end is used to detect the spatial positions of all active sound sources, resulting in a binary mask for each candidate...... on a support vector machine (SVM) classifier. A systematic analysis shows that the proposed algorithm is able to blindly determine the number and the corresponding spatial positions of speech sources in multisource scenarios and generalizes well to unknown acoustic conditions...

  16. Auditory-Motor Interactions in Pediatric Motor Speech Disorders: Neurocomputational Modeling of Disordered Development

    Science.gov (United States)

    Terband, H.; Maassen, B.; Guenther, F.H.; Brumberg, J.

    2014-01-01

    Background/Purpose Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between neurological deficits in auditory and motor processes using computational modeling with the DIVA model. Method In a series of computer simulations, we investigated the effect of a motor processing deficit alone (MPD), and the effect of a motor processing deficit in combination with an auditory processing deficit (MPD+APD) on the trajectory and endpoint of speech motor development in the DIVA model. Results Simulation results showed that a motor programming deficit predominantly leads to deterioration on the phonological level (phonemic mappings) when auditory self-monitoring is intact, and on the systemic level (systemic mapping) if auditory self-monitoring is impaired. Conclusions These findings suggest a close relation between quality of auditory self-monitoring and the involvement of phonological vs. motor processes in children with pediatric motor speech disorders. It is suggested that MPD+APD might be involved in typically apraxic speech output disorders and MPD in pediatric motor speech disorders that also have a phonological component. Possibilities to verify these hypotheses using empirical data collected from human subjects are discussed. PMID:24491630

  17. Contribution to automatic speech recognition. Analysis of the direct acoustical signal. Recognition of isolated words and phoneme identification

    International Nuclear Information System (INIS)

    Dupeyrat, Benoit

    1981-01-01

    This report deals with the acoustical-phonetic step of the automatic recognition of the speech. The parameters used are the extrema of the acoustical signal (coded in amplitude and duration). This coding method, the properties of which are described, is simple and well adapted to a digital processing. The quality and the intelligibility of the coded signal after reconstruction are particularly satisfactory. An experiment for the automatic recognition of isolated words has been carried using this coding system. We have designed a filtering algorithm operating on the parameters of the coding. Thus the characteristics of the formants can be derived under certain conditions which are discussed. Using these characteristics the identification of a large part of the phonemes for a given speaker was achieved. Carrying on the studies has required the development of a particular methodology of real time processing which allowed immediate evaluation of the improvement of the programs. Such processing on temporal coding of the acoustical signal is extremely powerful and could represent, used in connection with other methods an efficient tool for the automatic processing of the speech.(author) [fr

  18. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be ""cleaned"" with digital signal processing tools before it is played out, transmitted, or stored.This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red

  19. Integrating Information from Speech and Physiological Signals to Achieve Emotional Sensitivity

    DEFF Research Database (Denmark)

    Kim, Jonghwa; André, Elisabeth; Rehm, Matthias

    2005-01-01

    Recently, there has been a significant amount of work on the recognition of emotions from speech and biosignals. Most approaches to emotion recognition so far concentrate on a single modality and do not take advantage of the fact that an integrated multimodal analysis may help to resolve...

  20. Stable 1-Norm Error Minimization Based Linear Predictors for Speech Modeling

    DEFF Research Database (Denmark)

    Giacobello, Daniele; Christensen, Mads Græsbøll; Jensen, Tobias Lindstrøm

    2014-01-01

    In linear prediction of speech, the 1-norm error minimization criterion has been shown to provide a valid alternative to the 2-norm minimization criterion. However, unlike 2-norm minimization, 1-norm minimization does not guarantee the stability of the corresponding all-pole filter and can generate...... saturations when this is used to synthesize speech. In this paper, we introduce two new methods to obtain intrinsically stable predictors with the 1-norm minimization. The first method is based on constraining the roots of the predictor to lie within the unit circle by reducing the numerical range...... based linear prediction for modeling and coding of speech....

  1. The DIVA model: A neural theory of speech acquisition and production.

    Science.gov (United States)

    Tourville, Jason A; Guenther, Frank H

    2011-01-01

    The DIVA model of speech production provides a computationally and neuroanatomically explicit account of the network of brain regions involved in speech acquisition and production. An overview of the model is provided along with descriptions of the computations performed in the different brain regions represented in the model. The latest version of the model, which contains a new right-lateralized feedback control map in ventral premotor cortex, will be described, and experimental results that motivated this new model component will be discussed. Application of the model to the study and treatment of communication disorders will also be briefly described.

  2. Wavelet Packet Entropy in Speaker-Independent Emotional State Detection from Speech Signal

    OpenAIRE

    Mina Kadkhodaei Elyaderani; Seyed Hamid Mahmoodian; Ghazaal Sheikhi

    2015-01-01

    In this paper, wavelet packet entropy is proposed for speaker-independent emotion detection from speech. After pre-processing, wavelet packet decomposition using wavelet type db3 at level 4 is calculated and Shannon entropy in its nodes is calculated to be used as feature. In addition, prosodic features such as first four formants, jitter or pitch deviation amplitude, and shimmer or energy variation amplitude besides MFCC features are applied to complete the feature vector. Then, Support Vect...

  3. Model-based prosodic analysis of charismatic speech

    DEFF Research Database (Denmark)

    Mixdorff, Hansjörg; Niebuhr, Oliver; Hönemann, Angelika

    2018-01-01

    This study examines at a new level of quantitative detail the intonation and timing properties of charismatic speech by comparing two popular CEOs, Steve Jobs and Mark Zuckerberg, who are known from informal observations and formal perception experiments alike to be more or less charismatic...

  4. Garbage Modeling for On-device Speech Recognition

    NARCIS (Netherlands)

    Van Gysel, C.; Velikovich, L.; McGraw, I.; Beaufays, F.

    2015-01-01

    User interactions with mobile devices increasingly depend on voice as a primary input modality. Due to the disadvantages of sending audio across potentially spotty network connections for speech recognition, in recent years there has been growing attention to performing recognition on-device. The

  5. The Sociolinguistic Model in Speech and Language Pathology.

    Science.gov (United States)

    Wolfram, Walt

    A discussion of the role of sociolinguistics in the treatment of communication disorders focuses on issues related to dialect and language variation. It begins with an examination of linguistic diversity and dynamic description of language, reporting on a study of speech and language pathologists' judgments of sentences in African American…

  6. Predicting automatic speech recognition performance over communication channels from instrumental speech quality and intelligibility scores

    NARCIS (Netherlands)

    Gallardo, L.F.; Möller, S.; Beerends, J.

    2017-01-01

    The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility

  7. Optimal speech motor control and token-to-token variability: a Bayesian modeling approach.

    Science.gov (United States)

    Patri, Jean-François; Diard, Julien; Perrier, Pascal

    2015-12-01

    The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.

  8. Effect of Simultaneous Bilingualism on Speech Intelligibility across Different Masker Types, Modalities, and Signal-to-Noise Ratios in School-Age Children.

    Science.gov (United States)

    Reetzke, Rachel; Lam, Boji Pak-Wing; Xie, Zilong; Sheng, Li; Chandrasekaran, Bharath

    2016-01-01

    Recognizing speech in adverse listening conditions is a significant cognitive, perceptual, and linguistic challenge, especially for children. Prior studies have yielded mixed results on the impact of bilingualism on speech perception in noise. Methodological variations across studies make it difficult to converge on a conclusion regarding the effect of bilingualism on speech-in-noise performance. Moreover, there is a dearth of speech-in-noise evidence for bilingual children who learn two languages simultaneously. The aim of the present study was to examine the extent to which various adverse listening conditions modulate differences in speech-in-noise performance between monolingual and simultaneous bilingual children. To that end, sentence recognition was assessed in twenty-four school-aged children (12 monolinguals; 12 simultaneous bilinguals, age of English acquisition ≤ 3 yrs.). We implemented a comprehensive speech-in-noise battery to examine recognition of English sentences across different modalities (audio-only, audiovisual), masker types (steady-state pink noise, two-talker babble), and a range of signal-to-noise ratios (SNRs; 0 to -16 dB). Results revealed no difference in performance between monolingual and simultaneous bilingual children across each combination of modality, masker, and SNR. Our findings suggest that when English age of acquisition and socioeconomic status is similar between groups, monolingual and bilingual children exhibit comparable speech-in-noise performance across a range of conditions analogous to everyday listening environments.

  9. Bridging computational approaches to speech production: The semantic–lexical–auditory–motor model (SLAM)

    Science.gov (United States)

    Hickok, Gregory

    2017-01-01

    Speech production is studied from both psycholinguistic and motor-control perspectives, with little interaction between the approaches. We assessed the explanatory value of integrating psycholinguistic and motor-control concepts for theories of speech production. By augmenting a popular psycholinguistic model of lexical retrieval with a motor-control-inspired architecture, we created a new computational model to explain speech errors in the context of aphasia. Comparing the model fits to picture-naming data from 255 aphasic patients, we found that our new model improves fits for a theoretically predictable subtype of aphasia: conduction. We discovered that the improved fits for this group were a result of strong auditory-lexical feedback activation, combined with weaker auditory-motor feedforward activation, leading to increased competition from phonologically related neighbors during lexical selection. We discuss the implications of our findings with respect to other extant models of lexical retrieval. PMID:26223468

  10. Wavelet Packet Entropy in Speaker-Independent Emotional State Detection from Speech Signal

    Directory of Open Access Journals (Sweden)

    Mina Kadkhodaei Elyaderani

    2015-01-01

    Full Text Available In this paper, wavelet packet entropy is proposed for speaker-independent emotion detection from speech. After pre-processing, wavelet packet decomposition using wavelet type db3 at level 4 is calculated and Shannon entropy in its nodes is calculated to be used as feature. In addition, prosodic features such as first four formants, jitter or pitch deviation amplitude, and shimmer or energy variation amplitude besides MFCC features are applied to complete the feature vector. Then, Support Vector Machine (SVM is used to classify the vectors in multi-class (all emotions or two-class (each emotion versus normal state format. 46 different utterances of a single sentence from Berlin Emotional Speech Dataset are selected. These are uttered by 10 speakers in sadness, happiness, fear, boredom, anger, and normal emotional state. Experimental results show that proposed features can improve emotional state detection accuracy in multi-class situation. Furthermore, adding to other features wavelet entropy coefficients increase the accuracy of two-class detection for anger, fear, and happiness.

  11. Toward A Dual-Learning Systems Model of Speech Category Learning

    Directory of Open Access Journals (Sweden)

    Bharath eChandrasekaran

    2014-07-01

    Full Text Available More than two decades of work in vision posits the existence of dual-learning systems of category learning. The reflective system uses working memory to develop and test rules for classifying in an explicit fashion, while the reflexive system operates by implicitly associating perception with actions that lead to reinforcement. Dual-learning systems models hypothesize that in learning natural categories, learners initially use the reflective system and, with practice, transfer control to the reflexive system. The role of reflective and reflexive systems in auditory category learning and more specifically in speech category learning has not been systematically examined. In this article we describe a neurobiologically-constrained dual-learning systems theoretical framework that is currently being developed in speech category learning and review recent applications of this framework. Using behavioral and computational modeling approaches, we provide evidence that speech category learning is predominantly mediated by the reflexive learning system. In one application, we explore the effects of normal aging on non-speech and speech category learning. We find an age related deficit in reflective-optimal but not reflexive-optimal auditory category learning. Prominently, we find a large age-related deficit in speech learning. The computational modeling suggests that older adults are less likely to transition from simple, reflective, uni-dimensional rules to more complex, reflexive, multi-dimensional rules. In a second application we summarize a recent study examining auditory category learning in individuals with elevated depressive symptoms. We find a deficit in reflective-optimal and an enhancement in reflexive-optimal auditory category learning. Interestingly, individuals with elevated depressive symptoms also show an advantage in learning speech categories. We end with a brief summary and description of a number of future directions.

  12. Modeling speech imitation and ecological learning of auditory-motor maps

    Directory of Open Access Journals (Sweden)

    Claudia eCanevari

    2013-06-01

    Full Text Available Classical models of speech consider an antero-posterior distinction between perceptive and productive functions. However, the selective alteration of neural activity in speech motor centers, via transcranial magnetic stimulation, was shown to affect speech discrimination. On the automatic speech recognition (ASR side, the recognition systems have classically relied solely on acoustic data, achieving rather good performance in optimal listening conditions. The main limitations of current ASR are mainly evident in the realistic use of such systems. These limitations can be partly reduced by using normalization strategies that minimize inter-speaker variability by either explicitly removing speakers’ peculiarities or adapting different speakers to a reference model. In this paper we aim at modeling a motor-based imitation learning mechanism in ASR. We tested the utility of a speaker normalization strategy that uses motor representations of speech and compare it with strategies that ignore the motor domain. Specifically, we first trained a regressor through state-of-the-art machine learning techniques to build an auditory-motor mapping, in a sense mimicking a human learner that tries to reproduce utterances produced by other speakers. This auditory-motor mapping maps the speech acoustics of a speaker into the motor plans of a reference speaker. Since, during recognition, only speech acoustics are available, the mapping is necessary to recover motor information. Subsequently, in a phone classification task, we tested the system on either one of the speakers that was used during training or a new one. Results show that in both cases the motor-based speaker normalization strategy almost always outperforms all other strategies where only acoustics is taken into account.

  13. Fundamental Frequency and Direction-of-Arrival Estimation for Multichannel Speech Enhancement

    DEFF Research Database (Denmark)

    Karimian-Azari, Sam

    Audio systems receive the speech signals of interest usually in the presence of noise. The noise has profound impacts on the quality and intelligibility of the speech signals, and it is therefore clear that the noisy signals must be cleaned up before being played back, stored, or analyzed. We can...... estimate the speech signal of interest from the noisy signals using a priori knowledge about it. A human speech signal is broadband and consists of both voiced and unvoiced parts. The voiced part is quasi-periodic with a time-varying fundamental frequency (or pitch as it is commonly referred to). We...... their time differences which eventually may further reduce the effects of noise. This thesis introduces a number of principles and methods to estimate periodic signals in noisy environments with application to multichannel speech enhancement. We propose model-based signal enhancement concerning the model...

  14. Modeling High-Dimensional Multichannel Brain Signals

    KAUST Repository

    Hu, Lechuan; Fortin, Norbert J.; Ombao, Hernando

    2017-01-01

    aspects: first, there are major statistical and computational challenges for modeling and analyzing high-dimensional multichannel brain signals; second, there is no set of universally agreed measures for characterizing connectivity. To model multichannel

  15. Encoding of phonology in a recurrent neural model of grounded speech

    NARCIS (Netherlands)

    Alishahi, Afra; Barking, Marie; Chrupala, Grzegorz; Levy, Roger; Specia, Lucia

    2017-01-01

    We study the representation and encoding of phonemes in a recurrent neural network model of grounded speech. We use a model which processes images and their spoken descriptions, and projects the visual and auditory representations into the same semantic space. We perform a number of analyses on how

  16. Prediction of speech masking release for fluctuating interferers based on the envelope power signal-to-noise ratio

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2012-01-01

    -hearing listeners in conditions with additive stationary noise, reverberation, and nonlinear processing with spectral subtraction. The latter condition represents a case in which the standardized speech intelligibility index and speech transmission index fail. However, the sEPSM is limited to conditions...... for the stationary and non-stationary interferers, demonstrating further that the envelope SNR is crucial for speech comprehension....

  17. Language modeling for automatic speech recognition of inflective languages an applications-oriented approach using lexical data

    CERN Document Server

    Donaj, Gregor

    2017-01-01

    This book covers language modeling and automatic speech recognition for inflective languages (e.g. Slavic languages), which represent roughly half of the languages spoken in Europe. These languages do not perform as well as English in speech recognition systems and it is therefore harder to develop an application with sufficient quality for the end user. The authors describe the most important language features for the development of a speech recognition system. This is then presented through the analysis of errors in the system and the development of language models and their inclusion in speech recognition systems, which specifically address the errors that are relevant for targeted applications. The error analysis is done with regard to morphological characteristics of the word in the recognized sentences. The book is oriented towards speech recognition with large vocabularies and continuous and even spontaneous speech. Today such applications work with a rather small number of languages compared to the nu...

  18. A Novel DBN Feature Fusion Model for Cross-Corpus Speech Emotion Recognition

    Directory of Open Access Journals (Sweden)

    Zou Cairong

    2016-01-01

    Full Text Available The feature fusion from separate source is the current technical difficulties of cross-corpus speech emotion recognition. The purpose of this paper is to, based on Deep Belief Nets (DBN in Deep Learning, use the emotional information hiding in speech spectrum diagram (spectrogram as image features and then implement feature fusion with the traditional emotion features. First, based on the spectrogram analysis by STB/Itti model, the new spectrogram features are extracted from the color, the brightness, and the orientation, respectively; then using two alternative DBN models they fuse the traditional and the spectrogram features, which increase the scale of the feature subset and the characterization ability of emotion. Through the experiment on ABC database and Chinese corpora, the new feature subset compared with traditional speech emotion features, the recognition result on cross-corpus, distinctly advances by 8.8%. The method proposed provides a new idea for feature fusion of emotion recognition.

  19. A predictive model for diagnosing stroke-related apraxia of speech.

    Science.gov (United States)

    Ballard, Kirrie J; Azizi, Lamiae; Duffy, Joseph R; McNeil, Malcolm R; Halaki, Mark; O'Dwyer, Nicholas; Layfield, Claire; Scholl, Dominique I; Vogel, Adam P; Robin, Donald A

    2016-01-29

    Diagnosis of the speech motor planning/programming disorder, apraxia of speech (AOS), has proven challenging, largely due to its common co-occurrence with the language-based impairment of aphasia. Currently, diagnosis is based on perceptually identifying and rating the severity of several speech features. It is not known whether all, or a subset of the features, are required for a positive diagnosis. The purpose of this study was to assess predictor variables for the presence of AOS after left-hemisphere stroke, with the goal of increasing diagnostic objectivity and efficiency. This population-based case-control study involved a sample of 72 cases, using the outcome measure of expert judgment on presence of AOS and including a large number of independently collected candidate predictors representing behavioral measures of linguistic, cognitive, nonspeech oral motor, and speech motor ability. We constructed a predictive model using multiple imputation to deal with missing data; the Least Absolute Shrinkage and Selection Operator (Lasso) technique for variable selection to define the most relevant predictors, and bootstrapping to check the model stability and quantify the optimism of the developed model. Two measures were sufficient to distinguish between participants with AOS plus aphasia and those with aphasia alone, (1) a measure of speech errors with words of increasing length and (2) a measure of relative vowel duration in three-syllable words with weak-strong stress pattern (e.g., banana, potato). The model has high discriminative ability to distinguish between cases with and without AOS (c-index=0.93) and good agreement between observed and predicted probabilities (calibration slope=0.94). Some caution is warranted, given the relatively small sample specific to left-hemisphere stroke, and the limitations of imputing missing data. These two speech measures are straightforward to collect and analyse, facilitating use in research and clinical settings. Copyright

  20. Benefits to Speech Perception in Noise From the Binaural Integration of Electric and Acoustic Signals in Simulated Unilateral Deafness.

    Science.gov (United States)

    Ma, Ning; Morris, Saffron; Kitterick, Pádraig Thomas

    2016-01-01

    This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. Eight NH volunteers participated in the study and listened to sentences embedded in background noise via headphones. Stimuli presented to the left ear were unprocessed. Stimuli presented to the right ear (referred to as the CI-simulation ear) were processed using an eight-channel noise vocoder with one of the three processing strategies. An Ideal strategy simulated a frequency-to-place map across all channels that matched the delivery of spectral information between the ears. A Realistic strategy created a misalignment in the mapping of frequency to place in the CI-simulation ear where the size of the mismatch between the ears varied across channels. Finally, a Shifted strategy imposed a similar degree of misalignment in all channels, resulting in consistent mismatch between the ears across frequency. The ability to report key words in sentences was assessed under monaural and binaural listening conditions and at signal to noise ratios (SNRs) established by estimating speech-reception thresholds in each ear alone. The SNRs ensured that the monaural performance of the left ear never exceeded that of the CI-simulation ear. The advantages of binaural integration were calculated by comparing binaural performance with monaural performance using the CI-simulation ear alone. Thus, these advantages reflected the additional use of the experimentally constrained left ear and were not attributable to better-ear listening. Binaural performance was as accurate as, or more accurate than, monaural performance with the CI-simulation ear alone. When both ears supported a similar level of monaural

  1. A Hybrid Acoustic and Pronunciation Model Adaptation Approach for Non-native Speech Recognition

    Science.gov (United States)

    Oh, Yoo Rhee; Kim, Hong Kook

    In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or triphone-modeling level, depending at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts pronunciation models and acoustic models by accommodating the pronunciation variants in the pronunciation dictionary and by clustering the states of triphone acoustic models using the acoustic variants, respectively. On the other hand, the triphone-modeling level hybrid method initially adapts pronunciation models in the same way as in the state-tying level hybrid method; however, for the acoustic model adaptation, the triphone acoustic models are then re-estimated based on the adapted pronunciation models and the states of the re-estimated triphone acoustic models are clustered using the acoustic variants. From the Korean-spoken English speech recognition experiments, it is shown that ASR systems employing the state-tying and triphone-modeling level adaptation methods can relatively reduce the average word error rates (WERs) by 17.1% and 22.1% for non-native speech, respectively, when compared to a baseline ASR system.

  2. Discrete dynamic modeling of cellular signaling networks.

    Science.gov (United States)

    Albert, Réka; Wang, Rui-Sheng

    2009-01-01

    Understanding signal transduction in cellular systems is a central issue in systems biology. Numerous experiments from different laboratories generate an abundance of individual components and causal interactions mediating environmental and developmental signals. However, for many signal transduction systems there is insufficient information on the overall structure and the molecular mechanisms involved in the signaling network. Moreover, lack of kinetic and temporal information makes it difficult to construct quantitative models of signal transduction pathways. Discrete dynamic modeling, combined with network analysis, provides an effective way to integrate fragmentary knowledge of regulatory interactions into a predictive mathematical model which is able to describe the time evolution of the system without the requirement for kinetic parameters. This chapter introduces the fundamental concepts of discrete dynamic modeling, particularly focusing on Boolean dynamic models. We describe this method step-by-step in the context of cellular signaling networks. Several variants of Boolean dynamic models including threshold Boolean networks and piecewise linear systems are also covered, followed by two examples of successful application of discrete dynamic modeling in cell biology.

  3. A Neuro-Linguistic Model for Speech Recognition in Tone Language

    African Journals Online (AJOL)

    The primary aim for this work is to develop a speech recognition system that exploits the computational paradigm with learning ability and the inherent robustness and parallelism in ANN coupled with the capability of fuzzy logic to model vagueness, handling uncertainness and support for human reasoning. This research ...

  4. A discourse model of affect for text-to-speech synthesis

    CSIR Research Space (South Africa)

    Schlunz, GI

    2013-12-01

    Full Text Available This paper introduces a model of affect to improve prosody in text-to-speech synthesis. It operates on the discourse level of text to predict the underlying linguistic factors that contribute towards emotional appraisal, rather than any particular...

  5. A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech.

    Directory of Open Access Journals (Sweden)

    John F Magnotti

    2017-02-01

    Full Text Available Audiovisual speech integration combines information from auditory speech (talker's voice and visual speech (talker's mouth movements to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory "ba" + visual "ga" (AbaVga, that are integrated to produce a fused percept ("da". This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba. We describe a simplified model of causal inference in multisensory speech perception (CIMS that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others.

  6. Regression models of reactor diagnostic signals

    International Nuclear Information System (INIS)

    Vavrin, J.

    1989-01-01

    The application is described of an autoregression model as the simplest regression model of diagnostic signals in experimental analysis of diagnostic systems, in in-service monitoring of normal and anomalous conditions and their diagnostics. The method of diagnostics is described using a regression type diagnostic data base and regression spectral diagnostics. The diagnostics is described of neutron noise signals from anomalous modes in the experimental fuel assembly of a reactor. (author)

  7. MASCOTTE: analytical model of eddy current signals

    International Nuclear Information System (INIS)

    Delsarte, G.; Levy, R.

    1992-01-01

    Tube examination is a major application of the eddy current technique in the nuclear and petrochemical industries. Such examination configurations being specially adapted to analytical modes, a physical model is developed on portable computers. It includes simple approximations made possible by the effective conditions of the examinations. The eddy current signal is described by an analytical formulation that takes into account the tube dimensions, the sensor conception, the physical characteristics of the defect and the examination parameters. Moreover, the model makes it possible to associate real signals and simulated signals

  8. Improving Language Models in Speech-Based Human-Machine Interaction

    Directory of Open Access Journals (Sweden)

    Raquel Justo

    2013-02-01

    Full Text Available This work focuses on speech-based human-machine interaction. Specifically, a Spoken Dialogue System (SDS that could be integrated into a robot is considered. Since Automatic Speech Recognition is one of the most sensitive tasks that must be confronted in such systems, the goal of this work is to improve the results obtained by this specific module. In order to do so, a hierarchical Language Model (LM is considered. Different series of experiments were carried out using the proposed models over different corpora and tasks. The results obtained show that these models provide greater accuracy in the recognition task. Additionally, the influence of the Acoustic Modelling (AM in the improvement percentage of the Language Models has also been explored. Finally the use of hierarchical Language Models in a language understanding task has been successfully employed, as shown in an additional series of experiments.

  9. Clear Speech - Mere Speech? How segmental and prosodic speech reduction shape the impression that speakers create on listeners

    DEFF Research Database (Denmark)

    Niebuhr, Oliver

    2017-01-01

    of reduction levels and perceived speaker attributes in which moderate reduction can make a better impression on listeners than no reduction. In addition to its relevance in reduction models and theories, this interplay is instructive for various fields of speech application from social robotics to charisma...... whether variation in the degree of reduction also has a systematic effect on the attributes we ascribe to the speaker who produces the speech signal. A perception experiment was carried out for German in which 46 listeners judged whether or not speakers showing 3 different combinations of segmental...... and prosodic reduction levels (unreduced, moderately reduced, strongly reduced) are appropriately described by 13 physical, social, and cognitive attributes. The experiment shows that clear speech is not mere speech, and less clear speech is not just reduced either. Rather, results revealed a complex interplay...

  10. Treatment Model in Children with Speech Disorders and Its Therapeutic Efficiency

    Directory of Open Access Journals (Sweden)

    Barberena, Luciana

    2014-05-01

    Full Text Available Introduction Speech articulation disorders affect the intelligibility of speech. Studies on therapeutic models show the effectiveness of the communication treatment. Objective To analyze the progress achieved by treatment with the ABAB—Withdrawal and Multiple Probes Model in children with different degrees of phonological disorders. Methods The diagnosis of speech articulation disorder was determined by speech and hearing evaluation and complementary tests. The subjects of this research were eight children, with the average age of 5:5. The children were distributed into four groups according to the degrees of the phonological disorders, based on the percentage of correct consonants, as follows: severe, moderate to severe, mild to moderate, and mild. The phonological treatment applied was the ABAB—Withdrawal and Multiple Probes Model. The development of the therapy by generalization was observed through the comparison between the two analyses: contrastive and distinctive features at the moment of evaluation and reevaluation. Results The following types of generalization were found: to the items not used in the treatment (other words, to another position in the word, within a sound class, to other classes of sounds, and to another syllable structure. Conclusion The different types of generalization studied showed the expansion of production and proper use of therapy-trained targets in other contexts or untrained environments. Therefore, the analysis of the generalizations proved to be an important criterion to measure the therapeutic efficacy.

  11. Study of the vocal signal in the amplitude-time representation. Speech segmentation and recognition algorithms

    International Nuclear Information System (INIS)

    Baudry, Marc

    1978-01-01

    This dissertation exposes an acoustical and phonetical study of vocal signal. The complex pattern of the signal is segmented into simple sub-patterns and each one of these sub-patterns may be segmented again into another more simplest patterns with lower level. Application of pattern recognition techniques facilitates on one hand this segmentation and on the other hand the definition of the structural relations between the sub-patterns. Particularly, we have developed syntactic techniques in which the rewriting rules, context-sensitive, are controlled by predicates using parameters evaluated on the sub-patterns themselves. This allow to generalize a pure syntactic analysis by adding a semantic information. The system we expose, realizes pre-classification and a partial identification of the phonemes as also the accurate detection of each pitch period. The voice signal is analysed directly using the amplitude-time representation. This system has been implemented on a mini-computer and it works in the real time. (author) [fr

  12. The early maximum likelihood estimation model of audiovisual integration in speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2015-01-01

    integration to speech perception along with three model variations. In early MLE, integration is based on a continuous internal representation before categorization, which can make the model more parsimonious by imposing constraints that reflect experimental designs. The study also shows that cross......Speech perception is facilitated by seeing the articulatory mouth movements of the talker. This is due to perceptual audiovisual integration, which also causes the McGurk−MacDonald illusion, and for which a comprehensive computational account is still lacking. Decades of research have largely......-validation can evaluate models of audiovisual integration based on typical data sets taking both goodness-of-fit and model flexibility into account. All models were tested on a published data set previously used for testing the FLMP. Cross-validation favored the early MLE while more conventional error measures...

  13. Instantaneous Fundamental Frequency Estimation with Optimal Segmentation for Nonstationary Voiced Speech

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2016-01-01

    In speech processing, the speech is often considered stationary within segments of 20–30 ms even though it is well known not to be true. In this paper, we take the non-stationarity of voiced speech into account by using a linear chirp model to describe the speech signal. We propose a maximum...... likelihood estimator of the fundamental frequency and chirp rate of this model, and show that it reaches the Cramer-Rao bound. Since the speech varies over time, a fixed segment length is not optimal, and we propose to make a segmentation of the signal based on the maximum a posteriori (MAP) criterion. Using...... of the chirp model than the harmonic model to the speech signal. The methods are based on an assumption of white Gaussian noise, and, therefore, two prewhitening filters are also proposed....

  14. Multilevel Analysis in Analyzing Speech Data

    Science.gov (United States)

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  15. Clinical and MRI models predicting amyloid deposition in progressive aphasia and apraxia of speech.

    Science.gov (United States)

    Whitwell, Jennifer L; Weigand, Stephen D; Duffy, Joseph R; Strand, Edythe A; Machulda, Mary M; Senjem, Matthew L; Gunter, Jeffrey L; Lowe, Val J; Jack, Clifford R; Josephs, Keith A

    2016-01-01

    Beta-amyloid (Aβ) deposition can be observed in primary progressive aphasia (PPA) and progressive apraxia of speech (PAOS). While it is typically associated with logopenic PPA, there are exceptions that make predicting Aβ status challenging based on clinical diagnosis alone. We aimed to determine whether MRI regional volumes or clinical data could help predict Aβ deposition. One hundred and thirty-nine PPA (n = 97; 15 agrammatic, 53 logopenic, 13 semantic and 16 unclassified) and PAOS (n = 42) subjects were prospectively recruited into a cross-sectional study and underwent speech/language assessments, 3.0 T MRI and C11-Pittsburgh Compound B PET. The presence of Aβ was determined using a 1.5 SUVR cut-point. Atlas-based parcellation was used to calculate gray matter volumes of 42 regions-of-interest across the brain. Penalized binary logistic regression was utilized to determine what combination of MRI regions, and what combination of speech and language tests, best predicts Aβ (+) status. The optimal MRI model and optimal clinical model both performed comparably in their ability to accurately classify subjects according to Aβ status. MRI accurately classified 81% of subjects using 14 regions. Small left superior temporal and inferior parietal volumes and large left Broca's area volumes were particularly predictive of Aβ (+) status. Clinical scores accurately classified 83% of subjects using 12 tests. Phonological errors and repetition deficits, and absence of agrammatism and motor speech deficits were particularly predictive of Aβ (+) status. In comparison, clinical diagnosis was able to accurately classify 89% of subjects. However, the MRI model performed well in predicting Aβ deposition in unclassified PPA. Clinical diagnosis provides optimum prediction of Aβ status at the group level, although regional MRI measurements and speech and language testing also performed well and could have advantages in predicting Aβ status in unclassified PPA subjects.

  16. Clinical and MRI models predicting amyloid deposition in progressive aphasia and apraxia of speech

    Directory of Open Access Journals (Sweden)

    Jennifer L. Whitwell

    2016-01-01

    Full Text Available Beta-amyloid (Aβ deposition can be observed in primary progressive aphasia (PPA and progressive apraxia of speech (PAOS. While it is typically associated with logopenic PPA, there are exceptions that make predicting Aβ status challenging based on clinical diagnosis alone. We aimed to determine whether MRI regional volumes or clinical data could help predict Aβ deposition. One hundred and thirty-nine PPA (n = 97; 15 agrammatic, 53 logopenic, 13 semantic and 16 unclassified and PAOS (n = 42 subjects were prospectively recruited into a cross-sectional study and underwent speech/language assessments, 3.0 T MRI and C11-Pittsburgh Compound B PET. The presence of Aβ was determined using a 1.5 SUVR cut-point. Atlas-based parcellation was used to calculate gray matter volumes of 42 regions-of-interest across the brain. Penalized binary logistic regression was utilized to determine what combination of MRI regions, and what combination of speech and language tests, best predicts Aβ (+ status. The optimal MRI model and optimal clinical model both performed comparably in their ability to accurately classify subjects according to Aβ status. MRI accurately classified 81% of subjects using 14 regions. Small left superior temporal and inferior parietal volumes and large left Broca's area volumes were particularly predictive of Aβ (+ status. Clinical scores accurately classified 83% of subjects using 12 tests. Phonological errors and repetition deficits, and absence of agrammatism and motor speech deficits were particularly predictive of Aβ (+ status. In comparison, clinical diagnosis was able to accurately classify 89% of subjects. However, the MRI model performed well in predicting Aβ deposition in unclassified PPA. Clinical diagnosis provides optimum prediction of Aβ status at the group level, although regional MRI measurements and speech and language testing also performed well and could have advantages in predicting Aβ status in unclassified

  17. Ear, Hearing and Speech

    DEFF Research Database (Denmark)

    Poulsen, Torben

    2000-01-01

    An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)......An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)...

  18. Learning sparse generative models of audiovisual signals

    OpenAIRE

    Monaci, Gianluca; Sommer, Friedrich T.; Vandergheynst, Pierre

    2008-01-01

    This paper presents a novel framework to learn sparse represen- tations for audiovisual signals. An audiovisual signal is modeled as a sparse sum of audiovisual kernels. The kernels are bimodal functions made of synchronous audio and video components that can be positioned independently and arbitrarily in space and time. We design an algorithm capable of learning sets of such audiovi- sual, synchronous, shift-invariant functions by alternatingly solving a coding and a learning pr...

  19. Proposal for Classifying the Severity of Speech Disorder Using a Fuzzy Model in Accordance with the Implicational Model of Feature Complexity

    Science.gov (United States)

    Brancalioni, Ana Rita; Magnago, Karine Faverzani; Keske-Soares, Marcia

    2012-01-01

    The objective of this study is to create a new proposal for classifying the severity of speech disorders using a fuzzy model in accordance with a linguistic model that represents the speech acquisition of Brazilian Portuguese. The fuzzy linguistic model was run in the MATLAB software fuzzy toolbox from a set of fuzzy rules, and it encompassed…

  20. The speech-based envelope power spectrum model (sEPSM) family: Development, achievements, and current challenges

    DEFF Research Database (Denmark)

    Relano-Iborra, Helia; Chabot-Leclerc, Alexandre; Scheidiger, Christoph

    2017-01-01

    have extended the predictive power of the original model to a broad range of conditions. This contribution presents the most recent developments within the sEPSM “family:” (i) A binaural extension, the B-sEPSM [Chabot-Leclerc et al. (2016). J. Acoust. Soc. Am. 140(1), 192-205] which combines better......Intelligibility models provide insights regarding the effects of target speech characteristics, transmission channels and/or auditory processing on the speech perception performance of listeners. In 2011, Jørgensen and Dau proposed the speech-based envelope power spectrum model [sEPSM, Jørgensen...

  1. Randomized controlled trial of video self-modeling following speech restructuring treatment for stuttering.

    Science.gov (United States)

    Cream, Angela; O'Brian, Sue; Jones, Mark; Block, Susan; Harrison, Elisabeth; Lincoln, Michelle; Hewat, Sally; Packman, Ann; Menzies, Ross; Onslow, Mark

    2010-08-01

    In this study, the authors investigated the efficacy of video self-modeling (VSM) following speech restructuring treatment to improve the maintenance of treatment effects. The design was an open-plan, parallel-group, randomized controlled trial. Participants were 89 adults and adolescents who undertook intensive speech restructuring treatment. Post treatment, participants were randomly assigned to 2 trial arms: standard maintenance and standard maintenance plus VSM. Participants in the latter arm viewed stutter-free videos of themselves each day for 1 month. The addition of VSM did not improve speech outcomes, as measured by percent syllables stuttered, at either 1 or 6 months postrandomization. However, at the latter assessment, self-rating of worst stuttering severity by the VSM group was 10% better than that of the control group, and satisfaction with speech fluency was 20% better. Quality of life was also better for the VSM group, which was mildly to moderately impaired compared with moderate impairment in the control group. VSM intervention after treatment was associated with improvements in self-reported outcomes. The clinical implications of this finding are discussed.

  2. Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering

    Directory of Open Access Journals (Sweden)

    M. H. Savoji

    2014-09-01

    Full Text Available Gaussian Mixture Models (GMMs of power spectral densities of speech and noise are used with explicit Bayesian estimations in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined system of equations whose solutions lead to the first estimates of speech and noise power spectra. The noise source is also identified and the input SNR estimated in this first step. These first estimates are then refined using approximate but explicit MMSE and MAP estimation formulations. The refined estimates are then used in a Wiener filter to reduce noise and enhance the noisy speech. The proposed schemes show good results. Nevertheless, it is shown that the MAP explicit solution, introduced here for the first time, reduces the computation time to less than one third with a slight higher improvement in SNR and PESQ score and also less distortion in comparison to the MMSE solution.

  3. Emergence of language structures from exposure to visually grounded speech signal

    NARCIS (Netherlands)

    Chrupala, Grzegorz; Alishahi, Afra; Gelderloos, Lieke

    2017-01-01

    A variety of computational models can learn meanings of words and sentences from exposure to word sequences coupled with the perceptual context in which they occur. More recently, neural network models have been applied to more naturalistic and more challenging versions of this problem: for example

  4. Mapping Speech Spectra from Throat Microphone to Close-Speaking Microphone: A Neural Network Approach

    Directory of Open Access Journals (Sweden)

    B. Yegnanarayana

    2007-01-01

    Full Text Available Speech recorded from a throat microphone is robust to the surrounding noise, but sounds unnatural unlike the speech recorded from a close-speaking microphone. This paper addresses the issue of improving the perceptual quality of the throat microphone speech by mapping the speech spectra from the throat microphone to the close-speaking microphone. A neural network model is used to capture the speaker-dependent functional relationship between the feature vectors (cepstral coefficients of the two speech signals. A method is proposed to ensure the stability of the all-pole synthesis filter. Objective evaluations indicate the effectiveness of the proposed mapping scheme. The advantage of this method is that the model gives a smooth estimate of the spectra of the close-speaking microphone speech. No distortions are perceived in the reconstructed speech. This mapping technique is also used for bandwidth extension of telephone speech.

  5. Computer models of vocal tract evolution: an overview and critique

    NARCIS (Netherlands)

    de Boer, B.; Fitch, W. T.

    2010-01-01

    Human speech has been investigated with computer models since the invention of digital computers, and models of the evolution of speech first appeared in the late 1960s and early 1970s. Speech science and computer models have a long shared history because speech is a physical signal and can be

  6. Bandwidth Extension of Telephone Speech Aided by Data Embedding

    Directory of Open Access Journals (Sweden)

    Sagi Ariel

    2007-01-01

    Full Text Available A system for bandwidth extension of telephone speech, aided by data embedding, is presented. The proposed system uses the transmitted analog narrowband speech signal as a carrier of the side information needed to carry out the bandwidth extension. The upper band of the wideband speech is reconstructed at the receiving end from two components: a synthetic wideband excitation signal, generated from the narrowband telephone speech and a wideband spectral envelope, parametrically represented and transmitted as embedded data in the telephone speech. We propose a novel data embedding scheme, in which the scalar Costa scheme is combined with an auditory masking model allowing high rate transparent embedding, while maintaining a low bit error rate. The signal is transformed to the frequency domain via the discrete Hartley transform (DHT and is partitioned into subbands. Data is embedded in an adaptively chosen subset of subbands by modifying the DHT coefficients. In our simulations, high quality wideband speech was obtained from speech transmitted over a telephone line (characterized by spectral magnitude distortion, dispersion, and noise, in which side information data is transparently embedded at the rate of 600 information bits/second and with a bit error rate of approximately . In a listening test, the reconstructed wideband speech was preferred (at different degrees over conventional telephone speech in of the test utterances.

  7. Bandwidth Extension of Telephone Speech Aided by Data Embedding

    Directory of Open Access Journals (Sweden)

    David Malah

    2007-01-01

    Full Text Available A system for bandwidth extension of telephone speech, aided by data embedding, is presented. The proposed system uses the transmitted analog narrowband speech signal as a carrier of the side information needed to carry out the bandwidth extension. The upper band of the wideband speech is reconstructed at the receiving end from two components: a synthetic wideband excitation signal, generated from the narrowband telephone speech and a wideband spectral envelope, parametrically represented and transmitted as embedded data in the telephone speech. We propose a novel data embedding scheme, in which the scalar Costa scheme is combined with an auditory masking model allowing high rate transparent embedding, while maintaining a low bit error rate. The signal is transformed to the frequency domain via the discrete Hartley transform (DHT and is partitioned into subbands. Data is embedded in an adaptively chosen subset of subbands by modifying the DHT coefficients. In our simulations, high quality wideband speech was obtained from speech transmitted over a telephone line (characterized by spectral magnitude distortion, dispersion, and noise, in which side information data is transparently embedded at the rate of 600 information bits/second and with a bit error rate of approximately 3⋅10−4. In a listening test, the reconstructed wideband speech was preferred (at different degrees over conventional telephone speech in 92.5% of the test utterances.

  8. Hybrid model decomposition of speech and noise in a radial basis function neural model framework

    DEFF Research Database (Denmark)

    Sørensen, Helge Bjarup Dissing; Hartmann, Uwe

    1994-01-01

    The aim of the paper is to focus on a new approach to automatic speech recognition in noisy environments where the noise has either stationary or non-stationary statistical characteristics. The aim is to perform automatic recognition of speech in the presence of additive car noise. The technique...

  9. Syntactic error modeling and scoring normalization in speech recognition: Error modeling and scoring normalization in the speech recognition task for adult literacy training

    Science.gov (United States)

    Olorenshaw, Lex; Trawick, David

    1991-01-01

    The purpose was to develop a speech recognition system to be able to detect speech which is pronounced incorrectly, given that the text of the spoken speech is known to the recognizer. Better mechanisms are provided for using speech recognition in a literacy tutor application. Using a combination of scoring normalization techniques and cheater-mode decoding, a reasonable acceptance/rejection threshold was provided. In continuous speech, the system was tested to be able to provide above 80 pct. correct acceptance of words, while correctly rejecting over 80 pct. of incorrectly pronounced words.

  10. Patterns of flavor signals in supersymmetric models

    Energy Technology Data Exchange (ETDEWEB)

    Goto, T. [KEK National High Energy Physics, Tsukuba (Japan)]|[Kyoto Univ. (Japan). YITP; Okada, Y. [KEK National High Energy Physics, Tsukuba (Japan)]|[Graduate Univ. for Advanced Studies, Tsukuba (Japan). Dept. of Particle and Nucelar Physics; Shindou, T. [Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany)]|[International School for Advanced Studies, Trieste (Italy); Tanaka, M. [Osaka Univ., Toyonaka (Japan). Dept. of Physics

    2007-11-15

    Quark and lepton flavor signals are studied in four supersymmetric models, namely the minimal supergravity model, the minimal supersymmetric standard model with right-handed neutrinos, SU(5) supersymmetric grand unified theory with right-handed neutrinos and the minimal supersymmetric standard model with U(2) flavor symmetry. We calculate b{yields}s(d) transition observables in B{sub d} and B{sub s} decays, taking the constraint from the B{sub s}- anti B{sub s} mixing recently observed at Tevatron into account. We also calculate lepton flavor violating processes {mu} {yields} e{gamma}, {tau} {yields} {mu}{gamma} and {tau} {yields} e{gamma} for the models with right-handed neutrinos. We investigate possibilities to distinguish the flavor structure of the supersymmetry breaking sector with use of patterns of various flavor signals which are expected to be measured in experiments such as MEG, LHCb and a future Super B Factory. (orig.)

  11. Patterns of flavor signals in supersymmetric models

    International Nuclear Information System (INIS)

    Goto, T.; Tanaka, M.

    2007-11-01

    Quark and lepton flavor signals are studied in four supersymmetric models, namely the minimal supergravity model, the minimal supersymmetric standard model with right-handed neutrinos, SU(5) supersymmetric grand unified theory with right-handed neutrinos and the minimal supersymmetric standard model with U(2) flavor symmetry. We calculate b→s(d) transition observables in B d and B s decays, taking the constraint from the B s - anti B s mixing recently observed at Tevatron into account. We also calculate lepton flavor violating processes μ → eγ, τ → μγ and τ → eγ for the models with right-handed neutrinos. We investigate possibilities to distinguish the flavor structure of the supersymmetry breaking sector with use of patterns of various flavor signals which are expected to be measured in experiments such as MEG, LHCb and a future Super B Factory. (orig.)

  12. Robust Transmission of Speech LSFs Using Hidden Markov Model-Based Multiple Description Index Assignments

    Directory of Open Access Journals (Sweden)

    Pradeepa Yahampath

    2008-03-01

    Full Text Available Speech coding techniques capable of generating encoded representations which are robust against channel losses play an important role in enabling reliable voice communication over packet networks and mobile wireless systems. In this paper, we investigate the use of multiple description index assignments (MDIAs for loss-tolerant transmission of line spectral frequency (LSF coefficients, typically generated by state-of-the-art speech coders. We propose a simulated annealing-based approach for optimizing MDIAs for Markov-model-based decoders which exploit inter- and intraframe correlations in LSF coefficients to reconstruct the quantized LSFs from coded bit streams corrupted by channel losses. Experimental results are presented which compare the performance of a number of novel LSF transmission schemes. These results clearly demonstrate that Markov-model-based decoders, when used in conjunction with optimized MDIA, can yield average spectral distortion much lower than that produced by methods such as interleaving/interpolation, commonly used to combat the packet losses.

  13. Robust Transmission of Speech LSFs Using Hidden Markov Model-Based Multiple Description Index Assignments

    Directory of Open Access Journals (Sweden)

    Rondeau Paul

    2008-01-01

    Full Text Available Speech coding techniques capable of generating encoded representations which are robust against channel losses play an important role in enabling reliable voice communication over packet networks and mobile wireless systems. In this paper, we investigate the use of multiple description index assignments (MDIAs for loss-tolerant transmission of line spectral frequency (LSF coefficients, typically generated by state-of-the-art speech coders. We propose a simulated annealing-based approach for optimizing MDIAs for Markov-model-based decoders which exploit inter- and intraframe correlations in LSF coefficients to reconstruct the quantized LSFs from coded bit streams corrupted by channel losses. Experimental results are presented which compare the performance of a number of novel LSF transmission schemes. These results clearly demonstrate that Markov-model-based decoders, when used in conjunction with optimized MDIA, can yield average spectral distortion much lower than that produced by methods such as interleaving/interpolation, commonly used to combat the packet losses.

  14. Mathematical Models Light Up Plant Signaling

    NARCIS (Netherlands)

    Chew, Y.H.; Smith, R.W.; Jones, H.J.; Seaton, D.D.; Grima, R.; Halliday, K.J.

    2014-01-01

    Plants respond to changes in the environment by triggering a suite of regulatory networks that control and synchronize molecular signaling in different tissues, organs, and the whole plant. Molecular studies through genetic and environmental perturbations, particularly in the model plant Arabidopsis

  15. Speech intelligibility for normal hearing and hearing-impaired listeners in simulated room acoustic conditions

    DEFF Research Database (Denmark)

    Arweiler, Iris; Dau, Torsten; Poulsen, Torben

    Speech intelligibility depends on many factors such as room acoustics, the acoustical properties and location of the signal and the interferers, and the ability of the (normal and impaired) auditory system to process monaural and binaural sounds. In the present study, the effect of reverberation...... on spatial release from masking was investigated in normal hearing and hearing impaired listeners using three types of interferers: speech shaped noise, an interfering female talker and speech-modulated noise. Speech reception thresholds (SRT) were obtained in three simulated environments: a listening room......, a classroom and a church. The data from the study provide constraints for existing models of speech intelligibility prediction (based on the speech intelligibility index, SII, or the speech transmission index, STI) which have shortcomings when reverberation and/or fluctuating noise affect speech...

  16. A Support Vector Machine-Based Gender Identification Using Speech Signal

    Science.gov (United States)

    Lee, Kye-Hwan; Kang, Sang-Ick; Kim, Deok-Hwan; Chang, Joon-Hyuk

    We propose an effective voice-based gender identification method using a support vector machine (SVM). The SVM is a binary classification algorithm that classifies two groups by finding the voluntary nonlinear boundary in a feature space and is known to yield high classification performance. In the present work, we compare the identification performance of the SVM with that of a Gaussian mixture model (GMM)-based method using the mel frequency cepstral coefficients (MFCC). A novel approach of incorporating a features fusion scheme based on a combination of the MFCC and the fundamental frequency is proposed with the aim of improving the performance of gender identification. Experimental results demonstrate that the gender identification performance using the SVM is significantly better than that of the GMM-based scheme. Moreover, the performance is substantially improved when the proposed features fusion technique is applied.

  17. Modeling high dimensional multichannel brain signals

    KAUST Repository

    Hu, Lechuan

    2017-03-27

    In this paper, our goal is to model functional and effective (directional) connectivity in network of multichannel brain physiological signals (e.g., electroencephalograms, local field potentials). The primary challenges here are twofold: first, there are major statistical and computational difficulties for modeling and analyzing high dimensional multichannel brain signals; second, there is no set of universally-agreed measures for characterizing connectivity. To model multichannel brain signals, our approach is to fit a vector autoregressive (VAR) model with sufficiently high order so that complex lead-lag temporal dynamics between the channels can be accurately characterized. However, such a model contains a large number of parameters. Thus, we will estimate the high dimensional VAR parameter space by our proposed hybrid LASSLE method (LASSO+LSE) which is imposes regularization on the first step (to control for sparsity) and constrained least squares estimation on the second step (to improve bias and mean-squared error of the estimator). Then to characterize connectivity between channels in a brain network, we will use various measures but put an emphasis on partial directed coherence (PDC) in order to capture directional connectivity between channels. PDC is a directed frequency-specific measure that explains the extent to which the present oscillatory activity in a sender channel influences the future oscillatory activity in a specific receiver channel relative all possible receivers in the network. Using the proposed modeling approach, we have achieved some insights on learning in a rat engaged in a non-spatial memory task.

  18. Modeling High-Dimensional Multichannel Brain Signals

    KAUST Repository

    Hu, Lechuan

    2017-12-12

    Our goal is to model and measure functional and effective (directional) connectivity in multichannel brain physiological signals (e.g., electroencephalograms, local field potentials). The difficulties from analyzing these data mainly come from two aspects: first, there are major statistical and computational challenges for modeling and analyzing high-dimensional multichannel brain signals; second, there is no set of universally agreed measures for characterizing connectivity. To model multichannel brain signals, our approach is to fit a vector autoregressive (VAR) model with potentially high lag order so that complex lead-lag temporal dynamics between the channels can be captured. Estimates of the VAR model will be obtained by our proposed hybrid LASSLE (LASSO + LSE) method which combines regularization (to control for sparsity) and least squares estimation (to improve bias and mean-squared error). Then we employ some measures of connectivity but put an emphasis on partial directed coherence (PDC) which can capture the directional connectivity between channels. PDC is a frequency-specific measure that explains the extent to which the present oscillatory activity in a sender channel influences the future oscillatory activity in a specific receiver channel relative to all possible receivers in the network. The proposed modeling approach provided key insights into potential functional relationships among simultaneously recorded sites during performance of a complex memory task. Specifically, this novel method was successful in quantifying patterns of effective connectivity across electrode locations, and in capturing how these patterns varied across trial epochs and trial types.

  19. Modeling high dimensional multichannel brain signals

    KAUST Repository

    Hu, Lechuan; Fortin, Norbert; Ombao, Hernando

    2017-01-01

    In this paper, our goal is to model functional and effective (directional) connectivity in network of multichannel brain physiological signals (e.g., electroencephalograms, local field potentials). The primary challenges here are twofold: first, there are major statistical and computational difficulties for modeling and analyzing high dimensional multichannel brain signals; second, there is no set of universally-agreed measures for characterizing connectivity. To model multichannel brain signals, our approach is to fit a vector autoregressive (VAR) model with sufficiently high order so that complex lead-lag temporal dynamics between the channels can be accurately characterized. However, such a model contains a large number of parameters. Thus, we will estimate the high dimensional VAR parameter space by our proposed hybrid LASSLE method (LASSO+LSE) which is imposes regularization on the first step (to control for sparsity) and constrained least squares estimation on the second step (to improve bias and mean-squared error of the estimator). Then to characterize connectivity between channels in a brain network, we will use various measures but put an emphasis on partial directed coherence (PDC) in order to capture directional connectivity between channels. PDC is a directed frequency-specific measure that explains the extent to which the present oscillatory activity in a sender channel influences the future oscillatory activity in a specific receiver channel relative all possible receivers in the network. Using the proposed modeling approach, we have achieved some insights on learning in a rat engaged in a non-spatial memory task.

  20. Noise-robust speech triage.

    Science.gov (United States)

    Bartos, Anthony L; Cipr, Tomas; Nelson, Douglas J; Schwarz, Petr; Banowetz, John; Jerabek, Ladislav

    2018-04-01

    A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).

  1. Toward a Model of Pediatric Speech Sound Disorders (SSD) for Differential Diagnosis and Therapy Planning

    NARCIS (Netherlands)

    Terband, Hayo; Maassen, Bernardus; Maas, Edwin; van Lieshout, Pascal; Maassen, Ben; Terband, Hayo

    2016-01-01

    The classification and differentiation of pediatric speech sound disorders (SSD) is one of the main questions in the field of speech- and language pathology. Terms for classifying childhood and SSD and motor speech disorders (MSD) refer to speech production processes, and a variety of methods of

  2. Making Faces - State-Space Models Applied to Multi-Modal Signal Processing

    DEFF Research Database (Denmark)

    Lehn-Schiøler, Tue

    2005-01-01

    The two main focus areas of this thesis are State-Space Models and multi modal signal processing. The general State-Space Model is investigated and an addition to the class of sequential sampling methods is proposed. This new algorithm is denoted as the Parzen Particle Filter. Furthermore...... optimizer can be applied to speed up convergence. The linear version of the State-Space Model, the Kalman Filter, is applied to multi modal signal processing. It is demonstrated how a State-Space Model can be used to map from speech to lip movements. Besides the State-Space Model and the multi modal...... application an information theoretic vector quantizer is also proposed. Based on interactions between particles, it is shown how a quantizing scheme based on an analytic cost function can be derived....

  3. Principles of speech coding

    CERN Document Server

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication-including voice-will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networksOffering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the

  4. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    Directory of Open Access Journals (Sweden)

    Santiago-Omar Caballero-Morales

    2013-01-01

    Full Text Available An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR system was built with Hidden Markov Models (HMMs, where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness. Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR’s output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of emotional state of Mexican Spanish speech.

  5. Unifying Speech and Language in a Developmentally Sensitive Model of Production.

    Science.gov (United States)

    Redford, Melissa A

    2015-11-01

    Speaking is an intentional activity. It is also a complex motor skill; one that exhibits protracted development and the fully automatic character of an overlearned behavior. Together these observations suggest an analogy with skilled behavior in the non-language domain. This analogy is used here to argue for a model of production that is grounded in the activity of speaking and structured during language acquisition. The focus is on the plan that controls the execution of fluent speech; specifically, on the units that are activated during the production of an intonational phrase. These units are schemas: temporally structured sequences of remembered actions and their sensory outcomes. Schemas are activated and inhibited via associated goals, which are linked to specific meanings. Schemas may fuse together over developmental time with repeated use to form larger units, thereby affecting the relative timing of sequential action in participating schemas. In this way, the hierarchical structure of the speech plan and ensuing rhythm patterns of speech are a product of development. Individual schemas may also become differentiated during development, but only if subsequences are associated with meaning. The necessary association of action and meaning gives rise to assumptions about the primacy of certain linguistic forms in the production process. Overall, schema representations connect usage-based theories of language to the action of speaking.

  6. Model for neural signaling leap statistics

    International Nuclear Information System (INIS)

    Chevrollier, Martine; Oria, Marcos

    2011-01-01

    We present a simple model for neural signaling leaps in the brain considering only the thermodynamic (Nernst) potential in neuron cells and brain temperature. We numerically simulated connections between arbitrarily localized neurons and analyzed the frequency distribution of the distances reached. We observed qualitative change between Normal statistics (with T 37.5 0 C, awaken regime) and Levy statistics (T = 35.5 0 C, sleeping period), characterized by rare events of long range connections.

  7. Model for neural signaling leap statistics

    Science.gov (United States)

    Chevrollier, Martine; Oriá, Marcos

    2011-03-01

    We present a simple model for neural signaling leaps in the brain considering only the thermodynamic (Nernst) potential in neuron cells and brain temperature. We numerically simulated connections between arbitrarily localized neurons and analyzed the frequency distribution of the distances reached. We observed qualitative change between Normal statistics (with T = 37.5°C, awaken regime) and Lévy statistics (T = 35.5°C, sleeping period), characterized by rare events of long range connections.

  8. Model for neural signaling leap statistics

    Energy Technology Data Exchange (ETDEWEB)

    Chevrollier, Martine; Oria, Marcos, E-mail: oria@otica.ufpb.br [Laboratorio de Fisica Atomica e Lasers Departamento de Fisica, Universidade Federal da ParaIba Caixa Postal 5086 58051-900 Joao Pessoa, Paraiba (Brazil)

    2011-03-01

    We present a simple model for neural signaling leaps in the brain considering only the thermodynamic (Nernst) potential in neuron cells and brain temperature. We numerically simulated connections between arbitrarily localized neurons and analyzed the frequency distribution of the distances reached. We observed qualitative change between Normal statistics (with T 37.5{sup 0}C, awaken regime) and Levy statistics (T = 35.5{sup 0}C, sleeping period), characterized by rare events of long range connections.

  9. Under-resourced speech recognition based on the speech manifold

    CSIR Research Space (South Africa)

    Sahraeian, R

    2015-09-01

    Full Text Available Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...

  10. Emotion recognition from speech: tools and challenges

    Science.gov (United States)

    Al-Talabani, Abdulbasit; Sellahewa, Harin; Jassim, Sabah A.

    2015-05-01

    Human emotion recognition from speech is studied frequently for its importance in many applications, e.g. human-computer interaction. There is a wide diversity and non-agreement about the basic emotion or emotion-related states on one hand and about where the emotion related information lies in the speech signal on the other side. These diversities motivate our investigations into extracting Meta-features using the PCA approach, or using a non-adaptive random projection RP, which significantly reduce the large dimensional speech feature vectors that may contain a wide range of emotion related information. Subsets of Meta-features are fused to increase the performance of the recognition model that adopts the score-based LDC classifier. We shall demonstrate that our scheme outperform the state of the art results when tested on non-prompted databases or acted databases (i.e. when subjects act specific emotions while uttering a sentence). However, the huge gap between accuracy rates achieved on the different types of datasets of speech raises questions about the way emotions modulate the speech. In particular we shall argue that emotion recognition from speech should not be dealt with as a classification problem. We shall demonstrate the presence of a spectrum of different emotions in the same speech portion especially in the non-prompted data sets, which tends to be more "natural" than the acted datasets where the subjects attempt to suppress all but one emotion.

  11. Speech Problems

    Science.gov (United States)

    ... Staying Safe Videos for Educators Search English Español Speech Problems KidsHealth / For Teens / Speech Problems What's in ... a person's ability to speak clearly. Some Common Speech and Language Disorders Stuttering is a problem that ...

  12. Logic integer programming models for signaling networks.

    Science.gov (United States)

    Haus, Utz-Uwe; Niermann, Kathrin; Truemper, Klaus; Weismantel, Robert

    2009-05-01

    We propose a static and a dynamic approach to model biological signaling networks, and show how each can be used to answer relevant biological questions. For this, we use the two different mathematical tools of Propositional Logic and Integer Programming. The power of discrete mathematics for handling qualitative as well as quantitative data has so far not been exploited in molecular biology, which is mostly driven by experimental research, relying on first-order or statistical models. The arising logic statements and integer programs are analyzed and can be solved with standard software. For a restricted class of problems the logic models reduce to a polynomial-time solvable satisfiability algorithm. Additionally, a more dynamic model enables enumeration of possible time resolutions in poly-logarithmic time. Computational experiments are included.

  13. Speech Recognition

    Directory of Open Access Journals (Sweden)

    Adrian Morariu

    2009-01-01

    Full Text Available This paper presents a method of speech recognition by pattern recognition techniques. Learning consists in determining the unique characteristics of a word (cepstral coefficients by eliminating those characteristics that are different from one word to another. For learning and recognition, the system will build a dictionary of words by determining the characteristics of each word to be used in the recognition. Determining the characteristics of an audio signal consists in the following steps: noise removal, sampling it, applying Hamming window, switching to frequency domain through Fourier transform, calculating the magnitude spectrum, filtering data, determining cepstral coefficients.

  14. A comparison between the first-fit settings of two multichannel digital signal-processing strategies: music quality ratings and speech-in-noise scores.

    Science.gov (United States)

    Higgins, Paul; Searchfield, Grant; Coad, Gavin

    2012-06-01

    The aim of this study was to determine which level-dependent hearing aid digital signal-processing strategy (DSP) participants preferred when listening to music and/or performing a speech-in-noise task. Two receiver-in-the-ear hearing aids were compared: one using 32-channel adaptive dynamic range optimization (ADRO) and the other wide dynamic range compression (WDRC) incorporating dual fast (4 channel) and slow (15 channel) processing. The manufacturers' first-fit settings based on participants' audiograms were used in both cases. Results were obtained from 18 participants on a quick speech-in-noise (QuickSIN; Killion, Niquette, Gudmundsen, Revit, & Banerjee, 2004) task and for 3 music listening conditions (classical, jazz, and rock). Participants preferred the quality of music and performed better at the QuickSIN task using the hearing aids with ADRO processing. A potential reason for the better performance of the ADRO hearing aids was less fluctuation in output with change in sound dynamics. ADRO processing has advantages for both music quality and speech recognition in noise over the multichannel WDRC processing that was used in the study. Further evaluations of which DSP aspects contribute to listener preference are required.

  15. Assessment of the Speech Intelligibility Performance of Post Lingual Cochlear Implant Users at Different Signal-to-Noise Ratios Using the Turkish Matrix Test

    Directory of Open Access Journals (Sweden)

    Zahra Polat

    2016-10-01

    Full Text Available Background: Spoken word recognition and speech perception tests in quiet are being used as a routine in assessment of the benefit which children and adult cochlear implant users receive from their devices. Cochlear implant users generally demonstrate high level performances in these test materials as they are able to achieve high level speech perception ability in quiet situations. Although these test materials provide valuable information regarding Cochlear Implant (CI users’ performances in optimal listening conditions, they do not give realistic information regarding performances in adverse listening conditions, which is the case in the everyday environment. Aims: The aim of this study was to assess the speech intelligibility performance of post lingual CI users in the presence of noise at different signal-to-noise ratio with the Matrix Test developed for Turkish language. Study Design: Cross-sectional study. Methods: The thirty post lingual implant user adult subjects, who had been using implants for a minimum of one year, were evaluated with Turkish Matrix test. Subjects’ speech intelligibility was measured using the adaptive and non-adaptive Matrix Test in quiet and noisy environments. Results: The results of the study show a correlation between Pure Tone Average (PTA values of the subjects and Matrix test Speech Reception Threshold (SRT values in the quiet. Hence, it is possible to asses PTA values of CI users using the Matrix Test also. However, no correlations were found between Matrix SRT values in the quiet and Matrix SRT values in noise. Similarly, the correlation between PTA values and intelligibility scores in noise was also not significant. Therefore, it may not be possible to assess the intelligibility performance of CI users using test batteries performed in quiet conditions. Conclusion: The Matrix Test can be used to assess the benefit of CI users from their systems in everyday life, since it is possible to perform

  16. Speech Intelligibility in Noise Using Throat and Acoustic Microphones

    National Research Council Canada - National Science Library

    Acker-Mills, Barbara

    2004-01-01

    ... speech intelligibility. Speech intelligibility for signals generated by an acoustic microphone, a throat microphone, and the two microphones together was assessed using the Modified Rhyme Test (MRT...

  17. A dynamical model of hierarchical selection and coordination in speech planning.

    Directory of Open Access Journals (Sweden)

    Sam Tilsen

    Full Text Available studies of the control of complex sequential movements have dissociated two aspects of movement planning: control over the sequential selection of movement plans, and control over the precise timing of movement execution. This distinction is particularly relevant in the production of speech: utterances contain sequentially ordered words and syllables, but articulatory movements are often executed in a non-sequential, overlapping manner with precisely coordinated relative timing. This study presents a hybrid dynamical model in which competitive activation controls selection of movement plans and coupled oscillatory systems govern coordination. The model departs from previous approaches by ascribing an important role to competitive selection of articulatory plans within a syllable. Numerical simulations show that the model reproduces a variety of speech production phenomena, such as effects of preparation and utterance composition on reaction time, and asymmetries in patterns of articulatory timing associated with onsets and codas. The model furthermore provides a unified understanding of a diverse group of phonetic and phonological phenomena which have not previously been related.

  18. Animal Models of Speech and Vocal Communication Deficits Associated With Psychiatric Disorders.

    Science.gov (United States)

    Konopka, Genevieve; Roberts, Todd F

    2016-01-01

    Disruptions in speech, language, and vocal communication are hallmarks of several neuropsychiatric disorders, most notably autism spectrum disorders. Historically, the use of animal models to dissect molecular pathways and connect them to behavioral endophenotypes in cognitive disorders has proven to be an effective approach for developing and testing disease-relevant therapeutics. The unique aspects of human language compared with vocal behaviors in other animals make such an approach potentially more challenging. However, the study of vocal learning in species with analogous brain circuits to humans may provide entry points for understanding this human-specific phenotype and diseases. We review animal models of vocal learning and vocal communication and specifically link phenotypes of psychiatric disorders to relevant model systems. Evolutionary constraints in the organization of neural circuits and synaptic plasticity result in similarities in the brain mechanisms for vocal learning and vocal communication. Comparative approaches and careful consideration of the behavioral limitations among different animal models can provide critical avenues for dissecting the molecular pathways underlying cognitive disorders that disrupt speech, language, and vocal communication. Copyright © 2016 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  19. How does language model size effects speech recognition accuracy for the Turkish language?

    Directory of Open Access Journals (Sweden)

    Behnam ASEFİSARAY

    2016-05-01

    Full Text Available In this paper we aimed at investigating the effect of Language Model (LM size on Speech Recognition (SR accuracy. We also provided details of our approach for obtaining the LM for Turkish. Since LM is obtained by statistical processing of raw text, we expect that by increasing the size of available data for training the LM, SR accuracy will improve. Since this study is based on recognition of Turkish, which is a highly agglutinative language, it is important to find out the appropriate size for the training data. The minimum required data size is expected to be much higher than the data needed to train a language model for a language with low level of agglutination such as English. In the experiments we also tried to adjust the Language Model Weight (LMW and Active Token Count (ATC parameters of LM as these are expected to be different for a highly agglutinative language. We showed that by increasing the training data size to an appropriate level, the recognition accuracy improved on the other hand changes on LMW and ATC did not have a positive effect on Turkish speech recognition accuracy.

  20. Methods and models for quantative assessment of speech intelligibility in cross-language communication

    NARCIS (Netherlands)

    Wijngaarden, S.J. van; Steeneken, H.J.M.; Houtgast, T.

    2001-01-01

    To deal with the effects of nonnative speech communication on speech intelligibility, one must know the magnitude of these effects. To measure this magnitude, suitable test methods must be available. Many of the methods used in cross-language speech communication research are not very suitable for

  1. Auditory-Motor Interactions in Pediatric Motor Speech Disorders: Neurocomputational Modeling of Disordered Development

    NARCIS (Netherlands)

    Terband, H.R.; Maassen, B.A.M.; Guenther, F.H.; Brumberg, J.

    2014-01-01

    Background/Purpose: Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between

  2. Auditory-motor interactions in pediatric motor speech disorders: Neurocomputational modeling of disordered development

    NARCIS (Netherlands)

    Terband, H.; Maassen, B.; Guenther, F. H.; Brumberg, J.

    2014-01-01

    BACKGROUND/PURPOSE: Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between

  3. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    Science.gov (United States)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It can enjoy a fast rate of data/text entry, small overall size, and can be lightweight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beam forming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaption, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMMbased ASR components were developed. They can help real-time ASR system designers select proper tasks when in the face of constraints in computational resources.

  4. Blind Separation of Acoustic Signals Combining SIMO-Model-Based Independent Component Analysis and Binary Masking

    Directory of Open Access Journals (Sweden)

    Hiekata Takashi

    2006-01-01

    Full Text Available A new two-stage blind source separation (BSS method for convolutive mixtures of speech is proposed, in which a single-input multiple-output (SIMO-model-based independent component analysis (ICA and a new SIMO-model-based binary masking are combined. SIMO-model-based ICA enables us to separate the mixed signals, not into monaural source signals but into SIMO-model-based signals from independent sources in their original form at the microphones. Thus, the separated signals of SIMO-model-based ICA can maintain the spatial qualities of each sound source. Owing to this attractive property, our novel SIMO-model-based binary masking can be applied to efficiently remove the residual interference components after SIMO-model-based ICA. The experimental results reveal that the separation performance can be considerably improved by the proposed method compared with that achieved by conventional BSS methods. In addition, the real-time implementation of the proposed BSS is illustrated.

  5. Steganalysis of recorded speech

    Science.gov (United States)

    Johnson, Micah K.; Lyu, Siwei; Farid, Hany

    2005-03-01

    Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.

  6. The effects of mands and models on the speech of unresponsive language-delayed preschool children.

    Science.gov (United States)

    Warren, S F; McQuarter, R J; Rogers-Warren, A K

    1984-02-01

    The effects of the systematic use of mands (non-yes/no questions and instructions to verbalize), models (imitative prompts), and specific consequent events on the productive verbal behavior of three unresponsive, socially isolate, language-delayed preschool children were investigated in a multiple-baseline design within a classroom free play period. Following a lengthy intervention condition, experimental procedures were systematically faded out to check for maintenance effects. The treatment resulted in increases in total verbalizations and nonobligatory speech (initiations) by the subjects. Subjects also became more responsive in obligatory speech situations. In a second free play (generalization) setting, increased rates of total child verbalizations and nonobligatory verbalizations were observed for all three subjects, and two of the three subjects were more responsive compared to their baselines in the first free play setting. Rate of total teacher verbalizations and questions were also higher in this setting. Maintenance of the treatment effects was shown during the fading condition in the intervention setting. The subjects' MLUs (mean length of utterance) increased during the intervention condition when the teacher began prompting a minimum of two-word utterances in response to a mand or model.

  7. Audiovisual Asynchrony Detection in Human Speech

    Science.gov (United States)

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  8. Model-based inverse estimation for active contraction stresses of tongue muscles using 3D surface shape in speech production.

    Science.gov (United States)

    Koike, Narihiko; Ii, Satoshi; Yoshinaga, Tsukasa; Nozaki, Kazunori; Wada, Shigeo

    2017-11-07

    This paper presents a novel inverse estimation approach for the active contraction stresses of tongue muscles during speech. The proposed method is based on variational data assimilation using a mechanical tongue model and 3D tongue surface shapes for speech production. The mechanical tongue model considers nonlinear hyperelasticity, finite deformation, actual geometry from computed tomography (CT) images, and anisotropic active contraction by muscle fibers, the orientations of which are ideally determined using anatomical drawings. The tongue deformation is obtained by solving a stationary force-equilibrium equation using a finite element method. An inverse problem is established to find the combination of muscle contraction stresses that minimizes the Euclidean distance of the tongue surfaces between the mechanical analysis and CT results of speech production, where a signed-distance function represents the tongue surface. Our approach is validated through an ideal numerical example and extended to the real-world case of two Japanese vowels, /ʉ/ and /ɯ/. The results capture the target shape completely and provide an excellent estimation of the active contraction stresses in the ideal case, and exhibit similar tendencies as in previous observations and simulations for the actual vowel cases. The present approach can reveal the relative relationship among the muscle contraction stresses in similar utterances with different tongue shapes, and enables the investigation of the coordination of tongue muscles during speech using only the deformed tongue shape obtained from medical images. This will enhance our understanding of speech motor control. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

    Science.gov (United States)

    Borrie, Stephanie A

    2015-03-01

    This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.

  10. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech and music

    Directory of Open Access Journals (Sweden)

    Hwee Ling eLee

    2014-08-01

    Full Text Available This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300 ±240, ±180, ±120, ±60, and 0 ms. Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech.

  11. Design and realisation of an audiovisual speech activity detector

    NARCIS (Netherlands)

    Van Bree, K.C.

    2006-01-01

    For many speech telecommunication technologies a robust speech activity detector is important. An audio-only speech detector will givefalse positives when the interfering signal is speech or has speech characteristics. The modality video is suitable to solve this problem. In this report the approach

  12. Time Series Neural Network Model for Part-of-Speech Tagging Indonesian Language

    Science.gov (United States)

    Tanadi, Theo

    2018-03-01

    Part-of-speech tagging (POS tagging) is an important part in natural language processing. Many methods have been used to do this task, including neural network. This paper models a neural network that attempts to do POS tagging. A time series neural network is modelled to solve the problems that a basic neural network faces when attempting to do POS tagging. In order to enable the neural network to have text data input, the text data will get clustered first using Brown Clustering, resulting a binary dictionary that the neural network can use. To further the accuracy of the neural network, other features such as the POS tag, suffix, and affix of previous words would also be fed to the neural network.

  13. Speech enhancement using emotion dependent codebooks

    NARCIS (Netherlands)

    Naidu, D.H.R.; Srinivasan, S.

    2012-01-01

    Several speech enhancement approaches utilize trained models of clean speech data, such as codebooks, Gaussian mixtures, and hidden Markov models. These models are typically trained on neutral clean speech data, without any emotion. However, in practical scenarios, emotional speech is a common

  14. Neuronal basis of speech comprehension.

    Science.gov (United States)

    Specht, Karsten

    2014-01-01

    Verbal communication does not rely only on the simple perception of auditory signals. It is rather a parallel and integrative processing of linguistic and non-linguistic information, involving temporal and frontal areas in particular. This review describes the inherent complexity of auditory speech comprehension from a functional-neuroanatomical perspective. The review is divided into two parts. In the first part, structural and functional asymmetry of language relevant structures will be discus. The second part of the review will discuss recent neuroimaging studies, which coherently demonstrate that speech comprehension processes rely on a hierarchical network involving the temporal, parietal, and frontal lobes. Further, the results support the dual-stream model for speech comprehension, with a dorsal stream for auditory-motor integration, and a ventral stream for extracting meaning but also the processing of sentences and narratives. Specific patterns of functional asymmetry between the left and right hemisphere can also be demonstrated. The review article concludes with a discussion on interactions between the dorsal and ventral streams, particularly the involvement of motor related areas in speech perception processes, and outlines some remaining unresolved issues. This article is part of a Special Issue entitled Human Auditory Neuroimaging. Copyright © 2013 Elsevier B.V. All rights reserved.

  15. Why do I hear but not understand? Stochastic undersampling as a model of degraded neural encoding of speech

    Directory of Open Access Journals (Sweden)

    Enrique A Lopez-Poveda

    2014-10-01

    Full Text Available Hearing impairment is a serious disease with increasing prevalence. It is defined based on increased audiometric thresholds but increased thresholds are only partly responsible for the greater difficulty understanding speech in noisy environments experienced by some older listeners or by hearing-impaired listeners. Identifying the additional factors and mechanisms that impair intelligibility is fundamental to understanding hearing impairment but these factors remain uncertain. Traditionally, these additional factors have been sought in the way the speech spectrum is encoded in the pattern of impaired mechanical cochlear responses. Recent studies, however, are steering the focus toward impaired encoding of the speech waveform in the auditory nerve. In our recent work, we gave evidence that a significant factor might be the loss of afferent auditory nerve fibers, a pathology that comes with aging or noise overexposure. Our approach was based on a signal-processing analogy whereby the auditory nerve may be regarded as a stochastic sampler of the sound waveform and deafferentation may be described in terms of waveform undersampling. We showed that stochastic undersampling simultaneously degrades the encoding of soft and rapid waveform features, and that this degrades speech intelligibility in noise more than in quiet without significant increases in audiometric thresholds. Here, we review our recent work in a broader context and argue that the stochastic undersampling analogy may be extended to study the perceptual consequences of various different hearing pathologies and their treatment.

  16. From where to what: a neuroanatomically based evolutionary model of the emergence of speech in humans [version 2; referees: 1 approved, 2 approved with reservations

    Directory of Open Access Journals (Sweden)

    Oren Poliva

    2016-01-01

    Full Text Available In the brain of primates, the auditory cortex connects with the frontal lobe via the temporal pole (auditory ventral stream; AVS and via the inferior parietal lobe (auditory dorsal stream; ADS. The AVS is responsible for sound recognition, and the ADS for sound-localization, voice detection and integration of calls with faces. I propose that the primary role of the ADS in non-human primates is the detection and response to contact calls. These calls are exchanged between tribe members (e.g., mother-offspring and are used for monitoring location. Detection of contact calls occurs by the ADS identifying a voice, localizing it, and verifying that the corresponding face is out of sight. Once a contact call is detected, the primate produces a contact call in return via descending connections from the frontal lobe to a network of limbic and brainstem regions. Because the ADS of present day humans also performs speech production, I further propose an evolutionary course for the transition from contact call exchange to an early form of speech. In accordance with this model, structural changes to the ADS endowed early members of the genus Homo with partial vocal control. This development was beneficial as it enabled offspring to modify their contact calls with intonations for signaling high or low levels of distress to their mother. Eventually, individuals were capable of participating in yes-no question-answer conversations. In these conversations the offspring emitted a low-level distress call for inquiring about the safety of objects (e.g., food, and his/her mother responded with a high- or low-level distress call to signal approval or disapproval of the interaction. Gradually, the ADS and its connections with brainstem motor regions became more robust and vocal control became more volitional. Speech emerged once vocal control was sufficient for inventing novel calls.

  17. From where to what: a neuroanatomically based evolutionary model of the emergence of speech in humans [version 3; referees: 1 approved, 2 approved with reservations

    Directory of Open Access Journals (Sweden)

    Oren Poliva

    2017-09-01

    Full Text Available In the brain of primates, the auditory cortex connects with the frontal lobe via the temporal pole (auditory ventral stream; AVS and via the inferior parietal lobe (auditory dorsal stream; ADS. The AVS is responsible for sound recognition, and the ADS for sound-localization, voice detection and integration of calls with faces. I propose that the primary role of the ADS in non-human primates is the detection and response to contact calls. These calls are exchanged between tribe members (e.g., mother-offspring and are used for monitoring location. Detection of contact calls occurs by the ADS identifying a voice, localizing it, and verifying that the corresponding face is out of sight. Once a contact call is detected, the primate produces a contact call in return via descending connections from the frontal lobe to a network of limbic and brainstem regions. Because the ADS of present day humans also performs speech production, I further propose an evolutionary course for the transition from contact call exchange to an early form of speech. In accordance with this model, structural changes to the ADS endowed early members of the genus Homo with partial vocal control. This development was beneficial as it enabled offspring to modify their contact calls with intonations for signaling high or low levels of distress to their mother. Eventually, individuals were capable of participating in yes-no question-answer conversations. In these conversations the offspring emitted a low-level distress call for inquiring about the safety of objects (e.g., food, and his/her mother responded with a high- or low-level distress call to signal approval or disapproval of the interaction. Gradually, the ADS and its connections with brainstem motor regions became more robust and vocal control became more volitional. Speech emerged once vocal control was sufficient for inventing novel calls.

  18. Speech Intelligibility Prediction Based on Mutual Information

    DEFF Research Database (Denmark)

    Jensen, Jesper; Taal, Cees H.

    2014-01-01

    This paper deals with the problem of predicting the average intelligibility of noisy and potentially processed speech signals, as observed by a group of normal hearing listeners. We propose a model which performs this prediction based on the hypothesis that intelligibility is monotonically related...... to the mutual information between critical-band amplitude envelopes of the clean signal and the corresponding noisy/processed signal. The resulting intelligibility predictor turns out to be a simple function of the mean-square error (mse) that arises when estimating a clean critical-band amplitude using...... a minimum mean-square error (mmse) estimator based on the noisy/processed amplitude. The proposed model predicts that speech intelligibility cannot be improved by any processing of noisy critical-band amplitudes. Furthermore, the proposed intelligibility predictor performs well ( ρ > 0.95) in predicting...

  19. A parametric framework for modelling of bioelectrical signals

    CERN Document Server

    Mughal, Yar Muhammad

    2016-01-01

    This book examines non-invasive, electrical-based methods for disease diagnosis and assessment of heart function. In particular, a formalized signal model is proposed since this offers several advantages over methods that rely on measured data alone. By using a formalized representation, the parameters of the signal model can be easily manipulated and/or modified, thus providing mechanisms that allow researchers to reproduce and control such signals. In addition, having such a formalized signal model makes it possible to develop computer tools that can be used for manipulating and understanding how signal changes result from various heart conditions, as well as for generating input signals for experimenting with and evaluating the performance of e.g. signal extraction methods. The work focuses on bioelectrical information, particularly electrical bio-impedance (EBI). Once the EBI has been measured, the corresponding signals have to be modelled for analysis. This requires a structured approach in order to move...

  20. A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

    Directory of Open Access Journals (Sweden)

    Dansereau Richard M

    2007-01-01

    Full Text Available We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA. For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signal's vocal-tract-related filters. Then, the mean vectors of PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.

  1. A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

    Directory of Open Access Journals (Sweden)

    Mohammad H. Radfar

    2006-11-01

    Full Text Available We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA. For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signal's vocal-tract-related filters. Then, the mean vectors of PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.

  2. The role of high-frequency envelope fluctuations for speech masking release

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; Jørgensen and Dau, 2011; Jørgensen et al., 2013) was shown to successfully predict speech intelligibility in conditions with stationary and fluctuating interferers, reverberation, and spectral subtraction. The key element in the model...... was the multi-resolution estimation of the signal-to-noise ratio in the envelope domain (SNRenv) at the output of a modulation filterbank. The simulations suggested that mainly modulation filters centered in the range from 1-8 Hz contribute to speech intelligibility in the case of stationary maskers whereas...... modulation filters tuned to frequencies above 16 Hz might be important in the case of fluctuating maskers. In the present study, the role of high-frequency envelope fluctuations for speech masking release was further investigated in conditions of speech-on-speech masking. Simulations were compared to various...

  3. The role of high-frequency envelope fluctuations for speech masking release

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model [sEPSM; Jørgensen and Dau (2011), Jørgensen et al. (2013)] was shown to successfully predict speech intelligibility in conditions with stationary and fluctuating interferers, reverberation, and spectral subtraction. The key element in the model...... was the multi-resolution estimation of the signal-to-noise ratio in the envelope domain (SNRenv) at the output of a modulation filterbank. The simulations suggested that mainly modulation filters centered in the range from 1 to 8 Hz contribute to speech intelligibility in the case of stationary maskers whereas...... modulation filters tuned to frequencies above 16 Hz might be important in the case of fluctuating maskers. In the present study, the role of high-frequency envelope fluctuations for speech masking release was further investigated in conditions of speech-on-speech masking. Simulations were compared to various...

  4. Parents' and speech and language therapists' explanatory models of language development, language delay and intervention.

    Science.gov (United States)

    Marshall, Julie; Goldbart, Juliet; Phillips, Julie

    2007-01-01

    Parental and speech and language therapist (SLT) explanatory models may affect engagement with speech and language therapy, but there has been dearth of research in this area. This study investigated parents' and SLTs' views about language development, delay and intervention in pre-school children with language delay. The aims were to describe, explore and explain the thoughts, understandings, perceptions, beliefs, knowledge and feelings held by: a group of parents from East Manchester, UK, whose pre-school children had been referred with suspected language delay; and SLTs working in the same area, in relation to language development, language delay and language intervention. A total of 24 unstructured interviews were carried out: 15 with parents whose children had been referred for speech and language therapy and nine with SLTs who worked with pre-school children. The interviews were transcribed verbatim and coded using Atlas/ti. The data were analysed, subjected to respondent validation, and grounded theories and principled descriptions developed to explain and describe parents' and SLTs' beliefs and views. Parent and SLT data are presented separately. There are commonalities and differences between the parents and the SLTs. Both groups believe that language development and delay are influenced by both external and internal factors. Parents give more weight to the role of gender, imitation and personality and value television and videos, whereas the SLTs value the 'right environment' and listening skills and consider that health/disability and socio-economic factors are important. Parents see themselves as experts on their child and have varied ideas about the role of SLTs, which do not always accord with SLTs' views. The parents and SLTs differ in their views of the roles of imitation and play in intervention. Parents typically try strategies before seeing an SLT. These data suggest that parents' ideas vary and that, although parents and SLTs may share some

  5. Signal Processing Model for Radiation Transport

    Energy Technology Data Exchange (ETDEWEB)

    Chambers, D H

    2008-07-28

    This note describes the design of a simplified gamma ray transport model for use in designing a sequential Bayesian signal processor for low-count detection and classification. It uses a simple one-dimensional geometry to describe the emitting source, shield effects, and detector (see Fig. 1). At present, only Compton scattering and photoelectric absorption are implemented for the shield and the detector. Other effects may be incorporated in the future by revising the expressions for the probabilities of escape and absorption. Pair production would require a redesign of the simulator to incorporate photon correlation effects. The initial design incorporates the physical effects that were present in the previous event mode sequence simulator created by Alan Meyer. The main difference is that this simulator transports the rate distributions instead of single photons. Event mode sequences and other time-dependent photon flux sequences are assumed to be marked Poisson processes that are entirely described by their rate distributions. Individual realizations can be constructed from the rate distribution using a random Poisson point sequence generator.

  6. Speech Matters

    DEFF Research Database (Denmark)

    Hasse Jørgensen, Stina

    2011-01-01

    About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011.......About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011....

  7. Speech-to-Speech Relay Service

    Science.gov (United States)

    Consumer Guide Speech to Speech Relay Service Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...

  8. Modelling the Architecture of Phonetic Plans: Evidence from Apraxia of Speech

    Science.gov (United States)

    Ziegler, Wolfram

    2009-01-01

    In theories of spoken language production, the gestural code prescribing the movements of the speech organs is usually viewed as a linear string of holistic, encapsulated, hard-wired, phonetic plans, e.g., of the size of phonemes or syllables. Interactions between phonetic units on the surface of overt speech are commonly attributed to either the…

  9. Building a Model of Support for Preschool Children with Speech and Language Disorders

    Science.gov (United States)

    Robertson, Natalie; Ohi, Sarah

    2016-01-01

    Speech and language disorders impede young children's abilities to communicate and are often associated with a number of behavioural problems arising in the preschool classroom. This paper reports a small-scale study that investigated 23 Australian educators' and 7 Speech Pathologists' experiences in working with three to five year old children…

  10. Integrating Music Therapy Services and Speech-Language Therapy Services for Children with Severe Communication Impairments: A Co-Treatment Model

    Science.gov (United States)

    Geist, Kamile; McCarthy, John; Rodgers-Smith, Amy; Porter, Jessica

    2008-01-01

    Documenting how music therapy can be integrated with speech-language therapy services for children with communication delay is not evident in the literature. In this article, a collaborative model with procedures, experiences, and communication outcomes of integrating music therapy with the existing speech-language services is given. Using…

  11. Investigations on search methods for speech recognition using weighted finite state transducers

    OpenAIRE

    Rybach, David

    2014-01-01

    The search problem in the statistical approach to speech recognition is to find the most likely word sequence for an observed speech signal using a combination of knowledge sources, i.e. the language model, the pronunciation model, and the acoustic models of phones. The resulting search space is enormous. Therefore, an efficient search strategy is required to compute the result with a feasible amount of time and memory. The structured statistical models as well as their combination, the searc...

  12. Apraxia of Speech

    Science.gov (United States)

    ... Health Info » Voice, Speech, and Language Apraxia of Speech On this page: What is apraxia of speech? ... about apraxia of speech? What is apraxia of speech? Apraxia of speech (AOS)—also known as acquired ...

  13. Statistical Challenges in Modeling Big Brain Signals

    KAUST Repository

    Yu, Zhaoxia

    2017-11-01

    Brain signal data are inherently big: massive in amount, complex in structure, and high in dimensions. These characteristics impose great challenges for statistical inference and learning. Here we review several key challenges, discuss possible solutions, and highlight future research directions.

  14. Statistical Challenges in Modeling Big Brain Signals

    KAUST Repository

    Yu, Zhaoxia; Pluta, Dustin; Shen, Tong; Chen, Chuansheng; Xue, Gui; Ombao, Hernando

    2017-01-01

    Brain signal data are inherently big: massive in amount, complex in structure, and high in dimensions. These characteristics impose great challenges for statistical inference and learning. Here we review several key challenges, discuss possible

  15. Introductory speeches

    International Nuclear Information System (INIS)

    2001-01-01

    This CD is multimedia presentation of programme safety upgrading of Bohunice V1 NPP. This chapter consist of introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering

  16. Channel noise enhances signal detectability in a model of acoustic neuron through the stochastic resonance paradigm.

    Science.gov (United States)

    Liberti, M; Paffi, A; Maggio, F; De Angelis, A; Apollonio, F; d'Inzeo, G

    2009-01-01

    A number of experimental investigations have evidenced the extraordinary sensitivity of neuronal cells to weak input stimulations, including electromagnetic (EM) fields. Moreover, it has been shown that biological noise, due to random channels gating, acts as a tuning factor in neuronal processing, according to the stochastic resonant (SR) paradigm. In this work the attention is focused on noise arising from the stochastic gating of ionic channels in a model of Ranvier node of acoustic fibers. The small number of channels gives rise to a high noise level, which is able to cause a spike train generation even in the absence of stimulations. A SR behavior has been observed in the model for the detection of sinusoidal signals at frequencies typical of the speech.

  17. Separating Underdetermined Convolutive Speech Mixtures

    DEFF Research Database (Denmark)

    Pedersen, Michael Syskind; Wang, DeLiang; Larsen, Jan

    2006-01-01

    a method for underdetermined blind source separation of convolutive mixtures. The proposed framework is applicable for separation of instantaneous as well as convolutive speech mixtures. It is possible to iteratively extract each speech signal from the mixture by combining blind source separation...

  18. Sinusoidal Representation of Acoustic Signals

    Science.gov (United States)

    Honda, Masaaki

    Sinusoidal representation of acoustic signals has been an important tool in speech and music processing like signal analysis, synthesis and time scale or pitch modifications. It can be applicable to arbitrary signals, which is an important advantage over other signal representations like physical modeling of acoustic signals. In sinusoidal representation, acoustic signals are composed as sums of sinusoid (sine wave) with different amplitudes, frequencies and phases, which is based on the timedependent short-time Fourier transform (STFT). This article describes the principles of acoustic signal analysis/synthesis based on a sinusoid representation with focus on sine waves with rapidly varying frequency.

  19. Phonetic recalibration of speech by text

    NARCIS (Netherlands)

    Keetels, M.N.; Schakel, L.; de Bonte, M.; Vroomen, J.

    2016-01-01

    Listeners adjust their phonetic categories to cope with variations in the speech signal (phonetic recalibration). Previous studies have shown that lipread speech (and word knowledge) can adjust the perception of ambiguous speech and can induce phonetic adjustments (Bertelson, Vroomen, & de Gelder in

  20. Mathematical modelling of SERK mediated BR signalling

    NARCIS (Netherlands)

    Esse, van G.W.

    2013-01-01

    Being sessile by nature plants are continuously challenged by biotic and abiotic stress factors. At the cellular level, different stimuli are perceived and translated to the desired response. In order to achieve this, signal transduction cascades have to be interlinked. Complex networks

  1. Efficient ECG Signal Compression Using Adaptive Heart Model

    National Research Council Canada - National Science Library

    Szilagyi, S

    2001-01-01

    This paper presents an adaptive, heart-model-based electrocardiography (ECG) compression method. After conventional pre-filtering the waves from the signal are localized and the model's parameters are determined...

  2. Improving traffic signal management and operations : a basic service model.

    Science.gov (United States)

    2009-12-01

    This report provides a guide for achieving a basic service model for traffic signal management and : operations. The basic service model is based on simply stated and defensible operational objectives : that consider the staffing level, expertise and...

  3. [Improving speech comprehension using a new cochlear implant speech processor].

    Science.gov (United States)

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise.In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  4. Signals and Systems in Biomedical Engineering Signal Processing and Physiological Systems Modeling

    CERN Document Server

    Devasahayam, Suresh R

    2013-01-01

    The use of digital signal processing is ubiquitous in the field of physiology and biomedical engineering. The application of such mathematical and computational tools requires a formal or explicit understanding of physiology. Formal models and analytical techniques are interlinked in physiology as in any other field. This book takes a unitary approach to physiological systems, beginning with signal measurement and acquisition, followed by signal processing, linear systems modelling, and computer simulations. The signal processing techniques range across filtering, spectral analysis and wavelet analysis. Emphasis is placed on fundamental understanding of the concepts as well as solving numerical problems. Graphs and analogies are used extensively to supplement the mathematics. Detailed models of nerve and muscle at the cellular and systemic levels provide examples for the mathematical methods and computer simulations. Several of the models are sufficiently sophisticated to be of value in understanding real wor...

  5. Modeling Speech Level as a Function of Background Noise Level and Talker-to-Listener Distance for Talkers Wearing Hearing Protection Devices

    DEFF Research Database (Denmark)

    Bouserhal, Rachel E.; Bockstael, Annelies; MacDonald, Ewen

    2017-01-01

    Purpose: Studying the variations in speech levels with changing background noise level and talker-to-listener distance for talkers wearing hearing protection devices (HPDs) can aid in understanding communication in background noise. Method: Speech was recorded using an intra-aural HPD from 12...... complements the existing model presented by Pelegrín-García, Smits, Brunskog, and Jeong (2011) and expands on it by taking into account the effects of occlusion and background noise level on changes in speech sound level. Conclusions: Three models of the relationship between vocal effort, background noise...

  6. [The application of cybernetic modeling methods for the forensic medical personality identification based on the voice and sounding speech characteristics].

    Science.gov (United States)

    Kaganov, A Sh; Kir'yanov, P A

    2015-01-01

    The objective of the present publication was to discuss the possibility of application of cybernetic modeling methods to overcome the apparent discrepancy between two kinds of the speech records, viz. initial ones (e.g. obtained in the course of special investigation activities) and the voice prints obtained from the persons subjected to the criminalistic examination. The paper is based on the literature sources and the materials of original criminalistics expertises performed by the authors.

  7. The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility.

    Science.gov (United States)

    Bentsen, Thomas; May, Tobias; Kressner, Abigail A; Dau, Torsten

    2018-01-01

    Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one. Therefore, both components play significant roles and by combining them, speech intelligibility improvements were obtained in a six-talker condition at a low signal-to-noise ratio.

  8. Automatic Emotion Recognition in Speech: Possibilities and Significance

    Directory of Open Access Journals (Sweden)

    Milana Bojanić

    2009-12-01

    Full Text Available Automatic speech recognition and spoken language understanding are crucial steps towards a natural humanmachine interaction. The main task of the speech communication process is the recognition of the word sequence, but the recognition of prosody, emotion and stress tags may be of particular importance as well. This paper discusses thepossibilities of recognition emotion from speech signal in order to improve ASR, and also provides the analysis of acoustic features that can be used for the detection of speaker’s emotion and stress. The paper also provides a short overview of emotion and stress classification techniques. The importance and place of emotional speech recognition is shown in the domain of human-computer interactive systems and transaction communication model. The directions for future work are given at the end of this work.

  9. MPD model for radar echo signal of hypersonic targets

    Directory of Open Access Journals (Sweden)

    Xu Xuefei

    2014-08-01

    Full Text Available The stop-and-go (SAG model is typically used for echo signal received by the radar using linear frequency modulation pulse compression. In this study, the authors demonstrate that this model is not applicable to hypersonic targets. Instead of SAG model, they present a more realistic echo signal model (moving-in-pulse duration (MPD for hypersonic targets. Following that, they evaluate the performances of pulse compression under the SAG and MPD models by theoretical analysis and simulations. They found that the pulse compression gain has an increase of 3 dB by using the MPD model compared with the SAG model in typical cases.

  10. Speech-specificity of two audiovisual integration effects

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2010-01-01

    Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers...... often fail to perceive as speech. While audiovisual integration in the identification task only occurred when observers were informed of the speech-like nature of SWS, integration occurred in the detection task both for informed and naïve observers. This shows that both speech-specific and general...... mechanisms underlie audiovisual integration of speech....

  11. Large-signal modeling method for power FETs and diodes

    Energy Technology Data Exchange (ETDEWEB)

    Sun Lu; Wang Jiali; Wang Shan; Li Xuezheng; Shi Hui; Wang Na; Guo Shengping, E-mail: sunlu_1019@126.co [School of Electromechanical Engineering, Xidian University, Xi' an 710071 (China)

    2009-06-01

    Under a large signal drive level, a frequency domain black box model of the nonlinear scattering function is introduced into power FETs and diodes. A time domain measurement system and a calibration method based on a digital oscilloscope are designed to extract the nonlinear scattering function of semiconductor devices. The extracted models can reflect the real electrical performance of semiconductor devices and propose a new large-signal model to the design of microwave semiconductor circuits.

  12. Large-signal modeling method for power FETs and diodes

    International Nuclear Information System (INIS)

    Sun Lu; Wang Jiali; Wang Shan; Li Xuezheng; Shi Hui; Wang Na; Guo Shengping

    2009-01-01

    Under a large signal drive level, a frequency domain black box model of the nonlinear scattering function is introduced into power FETs and diodes. A time domain measurement system and a calibration method based on a digital oscilloscope are designed to extract the nonlinear scattering function of semiconductor devices. The extracted models can reflect the real electrical performance of semiconductor devices and propose a new large-signal model to the design of microwave semiconductor circuits.

  13. Audiovisual Cues and Perceptual Learning of Spectrally Distorted Speech

    Science.gov (United States)

    Pilling, Michael; Thomas, Sharon

    2011-01-01

    Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…

  14. Comparing the information conveyed by envelope modulation for speech intelligibility, speech quality, and music quality.

    Science.gov (United States)

    Kates, James M; Arehart, Kathryn H

    2015-10-01

    This paper uses mutual information to quantify the relationship between envelope modulation fidelity and perceptual responses. Data from several previous experiments that measured speech intelligibility, speech quality, and music quality are evaluated for normal-hearing and hearing-impaired listeners. A model of the auditory periphery is used to generate envelope signals, and envelope modulation fidelity is calculated using the normalized cross-covariance of the degraded signal envelope with that of a reference signal. Two procedures are used to describe the envelope modulation: (1) modulation within each auditory frequency band and (2) spectro-temporal processing that analyzes the modulation of spectral ripple components fit to successive short-time spectra. The results indicate that low modulation rates provide the highest information for intelligibility, while high modulation rates provide the highest information for speech and music quality. The low-to-mid auditory frequencies are most important for intelligibility, while mid frequencies are most important for speech quality and high frequencies are most important for music quality. Differences between the spectral ripple components used for the spectro-temporal analysis were not significant in five of the six experimental conditions evaluated. The results indicate that different modulation-rate and auditory-frequency weights may be appropriate for indices designed to predict different types of perceptual relationships.

  15. Modeling laser velocimeter signals as triply stochastic Poisson processes

    Science.gov (United States)

    Mayo, W. T., Jr.

    1976-01-01

    Previous models of laser Doppler velocimeter (LDV) systems have not adequately described dual-scatter signals in a manner useful for analysis and simulation of low-level photon-limited signals. At low photon rates, an LDV signal at the output of a photomultiplier tube is a compound nonhomogeneous filtered Poisson process, whose intensity function is another (slower) Poisson process with the nonstationary rate and frequency parameters controlled by a random flow (slowest) process. In the present paper, generalized Poisson shot noise models are developed for low-level LDV signals. Theoretical results useful in detection error analysis and simulation are presented, along with measurements of burst amplitude statistics. Computer generated simulations illustrate the difference between Gaussian and Poisson models of low-level signals.

  16. Coherence method of identifying signal noise model

    International Nuclear Information System (INIS)

    Vavrin, J.

    1981-01-01

    The noise analysis method is discussed in identifying perturbance models and their parameters by a stochastic analysis of the noise model of variables measured on a reactor. The analysis of correlations is made in the frequency region using coherence analysis methods. In identifying an actual specific perturbance, its model should be determined and recognized in a compound model of the perturbance system using the results of observation. The determination of the optimum estimate of the perturbance system model is based on estimates of related spectral densities which are determined from the spectral density matrix of the measured variables. Partial and multiple coherence, partial transfers, the power spectral densities of the input and output variables of the noise model are determined from the related spectral densities. The possibilities of applying the coherence identification methods were tested on a simple case of a simulated stochastic system. Good agreement was found of the initial analytic frequency filters and the transfers identified. (B.S.)

  17. Modeling of Nonlinear Beat Signals of TAE's

    Science.gov (United States)

    Zhang, Bo; Berk, Herbert; Breizman, Boris; Zheng, Linjin

    2012-03-01

    Experiments on Alcator C-Mod reveal Toroidal Alfven Eigenmodes (TAE) together with signals at various beat frequencies, including those at twice the mode frequency. The beat frequencies are sidebands driven by quadratic nonlinear terms in the MHD equations. These nonlinear sidebands have not yet been quantified by any existing codes. We extend the AEGIS code to capture nonlinear effects by treating the nonlinear terms as a driving source in the linear MHD solver. Our goal is to compute the spatial structure of the sidebands for realistic geometry and q-profile, which can be directly compared with experiment in order to interpret the phase contrast imaging diagnostic measurements and to enable the quantitative determination of the Alfven wave amplitude in the plasma core

  18. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    Science.gov (United States)

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  19. Implementation of a Tour Guide Robot System Using RFID Technology and Viterbi Algorithm-Based HMM for Speech Recognition

    Directory of Open Access Journals (Sweden)

    Neng-Sheng Pai

    2014-01-01

    Full Text Available This paper applied speech recognition and RFID technologies to develop an omni-directional mobile robot into a robot with voice control and guide introduction functions. For speech recognition, the speech signals were captured by short-time processing. The speaker first recorded the isolated words for the robot to create speech database of specific speakers. After the speech pre-processing of this speech database, the feature parameters of cepstrum and delta-cepstrum were obtained using linear predictive coefficient (LPC. Then, the Hidden Markov Model (HMM was used for model training of the speech database, and the Viterbi algorithm was used to find an optimal state sequence as the reference sample for speech recognition. The trained reference model was put into the industrial computer on the robot platform, and the user entered the isolated words to be tested. After processing by the same reference model and comparing with previous reference model, the path of the maximum total probability in various models found using the Viterbi algorithm in the recognition was the recognition result. Finally, the speech recognition and RFID systems were achieved in an actual environment to prove its feasibility and stability, and implemented into the omni-directional mobile robot.

  20. Synergetic Organization in Speech Rhythm

    Science.gov (United States)

    Cummins, Fred

    The Speech Cycling Task is a novel experimental paradigm developed together with Robert Port and Keiichi Tajima at Indiana University. In a task of this sort, subjects repeat a phrase containing multiple prominent, or stressed, syllables in time with an auditory metronome, which can be simple or complex. A phase-based collective variable is defined in the acoustic speech signal. This paper reports on two experiments using speech cycling which together reveal many of the hallmarks of hierarchically coupled oscillatory processes. The first experiment requires subjects to place the final stressed syllable of a small phrase at specified phases within the overall Phrase Repetition Cycle (PRC). It is clearly demonstrated that only three patterns, characterized by phases around 1/3, 1/2 or 2/3 are reliably produced, and these points are attractors for other target phases. The system is thus multistable, and the attractors correspond to stable couplings between the metrical foot and the PRC. A second experiment examines the behavior of these attractors at increased rates. Faster rates lead to mode jumps between attractors. Previous experiments have also illustrated hysteresis as the system moves from one mode to the next. The dynamical organization is particularly interesting from a modeling point of view, as there is no single part of the speech production system which cycles at the level of either the metrical foot or the phrase repetition cycle. That is, there is no continuous kinematic observable in the system. Nonetheless, there is strong evidence that the oscopic behavior of the entire production system is correctly described as hierarchically coupled oscillators. There are many parallels between this organization and the forms of inter-limb coupling observed in locomotion and rhythmic manual tasks.

  1. Modelling of Signal - Level Crossing System

    Directory of Open Access Journals (Sweden)

    Daniel Novak

    2006-01-01

    Full Text Available The author presents an object-oriented model of a railway level-crossing system created for the purpose of functional requirements specification. Unified Modelling Language (UML, version 1.4, which enables specification, visualisation, construction and documentation of software system artefacts, was used. The main attention was paid to analysis and design phases. The former phase resulted in creation of use case diagrams and sequential diagrams, the latter in creation of class/object diagrams and statechart diagrams.

  2. HP Memristor mathematical model for periodic signals and DC

    KAUST Repository

    Radwan, Ahmed G.; Salama, Khaled N.; Zidan, Mohammed A.

    2012-01-01

    the formulas for any general square wave. The limiting conditions for saturation are also provided in case of either DC or periodic signals. The derived equations are compared to the SPICE model of the Memristor showing a perfect match.

  3. Towards an auditory account of speech rhythm: application of a model of the auditory 'primal sketch' to two multi-language corpora.

    Science.gov (United States)

    Lee, Christopher S; Todd, Neil P McAngus

    2004-10-01

    The world's languages display important differences in their rhythmic organization; most particularly, different languages seem to privilege different phonological units (mora, syllable, or stress foot) as their basic rhythmic unit. There is now considerable evidence that such differences have important consequences for crucial aspects of language acquisition and processing. Several questions remain, however, as to what exactly characterizes the rhythmic differences, how they are manifested at an auditory/acoustic level and how listeners, whether adult native speakers or young infants, process rhythmic information. In this paper it is proposed that the crucial determinant of rhythmic organization is the variability in the auditory prominence of phonetic events. In order to test this auditory prominence hypothesis, an auditory model is run on two multi-language data-sets, the first consisting of matched pairs of English and French sentences, and the second consisting of French, Italian, English and Dutch sentences. The model is based on a theory of the auditory primal sketch, and generates a primitive representation of an acoustic signal (the rhythmogram) which yields a crude segmentation of the speech signal and assigns prominence values to the obtained sequence of events. Its performance is compared with that of several recently proposed phonetic measures of vocalic and consonantal variability.

  4. Model-based Bayesian signal extraction algorithm for peripheral nerves

    Science.gov (United States)

    Eggers, Thomas E.; Dweiri, Yazan M.; McCallum, Grant A.; Durand, Dominique M.

    2017-10-01

    Objective. Multi-channel cuff electrodes have recently been investigated for extracting fascicular-level motor commands from mixed neural recordings. Such signals could provide volitional, intuitive control over a robotic prosthesis for amputee patients. Recent work has demonstrated success in extracting these signals in acute and chronic preparations using spatial filtering techniques. These extracted signals, however, had low signal-to-noise ratios and thus limited their utility to binary classification. In this work a new algorithm is proposed which combines previous source localization approaches to create a model based method which operates in real time. Approach. To validate this algorithm, a saline benchtop setup was created to allow the precise placement of artificial sources within a cuff and interference sources outside the cuff. The artificial source was taken from five seconds of chronic neural activity to replicate realistic recordings. The proposed algorithm, hybrid Bayesian signal extraction (HBSE), is then compared to previous algorithms, beamforming and a Bayesian spatial filtering method, on this test data. An example chronic neural recording is also analyzed with all three algorithms. Main results. The proposed algorithm improved the signal to noise and signal to interference ratio of extracted test signals two to three fold, as well as increased the correlation coefficient between the original and recovered signals by 10-20%. These improvements translated to the chronic recording example and increased the calculated bit rate between the recovered signals and the recorded motor activity. Significance. HBSE significantly outperforms previous algorithms in extracting realistic neural signals, even in the presence of external noise sources. These results demonstrate the feasibility of extracting dynamic motor signals from a multi-fascicled intact nerve trunk, which in turn could extract motor command signals from an amputee for the end goal of

  5. A simple statistical signal loss model for deep underground garage

    DEFF Research Database (Denmark)

    Nguyen, Huan Cong; Gimenez, Lucas Chavarria; Kovacs, Istvan

    2016-01-01

    In this paper we address the channel modeling aspects for a deep-indoor scenario with extreme coverage conditions in terms of signal losses, namely underground garage areas. We provide an in-depth analysis in terms of path loss (gain) and large scale signal shadowing, and a propose simple...... propagation model which can be used to predict cellular signal levels in similar deep-indoor scenarios. The proposed frequency-independent floor attenuation factor (FAF) is shown to be in range of 5.2 dB per meter deep....

  6. Wires in the soup: quantitative models of cell signaling

    Science.gov (United States)

    Cheong, Raymond; Levchenko, Andre

    2014-01-01

    Living cells are capable of extracting information from their environments and mounting appropriate responses to a variety of associated challenges. The underlying signal transduction networks enabling this can be quite complex, necessitating for their unraveling by sophisticated computational modeling coupled with precise experimentation. Although we are still at the beginning of this process, some recent examples of integrative analysis of cell signaling are very encouraging. This review highlights the case of the NF-κB pathway in order to illustrate how a quantitative model of a signaling pathway can be gradually constructed through continuous experimental validation, and what lessons one might learn from such exercises. PMID:18291655

  7. THE SIGNAL APPROACH TO MODELLING THE BALANCE OF PAYMENT CRISIS

    Directory of Open Access Journals (Sweden)

    O. Chernyak

    2016-12-01

    Full Text Available The paper considers and presents synthesis of theoretical models of balance of payment crisis and investigates the most effective ways to model the crisis in Ukraine. For mathematical formalization of balance of payment crisis, comparative analysis of the effectiveness of different calculation methods of Exchange Market Pressure Index was performed. A set of indicators that signal the growing likelihood of balance of payments crisis was defined using signal approach. With the help of minimization function thresholds indicators were selected, the crossing of which signalize increase in the probability of balance of payment crisis.

  8. Novel Techniques for Dialectal Arabic Speech Recognition

    CERN Document Server

    Elmahdy, Mohamed; Minker, Wolfgang

    2012-01-01

    Novel Techniques for Dialectal Arabic Speech describes approaches to improve automatic speech recognition for dialectal Arabic. Since speech resources for dialectal Arabic speech recognition are very sparse, the authors describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic speech recognition, while assuming that MSA is always a second language for all Arabic speakers. In this book, Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect. ECA is the first ranked Arabic dialect in terms of number of speakers, and a high quality ECA speech corpus with accurate phonetic transcription has been collected. MSA acoustic models were trained using news broadcast speech. In order to cross-lingually use MSA in dialectal Arabic speech recognition, the authors have normalized the phoneme sets for MSA and ECA. After this normalization, they have applied state-of-the-art acoustic model adaptation techniques like Maximum Likelihood Linear Regression (MLLR) and M...

  9. Small signal modeling of wind farms

    DEFF Research Database (Denmark)

    Ebrahimzadeh, Esmaeil; Blaabjerg, Frede; Wang, Xiongfei

    2017-01-01

    -Input Multi-Output (MIMO) dynamic system, where the current control loops with Phase-Locked Loops (PLLs) are linearized around an operating point. Each sub-module of the wind farm is modeled as a 2×2 admittance matrix in dq-domain and all are combined together by using a dq nodal admittance matrix....... The frequency and damping of the oscillatory modes are calculated by finding the poles of the introduced MIMO matrix. Time-domain simulation results obtained from a 400-MW wind farm are used to verify the effectiveness of the presented model....

  10. An accurate and simple large signal model of HEMT

    DEFF Research Database (Denmark)

    Liu, Qing

    1989-01-01

    A large-signal model of discrete HEMTs (high-electron-mobility transistors) has been developed. It is simple and suitable for SPICE simulation of hybrid digital ICs. The model parameters are extracted by using computer programs and data provided by the manufacturer. Based on this model, a hybrid...

  11. Semiconductor Modeling For Simulating Signal, Power, and Electromagneticintegrity

    CERN Document Server

    Leventhal, Roy

    2006-01-01

    Assists engineers in designing high-speed circuits. The emphasis is on semiconductor modeling, with PCB transmission line effects, equipment enclosure effects, and other modeling issues discussed as needed. This text addresses practical considerations, including process variation, model accuracy, validation and verification, and signal integrity.

  12. Vibration Signal Forecasting on Rotating Machinery by means of Signal Decomposition and Neurofuzzy Modeling

    Directory of Open Access Journals (Sweden)

    Daniel Zurita-Millán

    2016-01-01

    Full Text Available Vibration monitoring plays a key role in the industrial machinery reliability since it allows enhancing the performance of the machinery under supervision through the detection of failure modes. Thus, vibration monitoring schemes that give information regarding future condition, that is, prognosis approaches, are of growing interest for the scientific and industrial communities. This work proposes a vibration signal prognosis methodology, applied to a rotating electromechanical system and its associated kinematic chain. The method combines the adaptability of neurofuzzy modeling with a signal decomposition strategy to model the patterns of the vibrations signal under different fault scenarios. The model tuning is performed by means of Genetic Algorithms along with a correlation based interval selection procedure. The performance and effectiveness of the proposed method are validated experimentally with an electromechanical test bench containing a kinematic chain. The results of the study indicate the suitability of the method for vibration forecasting in complex electromechanical systems and their associated kinematic chains.

  13. A computational model of human auditory signal processing and perception

    DEFF Research Database (Denmark)

    Jepsen, Morten Løve; Ewert, Stephan D.; Dau, Torsten

    2008-01-01

    A model of computational auditory signal-processing and perception that accounts for various aspects of simultaneous and nonsimultaneous masking in human listeners is presented. The model is based on the modulation filterbank model described by Dau et al. [J. Acoust. Soc. Am. 102, 2892 (1997...... discrimination with pure tones and broadband noise, tone-in-noise detection, spectral masking with narrow-band signals and maskers, forward masking with tone signals and tone or noise maskers, and amplitude-modulation detection with narrow- and wideband noise carriers. The model can account for most of the key...... properties of the data and is more powerful than the original model. The model might be useful as a front end in technical applications....

  14. Beyond production: Brain responses during speech perception in adults who stutter

    Directory of Open Access Journals (Sweden)

    Tali Halag-Milo

    2016-01-01

    Full Text Available Developmental stuttering is a speech disorder that disrupts the ability to produce speech fluently. While stuttering is typically diagnosed based on one's behavior during speech production, some models suggest that it involves more central representations of language, and thus may affect language perception as well. Here we tested the hypothesis that developmental stuttering implicates neural systems involved in language perception, in a task that manipulates comprehensibility without an overt speech production component. We used functional magnetic resonance imaging to measure blood oxygenation level dependent (BOLD signals in adults who do and do not stutter, while they were engaged in an incidental speech perception task. We found that speech perception evokes stronger activation in adults who stutter (AWS compared to controls, specifically in the right inferior frontal gyrus (RIFG and in left Heschl's gyrus (LHG. Significant differences were additionally found in the lateralization of response in the inferior frontal cortex: AWS showed bilateral inferior frontal activity, while controls showed a left lateralized pattern of activation. These findings suggest that developmental stuttering is associated with an imbalanced neural network for speech processing, which is not limited to speech production, but also affects cortical responses during speech perception.

  15. The Experimental Social Scientific Model in Speech Communication Research: Influences and Consequences.

    Science.gov (United States)

    Ferris, Sharmila Pixy

    A substantial number of published articles in speech communication research today is experimental/social scientific in nature. It is only in the past decade that scholars have begun to put the history of communication under the lens. Early advocates of the adoption of the method of social scientific inquiry were J. A. Winans, J. M. O'Neill, and C.…

  16. Facilitating Emergent Literacy: Efficacy of a Model that Partners Speech-Language Pathologists and Educators

    Science.gov (United States)

    Girolametto, Luigi; Weitzman, Elaine; Greenberg, Janice

    2012-01-01

    Purpose: This study examined the efficacy of a professional development program for early childhood educators that facilitated emergent literacy skills in preschoolers. The program, led by a speech-language pathologist, focused on teaching alphabet knowledge, print concepts, sound awareness, and decontextualized oral language within naturally…

  17. Source-system windowing for speech analysis

    NARCIS (Netherlands)

    Yegnanarayana, B.; Satyanarayana Murthy, P.; Eggen, J.H.

    1993-01-01

    In this paper we propose a speech-analysis method to bring out characteristics of the vocal tract system in short segments which are much less than a pitch period. The method performs windowing in the source and system components of the speech signal and recombines them to obtain a signal reflecting

  18. Discrete dynamic modeling of T cell survival signaling networks

    Science.gov (United States)

    Zhang, Ranran

    2009-03-01

    Biochemistry-based frameworks are often not applicable for the modeling of heterogeneous regulatory systems that are sparsely documented in terms of quantitative information. As an alternative, qualitative models assuming a small set of discrete states are gaining acceptance. This talk will present a discrete dynamic model of the signaling network responsible for the survival and long-term competence of cytotoxic T cells in the blood cancer T-LGL leukemia. We integrated the signaling pathways involved in normal T cell activation and the known deregulations of survival signaling in leukemic T-LGL, and formulated the regulation of each network element as a Boolean (logic) rule. Our model suggests that the persistence of two signals is sufficient to reproduce all known deregulations in leukemic T-LGL. It also indicates the nodes whose inactivity is necessary and sufficient for the reversal of the T-LGL state. We have experimentally validated several model predictions, including: (i) Inhibiting PDGF signaling induces apoptosis in leukemic T-LGL. (ii) Sphingosine kinase 1 and NFκB are essential for the long-term survival of T cells in T-LGL leukemia. (iii) T box expressed in T cells (T-bet) is constitutively activated in the T-LGL state. The model has identified potential therapeutic targets for T-LGL leukemia and can be used for generating long-term competent CTL necessary for tumor and cancer vaccine development. The success of this model, and of other discrete dynamic models, suggests that the organization of signaling networks has an determining role in their dynamics. Reference: R. Zhang, M. V. Shah, J. Yang, S. B. Nyland, X. Liu, J. K. Yun, R. Albert, T. P. Loughran, Jr., Network Model of Survival Signaling in LGL Leukemia, PNAS 105, 16308-16313 (2008).

  19. Perception and the temporal properties of speech

    Science.gov (United States)

    Gordon, Peter C.

    1991-11-01

    Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.

  20. Collective signaling behavior in a networked-oscillator model

    Science.gov (United States)

    Liu, Z.-H.; Hui, P. M.

    2007-09-01

    We propose and study the collective behavior of a model of networked signaling objects that incorporates several ingredients of real-life systems. These ingredients include spatial inhomogeneity with grouping of signaling objects, signal attenuation with distance, and delayed and impulsive coupling between non-identical signaling objects. Depending on the coupling strength and/or time-delay effect, the model exhibits completely, partially, and locally collective signaling behavior. In particular, a correlated signaling (CS) behavior is observed in which there exist time durations when nearly a constant fraction of oscillators in the system are in the signaling state. These time durations are much longer than the duration of a spike when a single oscillator signals, and they are separated by regular intervals in which nearly all oscillators are silent. Such CS behavior is similar to that observed in biological systems such as fireflies, cicadas, crickets, and frogs. The robustness of the CS behavior against noise is also studied. It is found that properly adjusting the coupling strength and noise level could enhance the correlated behavior.

  1. Analysis of a dynamic model of guard cell signaling reveals the stability of signal propagation

    Science.gov (United States)

    Gan, Xiao; Albert, RéKa

    Analyzing the long-term behaviors (attractors) of dynamic models of biological systems can provide valuable insight into biological phenotypes and their stability. We identified the long-term behaviors of a multi-level, 70-node discrete dynamic model of the stomatal opening process in plants. We reduce the model's huge state space by reducing unregulated nodes and simple mediator nodes, and by simplifying the regulatory functions of selected nodes while keeping the model consistent with experimental observations. We perform attractor analysis on the resulting 32-node reduced model by two methods: 1. converting it into a Boolean model, then applying two attractor-finding algorithms; 2. theoretical analysis of the regulatory functions. We conclude that all nodes except two in the reduced model have a single attractor; and only two nodes can admit oscillations. The multistability or oscillations do not affect the stomatal opening level in any situation. This conclusion applies to the original model as well in all the biologically meaningful cases. We further demonstrate the robustness of signal propagation by showing that a large percentage of single-node knockouts does not affect the stomatal opening level. Thus, we conclude that the complex structure of this signal transduction network provides multiple information propagation pathways while not allowing extensive multistability or oscillations, resulting in robust signal propagation. Our innovative combination of methods offers a promising way to analyze multi-level models.

  2. Network modeling reveals prevalent negative regulatory relationships between signaling sectors in Arabidopsis immune signaling.

    Directory of Open Access Journals (Sweden)

    Masanao Sato

    Full Text Available Biological signaling processes may be mediated by complex networks in which network components and network sectors interact with each other in complex ways. Studies of complex networks benefit from approaches in which the roles of individual components are considered in the context of the network. The plant immune signaling network, which controls inducible responses to pathogen attack, is such a complex network. We studied the Arabidopsis immune signaling network upon challenge with a strain of the bacterial pathogen Pseudomonas syringae expressing the effector protein AvrRpt2 (Pto DC3000 AvrRpt2. This bacterial strain feeds multiple inputs into the signaling network, allowing many parts of the network to be activated at once. mRNA profiles for 571 immune response genes of 22 Arabidopsis immunity mutants and wild type were collected 6 hours after inoculation with Pto DC3000 AvrRpt2. The mRNA profiles were analyzed as detailed descriptions of changes in the network state resulting from the genetic perturbations. Regulatory relationships among the genes corresponding to the mutations were inferred by recursively applying a non-linear dimensionality reduction procedure to the mRNA profile data. The resulting static network model accurately predicted 23 of 25 regulatory relationships reported in the literature, suggesting that predictions of novel regulatory relationships are also accurate. The network model revealed two striking features: (i the components of the network are highly interconnected; and (ii negative regulatory relationships are common between signaling sectors. Complex regulatory relationships, including a novel negative regulatory relationship between the early microbe-associated molecular pattern-triggered signaling sectors and the salicylic acid sector, were further validated. We propose that prevalent negative regulatory relationships among the signaling sectors make the plant immune signaling network a "sector

  3. Multistage audiovisual integration of speech: dissociating identification and detection.

    Science.gov (United States)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias S

    2011-02-01

    Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech signal. Here, we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers informed of its speech-like nature recognize as speech. While the McGurk illusion only occurred for informed observers, the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multistage account of audiovisual integration of speech in which the many attributes of the audiovisual speech signal are integrated by separate integration processes.

  4. Audiovisual Speech Synchrony Measure: Application to Biometrics

    Directory of Open Access Journals (Sweden)

    Gérard Chollet

    2007-01-01

    Full Text Available Speech is a means of communication which is intrinsically bimodal: the audio signal originates from the dynamics of the articulators. This paper reviews recent works in the field of audiovisual speech, and more specifically techniques developed to measure the level of correspondence between audio and visual speech. It overviews the most common audio and visual speech front-end processing, transformations performed on audio, visual, or joint audiovisual feature spaces, and the actual measure of correspondence between audio and visual speech. Finally, the use of synchrony measure for biometric identity verification based on talking faces is experimented on the BANCA database.

  5. SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support

    Directory of Open Access Journals (Sweden)

    Giampiero Salvi

    2009-01-01

    Full Text Available This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling.

  6. Signal classification using global dynamical models, Part I: Theory

    International Nuclear Information System (INIS)

    Kadtke, J.; Kremliovsky, M.

    1996-01-01

    Detection and classification of signals is one of the principal areas of signal processing, and the utilization of nonlinear information has long been considered as a way of improving performance beyond standard linear (e.g. spectral) techniques. Here, we develop a method for using global models of chaotic dynamical systems theory to define a signal classification processing chain, which is sensitive to nonlinear correlations in the data. We use it to demonstrate classification in high noise regimes (negative SNR), and argue that classification probabilities can be directly computed from ensemble statistics in the model coefficient space. We also develop a modification for non-stationary signals (i.e. transients) using non-autonomous ODEs. In Part II of this paper, we demonstrate the analysis on actual open ocean acoustic data from marine biologics. copyright 1996 American Institute of Physics

  7. Small-signal model for the series resonant converter

    Science.gov (United States)

    King, R. J.; Stuart, T. A.

    1985-01-01

    The results of a previous discrete-time model of the series resonant dc-dc converter are reviewed and from these a small signal dynamic model is derived. This model is valid for low frequencies and is based on the modulation of the diode conduction angle for control. The basic converter is modeled separately from its output filter to facilitate the use of these results for design purposes. Experimental results are presented.

  8. Mixed-signal instrumentation for large-signal device characterization and modelling

    NARCIS (Netherlands)

    Marchetti, M.

    2013-01-01

    This thesis concentrates on the development of advanced large-signal measurement and characterization tools to support technology development, model extraction and validation, and power amplifier (PA) designs that address the newly introduced third and fourth generation (3G and 4G) wideband

  9. Electromagnetic modeling method for eddy current signal analysis

    International Nuclear Information System (INIS)

    Lee, D. H.; Jung, H. K.; Cheong, Y. M.; Lee, Y. S.; Huh, H.; Yang, D. J.

    2004-10-01

    An electromagnetic modeling method for eddy current signal analysis is necessary before an experiment is performed. Electromagnetic modeling methods consists of the analytical method and the numerical method. Also, the numerical methods can be divided by Finite Element Method(FEM), Boundary Element Method(BEM) and Volume Integral Method(VIM). Each modeling method has some merits and demerits. Therefore, the suitable modeling method can be chosen by considering the characteristics of each modeling. This report explains the principle and application of each modeling method and shows the comparison modeling programs

  10. Multimicrophone Speech Dereverberation: Experimental Validation

    Directory of Open Access Journals (Sweden)

    Marc Moonen

    2007-05-01

    Full Text Available Dereverberation is required in various speech processing applications such as handsfree telephony and voice-controlled systems, especially when signals are applied that are recorded in a moderately or highly reverberant environment. In this paper, we compare a number of classical and more recently developed multimicrophone dereverberation algorithms, and validate the different algorithmic settings by means of two performance indices and a speech recognition system. It is found that some of the classical solutions obtain a moderate signal enhancement. More advanced subspace-based dereverberation techniques, on the other hand, fail to enhance the signals despite their high-computational load.

  11. Neural Entrainment to Speech Modulates Speech Intelligibility

    NARCIS (Netherlands)

    Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne

    2018-01-01

    Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and

  12. Improved Methods for Pitch Synchronous Linear Prediction Analysis of Speech

    OpenAIRE

    劉, 麗清

    2015-01-01

    Linear prediction (LP) analysis has been applied to speech system over the last few decades. LP technique is well-suited for speech analysis due to its ability to model speech production process approximately. Hence LP analysis has been widely used for speech enhancement, low-bit-rate speech coding in cellular telephony, speech recognition, characteristic parameter extraction (vocal tract resonances frequencies, fundamental frequency called pitch) and so on. However, the performance of the co...

  13. Low-dimensional recurrent neural network-based Kalman filter for speech enhancement.

    Science.gov (United States)

    Xia, Youshen; Wang, Jun

    2015-07-01

    This paper proposes a new recurrent neural network-based Kalman filter for speech enhancement, based on a noise-constrained least squares estimate. The parameters of speech signal modeled as autoregressive process are first estimated by using the proposed recurrent neural network and the speech signal is then recovered from Kalman filtering. The proposed recurrent neural network is globally asymptomatically stable to the noise-constrained estimate. Because the noise-constrained estimate has a robust performance against non-Gaussian noise, the proposed recurrent neural network-based speech enhancement algorithm can minimize the estimation error of Kalman filter parameters in non-Gaussian noise. Furthermore, having a low-dimensional model feature, the proposed neural network-based speech enhancement algorithm has a much faster speed than two existing recurrent neural networks-based speech enhancement algorithms. Simulation results show that the proposed recurrent neural network-based speech enhancement algorithm can produce a good performance with fast computation and noise reduction. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Schizophrenia alters intra-network functional connectivity in the caudate for detecting speech under informational speech masking conditions.

    Science.gov (United States)

    Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang

    2018-04-04

    Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to health listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its functions of suppressing masking-speech signals.

  15. Reduced modeling of signal transduction – a modular approach

    Directory of Open Access Journals (Sweden)

    Ederer Michael

    2007-09-01

    Full Text Available Abstract Background Combinatorial complexity is a challenging problem in detailed and mechanistic mathematical modeling of signal transduction. This subject has been discussed intensively and a lot of progress has been made within the last few years. A software tool (BioNetGen was developed which allows an automatic rule-based set-up of mechanistic model equations. In many cases these models can be reduced by an exact domain-oriented lumping technique. However, the resulting models can still consist of a very large number of differential equations. Results We introduce a new reduction technique, which allows building modularized and highly reduced models. Compared to existing approaches further reduction of signal transduction networks is possible. The method also provides a new modularization criterion, which allows to dissect the model into smaller modules that are called layers and can be modeled independently. Hallmarks of the approach are conservation relations within each layer and connection of layers by signal flows instead of mass flows. The reduced model can be formulated directly without previous generation of detailed model equations. It can be understood and interpreted intuitively, as model variables are macroscopic quantities that are converted by rates following simple kinetics. The proposed technique is applicable without using complex mathematical tools and even without detailed knowledge of the mathematical background. However, we provide a detailed mathematical analysis to show performance and limitations of the method. For physiologically relevant parameter domains the transient as well as the stationary errors caused by the reduction are negligible. Conclusion The new layer based reduced modeling method allows building modularized and strongly reduced models of signal transduction networks. Reduced model equations can be directly formulated and are intuitively interpretable. Additionally, the method provides very good

  16. The role of stress and accent in the perception of speech rhythm

    NARCIS (Netherlands)

    Grover, C.N.; Terken, J.M.B.

    1995-01-01

    Modelling rhythmic characteristics of speech is expected to contribute to the acceptability of synthetic speech. However, before rules for the control of speech rhythm in synthetic speech can be developed, we need to know which properties of speech give rise to the perception of speech rhythm. An

  17. HP Memristor mathematical model for periodic signals and DC

    KAUST Repository

    Radwan, Ahmed G.

    2012-07-28

    In this paper mathematical models of the HP Memristor for DC and periodic signal inputs are provided. The need for a rigid model for the Memristor using conventional current and voltage quantities is essential for the development of many promising Memristors\\' applications. Unlike the previous works, which focuses on the sinusoidal input waveform, we derived rules for any periodic signals in general in terms of voltage and current. Square and triangle waveforms are studied explicitly, extending the formulas for any general square wave. The limiting conditions for saturation are also provided in case of either DC or periodic signals. The derived equations are compared to the SPICE model of the Memristor showing a perfect match.

  18. Signal and noise modeling in confocal laser scanning fluorescence microscopy.

    Science.gov (United States)

    Herberich, Gerlind; Windoffer, Reinhard; Leube, Rudolf E; Aach, Til

    2012-01-01

    Fluorescence confocal laser scanning microscopy (CLSM) has revolutionized imaging of subcellular structures in biomedical research by enabling the acquisition of 3D time-series of fluorescently-tagged proteins in living cells, hence forming the basis for an automated quantification of their morphological and dynamic characteristics. Due to the inherently weak fluorescence, CLSM images exhibit a low SNR. We present a novel model for the transfer of signal and noise in CLSM that is both theoretically sound as well as corroborated by a rigorous analysis of the pixel intensity statistics via measurement of the 3D noise power spectra, signal-dependence and distribution. Our model provides a better fit to the data than previously proposed models. Further, it forms the basis for (i) the simulation of the CLSM imaging process indispensable for the quantitative evaluation of CLSM image analysis algorithms, (ii) the application of Poisson denoising algorithms and (iii) the reconstruction of the fluorescence signal.

  19. Corrected Four-Sphere Head Model for EEG Signals.

    Science.gov (United States)

    Næss, Solveig; Chintaluri, Chaitanya; Ness, Torbjørn V; Dale, Anders M; Einevoll, Gaute T; Wójcik, Daniel K

    2017-01-01

    The EEG signal is generated by electrical brain cell activity, often described in terms of current dipoles. By applying EEG forward models we can compute the contribution from such dipoles to the electrical potential recorded by EEG electrodes. Forward models are key both for generating understanding and intuition about the neural origin of EEG signals as well as inverse modeling, i.e., the estimation of the underlying dipole sources from recorded EEG signals. Different models of varying complexity and biological detail are used in the field. One such analytical model is the four-sphere model which assumes a four-layered spherical head where the layers represent brain tissue, cerebrospinal fluid (CSF), skull, and scalp, respectively. While conceptually clear, the mathematical expression for the electric potentials in the four-sphere model is cumbersome, and we observed that the formulas presented in the literature contain errors. Here, we derive and present the correct analytical formulas with a detailed derivation. A useful application of the analytical four-sphere model is that it can serve as ground truth to test the accuracy of numerical schemes such as the Finite Element Method (FEM). We performed FEM simulations of the four-sphere head model and showed that they were consistent with the corrected analytical formulas. For future reference we provide scripts for computing EEG potentials with the four-sphere model, both by means of the correct analytical formulas and numerical FEM simulations.

  20. Corrected Four-Sphere Head Model for EEG Signals

    Directory of Open Access Journals (Sweden)

    Solveig Næss

    2017-10-01

    Full Text Available The EEG signal is generated by electrical brain cell activity, often described in terms of current dipoles. By applying EEG forward models we can compute the contribution from such dipoles to the electrical potential recorded by EEG electrodes. Forward models are key both for generating understanding and intuition about the neural origin of EEG signals as well as inverse modeling, i.e., the estimation of the underlying dipole sources from recorded EEG signals. Different models of varying complexity and biological detail are used in the field. One such analytical model is the four-sphere model which assumes a four-layered spherical head where the layers represent brain tissue, cerebrospinal fluid (CSF, skull, and scalp, respectively. While conceptually clear, the mathematical expression for the electric potentials in the four-sphere model is cumbersome, and we observed that the formulas presented in the literature contain errors. Here, we derive and present the correct analytical formulas with a detailed derivation. A useful application of the analytical four-sphere model is that it can serve as ground truth to test the accuracy of numerical schemes such as the Finite Element Method (FEM. We performed FEM simulations of the four-sphere head model and showed that they were consistent with the corrected analytical formulas. For future reference we provide scripts for computing EEG potentials with the four-sphere model, both by means of the correct analytical formulas and numerical FEM simulations.

  1. Road Impedance Model Study under the Control of Intersection Signal

    Directory of Open Access Journals (Sweden)

    Yunlin Luo

    2015-01-01

    Full Text Available Road traffic impedance model is a difficult and critical point in urban traffic assignment and route guidance. The paper takes a signalized intersection as the research object. On the basis of traditional traffic wave theory including the implementation of traffic wave model and the analysis of vehicles’ gathering and dissipating, the road traffic impedance model is researched by determining the basic travel time and waiting delay time. Numerical example results have proved that the proposed model in this paper has received better calculation performance compared to existing model, especially in flat hours. The values of mean absolute percentage error (MAPE and mean absolute deviation (MAD are separately reduced by 3.78% and 2.62 s. It shows that the proposed model has feasibility and availability in road traffic impedance under intersection signal.

  2. A Joint Approach for Single-Channel Speaker Identification and Speech Separation

    DEFF Research Database (Denmark)

    Mowlaee, Pejman; Saeidi, Rahim; Christensen, Mads Græsbøll

    2012-01-01

    ) accuracy, here, we report the objective and subjective results as well. The results show that the proposed system performs as well as the best of the state-of-the-art in terms of perceived quality while its performance in terms of speaker identification and automatic speech recognition results......In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification a single-channel speaker identification algorithm is proposed which provides an estimate of signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose...... a sinusoidal model-based algorithm. The speech separation algorithm consists of a double-talk/single-talk detector followed by a minimum mean square error estimator of sinusoidal parameters for finding optimal codevectors from pre-trained speaker codebooks. In evaluating the proposed system, we start from...

  3. YIN, a fundamental frequency estimator for speech and music

    Science.gov (United States)

    de Cheveigné, Alain; Kawahara, Hideki

    2002-04-01

    An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. There is no upper limit on the frequency search range, so the algorithm is suited for high-pitched voices and music. The algorithm is relatively simple and may be implemented efficiently and with low latency, and it involves few parameters that must be tuned. It is based on a signal model (periodic signal) that may be extended in several ways to handle various forms of aperiodicity that occur in particular applications. Finally, interesting parallels may be drawn with models of auditory processing.

  4. Stochastic Modelling as a Tool for Seismic Signals Segmentation

    Directory of Open Access Journals (Sweden)

    Daniel Kucharczyk

    2016-01-01

    Full Text Available In order to model nonstationary real-world processes one can find appropriate theoretical model with properties following the analyzed data. However in this case many trajectories of the analyzed process are required. Alternatively, one can extract parts of the signal that have homogenous structure via segmentation. The proper segmentation can lead to extraction of important features of analyzed phenomena that cannot be described without the segmentation. There is no one universal method that can be applied for all of the phenomena; thus novel methods should be invented for specific cases. They might address specific character of the signal in different domains (time, frequency, time-frequency, etc.. In this paper we propose two novel segmentation methods that take under consideration the stochastic properties of the analyzed signals in time domain. Our research is motivated by the analysis of vibration signals acquired in an underground mine. In such signals we observe seismic events which appear after the mining activity, like blasting, provoked relaxation of rock, and some unexpected events, like natural rock burst. The proposed segmentation procedures allow for extraction of such parts of the analyzed signals which are related to mentioned events.

  5. Gap Acceptance Behavior Model for Non-signalized

    OpenAIRE

    Fajaruddin Bin Mustakim

    2015-01-01

    The paper proposes field studies that were performed to determine the critical gap on the multiple rural roadways Malaysia, at non-signalized T-intersection by using The Raff and Logic Method. Critical gap between passenger car and motorcycle have been determined.   There are quite number of studied doing gap acceptance behavior model for passenger car however still few research on gap acceptance behavior model for motorcycle. Thus in this paper, logistic regression models were developed to p...

  6. Synchronous Modeling of Modular Avionics Architectures using the SIGNAL Language

    OpenAIRE

    Gamatié , Abdoulaye; Gautier , Thierry

    2002-01-01

    This document presents a study on the modeling of architecture components for avionics applications. We consider the avionics standard ARINC 653 specifications as basis, as well as the synchronous language SIGNAL to describe the modeling. A library of APEX object models (partition, process, communication and synchronization services, etc.) has been implemented. This should allow to describe distributed real-time applications using POLYCHRONY, so as to access formal tools and techniques for ar...

  7. Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

    Science.gov (United States)

    Ye, Sherry

    2015-01-01

    NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.

  8. Neural pathways for visual speech perception

    Directory of Open Access Journals (Sweden)

    Lynne E Bernstein

    2014-12-01

    Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1 The visual perception of speech relies on visual pathway representations of speech qua speech. (2 A proposed site of these representations, the temporal visual speech area (TVSA has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS. (3 Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

  9. Reference analysis of the signal + background model in counting experiments

    Science.gov (United States)

    Casadei, D.

    2012-01-01

    The model representing two independent Poisson processes, labelled as ``signal'' and ``background'' and both contributing additively to the total number of counted events, is considered from a Bayesian point of view. This is a widely used model for the searches of rare or exotic events in presence of a background source, as for example in the searches performed by high-energy physics experiments. In the assumption of prior knowledge about the background yield, a reference prior is obtained for the signal alone and its properties are studied. Finally, the properties of the full solution, the marginal reference posterior, are illustrated with few examples.

  10. The design of a device for hearer and feeler differentiation, part A. [speech modulated hearing device

    Science.gov (United States)

    Creecy, R.

    1974-01-01

    A speech modulated white noise device is reported that gives the rhythmic characteristics of a speech signal for intelligible reception by deaf persons. The signal is composed of random amplitudes and frequencies as modulated by the speech envelope characteristics of rhythm and stress. Time intensity parameters of speech are conveyed through the vibro-tactile sensation stimuli.

  11. Speech Research

    Science.gov (United States)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: A biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; Phonetic factors in letter detection; categorical perception; Short-term recall by deaf signers of American sign language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaires; and vowel information in postvocalic frictions.

  12. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-12-01

    Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance.The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience.The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  13. Examining speech perception in noise and cognitive functions in the elderly.

    Science.gov (United States)

    Meister, Hartmut; Schreitmüller, Stefan; Grugel, Linda; Beutner, Dirk; Walger, Martin; Meister, Ingo

    2013-12-01

    The purpose of this study was to investigate the relationship of cognitive functions (i.e., working memory [WM]) and speech recognition against different background maskers in older individuals. Speech reception thresholds (SRTs) were determined using a matrix-sentence test. Unmodulated noise, modulated noise (International Collegium for Rehabilitative Audiology [ICRA] noise 5-250), and speech fragments (International Speech Test Signal [ISTS]) were used as background maskers. Verbal WM was assessed using the Verbal Learning and Memory Test (VLMT; Helmstaedter & Durwen, 1990). Measurements were conducted with 14 normal-hearing older individuals and a control group of 12 normal-hearing young listeners. Despite their normal hearing ability, the young listeners outperformed the older individuals in all background maskers. These differences were largest for the modulated maskers. SRTs were significantly correlated with the scores of the VLMT. A linear regression model also included WM as the only significant predictor variable. The results support the assumption that WM plays an important role for speech understanding and that it might have impact on results obtained using speech audiometry. Thus, an individual's WM capacity should be considered with aural diagnosis and rehabilitation. The VLMT proved to be a clinically applicable test for WM. Further cognitive functions important with speech understanding are currently being investigated within the SAKoLA (Sprachaudiometrie und kognitive Leistungen im Alter [Speech Audiometry and Cognitive Functions in the Elderly]) project.

  14. State–time spectrum of signal transduction logic models

    International Nuclear Information System (INIS)

    MacNamara, Aidan; Terfve, Camille; Henriques, David; Bernabé, Beatriz Peñalver; Saez-Rodriguez, Julio

    2012-01-01

    Despite the current wealth of high-throughput data, our understanding of signal transduction is still incomplete. Mathematical modeling can be a tool to gain an insight into such processes. Detailed biochemical modeling provides deep understanding, but does not scale well above relatively a few proteins. In contrast, logic modeling can be used where the biochemical knowledge of the system is sparse and, because it is parameter free (or, at most, uses relatively a few parameters), it scales well to large networks that can be derived by manual curation or retrieved from public databases. Here, we present an overview of logic modeling formalisms in the context of training logic models to data, and specifically the different approaches to modeling qualitative to quantitative data (state) and dynamics (time) of signal transduction. We use a toy model of signal transduction to illustrate how different logic formalisms (Boolean, fuzzy logic and differential equations) treat state and time. Different formalisms allow for different features of the data to be captured, at the cost of extra requirements in terms of computational power and data quality and quantity. Through this demonstration, the assumptions behind each formalism are discussed, as well as their advantages and disadvantages and possible future developments. (paper)

  15. An improved large signal model of InP HEMTs

    Science.gov (United States)

    Li, Tianhao; Li, Wenjun; Liu, Jun

    2018-05-01

    An improved large signal model for InP HEMTs is proposed in this paper. The channel current and charge model equations are constructed based on the Angelov model equations. Both the equations for channel current and gate charge models were all continuous and high order drivable, and the proposed gate charge model satisfied the charge conservation. For the strong leakage induced barrier reduction effect of InP HEMTs, the Angelov current model equations are improved. The channel current model could fit DC performance of devices. A 2 × 25 μm × 70 nm InP HEMT device is used to demonstrate the extraction and validation of the model, in which the model has predicted the DC I–V, C–V and bias related S parameters accurately. Project supported by the National Natural Science Foundation of China (No. 61331006).

  16. Analysis and logical modeling of biological signaling transduction networks

    Science.gov (United States)

    Sun, Zhongyao

    The study of network theory and its application span across a multitude of seemingly disparate fields of science and technology: computer science, biology, social science, linguistics, etc. It is the intrinsic similarities embedded in the entities and the way they interact with one another in these systems that link them together. In this dissertation, I present from both the aspect of theoretical analysis and the aspect of application three projects, which primarily focus on signal transduction networks in biology. In these projects, I assembled a network model through extensively perusing literature, performed model-based simulations and validation, analyzed network topology, and proposed a novel network measure. The application of network modeling to the system of stomatal opening in plants revealed a fundamental question about the process that has been left unanswered in decades. The novel measure of the redundancy of signal transduction networks with Boolean dynamics by calculating its maximum node-independent elementary signaling mode set accurately predicts the effect of single node knockout in such signaling processes. The three projects as an organic whole advance the understanding of a real system as well as the behavior of such network models, giving me an opportunity to take a glimpse at the dazzling facets of the immense world of network science.

  17. Regulation of Wnt signaling by nociceptive input in animal models

    Directory of Open Access Journals (Sweden)

    Shi Yuqiang

    2012-06-01

    Full Text Available Abstract Background Central sensitization-associated synaptic plasticity in the spinal cord dorsal horn (SCDH critically contributes to the development of chronic pain, but understanding of the underlying molecular pathways is still incomplete. Emerging evidence suggests that Wnt signaling plays a crucial role in regulation of synaptic plasticity. Little is known about the potential function of the Wnt signaling cascades in chronic pain development. Results Fluorescent immunostaining results indicate that β-catenin, an essential protein in the canonical Wnt signaling pathway, is expressed in the superficial layers of the mouse SCDH with enrichment at synapses in lamina II. In addition, Wnt3a, a prototypic Wnt ligand that activates the canonical pathway, is also enriched in the superficial layers. Immunoblotting analysis indicates that both Wnt3a a β-catenin are up-regulated in the SCDH of various mouse pain models created by hind-paw injection of capsaicin, intrathecal (i.t. injection of HIV-gp120 protein or spinal nerve ligation (SNL. Furthermore, Wnt5a, a prototypic Wnt ligand for non-canonical pathways, and its receptor Ror2 are also up-regulated in the SCDH of these models. Conclusion Our results suggest that Wnt signaling pathways are regulated by nociceptive input. The activation of Wnt signaling may regulate the expression of spinal central sensitization during the development of acute and chronic pain.

  18. Modelling and Analysis of Biochemical Signalling Pathway Cross-talk

    Directory of Open Access Journals (Sweden)

    Robin Donaldson

    2010-02-01

    Full Text Available Signalling pathways are abstractions that help life scientists structure the coordination of cellular activity. Cross-talk between pathways accounts for many of the complex behaviours exhibited by signalling pathways and is often critical in producing the correct signal-response relationship. Formal models of signalling pathways and cross-talk in particular can aid understanding and drive experimentation. We define an approach to modelling based on the concept that a pathway is the (synchronising parallel composition of instances of generic modules (with internal and external labels. Pathways are then composed by (synchronising parallel composition and renaming; different types of cross-talk result from different combinations of synchronisation and renaming. We define a number of generic modules in PRISM and five types of cross-talk: signal flow, substrate availability, receptor function, gene expression and intracellular communication. We show that Continuous Stochastic Logic properties can both detect and distinguish the types of cross-talk. The approach is illustrated with small examples and an analysis of the cross-talk between the TGF-b/BMP, WNT and MAPK pathways.

  19. Broadening the "ports of entry" for speech-language pathologists: a relational and reflective model for clinical supervision.

    Science.gov (United States)

    Geller, Elaine; Foley, Gilbert M

    2009-02-01

    To offer a framework for clinical supervision in speech-language pathology that embeds a mental health perspective within the study of communication sciences and disorders. Key mental health constructs are examined as to how they are applied in traditional versus relational and reflective supervision models. Comparisons between traditional and relational and reflective approaches are outlined, with reference to each mental health construct and the developmental level of the supervisee. Three stages of supervisee development are proposed based on research from various disciplines, including nursing, psychology, speech-language pathology, social work, and education. Each developmental stage is characterized by shifts or changes in the supervisee's underlying assumptions, beliefs, and patterns of behavior. This article makes the case that both the cognitive and affective dimensions of the supervisor-supervisee relationship need to be addressed without minimizing the necessary development of discipline-specific expertise. The developmental stages outlined in this paradigm can be used to understand supervisees' patterns of change and growth over time, as well as to create optimal learning environments that match their developmental level and knowledge base.

  20. Expanding the "ports of entry" for speech-language pathologists: a relational and reflective model for clinical practice.

    Science.gov (United States)

    Geller, Elaine; Foley, Gilbert M

    2009-02-01

    To outline an expanded framework for clinical practice in speech-language pathology. This framework broadens the focus on discipline-specific knowledge and infuses mental health constructs within the study of communication sciences and disorders, with the objective of expanding the potential "ports or points of entry" (D. Stern, 1995) for clinical intervention with young children who are language impaired. Specific mental health constructs are highlighted in this article. These include relationship-based learning, attachment theory, working dyadically (the client is the child and parent), reflective practice, transference-countertransference, and the use of self. Each construct is explored as to the way it has been applied in traditional and contemporary models of clinical practice. The underlying premise in this framework is that working from a relationally based and reflective perspective augments change and growth in both client and parent(s). The challenge is for speech-language pathologists to embed mental health constructs within their discipline-specific expertise. This leads to paying attention to both observable aspects of clients' behaviors as well as their internal affective states.

  1. Signalling network construction for modelling plant defence response.

    Directory of Open Access Journals (Sweden)

    Dragana Miljkovic

    Full Text Available Plant defence signalling response against various pathogens, including viruses, is a complex phenomenon. In resistant interaction a plant cell perceives the pathogen signal, transduces it within the cell and performs a reprogramming of the cell metabolism leading to the pathogen replication arrest. This work focuses on signalling pathways crucial for the plant defence response, i.e., the salicylic acid, jasmonic acid and ethylene signal transduction pathways, in the Arabidopsis thaliana model plant. The initial signalling network topology was constructed manually by defining the representation formalism, encoding the information from public databases and literature, and composing a pathway diagram. The manually constructed network structure consists of 175 components and 387 reactions. In order to complement the network topology with possibly missing relations, a new approach to automated information extraction from biological literature was developed. This approach, named Bio3graph, allows for automated extraction of biological relations from the literature, resulting in a set of (component1, reaction, component2 triplets and composing a graph structure which can be visualised, compared to the manually constructed topology and examined by the experts. Using a plant defence response vocabulary of components and reaction types, Bio3graph was applied to a set of 9,586 relevant full text articles, resulting in 137 newly detected reactions between the components. Finally, the manually constructed topology and the new reactions were merged to form a network structure consisting of 175 components and 524 reactions. The resulting pathway diagram of plant defence signalling represents a valuable source for further computational modelling and interpretation of omics data. The developed Bio3graph approach, implemented as an executable language processing and graph visualisation workflow, is publically available at http://ropot.ijs.si/bio3graph/and can be

  2. Modeling, estimation and optimal filtration in signal processing

    CERN Document Server

    Najim, Mohamed

    2010-01-01

    The purpose of this book is to provide graduate students and practitioners with traditional methods and more recent results for model-based approaches in signal processing.Firstly, discrete-time linear models such as AR, MA and ARMA models, their properties and their limitations are introduced. In addition, sinusoidal models are addressed.Secondly, estimation approaches based on least squares methods and instrumental variable techniques are presented.Finally, the book deals with optimal filters, i.e. Wiener and Kalman filtering, and adaptive filters such as the RLS, the LMS and the

  3. Decoding Problem Gamblers' Signals: A Decision Model for Casino Enterprises.

    Science.gov (United States)

    Ifrim, Sandra

    2015-12-01

    The aim of the present study is to offer a validated decision model for casino enterprises. The model enables those users to perform early detection of problem gamblers and fulfill their ethical duty of social cost minimization. To this end, the interpretation of casino customers' nonverbal communication is understood as a signal-processing problem. Indicators of problem gambling recommended by Delfabbro et al. (Identifying problem gamblers in gambling venues: final report, 2007) are combined with Viterbi algorithm into an interdisciplinary model that helps decoding signals emitted by casino customers. Model output consists of a historical path of mental states and cumulated social costs associated with a particular client. Groups of problem and non-problem gamblers were simulated to investigate the model's diagnostic capability and its cost minimization ability. Each group consisted of 26 subjects and was subsequently enlarged to 100 subjects. In approximately 95% of the cases, mental states were correctly decoded for problem gamblers. Statistical analysis using planned contrasts revealed that the model is relatively robust to the suppression of signals performed by casino clientele facing gambling problems as well as to misjudgments made by staff regarding the clients' mental states. Only if the last mentioned source of error occurs in a very pronounced manner, i.e. judgment is extremely faulty, cumulated social costs might be distorted.

  4. The motor theory of speech perception revisited.

    Science.gov (United States)

    Massaro, Dominic W; Chen, Trevor H

    2008-04-01

    Galantucci, Fowler, and Turvey (2006) have claimed that perceiving speech is perceiving gestures and that the motor system is recruited for perceiving speech. We make the counter argument that perceiving speech is not perceiving gestures, that the motor system is not recruitedfor perceiving speech, and that speech perception can be adequately described by a prototypical pattern recognition model, the fuzzy logical model of perception (FLMP). Empirical evidence taken as support for gesture and motor theory is reconsidered in more detail and in the framework of the FLMR Additional theoretical and logical arguments are made to challenge gesture and motor theory.

  5. Speech Intelligibility

    Science.gov (United States)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena like the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, benefit using hearing aids or combinations of these things.

  6. An exploratory study on the driving method of speech synthesis based on the human eye reading imaging data

    Science.gov (United States)

    Gao, Pei-pei; Liu, Feng

    2016-10-01

    With the development of information technology and artificial intelligence, speech synthesis plays a significant role in the fields of Human-Computer Interaction Techniques. However, the main problem of current speech synthesis techniques is lacking of naturalness and expressiveness so that it is not yet close to the standard of natural language. Another problem is that the human-computer interaction based on the speech synthesis is too monotonous to realize mechanism of user subjective drive. This thesis introduces the historical development of speech synthesis and summarizes the general process of this technique. It is pointed out that prosody generation module is an important part in the process of speech synthesis. On the basis of further research, using eye activity rules when reading to control and drive prosody generation was introduced as a new human-computer interaction method to enrich the synthetic form. In this article, the present situation of speech synthesis technology is reviewed in detail. Based on the premise of eye gaze data extraction, using eye movement signal in real-time driving, a speech synthesis method which can express the real speech rhythm of the speaker is proposed. That is, when reader is watching corpora with its eyes in silent reading, capture the reading information such as the eye gaze duration per prosodic unit, and establish a hierarchical prosodic pattern of duration model to determine the duration parameters of synthesized speech. At last, after the analysis, the feasibility of the above method is verified.

  7. Timing in audiovisual speech perception: A mini review and new psychophysical data.

    Science.gov (United States)

    Venezia, Jonathan H; Thurman, Steven M; Matchin, William; George, Sahara E; Hickok, Gregory

    2016-02-01

    Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (~35 % identification of /apa/ compared to ~5 % in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.

  8. Timing in Audiovisual Speech Perception: A Mini Review and New Psychophysical Data

    Science.gov (United States)

    Venezia, Jonathan H.; Thurman, Steven M.; Matchin, William; George, Sahara E.; Hickok, Gregory

    2015-01-01

    Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually-relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (∼35% identification of /apa/ compared to ∼5% in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually-relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (∼130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content. PMID:26669309

  9. Speech masking and cancelling and voice obscuration

    Science.gov (United States)

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor is used to measure a user's speech and then broadcasts an obscuring acoustic signal diminishing the user's vocal acoustic output intensity and/or distorting the voice sounds making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate or contacting a user's neck or head skin tissue for sensing speech production information.

  10. Phonemic Characteristics of Apraxia of Speech Resulting from Subcortical Hemorrhage

    Science.gov (United States)

    Peach, Richard K.; Tonkovich, John D.

    2004-01-01

    Reports describing subcortical apraxia of speech (AOS) have received little consideration in the development of recent speech processing models because the speech characteristics of patients with this diagnosis have not been described precisely. We describe a case of AOS with aphasia secondary to basal ganglia hemorrhage. Speech-language symptoms…

  11. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ...-Speech Services for Individuals with Hearing and Speech Disabilities, Report and Order (Order), document...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...

  12. A causal test of the motor theory of speech perception: a case of impaired speech production and spared speech perception.

    Science.gov (United States)

    Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z

    2015-01-01

    The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.

  13. Nonlinear signal processing using neural networks: Prediction and system modelling

    Energy Technology Data Exchange (ETDEWEB)

    Lapedes, A.; Farber, R.

    1987-06-01

    The backpropagation learning algorithm for neural networks is developed into a formalism for nonlinear signal processing. We illustrate the method by selecting two common topics in signal processing, prediction and system modelling, and show that nonlinear applications can be handled extremely well by using neural networks. The formalism is a natural, nonlinear extension of the linear Least Mean Squares algorithm commonly used in adaptive signal processing. Simulations are presented that document the additional performance achieved by using nonlinear neural networks. First, we demonstrate that the formalism may be used to predict points in a highly chaotic time series with orders of magnitude increase in accuracy over conventional methods including the Linear Predictive Method and the Gabor-Volterra-Weiner Polynomial Method. Deterministic chaos is thought to be involved in many physical situations including the onset of turbulence in fluids, chemical reactions and plasma physics. Secondly, we demonstrate the use of the formalism in nonlinear system modelling by providing a graphic example in which it is clear that the neural network has accurately modelled the nonlinear transfer function. It is interesting to note that the formalism provides explicit, analytic, global, approximations to the nonlinear maps underlying the various time series. Furthermore, the neural net seems to be extremely parsimonious in its requirements for data points from the time series. We show that the neural net is able to perform well because it globally approximates the relevant maps by performing a kind of generalized mode decomposition of the maps. 24 refs., 13 figs.

  14. Acoustic/seismic signal propagation and sensor performance modeling

    Science.gov (United States)

    Wilson, D. Keith; Marlin, David H.; Mackay, Sean

    2007-04-01

    Performance, optimal employment, and interpretation of data from acoustic and seismic sensors depend strongly and in complex ways on the environment in which they operate. Software tools for guiding non-expert users of acoustic and seismic sensors are therefore much needed. However, such tools require that many individual components be constructed and correctly connected together. These components include the source signature and directionality, representation of the atmospheric and terrain environment, calculation of the signal propagation, characterization of the sensor response, and mimicking of the data processing at the sensor. Selection of an appropriate signal propagation model is particularly important, as there are significant trade-offs between output fidelity and computation speed. Attenuation of signal energy, random fading, and (for array systems) variations in wavefront angle-of-arrival should all be considered. Characterization of the complex operational environment is often the weak link in sensor modeling: important issues for acoustic and seismic modeling activities include the temporal/spatial resolution of the atmospheric data, knowledge of the surface and subsurface terrain properties, and representation of ambient background noise and vibrations. Design of software tools that address these challenges is illustrated with two examples: a detailed target-to-sensor calculation application called the Sensor Performance Evaluator for Battlefield Environments (SPEBE) and a GIS-embedded approach called Battlefield Terrain Reasoning and Awareness (BTRA).

  15. Multistage audiovisual integration of speech: dissociating identification and detection

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2011-01-01

    Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech...... signal. Here we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers...... informed of its speech-like nature recognize as speech. While the McGurk illusion only occurred for informed observers the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multi-stage account of audiovisual integration of speech in which the many attributes...

  16. MOTORCYCLE CRASH PREDICTION MODEL FOR NON-SIGNALIZED INTERSECTIONS

    Directory of Open Access Journals (Sweden)

    S. HARNEN

    2003-01-01

    Full Text Available This paper attempts to develop a prediction model for motorcycle crashes at non-signalized intersections on urban roads in Malaysia. The Generalized Linear Modeling approach was used to develop the model. The final model revealed that an increase in motorcycle and non-motorcycle flows entering an intersection is associated with an increase in motorcycle crashes. Non-motorcycle flow on major road had the greatest effect on the probability of motorcycle crashes. Approach speed, lane width, number of lanes, shoulder width and land use were also found to be significant in explaining motorcycle crashes. The model should assist traffic engineers to decide the need for appropriate intersection treatment that specifically designed for non-exclusive motorcycle lane facilities.

  17. A Danish open-set speech corpus for competing-speech studies

    DEFF Research Database (Denmark)

    Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias

    2014-01-01

    Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs......) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded...... with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed...

  18. Predicting the perceived sound quality of frequency-compressed speech.

    Directory of Open Access Journals (Sweden)

    Rainer Huber

    Full Text Available The performance of objective speech and audio quality measures for the prediction of the perceived quality of frequency-compressed speech in hearing aids is investigated in this paper. A number of existing quality measures have been applied to speech signals processed by a hearing aid, which compresses speech spectra along frequency in order to make information contained in higher frequencies audible for listeners with severe high-frequency hearing loss. Quality measures were compared with subjective ratings obtained from normal hearing and hearing impaired children and adults in an earlier study. High correlations were achieved with quality measures computed by quality models that are based on the auditory model of Dau et al., namely, the measure PSM, computed by the quality model PEMO-Q; the measure qc, computed by the quality model proposed by Hansen and Kollmeier; and the linear subcomponent of the HASQI. For the prediction of quality ratings by hearing impaired listeners, extensions of some models incorporating hearing loss were implemented and shown to achieve improved prediction accuracy. Results indicate that these objective quality measures can potentially serve as tools for assisting in initial setting of frequency compression parameters.

  19. Psychophysics of Complex Auditory and Speech Stimuli

    National Research Council Canada - National Science Library

    Pastore, Richard

    1996-01-01

    The supported research provides a careful examination of the many different interrelated factors, processes, and constructs important to the perception by humans of complex acoustic signals, including speech and music...

  20. Modeling Guidelines for Code Generation in the Railway Signaling Context

    Science.gov (United States)

    Ferrari, Alessio; Bacherini, Stefano; Fantechi, Alessandro; Zingoni, Niccolo

    2009-01-01

    Modeling guidelines constitute one of the fundamental cornerstones for Model Based Development. Their relevance is essential when dealing with code generation in the safety-critical domain. This article presents the experience of a railway signaling systems manufacturer on this issue. Introduction of Model-Based Development (MBD) and code generation in the industrial safety-critical sector created a crucial paradigm shift in the development process of dependable systems. While traditional software development focuses on the code, with MBD practices the focus shifts to model abstractions. The change has fundamental implications for safety-critical systems, which still need to guarantee a high degree of confidence also at code level. Usage of the Simulink/Stateflow platform for modeling, which is a de facto standard in control software development, does not ensure by itself production of high-quality dependable code. This issue has been addressed by companies through the definition of modeling rules imposing restrictions on the usage of design tools components, in order to enable production of qualified code. The MAAB Control Algorithm Modeling Guidelines (MathWorks Automotive Advisory Board)[3] is a well established set of publicly available rules for modeling with Simulink/Stateflow. This set of recommendations has been developed by a group of OEMs and suppliers of the automotive sector with the objective of enforcing and easing the usage of the MathWorks tools within the automotive industry. The guidelines have been published in 2001 and afterwords revisited in 2007 in order to integrate some additional rules developed by the Japanese division of MAAB [5]. The scope of the current edition of the guidelines ranges from model maintainability and readability to code generation issues. The rules are conceived as a reference baseline and therefore they need to be tailored to comply with the characteristics of each industrial context. Customization of these

  1. Development of a System for Automatic Recognition of Speech

    Directory of Open Access Journals (Sweden)

    Roman Jarina

    2003-01-01

    Full Text Available The article gives a review of a research on processing and automatic recognition of speech signals (ARR at the Department of Telecommunications of the Faculty of Electrical Engineering, University of iilina. On-going research is oriented to speech parametrization using 2-dimensional cepstral analysis, and to an application of HMMs and neural networks for speech recognition in Slovak language. The article summarizes achieved results and outlines future orientation of our research in automatic speech recognition.

  2. Modeling SMAP Spacecraft Attitude Control Estimation Error Using Signal Generation Model

    Science.gov (United States)

    Rizvi, Farheen

    2016-01-01

    Two ground simulation software are used to model the SMAP spacecraft dynamics. The CAST software uses a higher fidelity model than the ADAMS software. The ADAMS software models the spacecraft plant, controller and actuator models, and assumes a perfect sensor and estimator model. In this simulation study, the spacecraft dynamics results from the ADAMS software are used as CAST software is unavailable. The main source of spacecraft dynamics error in the higher fidelity CAST software is due to the estimation error. A signal generation model is developed to capture the effect of this estimation error in the overall spacecraft dynamics. Then, this signal generation model is included in the ADAMS software spacecraft dynamics estimate such that the results are similar to CAST. This signal generation model has similar characteristics mean, variance and power spectral density as the true CAST estimation error. In this way, ADAMS software can still be used while capturing the higher fidelity spacecraft dynamics modeling from CAST software.

  3. A Segmented Signal Progression Model for the Modern Streetcar System

    Directory of Open Access Journals (Sweden)

    Baojie Wang

    2015-01-01

    Full Text Available This paper is on the purpose of developing a segmented signal progression model for modern streetcar system. The new method is presented with the following features: (1 the control concept is based on the assumption of only one streetcar line operating along an arterial under a constant headway and no bandwidth demand for streetcar system signal progression; (2 the control unit is defined as a coordinated intersection group associated with several streetcar stations, and the control joints must be streetcar stations; (3 the objective function is built to ensure the two-way streetcar arrival times distributing within the available time of streetcar phase; (4 the available time of streetcar phase is determined by timing schemes, intersection structures, track locations, streetcar speeds, and vehicular accelerations; (5 the streetcar running speed is constant separately whether it is in upstream or downstream route; (6 the streetcar dwell time is preset according to historical data distribution or charging demand. The proposed method is experimentally examined in Hexi New City Streetcar Project in Nanjing, China. In the experimental results, the streetcar system operation and the progression impacts are shown to affect transit and vehicular traffic. The proposed model presents promising outcomes through the design of streetcar system segmented signal progression, in terms of ensuring high streetcar system efficiency and minimizing negative impacts on transit and vehicular traffic.

  4. Speech enhancement on smartphone voice recording

    International Nuclear Information System (INIS)

    Atmaja, Bagus Tris; Farid, Mifta Nur; Arifianto, Dhany

    2016-01-01

    Speech enhancement is challenging task in audio signal processing to enhance the quality of targeted speech signal while suppress other noises. In the beginning, the speech enhancement algorithm growth rapidly from spectral subtraction, Wiener filtering, spectral amplitude MMSE estimator to Non-negative Matrix Factorization (NMF). Smartphone as revolutionary device now is being used in all aspect of life including journalism; personally and professionally. Although many smartphones have two microphones (main and rear) the only main microphone is widely used for voice recording. This is why the NMF algorithm widely used for this purpose of speech enhancement. This paper evaluate speech enhancement on smartphone voice recording by using some algorithms mentioned previously. We also extend the NMF algorithm to Kulback-Leibler NMF with supervised separation. The last algorithm shows improved result compared to others by spectrogram and PESQ score evaluation. (paper)

  5. Modeling skull's acoustic attenuation and dispersion on photoacoustic signal

    Science.gov (United States)

    Mohammadi, L.; Behnam, H.; Nasiriavanaki, M. R.

    2017-03-01

    Despite the great promising results of a recent new transcranial photoacoustic brain imaging technology, it has been shown that the presence of the skull severely affects the performance of this imaging modality. In this paper, we investigate the effect of skull on generated photoacoustic signals with a mathematical model. The developed model takes into account the frequency dependence attenuation and acoustic dispersion effects occur with the wave reflection and refraction at the skull surface. Numerical simulations based on the developed model are performed for calculating the propagation of photoacoustic waves through the skull. From the simulation results, it was found that the skull-induced distortion becomes very important and the reconstructed image would be strongly distorted without correcting these effects. In this regard, it is anticipated that an accurate quantification and modeling of the skull transmission effects would ultimately allow for skull aberration correction in transcranial photoacoustic brain imaging.

  6. Prosodic Contrasts in Ironic Speech

    Science.gov (United States)

    Bryant, Gregory A.

    2010-01-01

    Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…

  7. Mathematical modeling of gonadotropin-releasing hormone signaling.

    Science.gov (United States)

    Pratap, Amitesh; Garner, Kathryn L; Voliotis, Margaritis; Tsaneva-Atanasova, Krasimira; McArdle, Craig A

    2017-07-05

    Gonadotropin-releasing hormone (GnRH) acts via G-protein coupled receptors on pituitary gonadotropes to control reproduction. These are G q -coupled receptors that mediate acute effects of GnRH on the exocytotic secretion of luteinizing hormone (LH) and follicle-stimulating hormone (FSH), as well as the chronic regulation of their synthesis. GnRH is secreted in short pulses and GnRH effects on its target cells are dependent upon the dynamics of these pulses. Here we overview GnRH receptors and their signaling network, placing emphasis on pulsatile signaling, and how mechanistic mathematical models and an information theoretic approach have helped further this field. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  8. Purkinje Cell Signaling Deficits in Animal Models of Ataxia

    Directory of Open Access Journals (Sweden)

    Eriola Hoxha

    2018-04-01

    Full Text Available Purkinje cell (PC dysfunction or degeneration is the most frequent finding in animal models with ataxic symptoms. Mutations affecting intrinsic membrane properties can lead to ataxia by altering the firing rate of PCs or their firing pattern. However, the relationship between specific firing alterations and motor symptoms is not yet clear, and in some cases PC dysfunction precedes the onset of ataxic signs. Moreover, a great variety of ionic and synaptic mechanisms can affect PC signaling, resulting in different features of motor dysfunction. Mutations affecting Na+ channels (NaV1.1, NaV1.6, NaVβ4, Fgf14 or Rer1 reduce the firing rate of PCs, mainly via an impairment of the Na+ resurgent current. Mutations that reduce Kv3 currents limit the firing rate frequency range. Mutations of Kv1 channels act mainly on inhibitory interneurons, generating excessive GABAergic signaling onto PCs, resulting in episodic ataxia. Kv4.3 mutations are responsible for a complex syndrome with several neurologic dysfunctions including ataxia. Mutations of either Cav or BK channels have similar consequences, consisting in a disruption of the firing pattern of PCs, with loss of precision, leading to ataxia. Another category of pathogenic mechanisms of ataxia regards alterations of synaptic signals arriving at the PC. At the parallel fiber (PF-PC synapse, mutations of glutamate delta-2 (GluD2 or its ligand Crbl1 are responsible for the loss of synaptic contacts, abolishment of long-term depression (LTD and motor deficits. At the same synapse, a correct function of metabotropic glutamate receptor 1 (mGlu1 receptors is necessary to avoid ataxia. Failure of climbing fiber (CF maturation and establishment of PC mono-innervation occurs in a great number of mutant mice, including mGlu1 and its transduction pathway, GluD2, semaphorins and their receptors. All these models have in common the alteration of PC output signals, due to a variety of mechanisms affecting incoming

  9. Skull's acoustic attenuation and dispersion modeling on photoacoustic signal

    Science.gov (United States)

    Mohammadi, Leila; Behnam, Hamid; Tavakkoli, Jahan; Nasiriavanaki, Mohammadreza

    2018-02-01

    Despite the promising results of the recent novel transcranial photoacoustic (PA) brain imaging technology, it has been demonstrated that the presence of the skull severely affects the performance of this imaging modality. We theoretically investigate the effects of acoustic heterogeneity induced by skull on the PA signals generated from single particles, with firstly developing a mathematical model for this phenomenon and then explore experimental validation of the results. The model takes into account the frequency dependent attenuation and dispersion effects occur with wave reflection, refraction and mode conversion at the skull surfaces. Numerical simulations based on the developed model are performed for calculating the propagation of photoacoustic waves through the skull. The results show a strong agreement between simulation and ex-vivo study. The findings are as follow: The thickness of the skull is the most PA signal deteriorating factor that affects both its amplitude (attenuation) and phase (distortion). Also we demonstrated that, when the depth of target region is low and it is comparable to the skull thickness, however, the skull-induced distortion becomes increasingly severe and the reconstructed image would be strongly distorted without correcting these effects. It is anticipated that an accurate quantification and modeling of the skull transmission effects would ultimately allow for aberration correction in transcranial PA brain imaging.

  10. Speech disorders - children

    Science.gov (United States)

    ... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...

  11. Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity

    Science.gov (United States)

    Moses, David A.; Mesgarani, Nima; Leonard, Matthew K.; Chang, Edward F.

    2016-10-01

    Objective. The superior temporal gyrus (STG) and neighboring brain regions play a key role in human language processing. Previous studies have attempted to reconstruct speech information from brain activity in the STG, but few of them incorporate the probabilistic framework and engineering methodology used in modern speech recognition systems. In this work, we describe the initial efforts toward the design of a neural speech recognition (NSR) system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes using the high gamma band power of local field potentials in the STG and neighboring cortical areas obtained via electrocorticography. Approach. The system implements a Viterbi decoder that incorporates phoneme likelihood estimates from a linear discriminant analysis model and transition probabilities from an n-gram phonemic language model. Grid searches were used in an attempt to determine optimal parameterizations of the feature vectors and Viterbi decoder. Main results. The performance of the system was significantly improved by using spatiotemporal representations of the neural activity (as opposed to purely spatial representations) and by including language modeling and Viterbi decoding in the NSR system. Significance. These results emphasize the importance of modeling the temporal dynamics of neural responses when analyzing their variations with respect to varying stimuli and demonstrate that speech recognition techniques can be successfully leveraged when decoding speech from neural signals. Guided by the results detailed in this work, further development of the NSR system could have applications in the fields of automatic speech recognition and neural prosthetics.

  12. Performance analysis of NOAA tropospheric signal delay model

    International Nuclear Information System (INIS)

    Ibrahim, Hassan E; El-Rabbany, Ahmed

    2011-01-01

    Tropospheric delay is one of the dominant global positioning system (GPS) errors, which degrades the positioning accuracy. Recent development in tropospheric modeling relies on implementation of more accurate numerical weather prediction (NWP) models. In North America one of the NWP-based tropospheric correction models is the NOAA Tropospheric Signal Delay Model (NOAATrop), which was developed by the US National Oceanic and Atmospheric Administration (NOAA). Because of its potential to improve the GPS positioning accuracy, the NOAATrop model became the focus of many researchers. In this paper, we analyzed the performance of the NOAATrop model and examined its effect on ionosphere-free-based precise point positioning (PPP) solution. We generated 3 year long tropospheric zenith total delay (ZTD) data series for the NOAATrop model, Hopfield model, and the International GNSS Services (IGS) final tropospheric correction product, respectively. These data sets were generated at ten IGS reference stations spanning Canada and the United States. We analyzed the NOAATrop ZTD data series and compared them with those of the Hopfield model. The IGS final tropospheric product was used as a reference. The analysis shows that the performance of the NOAATrop model is a function of both season (time of the year) and geographical location. However, its performance was superior to the Hopfield model in all cases. We further investigated the effect of implementing the NOAATrop model on the ionosphere-free-based PPP solution convergence and accuracy. It is shown that the use of the NOAATrop model improved the PPP solution convergence by 1%, 10% and 15% for the latitude, longitude and height components, respectively

  13. Signal analysis of accelerometry data using gravity-based modeling

    Science.gov (United States)

    Davey, Neil P.; James, Daniel A.; Anderson, Megan E.

    2004-03-01

    Triaxial accelerometers have been used to measure human movement parameters in swimming. Interpretation of data is difficult due to interference sources including interaction of external bodies. In this investigation the authors developed a model to simulate the physical movement of the lower back. Theoretical accelerometery outputs were derived thus giving an ideal, or noiseless dataset. An experimental data collection apparatus was developed by adapting a system to the aquatic environment for investigation of swimming. Model data was compared against recorded data and showed strong correlation. Comparison of recorded and modeled data can be used to identify changes in body movement, this is especially useful when cyclic patterns are present in the activity. Strong correlations between data sets allowed development of signal processing algorithms for swimming stroke analysis using first the pure noiseless data set which were then applied to performance data. Video analysis was also used to validate study results and has shown potential to provide acceptable results.

  14. Infant speech-sound discrimination testing: effects of stimulus intensity and procedural model on measures of performance.

    Science.gov (United States)

    Nozza, R J

    1987-06-01

    Performance of infants in a speech-sound discrimination task (/ba/ vs /da/) was measured at three stimulus intensity levels (50, 60, and 70 dB SPL) using the operant head-turn procedure. The procedure was modified so that data could be treated as though from a single-interval (yes-no) procedure, as is commonly done, as well as if from a sustained attention (vigilance) task. Discrimination performance changed significantly with increase in intensity, suggesting caution in the interpretation of results from infant discrimination studies in which only single stimulus intensity levels within this range are used. The assumptions made about the underlying methodological model did not change the performance-intensity relationships. However, infants demonstrated response decrement, typical of vigilance tasks, which supports the notion that the head-turn procedure is represented best by the vigilance model. Analysis then was done according to a method designed for tasks with undefined observation intervals [C. S. Watson and T. L. Nichols, J. Acoust. Soc. Am. 59, 655-668 (1976)]. Results reveal that, while group data are reasonably well represented across levels of difficulty by the fixed-interval model, there is a variation in performance as a function of time following trial onset that could lead to underestimation of performance in some cases.

  15. Linear degrees of freedom in speech production: analysis of cineradio- and labio-film data and articulatory-acoustic modeling.

    Science.gov (United States)

    Beautemps, D; Badin, P; Bailly, G

    2001-05-01

    The following contribution addresses several issues concerning speech degrees of freedom in French oral vowels, stop, and fricative consonants based on an analysis of tongue and lip shapes extracted from cineradio- and labio-films. The midsagittal tongue shapes have been submitted to a linear decomposition where some of the loading factors were selected such as jaw and larynx position while four other components were derived from principal component analysis (PCA). For the lips, in addition to the more traditional protrusion and opening components, a supplementary component was extracted to explain the upward movement of both the upper and lower lips in [v] production. A linear articulatory model was developed; the six tongue degrees of freedom were used as the articulatory control parameters of the midsagittal tongue contours and explained 96% of the tongue data variance. These control parameters were also used to specify the frontal lip width dimension derived from the labio-film front views. Finally, this model was complemented by a conversion model going from the midsagittal to the area function, based on a fitting of the midsagittal distances and the formant frequencies for both vowels and consonants.

  16. Statistical mechanics of learning orthogonal signals for general covariance models

    International Nuclear Information System (INIS)

    Hoyle, David C

    2010-01-01

    Statistical mechanics techniques have proved to be useful tools in quantifying the accuracy with which signal vectors are extracted from experimental data. However, analysis has previously been limited to specific model forms for the population covariance C, which may be inappropriate for real world data sets. In this paper we obtain new statistical mechanical results for a general population covariance matrix C. For data sets consisting of p sample points in R N we use the replica method to study the accuracy of orthogonal signal vectors estimated from the sample data. In the asymptotic limit of N,p→∞ at fixed α = p/N, we derive analytical results for the signal direction learning curves. In the asymptotic limit the learning curves follow a single universal form, each displaying a retarded learning transition. An explicit formula for the location of the retarded learning transition is obtained and we find marked variation in the location of the retarded learning transition dependent on the distribution of population covariance eigenvalues. The results of the replica analysis are confirmed against simulation

  17. Designing speech for a recipient

    DEFF Research Database (Denmark)

    Fischer, Kerstin

    This study asks how speakers adjust their speech to their addressees, focusing on the potential roles of cognitive representations such as partner models, automatic processes such as interactive alignment, and social processes such as interactional negotiation. The nature of addressee orientation......, psycholinguistics and conversation analysis, and offers both overviews of child-directed, foreigner-directed and robot-directed speech and in-depth analyses of the processes involved in adjusting to a communication partner....

  18. Speech Processing.

    Science.gov (United States)

    1983-05-01

    The VDE system developed had the capability of recognizing up to 248 separate words in syntactic structures. 4 The two systems described are isolated...AND SPEAKER RECOGNITION by M.J.Hunt 5 ASSESSMENT OF SPEECH SYSTEMS ’ ..- * . by R.K.Moore 6 A SURVEY OF CURRENT EQUIPMENT AND RESEARCH’ by J.S.Bridle...TECHNOLOGY IN NAVY TRAINING SYSTEMS by R.Breaux, M.Blind and R.Lynchard 10 9 I-I GENERAL REVIEW OF MILITARY APPLICATIONS OF VOICE PROCESSING DR. BRUNO

  19. Using the PLUM procedure of SPSS to fit unequal variance and generalized signal detection models.

    Science.gov (United States)

    DeCarlo, Lawrence T

    2003-02-01

    The recent addition of aprocedure in SPSS for the analysis of ordinal regression models offers a simple means for researchers to fit the unequal variance normal signal detection model and other extended signal detection models. The present article shows how to implement the analysis and how to interpret the SPSS output. Examples of fitting the unequal variance normal model and other generalized signal detection models are given. The approach offers a convenient means for applying signal detection theory to a variety of research.

  20. Source Separation via Spectral Masking for Speech Recognition Systems

    Directory of Open Access Journals (Sweden)

    Gustavo Fernandes Rodrigues

    2012-12-01

    Full Text Available In this paper we present an insight into the use of spectral masking techniques in time-frequency domain, as a preprocessing step for the speech signal recognition. Speech recognition systems have their performance negatively affected in noisy environments or in the presence of other speech signals. The limits of these masking techniques for different levels of the signal-to-noise ratio are discussed. We show the robustness of the spectral masking techniques against four types of noise: white, pink, brown and human speech noise (bubble noise. The main contribution of this work is to analyze the performance limits of recognition systems  using spectral masking. We obtain an increase of 18% on the speech hit rate, when the speech signals were corrupted by other speech signals or bubble noise, with different signal-to-noise ratio of approximately 1, 10 and 20 dB. On the other hand, applying the ideal binary masks to mixtures corrupted by white, pink and brown noise, results an average growth of 9% on the speech hit rate, with the same different signal-to-noise ratio. The experimental results suggest that the masking spectral techniques are more suitable for the case when it is applied a bubble noise, which is produced by human speech, than for the case of applying white, pink and brown noise.

  1. Speech Intelligibility Evaluation for Mobile Phones

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Cubick, Jens; Dau, Torsten

    2015-01-01

    In the development process of modern telecommunication systems, such as mobile phones, it is common practice to use computer models to objectively evaluate the transmission quality of the system, instead of time-consuming perceptual listening tests. Such models have typically focused on the quality...... of the transmitted speech, while little or no attention has been provided to speech intelligibility. The present study investigated to what extent three state-of-the art speech intelligibility models could predict the intelligibility of noisy speech transmitted through mobile phones. Sentences from the Danish...... Dantale II speech material were mixed with three different kinds of background noise, transmitted through three different mobile phones, and recorded at the receiver via a local network simulator. The speech intelligibility of the transmitted sentences was assessed by six normal-hearing listeners...

  2. On the Evaluation of the Conversational Speech Quality in Telecommunications

    Directory of Open Access Journals (Sweden)

    Vincent Barriac

    2008-04-01

    Full Text Available We propose an objective method to assess speech quality in the conversational context by taking into account the talking and listening speech qualities and the impact of delay. This approach is applied to the results of four subjective tests on the effects of echo, delay, packet loss, and noise. The dataset is divided into training and validation sets. For the training set, a multiple linear regression is applied to determine a relationship between conversational, talking, and listening speech qualities and the delay value. The multiple linear regression leads to an accurate estimation of the conversational scores with high correlation and low error between subjective and estimated scores, both on the training and validation sets. In addition, a validation is performed on the data of a subjective test found in the literature which confirms the reliability of the regression. The relationship is then applied to an objective level by replacing talking and listening subjective scores with talking and listening objective scores provided by existing objective models, fed by speech signals recorded during the subjective tests. The conversational model achieves high performance as revealed by comparison with the test results and with the existing standard methodology “E-model,” presented in the ITU-T (International Telecommunication Union Recommendation G.107.

  3. Acquiring neural signals for developing a perception and cognition model

    Science.gov (United States)

    Li, Wei; Li, Yunyi; Chen, Genshe; Shen, Dan; Blasch, Erik; Pham, Khanh; Lynch, Robert

    2012-06-01

    The understanding of how humans process information, determine salience, and combine seemingly unrelated information is essential to automated processing of large amounts of information that is partially relevant, or of unknown relevance. Recent neurological science research in human perception, and in information science regarding contextbased modeling, provides us with a theoretical basis for using a bottom-up approach for automating the management of large amounts of information in ways directly useful for human operators. However, integration of human intelligence into a game theoretic framework for dynamic and adaptive decision support needs a perception and cognition model. For the purpose of cognitive modeling, we present a brain-computer-interface (BCI) based humanoid robot system to acquire brainwaves during human mental activities of imagining a humanoid robot-walking behavior. We use the neural signals to investigate relationships between complex humanoid robot behaviors and human mental activities for developing the perception and cognition model. The BCI system consists of a data acquisition unit with an electroencephalograph (EEG), a humanoid robot, and a charge couple CCD camera. An EEG electrode cup acquires brainwaves from the skin surface on scalp. The humanoid robot has 20 degrees of freedom (DOFs); 12 DOFs located on hips, knees, and ankles for humanoid robot walking, 6 DOFs on shoulders and arms for arms motion, and 2 DOFs for head yaw and pitch motion. The CCD camera takes video clips of the human subject's hand postures to identify mental activities that are correlated to the robot-walking behaviors. We use the neural signals to investigate relationships between complex humanoid robot behaviors and human mental activities for developing the perception and cognition model.

  4. Speech perception as an active cognitive process

    Directory of Open Access Journals (Sweden)

    Shannon eHeald

    2014-03-01

    Full Text Available One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming realtively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process which implies rigidity of processingd with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing by masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augementation or

  5. Speech Entrainment Compensates for Broca's Area Damage

    Science.gov (United States)

    Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris

    2015-01-01

    Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443

  6. Hierarchic stochastic modelling applied to intracellular Ca(2+ signals.

    Directory of Open Access Journals (Sweden)

    Gregor Moenke

    Full Text Available Important biological processes like cell signalling and gene expression have noisy components and are very complex at the same time. Mathematical analysis of such systems has often been limited to the study of isolated subsystems, or approximations are used that are difficult to justify. Here we extend a recently published method (Thurley and Falcke, PNAS 2011 which is formulated in observable system configurations instead of molecular transitions. This reduces the number of system states by several orders of magnitude and avoids fitting of kinetic parameters. The method is applied to Ca(2+ signalling. Ca(2+ is a ubiquitous second messenger transmitting information by stochastic sequences of concentration spikes, which arise by coupling of subcellular Ca(2+ release events (puffs. We derive analytical expressions for a mechanistic Ca(2+ model, based on recent data from live cell imaging, and calculate Ca(2+ spike statistics in dependence on cellular parameters like stimulus strength or number of Ca(2+ channels. The new approach substantiates a generic Ca(2+ model, which is a very convenient way to simulate Ca(2+ spike sequences with correct spiking statistics.

  7. Linear collider signal of anomaly mediated supersymmetry breaking model

    International Nuclear Information System (INIS)

    Ghosh Dilip Kumar; Kundu, Anirban; Roy, Probir; Roy, Sourov

    2001-01-01

    Though the minimal model of anomaly mediated supersymmetry breaking has been significantly constrained by recent experimental and theoretical work, there are still allowed regions of the parameter space for moderate to large values of tan β. We show that these regions will be comprehensively probed in a √s = 1 TeV e + e - linear collider. Diagnostic signals to this end are studied by zeroing in on a unique and distinct feature of a large class of models in this genre: a neutral winolike Lightest Supersymmetric Particle closely degenerate in mass with a winolike chargino. The pair production processes e + e - → e tilde L ± e tilde L ± , e tilde R ± e tilde R ± , e tilde L ± e tilde R ± , ν tilde anti ν tilde, χ tilde 1 0 χ tilde 2 0 , χ tilde 2 0 χ tilde 2 0 are all considered at √s = 1 TeV corresponding to the proposed TESLA linear collider in two natural categories of mass ordering in the sparticle spectra. The signals analysed comprise multiple combinations of fast charged leptons (any of which can act as the trigger) plus displaced vertices X D (any of which can be identified by a heavy ionizing track terminating in the detector) and/or associated soft pions with characteristic momentum distributions. (author)

  8. Mathematical model with autoregressive process for electrocardiogram signals

    Science.gov (United States)

    Evaristo, Ronaldo M.; Batista, Antonio M.; Viana, Ricardo L.; Iarosz, Kelly C.; Szezech, José D., Jr.; Godoy, Moacir F. de

    2018-04-01

    The cardiovascular system is composed of the heart, blood and blood vessels. Regarding the heart, cardiac conditions are determined by the electrocardiogram, that is a noninvasive medical procedure. In this work, we propose autoregressive process in a mathematical model based on coupled differential equations in order to obtain the tachograms and the electrocardiogram signals of young adults with normal heartbeats. Our results are compared with experimental tachogram by means of Poincaré plot and dentrended fluctuation analysis. We verify that the results from the model with autoregressive process show good agreement with experimental measures from tachogram generated by electrical activity of the heartbeat. With the tachogram we build the electrocardiogram by means of coupled differential equations.

  9. Large-Signal DG-MOSFET Modelling for RFID Rectification

    Directory of Open Access Journals (Sweden)

    R. Rodríguez

    2016-01-01

    Full Text Available This paper analyses the undoped DG-MOSFETs capability for the operation of rectifiers for RFIDs and Wireless Power Transmission (WPT at microwave frequencies. For this purpose, a large-signal compact model has been developed and implemented in Verilog-A. The model has been numerically validated with a device simulator (Sentaurus. It is found that the number of stages to achieve the optimal rectifier performance is inferior to that required with conventional MOSFETs. In addition, the DC output voltage could be incremented with the use of appropriate mid-gap metals for the gate, as TiN. Minor impact of short channel effects (SCEs on rectification is also pointed out.

  10. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    Science.gov (United States)

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    It is a tedious and subjective task to measure severity of a dysarthria by manually evaluating his/her speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signal for a certain word and distinguished speech signal for different words. As an application, it can be used to assess speech quality and forecast speech recognition rate of speech made by an individual dysarthric speaker before actual exhaustive implementation of an automatic speech recognition system for the speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations had been done by comparing its predicted recognition rates with ones predicted by the standard methods called the articulatory and intelligibility tests based on the two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting recognition rate of dysarthric speech. All experiments had been done on speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.

  11. Quantitative Models of Imperfect Deception in Network Security using Signaling Games with Evidence

    OpenAIRE

    Pawlick, Jeffrey; Zhu, Quanyan

    2017-01-01

    Deception plays a critical role in many interactions in communication and network security. Game-theoretic models called "cheap talk signaling games" capture the dynamic and information asymmetric nature of deceptive interactions. But signaling games inherently model undetectable deception. In this paper, we investigate a model of signaling games in which the receiver can detect deception with some probability. This model nests traditional signaling games and complete information Stackelberg ...

  12. An analysis of machine translation and speech synthesis in speech-to-speech translation system

    OpenAIRE

    Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.

    2011-01-01

    This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...

  13. Specialization in audiovisual speech perception: a replication study

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual as evidenced by bimodal integration in the McGurk effect. This integration effect may be specific to speech or be applied to all stimuli in general. To investigate this, Tuomainen et al. (2005) used sine-wave speech, which naïve observers may perceive as non......-speech, but hear as speech once informed of the linguistic origin of the signal. Combinations of sine-wave speech and incongruent video of the talker elicited a McGurk effect only for informed observers. This indicates that the audiovisual integration effect is specific to speech perception. However, observers...... that observers did look near the mouth. We conclude that eye-movements did not influence the results of Tuomainen et al. and that their results thus can be taken as evidence of a speech specific mode of audiovisual integration underlying the McGurk illusion....

  14. Using Video Modeling to Teach Young Children with Autism Developmentally Appropriate Play and Connected Speech

    Science.gov (United States)

    Scheflen, Sarah Clifford; Freeman, Stephanny F. N.; Paparella, Tanya

    2012-01-01

    Four children with autism were taught play skills through the use of video modeling. Video instruction was used to model play and appropriate language through a developmental sequence of play levels integrated with language techniques. Results showed that children with autism could successfully use video modeling to learn how to play appropriately…

  15. Recent advances in nonlinear speech processing

    CERN Document Server

    Faundez-Zanuy, Marcos; Esposito, Antonietta; Cordasco, Gennaro; Drugman, Thomas; Solé-Casals, Jordi; Morabito, Francesco

    2016-01-01

    This book presents recent advances in nonlinear speech processing beyond nonlinear techniques. It shows that it exploits heuristic and psychological models of human interaction in order to succeed in the implementations of socially believable VUIs and applications for human health and psychological support. The book takes into account the multifunctional role of speech and what is “outside of the box” (see Björn Schuller’s foreword). To this aim, the book is organized in 6 sections, each collecting a small number of short chapters reporting advances “inside” and “outside” themes related to nonlinear speech research. The themes emphasize theoretical and practical issues for modelling socially believable speech interfaces, ranging from efforts to capture the nature of sound changes in linguistic contexts and the timing nature of speech; labors to identify and detect speech features that help in the diagnosis of psychological and neuronal disease, attempts to improve the effectiveness and performa...

  16. A new time-adaptive discrete bionic wavelet transform for enhancing speech from adverse noise environment

    Science.gov (United States)

    Palaniswamy, Sumithra; Duraisamy, Prakash; Alam, Mohammad Showkat; Yuan, Xiaohui

    2012-04-01

    Automatic speech processing systems are widely used in everyday life such as mobile communication, speech and speaker recognition, and for assisting the hearing impaired. In speech communication systems, the quality and intelligibility of speech is of utmost importance for ease and accuracy of information exchange. To obtain an intelligible speech signal and one that is more pleasant to listen, noise reduction is essential. In this paper a new Time Adaptive Discrete Bionic Wavelet Thresholding (TADBWT) scheme is proposed. The proposed technique uses Daubechies mother wavelet to achieve better enhancement of speech from additive non- stationary noises which occur in real life such as street noise and factory noise. Due to the integration of human auditory system model into the wavelet transform, bionic wavelet transform (BWT) has great potential for speech enhancement which may lead to a new path in speech processing. In the proposed technique, at first, discrete BWT is applied to noisy speech to derive TADBWT coefficients. Then the adaptive nature of the BWT is captured by introducing a time varying linear factor which updates the coefficients at each scale over time. This approach has shown better performance than the existing algorithms at lower input SNR due to modified soft level dependent thresholding on time adaptive coefficients. The objective and subjective test results confirmed the competency of the TADBWT technique. The effectiveness of the proposed technique is also evaluated for speaker recognition task under noisy environment. The recognition results show that the TADWT technique yields better performance when compared to alternate methods specifically at lower input SNR.

  17. Interdependent processing and encoding of speech and concurrent background noise.

    Science.gov (United States)

    Cooper, Angela; Brouwer, Susanne; Bradlow, Ann R

    2015-05-01

    Speech processing can often take place in adverse listening conditions that involve the mixing of speech and background noise. In this study, we investigated processing dependencies between background noise and indexical speech features, using a speeded classification paradigm (Garner, 1974; Exp. 1), and whether background noise is encoded and represented in memory for spoken words in a continuous recognition memory paradigm (Exp. 2). Whether or not the noise spectrally overlapped with the speech signal was also manipulated. The results of Experiment 1 indicated that background noise and indexical features of speech (gender, talker identity) cannot be completely segregated during processing, even when the two auditory streams are spectrally nonoverlapping. Perceptual interference was asymmetric, whereby irrelevant indexical feature variation in the speech signal slowed noise classification to a greater extent than irrelevant noise variation slowed speech classification. This asymmetry may stem from the fact that speech features have greater functional relevance to listeners, and are thus more difficult to selectively ignore than background noise. Experiment 2 revealed that a recognition cost for words embedded in different types of background noise on the first and second occurrences only emerged when the noise and the speech signal were spectrally overlapping. Together, these data suggest integral processing of speech and background noise, modulated by the level of processing and the spectral separation of the speech and noise.

  18. Efficient Blind System Identification of Non-Gaussian Auto-Regressive Models with HMM Modeling of the Excitation

    DEFF Research Database (Denmark)

    Li, Chunjian; Andersen, Søren Vang

    2007-01-01

    We propose two blind system identification methods that exploit the underlying dynamics of non-Gaussian signals. The two signal models to be identified are: an Auto-Regressive (AR) model driven by a discrete-state Hidden Markov process, and the same model whose output is perturbed by white Gaussi...... outputs. The signal models are general and suitable to numerous important signals, such as speech signals and base-band communication signals. Applications to speech analysis and blind channel equalization are given to exemplify the efficiency of the new methods....

  19. Non-fluent speech following stroke is caused by impaired efference copy.

    Science.gov (United States)

    Feenaughty, Lynda; Basilakos, Alexandra; Bonilha, Leonardo; den Ouden, Dirk-Bart; Rorden, Chris; Stark, Brielle; Fridriksson, Julius

    2017-09-01

    Efference copy is a cognitive mechanism argued to be critical for initiating and monitoring speech: however, the extent to which breakdown of efference copy mechanisms impact speech production is unclear. This study examined the best mechanistic predictors of non-fluent speech among 88 stroke survivors. Objective speech fluency measures were subjected to a principal component analysis (PCA). The primary PCA factor was then entered into a multiple stepwise linear regression analysis as the dependent variable, with a set of independent mechanistic variables. Participants' ability to mimic audio-visual speech ("speech entrainment response") was the best independent predictor of non-fluent speech. We suggest that this "speech entrainment" factor reflects integrity of internal monitoring (i.e., efference copy) of speech production, which affects speech initiation and maintenance. Results support models of normal speech production and suggest that therapy focused on speech initiation and maintenance may improve speech fluency for individuals with chronic non-fluent aphasia post stroke.

  20. Sensorimotor Representation of Speech Perception. Cross-Decoding of Place of Articulation Features during Selective Attention to Syllables in 7T fMRI

    NARCIS (Netherlands)

    Archila-Meléndez, Mario E.; Valente, Giancarlo; Correia, Joao M.; Rouhl, Rob P. W.; van Kranen-Mastenbroek, Vivianne H.; Jansma, Bernadette M.

    2018-01-01

    Sensorimotor integration, the translation between acoustic signals and motoric programs, may constitute a crucial mechanism for speech. During speech perception, the acoustic-motoric translations include the recruitment of cortical areas for the representation of speech articulatory features, such

  1. Evidence-Based Occupational Hearing Screening I: Modeling the Effects of Real-World Noise Environments on the Likelihood of Effective Speech Communication.

    Science.gov (United States)

    Soli, Sigfrid D; Giguère, Christian; Laroche, Chantal; Vaillancourt, Véronique; Dreschler, Wouter A; Rhebergen, Koenraad S; Harkins, Kevin; Ruckstuhl, Mark; Ramulu, Pradeep; Meyers, Lawrence S

    corrections environments. The likelihood of effective speech communication at communication distances of 0.5 and 1 m was often less than 0.50 for normal vocal effort. Likelihood values often increased to 0.80 or more when raised or loud vocal effort was used. Effective speech communication at and beyond 5 m was often unlikely, regardless of vocal effort. ESII modeling of nonstationary real-world noise environments may prove an objective means of characterizing their impact on the likelihood of effective speech communication. The normative reference provided by these measures predicts the extent to which hearing impairments that increase the ESII value required for effective speech communication also decrease the likelihood of effective speech communication. These predictions may provide an objective evidence-based link between the essential hearing-critical job task requirements of public safety and law enforcement personnel and ESII-based hearing assessment of individuals who seek to perform these jobs.

  2. Speech and Language Delay

    Science.gov (United States)

    ... OTC Relief for Diarrhea Home Diseases and Conditions Speech and Language Delay Condition Speech and Language Delay Share Print Table of Contents1. ... Treatment6. Everyday Life7. Questions8. Resources What is a speech and language delay? A speech and language delay ...

  3. Modelling noninvasively measured cerebral signals during a hypoxemia challenge: steps towards individualised modelling.

    Directory of Open Access Journals (Sweden)

    Beth Jelfs

    Full Text Available Noninvasive approaches to measuring cerebral circulation and metabolism are crucial to furthering our understanding of brain function. These approaches also have considerable potential for clinical use "at the bedside". However, a highly nontrivial task and precondition if such methods are to be used routinely is the robust physiological interpretation of the data. In this paper, we explore the ability of a previously developed model of brain circulation and metabolism to explain and predict quantitatively the responses of physiological signals. The five signals all noninvasively-measured during hypoxemia in healthy volunteers include four signals measured using near-infrared spectroscopy along with middle cerebral artery blood flow measured using transcranial Doppler flowmetry. We show that optimising the model using partial data from an individual can increase its predictive power thus aiding the interpretation of NIRS signals in individuals. At the same time such optimisation can also help refine model parametrisation and provide confidence intervals on model parameters. Discrepancies between model and data which persist despite model optimisation are used to flag up important questions concerning the underlying physiology, and the reliability and physiological meaning of the signals.

  4. Audiovisual Temporal Recalibration for Speech in Synchrony Perception and Speech Identification

    Science.gov (United States)

    Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato

    We investigated whether audiovisual synchrony perception for speech could change after observation of the audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is re-calibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, participants performed an audiovisual simultaneity judgment task (i.e., a direct measurement of the audiovisual synchrony perception) in terms of the speech signal after observation of the speech stimuli which had a constant audiovisual lag. The results showed that the “simultaneous” responses (i.e., proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, we adopted the McGurk identification task (i.e., an indirect measurement of the audiovisual synchrony perception) to exclude the possibility that this modulation of synchrony perception was solely attributable to the response strategy using stimuli identical to those of Experiment 1. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, it was shown that audiovisual synchrony perception for speech could be modulated following exposure to constant lag both in direct and indirect measurement. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.

  5. Spectral integration in speech and non-speech sounds

    Science.gov (United States)

    Jacewicz, Ewa

    2005-04-01

    Spectral integration (or formant averaging) was proposed in vowel perception research to account for the observation that a reduction of the intensity of one of two closely spaced formants (as in /u/) produced a predictable shift in vowel quality [Delattre et al., Word 8, 195-210 (1952)]. A related observation was reported in psychoacoustics, indicating that when the components of a two-tone periodic complex differ in amplitude and frequency, its perceived pitch is shifted toward that of the more intense tone [Helmholtz, App. XIV (1875/1948)]. Subsequent research in both fields focused on the frequency interval that separates these two spectral components, in an attempt to determine the size of the bandwidth for spectral integration to occur. This talk will review the accumulated evidence for and against spectral integration within the hypothesized limit of 3.5 Bark for static and dynamic signals in speech perception and psychoacoustics. Based on similarities in the processing of speech and non-speech sounds, it is suggested that spectral integration may reflect a general property of the auditory system. A larger frequency bandwidth, possibly close to 3.5 Bark, may be utilized in integrating acoustic information, including speech, complex signals, or sound quality of a violin.

  6. Channel modeling, signal processing and coding for perpendicular magnetic recording

    Science.gov (United States)

    Wu, Zheng

    With the increasing areal density in magnetic recording systems, perpendicular recording has replaced longitudinal recording to overcome the superparamagnetic limit. Studies on perpendicular recording channels including aspects of channel modeling, signal processing and coding techniques are presented in this dissertation. To optimize a high density perpendicular magnetic recording system, one needs to know the tradeoffs between various components of the system including the read/write transducers, the magnetic medium, and the read channel. We extend the work by Chaichanavong on the parameter optimization for systems via design curves. Different signal processing and coding techniques are studied. Information-theoretic tools are utilized to determine the acceptable region for the channel parameters when optimal detection and linear coding techniques are used. Our results show that a considerable gain can be achieved by the optimal detection and coding techniques. The read-write process in perpendicular magnetic recording channels includes a number of nonlinear effects. Nonlinear transition shift (NLTS) is one of them. The signal distortion induced by NLTS can be reduced by write precompensation during data recording. We numerically evaluate the effect of NLTS on the read-back signal and examine the effectiveness of several write precompensation schemes in combating NLTS in a channel characterized by both transition jitter noise and additive white Gaussian electronics noise. We also present an analytical method to estimate the bit-error-rate and use it to help determine the optimal write precompensation values in multi-level precompensation schemes. We propose a mean-adjusted pattern-dependent noise predictive (PDNP) detection algorithm for use on the channel with NLTS. We show that this detector can offer significant improvements in bit-error-rate (BER) compared to conventional Viterbi and PDNP detectors. Moreover, the system performance can be further improved by

  7. Speech emotion recognition methods: A literature review

    Science.gov (United States)

    Basharirad, Babak; Moradhaseli, Mohammadreza

    2017-10-01

    Recently, attention of the emotional speech signals research has been boosted in human machine interfaces due to availability of high computation capability. There are many systems proposed in the literature to identify the emotional state through speech. Selection of suitable feature sets, design of a proper classifications methods and prepare an appropriate dataset are the main key issues of speech emotion recognition systems. This paper critically analyzed the current available approaches of speech emotion recognition methods based on the three evaluating parameters (feature set, classification of features, accurately usage). In addition, this paper also evaluates the performance and limitations of available methods. Furthermore, it highlights the current promising direction for improvement of speech emotion recognition systems.

  8. Fast Monaural Separation of Speech

    DEFF Research Database (Denmark)

    Pontoppidan, Niels Henrik; Dyrholm, Mads

    2003-01-01

    a Factorial Hidden Markov Model, with non-stationary assumptions on the source autocorrelations modelled through the Factorial Hidden Markov Model, leads to separation in the monaural case. By extending Hansens work we find that Roweis' assumptions are necessary for monaural speech separation. Furthermore we...

  9. Modeling and processing of laser Doppler reactive hyperaemia signals

    Science.gov (United States)

    Humeau, Anne; Saumet, Jean-Louis; L'Huiller, Jean-Pierre

    2003-07-01

    Laser Doppler flowmetry is a non-invasive method used in the medical domain to monitor the microvascular blood cell perfusion through tissue. Most commercial laser Doppler flowmeters use an algorithm calculating the first moment of the power spectral density to give the perfusion value. Many clinical applications measure the perfusion after a vascular provocation such as a vascular occlusion. The response obtained is then called reactive hyperaemia. Target pathologies include diabetes, hypertension and peripheral arterial occlusive diseases. In order to have a deeper knowledge on reactive hyperaemia acquired by the laser Doppler technique, the present work first proposes two models (one analytical and one numerical) of the observed phenomenon. Then, a study on the multiple scattering between photons and red blood cells occurring during reactive hyperaemia is carried out. Finally, a signal processing that improves the diagnosis of peripheral arterial occlusive diseases is presented.

  10. Speech enhancement in the Karhunen-Loeve expansion domain

    CERN Document Server

    Benesty, Jacob

    2011-01-01

    This book is devoted to the study of the problem of speech enhancement whose objective is the recovery of a signal of interest (i.e., speech) from noisy observations. Typically, the recovery process is accomplished by passing the noisy observations through a linear filter (or a linear transformation). Since both the desired speech and undesired noise are filtered at the same time, the most critical issue of speech enhancement resides in how to design a proper optimal filter that can fully take advantage of the difference between the speech and noise statistics to mitigate the noise effect as m

  11. Modeling the photoacoustic signal during the porous silicon formation

    Science.gov (United States)

    Ramirez-Gutierrez, C. F.; Castaño-Yepes, J. D.; Rodriguez-García, M. E.

    2017-01-01

    Within this work, the kinetics of the growing stage of porous silicon (PS) during the etching process was studied using the photoacoustic technique. A p-type Si with low resistivity was used as a substrate. An extension of the Rosencwaig and Gersho model is proposed in order to analyze the temporary changes that take place in the amplitude of the photoacoustic signal during the PS growth. The solution of the heat equation takes into account the modulated laser beam, the changes in the reflectance of the PS-backing heterostructure, the electrochemical reaction, and the Joule effect as thermal sources. The model includes the time-dependence of the sample thickness during the electrochemical etching of PS. The changes in the reflectance are identified as the laser reflections in the internal layers of the system. The reflectance is modeled by an additional sinusoidal-monochromatic light source and its modulated frequency is related to the velocity of the PS growth. The chemical reaction and the DC components of the heat sources are taken as an average value from the experimental data. The theoretical results are in agreement with the experimental data and hence provided a method to determine variables of the PS growth, such as the etching velocity and the thickness of the porous layer during the growing process.

  12. Finite element speaker-specific face model generation for the study of speech production.

    Science.gov (United States)

    Bucki, Marek; Nazari, Mohammad Ali; Payan, Yohan

    2010-08-01

    In situations where automatic mesh generation is unsuitable, the finite element (FE) mesh registration technique known as mesh-match-and-repair (MMRep) is an interesting option for quickly creating a subject-specific FE model by fitting a predefined template mesh onto the target organ. The irregular or poor quality elements produced by the elastic deformation are corrected by a 'mesh reparation' procedure ensuring that the desired regularity and quality standards are met. Here, we further extend the MMRep capabilities and demonstrate the possibility of taking into account additional relevant anatomical features. We illustrate this approach with an example of biomechanical model generation of a speaker's face comprising face muscle insertions. While taking advantage of the a priori knowledge about tissues conveyed by the template model, this novel, fast and automatic mesh registration technique makes it possible to achieve greater modelling realism by accurately representing the organ surface as well as inner anatomical or functional structures of interest.

  13. Discriminative learning for speech recognition

    CERN Document Server

    He, Xiadong

    2008-01-01

    In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model. A detailed study is presented on unifying the common objective functions for discriminative learning in speech recognition, namely maximum mutual information (MMI), minimum classification error, and minimum phone/word error. The unification is presented, with rigorous mathematical analysis, in a common rational-functio

  14. Prediction and constraint in audiovisual speech perception.

    Science.gov (United States)

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  15. Prediction and constraint in audiovisual speech perception

    Science.gov (United States)

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported

  16. A signaling model of foreign direct investment attraction

    Directory of Open Access Journals (Sweden)

    Marcelo de C. Griebeler

    2017-09-01

    Full Text Available Foreign direct investors face uncertainty about government's type of the host country. In a two period game, we allow the host country's government to mitigate such uncertainty by sending a signal through fiscal policy. Our main finding states that a populist government may mimic a conservative one in order to attract foreign direct investment (FDI, and this choice depends mainly on its impatience degree and the originally planned FDI stock. We highlight the role of the government's reputation in attracting foreign capital and thus provide some policy implications. Moreover, our model explains why some governments considered to be populist adopt conservative policies in the beginning of its terms of office. Resumo: Investidores estrangeiros diretos são incertos sobre o tipo do governo do país onde desejam investir. Em um jogo de dois períodos, permitimos que o governo de tal país mitigue essa incerteza ao enviar um sinal através da política fiscal. Nosso principal resultado estabelece que um governo populista pode imitar um conservador a fim de atrair investimento estrangeiro direto (IED, e essa escolha depende principalmente do grau de impaciência e do estoque de IED originalmente planejado. Destacamos o papel da reputação do governo em atrair capital externo e assim fornecemos algumas recomendações de política. Além disso, nosso modelo explica porque alguns governos considerados populistas adotam políticas conservadores no início do seus mandatos. JEL classification: F41, F34, C72, Keywords: Signaling, Foreign direct investment, Game theory, Palavras-chave: Sinalização, Investimento estrangeiro direto, Teoria dos jogos

  17. Improvement of intelligibility of ideal binary-masked noisy speech by adding background noise.

    Science.gov (United States)

    Cao, Shuyang; Li, Liang; Wu, Xihong

    2011-04-01

    When a target-speech/masker mixture is processed with the signal-separation technique, ideal binary mask (IBM), intelligibility of target speech is remarkably improved in both normal-hearing listeners and hearing-impaired listeners. Intelligibility of speech can also be improved by filling in speech gaps with un-modulated broadband noise. This study investigated whether intelligibility of target speech in the IBM-treated target-speech/masker mixture can be further improved by adding a broadband-noise background. The results of this study show that following the IBM manipulation, which remarkably released target speech from speech-spectrum noise, foreign-speech, or native-speech masking (experiment 1), adding a broadband-noise background with the signal-to-noise ratio no less than 4 dB significantly improved intelligibility of target speech when the masker was either noise (experiment 2) or speech (experiment 3). The results suggest that since adding the noise background shallows the areas of silence in the time-frequency domain of the IBM-treated target-speech/masker mixture, the abruption of transient changes in the mixture is smoothed and the perceived continuity of target-speech components becomes enhanced, leading to improved target-speech intelligibility. The findings are useful for advancing computational auditory scene analysis, hearing-aid/cochlear-implant designs, and understanding of speech perception under "cocktail-party" conditions.

  18. Understanding speech when wearing communication headsets and hearing protectors with subband processing.

    Science.gov (United States)

    Brammer, Anthony J; Yu, Gongqiang; Bernstein, Eric R; Cherniack, Martin G; Peterson, Donald R; Tufts, Jennifer B

    2014-08-01

    An adaptive, delayless, subband feed-forward control structure is employed to improve the speech signal-to-noise ratio (SNR) in the communication channel of a circumaural headset/hearing protector (HPD) from 90 Hz to 11.3 kHz, and to provide active noise control (ANC) from 50 to 800 Hz to complement the passive attenuation of the HPD. The task involves optimizing the speech SNR for each communication channel subband, subject to limiting the maximum sound level at the ear, maintaining a speech SNR preferred by users, and reducing large inter-band gain differences to improve speech quality. The performance of a proof-of-concept device has been evaluated in a pseudo-diffuse sound field when worn by human subjects under conditions of environmental noise and speech that do not pose a risk to hearing, and by simulation for other conditions. For the environmental noises employed in this study, subband speech SNR control combined with subband ANC produced greater improvement in word scores than subband ANC alone, and improved the consistency of word scores across subjects. The simulation employed a subject-specific linear model, and predicted that word scores are maintained in excess of 90% for sound levels outside the HPD of up to ∼115 dBA.

  19. Two-Microphone Separation of Speech Mixtures

    DEFF Research Database (Denmark)

    Pedersen, Michael Syskind; Wang, DeLiang; Larsen, Jan

    2008-01-01

    combined, independent component analysis (ICA) and binary time–frequency (T–F) masking. By estimating binary masks from the outputs of an ICA algorithm, it is possible in an iterative way to extract basis speech signals from a convolutive mixture. The basis signals are afterwards improved by grouping...

  20. Context-dependent modelling of English vowels in Sepedi code-switched speech

    CSIR Research Space (South Africa)

    Modipa, TI

    2012-11-01

    Full Text Available multilingual systems (combining dictionaries, language and/or acoustic models from multiple languages) or by running more than one monolingual system in parallel, switching from the one to the other [2], [3]. We are interested in the first approach.... DATA In this section we describe the data used during experiments: the audio corpora, phone sets and dictionaries. A. Audio corpora We use two different audio corpora for the experiments: a general Sepedi corpus (NCHLT [8]) and a custom...

  1. Infants' brain responses to speech suggest analysis by synthesis.

    Science.gov (United States)

    Kuhl, Patricia K; Ramírez, Rey R; Bosseler, Alexis; Lin, Jo-Fu Lotus; Imada, Toshiaki

    2014-08-05

    Historic theories of speech perception (Motor Theory and Analysis by Synthesis) invoked listeners' knowledge of speech production to explain speech perception. Neuroimaging data show that adult listeners activate motor brain areas during speech perception. In two experiments using magnetoencephalography (MEG), we investigated motor brain activation, as well as auditory brain activation, during discrimination of native and nonnative syllables in infants at two ages that straddle the developmental transition from language-universal to language-specific speech perception. Adults are also tested in Exp. 1. MEG data revealed that 7-mo-old infants activate auditory (superior temporal) as well as motor brain areas (Broca's area, cerebellum) in response to speech, and equivalently for native and nonnative syllables. However, in 11- and 12-mo-old infants, native speech activates auditory brain areas to a greater degree than nonnative, whereas nonnative speech activates motor brain areas to a greater degree than native speech. This double dissociation in 11- to 12-mo-old infants matches the pattern of results obtained in adult listeners. Our infant data are consistent with Analysis by Synthesis: auditory analysis of speech is coupled with synthesis of the motor plans necessary to produce the speech signal. The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of "motherese" on early language learning, and (iii) the "social-gating" hypothesis and humans' development of social understanding.

  2. Cortical activity patterns predict robust speech discrimination ability in noise

    Science.gov (United States)

    Shetake, Jai A.; Wolf, Jordan T.; Cheung, Ryan J.; Engineer, Crystal T.; Ram, Satyananda K.; Kilgard, Michael P.

    2012-01-01

    The neural mechanisms that support speech discrimination in noisy conditions are poorly understood. In quiet conditions, spike timing information appears to be used in the discrimination of speech sounds. In this study, we evaluated the hypothesis that spike timing is also used to distinguish between speech sounds in noisy conditions that significantly degrade neural responses to speech sounds. We tested speech sound discrimination in rats and recorded primary auditory cortex (A1) responses to speech sounds in background noise of different intensities and spectral compositions. Our behavioral results indicate that rats, like humans, are able to accurately discriminate consonant sounds even in the presence of background noise that is as loud as the speech signal. Our neural recordings confirm that speech sounds evoke degraded but detectable responses in noise. Finally, we developed a novel neural classifier that mimics behavioral discrimination. The classifier discriminates between speech sounds by comparing the A1 spatiotemporal activity patterns evoked on single trials with the average spatiotemporal patterns evoked by known sounds. Unlike classifiers in most previous studies, this classifier is not provided with the stimulus onset time. Neural activity analyzed with the use of relative spike timing was well correlated with behavioral speech discrimination in quiet and in noise. Spike timing information integrated over longer intervals was required to accurately predict rat behavioral speech discrimination in noisy conditions. The similarity of neural and behavioral discrimination of speech in noise suggests that humans and rats may employ similar brain mechanisms to solve this problem. PMID:22098331

  3. Speech and audio processing for coding, enhancement and recognition

    CERN Document Server

    Togneri, Roberto; Narasimha, Madihally

    2015-01-01

    This book describes the basic principles underlying the generation, coding, transmission and enhancement of speech and audio signals, including advanced statistical and machine learning techniques for speech and speaker recognition with an overview of the key innovations in these areas. Key research undertaken in speech coding, speech enhancement, speech recognition, emotion recognition and speaker diarization are also presented, along with recent advances and new paradigms in these areas. ·         Offers readers a single-source reference on the significant applications of speech and audio processing to speech coding, speech enhancement and speech/speaker recognition. Enables readers involved in algorithm development and implementation issues for speech coding to understand the historical development and future challenges in speech coding research; ·         Discusses speech coding methods yielding bit-streams that are multi-rate and scalable for Voice-over-IP (VoIP) Networks; ·     �...

  4. Measures of metacognition on signal-detection theoretic models.

    Science.gov (United States)

    Barrett, Adam B; Dienes, Zoltan; Seth, Anil K

    2013-12-01

    Analyzing metacognition, specifically knowledge of accuracy of internal perceptual, memorial, or other knowledge states, is vital for many strands of psychology, including determining the accuracy of feelings of knowing and discriminating conscious from unconscious cognition. Quantifying metacognitive sensitivity is however more challenging than quantifying basic stimulus sensitivity. Under popular signal-detection theory (SDT) models for stimulus classification tasks, approaches based on Type II receiver-operating characteristic (ROC) curves or Type II d-prime risk confounding metacognition with response biases in either the Type I (classification) or Type II (metacognitive) tasks. A new approach introduces meta-d': The Type I d-prime that would have led to the observed Type II data had the subject used all the Type I information. Here, we (a) further establish the inconsistency of the Type II d-prime and ROC approaches with new explicit analyses of the standard SDT model and (b) analyze, for the first time, the behavior of meta-d' under nontrivial scenarios, such as when metacognitive judgments utilize enhanced or degraded versions of the Type I evidence. Analytically, meta-d' values typically reflect the underlying model well and are stable under changes in decision criteria; however, in relatively extreme cases, meta-d' can become unstable. We explore bias and variance of in-sample measurements of meta-d' and supply MATLAB code for estimation in general cases. Our results support meta-d' as a useful measure of metacognition and provide rigorous methodology for its application. Our recommendations are useful for any researchers interested in assessing metacognitive accuracy. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  5. Fetal QRS extraction from abdominal recordings via model-based signal processing and intelligent signal merging

    International Nuclear Information System (INIS)

    Haghpanahi, Masoumeh; Borkholder, David A

    2014-01-01

    Noninvasive fetal ECG (fECG) monitoring has potential applications in diagnosing congenital heart diseases in a timely manner and assisting clinicians to make more appropriate decisions during labor. However, despite advances in signal processing and machine learning techniques, the analysis of fECG signals has still remained in its preliminary stages. In this work, we describe an algorithm to automatically locate QRS complexes in noninvasive fECG signals obtained from a set of four electrodes placed on the mother’s abdomen. The algorithm is based on an iterative decomposition of the maternal and fetal subspaces and filtering of the maternal ECG (mECG) components from the fECG recordings. Once the maternal components are removed, a novel merging technique is applied to merge the signals and detect the fetal QRS (fQRS) complexes. The algorithm was trained and tested on the fECG datasets provided by the PhysioNet/CinC challenge 2013. The final results indicate that the algorithm is able to detect fetal peaks for a variety of signals with different morphologies and strength levels encountered in clinical practice. (paper)

  6. General-Purpose Monitoring during Speech Production

    Science.gov (United States)

    Ries, Stephanie; Janssen, Niels; Dufau, Stephane; Alario, F.-Xavier; Burle, Boris

    2011-01-01

    The concept of "monitoring" refers to our ability to control our actions on-line. Monitoring involved in speech production is often described in psycholinguistic models as an inherent part of the language system. We probed the specificity of speech monitoring in two psycholinguistic experiments where electroencephalographic activities were…

  7. HMM Adaptation for child speech synthesis

    CSIR Research Space (South Africa)

    Govender, Avashna

    2015-09-01

    Full Text Available Hidden Markov Model (HMM)-based synthesis in combination with speaker adaptation has proven to be an approach that is well-suited for child speech synthesis. This paper describes the development and evaluation of different HMM-based child speech...

  8. Lung cancer, intracellular signaling pathways, and preclinical models

    International Nuclear Information System (INIS)

    Mordant, P.

    2012-01-01

    Non-small cell lung cancer (NSCLC) is the leading cause of cancer-related mortality worldwide. Activation of phosphatidylinositol-3-kinase (PI3K)-AKT and Kirsten rat sarcoma viral oncogene homologue (KRAS) can induce cellular immortalization, proliferation, and resistance to anticancer therapeutics such as epidermal growth factor receptor inhibitors or chemotherapy. This study assessed the consequences of inhibiting these two pathways in tumor cells with activation of KRAS, PI3K-AKT, or both. We investigated whether the combination of a novel RAF/vascular endothelial growth factor receptor inhibitor, RAF265, with a mammalian target of rapamycin (mTOR) inhibitor, RAD001 (everolimus), could lead to enhanced anti-tumoral effects in vitro and in vivo. To address this question, we used cell lines with different status regarding KRAS, PIK3CA, and BRAF mutations, using immunoblotting to evaluate the inhibitors, and MTT and clonogenic assays for effects on cell viability and proliferation. Subcutaneous xenografts were used to assess the activity of the combination in vivo. RAD001 inhibited mTOR downstream signaling in all cell lines, whereas RAF265 inhibited RAF downstream signaling only in BRAF mutant cells. In vitro, addition of RAF265 to RAD001 led to decreased AKT, S6, and Eukaryotic translation initiation factor 4E binding protein 1 phosphorylation in HCT116 cells. In vitro and in vivo, RAD001 addition enhanced the anti-tumoral effect of RAF265 in HCT116 and H460 cells (both KRAS mut, PIK3CA mut); in contrast, the combination of RAF265 and RAD001 yielded no additional activity in A549 and MDAMB231 cells. The combination of RAF and mTOR inhibitors is effective for enhancing anti-tumoral effects in cells with deregulation of both RAS-RAF and PI3K, possibly through the cross-inhibition of 4E binding protein 1 and S6 protein. We then focus on animal models. Preclinical models of NSCLC require better clinical relevance to study disease mechanisms and innovative

  9. Inner Speech and Clarity of Self-Concept in Thought Disorder and Auditory-Verbal Hallucinations.

    Science.gov (United States)

    de Sousa, Paulo; Sellwood, William; Spray, Amy; Fernyhough, Charles; Bentall, Richard P

    2016-12-01

    Eighty patients and thirty controls were interviewed using one interview that promoted personal disclosure and another about everyday topics. Speech was scored using the Thought, Language and Communication scale (TLC). All participants completed the Self-Concept Clarity Scale (SCCS) and the Varieties of Inner Speech Questionnaire (VISQ). Patients scored lower than comparisons on the SCCS. Low scores were associated the disorganized dimension of TD. Patients also scored significantly higher on condensed and other people in inner speech, but not on dialogical or evaluative inner speech. The poverty of speech dimension of TD was associated with less dialogical inner speech, other people in inner speech, and less evaluative inner speech. Hallucinations were significantly associated with more other people in inner speech and evaluative inner speech. Clarity of self-concept and qualities of inner speech are differentially associated with dimensions of TD. The findings also support inner speech models of hallucinations.

  10. Inner Speech and Clarity of Self-Concept in Thought Disorder and Auditory-Verbal Hallucinations

    Science.gov (United States)

    de Sousa, Paulo; Sellwood, William; Spray, Amy; Fernyhough, Charles; Bentall, Richard P.

    2016-01-01

    Abstract Eighty patients and thirty controls were interviewed using one interview that promoted personal disclosure and another about everyday topics. Speech was scored using the Thought, Language and Communication scale (TLC). All participants completed the Self-Concept Clarity Scale (SCCS) and the Varieties of Inner Speech Questionnaire (VISQ). Patients scored lower than comparisons on the SCCS. Low scores were associated the disorganized dimension of TD. Patients also scored significantly higher on condensed and other people in inner speech, but not on dialogical or evaluative inner speech. The poverty of speech dimension of TD was associated with less dialogical inner speech, other people in inner speech, and less evaluative inner speech. Hallucinations were significantly associated with more other people in inner speech and evaluative inner speech. Clarity of self-concept and qualities of inner speech are differentially associated with dimensions of TD. The findings also support inner speech models of hallucinations. PMID:27898489

  11. A Multi-Model Stereo Similarity Function Based on Monogenic Signal Analysis in Poisson Scale Space

    Directory of Open Access Journals (Sweden)

    Jinjun Li

    2011-01-01

    Full Text Available A stereo similarity function based on local multi-model monogenic image feature descriptors (LMFD is proposed to match interest points and estimate disparity map for stereo images. Local multi-model monogenic image features include local orientation and instantaneous phase of the gray monogenic signal, local color phase of the color monogenic signal, and local mean colors in the multiscale color monogenic signal framework. The gray monogenic signal, which is the extension of analytic signal to gray level image using Dirac operator and Laplace equation, consists of local amplitude, local orientation, and instantaneous phase of 2D image signal. The color monogenic signal is the extension of monogenic signal to color image based on Clifford algebras. The local color phase can be estimated by computing geometric product between the color monogenic signal and a unit reference vector in RGB color space. Experiment results on the synthetic and natural stereo images show the performance of the proposed approach.

  12. Increasing Early Childhood Educators' Use of Communication-Facilitating and Language-Modelling Strategies: Brief Speech and Language Therapy Training

    Science.gov (United States)

    McDonald, David; Proctor, Penny; Gill, Wendy; Heaven, Sue; Marr, Jane; Young, Jane

    2015-01-01

    Intensive Speech and Language Therapy (SLT) training courses for Early Childhood Educators (ECEs) can have a positive effect on their use of interaction strategies that support children's communication skills. The impact of brief SLT training courses is not yet clearly understood. The aims of these two studies were to assess the impact of a brief…

  13. Representational Similarity Analysis Reveals Heterogeneous Networks Supporting Speech Motor Control

    DEFF Research Database (Denmark)

    Zheng, Zane; Cusack, Rhodri; Johnsrude, Ingrid

    The everyday act of speaking involves the complex processes of speech motor control. One important feature of such control is regulation of articulation when auditory concomitants of speech do not correspond to the intended motor gesture. While theoretical accounts of speech monitoring posit...... multiple functional components required for detection of errors in speech planning (e.g., Levelt, 1983), neuroimaging studies generally indicate either single brain regions sensitive to speech production errors, or small, discrete networks. Here we demonstrate that the complex system controlling speech...... is supported by a complex neural network that is involved in linguistic, motoric and sensory processing. With the aid of novel real-time acoustic analyses and representational similarity analyses of fMRI signals, our data show functionally differentiated networks underlying auditory feedback control of speech....

  14. EEG Signal Classification With Super-Dirichlet Mixture Model

    DEFF Research Database (Denmark)

    Ma, Zhanyu; Tan, Zheng-Hua; Prasad, Swati

    2012-01-01

    Classification of the Electroencephalogram (EEG) signal is a challengeable task in the brain-computer interface systems. The marginalized discrete wavelet transform (mDWT) coefficients extracted from the EEG signals have been frequently used in researches since they reveal features related...

  15. Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition

    Directory of Open Access Journals (Sweden)

    Suzuki Motoyuki

    2009-01-01

    Full Text Available Abstract We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves Web documents using the chosen keywords. A problem is that the selected keywords tend to contain misrecognized words. The proposed method introduces two new ideas for avoiding the effects of keywords derived from misrecognized words. The first idea is to compose multiple queries from selected keyword candidates so that the misrecognized words and correct words do not fall into one query. The second idea is that the number of Web documents downloaded for each query is determined according to the "query relevance." Combining these two ideas, we can alleviate bad effect of misrecognized keywords by decreasing the number of downloaded Web documents from queries that contain misrecognized keywords. Finally, we examine a method of determining the number of iterative adaptations based on the recognition likelihood. Experiments have shown that the proposed stopping criterion can determine almost the optimum number of iterations. In the final experiment, the word accuracy without adaptation (55.29% was improved to 60.38%, which was 1.13 point better than the result of the conventional unsupervised adaptation method (59.25%.

  16. Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition

    Directory of Open Access Journals (Sweden)

    Akinori Ito

    2009-01-01

    Full Text Available We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves Web documents using the chosen keywords. A problem is that the selected keywords tend to contain misrecognized words. The proposed method introduces two new ideas for avoiding the effects of keywords derived from misrecognized words. The first idea is to compose multiple queries from selected keyword candidates so that the misrecognized words and correct words do not fall into one query. The second idea is that the number of Web documents downloaded for each query is determined according to the “query relevance.” Combining these two ideas, we can alleviate bad effect of misrecognized keywords by decreasing the number of downloaded Web documents from queries that contain misrecognized keywords. Finally, we examine a method of determining the number of iterative adaptations based on the recognition likelihood. Experiments have shown that the proposed stopping criterion can determine almost the optimum number of iterations. In the final experiment, the word accuracy without adaptation (55.29% was improved to 60.38%, which was 1.13 point better than the result of the conventional unsupervised adaptation method (59.25%.

  17. The truthful signalling hypothesis: an explicit general equilibrium model.

    Science.gov (United States)

    Hausken, Kjell; Hirshleifer, Jack

    2004-06-21

    In mating competition, the truthful signalling hypothesis (TSH), sometimes known as the handicap principle, asserts that higher-quality males signal while lower-quality males do not (or else emit smaller signals). Also, the signals are "believed", that is, females mate preferentially with higher-signalling males. Our analysis employs specific functional forms to generate analytic solutions and numerical simulations that illuminate the conditions needed to validate the TSH. Analytic innovations include: (1) A Mating Success Function indicates how female mating choices respond to higher and lower signalling levels. (2) A congestion function rules out corner solutions in which females would mate exclusively with higher-quality males. (3) A Malthusian condition determines equilibrium population size as related to per-capita resource availability. Equilibria validating the TSH are achieved over a wide range of parameters, though not universally. For TSH equilibria it is not strictly necessary that the high-quality males have an advantage in terms of lower per-unit signalling costs, but a cost difference in favor of the low-quality males cannot be too great if a TSH equilibrium is to persist. And although the literature has paid less attention to these points, TSH equilibria may also fail if: the quality disparity among males is too great, or the proportion of high-quality males in the population is too large, or if the congestion effect is too weak. Signalling being unprofitable in aggregate, it can take off from a no-signalling equilibrium only if the trait used for signalling is not initially a handicap, but instead is functionally useful at low levels. Selection for this trait sets in motion a bandwagon, whereby the initially useful indicator is pushed by male-male competition into the domain where it does indeed become a handicap.

  18. Transmembrane signaling in Saccharomyces cerevisiae as a model for signaling in metazoans: state of the art after 25 years.

    Science.gov (United States)

    Engelberg, David; Perlman, Riki; Levitzki, Alexander

    2014-12-01

    In the very first article that appeared in Cellular Signalling, published in its inaugural issue in October 1989, we reviewed signal transduction pathways in Saccharomyces cerevisiae. Although this yeast was already a powerful model organism for the study of cellular processes, it was not yet a valuable instrument for the investigation of signaling cascades. In 1989, therefore, we discussed only two pathways, the Ras/cAMP and the mating (Fus3) signaling cascades. The pivotal findings concerning those pathways undoubtedly contributed to the realization that yeast is a relevant model for understanding signal transduction in higher eukaryotes. Consequently, the last 25 years have witnessed the discovery of many signal transduction pathways in S. cerevisiae, including the high osmotic glycerol (Hog1), Stl2/Mpk1 and Smk1 mitogen-activated protein (MAP) kinase pathways, the TOR, AMPK/Snf1, SPS, PLC1 and Pkr/Gcn2 cascades, and systems that sense and respond to various types of stress. For many cascades, orthologous pathways were identified in mammals following their discovery in yeast. Here we review advances in the understanding of signaling in S. cerevisiae over the last 25 years. When all pathways are analyzed together, some prominent themes emerge. First, wiring of signaling cascades may not be identical in all S. cerevisiae strains, but is probably specific to each genetic background. This situation complicates attempts to decipher and generalize these webs of reactions. Secondly, the Ras/cAMP and the TOR cascades are pivotal pathways that affect all processes of the life of the yeast cell, whereas the yeast MAP kinase pathways are not essential. Yeast cells deficient in all MAP kinases proliferate normally. Another theme is the existence of central molecular hubs, either as single proteins (e.g., Msn2/4, Flo11) or as multisubunit complexes (e.g., TORC1/2), which are controlled by numerous pathways and in turn determine the fate of the cell. It is also apparent that

  19. Sequential Markov chain Monte Carlo filter with simultaneous model selection for electrocardiogram signal modeling.

    Science.gov (United States)

    Edla, Shwetha; Kovvali, Narayan; Papandreou-Suppappola, Antonia

    2012-01-01

    Constructing statistical models of electrocardiogram (ECG) signals, whose parameters can be used for automated disease classification, is of great importance in precluding manual annotation and providing prompt diagnosis of cardiac diseases. ECG signals consist of several segments with different morphologies (namely the P wave, QRS complex and the T wave) in a single heart beat, which can vary across individuals and diseases. Also, existing statistical ECG models exhibit a reliance upon obtaining a priori information from the ECG data by using preprocessing algorithms to initialize the filter parameters, or to define the user-specified model parameters. In this paper, we propose an ECG modeling technique using the sequential Markov chain Monte Carlo (SMCMC) filter that can perform simultaneous model selection, by adaptively choosing from different representations depending upon the nature of the data. Our results demonstrate the ability of the algorithm to track various types of ECG morphologies, including intermittently occurring ECG beats. In addition, we use the estimated model parameters as the feature set to classify between ECG signals with normal sinus rhythm and four different types of arrhythmia.

  20. Speech and Communication Disorders

    Science.gov (United States)

    ... to being completely unable to speak or understand speech. Causes include Hearing disorders and deafness Voice problems, ... or those caused by cleft lip or palate Speech problems like stuttering Developmental disabilities Learning disorders Autism ...

  1. CASRA+: A Colloquial Arabic Speech Recognition Application

    OpenAIRE

    Ramzi A. Haraty; Omar El Ariss

    2007-01-01

    The research proposed here was for an Arabic speech recognition application, concentrating on the Lebanese dialect. The system starts by sampling the speech, which was the process of transforming the sound from analog to digital and then extracts the features by using the Mel-Frequency Cepstral Coefficients (MFCC). The extracted features are then compared with the system's stored model; in this case the stored model chosen was a phoneme-based model. This reference model differs from the direc...

  2. Sparsity in Linear Predictive Coding of Speech

    DEFF Research Database (Denmark)

    Giacobello, Daniele

    of the effectiveness of their application in audio processing. The second part of the thesis deals with introducing sparsity directly in the linear prediction analysis-by-synthesis (LPAS) speech coding paradigm. We first propose a novel near-optimal method to look for a sparse approximate excitation using a compressed...... one with direct applications to coding but also consistent with the speech production model of voiced speech, where the excitation of the all-pole filter can be modeled as an impulse train, i.e., a sparse sequence. Introducing sparsity in the LP framework will also bring to de- velop the concept...... sensing formulation. Furthermore, we define a novel re-estimation procedure to adapt the predictor coefficients to the given sparse excitation, balancing the two representations in the context of speech coding. Finally, the advantages of the compact parametric representation of a segment of speech, given...

  3. Enhancement of Single-Channel Periodic Signals in the Time-Domain

    DEFF Research Database (Denmark)

    Jensen, Jesper Rindom; Benesty, Jacob; Christensen, Mads Græsbøll

    2012-01-01

    speech. That is, signal-dependent methods based on the signal statistics will introduce undesired distortion for some parts of speech compared to signal-independent methods based on the noise statistics. Since both the signal-independent and signal-dependent approaches to speech enhancement have...

  4. Advanced radar detection schemes under mismatched signal models

    CERN Document Server

    Bandiera, Francesco

    2009-01-01

    Adaptive detection of signals embedded in correlated Gaussian noise has been an active field of research in the last decades. This topic is important in many areas of signal processing such as, just to give some examples, radar, sonar, communications, and hyperspectral imaging. Most of the existing adaptive algorithms have been designed following the lead of the derivation of Kelly's detector which assumes perfect knowledge of the target steering vector. However, in realistic scenarios, mismatches are likely to occur due to both environmental and instrumental factors. When a mismatched signal

  5. Free Speech Yearbook 1978.

    Science.gov (United States)

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  6. Stochastic model for detection of signals in noise

    OpenAIRE

    Klein, Stanley A.; Levi, Dennis M.

    2009-01-01

    Fifty years ago Birdsall, Tanner, and colleagues made rapid progress in developing signal detection theory into a powerful psychophysical tool. One of their major insights was the utility of adding external noise to the signals of interest. These methods have been enhanced in recent years by the addition of multipass and classification-image methods for opening up the black box. There remain a number of as yet unresolved issues. In particular, Birdsall developed a theorem that large amounts o...

  7. Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel

    Science.gov (United States)

    Kleinschmidt, Dave F.; Jaeger, T. Florian

    2016-01-01

    Successful speech perception requires that listeners map the acoustic signal to linguistic categories. These mappings are not only probabilistic, but change depending on the situation. For example, one talker’s /p/ might be physically indistinguishable from another talker’s /b/ (cf. lack of invariance). We characterize the computational problem posed by such a subjectively non-stationary world and propose that the speech perception system overcomes this challenge by (1) recognizing previously encountered situations, (2) generalizing to other situations based on previous similar experience, and (3) adapting to novel situations. We formalize this proposal in the ideal adapter framework: (1) to (3) can be understood as inference under uncertainty about the appropriate generative model for the current talker, thereby facilitating robust speech perception despite the lack of invariance. We focus on two critical aspects of the ideal adapter. First, in situations that clearly deviate from previous experience, listeners need to adapt. We develop a distributional (belief-updating) learning model of incremental adaptation. The model provides a good fit against known and novel phonetic adaptation data, including perceptual recalibration and selective adaptation. Second, robust speech recognition requires listeners learn to represent the structured component of cross-situation variability in the speech signal. We discuss how these two aspects of the ideal adapter provide a unifying explanation for adaptation, talker-specificity, and generalization across talkers and groups of talkers (e.g., accents and dialects). The ideal adapter provides a guiding framework for future investigations into speech perception and adaptation, and more broadly language comprehension. PMID:25844873

  8. Study on non-linear bistable dynamics model based EEG signal discrimination analysis method.

    Science.gov (United States)

    Ying, Xiaoguo; Lin, Han; Hui, Guohua

    2015-01-01

    Electroencephalogram (EEG) is the recording of electrical activity along the scalp. EEG measures voltage fluctuations generating from ionic current flows within the neurons of the brain. EEG signal is looked as one of the most important factors that will be focused in the next 20 years. In this paper, EEG signal discrimination based on non-linear bistable dynamical model was proposed. EEG signals were processed by non-linear bistable dynamical model, and features of EEG signals were characterized by coherence index. Experimental results showed that the proposed method could properly extract the features of different EEG signals.

  9. The natural statistics of audiovisual speech.

    Directory of Open Access Journals (Sweden)

    Chandramouli Chandrasekaran

    2009-07-01

    Full Text Available Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it's been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both area of the mouth opening and the voice envelope are temporally modulated in the 2-7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver.

  10. Paraconsistent semantics of speech acts

    NARCIS (Netherlands)

    Dunin-Kȩplicz, Barbara; Strachocka, Alina; Szałas, Andrzej; Verbrugge, Rineke

    2015-01-01

    This paper discusses an implementation of four speech acts: assert, concede, request and challenge in a paraconsistent framework. A natural four-valued model of interaction yields multiple new cognitive situations. They are analyzed in the context of communicative relations, which partially replace

  11. An Ecosystem of Intelligent ICT Tools for Speech-Language Therapy Based on a Formal Knowledge Model.

    Science.gov (United States)

    Robles-Bykbaev, Vladimir; López-Nores, Martín; Pazos-Arias, José; Quisi-Peralta, Diego; García-Duque, Jorge

    2015-01-01

    The language and communication constitute the development mainstays of several intellectual and cognitive skills in humans. However, there are millions of people around the world who suffer from several disabilities and disorders related with language and communication, while most of the countries present a lack of corresponding services related with health care and rehabilitation. On these grounds, we are working to develop an ecosystem of intelligent ICT tools to support speech and language pathologists, doctors, students, patients and their relatives. This ecosystem has several layers and components, integrating Electronic Health Records management, standardized vocabularies, a knowledge database, an ontology of concepts from the speech-language domain, and an expert system. We discuss the advantages of such an approach through experiments carried out in several institutions assisting children with a wide spectrum of disabilities.

  12. Least Squares Estimate of the Initial Phases in STFT based Speech Enhancement

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Krawczyk-Becker, Martin; Gerkmann, Timo

    2015-01-01

    In this paper, we consider single-channel speech enhancement in the short time Fourier transform (STFT) domain. We suggest to improve an STFT phase estimate by estimating the initial phases. The method is based on the harmonic model and a model for the phase evolution over time. The initial phases...... are estimated by setting up a least squares problem between the noisy phase and the model for phase evolution. Simulations on synthetic and speech signals show a decreased error on the phase when an estimate of the initial phase is included compared to using the noisy phase as an initialisation. The error...... on the phase is decreased at input SNRs from -10 to 10 dB. Reconstructing the signal using the clean amplitude, the mean squared error is decreased and the PESQ score is increased....

  13. Speech Intelligibility Advantages using an Acoustic Beamformer Display

    Science.gov (United States)

    Begault, Durand R.; Sunder, Kaushik; Godfroy, Martine; Otto, Peter

    2015-01-01

    A speech intelligibility test conforming to the Modified Rhyme Test of ANSI S3.2 "Method for Measuring the Intelligibility of Speech Over Communication Systems" was conducted using a prototype 12-channel acoustic beamformer system. The target speech material (signal) was identified against speech babble (noise), with calculated signal-noise ratios of 0, 5 and 10 dB. The signal was delivered at a fixed beam orientation of 135 deg (re 90 deg as the frontal direction of the array) and the noise at 135 deg (co-located) and 0 deg (separated). A significant improvement in intelligibility from 57% to 73% was found for spatial separation for the same signal-noise ratio (0 dB). Significant effects for improved intelligibility due to spatial separation were also found for higher signal-noise ratios (5 and 10 dB).

  14. Evaluation of the autoregression time-series model for analysis of a noisy signal

    International Nuclear Information System (INIS)

    Allen, J.W.

    1977-01-01

    The autoregression (AR) time-series model of a continuous noisy signal was statistically evaluated to determine quantitatively the uncertainties of the model order, the model parameters, and the model's power spectral density (PSD). The result of such a statistical evaluation enables an experimenter to decide whether an AR model can adequately represent a continuous noisy signal and be consistent with the signal's frequency spectrum, and whether it can be used for on-line monitoring. Although evaluations of other types of signals have been reported in the literature, no direct reference has been found to AR model's uncertainties for continuous noisy signals; yet the evaluation is necessary to decide the usefulness of AR models of typical reactor signals (e.g., neutron detector output or thermocouple output) and the potential of AR models for on-line monitoring applications. AR and other time-series models for noisy data representation are being investigated by others since such models require fewer parameters than the traditional PSD model. For this study, the AR model was selected for its simplicity and conduciveness to uncertainty analysis, and controlled laboratory bench signals were used for continuous noisy data. (author)

  15. Speech in spinocerebellar ataxia.

    Science.gov (United States)

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.

  16. Distributed Speech Enhancement in Wireless Acoustic Sensor Networks

    NARCIS (Netherlands)

    Zeng, Y.

    2015-01-01

    In digital speech communication applications like hands-free mobile telephony, hearing aids and human-to-computer communication systems, the recorded speech signals are typically corrupted by background noise. As a result, their quality and intelligibility can get severely degraded. Traditional

  17. Technologies for the Study of Speech: Review and an Application

    Science.gov (United States)

    Babatsouli, Elena

    2015-01-01

    Technologies used for the study of speech are classified here into non-intrusive and intrusive. The paper informs on current non-intrusive technologies that are used for linguistic investigations of the speech signal, both phonological and phonetic. Providing a point of reference, the review covers existing technological advances in language…

  18. Modeling the Pulse Signal by Wave-Shape Function and Analyzing by Synchrosqueezing Transform.

    Science.gov (United States)

    Wu, Hau-Tieng; Wu, Han-Kuei; Wang, Chun-Li; Yang, Yueh-Lung; Wu, Wen-Hsiang; Tsai, Tung-Hu; Chang, Hen-Hong

    2016-01-01

    We apply the recently developed adaptive non-harmonic model based on the wave-shape function, as well as the time-frequency analysis tool called synchrosqueezing transform (SST) to model and analyze oscillatory physiological signals. To demonstrate how the model and algorithm work, we apply them to study the pulse wave signal. By extracting features called the spectral pulse signature, and based on functional regression, we characterize the hemodynamics from the radial pulse wave signals recorded by the sphygmomanometer. Analysis results suggest the potential of the proposed signal processing approach to extract health-related hemodynamics features.

  19. Modeling the Pulse Signal by Wave-Shape Function and Analyzing by Synchrosqueezing Transform.

    Directory of Open Access Journals (Sweden)

    Hau-Tieng Wu

    Full Text Available We apply the recently developed adaptive non-harmonic model based on the wave-shape function, as well as the time-frequency analysis tool called synchrosqueezing transform (SST to model and analyze oscillatory physiological signals. To demonstrate how the model and algorithm work, we apply them to study the pulse wave signal. By extracting features called the spectral pulse signature, and based on functional regression, we characterize the hemodynamics from the radial pulse wave signals recorded by the sphygmomanometer. Analysis results suggest the potential of the proposed signal processing approach to extract health-related hemodynamics features.

  20. Modeling evolution of crosstalk in noisy signal transduction networks

    Science.gov (United States)

    Tareen, Ammar; Wingreen, Ned S.; Mukhopadhyay, Ranjan

    2018-02-01

    Signal transduction networks can form highly interconnected systems within cells due to crosstalk between constituent pathways. To better understand the evolutionary design principles underlying such networks, we study the evolution of crosstalk for two parallel signaling pathways that arise via gene duplication. We use a sequence-based evolutionary algorithm and evolve the network based on two physically motivated fitness functions related to information transmission. We find that one fitness function leads to a high degree of crosstalk while the other leads to pathway specificity. Our results offer insights on the relationship between network architecture and information transmission for noisy biomolecular networks.

  1. Speech processing: from peripheral to hemispheric asymmetry of the auditory system.

    Science.gov (United States)

    Lazard, Diane S; Collette, Jean-Louis; Perrot, Xavier

    2012-01-01

    Language processing from the cochlea to auditory association cortices shows side-dependent specificities with an apparent left hemispheric dominance. The aim of this article was to propose to nonspeech specialists a didactic review of two complementary theories about hemispheric asymmetry in speech processing. Starting from anatomico-physiological and clinical observations of auditory asymmetry and interhemispheric connections, this review then exposes behavioral (dichotic listening paradigm) as well as functional (functional magnetic resonance imaging and positron emission tomography) experiments that assessed hemispheric specialization for speech processing. Even though speech at an early phonological level is regarded as being processed bilaterally, a left-hemispheric dominance exists for higher-level processing. This asymmetry may arise from a segregation of the speech signal, broken apart within nonprimary auditory areas in two distinct temporal integration windows--a fast one on the left and a slower one on the right--modeled through the asymmetric sampling in time theory or a spectro-temporal trade-off, with a higher temporal resolution in the left hemisphere and a higher spectral resolution in the right hemisphere, modeled through the spectral/temporal resolution trade-off theory. Both theories deal with the concept that lower-order tuning principles for acoustic signal might drive higher-order organization for speech processing. However, the precise nature, mechanisms, and origin of speech processing asymmetry are still being debated. Finally, an example of hemispheric asymmetry alteration, which has direct clinical implications, is given through the case of auditory aging that mixes peripheral disorder and modifications of central processing. Copyright © 2011 The American Laryngological, Rhinological, and Otological Society, Inc.

  2. Causal inference and temporal predictions in audiovisual perception of speech and music.

    Science.gov (United States)

    Noppeney, Uta; Lee, Hwee Ling

    2018-03-31

    To form a coherent percept of the environment, the brain must integrate sensory signals emanating from a common source but segregate those from different sources. Temporal regularities are prominent cues for multisensory integration, particularly for speech and music perception. In line with models of predictive coding, we suggest that the brain adapts an internal model to the statistical regularities in its environment. This internal model enables cross-sensory and sensorimotor temporal predictions as a mechanism to arbitrate between integration and segregation of signals from different senses. © 2018 New York Academy of Sciences.

  3. Modeling the effects of Multi-path propagation and scintillation on GPS signals

    Science.gov (United States)

    Habash Krause, L.; Wilson, S. J.

    2014-12-01

    GPS signals traveling through the earth's ionosphere are affected by charged particles that often disrupt the signal and the information it carries due to "scintillation", which resembles an extra noise source on the signal. These signals are also affected by weather changes, tropospheric scattering, and absorption from objects due to multi-path propagation of the signal. These obstacles cause distortion within information and fading of the signal, which ultimately results in phase locking errors and noise in messages. In this work, we attempted to replicate the distortion that occurs in GPS signals using a signal processing simulation model. We wanted to be able to create and identify scintillated signals so we could better understand the environment that caused it to become scintillated. Then, under controlled conditions, we simulated the receiver's ability to suppress scintillation in a signal. We developed a code in MATLAB that was programmed to: 1. Create a carrier wave and then plant noise (four different frequencies) on the carrier wave, 2. Compute a Fourier transform on the four different frequencies to find the frequency content of a signal, 3. Use a filter and apply it to the Fourier transform of the four frequencies and then compute a Signal-to-noise ratio to evaluate the power (in Decibels) of the filtered signal, and 4.Plot each of these components into graphs. To test the code's validity, we used user input and data from an AM transmitter. We determined that the amplitude modulated signal or AM signal would be the best type of signal to test the accuracy of the MATLAB code due to its simplicity. This code is basic to give students the ability to change and use it to determine the environment and effects of noise on different AM signals and their carrier waves. Overall, we were able to manipulate a scenario of a noisy signal and interpret its behavior and change due to its noisy components: amplitude, frequency, and phase shift.

  4. Personality in speech assessment and automatic classification

    CERN Document Server

    Polzehl, Tim

    2015-01-01

    This work combines interdisciplinary knowledge and experience from research fields of psychology, linguistics, audio-processing, machine learning, and computer science. The work systematically explores a novel research topic devoted to automated modeling of personality expression from speech. For this aim, it introduces a novel personality assessment questionnaire and presents the results of extensive labeling sessions to annotate the speech data with personality assessments. It provides estimates of the Big 5 personality traits, i.e. openness, conscientiousness, extroversion, agreeableness, and neuroticism. Based on a database built on the questionnaire, the book presents models to tell apart different personality types or classes from speech automatically.

  5. A knowledge representation meta-model for rule-based modelling of signalling networks

    Directory of Open Access Journals (Sweden)

    Adrien Basso-Blandin

    2016-03-01

    Full Text Available The study of cellular signalling pathways and their deregulation in disease states, such as cancer, is a large and extremely complex task. Indeed, these systems involve many parts and processes but are studied piecewise and their literatures and data are consequently fragmented, distributed and sometimes—at least apparently—inconsistent. This makes it extremely difficult to build significant explanatory models with the result that effects in these systems that are brought about by many interacting factors are poorly understood. The rule-based approach to modelling has shown some promise for the representation of the highly combinatorial systems typically found in signalling where many of the proteins are composed of multiple binding domains, capable of simultaneous interactions, and/or peptide motifs controlled by post-translational modifications. However, the rule-based approach requires highly detailed information about the precise conditions for each and every interaction which is rarely available from any one single source. Rather, these conditions must be painstakingly inferred and curated, by hand, from information contained in many papers—each of which contains only part of the story. In this paper, we introduce a graph-based meta-model, attuned to the representation of cellular signalling networks, which aims to ease this massive cognitive burden on the rule-based curation process. This meta-model is a generalization of that used by Kappa and BNGL which allows for the flexible representation of knowledge at various levels of granularity. In particular, it allows us to deal with information which has either too little, or too much, detail with respect to the strict rule-based meta-model. Our approach provides a basis for the gradual aggregation of fragmented biological knowledge extracted from the literature into an instance of the meta-model from which we can define an automated translation into executable Kappa programs.

  6. Modelling and simulation of signal transductions in an apoptosis ...

    Indian Academy of Sciences (India)

    Prakash

    Structural Analysis of Metabolic Networks: Elementary Flux. Mode, Analogy to Petri Nets, and Application to Mycoplasma pneumoniae; German Conference on Bioinformatics 2000 pp 115–120. Takai-Igarashi T and Mizoguchi R 2004 Cell signalling networks ontology; In Silico Biol. 4 81–87. Thompson C 1995 Apoptosis in ...

  7. Modeling the diffusion magnetic resonance imaging signal inside neurons

    International Nuclear Information System (INIS)

    Nguyen, D V; Li, J R; Grebenkov, D S; Le Bihan, D

    2014-01-01

    The Bloch-Torrey partial differential equation (PDE) describes the complex transverse water proton magnetization due to diffusion-encoding magnetic field gradient pulses. The integral of the solution of this PDE yields the diffusion magnetic resonance imaging (dMRI) signal. In a complex medium such as cerebral tissue, it is difficult to explicitly link the dMRI signal to biological parameters such as the cellular geometry or the cellular volume fraction. Studying the dMRI signal arising from a single neuron can provide insight into how the geometrical structure of neurons influences the measured signal. We formulate the Bloch-Torrey PDE inside a single neuron, under no water exchange condition with the extracellular space, and show how to reduce the 3D simulation in the full neuron to a 3D simulation around the soma and 1D simulations in the neurites. We show that this latter approach is computationally much faster than full 3D simulation and still gives accurate results over a wide range of diffusion times

  8. Model-based design of self-Adapting networked signal processing systems

    NARCIS (Netherlands)

    Oliveira Filho, J.A. de; Papp, Z.; Djapic, R.; Oostveen, J.C.

    2013-01-01

    The paper describes a model based approach for architecture design of runtime reconfigurable, large-scale, networked signal processing applications. A graph based modeling formalism is introduced to describe all relevant aspects of the design (functional, concurrency, hardware, communication,

  9. Speech Motor Programming in Apraxia of Speech: Evidence from a Delayed Picture-Word Interference Task

    Science.gov (United States)

    Mailend, Marja-Liisa; Maas, Edwin

    2013-01-01

    Purpose: Apraxia of speech (AOS) is considered a speech motor programming impairment, but the specific nature of the impairment remains a matter of debate. This study investigated 2 hypotheses about the underlying impairment in AOS framed within the Directions Into Velocities of Articulators (DIVA; Guenther, Ghosh, & Tourville, 2006) model: The…

  10. Part-of-speech effects on text-to-speech synthesis

    CSIR Research Space (South Africa)

    Schlunz, GI

    2010-11-01

    Full Text Available One of the goals of text-to-speech (TTS) systems is to produce natural-sounding synthesised speech. Towards this end various natural language processing (NLP) tasks are performed to model the prosodic aspects of the TTS voice. One of the fundamental...

  11. A Research of Speech Emotion Recognition Based on Deep Belief Network and SVM

    Directory of Open Access Journals (Sweden)

    Chenchen Huang

    2014-01-01

    Full Text Available Feature extraction is a very important part in speech emotion recognition, and in allusion to feature extraction in speech emotion recognition problems, this paper proposed a new method of feature extraction, using DBNs in DNN to extract emotional features in speech signal automatically. By training a 5 layers depth DBNs, to extract speech emotion feature and incorporate multiple consecutive frames to form a high dimensional feature. The features after training in DBNs were the input of nonlinear SVM classifier, and finally speech emotion recognition multiple classifier system was achieved. The speech emotion recognition rate of the system reached 86.5%, which was 7% higher than the original method.

  12. Childhood apraxia of speech: A survey of praxis and typical speech characteristics.

    Science.gov (United States)

    Malmenholt, Ann; Lohmander, Anette; McAllister, Anita

    2017-07-01

    The purpose of this study was to investigate current knowledge of the diagnosis childhood apraxia of speech (CAS) in Sweden and compare speech characteristics and symptoms to those of earlier survey findings in mainly English-speakers. In a web-based questionnaire 178 Swedish speech-language pathologists (SLPs) anonymously answered questions about their perception of typical speech characteristics for CAS. They graded own assessment skills and estimated clinical occurrence. The seven top speech characteristics reported as typical for children with CAS were: inconsistent speech production (85%), sequencing difficulties (71%), oro-motor deficits (63%), vowel errors (62%), voicing errors (61%), consonant cluster deletions (54%), and prosodic disturbance (53%). Motor-programming deficits described as lack of automatization of speech movements were perceived by 82%. All listed characteristics were consistent with the American Speech-Language-Hearing Association (ASHA) consensus-based features, Strand's 10-point checklist, and the diagnostic model proposed by Ozanne. The mode for clinical occurrence was 5%. Number of suspected cases of CAS in the clinical caseload was approximately one new patient/year and SLP. The results support and add to findings from studies of CAS in English-speaking children with similar speech characteristics regarded as typical. Possibly, these findings could contribute to cross-linguistic consensus on CAS characteristics.

  13. Acquired apraxia of speech: features, accounts, and treatment.

    Science.gov (United States)

    Peach, Richard K

    2004-01-01

    The features of apraxia of speech (AOS) are presented with regard to both traditional and contemporary descriptions of the disorder. Models of speech processing, including the neurological bases for apraxia of speech, are discussed. Recent findings concerning subcortical contributions to apraxia of speech and the role of the insula are presented. The key features to differentially diagnose AOS from related speech syndromes are identified. Treatment implications derived from motor accounts of AOS are presented along with a summary of current approaches designed to treat the various subcomponents of the disorder. Finally, guidelines are provided for treating the AOS patient with coexisting aphasia.

  14. Exploring the zebra finch Taeniopygia guttata as a novel animal model for the speech-language deficit of fragile X syndrome.

    Science.gov (United States)

    Winograd, Claudia; Ceman, Stephanie

    2012-01-01

    Fragile X syndrome (FXS) is the most common cause of inherited intellectual disability and presents with markedly atypical speech-language, likely due to impaired vocal learning. Although current models have been useful for studies of some aspects of FXS, zebra finch is the only tractable lab model for vocal learning. The neural circuits for vocal learning in the zebra finch have clear relationships to the pathways in the human brain that may be affected in FXS. Further, finch vocal learning may be quantified using software designed specifically for this purpose. Knockdown of the zebra finch FMR1 gene may ultimately enable novel tests of therapies that are modality-specific, using drugs or even social strategies, to ameliorate deficits in vocal development and function. In this chapter, we describe the utility of the zebra finch model and present a hypothesis for the role of FMRP in the developing neural circuitry for vocalization.

  15. Microscopic Control Delay Modeling at Signalized Arterials Using Bluetooth Technology

    OpenAIRE

    Rajasekhar, Lakshmi

    2011-01-01

    Real-time control delay estimation is an important performance measure for any intersection to improve the signal timing plans dynamically in real-time and hence improve the overall system performance. Control delay estimates helps to determine the level-of-service (LOS) characteristics of various approaches at an intersection and takes into account deceleration delay, stopped delay and acceleration delay. All kinds of traffic delay calculation especially control delay calculation has always ...

  16. A speed guidance strategy for multiple signalized intersections based on car-following model

    Science.gov (United States)

    Tang, Tie-Qiao; Yi, Zhi-Yan; Zhang, Jian; Wang, Tao; Leng, Jun-Qiang

    2018-04-01

    Signalized intersection has great roles in urban traffic system. The signal infrastructure and the driving behavior near the intersection are paramount factors that have significant impacts on traffic flow and energy consumption. In this paper, a speed guidance strategy is introduced into a car-following model to study the driving behavior and the fuel consumption in a single-lane road with multiple signalized intersections. The numerical results indicate that the proposed model can reduce the fuel consumption and the average stop times. The findings provide insightful guidance for the eco-driving strategies near the signalized intersections.

  17. Sound quality measures for speech in noise through a commercial hearing aid implementing digital noise reduction.

    Science.gov (United States)

    Ricketts, Todd A; Hornsby, Benjamin W Y

    2005-05-01

    This brief report discusses the affect of digital noise reduction (DNR) processing on aided speech recognition and sound quality measures in 14 adults fitted with a commercial hearing aid. Measures of speech recognition and sound quality were obtained in two different speech-in-noise conditions (71 dBA speech, +6 dB SNR and 75 dBA speech, +1 dB SNR). The results revealed that the presence or absence of DNR processing did not impact speech recognition in noise (either positively or negatively). Paired comparisons of sound quality for the same speech in noise signals, however, revealed a strong preference for DNR processing. These data suggest that at least one implementation of DNR processing is capable of providing improved sound quality, for speech in noise, in the absence of improved speech recognition.

  18. Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

    Science.gov (United States)

    Schafer, Phillip B; Jin, Dezhe Z

    2014-03-01

    Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.

  19. Speech perception under adverse conditions: Insights from behavioral, computational and neuroscience research

    Directory of Open Access Journals (Sweden)

    Sara eGuediche

    2014-01-01

    Full Text Available Adult speech perception reflects the long-term regularities of the native language, but it is also flexible such that it accommodates and adapts to adverse listening conditions and short-term deviations from native-language norms. The purpose of this review article is to examine how the broader neuroscience literature can inform and advance research efforts in understanding the neural basis of flexibility and adaptive plasticity in speech perception. In particular, we consider several domains of neuroscience research that offer insight into how perception can be adaptively tuned to short-term deviations while also maintaining without affecting the long-term learned regularities for mapping sensory input. We review several literatures to highlight the potential role of learning algorithms that rely on prediction error signals and discuss specific neural structures that are likely to contribute to such learning. Already, a few studies have alluded to a potential role of these mechanisms in adaptive plasticity in speech perception. Better understanding the application and limitations of these algorithms for the challenges of flexible speech perception under adverse conditions promises to inform theoretical models of speech.

  20. A glimpsing account of the role of temporal fine structure information in speech recognition.

    Science.gov (United States)

    Apoux, Frédéric; Healy, Eric W

    2013-01-01

    Many behavioral studies have reported a significant decrease in intelligibility when the temporal fine structure (TFS) of a sound mixture is replaced with noise or tones (i.e., vocoder processing). This finding has led to the conclusion that TFS information is critical for speech recognition in noise. How the normal -auditory system takes advantage of the original TFS, however, remains unclear. Three -experiments on the role of TFS in noise are described. All three experiments measured speech recognition in various backgrounds while manipulating the envelope, TFS, or both. One experiment tested the hypothesis that vocoder processing may artificially increase the apparent importance of TFS cues. Another experiment evaluated the relative contribution of the target and masker TFS by disturbing only the TFS of the target or that of the masker. Finally, a last experiment evaluated the -relative contribution of envelope and TFS information. In contrast to previous -studies, however, the original envelope and TFS were both preserved - to some extent - in all conditions. Overall, the experiments indicate a limited influence of TFS and suggest that little speech information is extracted from the TFS. Concomitantly, these experiments confirm that most speech information is carried by the temporal envelope in real-world conditions. When interpreted within the framework of the glimpsing model, the results of these experiments suggest that TFS is primarily used as a grouping cue to select the time-frequency regions -corresponding to the target speech signal.