linear prediction speech: Topics by WorldWideScience.org

Sample records for linear prediction speech

Improved Methods for Pitch Synchronous Linear Prediction Analysis of Speech

OpenAIRE

劉, 麗清

2015-01-01

Linear prediction (LP) analysis has been applied to speech system over the last few decades. LP technique is well-suited for speech analysis due to its ability to model speech production process approximately. Hence LP analysis has been widely used for speech enhancement, low-bit-rate speech coding in cellular telephony, speech recognition, characteristic parameter extraction (vocal tract resonances frequencies, fundamental frequency called pitch) and so on. However, the performance of the co...
Sparsity in Linear Predictive Coding of Speech

DEFF Research Database (Denmark)

Giacobello, Daniele

of the effectiveness of their application in audio processing. The second part of the thesis deals with introducing sparsity directly in the linear prediction analysis-by-synthesis (LPAS) speech coding paradigm. We first propose a novel near-optimal method to look for a sparse approximate excitation using a compressed...... one with direct applications to coding but also consistent with the speech production model of voiced speech, where the excitation of the all-pole filter can be modeled as an impulse train, i.e., a sparse sequence. Introducing sparsity in the LP framework will also bring to de- velop the concept...... sensing formulation. Furthermore, we define a novel re-estimation procedure to adapt the predictor coefficients to the given sparse excitation, balancing the two representations in the context of speech coding. Finally, the advantages of the compact parametric representation of a segment of speech, given...
Fast Algorithms for High-Order Sparse Linear Prediction with Applications to Speech Processing

DEFF Research Database (Denmark)

Jensen, Tobias Lindstrøm; Giacobello, Daniele; van Waterschoot, Toon

2016-01-01

In speech processing applications, imposing sparsity constraints on high-order linear prediction coefficients and prediction residuals has proven successful in overcoming some of the limitation of conventional linear predictive modeling. However, this modeling scheme, named sparse linear prediction...... problem with lower accuracy than in previous work. In the experimental analysis, we clearly show that a solution with lower accuracy can achieve approximately the same performance as a high accuracy solution both objectively, in terms of prediction gain, as well as with perceptual relevant measures, when...... evaluated in a speech reconstruction application....
The application of sparse linear prediction dictionary to compressive sensing in speech signals

Directory of Open Access Journals (Sweden)

YOU Hanxu

2016-04-01

Full Text Available Appling compressive sensing (CS,which theoretically guarantees that signal sampling and signal compression can be achieved simultaneously,into audio and speech signal processing is one of the most popular research topics in recent years.In this paper,K-SVD algorithm was employed to learn a sparse linear prediction dictionary regarding as the sparse basis of underlying speech signals.Compressed signals was obtained by applying random Gaussian matrix to sample original speech frames.Orthogonal matching pursuit (OMP and compressive sampling matching pursuit (CoSaMP were adopted to recovery original signals from compressed one.Numbers of experiments were carried out to investigate the impact of speech frames length,compression ratios,sparse basis and reconstruction algorithms on CS performance.Results show that sparse linear prediction dictionary can advance the performance of speech signals reconstruction compared with discrete cosine transform (DCT matrix.
Similarities and Differences Between Warped Linear Prediction and Laguerre Linear Prediction

NARCIS (Netherlands)

Brinker, Albertus C. den; Krishnamoorthi, Harish; Verbitskiy, Evgeny A.

2011-01-01

Linear prediction has been successfully applied in many speech and audio processing systems. This paper presents the similarities and differences between two classes of linear prediction schemes, namely, Warped Linear Prediction (WLP) and Laguerre Linear Prediction (LLP). It is shown that both
Quasi-closed phase forward-backward linear prediction analysis of speech for accurate formant detection and estimation.

Science.gov (United States)

Gowda, Dhananjaya; Airaksinen, Manu; Alku, Paavo

2017-09-01

Recently, a quasi-closed phase (QCP) analysis of speech signals for accurate glottal inverse filtering was proposed. However, the QCP analysis which belongs to the family of temporally weighted linear prediction (WLP) methods uses the conventional forward type of sample prediction. This may not be the best choice especially in computing WLP models with a hard-limiting weighting function. A sample selective minimization of the prediction error in WLP reduces the effective number of samples available within a given window frame. To counter this problem, a modified quasi-closed phase forward-backward (QCP-FB) analysis is proposed, wherein each sample is predicted based on its past as well as future samples thereby utilizing the available number of samples more effectively. Formant detection and estimation experiments on synthetic vowels generated using a physical modeling approach as well as natural speech utterances show that the proposed QCP-FB method yields statistically significant improvements over the conventional linear prediction and QCP methods.
Non-linear Dynamics of Speech in Schizophrenia

DEFF Research Database (Denmark)

Fusaroli, Riccardo; Simonsen, Arndis; Weed, Ethan

(regularity and complexity) of speech. Our aims are (1) to achieve a more fine-grained understanding of the speech patterns in schizophrenia than has previously been achieved using traditional, linear measures of prosody and fluency, and (2) to employ the results in a supervised machine-learning process......-effects inference. SANS and SAPS scores were predicted using a 10-fold cross-validated multiple linear regression. Both analyses were iterated 1000 to test for stability of results. Results: Voice dynamics allowed discrimination of patients with schizophrenia from healthy controls with a balanced accuracy of 85...
Stable 1-Norm Error Minimization Based Linear Predictors for Speech Modeling

DEFF Research Database (Denmark)

Giacobello, Daniele; Christensen, Mads Græsbøll; Jensen, Tobias Lindstrøm

2014-01-01

In linear prediction of speech, the 1-norm error minimization criterion has been shown to provide a valid alternative to the 2-norm minimization criterion. However, unlike 2-norm minimization, 1-norm minimization does not guarantee the stability of the corresponding all-pole filter and can generate...... saturations when this is used to synthesize speech. In this paper, we introduce two new methods to obtain intrinsically stable predictors with the 1-norm minimization. The first method is based on constraining the roots of the predictor to lie within the unit circle by reducing the numerical range...... based linear prediction for modeling and coding of speech....
Restoring the missing features of the corrupted speech using linear interpolation methods

Science.gov (United States)

Rassem, Taha H.; Makbol, Nasrin M.; Hasan, Ali Muttaleb; Zaki, Siti Syazni Mohd; Girija, P. N.

2017-10-01

One of the main challenges in the Automatic Speech Recognition (ASR) is the noise. The performance of the ASR system reduces significantly if the speech is corrupted by noise. In spectrogram representation of a speech signal, after deleting low Signal to Noise Ratio (SNR) elements, the incomplete spectrogram is obtained. In this case, the speech recognizer should make modifications to the spectrogram in order to restore the missing elements, which is one direction. In another direction, speech recognizer should be able to restore the missing elements due to deleting low SNR elements before performing the recognition. This is can be done using different spectrogram reconstruction methods. In this paper, the geometrical spectrogram reconstruction methods suggested by some researchers are implemented as a toolbox. In these geometrical reconstruction methods, the linear interpolation along time or frequency methods are used to predict the missing elements between adjacent observed elements in the spectrogram. Moreover, a new linear interpolation method using time and frequency together is presented. The CMU Sphinx III software is used in the experiments to test the performance of the linear interpolation reconstruction method. The experiments are done under different conditions such as different lengths of the window and different lengths of utterances. Speech corpus consists of 20 males and 20 females; each one has two different utterances are used in the experiments. As a result, 80% recognition accuracy is achieved with 25% SNR ratio.
Speech coding code- excited linear prediction

CERN Document Server

Bäckström, Tom

2017-01-01

This book provides scientific understanding of the most central techniques used in speech coding both for advanced students as well as professionals with a background in speech audio and or digital signal processing. It provides a clear connection between the whys hows and whats thus enabling a clear view of the necessity purpose and solutions provided by various tools as well as their strengths and weaknesses in each respect Equivalently this book sheds light on the following perspectives for each technology presented Objective What do we want to achieve and especially why is this goal important Resource Information What information is available and how can it be useful and Resource Platform What kind of platforms are we working with and what are their capabilities restrictions This includes computational memory and acoustic properties and the transmission capacity of devices used. The book goes on to address Solutions Which solutions have been proposed and how can they be used to reach the stated goals and ...
Predicting the perceived sound quality of frequency-compressed speech.

Directory of Open Access Journals (Sweden)

Rainer Huber

Full Text Available The performance of objective speech and audio quality measures for the prediction of the perceived quality of frequency-compressed speech in hearing aids is investigated in this paper. A number of existing quality measures have been applied to speech signals processed by a hearing aid, which compresses speech spectra along frequency in order to make information contained in higher frequencies audible for listeners with severe high-frequency hearing loss. Quality measures were compared with subjective ratings obtained from normal hearing and hearing impaired children and adults in an earlier study. High correlations were achieved with quality measures computed by quality models that are based on the auditory model of Dau et al., namely, the measure PSM, computed by the quality model PEMO-Q; the measure qc, computed by the quality model proposed by Hansen and Kollmeier; and the linear subcomponent of the HASQI. For the prediction of quality ratings by hearing impaired listeners, extensions of some models incorporating hearing loss were implemented and shown to achieve improved prediction accuracy. Results indicate that these objective quality measures can potentially serve as tools for assisting in initial setting of frequency compression parameters.
Linear Prediction Using Refined Autocorrelation Function

Directory of Open Access Journals (Sweden)

M. Shahidur Rahman

2007-07-01

Full Text Available This paper proposes a new technique for improving the performance of linear prediction analysis by utilizing a refined version of the autocorrelation function. Problems in analyzing voiced speech using linear prediction occur often due to the harmonic structure of the excitation source, which causes the autocorrelation function to be an aliased version of that of the vocal tract impulse response. To estimate the vocal tract characteristics accurately, however, the effect of aliasing must be eliminated. In this paper, we employ homomorphic deconvolution technique in the autocorrelation domain to eliminate the aliasing effect occurred due to periodicity. The resulted autocorrelation function of the vocal tract impulse response is found to produce significant improvement in estimating formant frequencies. The accuracy of formant estimation is verified on synthetic vowels for a wide range of pitch frequencies typical for male and female speakers. The validity of the proposed method is also illustrated by inspecting the spectral envelopes of natural speech spoken by high-pitched female speaker. The synthesis filter obtained by the current method is guaranteed to be stable, which makes the method superior to many of its alternatives.
Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

DEFF Research Database (Denmark)

Jørgensen, Søren; Dau, Torsten

2013-01-01

The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-tonoise ratio in the envelope domain (SNRenv), which was demonstrated...... to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating...
Speech Compression

Directory of Open Access Journals (Sweden)

Jerry D. Gibson

2016-06-01

Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
A brief overview of speech enhancement with linear filtering

DEFF Research Database (Denmark)

Benesty, Jacob; Christensen, Mads Græsbøll; Jensen, Jesper Rindom

2014-01-01

In this paper, we provide an overview of some recently introduced principles and ideas for speech enhancement with linear filtering and explore how these are related and how they can be used in various applications. This is done in a general framework where the speech enhancement problem is stated......-to-noise ratio (SNR), and Wiener filters are derived from the conventional speech enhancement approach and the recently introduced orthogonal decomposition approach. For each of the filters, we derive their properties in terms of output SNR and speech distortion. We then demonstrate how the ideas can be applied...
On Optimal Linear Filtering of Speech for Near-End Listening Enhancement

DEFF Research Database (Denmark)

Taal, Cees H.; Jensen, Jesper; Leijon, Arne

2013-01-01

In this letter the focus is on linear filtering of speech before degradation due to additive background noise. The goal is to design the filter such that the speech intelligibility index (SII) is maximized when the speech is played back in a known noisy environment. Moreover, a power constraint i...
Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

Science.gov (United States)

Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

2014-01-01

Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…
Using the Speech Transmission Index for predicting non-native speech intelligibility

NARCIS (Netherlands)

Wijngaarden, S.J. van; Bronkhorst, A.W.; Houtgast, T.; Steeneken, H.J.M.

2004-01-01

While the Speech Transmission Index ~STI! is widely applied for prediction of speech intelligibility in room acoustics and telecommunication engineering, it is unclear how to interpret STI values when non-native talkers or listeners are involved. Based on subjectively measured psychometric functions
Preschool speech intelligibility and vocabulary skills predict long-term speech and language outcomes following cochlear implantation in early childhood.

Science.gov (United States)

Castellanos, Irina; Kronenberger, William G; Beer, Jessica; Henning, Shirley C; Colson, Bethany G; Pisoni, David B

2014-07-01

Speech and language measures during grade school predict adolescent speech-language outcomes in children who receive cochlear implants (CIs), but no research has examined whether speech and language functioning at even younger ages is predictive of long-term outcomes in this population. The purpose of this study was to examine whether early preschool measures of speech and language performance predict speech-language functioning in long-term users of CIs. Early measures of speech intelligibility and receptive vocabulary (obtained during preschool ages of 3-6 years) in a sample of 35 prelingually deaf, early-implanted children predicted speech perception, language, and verbal working memory skills up to 18 years later. Age of onset of deafness and age at implantation added additional variance to preschool speech intelligibility in predicting some long-term outcome scores, but the relationship between preschool speech-language skills and later speech-language outcomes was not significantly attenuated by the addition of these hearing history variables. These findings suggest that speech and language development during the preschool years is predictive of long-term speech and language functioning in early-implanted, prelingually deaf children. As a result, measures of speech-language functioning at preschool ages can be used to identify and adjust interventions for very young CI users who may be at long-term risk for suboptimal speech and language outcomes.
Predicting clinical decline in progressive agrammatic aphasia and apraxia of speech.

Science.gov (United States)

Whitwell, Jennifer L; Weigand, Stephen D; Duffy, Joseph R; Clark, Heather M; Strand, Edythe A; Machulda, Mary M; Spychalla, Anthony J; Senjem, Matthew L; Jack, Clifford R; Josephs, Keith A

2017-11-28

To determine whether baseline clinical and MRI features predict rate of clinical decline in patients with progressive apraxia of speech (AOS). Thirty-four patients with progressive AOS, with AOS either in isolation or in the presence of agrammatic aphasia, were followed up longitudinally for up to 4 visits, with clinical testing and MRI at each visit. Linear mixed-effects regression models including all visits (n = 94) were used to assess baseline clinical and MRI variables that predict rate of worsening of aphasia, motor speech, parkinsonism, and behavior. Clinical predictors included baseline severity and AOS type. MRI predictors included baseline frontal, premotor, motor, and striatal gray matter volumes. More severe parkinsonism at baseline was associated with faster rate of decline in parkinsonism. Patients with predominant sound distortions (AOS type 1) showed faster rates of decline in aphasia and motor speech, while patients with segmented speech (AOS type 2) showed faster rates of decline in parkinsonism. On MRI, we observed trends for fastest rates of decline in aphasia in patients with relatively small left, but preserved right, Broca area and precentral cortex. Bilateral reductions in lateral premotor cortex were associated with faster rates of decline of behavior. No associations were observed between volumes and decline in motor speech or parkinsonism. Rate of decline of each of the 4 clinical features assessed was associated with different baseline clinical and regional MRI predictors. Our findings could help improve prognostic estimates for these patients. © 2017 American Academy of Neurology.

Microscopic prediction of speech intelligibility in spatially distributed speech-shaped noise for normal-hearing listeners.

Science.gov (United States)

Geravanchizadeh, Masoud; Fallah, Ali

2015-12-01

A binaural and psychoacoustically motivated intelligibility model, based on a well-known monaural microscopic model is proposed. This model simulates a phoneme recognition task in the presence of spatially distributed speech-shaped noise in anechoic scenarios. In the proposed model, binaural advantage effects are considered by generating a feature vector for a dynamic-time-warping speech recognizer. This vector consists of three subvectors incorporating two monaural subvectors to model the better-ear hearing, and a binaural subvector to simulate the binaural unmasking effect. The binaural unit of the model is based on equalization-cancellation theory. This model operates blindly, which means separate recordings of speech and noise are not required for the predictions. Speech intelligibility tests were conducted with 12 normal hearing listeners by collecting speech reception thresholds (SRTs) in the presence of single and multiple sources of speech-shaped noise. The comparison of the model predictions with the measured binaural SRTs, and with the predictions of a macroscopic binaural model called extended equalization-cancellation, shows that this approach predicts the intelligibility in anechoic scenarios with good precision. The square of the correlation coefficient (r(2)) and the mean-absolute error between the model predictions and the measurements are 0.98 and 0.62 dB, respectively.
Predicting Prosody from Text for Text-to-Speech Synthesis

CERN Document Server

Rao, K Sreenivasa

2012-01-01

Predicting Prosody from Text for Text-to-Speech Synthesis covers the specific aspects of prosody, mainly focusing on how to predict the prosodic information from linguistic text, and then how to exploit the predicted prosodic knowledge for various speech applications. Author K. Sreenivasa Rao discusses proposed methods along with state-of-the-art techniques for the acquisition and incorporation of prosodic knowledge for developing speech systems. Positional, contextual and phonological features are proposed for representing the linguistic and production constraints of the sound units present in the text. This book is intended for graduate students and researchers working in the area of speech processing.
Analysis of Feature Extraction Methods for Speaker Dependent Speech Recognition

Directory of Open Access Journals (Sweden)

Gurpreet Kaur

2017-02-01

Full Text Available Speech recognition is about what is being said, irrespective of who is saying. Speech recognition is a growing field. Major progress is taking place on the technology of automatic speech recognition (ASR. Still, there are lots of barriers in this field in terms of recognition rate, background noise, speaker variability, speaking rate, accent etc. Speech recognition rate mainly depends on the selection of features and feature extraction methods. This paper outlines the feature extraction techniques for speaker dependent speech recognition for isolated words. A brief survey of different feature extraction techniques like Mel-Frequency Cepstral Coefficients (MFCC, Linear Predictive Coding Coefficients (LPCC, Perceptual Linear Prediction (PLP, Relative Spectra Perceptual linear Predictive (RASTA-PLP analysis are presented and evaluation is done. Speech recognition has various applications from daily use to commercial use. We have made a speaker dependent system and this system can be useful in many areas like controlling a patient vehicle using simple commands.
Design and performance of an analysis-by-synthesis class of predictive speech coders

Science.gov (United States)

Rose, Richard C.; Barnwell, Thomas P., III

1990-01-01

The performance of a broad class of analysis-by-synthesis linear predictive speech coders is quantified experimentally. The class of coders includes a number of well-known techniques as well as a very large number of speech coders which have not been named or studied. A general formulation for deriving the parametric representation used in all of the coders in the class is presented. A new coder, named the self-excited vocoder, is discussed because of its good performance with low complexity, and because of the insight this coder gives to analysis-by-synthesis coders in general. The results of a study comparing the performances of different members of this class are presented. The study takes the form of a series of formal subjective and objective speech quality tests performed on selected coders. The results of this study lead to some interesting and important observations concerning the controlling parameters for analysis-by-synthesis speech coders.
Prediction and imitation in speech

Directory of Open Access Journals (Sweden)

Chiara eGambi

2013-06-01

Full Text Available It has been suggested that intra- and inter-speaker variability in speech are correlated. Interlocutors have been shown to converge on various phonetic dimensions. In addition, speakers imitate the phonetic properties of voices they are exposed to in shadowing, repetition, and even passive listening tasks. We review three theoretical accounts of speech imitation and convergence phenomena: (i the Episodic Theory (ET of speech perception and production (Goldinger, 1998; (ii the Motor Theory (MT of speech perception (Liberman and Whalen, 2000;Galantucci et al., 2006 ; (iii Communication Accommodation Theory (CAT; Giles et al., 1991;Giles and Coupland, 1991. We argue that no account is able to explain all the available evidence. In particular, there is a need to integrate low-level, mechanistic accounts (like ET and MT and higher-level accounts (like CAT. We propose that this is possible within the framework of an integrated theory of production and comprehension (Pickering & Garrod, in press. Similarly to both ET and MT, this theory assumes parity between production and perception. Uniquely, however, it posits that listeners simulate speakers’ utterances by computing forward-model predictions at many different levels, which are then compared to the incoming phonetic input. In our account phonetic imitation can be achieved via the same mechanism that is responsible for sensorimotor adaptation; i.e. the correction of prediction errors. In addition, the model assumes that the degree to which sensory prediction errors lead to motor adjustments is context-dependent. The notion of context subsumes both the preceding linguistic input and non-linguistic attributes of the situation (e.g., the speaker’s and listener’s social identities, their conversational roles, the listener’s intention to imitate.
Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

Science.gov (United States)

Kayasith, Prakasith; Theeramunkong, Thanaruk

It is a tedious and subjective task to measure severity of a dysarthria by manually evaluating his/her speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signal for a certain word and distinguished speech signal for different words. As an application, it can be used to assess speech quality and forecast speech recognition rate of speech made by an individual dysarthric speaker before actual exhaustive implementation of an automatic speech recognition system for the speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations had been done by comparing its predicted recognition rates with ones predicted by the standard methods called the articulatory and intelligibility tests based on the two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting recognition rate of dysarthric speech. All experiments had been done on speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
Speech Intelligibility Prediction Based on Mutual Information

DEFF Research Database (Denmark)

Jensen, Jesper; Taal, Cees H.

2014-01-01

This paper deals with the problem of predicting the average intelligibility of noisy and potentially processed speech signals, as observed by a group of normal hearing listeners. We propose a model which performs this prediction based on the hypothesis that intelligibility is monotonically related...... to the mutual information between critical-band amplitude envelopes of the clean signal and the corresponding noisy/processed signal. The resulting intelligibility predictor turns out to be a simple function of the mean-square error (mse) that arises when estimating a clean critical-band amplitude using...... a minimum mean-square error (mmse) estimator based on the noisy/processed amplitude. The proposed model predicts that speech intelligibility cannot be improved by any processing of noisy critical-band amplitudes. Furthermore, the proposed intelligibility predictor performs well ( ρ > 0.95) in predicting...
Predicting speech intelligibility in adverse conditions: evaluation of the speech-based envelope power spectrum model

DEFF Research Database (Denmark)

Jørgensen, Søren; Dau, Torsten

2011-01-01

conditions by comparing predictions to measured data from [Kjems et al. (2009). J. Acoust. Soc. Am. 126 (3), 1415-1426] where speech is mixed with four different interferers, including speech-shaped noise, bottle noise, car noise, and cafe noise. The model accounts well for the differences in intelligibility......The speech-based envelope power spectrum model (sEPSM) [Jørgensen and Dau (2011). J. Acoust. Soc. Am., 130 (3), 1475–1487] estimates the envelope signal-to-noise ratio (SNRenv) of distorted speech and accurately describes the speech recognition thresholds (SRT) for normal-hearing listeners...... observed for the different interferers. None of the standardized models successfully describe these data....
Automated analysis of free speech predicts psychosis onset in high-risk youths

Science.gov (United States)

Bedi, Gillinder; Carrillo, Facundo; Cecchi, Guillermo A; Slezak, Diego Fernández; Sigman, Mariano; Mota, Natália B; Ribeiro, Sidarta; Javitt, Daniel C; Copelli, Mauro; Corcoran, Cheryl M

2015-01-01

Background/Objectives: Psychiatry lacks the objective clinical tests routinely used in other specializations. Novel computerized methods to characterize complex behaviors such as speech could be used to identify and predict psychiatric illness in individuals. AIMS: In this proof-of-principle study, our aim was to test automated speech analyses combined with Machine Learning to predict later psychosis onset in youths at clinical high-risk (CHR) for psychosis. Methods: Thirty-four CHR youths (11 females) had baseline interviews and were assessed quarterly for up to 2.5 years; five transitioned to psychosis. Using automated analysis, transcripts of interviews were evaluated for semantic and syntactic features predicting later psychosis onset. Speech features were fed into a convex hull classification algorithm with leave-one-subject-out cross-validation to assess their predictive value for psychosis outcome. The canonical correlation between the speech features and prodromal symptom ratings was computed. Results: Derived speech features included a Latent Semantic Analysis measure of semantic coherence and two syntactic markers of speech complexity: maximum phrase length and use of determiners (e.g., which). These speech features predicted later psychosis development with 100% accuracy, outperforming classification from clinical interviews. Speech features were significantly correlated with prodromal symptoms. Conclusions: Findings support the utility of automated speech analysis to measure subtle, clinically relevant mental state changes in emergent psychosis. Recent developments in computer science, including natural language processing, could provide the foundation for future development of objective clinical tests for psychiatry. PMID:27336038
Linear program differentiation for single-channel speech separation

DEFF Research Database (Denmark)

Pearlmutter, Barak A.; Olsson, Rasmus Kongsgaard

2006-01-01

Many apparently difficult problems can be solved by reduction to linear programming. Such problems are often subproblems within larger systems. When gradient optimisation of the entire larger system is desired, it is necessary to propagate gradients through the internally-invoked LP solver....... For instance, when an intermediate quantity z is the solution to a linear program involving constraint matrix A, a vector of sensitivities dE/dz will induce sensitivities dE/dA. Here we show how these can be efficiently calculated, when they exist. This allows algorithmic differentiation to be applied...... to algorithms that invoke linear programming solvers as subroutines, as is common when using sparse representations in signal processing. Here we apply it to gradient optimisation of over complete dictionaries for maximally sparse representations of a speech corpus. The dictionaries are employed in a single...
Prediction and constraint in audiovisual speech perception

Science.gov (United States)

Peelle, Jonathan E.; Sommers, Mitchell S.

2015-01-01

During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported
Prediction and constraint in audiovisual speech perception.

Science.gov (United States)

Peelle, Jonathan E; Sommers, Mitchell S

2015-07-01

During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration
INVERSE FILTERING TECHNIQUES IN SPEECH ANALYSIS

African Journals Online (AJOL)

Dr Obe

domain or in the frequency domain. However their .... computer to speech analysis led to important elaborations ... tool for the estimation of formant trajectory (10), ... prediction Linear prediction In effect determines the filter .... Radio Res. Lab.
Auditory Brainstem Response to Complex Sounds Predicts Self-Reported Speech-in-Noise Performance

Science.gov (United States)

Anderson, Samira; Parbery-Clark, Alexandra; White-Schwoch, Travis; Kraus, Nina

2013-01-01

Purpose: To compare the ability of the auditory brainstem response to complex sounds (cABR) to predict subjective ratings of speech understanding in noise on the Speech, Spatial, and Qualities of Hearing Scale (SSQ; Gatehouse & Noble, 2004) relative to the predictive ability of the Quick Speech-in-Noise test (QuickSIN; Killion, Niquette,…
Real time implementation of a linear predictive coding algorithm on digital signal processor DSP32C

International Nuclear Information System (INIS)

Sheikh, N.M.; Usman, S.R.; Fatima, S.

2002-01-01

Pulse Code Modulation (PCM) has been widely used in speech coding. However, due to its high bit rate. PCM has severe limitations in application where high spectral efficiency is desired, for example, in mobile communication, CD quality broadcasting system etc. These limitation have motivated research in bit rate reduction techniques. Linear predictive coding (LPC) is one of the most powerful complex techniques for bit rate reduction. With the introduction of powerful digital signal processors (DSP) it is possible to implement the complex LPC algorithm in real time. In this paper we present a real time implementation of the LPC algorithm on AT and T's DSP32C at a sampling frequency of 8192 HZ. Application of the LPC algorithm on two speech signals is discussed. Using this implementation , a bit rate reduction of 1:3 is achieved for better than tool quality speech, while a reduction of 1.16 is possible for speech quality required in military applications. (author)
Hidden Hearing Loss and Computational Models of the Auditory Pathway: Predicting Speech Intelligibility Decline

Science.gov (United States)

2016-11-28

Title: Hidden Hearing Loss and Computational Models of the Auditory Pathway: Predicting Speech Intelligibility Decline Christopher J. Smalt...representation of speech intelligibility in noise. The auditory-periphery model of Zilany et al. (JASA 2009,2014) is used to make predictions of...auditory nerve (AN) responses to speech stimuli under a variety of difficult listening conditions. The resulting cochlear neurogram, a spectrogram
Preschool Speech Error Patterns Predict Articulation and Phonological Awareness Outcomes in Children with Histories of Speech Sound Disorders

Science.gov (United States)

Preston, Jonathan L.; Hull, Margaret; Edwards, Mary Louise

2013-01-01

Purpose: To determine if speech error patterns in preschoolers with speech sound disorders (SSDs) predict articulation and phonological awareness (PA) outcomes almost 4 years later. Method: Twenty-five children with histories of preschool SSDs (and normal receptive language) were tested at an average age of 4;6 (years;months) and were followed up…
The Cortical Organization of Speech Processing: Feedback Control and Predictive Coding the Context of a Dual-Stream Model

Science.gov (United States)

Hickok, Gregory

2012-01-01

Speech recognition is an active process that involves some form of predictive coding. This statement is relatively uncontroversial. What is less clear is the source of the prediction. The dual-stream model of speech processing suggests that there are two possible sources of predictive coding in speech perception: the motor speech system and the…
Cortical activity patterns predict robust speech discrimination ability in noise

Science.gov (United States)

Shetake, Jai A.; Wolf, Jordan T.; Cheung, Ryan J.; Engineer, Crystal T.; Ram, Satyananda K.; Kilgard, Michael P.

2012-01-01

The neural mechanisms that support speech discrimination in noisy conditions are poorly understood. In quiet conditions, spike timing information appears to be used in the discrimination of speech sounds. In this study, we evaluated the hypothesis that spike timing is also used to distinguish between speech sounds in noisy conditions that significantly degrade neural responses to speech sounds. We tested speech sound discrimination in rats and recorded primary auditory cortex (A1) responses to speech sounds in background noise of different intensities and spectral compositions. Our behavioral results indicate that rats, like humans, are able to accurately discriminate consonant sounds even in the presence of background noise that is as loud as the speech signal. Our neural recordings confirm that speech sounds evoke degraded but detectable responses in noise. Finally, we developed a novel neural classifier that mimics behavioral discrimination. The classifier discriminates between speech sounds by comparing the A1 spatiotemporal activity patterns evoked on single trials with the average spatiotemporal patterns evoked by known sounds. Unlike classifiers in most previous studies, this classifier is not provided with the stimulus onset time. Neural activity analyzed with the use of relative spike timing was well correlated with behavioral speech discrimination in quiet and in noise. Spike timing information integrated over longer intervals was required to accurately predict rat behavioral speech discrimination in noisy conditions. The similarity of neural and behavioral discrimination of speech in noise suggests that humans and rats may employ similar brain mechanisms to solve this problem. PMID:22098331
Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing

DEFF Research Database (Denmark)

Jørgensen, Søren; Dau, Torsten

2011-01-01

A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The ...... process provides a key measure of speech intelligibility. © 2011 Acoustical Society of America.......A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data....... The model estimates the speech-to-noise envelope power ratio, SNR env, at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech...

Comparison of Linear Prediction Models for Audio Signals

Directory of Open Access Journals (Sweden)

2009-03-01

Full Text Available While linear prediction (LP has become immensely popular in speech modeling, it does not seem to provide a good approach for modeling audio signals. This is somewhat surprising, since a tonal signal consisting of a number of sinusoids can be perfectly predicted based on an (all-pole LP model with a model order that is twice the number of sinusoids. We provide an explanation why this result cannot simply be extrapolated to LP of audio signals. If noise is taken into account in the tonal signal model, a low-order all-pole model appears to be only appropriate when the tonal components are uniformly distributed in the Nyquist interval. Based on this observation, different alternatives to the conventional LP model can be suggested. Either the model should be changed to a pole-zero, a high-order all-pole, or a pitch prediction model, or the conventional LP model should be preceded by an appropriate frequency transform, such as a frequency warping or downsampling. By comparing these alternative LP models to the conventional LP model in terms of frequency estimation accuracy, residual spectral flatness, and perceptual frequency resolution, we obtain several new and promising approaches to LP-based audio modeling.
Robust digital processing of speech signals

CERN Document Server

Kovacevic, Branko; Veinović, Mladen; Marković, Milan

2017-01-01

This book focuses on speech signal phenomena, presenting a robustification of the usual speech generation models with regard to the presumed types of excitation signals, which is equivalent to the introduction of a class of nonlinear models and the corresponding criterion functions for parameter estimation. Compared to the general class of nonlinear models, such as various neural networks, these models possess good properties of controlled complexity, the option of working in “online” mode, as well as a low information volume for efficient speech encoding and transmission. Providing comprehensive insights, the book is based on the authors’ research, which has already been published, supplemented by additional texts discussing general considerations of speech modeling, linear predictive analysis and robust parameter estimation.
A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments

Directory of Open Access Journals (Sweden)

Jing Mi

2016-09-01

Full Text Available Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model.
A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments.

Science.gov (United States)

Mi, Jing; Colburn, H Steven

2016-10-03

Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model. © The Author(s) 2016.
Adaptive plasticity in speech perception: Effects of external information and internal predictions.

Science.gov (United States)

Guediche, Sara; Fiez, Julie A; Holt, Lori L

2016-07-01

When listeners encounter speech under adverse listening conditions, adaptive adjustments in perception can improve comprehension over time. In some cases, these adaptive changes require the presence of external information that disambiguates the distorted speech signals, whereas in other cases mere exposure is sufficient. Both external (e.g., written feedback) and internal (e.g., prior word knowledge) sources of information can be used to generate predictions about the correct mapping of a distorted speech signal. We hypothesize that these predictions provide a basis for determining the discrepancy between the expected and actual speech signal that can be used to guide adaptive changes in perception. This study provides the first empirical investigation that manipulates external and internal factors through (a) the availability of explicit external disambiguating information via the presence or absence of postresponse orthographic information paired with a repetition of the degraded stimulus, and (b) the accuracy of internally generated predictions; an acoustic distortion is introduced either abruptly or incrementally. The results demonstrate that the impact of external information on adaptive plasticity is contingent upon whether the intelligibility of the stimuli permits accurate internally generated predictions during exposure. External information sources enhance adaptive plasticity only when input signals are severely degraded and cannot reliably access internal predictions. This is consistent with a computational framework for adaptive plasticity in which error-driven supervised learning relies on the ability to compute sensory prediction error signals from both internal and external sources of information. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Linear Regression on Sparse Features for Single-Channel Speech Separation

DEFF Research Database (Denmark)

Schmidt, Mikkel N.; Olsson, Rasmus Kongsgaard

2007-01-01

In this work we address the problem of separating multiple speakers from a single microphone recording. We formulate a linear regression model for estimating each speaker based on features derived from the mixture. The employed feature representation is a sparse, non-negative encoding of the speech...... mixture in terms of pre-learned speaker-dependent dictionaries. Previous work has shown that this feature representation by itself provides some degree of separation. We show that the performance is significantly improved when regression analysis is performed on the sparse, non-negative features, both...
Temporal predictive mechanisms modulate motor reaction time during initiation and inhibition of speech and hand movement.

Science.gov (United States)

Johari, Karim; Behroozmand, Roozbeh

2017-08-01

Skilled movement is mediated by motor commands executed with extremely fine temporal precision. The question of how the brain incorporates temporal information to perform motor actions has remained unanswered. This study investigated the effect of stimulus temporal predictability on response timing of speech and hand movement. Subjects performed a randomized vowel vocalization or button press task in two counterbalanced blocks in response to temporally-predictable and unpredictable visual cues. Results indicated that speech and hand reaction time was decreased for predictable compared with unpredictable stimuli. This finding suggests that a temporal predictive code is established to capture temporal dynamics of sensory cues in order to produce faster movements in responses to predictable stimuli. In addition, results revealed a main effect of modality, indicating faster hand movement compared with speech. We suggest that this effect is accounted for by the inherent complexity of speech production compared with hand movement. Lastly, we found that movement inhibition was faster than initiation for both hand and speech, suggesting that movement initiation requires a longer processing time to coordinate activities across multiple regions in the brain. These findings provide new insights into the mechanisms of temporal information processing during initiation and inhibition of speech and hand movement. Copyright © 2017 Elsevier B.V. All rights reserved.
A predictive model for diagnosing stroke-related apraxia of speech.

Science.gov (United States)

Ballard, Kirrie J; Azizi, Lamiae; Duffy, Joseph R; McNeil, Malcolm R; Halaki, Mark; O'Dwyer, Nicholas; Layfield, Claire; Scholl, Dominique I; Vogel, Adam P; Robin, Donald A

2016-01-29

Diagnosis of the speech motor planning/programming disorder, apraxia of speech (AOS), has proven challenging, largely due to its common co-occurrence with the language-based impairment of aphasia. Currently, diagnosis is based on perceptually identifying and rating the severity of several speech features. It is not known whether all, or a subset of the features, are required for a positive diagnosis. The purpose of this study was to assess predictor variables for the presence of AOS after left-hemisphere stroke, with the goal of increasing diagnostic objectivity and efficiency. This population-based case-control study involved a sample of 72 cases, using the outcome measure of expert judgment on presence of AOS and including a large number of independently collected candidate predictors representing behavioral measures of linguistic, cognitive, nonspeech oral motor, and speech motor ability. We constructed a predictive model using multiple imputation to deal with missing data; the Least Absolute Shrinkage and Selection Operator (Lasso) technique for variable selection to define the most relevant predictors, and bootstrapping to check the model stability and quantify the optimism of the developed model. Two measures were sufficient to distinguish between participants with AOS plus aphasia and those with aphasia alone, (1) a measure of speech errors with words of increasing length and (2) a measure of relative vowel duration in three-syllable words with weak-strong stress pattern (e.g., banana, potato). The model has high discriminative ability to distinguish between cases with and without AOS (c-index=0.93) and good agreement between observed and predicted probabilities (calibration slope=0.94). Some caution is warranted, given the relatively small sample specific to left-hemisphere stroke, and the limitations of imputing missing data. These two speech measures are straightforward to collect and analyse, facilitating use in research and clinical settings. Copyright
The Theory of Linear Prediction

CERN Document Server

Vaidyanathan, PP

2007-01-01

Linear prediction theory has had a profound impact in the field of digital signal processing. Although the theory dates back to the early 1940s, its influence can still be seen in applications today. The theory is based on very elegant mathematics and leads to many beautiful insights into statistical signal processing. Although prediction is only a part of the more general topics of linear estimation, filtering, and smoothing, this book focuses on linear prediction. This has enabled detailed discussion of a number of issues that are normally not found in texts. For example, the theory of vecto
Clinical and MRI models predicting amyloid deposition in progressive aphasia and apraxia of speech.

Science.gov (United States)

Whitwell, Jennifer L; Weigand, Stephen D; Duffy, Joseph R; Strand, Edythe A; Machulda, Mary M; Senjem, Matthew L; Gunter, Jeffrey L; Lowe, Val J; Jack, Clifford R; Josephs, Keith A

2016-01-01

Beta-amyloid (Aβ) deposition can be observed in primary progressive aphasia (PPA) and progressive apraxia of speech (PAOS). While it is typically associated with logopenic PPA, there are exceptions that make predicting Aβ status challenging based on clinical diagnosis alone. We aimed to determine whether MRI regional volumes or clinical data could help predict Aβ deposition. One hundred and thirty-nine PPA (n = 97; 15 agrammatic, 53 logopenic, 13 semantic and 16 unclassified) and PAOS (n = 42) subjects were prospectively recruited into a cross-sectional study and underwent speech/language assessments, 3.0 T MRI and C11-Pittsburgh Compound B PET. The presence of Aβ was determined using a 1.5 SUVR cut-point. Atlas-based parcellation was used to calculate gray matter volumes of 42 regions-of-interest across the brain. Penalized binary logistic regression was utilized to determine what combination of MRI regions, and what combination of speech and language tests, best predicts Aβ (+) status. The optimal MRI model and optimal clinical model both performed comparably in their ability to accurately classify subjects according to Aβ status. MRI accurately classified 81% of subjects using 14 regions. Small left superior temporal and inferior parietal volumes and large left Broca's area volumes were particularly predictive of Aβ (+) status. Clinical scores accurately classified 83% of subjects using 12 tests. Phonological errors and repetition deficits, and absence of agrammatism and motor speech deficits were particularly predictive of Aβ (+) status. In comparison, clinical diagnosis was able to accurately classify 89% of subjects. However, the MRI model performed well in predicting Aβ deposition in unclassified PPA. Clinical diagnosis provides optimum prediction of Aβ status at the group level, although regional MRI measurements and speech and language testing also performed well and could have advantages in predicting Aβ status in unclassified PPA subjects.
Clinical and MRI models predicting amyloid deposition in progressive aphasia and apraxia of speech

Directory of Open Access Journals (Sweden)

Jennifer L. Whitwell

2016-01-01

Full Text Available Beta-amyloid (Aβ deposition can be observed in primary progressive aphasia (PPA and progressive apraxia of speech (PAOS. While it is typically associated with logopenic PPA, there are exceptions that make predicting Aβ status challenging based on clinical diagnosis alone. We aimed to determine whether MRI regional volumes or clinical data could help predict Aβ deposition. One hundred and thirty-nine PPA (n = 97; 15 agrammatic, 53 logopenic, 13 semantic and 16 unclassified and PAOS (n = 42 subjects were prospectively recruited into a cross-sectional study and underwent speech/language assessments, 3.0 T MRI and C11-Pittsburgh Compound B PET. The presence of Aβ was determined using a 1.5 SUVR cut-point. Atlas-based parcellation was used to calculate gray matter volumes of 42 regions-of-interest across the brain. Penalized binary logistic regression was utilized to determine what combination of MRI regions, and what combination of speech and language tests, best predicts Aβ (+ status. The optimal MRI model and optimal clinical model both performed comparably in their ability to accurately classify subjects according to Aβ status. MRI accurately classified 81% of subjects using 14 regions. Small left superior temporal and inferior parietal volumes and large left Broca's area volumes were particularly predictive of Aβ (+ status. Clinical scores accurately classified 83% of subjects using 12 tests. Phonological errors and repetition deficits, and absence of agrammatism and motor speech deficits were particularly predictive of Aβ (+ status. In comparison, clinical diagnosis was able to accurately classify 89% of subjects. However, the MRI model performed well in predicting Aβ deposition in unclassified PPA. Clinical diagnosis provides optimum prediction of Aβ status at the group level, although regional MRI measurements and speech and language testing also performed well and could have advantages in predicting Aβ status in unclassified
Prediction of speech intelligibility based on a correlation metric in the envelope power spectrum domain

DEFF Research Database (Denmark)

Relano-Iborra, Helia; May, Tobias; Zaar, Johannes

A powerful tool to investigate speech perception is the use of speech intelligibility prediction models. Recently, a model was presented, termed correlation-based speechbased envelope power spectrum model (sEPSMcorr) [1], based on the auditory processing of the multi-resolution speech-based Envel...
Prediction of Audience Response from Spoken Sequences, Speech Pauses and Co-speech Gestures in Humorous Discourse by Barack Obama

DEFF Research Database (Denmark)

Navarretta, Costanza

2017-01-01

president mocks himself, his collaborators, political adversary and the press corps making the audience react with cheers, laughter and/or applause. The results of the prediction experiment demonstrate that information about spoken sequences, pauses and co-speech gestures by Obama can be used to predict...
The role of auditory spectro-temporal modulation filtering and the decision metric for speech intelligibility prediction

DEFF Research Database (Denmark)

Chabot-Leclerc, Alexandre; Jørgensen, Søren; Dau, Torsten

2014-01-01

Speech intelligibility models typically consist of a preprocessing part that transforms stimuli into some internal (auditory) representation and a decision metric that relates the internal representation to speech intelligibility. The present study analyzed the role of modulation filtering...... in the preprocessing of different speech intelligibility models by comparing predictions from models that either assume a spectro-temporal (i.e., two-dimensional) or a temporal-only (i.e., one-dimensional) modulation filterbank. Furthermore, the role of the decision metric for speech intelligibility was investigated...... subtraction. The results suggested that a decision metric based on the SNRenv may provide a more general basis for predicting speech intelligibility than a metric based on the MTF. Moreover, the one-dimensional modulation filtering process was found to be sufficient to account for the data when combined...
High-Order Sparse Linear Predictors for Audio Processing

DEFF Research Database (Denmark)

Giacobello, Daniele; van Waterschoot, Toon; Christensen, Mads Græsbøll

2010-01-01

Linear prediction has generally failed to make a breakthrough in audio processing, as it has done in speech processing. This is mostly due to its poor modeling performance, since an audio signal is usually an ensemble of different sources. Nevertheless, linear prediction comes with a whole set...... of interesting features that make the idea of using it in audio processing not far fetched, e.g., the strong ability of modeling the spectral peaks that play a dominant role in perception. In this paper, we provide some preliminary conjectures and experiments on the use of high-order sparse linear predictors...... in audio processing. These predictors, successfully implemented in modeling the short-term and long-term redundancies present in speech signals, will be used to model tonal audio signals, both monophonic and polyphonic. We will show how the sparse predictors are able to model efﬁciently the different...
Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

Science.gov (United States)

Larm, Petra; Hongisto, Valtteri

2006-02-01

During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
Predicting masking release of lateralized speech

DEFF Research Database (Denmark)

Chabot-Leclerc, Alexandre; MacDonald, Ewen; Dau, Torsten

2016-01-01

. The largest masking release (MR) was observed when all maskers were on the opposite side of the target. The data in the conditions containing only energetic masking and modulation masking could be accounted for using a binaural extension of the speech-based envelope power spectrum model [sEPSM; Jørgensen et...... al., 2013, J. Acoust. Soc. Am. 130], which uses a short-term equalization-cancellation process to model binaural unmasking. In the conditions where informational masking (IM) was involved, the predicted SRTs were lower than the measured values because the model is blind to confusions experienced...
Particle swarm optimization based feature enhancement and feature selection for improved emotion recognition in speech and glottal signals.

Science.gov (United States)

Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

2015-01-01

In the recent years, many research works have been published using speech related features for speech emotion recognition, however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstralcoefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and its glottal waveforms(GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature.
Damage to the anterior arcuate fasciculus predicts non-fluent speech production in aphasia.

Science.gov (United States)

Fridriksson, Julius; Guo, Dazhou; Fillmore, Paul; Holland, Audrey; Rorden, Chris

2013-11-01

Non-fluent aphasia implies a relatively straightforward neurological condition characterized by limited speech output. However, it is an umbrella term for different underlying impairments affecting speech production. Several studies have sought the critical lesion location that gives rise to non-fluent aphasia. The results have been mixed but typically implicate anterior cortical regions such as Broca's area, the left anterior insula, and deep white matter regions. To provide a clearer picture of cortical damage in non-fluent aphasia, the current study examined brain damage that negatively influences speech fluency in patients with aphasia. It controlled for some basic speech and language comprehension factors in order to better isolate the contribution of different mechanisms to fluency, or its lack. Cortical damage was related to overall speech fluency, as estimated by clinical judgements using the Western Aphasia Battery speech fluency scale, diadochokinetic rate, rudimentary auditory language comprehension, and executive functioning (scores on a matrix reasoning test) in 64 patients with chronic left hemisphere stroke. A region of interest analysis that included brain regions typically implicated in speech and language processing revealed that non-fluency in aphasia is primarily predicted by damage to the anterior segment of the left arcuate fasciculus. An improved prediction model also included the left uncinate fasciculus, a white matter tract connecting the middle and anterior temporal lobe with frontal lobe regions, including the pars triangularis. Models that controlled for diadochokinetic rate, picture-word recognition, or executive functioning also revealed a strong relationship between anterior segment involvement and speech fluency. Whole brain analyses corroborated the findings from the region of interest analyses. An additional exploratory analysis revealed that involvement of the uncinate fasciculus adjudicated between Broca's and global aphasia
Evaluation of Short-Term Cepstral Based Features for Detection of Parkinson’s Disease Severity Levels through Speech signals

Science.gov (United States)

Oung, Qi Wei; Nisha Basah, Shafriza; Muthusamy, Hariharan; Vijean, Vikneswaran; Lee, Hoileong

2018-03-01

Parkinson’s disease (PD) is one type of progressive neurodegenerative disease known as motor system syndrome, which is due to the death of dopamine-generating cells, a region of the human midbrain. PD normally affects people over 60 years of age, which at present has influenced a huge part of worldwide population. Lately, many researches have shown interest into the connection between PD and speech disorders. Researches have revealed that speech signals may be a suitable biomarker for distinguishing between people with Parkinson’s (PWP) from healthy subjects. Therefore, early diagnosis of PD through the speech signals can be considered for this aim. In this research, the speech data are acquired based on speech behaviour as the biomarker for differentiating PD severity levels (mild and moderate) from healthy subjects. Feature extraction algorithms applied are Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), and Weighted Linear Prediction Cepstral Coefficients (WLPCC). For classification, two types of classifiers are used: k-Nearest Neighbour (KNN) and Probabilistic Neural Network (PNN). The experimental results demonstrated that PNN classifier and KNN classifier achieve the best average classification performance of 92.63% and 88.56% respectively through 10-fold cross-validation measures. Favourably, the suggested techniques have the possibilities of becoming a new choice of promising tools for the PD detection with tremendous performance.

Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain

DEFF Research Database (Denmark)

Relaño-Iborra, Helia; May, Tobias; Zaar, Johannes

2016-01-01

A speech intelligibility prediction model is proposed that combines the auditory processing front end of the multi-resolution speech-based envelope power spectrum model [mr-sEPSM; Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134(1), 436–446] with a correlation back end inspired by the sh...
Model Predictive Control for Linear Complementarity and Extended Linear Complementarity Systems

Directory of Open Access Journals (Sweden)

Bambang Riyanto

2005-11-01

Full Text Available In this paper, we propose model predictive control method for linear complementarity and extended linear complementarity systems by formulating optimization along prediction horizon as mixed integer quadratic program. Such systems contain interaction between continuous dynamics and discrete event systems, and therefore, can be categorized as hybrid systems. As linear complementarity and extended linear complementarity systems finds applications in different research areas, such as impact mechanical systems, traffic control and process control, this work will contribute to the development of control design method for those areas as well, as shown by three given examples.
Microscopic prediction of speech recognition for listeners with normal hearing in noise using an auditory model.

Science.gov (United States)

Jürgens, Tim; Brand, Thomas

2009-11-01

This study compares the phoneme recognition performance in speech-shaped noise of a microscopic model for speech recognition with the performance of normal-hearing listeners. "Microscopic" is defined in terms of this model twofold. First, the speech recognition rate is predicted on a phoneme-by-phoneme basis. Second, microscopic modeling means that the signal waveforms to be recognized are processed by mimicking elementary parts of human's auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703-1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing and a simple dynamic-time-warp speech recognizer. The model is evaluated while presenting nonsense speech in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed. The influence of different perceptual distance measures and of the model's a-priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training of the recognizer and testing. The best model performance is yielded by distance measures which focus mainly on small perceptual distances and neglect outliers.
Digitised evaluation of speech intelligibility using vowels in maxillectomy patients.

Science.gov (United States)

Sumita, Y I; Hattori, M; Murase, M; Elbashti, M E; Taniguchi, H

2018-03-01

Among the functional disabilities that patients face following maxillectomy, speech impairment is a major factor influencing quality of life. Proper rehabilitation of speech, which may include prosthodontic and surgical treatments and speech therapy, requires accurate evaluation of speech intelligibility (SI). A simple, less time-consuming yet accurate evaluation is desirable both for maxillectomy patients and the various clinicians providing maxillofacial treatment. This study sought to determine the utility of digital acoustic analysis of vowels for the prediction of SI in maxillectomy patients, based on a comprehensive understanding of speech production in the vocal tract of maxillectomy patients and its perception. Speech samples were collected from 33 male maxillectomy patients (mean age 57.4 years) in two conditions, without and with a maxillofacial prosthesis, and formant data for the vowels /a/,/e/,/i/,/o/, and /u/ were calculated based on linear predictive coding. The frequency range of formant 2 (F2) was determined by differences between the minimum and maximum frequency. An SI test was also conducted to reveal the relationship between SI score and F2 range. Statistical analyses were applied. F2 range and SI score were significantly different between the two conditions without and with a prosthesis (both P maxillectomy. © 2017 John Wiley & Sons Ltd.
Variable Span Filters for Speech Enhancement

DEFF Research Database (Denmark)

Jensen, Jesper Rindom; Benesty, Jacob; Christensen, Mads Græsbøll

2016-01-01

In this work, we consider enhancement of multichannel speech recordings. Linear filtering and subspace approaches have been considered previously for solving the problem. The current linear filtering methods, although many variants exist, have limited control of noise reduction and speech...
Prediction of speech intelligibility based on an auditory preprocessing model

DEFF Research Database (Denmark)

Christiansen, Claus Forup Corlin; Pedersen, Michael Syskind; Dau, Torsten

2010-01-01

in noise experiment was used for training and an ideal binary mask experiment was used for evaluation. All three models were able to capture the trends in the speech in noise training data well, but the proposed model provides a better prediction of the binary mask test data, particularly when the binary...... masks degenerate to a noise vocoder....
Predicting automatic speech recognition performance over communication channels from instrumental speech quality and intelligibility scores

NARCIS (Netherlands)

Gallardo, L.F.; Möller, S.; Beerends, J.

2017-01-01

The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility
Estimation of Frame Independent and Enhancement Components for Speech Communication over Packet Networks

DEFF Research Database (Denmark)

Giacobello, Daniele; Murthi, Manohar N.; Christensen, Mads Græsbøll

2010-01-01

In this paper, we describe a new approach to cope with packet loss in speech coders. The idea is to split the information present in each speech packet into two components, one to independently decode the given speech frame and one to enhance it by exploiting interframe dependencies. The scheme...... is based on sparse linear prediction and a redeﬁnition of the analysis-by-synthesis process. We present Mean Opinion Scores for the presented coder with different degrees of packet loss and show that it performs similarly to frame dependent coders for low packet loss probability and similarly to frame...
Linear zonal atmospheric prediction for adaptive optics

Science.gov (United States)

McGuire, Patrick C.; Rhoadarmer, Troy A.; Coy, Hanna A.; Angel, J. Roger P.; Lloyd-Hart, Michael

2000-07-01

We compare linear zonal predictors of atmospheric turbulence for adaptive optics. Zonal prediction has the possible advantage of being able to interpret and utilize wind-velocity information from the wavefront sensor better than modal prediction. For simulated open-loop atmospheric data for a 2- meter 16-subaperture AO telescope with 5 millisecond prediction and a lookback of 4 slope-vectors, we find that Widrow-Hoff Delta-Rule training of linear nets and Back- Propagation training of non-linear multilayer neural networks is quite slow, getting stuck on plateaus or in local minima. Recursive Least Squares training of linear predictors is two orders of magnitude faster and it also converges to the solution with global minimum error. We have successfully implemented Amari's Adaptive Natural Gradient Learning (ANGL) technique for a linear zonal predictor, which premultiplies the Delta-Rule gradients with a matrix that orthogonalizes the parameter space and speeds up the training by two orders of magnitude, like the Recursive Least Squares predictor. This shows that the simple Widrow-Hoff Delta-Rule's slow convergence is not a fluke. In the case of bright guidestars, the ANGL, RLS, and standard matrix-inversion least-squares (MILS) algorithms all converge to the same global minimum linear total phase error (approximately 0.18 rad2), which is only approximately 5% higher than the spatial phase error (approximately 0.17 rad2), and is approximately 33% lower than the total 'naive' phase error without prediction (approximately 0.27 rad2). ANGL can, in principle, also be extended to make non-linear neural network training feasible for these large networks, with the potential to lower the predictor error below the linear predictor error. We will soon scale our linear work to the approximately 108-subaperture MMT AO system, both with simulations and real wavefront sensor data from prime focus.
Infant and Toddler Oral- and Manual-Motor Skills Predict Later Speech Fluency in Autism

Science.gov (United States)

Gernsbacher, Morton Ann; Sauer, Eve A.; Geye, Heather M.; Schweigert, Emily K.; Goldsmith, H. Hill

2008-01-01

Background: Spoken and gestural communication proficiency varies greatly among autistic individuals. Three studies examined the role of oral- and manual-motor skill in predicting autistic children's speech development. Methods: Study 1 investigated whether infant and toddler oral- and manual-motor skills predict middle childhood and teenage speech…
Magnified Neural Envelope Coding Predicts Deficits in Speech Perception in Noise.

Science.gov (United States)

Millman, Rebecca E; Mattys, Sven L; Gouws, André D; Prendergast, Garreth

2017-08-09

Verbal communication in noisy backgrounds is challenging. Understanding speech in background noise that fluctuates in intensity over time is particularly difficult for hearing-impaired listeners with a sensorineural hearing loss (SNHL). The reduction in fast-acting cochlear compression associated with SNHL exaggerates the perceived fluctuations in intensity in amplitude-modulated sounds. SNHL-induced changes in the coding of amplitude-modulated sounds may have a detrimental effect on the ability of SNHL listeners to understand speech in the presence of modulated background noise. To date, direct evidence for a link between magnified envelope coding and deficits in speech identification in modulated noise has been absent. Here, magnetoencephalography was used to quantify the effects of SNHL on phase locking to the temporal envelope of modulated noise (envelope coding) in human auditory cortex. Our results show that SNHL enhances the amplitude of envelope coding in posteromedial auditory cortex, whereas it enhances the fidelity of envelope coding in posteromedial and posterolateral auditory cortex. This dissociation was more evident in the right hemisphere, demonstrating functional lateralization in enhanced envelope coding in SNHL listeners. However, enhanced envelope coding was not perceptually beneficial. Our results also show that both hearing thresholds and, to a lesser extent, magnified cortical envelope coding in left posteromedial auditory cortex predict speech identification in modulated background noise. We propose a framework in which magnified envelope coding in posteromedial auditory cortex disrupts the segregation of speech from background noise, leading to deficits in speech perception in modulated background noise. SIGNIFICANCE STATEMENT People with hearing loss struggle to follow conversations in noisy environments. Background noise that fluctuates in intensity over time poses a particular challenge. Using magnetoencephalography, we demonstrate
INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

Directory of Open Access Journals (Sweden)

J. SANGEETHA

2015-02-01

Full Text Available This paper provides an interface between the machine translation and speech synthesis system for converting English speech to Tamil text in English to Tamil speech to speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation and text to speech synthesis. Many procedures for incorporation of speech recognition and machine translation have been projected. Still speech synthesis system has not yet been measured. In this paper, we focus on integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of speech synthesis, machine translation and the integration of machine translation and speech synthesis components. Here we implement a hybrid machine translation (combination of rule based and statistical machine translation and concatenative syllable based speech synthesis technique. In order to retain the naturalness and intelligibility of synthesized speech Auto Associative Neural Network (AANN prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.
Analysis of a simplified normalized covariance measure based on binary weighting functions for predicting the intelligibility of noise-suppressed speech.

Science.gov (United States)

Chen, Fei; Loizou, Philipos C

2010-12-01

The normalized covariance measure (NCM) has been shown previously to predict reliably the intelligibility of noise-suppressed speech containing non-linear distortions. This study analyzes a simplified NCM measure that requires only a small number of bands (not necessarily contiguous) and uses simple binary (1 or 0) weighting functions. The rationale behind the use of a small number of bands is to account for the fact that the spectral information contained in contiguous or nearby bands is correlated and redundant. The modified NCM measure was evaluated with speech intelligibility scores obtained by normal-hearing listeners in 72 noisy conditions involving noise-suppressed speech corrupted by four different types of maskers (car, babble, train, and street interferences). High correlation (r = 0.8) was obtained with the modified NCM measure even when only one band was used. Further analysis revealed a masker-specific pattern of correlations when only one band was used, and bands with low correlation signified the corresponding envelopes that have been severely distorted by the noise-suppression algorithm and/or the masker. Correlation improved to r = 0.84 when only two disjoint bands (centered at 325 and 1874 Hz) were used. Even further improvements in correlation (r = 0.85) were obtained when three or four lower-frequency (<700 Hz) bands were selected.
Profiles of verbal working memory growth predict speech and language development in children with cochlear implants.

Science.gov (United States)

Kronenberger, William G; Pisoni, David B; Harris, Michael S; Hoen, Helena M; Xu, Huiping; Miyamoto, Richard T

2013-06-01

Verbal short-term memory (STM) and working memory (WM) skills predict speech and language outcomes in children with cochlear implants (CIs) even after conventional demographic, device, and medical factors are taken into account. However, prior research has focused on single end point outcomes as opposed to the longitudinal process of development of verbal STM/WM and speech-language skills. In this study, the authors investigated relations between profiles of verbal STM/WM development and speech-language development over time. Profiles of verbal STM/WM development were identified through the use of group-based trajectory analysis of repeated digit span measures over at least a 2-year time period in a sample of 66 children (ages 6-16 years) with CIs. Subjects also completed repeated assessments of speech and language skills during the same time period. Clusters representing different patterns of development of verbal STM (digit span forward scores) were related to the growth rate of vocabulary and language comprehension skills over time. Clusters representing different patterns of development of verbal WM (digit span backward scores) were related to the growth rate of vocabulary and spoken word recognition skills over time. Different patterns of development of verbal STM/WM capacity predict the dynamic process of development of speech and language skills in this clinical population.
Comparison of linear and non-linear models for predicting energy expenditure from raw accelerometer data.

Science.gov (United States)

Montoye, Alexander H K; Begum, Munni; Henning, Zachary; Pfeiffer, Karin A

2017-02-01

This study had three purposes, all related to evaluating energy expenditure (EE) prediction accuracy from body-worn accelerometers: (1) compare linear regression to linear mixed models, (2) compare linear models to artificial neural network models, and (3) compare accuracy of accelerometers placed on the hip, thigh, and wrists. Forty individuals performed 13 activities in a 90 min semi-structured, laboratory-based protocol. Participants wore accelerometers on the right hip, right thigh, and both wrists and a portable metabolic analyzer (EE criterion). Four EE prediction models were developed for each accelerometer: linear regression, linear mixed, and two ANN models. EE prediction accuracy was assessed using correlations, root mean square error (RMSE), and bias and was compared across models and accelerometers using repeated-measures analysis of variance. For all accelerometer placements, there were no significant differences for correlations or RMSE between linear regression and linear mixed models (correlations: r = 0.71-0.88, RMSE: 1.11-1.61 METs; p > 0.05). For the thigh-worn accelerometer, there were no differences in correlations or RMSE between linear and ANN models (ANN-correlations: r = 0.89, RMSE: 1.07-1.08 METs. Linear models-correlations: r = 0.88, RMSE: 1.10-1.11 METs; p > 0.05). Conversely, one ANN had higher correlations and lower RMSE than both linear models for the hip (ANN-correlation: r = 0.88, RMSE: 1.12 METs. Linear models-correlations: r = 0.86, RMSE: 1.18-1.19 METs; p linear models for the wrist-worn accelerometers (ANN-correlations: r = 0.82-0.84, RMSE: 1.26-1.32 METs. Linear models-correlations: r = 0.71-0.73, RMSE: 1.55-1.61 METs; p models offer a significant improvement in EE prediction accuracy over linear models. Conversely, linear models showed similar EE prediction accuracy to machine learning models for hip- and thigh
Modelling and Predicting Backstroke Start Performance Using Non-Linear and Linear Models.

Science.gov (United States)

de Jesus, Karla; Ayala, Helon V H; de Jesus, Kelly; Coelho, Leandro Dos S; Medeiros, Alexandre I A; Abraldes, José A; Vaz, Mário A P; Fernandes, Ricardo J; Vilas-Boas, João Paulo

2018-03-01

Our aim was to compare non-linear and linear mathematical model responses for backstroke start performance prediction. Ten swimmers randomly completed eight 15 m backstroke starts with feet over the wedge, four with hands on the highest horizontal and four on the vertical handgrip. Swimmers were videotaped using a dual media camera set-up, with the starts being performed over an instrumented block with four force plates. Artificial neural networks were applied to predict 5 m start time using kinematic and kinetic variables and to determine the accuracy of the mean absolute percentage error. Artificial neural networks predicted start time more robustly than the linear model with respect to changing training to the validation dataset for the vertical handgrip (3.95 ± 1.67 vs. 5.92 ± 3.27%). Artificial neural networks obtained a smaller mean absolute percentage error than the linear model in the horizontal (0.43 ± 0.19 vs. 0.98 ± 0.19%) and vertical handgrip (0.45 ± 0.19 vs. 1.38 ± 0.30%) using all input data. The best artificial neural network validation revealed a smaller mean absolute error than the linear model for the horizontal (0.007 vs. 0.04 s) and vertical handgrip (0.01 vs. 0.03 s). Artificial neural networks should be used for backstroke 5 m start time prediction due to the quite small differences among the elite level performances.
Impact of Different Active-Speech-Ratios on PESQ’s Predictions in Case of Independent and Dependent Losses (in Presence of Receiver-Side Comfort-Noise

Directory of Open Access Journals (Sweden)

P. Pocta

2010-04-01

Full Text Available This paper deals with the investigation of PESQ’s behavior under independent and dependent loss conditions from an Active-Speech-Ratio perspective in presence of receiver-side comfort-noise. This reference signal characteristic is defined very broadly by ITU-T Recommendation P.862.3. That is the reason to investigate an impact of this characteristic on speech quality prediction more in-depth. We assess the variability of PESQ’s predictions with respect to Active-Speech-Ratios and loss conditions, as well as their accuracy, by comparing the predictions with subjective assessments. Our results show that an increase in amount of speech in the reference signal (expressed by the Active-Speech-Ratio characteristic may result in an increase of the reference signal sensitivity to packet loss change. Interestingly, we have found two additional effects in this investigated case. The use of higher Active-Speech-Ratios may lead to negative shifting effect in MOS domain and also PESQ’s predictions accuracy declining. Predictions accuracy could be improved by higher packet losses.
Speech enhancement in the Karhunen-Loeve expansion domain

CERN Document Server

Benesty, Jacob

2011-01-01

This book is devoted to the study of the problem of speech enhancement whose objective is the recovery of a signal of interest (i.e., speech) from noisy observations. Typically, the recovery process is accomplished by passing the noisy observations through a linear filter (or a linear transformation). Since both the desired speech and undesired noise are filtered at the same time, the most critical issue of speech enhancement resides in how to design a proper optimal filter that can fully take advantage of the difference between the speech and noise statistics to mitigate the noise effect as m
EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.

Science.gov (United States)

Lian, Yao; Ge, Meng; Pan, Xian-Ming

2014-12-19

B-cell epitopes have been studied extensively due to their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task. In this work, based on the antigen's primary sequence information, a novel linear B-cell epitope prediction model was developed using the multiple linear regression (MLR). A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the performance of our model. To alleviate the problem caused by the noise of negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728. We have presented a reliable method for the identification of linear B cell epitope using antigen's primary sequence information. Moreover, a web server EPMLR has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ .
Non linear analyses of speech and prosody in Asperger's syndrome

DEFF Research Database (Denmark)

Fusaroli, Riccardo; Bang, Dan; Weed, Ethan

It is widely acknowledged that people on the ASD spectrum behave atypically in the way they modulate aspects of speech and voice, including pitch, fluency, and voice quality. ASD speech has been described at times as “odd”, “mechanical”, or “monotone”. However, it has proven difficult to quantify...... the results in a supervised machine-learning process to classify speech production as either belonging to the control or the AS group as well as to assess the severity of the disorder (as measured by Autism Spectrum Quotient), based solely on acoustic features....

An online re-linearization scheme suited for Model Predictive and Linear Quadratic Control

DEFF Research Database (Denmark)

Henriksen, Lars Christian; Poulsen, Niels Kjølstad

This technical note documents the equations for primal-dual interior-point quadratic programming problem solver used for MPC. The algorithm exploits the special structure of the MPC problem and is able to reduce the computational burden such that the computational burden scales with prediction...... horizon length in a linear way rather than cubic, which would be the case if the structure was not exploited. It is also shown how models used for design of model-based controllers, e.g. linear quadratic and model predictive, can be linearized both at equilibrium and non-equilibrium points, making...
Emotion Recognition of Speech Signals Based on Filter Methods

Directory of Open Access Journals (Sweden)

Narjes Yazdanian

2016-10-01

Full Text Available Speech is the basic mean of communication among human beings.With the increase of transaction between human and machine, necessity of automatic dialogue and removing human factor has been considered. The aim of this study was to determine a set of affective features the speech signal is based on emotions. In this study system was designs that include three mains sections, features extraction, features selection and classification. After extraction of useful features such as, mel frequency cepstral coefficient (MFCC, linear prediction cepstral coefficients (LPC, perceptive linear prediction coefficients (PLP, ferment frequency, zero crossing rate, cepstral coefficients and pitch frequency, Mean, Jitter, Shimmer, Energy, Minimum, Maximum, Amplitude, Standard Deviation, at a later stage with filter methods such as Pearson Correlation Coefficient, t-test, relief and information gain, we came up with a method to rank and select effective features in emotion recognition. Then Result, are given to the classification system as a subset of input. In this classification stage, multi support vector machine are used to classify seven type of emotion. According to the results, that method of relief, together with multi support vector machine, has the most classification accuracy with emotion recognition rate of 93.94%.
Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

Science.gov (United States)

Borrie, Stephanie A

2015-03-01

This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.
Predicting the effect of spectral subtraction on the speech recognition threshold based on the signal-to-noise ratio in the envelope domain

DEFF Research Database (Denmark)

Jørgensen, Søren; Dau, Torsten

2011-01-01

rarely been evaluated perceptually in terms of speech intelligibility. This study analyzed the effects of the spectral subtraction strategy proposed by Berouti at al. [ICASSP 4 (1979), 208-211] on the speech recognition threshold (SRT) obtained with sentences presented in stationary speech-shaped noise....... The SRT was measured in five normal-hearing listeners in six conditions of spectral subtraction. The results showed an increase of the SRT after processing, i.e. a decreased speech intelligibility, in contrast to what is predicted by the Speech Transmission Index (STI). Here, another approach is proposed......, denoted the speech-based envelope power spectrum model (sEPSM) which predicts the intelligibility based on the signal-to-noise ratio in the envelope domain. In contrast to the STI, the sEPSM is sensitive to the increased amount of the noise envelope power as a consequence of the spectral subtraction...
Hearing and seeing meaning in noise: Alpha, beta, and gamma oscillations predict gestural enhancement of degraded speech comprehension.

Science.gov (United States)

Drijvers, Linda; Özyürek, Asli; Jensen, Ole

2018-05-01

During face-to-face communication, listeners integrate speech with gestures. The semantic information conveyed by iconic gestures (e.g., a drinking gesture) can aid speech comprehension in adverse listening conditions. In this magnetoencephalography (MEG) study, we investigated the spatiotemporal neural oscillatory activity associated with gestural enhancement of degraded speech comprehension. Participants watched videos of an actress uttering clear or degraded speech, accompanied by a gesture or not and completed a cued-recall task after watching every video. When gestures semantically disambiguated degraded speech comprehension, an alpha and beta power suppression and a gamma power increase revealed engagement and active processing in the hand-area of the motor cortex, the extended language network (LIFG/pSTS/STG/MTG), medial temporal lobe, and occipital regions. These observed low- and high-frequency oscillatory modulations in these areas support general unification, integration and lexical access processes during online language comprehension, and simulation of and increased visual attention to manual gestures over time. All individual oscillatory power modulations associated with gestural enhancement of degraded speech comprehension predicted a listener's correct disambiguation of the degraded verb after watching the videos. Our results thus go beyond the previously proposed role of oscillatory dynamics in unimodal degraded speech comprehension and provide first evidence for the role of low- and high-frequency oscillations in predicting the integration of auditory and visual information at a semantic level. © 2018 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.
Objective Prediction of Hearing Aid Benefit Across Listener Groups Using Machine Learning: Speech Recognition Performance With Binaural Noise-Reduction Algorithms.

Science.gov (United States)

Schädler, Marc R; Warzybok, Anna; Kollmeier, Birger

2018-01-01

The simulation framework for auditory discrimination experiments (FADE) was adopted and validated to predict the individual speech-in-noise recognition performance of listeners with normal and impaired hearing with and without a given hearing-aid algorithm. FADE uses a simple automatic speech recognizer (ASR) to estimate the lowest achievable speech reception thresholds (SRTs) from simulated speech recognition experiments in an objective way, independent from any empirical reference data. Empirical data from the literature were used to evaluate the model in terms of predicted SRTs and benefits in SRT with the German matrix sentence recognition test when using eight single- and multichannel binaural noise-reduction algorithms. To allow individual predictions of SRTs in binaural conditions, the model was extended with a simple better ear approach and individualized by taking audiograms into account. In a realistic binaural cafeteria condition, FADE explained about 90% of the variance of the empirical SRTs for a group of normal-hearing listeners and predicted the corresponding benefits with a root-mean-square prediction error of 0.6 dB. This highlights the potential of the approach for the objective assessment of benefits in SRT without prior knowledge about the empirical data. The predictions for the group of listeners with impaired hearing explained 75% of the empirical variance, while the individual predictions explained less than 25%. Possibly, additional individual factors should be considered for more accurate predictions with impaired hearing. A competing talker condition clearly showed one limitation of current ASR technology, as the empirical performance with SRTs lower than -20 dB could not be predicted.
Predicting fatigue and psychophysiological test performance from speech for safety critical environments

Directory of Open Access Journals (Sweden)

Khan Richard Baykaner

2015-08-01

Full Text Available Automatic systems for estimating operator fatigue have application in safety-critical environments. A system which could estimate level of fatigue from speech would have application in domains where operators engage in regular verbal communication as part of their duties. Previous studies on the prediction of fatigue from speech have been limited because of their reliance on subjective ratings and because they lack comparison to other methods for assessing fatigue. In this paper we present an analysis of voice recordings and psychophysiological test scores collected from seven aerospace personnel during a training task in which they remained awake for 60 hours. We show that voice features and test scores are affected by both the total time spent awake and the time position within each subject’s circadian cycle. However, we show that time spent awake and time of day information are poor predictors of the test results; while voice features can give good predictions of the psychophysiological test scores and sleep latency. Mean absolute errors of prediction are possible within about 17.5% for sleep latency and 5-12% for test scores. We discuss the implications for the use of voice as a means to monitor the effects of fatigue on cognitive performance in practical applications.
The Use of Linear Programming for Prediction.

Science.gov (United States)

Schnittjer, Carl J.

The purpose of the study was to develop a linear programming model to be used for prediction, test the accuracy of the predictions, and compare the accuracy with that produced by curvilinear multiple regression analysis. (Author)
A Block-Based Linear MMSE Noise Reduction with a High Temporal Resolution Modeling of the Speech Excitation

DEFF Research Database (Denmark)

Li, Chunjian; Andersen, S. V.

2005-01-01

A comprehensive linear minimum mean squared error (LMMSE) approach for parametric speech enhancement is developed. The proposed algorithms aim at joint LMMSE estimation of signal power spectra and phase spectra, as well as exploitation of correlation between spectral components. The major cause...... of this interfrequency correlation is shown to be the prominent temporal power localization in the excitation of voiced speech. LMMSE estimators in time domain and frequency domain are first formulated. To obtain the joint estimator, we model the spectral signal covariance matrix as a full covariancematrix instead...... of a diagonal covariance matrix as is the case in the Wiener filter derived under the quasi-stationarity assumption. To accomplish this, we decompose the signal covariance matrix into a synthesis filter matrix and an excitation matrix. The synthesis filter matrix is built from estimates of the all-pole model...
Predicting an Individual’s Gestures from the Interlocutor’s Co-occurring Gestures and Related Speech

DEFF Research Database (Denmark)

Navarretta, Costanza

2016-01-01

to the prediction of gestures of the same type of the other subject. In this work, we also want to determine whether the speech segments to which these gestures are related to contribute to the prediction. The results of our pilot experiments show that a Naive Bayes classifier trained on the duration and shape...
Prediction of consonant recognition in quiet for listeners with normal and impaired hearing using an auditory model.

Science.gov (United States)

Jürgens, Tim; Ewert, Stephan D; Kollmeier, Birger; Brand, Thomas

2014-03-01

Consonant recognition was assessed in normal-hearing (NH) and hearing-impaired (HI) listeners in quiet as a function of speech level using a nonsense logatome test. Average recognition scores were analyzed and compared to recognition scores of a speech recognition model. In contrast to commonly used spectral speech recognition models operating on long-term spectra, a "microscopic" model operating in the time domain was used. Variations of the model (accounting for hearing impairment) and different model parameters (reflecting cochlear compression) were tested. Using these model variations this study examined whether speech recognition performance in quiet is affected by changes in cochlear compression, namely, a linearization, which is often observed in HI listeners. Consonant recognition scores for HI listeners were poorer than for NH listeners. The model accurately predicted the speech reception thresholds of the NH and most HI listeners. A partial linearization of the cochlear compression in the auditory model, while keeping audibility constant, produced higher recognition scores and improved the prediction accuracy. However, including listener-specific information about the exact form of the cochlear compression did not improve the prediction further.
Predicting birth weight with conditionally linear transformation models.

Science.gov (United States)

Möst, Lisa; Schmid, Matthias; Faschingbauer, Florian; Hothorn, Torsten

2016-12-01

Low and high birth weight (BW) are important risk factors for neonatal morbidity and mortality. Gynecologists must therefore accurately predict BW before delivery. Most prediction formulas for BW are based on prenatal ultrasound measurements carried out within one week prior to birth. Although successfully used in clinical practice, these formulas focus on point predictions of BW but do not systematically quantify uncertainty of the predictions, i.e. they result in estimates of the conditional mean of BW but do not deliver prediction intervals. To overcome this problem, we introduce conditionally linear transformation models (CLTMs) to predict BW. Instead of focusing only on the conditional mean, CLTMs model the whole conditional distribution function of BW given prenatal ultrasound parameters. Consequently, the CLTM approach delivers both point predictions of BW and fetus-specific prediction intervals. Prediction intervals constitute an easy-to-interpret measure of prediction accuracy and allow identification of fetuses subject to high prediction uncertainty. Using a data set of 8712 deliveries at the Perinatal Centre at the University Clinic Erlangen (Germany), we analyzed variants of CLTMs and compared them to standard linear regression estimation techniques used in the past and to quantile regression approaches. The best-performing CLTM variant was competitive with quantile regression and linear regression approaches in terms of conditional coverage and average length of the prediction intervals. We propose that CLTMs be used because they are able to account for possible heteroscedasticity, kurtosis, and skewness of the distribution of BWs. © The Author(s) 2014.
Objective Prediction of Hearing Aid Benefit Across Listener Groups Using Machine Learning: Speech Recognition Performance With Binaural Noise-Reduction Algorithms

Science.gov (United States)

Schädler, Marc R.; Warzybok, Anna; Kollmeier, Birger

2018-01-01

The simulation framework for auditory discrimination experiments (FADE) was adopted and validated to predict the individual speech-in-noise recognition performance of listeners with normal and impaired hearing with and without a given hearing-aid algorithm. FADE uses a simple automatic speech recognizer (ASR) to estimate the lowest achievable speech reception thresholds (SRTs) from simulated speech recognition experiments in an objective way, independent from any empirical reference data. Empirical data from the literature were used to evaluate the model in terms of predicted SRTs and benefits in SRT with the German matrix sentence recognition test when using eight single- and multichannel binaural noise-reduction algorithms. To allow individual predictions of SRTs in binaural conditions, the model was extended with a simple better ear approach and individualized by taking audiograms into account. In a realistic binaural cafeteria condition, FADE explained about 90% of the variance of the empirical SRTs for a group of normal-hearing listeners and predicted the corresponding benefits with a root-mean-square prediction error of 0.6 dB. This highlights the potential of the approach for the objective assessment of benefits in SRT without prior knowledge about the empirical data. The predictions for the group of listeners with impaired hearing explained 75% of the empirical variance, while the individual predictions explained less than 25%. Possibly, additional individual factors should be considered for more accurate predictions with impaired hearing. A competing talker condition clearly showed one limitation of current ASR technology, as the empirical performance with SRTs lower than −20 dB could not be predicted. PMID:29692200
Patterns of poststroke brain damage that predict speech production errors in apraxia of speech and aphasia dissociate.

Science.gov (United States)

Basilakos, Alexandra; Rorden, Chris; Bonilha, Leonardo; Moser, Dana; Fridriksson, Julius

2015-06-01

Acquired apraxia of speech (AOS) is a motor speech disorder caused by brain damage. AOS often co-occurs with aphasia, a language disorder in which patients may also demonstrate speech production errors. The overlap of speech production deficits in both disorders has raised questions on whether AOS emerges from a unique pattern of brain damage or as a subelement of the aphasic syndrome. The purpose of this study was to determine whether speech production errors in AOS and aphasia are associated with distinctive patterns of brain injury. Forty-three patients with history of a single left-hemisphere stroke underwent comprehensive speech and language testing. The AOS Rating Scale was used to rate speech errors specific to AOS versus speech errors that can also be associated with both AOS and aphasia. Localized brain damage was identified using structural magnetic resonance imaging, and voxel-based lesion-impairment mapping was used to evaluate the relationship between speech errors specific to AOS, those that can occur in AOS or aphasia, and brain damage. The pattern of brain damage associated with AOS was most strongly associated with damage to cortical motor regions, with additional involvement of somatosensory areas. Speech production deficits that could be attributed to AOS or aphasia were associated with damage to the temporal lobe and the inferior precentral frontal regions. AOS likely occurs in conjunction with aphasia because of the proximity of the brain areas supporting speech and language, but the neurobiological substrate for each disorder differs. © 2015 American Heart Association, Inc.
Causal inference and temporal predictions in audiovisual perception of speech and music.

Science.gov (United States)

Noppeney, Uta; Lee, Hwee Ling

2018-03-31

To form a coherent percept of the environment, the brain must integrate sensory signals emanating from a common source but segregate those from different sources. Temporal regularities are prominent cues for multisensory integration, particularly for speech and music perception. In line with models of predictive coding, we suggest that the brain adapts an internal model to the statistical regularities in its environment. This internal model enables cross-sensory and sensorimotor temporal predictions as a mechanism to arbitrate between integration and segregation of signals from different senses. © 2018 New York Academy of Sciences.
Neural Generalized Predictive Control of a non-linear Process

DEFF Research Database (Denmark)

Sørensen, Paul Haase; Nørgård, Peter Magnus; Ravn, Ole

1998-01-01

The use of neural network in non-linear control is made difficult by the fact the stability and robustness is not guaranteed and that the implementation in real time is non-trivial. In this paper we introduce a predictive controller based on a neural network model which has promising stability qu...... detail and discuss the implementation difficulties. The neural generalized predictive controller is tested on a pneumatic servo sys-tem.......The use of neural network in non-linear control is made difficult by the fact the stability and robustness is not guaranteed and that the implementation in real time is non-trivial. In this paper we introduce a predictive controller based on a neural network model which has promising stability...... qualities. The controller is a non-linear version of the well-known generalized predictive controller developed in linear control theory. It involves minimization of a cost function which in the present case has to be done numerically. Therefore, we develop the numerical algorithms necessary in substantial...
Implementation of neural network based non-linear predictive

DEFF Research Database (Denmark)

Sørensen, Paul Haase; Nørgård, Peter Magnus; Ravn, Ole

1998-01-01

The paper describes a control method for non-linear systems based on generalized predictive control. Generalized predictive control (GPC) was developed to control linear systems including open loop unstable and non-minimum phase systems, but has also been proposed extended for the control of non......-linear systems. GPC is model-based and in this paper we propose the use of a neural network for the modeling of the system. Based on the neural network model a controller with extended control horizon is developed and the implementation issues are discussed, with particular emphasis on an efficient Quasi......-Newton optimization algorithm. The performance is demonstrated on a pneumatic servo system....
Patterns of Post-Stroke Brain Damage that Predict Speech Production Errors in Apraxia of Speech and Aphasia Dissociate

Science.gov (United States)

Basilakos, Alexandra; Rorden, Chris; Bonilha, Leonardo; Moser, Dana; Fridriksson, Julius

2015-01-01

Background and Purpose Acquired apraxia of speech (AOS) is a motor speech disorder caused by brain damage. AOS often co-occurs with aphasia, a language disorder in which patients may also demonstrate speech production errors. The overlap of speech production deficits in both disorders has raised questions regarding if AOS emerges from a unique pattern of brain damage or as a sub-element of the aphasic syndrome. The purpose of this study was to determine whether speech production errors in AOS and aphasia are associated with distinctive patterns of brain injury. Methods Forty-three patients with history of a single left-hemisphere stroke underwent comprehensive speech and language testing. The Apraxia of Speech Rating Scale was used to rate speech errors specific to AOS versus speech errors that can also be associated with AOS and/or aphasia. Localized brain damage was identified using structural MRI, and voxel-based lesion-impairment mapping was used to evaluate the relationship between speech errors specific to AOS, those that can occur in AOS and/or aphasia, and brain damage. Results The pattern of brain damage associated with AOS was most strongly associated with damage to cortical motor regions, with additional involvement of somatosensory areas. Speech production deficits that could be attributed to AOS and/or aphasia were associated with damage to the temporal lobe and the inferior pre-central frontal regions. Conclusion AOS likely occurs in conjunction with aphasia due to the proximity of the brain areas supporting speech and language, but the neurobiological substrate for each disorder differs. PMID:25908457
Non-linear aeroelastic prediction for aircraft applications

Science.gov (United States)

de C. Henshaw, M. J.; Badcock, K. J.; Vio, G. A.; Allen, C. B.; Chamberlain, J.; Kaynes, I.; Dimitriadis, G.; Cooper, J. E.; Woodgate, M. A.; Rampurawala, A. M.; Jones, D.; Fenwick, C.; Gaitonde, A. L.; Taylor, N. V.; Amor, D. S.; Eccles, T. A.; Denley, C. J.

2007-05-01

Current industrial practice for the prediction and analysis of flutter relies heavily on linear methods and this has led to overly conservative design and envelope restrictions for aircraft. Although the methods have served the industry well, it is clear that for a number of reasons the inclusion of non-linearity in the mathematical and computational aeroelastic prediction tools is highly desirable. The increase in available and affordable computational resources, together with major advances in algorithms, mean that non-linear aeroelastic tools are now viable within the aircraft design and qualification environment. The Partnership for Unsteady Methods in Aerodynamics (PUMA) Defence and Aerospace Research Partnership (DARP) was sponsored in 2002 to conduct research into non-linear aeroelastic prediction methods and an academic, industry, and government consortium collaborated to address the following objectives: To develop useable methodologies to model and predict non-linear aeroelastic behaviour of complete aircraft. To evaluate the methodologies on real aircraft problems. To investigate the effect of non-linearities on aeroelastic behaviour and to determine which have the greatest effect on the flutter qualification process. These aims have been very effectively met during the course of the programme and the research outputs include: New methods available to industry for use in the flutter prediction process, together with the appropriate coaching of industry engineers. Interesting results in both linear and non-linear aeroelastics, with comprehensive comparison of methods and approaches for challenging problems. Additional embryonic techniques that, with further research, will further improve aeroelastics capability. This paper describes the methods that have been developed and how they are deployable within the industrial environment. We present a thorough review of the PUMA aeroelastics programme together with a comprehensive review of the relevant research
Speech Entrainment Compensates for Broca's Area Damage

Science.gov (United States)

Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris

2015-01-01

Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443

A tale of two hands: Children's early gesture use in narrative production predicts later narrative structure in speech

Science.gov (United States)

Demir, Özlem Ece; Levine, Susan C.; Goldin-Meadow, Susan

2014-01-01

Speakers of all ages spontaneously gesture as they talk. These gestures predict children's milestones in vocabulary and sentence structure. We ask whether gesture serves a similar role in the development of narrative skill. Children were asked to retell a story conveyed in a wordless cartoon at age 5 and then again at 6, 7, and 8. Children's narrative structure in speech improved across these ages. At age 5, many of the children expressed a character's viewpoint in gesture, and these children were more likely to tell better-structured stories at the later ages than children who did not produce character-viewpoint gestures at age 5. In contrast, framing narratives from a character's perspective in speech at age 5 did not predict later narrative structure in speech. Gesture thus continues to act as a harbinger of change even as it assumes new roles in relation to discourse. PMID:25088361
Comparing auditory filter bandwidths, spectral ripple modulation detection, spectral ripple discrimination, and speech recognition: Normal and impaired hearing.

Science.gov (United States)

Davies-Venn, Evelyn; Nelson, Peggy; Souza, Pamela

2015-07-01

Some listeners with hearing loss show poor speech recognition scores in spite of using amplification that optimizes audibility. Beyond audibility, studies have suggested that suprathreshold abilities such as spectral and temporal processing may explain differences in amplified speech recognition scores. A variety of different methods has been used to measure spectral processing. However, the relationship between spectral processing and speech recognition is still inconclusive. This study evaluated the relationship between spectral processing and speech recognition in listeners with normal hearing and with hearing loss. Narrowband spectral resolution was assessed using auditory filter bandwidths estimated from simultaneous notched-noise masking. Broadband spectral processing was measured using the spectral ripple discrimination (SRD) task and the spectral ripple depth detection (SMD) task. Three different measures were used to assess unamplified and amplified speech recognition in quiet and noise. Stepwise multiple linear regression revealed that SMD at 2.0 cycles per octave (cpo) significantly predicted speech scores for amplified and unamplified speech in quiet and noise. Commonality analyses revealed that SMD at 2.0 cpo combined with SRD and equivalent rectangular bandwidth measures to explain most of the variance captured by the regression model. Results suggest that SMD and SRD may be promising clinical tools for diagnostic evaluation and predicting amplification outcomes.
Comparing auditory filter bandwidths, spectral ripple modulation detection, spectral ripple discrimination, and speech recognition: Normal and impaired hearinga)

Science.gov (United States)

Davies-Venn, Evelyn; Nelson, Peggy; Souza, Pamela

2015-01-01

Some listeners with hearing loss show poor speech recognition scores in spite of using amplification that optimizes audibility. Beyond audibility, studies have suggested that suprathreshold abilities such as spectral and temporal processing may explain differences in amplified speech recognition scores. A variety of different methods has been used to measure spectral processing. However, the relationship between spectral processing and speech recognition is still inconclusive. This study evaluated the relationship between spectral processing and speech recognition in listeners with normal hearing and with hearing loss. Narrowband spectral resolution was assessed using auditory filter bandwidths estimated from simultaneous notched-noise masking. Broadband spectral processing was measured using the spectral ripple discrimination (SRD) task and the spectral ripple depth detection (SMD) task. Three different measures were used to assess unamplified and amplified speech recognition in quiet and noise. Stepwise multiple linear regression revealed that SMD at 2.0 cycles per octave (cpo) significantly predicted speech scores for amplified and unamplified speech in quiet and noise. Commonality analyses revealed that SMD at 2.0 cpo combined with SRD and equivalent rectangular bandwidth measures to explain most of the variance captured by the regression model. Results suggest that SMD and SRD may be promising clinical tools for diagnostic evaluation and predicting amplification outcomes. PMID:26233047
Combined effects of form- and meaning-based predictability on perceived clarity of speech.

Science.gov (United States)

Signoret, Carine; Johnsrude, Ingrid; Classon, Elisabet; Rudner, Mary

2018-02-01

The perceptual clarity of speech is influenced by more than just the acoustic quality of the sound; it also depends on contextual support. For example, a degraded sentence is perceived to be clearer when the content of the speech signal is provided with matching text (i.e., form-based predictability) before hearing the degraded sentence. Here, we investigate whether sentence-level semantic coherence (i.e., meaning-based predictability), enhances perceptual clarity of degraded sentences, and if so, whether the mechanism is the same as that underlying enhancement by matching text. We also ask whether form- and meaning-based predictability are related to individual differences in cognitive abilities. Twenty participants listened to spoken sentences that were either clear or degraded by noise vocoding and rated the clarity of each item. The sentences had either high or low semantic coherence. Each spoken word was preceded by the homologous printed word (matching text), or by a meaningless letter string (nonmatching text). Cognitive abilities were measured with a working memory test. Results showed that perceptual clarity was significantly enhanced both by matching text and by semantic coherence. Importantly, high coherence enhanced the perceptual clarity of the degraded sentences even when they were preceded by matching text, suggesting that the effects of form- and meaning-based predictions on perceptual clarity are independent and additive. However, when working memory capacity indexed by the Size-Comparison Span Test was controlled for, only form-based predictions enhanced perceptual clarity, and then only at some sound quality levels, suggesting that prediction effects are to a certain extent dependent on cognitive abilities. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
How important is embeddedness in predicting Australian speech-language pathologists' intentions to leave their jobs and the profession?

Science.gov (United States)

Heritage, Brody; Quail, Michelle; Cocks, Naomi

2018-03-05

This study explored the predictors of the outcomes of turnover and occupation attrition intentions for speech-language pathologists. The researchers examined the mediating effects of job satisfaction and strain on the relationship between stress and the latter outcomes. Additionally, the researchers examined the importance of embeddedness in predicting turnover intentions after accounting for stress, strain and job satisfaction. An online questionnaire was used to explore turnover and attrition intentions in 293 Australian speech-language pathologists. Job satisfaction contributed to a significant indirect effect on the stress and turnover intention relationship, however strain did not. There was a significant direct effect between stress and turnover intention after accounting for covariates. Embeddedness and the perceived availability of alternative jobs were also found to be significant predictors of turnover intentions. The mediating model used to predict turnover intentions also predicted occupation attrition intentions. The effect of stress on occupation attrition intentions was indirect in nature, the direct effect negated by mediating variables. Qualitative data provided complementary evidence to the quantitative model. The findings indicate that the proposed parsimonious model adequately captures predictors of speech-language pathologists' turnover and occupation attrition intentions. Workplaces and the profession may wish to consider these retention factors.
Longitudinal development of communication in children with cerebral palsy between 24 and 53 months: Predicting speech outcomes.

Science.gov (United States)

Hustad, Katherine C; Allison, Kristen M; Sakash, Ashley; McFadd, Emily; Broman, Aimee Teo; Rathouz, Paul J

2017-08-01

To determine whether communication at 2 years predicted communication at 4 years in children with cerebral palsy (CP); and whether the age a child first produces words imitatively predicts change in speech production. 30 children (15 males) with CP participated and were seen 5 times at 6-month intervals between 24 and 53 months (mean age at time 1 = 26.9 months (SD 1.9)). Variables were communication classification at 24 and 53 months, age that children were first able to produce words imitatively, single-word intelligibility, and longest utterance produced. Communication at 24 months was highly predictive of abilities at 53 months. Speaking earlier led to faster gains in intelligibility and length of utterance and better outcomes at 53 months than speaking later. Inability to speak at 24 months indicates greater speech and language difficulty at 53 months and a strong need for early communication intervention.
Robust signal selection for lineair prediction analysis of voiced speech

NARCIS (Netherlands)

Ma, C.; Kamp, Y.; Willems, L.F.

1993-01-01

This paper investigates a weighted LPC analysis of voiced speech. In view of the speech production model, the weighting function is either chosen to be the short-time energy function of the preemphasized speech sample sequence with certain delays or is obtained by thresholding the short-time energy
Integrated Phoneme Subspace Method for Speech Feature Extraction

Directory of Open Access Journals (Sweden)

Park Hyunsin

2009-01-01

Full Text Available Speech feature extraction has been a key focus in robust speech recognition research. In this work, we discuss data-driven linear feature transformations applied to feature vectors in the logarithmic mel-frequency filter bank domain. Transformations are based on principal component analysis (PCA, independent component analysis (ICA, and linear discriminant analysis (LDA. Furthermore, this paper introduces a new feature extraction technique that collects the correlation information among phoneme subspaces and reconstructs feature space for representing phonemic information efficiently. The proposed speech feature vector is generated by projecting an observed vector onto an integrated phoneme subspace (IPS based on PCA or ICA. The performance of the new feature was evaluated for isolated word speech recognition. The proposed method provided higher recognition accuracy than conventional methods in clean and reverberant environments.
Understanding speech when wearing communication headsets and hearing protectors with subband processing.

Science.gov (United States)

Brammer, Anthony J; Yu, Gongqiang; Bernstein, Eric R; Cherniack, Martin G; Peterson, Donald R; Tufts, Jennifer B

2014-08-01

An adaptive, delayless, subband feed-forward control structure is employed to improve the speech signal-to-noise ratio (SNR) in the communication channel of a circumaural headset/hearing protector (HPD) from 90 Hz to 11.3 kHz, and to provide active noise control (ANC) from 50 to 800 Hz to complement the passive attenuation of the HPD. The task involves optimizing the speech SNR for each communication channel subband, subject to limiting the maximum sound level at the ear, maintaining a speech SNR preferred by users, and reducing large inter-band gain differences to improve speech quality. The performance of a proof-of-concept device has been evaluated in a pseudo-diffuse sound field when worn by human subjects under conditions of environmental noise and speech that do not pose a risk to hearing, and by simulation for other conditions. For the environmental noises employed in this study, subband speech SNR control combined with subband ANC produced greater improvement in word scores than subband ANC alone, and improved the consistency of word scores across subjects. The simulation employed a subject-specific linear model, and predicted that word scores are maintained in excess of 90% for sound levels outside the HPD of up to ∼115 dBA.
Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?

Science.gov (United States)

Martinelli, Eugenio; Mencattini, Arianna; Daprati, Elena; Di Natale, Corrado

2016-01-01

Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Albeit numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speeches are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc.).
Implementation of neural network based non-linear predictive control

DEFF Research Database (Denmark)

Sørensen, Paul Haase; Nørgård, Peter Magnus; Ravn, Ole

1999-01-01

This paper describes a control method for non-linear systems based on generalized predictive control. Generalized predictive control (GPC) was developed to control linear systems, including open-loop unstable and non-minimum phase systems, but has also been proposed to be extended for the control...... of non-linear systems. GPC is model based and in this paper we propose the use of a neural network for the modeling of the system. Based on the neural network model, a controller with extended control horizon is developed and the implementation issues are discussed, with particular emphasis...... on an efficient quasi-Newton algorithm. The performance is demonstrated on a pneumatic servo system....
Linear regression crash prediction models : issues and proposed solutions.

Science.gov (United States)

2010-05-01

The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...
Natural speech algorithm applied to baseline interview data can predict which patients will respond to psilocybin for treatment-resistant depression.

Science.gov (United States)

Carrillo, Facundo; Sigman, Mariano; Fernández Slezak, Diego; Ashton, Philip; Fitzgerald, Lily; Stroud, Jack; Nutt, David J; Carhart-Harris, Robin L

2018-04-01

Natural speech analytics has seen some improvements over recent years, and this has opened a window for objective and quantitative diagnosis in psychiatry. Here, we used a machine learning algorithm applied to natural speech to ask whether language properties measured before psilocybin for treatment-resistant can predict for which patients it will be effective and for which it will not. A baseline autobiographical memory interview was conducted and transcribed. Patients with treatment-resistant depression received 2 doses of psilocybin, 10 mg and 25 mg, 7 days apart. Psychological support was provided before, during and after all dosing sessions. Quantitative speech measures were applied to the interview data from 17 patients and 18 untreated age-matched healthy control subjects. A machine learning algorithm was used to classify between controls and patients and predict treatment response. Speech analytics and machine learning successfully differentiated depressed patients from healthy controls and identified treatment responders from non-responders with a significant level of 85% of accuracy (75% precision). Automatic natural language analysis was used to predict effective response to treatment with psilocybin, suggesting that these tools offer a highly cost-effective facility for screening individuals for treatment suitability and sensitivity. The sample size was small and replication is required to strengthen inferences on these results. Copyright © 2018 Elsevier B.V. All rights reserved.
Novel Techniques for Dialectal Arabic Speech Recognition

CERN Document Server

Elmahdy, Mohamed; Minker, Wolfgang

2012-01-01

Novel Techniques for Dialectal Arabic Speech describes approaches to improve automatic speech recognition for dialectal Arabic. Since speech resources for dialectal Arabic speech recognition are very sparse, the authors describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic speech recognition, while assuming that MSA is always a second language for all Arabic speakers. In this book, Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect. ECA is the first ranked Arabic dialect in terms of number of speakers, and a high quality ECA speech corpus with accurate phonetic transcription has been collected. MSA acoustic models were trained using news broadcast speech. In order to cross-lingually use MSA in dialectal Arabic speech recognition, the authors have normalized the phoneme sets for MSA and ECA. After this normalization, they have applied state-of-the-art acoustic model adaptation techniques like Maximum Likelihood Linear Regression (MLLR) and M...
Classifying laughter and speech using audio-visual feature prediction

NARCIS (Netherlands)

Petridis, Stavros; Asghar, Ali; Pantic, Maja

2010-01-01

In this study, a system that discriminates laughter from speech by modelling the relationship between audio and visual features is presented. The underlying assumption is that this relationship is different between speech and laughter. Neural networks are trained which learn the audio-to-visual and
Improved part-of-speech prediction in suffix analysis.

Directory of Open Access Journals (Sweden)

Mario Fruzangohar

Full Text Available MOTIVATION: Predicting the part of speech (POS tag of an unknown word in a sentence is a significant challenge. This is particularly difficult in biomedicine, where POS tags serve as an input to training sophisticated literature summarization techniques, such as those based on Hidden Markov Models (HMM. Different approaches have been taken to deal with the POS tagger challenge, but with one exception--the TnT POS tagger--previous publications on POS tagging have omitted details of the suffix analysis used for handling unknown words. The suffix of an English word is a strong predictor of a POS tag for that word. As a pre-requisite for an accurate HMM POS tagger for biomedical publications, we present an efficient suffix prediction method for integration into a POS tagger. RESULTS: We have implemented a fully functional HMM POS tagger using experimentally optimised suffix based prediction. Our simple suffix analysis method, significantly outperformed the probability interpolation based TnT method. We have also shown how important suffix analysis can be for probability estimation of a known word (in the training corpus with an unseen POS tag; a common scenario with a small training corpus. We then integrated this simple method in our POS tagger and determined an optimised parameter set for both methods, which can help developers to optimise their current algorithm, based on our results. We also introduce the concept of counting methods in maximum likelihood estimation for the first time and show how counting methods can affect the prediction result. Finally, we describe how machine-learning techniques were applied to identify words, for which prediction of POS tags were always incorrect and propose a method to handle words of this type. AVAILABILITY AND IMPLEMENTATION: Java source code, binaries and setup instructions are freely available at http://genomes.sapac.edu.au/text_mining/pos_tagger.zip.
Prediction and Optimization of Speech Intelligibility in Adverse Conditions

NARCIS (Netherlands)

Taal, C.H.

2013-01-01

In digital speech-communication systems like mobile phones, public address systems and hearing aids, conveying the message is one of the most important goals. This can be challenging since the intelligibility of the speech may be harmed at various stages before, during and after the transmission
Assessment and prediction of speech quality in telecommunications

CERN Document Server

Möller, Sebastian

2000-01-01

The quality of a telecommunication voice service is largely inftuenced by the quality of the transmission system. Nevertheless, the analysis, synthesis and prediction of quality should take into account its multidimensional aspects. Quality can be regarded as a point where the perceived characteristics and the desired or expected ones meet. A schematic is presented which classifies different entities which contribute to the quality of a service, taking into account conversational, user as weIl as service related contributions. Starting from this concept, perceptively relevant constituents of speech communication quality are identified. The perceptive factors result from ele ments of the transmission configuration. A simulation model is developed and implemented which allows the most relevant parameters of traditional trans mission configurations to be manipulated, in real time and for the conversation situation. Inputs into the simulation are instrumentally measurable quality elements commonly used in tra...
Semifragile Speech Watermarking Based on Least Significant Bit Replacement of Line Spectral Frequencies

Directory of Open Access Journals (Sweden)

Mohammad Ali Nematollahi

2017-01-01

Full Text Available There are various techniques for speech watermarking based on modifying the linear prediction coefficients (LPCs; however, the estimated and modified LPCs vary from each other even without attacks. Because line spectral frequency (LSF has less sensitivity to watermarking than LPC, watermark bits are embedded into the maximum number of LSFs by applying the least significant bit replacement (LSBR method. To reduce the differences between estimated and modified LPCs, a checking loop is added to minimize the watermark extraction error. Experimental results show that the proposed semifragile speech watermarking method can provide high imperceptibility and that any manipulation of the watermark signal destroys the watermark bits since manipulation changes it to a random stream of bits.
Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?

Directory of Open Access Journals (Sweden)

Eugenio Martinelli

Full Text Available Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Albeit numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speeches are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc. and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc..

Speech-Language Dissociations, Distractibility, and Childhood Stuttering

Science.gov (United States)

Conture, Edward G.; Walden, Tedra A.; Lambert, Warren E.

2015-01-01

Purpose This study investigated the relation among speech-language dissociations, attentional distractibility, and childhood stuttering. Method Participants were 82 preschool-age children who stutter (CWS) and 120 who do not stutter (CWNS). Correlation-based statistics (Bates, Appelbaum, Salcedo, Saygin, & Pizzamiglio, 2003) identified dissociations across 5 norm-based speech-language subtests. The Behavioral Style Questionnaire Distractibility subscale measured attentional distractibility. Analyses addressed (a) between-groups differences in the number of children exhibiting speech-language dissociations; (b) between-groups distractibility differences; (c) the relation between distractibility and speech-language dissociations; and (d) whether interactions between distractibility and dissociations predicted the frequency of total, stuttered, and nonstuttered disfluencies. Results More preschool-age CWS exhibited speech-language dissociations compared with CWNS, and more boys exhibited dissociations compared with girls. In addition, male CWS were less distractible than female CWS and female CWNS. For CWS, but not CWNS, less distractibility (i.e., greater attention) was associated with more speech-language dissociations. Last, interactions between distractibility and dissociations did not predict speech disfluencies in CWS or CWNS. Conclusions The present findings suggest that for preschool-age CWS, attentional processes are associated with speech-language dissociations. Future investigations are warranted to better understand the directionality of effect of this association (e.g., inefficient attentional processes → speech-language dissociations vs. inefficient attentional processes ← speech-language dissociations). PMID:26126203
Machine learning-based methods for prediction of linear B-cell epitopes.

Science.gov (United States)

Wang, Hsin-Wei; Pai, Tun-Wen

2014-01-01

B-cell epitope prediction facilitates immunologists in designing peptide-based vaccine, diagnostic test, disease prevention, treatment, and antibody production. In comparison with T-cell epitope prediction, the performance of variable length B-cell epitope prediction is still yet to be satisfied. Fortunately, due to increasingly available verified epitope databases, bioinformaticians could adopt machine learning-based algorithms on all curated data to design an improved prediction tool for biomedical researchers. Here, we have reviewed related epitope prediction papers, especially those for linear B-cell epitope prediction. It should be noticed that a combination of selected propensity scales and statistics of epitope residues with machine learning-based tools formulated a general way for constructing linear B-cell epitope prediction systems. It is also observed from most of the comparison results that the kernel method of support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, except reviewing recently published papers, we have introduced the fundamentals of B-cell epitope and SVM techniques. In addition, an example of linear B-cell prediction system based on physicochemical features and amino acid combinations is illustrated in details.
Validation of Individual Non-Linear Predictive Pharmacokinetic ...

African Journals Online (AJOL)

3Department of Veterinary Medicine, Faculty of Agriculture, University of Novi Sad, Novi Sad, Republic of Serbia ... Purpose: To evaluate the predictive performance of phenytoin multiple dosing non-linear pharmacokinetic ... status epilepticus affects an estimated 152,000 ..... causal factors, i.e., infection, inflammation, tissue.
Implementation of a Tour Guide Robot System Using RFID Technology and Viterbi Algorithm-Based HMM for Speech Recognition

Directory of Open Access Journals (Sweden)

Neng-Sheng Pai

2014-01-01

Full Text Available This paper applied speech recognition and RFID technologies to develop an omni-directional mobile robot into a robot with voice control and guide introduction functions. For speech recognition, the speech signals were captured by short-time processing. The speaker first recorded the isolated words for the robot to create speech database of specific speakers. After the speech pre-processing of this speech database, the feature parameters of cepstrum and delta-cepstrum were obtained using linear predictive coefficient (LPC. Then, the Hidden Markov Model (HMM was used for model training of the speech database, and the Viterbi algorithm was used to find an optimal state sequence as the reference sample for speech recognition. The trained reference model was put into the industrial computer on the robot platform, and the user entered the isolated words to be tested. After processing by the same reference model and comparing with previous reference model, the path of the maximum total probability in various models found using the Viterbi algorithm in the recognition was the recognition result. Finally, the speech recognition and RFID systems were achieved in an actual environment to prove its feasibility and stability, and implemented into the omni-directional mobile robot.
Predicting Speech Intelligibility Decline in Amyotrophic Lateral Sclerosis Based on the Deterioration of Individual Speech Subsystems

Science.gov (United States)

Yunusova, Yana; Wang, Jun; Zinman, Lorne; Pattee, Gary L.; Berry, James D.; Perry, Bridget; Green, Jordan R.

2016-01-01

Purpose To determine the mechanisms of speech intelligibility impairment due to neurologic impairments, intelligibility decline was modeled as a function of co-occurring changes in the articulatory, resonatory, phonatory, and respiratory subsystems. Method Sixty-six individuals diagnosed with amyotrophic lateral sclerosis (ALS) were studied longitudinally. The disease-related changes in articulatory, resonatory, phonatory, and respiratory subsystems were quantified using multiple instrumental measures, which were subjected to a principal component analysis and mixed effects models to derive a set of speech subsystem predictors. A stepwise approach was used to select the best set of subsystem predictors to model the overall decline in intelligibility. Results Intelligibility was modeled as a function of five predictors that corresponded to velocities of lip and jaw movements (articulatory), number of syllable repetitions in the alternating motion rate task (articulatory), nasal airflow (resonatory), maximum fundamental frequency (phonatory), and speech pauses (respiratory). The model accounted for 95.6% of the variance in intelligibility, among which the articulatory predictors showed the most substantial independent contribution (57.7%). Conclusion Articulatory impairments characterized by reduced velocities of lip and jaw movements and resonatory impairments characterized by increased nasal airflow served as the subsystem predictors of the longitudinal decline of speech intelligibility in ALS. Declines in maximum performance tasks such as the alternating motion rate preceded declines in intelligibility, thus serving as early predictors of bulbar dysfunction. Following the rapid decline in speech intelligibility, a precipitous decline in maximum performance tasks subsequently occurred. PMID:27148967
Robust Recognition of Loud and Lombard speech in the Fighter Cockpit Environment

Science.gov (United States)

1988-08-01

the latter as inter-speaker variability. According to Zue [Z85j, inter-speaker variabilities can be attributed to sociolinguistic background, dialect...34 Journal of the Acoustical Society of America , Vol 50, 1971. [At74I B. S. Atal, "Linear prediction for speaker identification," Journal of the Acoustical...Society of America , Vol 55, 1974. [B771 B. Beek, E. P. Neuberg, and D. C. Hodge, "An Assessment of the Technology of Automatic Speech Recognition for
Predictive Brain Mechanisms in Sound-to-Meaning Mapping during Speech Processing.

Science.gov (United States)

Lyu, Bingjiang; Ge, Jianqiao; Niu, Zhendong; Tan, Li Hai; Gao, Jia-Hong

2016-10-19

Spoken language comprehension relies not only on the identification of individual words, but also on the expectations arising from contextual information. A distributed frontotemporal network is known to facilitate the mapping of speech sounds onto their corresponding meanings. However, how prior expectations influence this efficient mapping at the neuroanatomical level, especially in terms of individual words, remains unclear. Using fMRI, we addressed this question in the framework of the dual-stream model by scanning native speakers of Mandarin Chinese, a language highly dependent on context. We found that, within the ventral pathway, the violated expectations elicited stronger activations in the left anterior superior temporal gyrus and the ventral inferior frontal gyrus (IFG) for the phonological-semantic prediction of spoken words. Functional connectivity analysis showed that expectations were mediated by both top-down modulation from the left ventral IFG to the anterior temporal regions and enhanced cross-stream integration through strengthened connections between different subregions of the left IFG. By further investigating the dynamic causality within the dual-stream model, we elucidated how the human brain accomplishes sound-to-meaning mapping for words in a predictive manner. In daily communication via spoken language, one of the core processes is understanding the words being used. Effortless and efficient information exchange via speech relies not only on the identification of individual spoken words, but also on the contextual information giving rise to expected meanings. Despite the accumulating evidence for the bottom-up perception of auditory input, it is still not fully understood how the top-down modulation is achieved in the extensive frontotemporal cortical network. Here, we provide a comprehensive description of the neural substrates underlying sound-to-meaning mapping and demonstrate how the dual-stream model functions in the modulation of
Genomic prediction based on data from three layer lines: a comparison between linear methods

NARCIS (Netherlands)

Calus, M.P.L.; Huang, H.; Vereijken, J.; Visscher, J.; Napel, ten J.; Windig, J.J.

2014-01-01

Background The prediction accuracy of several linear genomic prediction models, which have previously been used for within-line genomic prediction, was evaluated for multi-line genomic prediction. Methods Compared to a conventional BLUP (best linear unbiased prediction) model using pedigree data, we
Genomic prediction based on data from three layer lines using non-linear regression models.

Science.gov (United States)

Huang, Heyun; Windig, Jack J; Vereijken, Addie; Calus, Mario P L

2014-11-06

Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values. When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction. Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional
Speech and Speech-Related Quality of Life After Late Palate Repair: A Patient's Perspective.

Science.gov (United States)

Schönmeyr, Björn; Wendby, Lisa; Sharma, Mitali; Jacobson, Lia; Restrepo, Carolina; Campbell, Alex

2015-07-01

Many patients with cleft palate deformities worldwide receive treatment at a later age than is recommended for normal speech to develop. The outcomes after late palate repairs in terms of speech and quality of life (QOL) still remain largely unstudied. In the current study, questionnaires were used to assess the patients' perception of speech and QOL before and after primary palate repair. All of the patients were operated at a cleft center in northeast India and had a cleft palate with a normal lip or with a cleft lip that had been previously repaired. A total of 134 patients (7-35 years) were interviewed preoperatively and 46 patients (7-32 years) were assessed in the postoperative survey. The survey showed that scores based on the speech handicap index, concerning speech and speech-related QOL, did not improve postoperatively. In fact, the questionnaires indicated that the speech became more unpredictable (P reported that their self-confidence had improved after the operation. Thus, the majority of interviewed patients who underwent late primary palate repair were satisfied with the surgery. At the same time, speech and speech-related QOL did not improve according to the speech handicap index-based survey. Speech predictability may even become worse and nasal regurgitation may increase after late palate repair, according to these results.
Biochemical methane potential prediction of plant biomasses: Comparing chemical composition versus near infrared methods and linear versus non-linear models.

Science.gov (United States)

Godin, Bruno; Mayer, Frédéric; Agneessens, Richard; Gerin, Patrick; Dardenne, Pierre; Delfosse, Philippe; Delcarte, Jérôme

2015-01-01

The reliability of different models to predict the biochemical methane potential (BMP) of various plant biomasses using a multispecies dataset was compared. The most reliable prediction models of the BMP were those based on the near infrared (NIR) spectrum compared to those based on the chemical composition. The NIR predictions of local (specific regression and non-linear) models were able to estimate quantitatively, rapidly, cheaply and easily the BMP. Such a model could be further used for biomethanation plant management and optimization. The predictions of non-linear models were more reliable compared to those of linear models. The presentation form (green-dried, silage-dried and silage-wet form) of biomasses to the NIR spectrometer did not influence the performances of the NIR prediction models. The accuracy of the BMP method should be improved to enhance further the BMP prediction models. Copyright © 2014 Elsevier Ltd. All rights reserved.
Functional connectivity between face-movement and speech-intelligibility areas during auditory-only speech perception.

Science.gov (United States)

Schall, Sonja; von Kriegstein, Katharina

2014-01-01

It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.
78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

Science.gov (United States)

2013-08-15

...-Speech Services for Individuals with Hearing and Speech Disabilities, Report and Order (Order), document...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...
Predicting Intelligibility Gains in Dysarthria through Automated Speech Feature Analysis

Science.gov (United States)

Fletcher, Annalise R.; Wisler, Alan A.; McAuliffe, Megan J.; Lansford, Kaitlin L.; Liss, Julie M.

2017-01-01

Purpose: Behavioral speech modifications have variable effects on the intelligibility of speakers with dysarthria. In the companion article, a significant relationship was found between measures of speakers' baseline speech and their intelligibility gains following cues to speak louder and reduce rate (Fletcher, McAuliffe, Lansford, Sinex, &…
Successful and rapid response of speech bulb reduction program combined with speech therapy in velopharyngeal dysfunction: a case report.

Science.gov (United States)

Shin, Yu-Jeong; Ko, Seung-O

2015-12-01

Velopharyngeal dysfunction in cleft palate patients following the primary palate repair may result in nasal air emission, hypernasality, articulation disorder and poor intelligibility of speech. Among conservative treatment methods, speech aid prosthesis combined with speech therapy is widely used method. However because of its long time of treatment more than a year and low predictability, some clinicians prefer a surgical intervention. Thus, the purpose of this report was to increase an attention on the effectiveness of speech aid prosthesis by introducing a case that was successfully treated. In this clinical report, speech bulb reduction program with intensive speech therapy was applied for a patient with velopharyngeal dysfunction and it was rapidly treated by 5months which was unusually short period for speech aid therapy. Furthermore, advantages of pre-operative speech aid therapy were discussed.
Corollary discharge provides the sensory content of inner speech.

Science.gov (United States)

Scott, Mark

2013-09-01

Inner speech is one of the most common, but least investigated, mental activities humans perform. It is an internal copy of one's external voice and so is similar to a well-established component of motor control: corollary discharge. Corollary discharge is a prediction of the sound of one's voice generated by the motor system. This prediction is normally used to filter self-caused sounds from perception, which segregates them from externally caused sounds and prevents the sensory confusion that would otherwise result. The similarity between inner speech and corollary discharge motivates the theory, tested here, that corollary discharge provides the sensory content of inner speech. The results reported here show that inner speech attenuates the impact of external sounds. This attenuation was measured using a context effect (an influence of contextual speech sounds on the perception of subsequent speech sounds), which weakens in the presence of speech imagery that matches the context sound. Results from a control experiment demonstrated this weakening in external speech as well. Such sensory attenuation is a hallmark of corollary discharge.
Large-scale linear programs in planning and prediction.

Science.gov (United States)

2017-06-01

Large-scale linear programs are at the core of many traffic-related optimization problems in both planning and prediction. Moreover, many of these involve significant uncertainty, and hence are modeled using either chance constraints, or robust optim...
Speech Intelligibility Evaluation for Mobile Phones

DEFF Research Database (Denmark)

Jørgensen, Søren; Cubick, Jens; Dau, Torsten

2015-01-01

In the development process of modern telecommunication systems, such as mobile phones, it is common practice to use computer models to objectively evaluate the transmission quality of the system, instead of time-consuming perceptual listening tests. Such models have typically focused on the quality...... of the transmitted speech, while little or no attention has been provided to speech intelligibility. The present study investigated to what extent three state-of-the art speech intelligibility models could predict the intelligibility of noisy speech transmitted through mobile phones. Sentences from the Danish...... Dantale II speech material were mixed with three different kinds of background noise, transmitted through three different mobile phones, and recorded at the receiver via a local network simulator. The speech intelligibility of the transmitted sentences was assessed by six normal-hearing listeners...
Effects of cognitive impairment on prosodic parameters of speech production planning in multiple sclerosis.

Science.gov (United States)

De Looze, Céline; Moreau, Noémie; Renié, Laurent; Kelly, Finnian; Ghio, Alain; Rico, Audrey; Audoin, Bertrand; Viallet, François; Pelletier, Jean; Petrone, Caterina

2017-05-24

Cognitive impairment (CI) affects 40-65% of patients with multiple sclerosis (MS). CI can have a negative impact on a patient's everyday activities, such as engaging in conversations. Speech production planning ability is crucial for successful verbal interactions and thus for preserving social and occupational skills. This study investigates the effect of cognitive-linguistic demand and CI on speech production planning in MS, as reflected in speech prosody. A secondary aim is to explore the clinical potential of prosodic features for the prediction of an individual's cognitive status in MS. A total of 45 subjects, that is 22 healthy controls (HC) and 23 patients in early stages of relapsing-remitting MS, underwent neuropsychological tests probing specific cognitive processes involved in speech production planning. All subjects also performed a read speech task, in which they had to read isolated sentences manipulated as for phonological length. Results show that the speech of MS patients with CI is mainly affected at the temporal level (articulation and speech rate, pause duration). Regression analyses further indicate that rate measures are correlated with working memory scores. In addition, linear discriminant analysis shows the ROC AUC of identifying MS patients with CI is 0.70 (95% confidence interval: 0.68-0.73). Our findings indicate that prosodic planning is deficient in patients with MS-CI and that the scope of planning depends on patients' cognitive abilities. We discuss how speech-based approaches could be used as an ecological method for the assessment and monitoring of CI in MS. © 2017 The British Psychological Society.
Relations Between the Intelligibility of Speech in Noise and Psychophysical Measures of Hearing Measured in Four Languages Using the Auditory Profile Test Battery

Directory of Open Access Journals (Sweden)

T. E. M. Van Esch

2015-12-01

Full Text Available The aim of the present study was to determine the relations between the intelligibility of speech in noise and measures of auditory resolution, loudness recruitment, and cognitive function. The analyses were based on data published earlier as part of the presentation of the Auditory Profile, a test battery implemented in four languages. Tests of the intelligibility of speech, resolution, loudness recruitment, and lexical decision making were measured using headphones in five centers: in Germany, the Netherlands, Sweden, and the United Kingdom. Correlations and stepwise linear regression models were calculated. In sum, 72 hearing-impaired listeners aged 22 to 91 years with a broad range of hearing losses were included in the study. Several significant correlations were found with the intelligibility of speech in noise. Stepwise linear regression analyses showed that pure-tone average, age, spectral and temporal resolution, and loudness recruitment were significant predictors of the intelligibility of speech in fluctuating noise. Complex interrelationships between auditory factors and the intelligibility of speech in noise were revealed using the Auditory Profile data set in four languages. After taking into account the effects of pure-tone average and age, spectral and temporal resolution and loudness recruitment had an added value in the prediction of variation among listeners with respect to the intelligibility of speech in noise. The results of the lexical decision making test were not related to the intelligibility of speech in noise, in the population studied.

Who Will Win?: Predicting the Presidential Election Using Linear Regression

Science.gov (United States)

Lamb, John H.

2007-01-01

This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus[TM] graphing calculators, Microsoft[C] Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…
Speech-to-Speech Relay Service

Science.gov (United States)

Consumer Guide Speech to Speech Relay Service Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...
Non-fluent speech following stroke is caused by impaired efference copy.

Science.gov (United States)

Feenaughty, Lynda; Basilakos, Alexandra; Bonilha, Leonardo; den Ouden, Dirk-Bart; Rorden, Chris; Stark, Brielle; Fridriksson, Julius

2017-09-01

Efference copy is a cognitive mechanism argued to be critical for initiating and monitoring speech: however, the extent to which breakdown of efference copy mechanisms impact speech production is unclear. This study examined the best mechanistic predictors of non-fluent speech among 88 stroke survivors. Objective speech fluency measures were subjected to a principal component analysis (PCA). The primary PCA factor was then entered into a multiple stepwise linear regression analysis as the dependent variable, with a set of independent mechanistic variables. Participants' ability to mimic audio-visual speech ("speech entrainment response") was the best independent predictor of non-fluent speech. We suggest that this "speech entrainment" factor reflects integrity of internal monitoring (i.e., efference copy) of speech production, which affects speech initiation and maintenance. Results support models of normal speech production and suggest that therapy focused on speech initiation and maintenance may improve speech fluency for individuals with chronic non-fluent aphasia post stroke.
Linear and nonlinear dynamic systems in financial time series prediction

Directory of Open Access Journals (Sweden)

Salim Lahmiri

2012-10-01

Full Text Available Autoregressive moving average (ARMA process and dynamic neural networks namely the nonlinear autoregressive moving average with exogenous inputs (NARX are compared by evaluating their ability to predict financial time series; for instance the S&P500 returns. Two classes of ARMA are considered. The first one is the standard ARMA model which is a linear static system. The second one uses Kalman filter (KF to estimate and predict ARMA coefficients. This model is a linear dynamic system. The forecasting ability of each system is evaluated by means of mean absolute error (MAE and mean absolute deviation (MAD statistics. Simulation results indicate that the ARMA-KF system performs better than the standard ARMA alone. Thus, introducing dynamics into the ARMA process improves the forecasting accuracy. In addition, the ARMA-KF outperformed the NARX. This result may suggest that the linear component found in the S&P500 return series is more dominant than the nonlinear part. In sum, we conclude that introducing dynamics into the ARMA process provides an effective system for S&P500 time series prediction.
Modelling speech intelligibility in adverse conditions

DEFF Research Database (Denmark)

Jørgensen, Søren; Dau, Torsten

2013-01-01

Jørgensen and Dau (J Acoust Soc Am 130:1475-1487, 2011) proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech...... subjected to phase jitter, a condition in which the spectral structure of the intelligibility of speech signal is strongly affected, while the broadband temporal envelope is kept largely intact. In contrast, the effects of this distortion can be predicted -successfully by the spectro-temporal modulation...... suggest that the SNRenv might reflect a powerful decision metric, while some explicit across-frequency analysis seems crucial in some conditions. How such across-frequency analysis is "realized" in the auditory system remains unresolved....
Spectral integration in speech and non-speech sounds

Science.gov (United States)

Jacewicz, Ewa

2005-04-01

Spectral integration (or formant averaging) was proposed in vowel perception research to account for the observation that a reduction of the intensity of one of two closely spaced formants (as in /u/) produced a predictable shift in vowel quality [Delattre et al., Word 8, 195-210 (1952)]. A related observation was reported in psychoacoustics, indicating that when the components of a two-tone periodic complex differ in amplitude and frequency, its perceived pitch is shifted toward that of the more intense tone [Helmholtz, App. XIV (1875/1948)]. Subsequent research in both fields focused on the frequency interval that separates these two spectral components, in an attempt to determine the size of the bandwidth for spectral integration to occur. This talk will review the accumulated evidence for and against spectral integration within the hypothesized limit of 3.5 Bark for static and dynamic signals in speech perception and psychoacoustics. Based on similarities in the processing of speech and non-speech sounds, it is suggested that spectral integration may reflect a general property of the auditory system. A larger frequency bandwidth, possibly close to 3.5 Bark, may be utilized in integrating acoustic information, including speech, complex signals, or sound quality of a violin.
Convergence Guaranteed Nonlinear Constraint Model Predictive Control via I/O Linearization

Directory of Open Access Journals (Sweden)

Xiaobing Kong

2013-01-01

Full Text Available Constituting reliable optimal solution is a key issue for the nonlinear constrained model predictive control. Input-output feedback linearization is a popular method in nonlinear control. By using an input-output feedback linearizing controller, the original linear input constraints will change to nonlinear constraints and sometimes the constraints are state dependent. This paper presents an iterative quadratic program (IQP routine on the continuous-time system. To guarantee its convergence, another iterative approach is incorporated. The proposed algorithm can reach a feasible solution over the entire prediction horizon. Simulation results on both a numerical example and the continuous stirred tank reactors (CSTR demonstrate the effectiveness of the proposed method.
Visual speech alters the discrimination and identification of non-intact auditory speech in children with hearing loss.

Science.gov (United States)

Jerger, Susan; Damian, Markus F; McAlpine, Rachel P; Abdi, Hervé

2017-03-01

Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/-B/aa or/-B/az). The items started with an easy-to-speechread/B/or difficult-to-speechread/G/onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/-B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same-as opposed to different-responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g.,/-B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz-as opposed to az- responses in the audiovisual than auditory mode. Performance in the audiovisual mode showed more same
Visual Speech Alters the Discrimination and Identification of Non-Intact Auditory Speech in Children with Hearing Loss

Science.gov (United States)

Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Hervé

2017-01-01

Objectives Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Methods Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/–B/aa or /–B/az). The items started with an easy-to-speechread /B/ or difficult-to-speechread /G/ onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/–B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same—as opposed to different—responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g., /–B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz—as opposed to az— responses in the audiovisual than auditory mode. Results
Computational speech segregation based on an auditory-inspired modulation analysis

DEFF Research Database (Denmark)

May, Tobias; Dau, Torsten

2014-01-01

A monaural speech segregation system is presented that estimates the ideal binary mask from noisy speech based on the supervised learning of amplitude modulation spectrogram (AMS) features. Instead of using linearly scaled modulation filters with constant absolute bandwidth, an auditory- inspired...... about speech activity present in neighboring time-frequency units. In order to evaluate the generalization performance of the system to unseen acoustic conditions, the speech segregation system is trained with a limited set of low signal-to-noise ratio (SNR) conditions, but tested over a wide range...
Enhancement of Non-Stationary Speech using Harmonic Chirp Filters

DEFF Research Database (Denmark)

Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

2015-01-01

In this paper, the issue of single channel speech enhancement of non-stationary voiced speech is addressed. The non-stationarity of speech is well known, but state of the art speech enhancement methods assume stationarity within frames of 20–30 ms. We derive optimal distortionless filters that take...... the non-stationarity nature of voiced speech into account via linear constraints. This is facilitated by imposing a harmonic chirp model on the speech signal. As an implicit part of the filter design, the noise statistics are also estimated based on the observed signal and parameters of the harmonic chirp...... model. Simulations on real speech show that the chirp based filters perform better than their harmonic counterparts. Further, it is seen that the gain of using the chirp model increases when the estimated chirp parameter is big corresponding to periods in the signal where the instantaneous fundamental...
Prediction of Beck Depression Inventory (BDI-II) Score Using Acoustic Measurements in a Sample of Iium Engineering Students

Science.gov (United States)

Fikri Zanil, Muhamad; Nur Wahidah Nik Hashim, Nik; Azam, Huda

2017-11-01

Psychiatrist currently relies on questionnaires and interviews for psychological assessment. These conservative methods often miss true positives and might lead to death, especially in cases where a patient might be experiencing suicidal predisposition but was only diagnosed as major depressive disorder (MDD). With modern technology, an assessment tool might aid psychiatrist with a more accurate diagnosis and thus hope to reduce casualty. This project will explore on the relationship between speech features of spoken audio signal (reading) in Bahasa Malaysia with the Beck Depression Inventory scores. The speech features used in this project were Power Spectral Density (PSD), Mel-frequency Ceptral Coefficients (MFCC), Transition Parameter, formant and pitch. According to analysis, the optimum combination of speech features to predict BDI-II scores include PSD, MFCC and Transition Parameters. The linear regression approach with sequential forward/backward method was used to predict the BDI-II scores using reading speech. The result showed 0.4096 mean absolute error (MAE) for female reading speech. For male, the BDI-II scores successfully predicted 100% less than 1 scores difference with MAE of 0.098437. A prediction system called Depression Severity Evaluator (DSE) was developed. The DSE managed to predict one out of five subjects. Although the prediction rate was low, the system precisely predict the score within the maximum difference of 4.93 for each person. This demonstrates that the scores are not random numbers.
Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

Science.gov (United States)

Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

2012-12-01

In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Novel acoustic features for speech emotion recognition

Institute of Scientific and Technical Information of China (English)

ROH; Yong-Wan; KIM; Dong-Ju; LEE; Woo-Seok; HONG; Kwang-Seok

2009-01-01

This paper focuses on acoustic features that effectively improve the recognition of emotion in human speech.The novel features in this paper are based on spectral-based entropy parameters such as fast Fourier transform(FFT) spectral entropy,delta FFT spectral entropy,Mel-frequency filter bank(MFB) spectral entropy,and Delta MFB spectral entropy.Spectral-based entropy features are simple.They reflect frequency characteristic and changing characteristic in frequency of speech.We implement an emotion rejection module using the probability distribution of recognized-scores and rejected-scores.This reduces the false recognition rate to improve overall performance.Recognized-scores and rejected-scores refer to probabilities of recognized and rejected emotion recognition results,respectively.These scores are first obtained from a pattern recognition procedure.The pattern recognition phase uses the Gaussian mixture model(GMM).We classify the four emotional states as anger,sadness,happiness and neutrality.The proposed method is evaluated using 45 sentences in each emotion for 30 subjects,15 males and 15 females.Experimental results show that the proposed method is superior to the existing emotion recognition methods based on GMM using energy,Zero Crossing Rate(ZCR),linear prediction coefficient(LPC),and pitch parameters.We demonstrate the effectiveness of the proposed approach.One of the proposed features,combined MFB and delta MFB spectral entropy improves performance approximately 10% compared to the existing feature parameters for speech emotion recognition methods.We demonstrate a 4% performance improvement in the applied emotion rejection with low confidence score.
Novel acoustic features for speech emotion recognition

Institute of Scientific and Technical Information of China (English)

ROH Yong-Wan; KIM Dong-Ju; LEE Woo-Seok; HONG Kwang-Seok

2009-01-01

This paper focuses on acoustic features that effectively improve the recognition of emotion in human speech. The novel features in this paper are based on spectral-based entropy parameters such as fast Fourier transform (FFT) spectral entropy, delta FFT spectral entropy, Mel-frequency filter bank (MFB)spectral entropy, and Delta MFB spectral entropy. Spectral-based entropy features are simple. They reflect frequency characteristic and changing characteristic in frequency of speech. We implement an emotion rejection module using the probability distribution of recognized-scores and rejected-scores.This reduces the false recognition rate to improve overall performance. Recognized-scores and rejected-scores refer to probabilities of recognized and rejected emotion recognition results, respectively.These scores are first obtained from a pattern recognition procedure. The pattern recognition phase uses the Gaussian mixture model (GMM). We classify the four emotional states as anger, sadness,happiness and neutrality. The proposed method is evaluated using 45 sentences in each emotion for 30 subjects, 15 males and 15 females. Experimental results show that the proposed method is superior to the existing emotion recognition methods based on GMM using energy, Zero Crossing Rate (ZCR),linear prediction coefficient (LPC), and pitch parameters. We demonstrate the effectiveness of the proposed approach. One of the proposed features, combined MFB and delta MFB spectral entropy improves performance approximately 10% compared to the existing feature parameters for speech emotion recognition methods. We demonstrate a 4% performance improvement in the applied emotion rejection with low confidence score.
Histogram Equalization to Model Adaptation for Robust Speech Recognition

Directory of Open Access Journals (Sweden)

Suh Youngjoo

2010-01-01

Full Text Available We propose a new model adaptation method based on the histogram equalization technique for providing robustness in noisy environments. The trained acoustic mean models of a speech recognizer are adapted into environmentally matched conditions by using the histogram equalization algorithm on a single utterance basis. For more robust speech recognition in the heavily noisy conditions, trained acoustic covariance models are efficiently adapted by the signal-to-noise ratio-dependent linear interpolation between trained covariance models and utterance-level sample covariance models. Speech recognition experiments on both the digit-based Aurora2 task and the large vocabulary-based task showed that the proposed model adaptation approach provides significant performance improvements compared to the baseline speech recognizer trained on the clean speech data.
Predicting musically induced emotions from physiological inputs: linear and neural network models.

Science.gov (United States)

Russo, Frank A; Vempala, Naresh N; Sandstrom, Gillian M

2013-01-01

Listening to music often leads to physiological responses. Do these physiological responses contain sufficient information to infer emotion induced in the listener? The current study explores this question by attempting to predict judgments of "felt" emotion from physiological responses alone using linear and neural network models. We measured five channels of peripheral physiology from 20 participants-heart rate (HR), respiration, galvanic skin response, and activity in corrugator supercilii and zygomaticus major facial muscles. Using valence and arousal (VA) dimensions, participants rated their felt emotion after listening to each of 12 classical music excerpts. After extracting features from the five channels, we examined their correlation with VA ratings, and then performed multiple linear regression to see if a linear relationship between the physiological responses could account for the ratings. Although linear models predicted a significant amount of variance in arousal ratings, they were unable to do so with valence ratings. We then used a neural network to provide a non-linear account of the ratings. The network was trained on the mean ratings of eight of the 12 excerpts and tested on the remainder. Performance of the neural network confirms that physiological responses alone can be used to predict musically induced emotion. The non-linear model derived from the neural network was more accurate than linear models derived from multiple linear regression, particularly along the valence dimension. A secondary analysis allowed us to quantify the relative contributions of inputs to the non-linear model. The study represents a novel approach to understanding the complex relationship between physiological responses and musically induced emotion.
Drug-Target Interaction Prediction through Label Propagation with Linear Neighborhood Information.

Science.gov (United States)

Zhang, Wen; Chen, Yanlin; Li, Dingfang

2017-11-25

Interactions between drugs and target proteins provide important information for the drug discovery. Currently, experiments identified only a small number of drug-target interactions. Therefore, the development of computational methods for drug-target interaction prediction is an urgent task of theoretical interest and practical significance. In this paper, we propose a label propagation method with linear neighborhood information (LPLNI) for predicting unobserved drug-target interactions. Firstly, we calculate drug-drug linear neighborhood similarity in the feature spaces, by considering how to reconstruct data points from neighbors. Then, we take similarities as the manifold of drugs, and assume the manifold unchanged in the interaction space. At last, we predict unobserved interactions between known drugs and targets by using drug-drug linear neighborhood similarity and known drug-target interactions. The experiments show that LPLNI can utilize only known drug-target interactions to make high-accuracy predictions on four benchmark datasets. Furthermore, we consider incorporating chemical structures into LPLNI models. Experimental results demonstrate that the model with integrated information (LPLNI-II) can produce improved performances, better than other state-of-the-art methods. The known drug-target interactions are an important information source for computational predictions. The usefulness of the proposed method is demonstrated by cross validation and the case study.
Auditory and Non-Auditory Contributions for Unaided Speech Recognition in Noise as a Function of Hearing Aid Use.

Science.gov (United States)

Gieseler, Anja; Tahden, Maike A S; Thiel, Christiane M; Wagener, Kirsten C; Meis, Markus; Colonius, Hans

2017-01-01

Differences in understanding speech in noise among hearing-impaired individuals cannot be explained entirely by hearing thresholds alone, suggesting the contribution of other factors beyond standard auditory ones as derived from the audiogram. This paper reports two analyses addressing individual differences in the explanation of unaided speech-in-noise performance among n = 438 elderly hearing-impaired listeners ( mean = 71.1 ± 5.8 years). The main analysis was designed to identify clinically relevant auditory and non-auditory measures for speech-in-noise prediction using auditory (audiogram, categorical loudness scaling) and cognitive tests (verbal-intelligence test, screening test of dementia), as well as questionnaires assessing various self-reported measures (health status, socio-economic status, and subjective hearing problems). Using stepwise linear regression analysis, 62% of the variance in unaided speech-in-noise performance was explained, with measures Pure-tone average (PTA), Age , and Verbal intelligence emerging as the three most important predictors. In the complementary analysis, those individuals with the same hearing loss profile were separated into hearing aid users (HAU) and non-users (NU), and were then compared regarding potential differences in the test measures and in explaining unaided speech-in-noise recognition. The groupwise comparisons revealed significant differences in auditory measures and self-reported subjective hearing problems, while no differences in the cognitive domain were found. Furthermore, groupwise regression analyses revealed that Verbal intelligence had a predictive value in both groups, whereas Age and PTA only emerged significant in the group of hearing aid NU.
Factors Affecting Acoustics and Speech Intelligibility in the Operating Room: Size Matters.

Science.gov (United States)

McNeer, Richard R; Bennett, Christopher L; Horn, Danielle Bodzin; Dudaryk, Roman

2017-06-01

Noise in health care settings has increased since 1960 and represents a significant source of dissatisfaction among staff and patients and risk to patient safety. Operating rooms (ORs) in which effective communication is crucial are particularly noisy. Speech intelligibility is impacted by noise, room architecture, and acoustics. For example, sound reverberation time (RT60) increases with room size, which can negatively impact intelligibility, while room objects are hypothesized to have the opposite effect. We explored these relationships by investigating room construction and acoustics of the surgical suites at our institution. We studied our ORs during times of nonuse. Room dimensions were measured to calculate room volumes (VR). Room content was assessed by estimating size and assigning items into 5 volume categories to arrive at an adjusted room content volume (VC) metric. Psychoacoustic analyses were performed by playing sweep tones from a speaker and recording the impulse responses (ie, resulting sound fields) from 3 locations in each room. The recordings were used to calculate 6 psychoacoustic indices of intelligibility. Multiple linear regression was performed using VR and VC as predictor variables and each intelligibility index as an outcome variable. A total of 40 ORs were studied. The surgical suites were characterized by a large degree of construction and surface finish heterogeneity and varied in size from 71.2 to 196.4 m (average VR = 131.1 [34.2] m). An insignificant correlation was observed between VR and VC (Pearson correlation = 0.223, P = .166). Multiple linear regression model fits and β coefficients for VR were highly significant for each of the intelligibility indices and were best for RT60 (R = 0.666, F(2, 37) = 39.9, P the size and contents of an OR can predict a range of psychoacoustic indices of speech intelligibility. Specifically, increasing OR size correlated with worse speech intelligibility, while increasing amounts of OR contents

Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling.

Science.gov (United States)

Kawashima, Issaku; Kumano, Hiroaki

2017-01-01

Mind-wandering (MW), task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR) to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.
Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling

Directory of Open Access Journals (Sweden)

Issaku Kawashima

2017-07-01

Full Text Available Mind-wandering (MW, task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.
Speech-like rhythm in a voiced and voiceless orangutan call.

Directory of Open Access Journals (Sweden)

Adriano R Lameira

Full Text Available The evolutionary origins of speech remain obscure. Recently, it was proposed that speech derived from monkey facial signals which exhibit a speech-like rhythm of ∼5 open-close lip cycles per second. In monkeys, these signals may also be vocalized, offering a plausible evolutionary stepping stone towards speech. Three essential predictions remain, however, to be tested to assess this hypothesis' validity; (i Great apes, our closest relatives, should likewise produce 5Hz-rhythm signals, (ii speech-like rhythm should involve calls articulatorily similar to consonants and vowels given that speech rhythm is the direct product of stringing together these two basic elements, and (iii speech-like rhythm should be experience-based. Via cinematic analyses we demonstrate that an ex-entertainment orangutan produces two calls at a speech-like rhythm, coined "clicks" and "faux-speech." Like voiceless consonants, clicks required no vocal fold action, but did involve independent manoeuvring over lips and tongue. In parallel to vowels, faux-speech showed harmonic and formant modulations, implying vocal fold and supralaryngeal action. This rhythm was several times faster than orangutan chewing rates, as observed in monkeys and humans. Critically, this rhythm was seven-fold faster, and contextually distinct, than any other known rhythmic calls described to date in the largest database of the orangutan repertoire ever assembled. The first two predictions advanced by this study are validated and, based on parsimony and exclusion of potential alternative explanations, initial support is given to the third prediction. Irrespectively of the putative origins of these calls and underlying mechanisms, our findings demonstrate irrevocably that great apes are not respiratorily, articulatorilly, or neurologically constrained for the production of consonant- and vowel-like calls at speech rhythm. Orangutan clicks and faux-speech confirm the importance of rhythmic speech
A unified frame of predicting side effects of drugs by using linear neighborhood similarity.

Science.gov (United States)

Zhang, Wen; Yue, Xiang; Liu, Feng; Chen, Yanlin; Tu, Shikui; Zhang, Xining

2017-12-14

Drug side effects are one of main concerns in the drug discovery, which gains wide attentions. Investigating drug side effects is of great importance, and the computational prediction can help to guide wet experiments. As far as we known, a great number of computational methods have been proposed for the side effect predictions. The assumption that similar drugs may induce same side effects is usually employed for modeling, and how to calculate the drug-drug similarity is critical in the side effect predictions. In this paper, we present a novel measure of drug-drug similarity named "linear neighborhood similarity", which is calculated in a drug feature space by exploring linear neighborhood relationship. Then, we transfer the similarity from the feature space into the side effect space, and predict drug side effects by propagating known side effect information through a similarity-based graph. Under a unified frame based on the linear neighborhood similarity, we propose method "LNSM" and its extension "LNSM-SMI" to predict side effects of new drugs, and propose the method "LNSM-MSE" to predict unobserved side effect of approved drugs. We evaluate the performances of LNSM and LNSM-SMI in predicting side effects of new drugs, and evaluate the performances of LNSM-MSE in predicting missing side effects of approved drugs. The results demonstrate that the linear neighborhood similarity can improve the performances of side effect prediction, and the linear neighborhood similarity-based methods can outperform existing side effect prediction methods. More importantly, the proposed methods can predict side effects of new drugs as well as unobserved side effects of approved drugs under a unified frame.
Causal inference of asynchronous audiovisual speech

Directory of Open Access Journals (Sweden)

John F Magnotti

2013-11-01

Full Text Available During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions abut the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter

DEFF Research Database (Denmark)

Waseem, Zeerak; Hovy, Dirk

2016-01-01

Hate speech in the form of racist and sexist remarks are a common occurrence on social media. For that reason, many social media services address the problem of identifying hate speech, but the definition of hate speech varies markedly and is largely a manual effort. We provide a list of criteria...
Auditory evoked potentials: predicting speech therapy outcomes in children with phonological disorders

Directory of Open Access Journals (Sweden)

Renata Aparecida Leite

2014-03-01

Full Text Available OBJECTIVES: This study investigated whether neurophysiologic responses (auditory evoked potentials differ between typically developed children and children with phonological disorders and whether these responses are modified in children with phonological disorders after speech therapy. METHODS: The participants included 24 typically developing children (Control Group, mean age: eight years and ten months and 23 children clinically diagnosed with phonological disorders (Study Group, mean age: eight years and eleven months. Additionally, 12 study group children were enrolled in speech therapy (Study Group 1, and 11 were not enrolled in speech therapy (Study Group 2. The subjects were submitted to the following procedures: conventional audiological, auditory brainstem response, auditory middle-latency response, and P300 assessments. All participants presented with normal hearing thresholds. The study group 1 subjects were reassessed after 12 speech therapy sessions, and the study group 2 subjects were reassessed 3 months after the initial assessment. Electrophysiological results were compared between the groups. RESULTS: Latency differences were observed between the groups (the control and study groups regarding the auditory brainstem response and the P300 tests. Additionally, the P300 responses improved in the study group 1 children after speech therapy. CONCLUSION: The findings suggest that children with phonological disorders have impaired auditory brainstem and cortical region pathways that may benefit from speech therapy.
Structured Semantic Knowledge Can Emerge Automatically from Predicting Word Sequences in Child-Directed Speech

Science.gov (United States)

Huebner, Philip A.; Willits, Jon A.

2018-01-01

Previous research has suggested that distributional learning mechanisms may contribute to the acquisition of semantic knowledge. However, distributional learning mechanisms, statistical learning, and contemporary “deep learning” approaches have been criticized for being incapable of learning the kind of abstract and structured knowledge that many think is required for acquisition of semantic knowledge. In this paper, we show that recurrent neural networks, trained on noisy naturalistic speech to children, do in fact learn what appears to be abstract and structured knowledge. We trained two types of recurrent neural networks (Simple Recurrent Network, and Long Short-Term Memory) to predict word sequences in a 5-million-word corpus of speech directed to children ages 0–3 years old, and assessed what semantic knowledge they acquired. We found that learned internal representations are encoding various abstract grammatical and semantic features that are useful for predicting word sequences. Assessing the organization of semantic knowledge in terms of the similarity structure, we found evidence of emergent categorical and hierarchical structure in both models. We found that the Long Short-term Memory (LSTM) and SRN are both learning very similar kinds of representations, but the LSTM achieved higher levels of performance on a quantitative evaluation. We also trained a non-recurrent neural network, Skip-gram, on the same input to compare our results to the state-of-the-art in machine learning. We found that Skip-gram achieves relatively similar performance to the LSTM, but is representing words more in terms of thematic compared to taxonomic relations, and we provide reasons why this might be the case. Our findings show that a learning system that derives abstract, distributed representations for the purpose of predicting sequential dependencies in naturalistic language may provide insight into emergence of many properties of the developing semantic system. PMID
Structured Semantic Knowledge Can Emerge Automatically from Predicting Word Sequences in Child-Directed Speech

Directory of Open Access Journals (Sweden)

Philip A. Huebner

2018-02-01

Full Text Available Previous research has suggested that distributional learning mechanisms may contribute to the acquisition of semantic knowledge. However, distributional learning mechanisms, statistical learning, and contemporary “deep learning” approaches have been criticized for being incapable of learning the kind of abstract and structured knowledge that many think is required for acquisition of semantic knowledge. In this paper, we show that recurrent neural networks, trained on noisy naturalistic speech to children, do in fact learn what appears to be abstract and structured knowledge. We trained two types of recurrent neural networks (Simple Recurrent Network, and Long Short-Term Memory to predict word sequences in a 5-million-word corpus of speech directed to children ages 0–3 years old, and assessed what semantic knowledge they acquired. We found that learned internal representations are encoding various abstract grammatical and semantic features that are useful for predicting word sequences. Assessing the organization of semantic knowledge in terms of the similarity structure, we found evidence of emergent categorical and hierarchical structure in both models. We found that the Long Short-term Memory (LSTM and SRN are both learning very similar kinds of representations, but the LSTM achieved higher levels of performance on a quantitative evaluation. We also trained a non-recurrent neural network, Skip-gram, on the same input to compare our results to the state-of-the-art in machine learning. We found that Skip-gram achieves relatively similar performance to the LSTM, but is representing words more in terms of thematic compared to taxonomic relations, and we provide reasons why this might be the case. Our findings show that a learning system that derives abstract, distributed representations for the purpose of predicting sequential dependencies in naturalistic language may provide insight into emergence of many properties of the developing
An analysis of machine translation and speech synthesis in speech-to-speech translation system

OpenAIRE

Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.

2011-01-01

This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...
Modeling speech intelligibility in adverse conditions

DEFF Research Database (Denmark)

Dau, Torsten

2012-01-01

) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting...... understanding speech when more than one person is talking, even when reduced audibility has been fully compensated for by a hearing aid. The reasons for these difficulties are not well understood. This presentation highlights recent concepts of the monaural and binaural signal processing strategies employed...... by the normal as well as impaired auditory system. Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII...
Predicting Voice Disorder Status From Smoothed Measures of Cepstral Peak Prominence Using Praat and Analysis of Dysphonia in Speech and Voice (ADSV).

Science.gov (United States)

Sauder, Cara; Bretl, Michelle; Eadie, Tanya

2017-09-01

The purposes of this study were to (1) determine and compare the diagnostic accuracy of a single acoustic measure, smoothed cepstral peak prominence (CPPS), to predict voice disorder status from connected speech samples using two software systems: Analysis of Dysphonia in Speech and Voice (ADSV) and Praat; and (2) to determine the relationship between measures of CPPS generated from these programs. This is a retrospective cross-sectional study. Measures of CPPS were obtained from connected speech recordings of 100 subjects with voice disorders and 70 nondysphonic subjects without vocal complaints using commercially available ADSV and freely downloadable Praat software programs. Logistic regression and receiver operating characteristic (ROC) analyses were used to evaluate and compare the diagnostic accuracy of CPPS measures. Relationships between CPPS measures from the programs were determined. Results showed acceptable overall accuracy rates (75% accuracy, ADSV; 82% accuracy, Praat) and area under the ROC curves (area under the curve [AUC] = 0.81, ADSV; AUC = 0.91, Praat) for predicting voice disorder status, with slight differences in sensitivity and specificity. CPPS measures derived from Praat were uniquely predictive of disorder status above and beyond CPPS measures from ADSV (χ 2 (1) = 40.71, P disorder status using either program. Clinicians may consider using CPPS to complement clinical voice evaluation and screening protocols. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Structural Dynamic Analyses And Test Predictions For Spacecraft Structures With Non-Linearities

Science.gov (United States)

Vergniaud, Jean-Baptiste; Soula, Laurent; Newerla, Alfred

2012-07-01

The overall objective of the mechanical development and verification process is to ensure that the spacecraft structure is able to sustain the mechanical environments encountered during launch. In general the spacecraft structures are a-priori assumed to behave linear, i.e. the responses to a static load or dynamic excitation, respectively, will increase or decrease proportionally to the amplitude of the load or excitation induced. However, past experiences have shown that various non-linearities might exist in spacecraft structures and the consequences of their dynamic effects can significantly affect the development and verification process. Current processes are mainly adapted to linear spacecraft structure behaviour. No clear rules exist for dealing with major structure non-linearities. They are handled outside the process by individual analysis and margin policy, and analyses after tests to justify the CLA coverage. Non-linearities can primarily affect the current spacecraft development and verification process on two aspects. Prediction of flights loads by launcher/satellite coupled loads analyses (CLA): only linear satellite models are delivered for performing CLA and no well-established rules exist how to properly linearize a model when non- linearities are present. The potential impact of the linearization on the results of the CLA has not yet been properly analyzed. There are thus difficulties to assess that CLA results will cover actual flight levels. Management of satellite verification tests: the CLA results generated with a linear satellite FEM are assumed flight representative. If the internal non- linearities are present in the tested satellite then there might be difficulties to determine which input level must be passed to cover satellite internal loads. The non-linear behaviour can also disturb the shaker control, putting the satellite at risk by potentially imposing too high levels. This paper presents the results of a test campaign performed in
Technical note: A linear model for predicting δ13 Cprotein.

Science.gov (United States)

Pestle, William J; Hubbe, Mark; Smith, Erin K; Stevenson, Joseph M

2015-08-01

Development of a model for the prediction of δ(13) Cprotein from δ(13) Ccollagen and Δ(13) Cap-co . Model-generated values could, in turn, serve as "consumer" inputs for multisource mixture modeling of paleodiet. Linear regression analysis of previously published controlled diet data facilitated the development of a mathematical model for predicting δ(13) Cprotein (and an experimentally generated error term) from isotopic data routinely generated during the analysis of osseous remains (δ(13) Cco and Δ(13) Cap-co ). Regression analysis resulted in a two-term linear model (δ(13) Cprotein (%) = (0.78 × δ(13) Cco ) - (0.58× Δ(13) Cap-co ) - 4.7), possessing a high R-value of 0.93 (r(2) = 0.86, P analysis of human osseous remains. These predicted values are ideal for use in multisource mixture modeling of dietary protein source contribution. © 2015 Wiley Periodicals, Inc.
Tuning Neural Phase Entrainment to Speech.

Science.gov (United States)

Falk, Simone; Lanzilotti, Cosima; Schön, Daniele

2017-08-01

Musical rhythm positively impacts on subsequent speech processing. However, the neural mechanisms underlying this phenomenon are so far unclear. We investigated whether carryover effects from a preceding musical cue to a speech stimulus result from a continuation of neural phase entrainment to periodicities that are present in both music and speech. Participants listened and memorized French metrical sentences that contained (quasi-)periodic recurrences of accents and syllables. Speech stimuli were preceded by a rhythmically regular or irregular musical cue. Our results show that the presence of a regular cue modulates neural response as estimated by EEG power spectral density, intertrial coherence, and source analyses at critical frequencies during speech processing compared with the irregular condition. Importantly, intertrial coherences for regular cues were indicative of the participants' success in memorizing the subsequent speech stimuli. These findings underscore the highly adaptive nature of neural phase entrainment across fundamentally different auditory stimuli. They also support current models of neural phase entrainment as a tool of predictive timing and attentional selection across cognitive domains.
Noise Reduction with Optimal Variable Span Linear Filters

DEFF Research Database (Denmark)

Jensen, Jesper Rindom; Benesty, Jacob; Christensen, Mads Græsbøll

2016-01-01

In this paper, the problem of noise reduction is addressed as a linear filtering problem in a novel way by using concepts from subspace-based enhancement methods, resulting in variable span linear filters. This is done by forming the filter coefficients as linear combinations of a number...... included in forming the filter. Using these concepts, a number of different filter designs are considered, like minimum distortion, Wiener, maximum SNR, and tradeoff filters. Interestingly, all these can be expressed as special cases of variable span filters. We also derive expressions for the speech...... demonstrate the advantages and properties of the variable span filter designs, and their potential performance gain compared to widely used speech enhancement methods....
Modeling Speech Intelligibility in Hearing Impaired Listeners

DEFF Research Database (Denmark)

Scheidiger, Christoph; Jørgensen, Søren; Dau, Torsten

2014-01-01

speech, e.g. phase jitter or spectral subtraction. Recent studies predict SI for normal-hearing (NH) listeners based on a signal-to-noise ratio measure in the envelope domain (SNRenv), in the framework of the speech-based envelope power spectrum model (sEPSM, [20, 21]). These models have shown good...... agreement with measured data under a broad range of conditions, including stationary and modulated interferers, reverberation, and spectral subtraction. Despite the advances in modeling intelligibility in NH listeners, a broadly applicable model that can predict SI in hearing-impaired (HI) listeners...... is not yet available. As a firrst step towards such a model, this study investigates to what extent eects of hearing impairment on SI can be modeled in the sEPSM framework. Preliminary results show that, by only modeling the loss of audibility, the model cannot account for the higher speech reception...
Sensitivity of the Speech Intelligibility Index to the Assumed Dynamic Range

Science.gov (United States)

Jin, In-Ki; Kates, James M.; Arehart, Kathryn H.

2017-01-01

Purpose: This study aims to evaluate the sensitivity of the speech intelligibility index (SII) to the assumed speech dynamic range (DR) in different languages and with different types of stimuli. Method: Intelligibility prediction uses the absolute transfer function (ATF) to map the SII value to the predicted intelligibility for a given stimuli.…
Psychoacoustic cues to emotion in speech prosody and music.

Science.gov (United States)

Coutinho, Eduardo; Dibben, Nicola

2013-01-01

There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
Modelling the Architecture of Phonetic Plans: Evidence from Apraxia of Speech

Science.gov (United States)

Ziegler, Wolfram

2009-01-01

In theories of spoken language production, the gestural code prescribing the movements of the speech organs is usually viewed as a linear string of holistic, encapsulated, hard-wired, phonetic plans, e.g., of the size of phonemes or syllables. Interactions between phonetic units on the surface of overt speech are commonly attributed to either the…

Predicting musically induced emotions from physiological inputs: Linear and neural network models

Directory of Open Access Journals (Sweden)

Frank A. Russo

2013-08-01

Full Text Available Listening to music often leads to physiological responses. Do these physiological responses contain sufficient information to infer emotion induced in the listener? The current study explores this question by attempting to predict judgments of 'felt' emotion from physiological responses alone using linear and neural network models. We measured five channels of peripheral physiology from 20 participants – heart rate, respiration, galvanic skin response, and activity in corrugator supercilii and zygomaticus major facial muscles. Using valence and arousal (VA dimensions, participants rated their felt emotion after listening to each of 12 classical music excerpts. After extracting features from the five channels, we examined their correlation with VA ratings, and then performed multiple linear regression to see if a linear relationship between the physiological responses could account for the ratings. Although linear models predicted a significant amount of variance in arousal ratings, they were unable to do so with valence ratings. We then used a neural network to provide a nonlinear account of the ratings. The network was trained on the mean ratings of eight of the 12 excerpts and tested on the remainder. Performance of the neural network confirms that physiological responses alone can be used to predict musically induced emotion. The nonlinear model derived from the neural network was more accurate than linear models derived from multiple linear regression, particularly along the valence dimension. A secondary analysis allowed us to quantify the relative contributions of inputs to the nonlinear model. The study represents a novel approach to understanding the complex relationship between physiological responses and musically induced emotion.
The Hierarchical Cortical Organization of Human Speech Processing.

Science.gov (United States)

de Heer, Wendy A; Huth, Alexander G; Griffiths, Thomas L; Gallant, Jack L; Theunissen, Frédéric E

2017-07-05

Speech comprehension requires that the brain extract semantic meaning from the spectral features represented at the cochlea. To investigate this process, we performed an fMRI experiment in which five men and two women passively listened to several hours of natural narrative speech. We then used voxelwise modeling to predict BOLD responses based on three different feature spaces that represent the spectral, articulatory, and semantic properties of speech. The amount of variance explained by each feature space was then assessed using a separate validation dataset. Because some responses might be explained equally well by more than one feature space, we used a variance partitioning analysis to determine the fraction of the variance that was uniquely explained by each feature space. Consistent with previous studies, we found that speech comprehension involves hierarchical representations starting in primary auditory areas and moving laterally on the temporal lobe: spectral features are found in the core of A1, mixtures of spectral and articulatory in STG, mixtures of articulatory and semantic in STS, and semantic in STS and beyond. Our data also show that both hemispheres are equally and actively involved in speech perception and interpretation. Further, responses as early in the auditory hierarchy as in STS are more correlated with semantic than spectral representations. These results illustrate the importance of using natural speech in neurolinguistic research. Our methodology also provides an efficient way to simultaneously test multiple specific hypotheses about the representations of speech without using block designs and segmented or synthetic speech. SIGNIFICANCE STATEMENT To investigate the processing steps performed by the human brain to transform natural speech sound into meaningful language, we used models based on a hierarchical set of speech features to predict BOLD responses of individual voxels recorded in an fMRI experiment while subjects listened to
Neural Entrainment to Speech Modulates Speech Intelligibility

NARCIS (Netherlands)

Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne

2018-01-01

Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and
Song and speech: examining the link between singing talent and speech imitation ability.

Science.gov (United States)

Christiner, Markus; Reiterer, Susanne M

2013-01-01

In previous research on speech imitation, musicality, and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer's sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and auditory memory with singing fitting better into the category of "speech" on the productive level and "music" on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. (1) Motor flexibility and the ability to sing improve language and musical function. (2) Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. (3) The ability to sing improves the memory span of the auditory working memory.
Song and speech: examining the link between singing talent and speech imitation ability

Directory of Open Access Journals (Sweden)

Markus eChristiner

2013-11-01

Full Text Available In previous research on speech imitation, musicality and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Fourty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64 % of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66 % of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi could be explained by working memory together with a singer’s sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and sound memory with singing fitting better into the category of "speech" on the productive level and "music" on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. 1. Motor flexibility and the ability to sing improve language and musical function. 2. Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. 3. The ability to sing improves the memory span of the auditory short term memory.
Instantaneous Fundamental Frequency Estimation with Optimal Segmentation for Nonstationary Voiced Speech

DEFF Research Database (Denmark)

Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

2016-01-01

In speech processing, the speech is often considered stationary within segments of 20–30 ms even though it is well known not to be true. In this paper, we take the non-stationarity of voiced speech into account by using a linear chirp model to describe the speech signal. We propose a maximum...... likelihood estimator of the fundamental frequency and chirp rate of this model, and show that it reaches the Cramer-Rao bound. Since the speech varies over time, a fixed segment length is not optimal, and we propose to make a segmentation of the signal based on the maximum a posteriori (MAP) criterion. Using...... of the chirp model than the harmonic model to the speech signal. The methods are based on an assumption of white Gaussian noise, and, therefore, two prewhitening filters are also proposed....
Linear degrees of freedom in speech production: analysis of cineradio- and labio-film data and articulatory-acoustic modeling.

Science.gov (United States)

Beautemps, D; Badin, P; Bailly, G

2001-05-01

The following contribution addresses several issues concerning speech degrees of freedom in French oral vowels, stop, and fricative consonants based on an analysis of tongue and lip shapes extracted from cineradio- and labio-films. The midsagittal tongue shapes have been submitted to a linear decomposition where some of the loading factors were selected such as jaw and larynx position while four other components were derived from principal component analysis (PCA). For the lips, in addition to the more traditional protrusion and opening components, a supplementary component was extracted to explain the upward movement of both the upper and lower lips in [v] production. A linear articulatory model was developed; the six tongue degrees of freedom were used as the articulatory control parameters of the midsagittal tongue contours and explained 96% of the tongue data variance. These control parameters were also used to specify the frontal lip width dimension derived from the labio-film front views. Finally, this model was complemented by a conversion model going from the midsagittal to the area function, based on a fitting of the midsagittal distances and the formant frequencies for both vowels and consonants.
Modified linear predictive coding approach for moving target tracking by Doppler radar

Science.gov (United States)

Ding, Yipeng; Lin, Xiaoyi; Sun, Ke-Hui; Xu, Xue-Mei; Liu, Xi-Yao

2016-07-01

Doppler radar is a cost-effective tool for moving target tracking, which can support a large range of civilian and military applications. A modified linear predictive coding (LPC) approach is proposed to increase the target localization accuracy of the Doppler radar. Based on the time-frequency analysis of the received echo, the proposed approach first real-time estimates the noise statistical parameters and constructs an adaptive filter to intelligently suppress the noise interference. Then, a linear predictive model is applied to extend the available data, which can help improve the resolution of the target localization result. Compared with the traditional LPC method, which empirically decides the extension data length, the proposed approach develops an error array to evaluate the prediction accuracy and thus, adjust the optimum extension data length intelligently. Finally, the prediction error array is superimposed with the predictor output to correct the prediction error. A series of experiments are conducted to illustrate the validity and performance of the proposed techniques.
Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech

DEFF Research Database (Denmark)

Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

2016-01-01

In this paper, single channel speech enhancement in the time domain is considered. We address the problem of modelling non-stationary speech by describing the voiced speech parts by a harmonic linear chirp model instead of using the traditional harmonic model. This means that the speech signal...... through simulations on synthetic and speech signals, that the chirp versions of the filters perform better than their harmonic counterparts in terms of output signal-to-noise ratio (SNR) and signal reduction factor. For synthetic signals, the output SNR for the harmonic chirp APES based filter...... is increased 3 dB compared to the harmonic APES based filter at an input SNR of 10 dB, and at the same time the signal reduction factor is decreased. For speech signals, the increase is 1.5 dB along with a decrease in the signal reduction factor of 0.7. As an implicit part of the APES filter, a noise...
Subjective and Objective Quality Assessment of Single-Channel Speech Separation Algorithms

DEFF Research Database (Denmark)

Mowlaee, Pejman; Saeidi, Rahim; Christensen, Mads Græsbøll

2012-01-01

Previous studies on performance evaluation of single-channel speech separation (SCSS) algorithms mostly focused on automatic speech recognition (ASR) accuracy as their performance measure. Assessing the separated signals by different metrics other than this has the benefit that the results...... are expected to carry on to other applications beyond ASR. In this paper, in addition to conventional speech quality metrics (PESQ and SNRloss), we also evaluate the separation systems output using different source separation metrics: blind source separation evaluation (BSS EVAL) and perceptual evaluation...... that PESQ and PEASS quality metrics predict well the subjective quality of separated signals obtained by the separation systems. From the results it is observed that the short-time objective intelligibility (STOI) measure predict the speech intelligibility results....
Linear and nonlinear models for predicting fish bioconcentration factors for pesticides.

Science.gov (United States)

Yuan, Jintao; Xie, Chun; Zhang, Ting; Sun, Jinfang; Yuan, Xuejie; Yu, Shuling; Zhang, Yingbiao; Cao, Yunyuan; Yu, Xingchen; Yang, Xuan; Yao, Wu

2016-08-01

This work is devoted to the applications of the multiple linear regression (MLR), multilayer perceptron neural network (MLP NN) and projection pursuit regression (PPR) to quantitative structure-property relationship analysis of bioconcentration factors (BCFs) of pesticides tested on Bluegill (Lepomis macrochirus). Molecular descriptors of a total of 107 pesticides were calculated with the DRAGON Software and selected by inverse enhanced replacement method. Based on the selected DRAGON descriptors, a linear model was built by MLR, nonlinear models were developed using MLP NN and PPR. The robustness of the obtained models was assessed by cross-validation and external validation using test set. Outliers were also examined and deleted to improve predictive power. Comparative results revealed that PPR achieved the most accurate predictions. This study offers useful models and information for BCF prediction, risk assessment, and pesticide formulation. Copyright © 2016 Elsevier Ltd. All rights reserved.
Mathematical modeling and signal processing in speech and hearing sciences

CERN Document Server

Xin, Jack

2014-01-01

The aim of the book is to give an accessible introduction of mathematical models and signal processing methods in speech and hearing sciences for senior undergraduate and beginning graduate students with basic knowledge of linear algebra, differential equations, numerical analysis, and probability. Speech and hearing sciences are fundamental to numerous technological advances of the digital world in the past decade, from music compression in MP3 to digital hearing aids, from network based voice enabled services to speech interaction with mobile phones. Mathematics and computation are intimately related to these leaps and bounds. On the other hand, speech and hearing are strongly interdisciplinary areas where dissimilar scientific and engineering publications and approaches often coexist and make it difficult for newcomers to enter.
PRESCHOOL SPEECH ARTICULATION AND NONWORD REPETITION ABILITIES MAY HELP PREDICT EVENTUAL RECOVERY OR PERSISTENCE OF STUTTERING

Science.gov (United States)

Spencer, Caroline; Weber-Fox, Christine

2014-01-01

Purpose In preschool children, we investigated whether expressive and receptive language, phonological, articulatory, and/or verbal working memory proficiencies aid in predicting eventual recovery or persistence of stuttering. Methods Participants included 65 children, including 25 children who do not stutter (CWNS) and 40 who stutter (CWS) recruited at age 3;9–5;8. At initial testing, participants were administered the Test of Auditory Comprehension of Language, 3rd edition (TACL-3), Structured Photographic Expressive Language Test, 3rd edition (SPELT-3), Bankson-Bernthal Test of Phonology-Consonant Inventory subtest (BBTOP-CI), Nonword Repetition Test (NRT; Dollaghan & Campbell, 1998), and Test of Auditory Perceptual Skills-Revised (TAPS-R) auditory number memory and auditory word memory subtests. Stuttering behaviors of CWS were assessed in subsequent years, forming groups whose stuttering eventually persisted (CWS-Per; n=19) or recovered (CWS-Rec; n=21). Proficiency scores in morphosyntactic skills, consonant production, verbal working memory for known words, and phonological working memory and speech production for novel nonwords obtained at the initial testing were analyzed for each group. Results CWS-Per were less proficient than CWNS and CWS-Rec in measures of consonant production (BBTOP-CI) and repetition of novel phonological sequences (NRT). In contrast, receptive language, expressive language, and verbal working memory abilities did not distinguish CWS-Rec from CWS-Per. Binary logistic regression analysis indicated that preschool BBTOP-CI scores and overall NRT proficiency significantly predicted future recovery status. Conclusion Results suggest that phonological and speech articulation abilities in the preschool years should be considered with other predictive factors as part of a comprehensive risk assessment for the development of chronic stuttering. PMID:25173455
Nonverbal oral apraxia in primary progressive aphasia and apraxia of speech.

Science.gov (United States)

Botha, Hugo; Duffy, Joseph R; Strand, Edythe A; Machulda, Mary M; Whitwell, Jennifer L; Josephs, Keith A

2014-05-13

The goal of this study was to explore the prevalence of nonverbal oral apraxia (NVOA), its association with other forms of apraxia, and associated imaging findings in patients with primary progressive aphasia (PPA) and progressive apraxia of speech (PAOS). Patients with a degenerative speech or language disorder were prospectively recruited and diagnosed with a subtype of PPA or with PAOS. All patients had comprehensive speech and language examinations. Voxel-based morphometry was performed to determine whether atrophy of a specific region correlated with the presence of NVOA. Eighty-nine patients were identified, of which 34 had PAOS, 9 had agrammatic PPA, 41 had logopenic aphasia, and 5 had semantic dementia. NVOA was very common among patients with PAOS but was found in patients with PPA as well. Several patients exhibited only one of NVOA or apraxia of speech. Among patients with apraxia of speech, the severity of the apraxia of speech was predictive of NVOA, whereas ideomotor apraxia severity was predictive of the presence of NVOA in those without apraxia of speech. Bilateral atrophy of the prefrontal cortex anterior to the premotor area and supplementary motor area was associated with NVOA. Apraxia of speech, NVOA, and ideomotor apraxia are at least partially separable disorders. The association of NVOA and apraxia of speech likely results from the proximity of the area reported here and the premotor area, which has been implicated in apraxia of speech. The association of ideomotor apraxia and NVOA among patients without apraxia of speech could represent disruption of modules shared by nonverbal oral movements and limb movements.
Freedom of racist speech: Ego and expressive threats.

Science.gov (United States)

White, Mark H; Crandall, Christian S

2017-09-01

Do claims of "free speech" provide cover for prejudice? We investigate whether this defense of racist or hate speech serves as a justification for prejudice. In a series of 8 studies (N = 1,624), we found that explicit racial prejudice is a reliable predictor of the "free speech defense" of racist expression. Participants endorsed free speech values for singing racists songs or posting racist comments on social media; people high in prejudice endorsed free speech more than people low in prejudice (meta-analytic r = .43). This endorsement was not principled-high levels of prejudice did not predict endorsement of free speech values when identical speech was directed at coworkers or the police. Participants low in explicit racial prejudice actively avoided endorsing free speech values in racialized conditions compared to nonracial conditions, but participants high in racial prejudice increased their endorsement of free speech values in racialized conditions. Three experiments failed to find evidence that defense of racist speech by the highly prejudiced was based in self-relevant or self-protective motives. Two experiments found evidence that the free speech argument protected participants' own freedom to express their attitudes; the defense of other's racist speech seems motivated more by threats to autonomy than threats to self-regard. These studies serve as an elaboration of the Justification-Suppression Model (Crandall & Eshleman, 2003) of prejudice expression. The justification of racist speech by endorsing fundamental political values can serve to buffer racial and hate speech from normative disapproval. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Statistical analysis of acoustic characteristics of Tibetan Lhasa dialect speech emotion

Directory of Open Access Journals (Sweden)

Guo Dandan

2016-01-01

Full Text Available The paper makes a quantitative analysis and comparison on the continuous speech emotion of Lhasa Tibetan in the four basic emotional patterns (happy, surprise, sad, neutral pitch, energy and time length by experimental phonetics and the linear statistical research methods, found that there is a positive correlation between the Lhasa Tibetan emotional speech and pitch, energy and duration, etc. And the pitch, energy and duration of negative emotion acoustic parameters are bigger than positive emotion, on this basis, drawing the Lhasa Tibetan speech emotion acoustic feature patterns. Compared with the Chinese language and the Tibetan, even though both have the tone prosodic features, they also have significant differences in the acoustic characteristics of the speech emotion.
Song and speech: examining the link between singing talent and speech imitation ability

Science.gov (United States)

Christiner, Markus; Reiterer, Susanne M.

2013-01-01

In previous research on speech imitation, musicality, and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer's sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and auditory memory with singing fitting better into the category of “speech” on the productive level and “music” on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. (1) Motor flexibility and the ability to sing improve language and musical function. (2) Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. (3) The ability to sing improves the memory span of the auditory working memory. PMID:24319438
Inner Speech's Relationship With Overt Speech in Poststroke Aphasia.

Science.gov (United States)

Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A

2017-09-18

Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Al, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p speech and language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.
A national-scale model of linear features improves predictions of farmland biodiversity.

Science.gov (United States)

Sullivan, Martin J P; Pearce-Higgins, James W; Newson, Stuart E; Scholefield, Paul; Brereton, Tom; Oliver, Tom H

2017-12-01

Modelling species distribution and abundance is important for many conservation applications, but it is typically performed using relatively coarse-scale environmental variables such as the area of broad land-cover types. Fine-scale environmental data capturing the most biologically relevant variables have the potential to improve these models. For example, field studies have demonstrated the importance of linear features, such as hedgerows, for multiple taxa, but the absence of large-scale datasets of their extent prevents their inclusion in large-scale modelling studies.We assessed whether a novel spatial dataset mapping linear and woody-linear features across the UK improves the performance of abundance models of 18 bird and 24 butterfly species across 3723 and 1547 UK monitoring sites, respectively.Although improvements in explanatory power were small, the inclusion of linear features data significantly improved model predictive performance for many species. For some species, the importance of linear features depended on landscape context, with greater importance in agricultural areas. Synthesis and applications . This study demonstrates that a national-scale model of the extent and distribution of linear features improves predictions of farmland biodiversity. The ability to model spatial variability in the role of linear features such as hedgerows will be important in targeting agri-environment schemes to maximally deliver biodiversity benefits. Although this study focuses on farmland, data on the extent of different linear features are likely to improve species distribution and abundance models in a wide range of systems and also can potentially be used to assess habitat connectivity.
Prediction of minimum temperatures in an alpine region by linear and non-linear post-processing of meteorological models

Directory of Open Access Journals (Sweden)

R. Barbiero

2007-05-01

Full Text Available Model Output Statistics (MOS refers to a method of post-processing the direct outputs of numerical weather prediction (NWP models in order to reduce the biases introduced by a coarse horizontal resolution. This technique is especially useful in orographically complex regions, where large differences can be found between the NWP elevation model and the true orography. This study carries out a comparison of linear and non-linear MOS methods, aimed at the prediction of minimum temperatures in a fruit-growing region of the Italian Alps, based on the output of two different NWPs (ECMWF T511–L60 and LAMI-3. Temperature, of course, is a particularly important NWP output; among other roles it drives the local frost forecast, which is of great interest to agriculture. The mechanisms of cold air drainage, a distinctive aspect of mountain environments, are often unsatisfactorily captured by global circulation models. The simplest post-processing technique applied in this work was a correction for the mean bias, assessed at individual model grid points. We also implemented a multivariate linear regression on the output at the grid points surrounding the target area, and two non-linear models based on machine learning techniques: Neural Networks and Random Forest. We compare the performance of all these techniques on four different NWP data sets. Downscaling the temperatures clearly improved the temperature forecasts with respect to the raw NWP output, and also with respect to the basic mean bias correction. Multivariate methods generally yielded better results, but the advantage of using non-linear algorithms was small if not negligible. RF, the best performing method, was implemented on ECMWF prognostic output at 06:00 UTC over the 9 grid points surrounding the target area. Mean absolute errors in the prediction of 2 m temperature at 06:00 UTC were approximately 1.2°C, close to the natural variability inside the area itself.

Musician advantage for speech-on-speech perception

NARCIS (Netherlands)

Başkent, Deniz; Gaudrain, Etienne

Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level
How and when predictability interacts with accentuation in temporally selective attention during speech comprehension.

Science.gov (United States)

Li, Xiaoqing; Lu, Yong; Zhao, Haiyan

2014-11-01

The present study used EEG to investigate how and when top-down prediction interacts with bottom-up acoustic signals in temporally selective attention during speech comprehension. Mandarin Chinese spoken sentences were used as stimuli. We systematically manipulated the predictability and de/accentuation of the critical words in the sentence context. Meanwhile, a linguistic attention probe 'ba' was presented concurrently with the critical words or not. The results showed that, first, words with a linguistic attention probe elicited a larger N1 than those without a probe. The latency of this N1 effect was shortened for accented or lowly predictable words, indicating more attentional resources allocated to these words. Importantly, prediction and accentuation showed a complementary interplay on the latency of this N1 effect, demonstrating that when the words had already attracted attention due to low predictability or due to the presence of pitch accent, the other factor did not modulate attention allocation anymore. Second, relative to the lowly predictable words, the highly predictable words elicited a reduced N400 and enhanced gamma-band power increases, especially under the accented conditions; moreover, under the accented conditions, shorter N1 peak-latency was found to correlate with larger gamma-band power enhancement, which indicates that a close relationship might exist between early selective attention and later semantic integration. Finally, the interaction between top-down selective attention (driven by prediction) and bottom-up selective attention (driven by accentuation) occurred before lexical-semantic processing, namely before the N400 effect evoked by predictability, which was discussed with regard to the language comprehension models. Copyright © 2014 Elsevier Ltd. All rights reserved.
Warped Linear Prediction of Physical Model Excitations with Applications in Audio Compression and Instrument Synthesis

Science.gov (United States)

Glass, Alexis; Fukudome, Kimitoshi

2004-12-01

A sound recording of a plucked string instrument is encoded and resynthesized using two stages of prediction. In the first stage of prediction, a simple physical model of a plucked string is estimated and the instrument excitation is obtained. The second stage of prediction compensates for the simplicity of the model in the first stage by encoding either the instrument excitation or the model error using warped linear prediction. These two methods of compensation are compared with each other, and to the case of single-stage warped linear prediction, adjustments are introduced, and their applications to instrument synthesis and MPEG4's audio compression within the structured audio format are discussed.
Applications of Kalman filters based on non-linear functions to numerical weather predictions

Directory of Open Access Journals (Sweden)

G. Galanis

2006-10-01

Full Text Available This paper investigates the use of non-linear functions in classical Kalman filter algorithms on the improvement of regional weather forecasts. The main aim is the implementation of non linear polynomial mappings in a usual linear Kalman filter in order to simulate better non linear problems in numerical weather prediction. In addition, the optimal order of the polynomials applied for such a filter is identified. This work is based on observations and corresponding numerical weather predictions of two meteorological parameters characterized by essential differences in their evolution in time, namely, air temperature and wind speed. It is shown that in both cases, a polynomial of low order is adequate for eliminating any systematic error, while higher order functions lead to instabilities in the filtered results having, at the same time, trivial contribution to the sensitivity of the filter. It is further demonstrated that the filter is independent of the time period and the geographic location of application.
Applications of Kalman filters based on non-linear functions to numerical weather predictions

Directory of Open Access Journals (Sweden)

G. Galanis

2006-10-01

Full Text Available This paper investigates the use of non-linear functions in classical Kalman filter algorithms on the improvement of regional weather forecasts. The main aim is the implementation of non linear polynomial mappings in a usual linear Kalman filter in order to simulate better non linear problems in numerical weather prediction. In addition, the optimal order of the polynomials applied for such a filter is identified. This work is based on observations and corresponding numerical weather predictions of two meteorological parameters characterized by essential differences in their evolution in time, namely, air temperature and wind speed. It is shown that in both cases, a polynomial of low order is adequate for eliminating any systematic error, while higher order functions lead to instabilities in the filtered results having, at the same time, trivial contribution to the sensitivity of the filter. It is further demonstrated that the filter is independent of the time period and the geographic location of application.
Analysis of Acoustic Features in Speakers with Cognitive Disorders and Speech Impairments

Science.gov (United States)

Saz, Oscar; Simón, Javier; Rodríguez, W. Ricardo; Lleida, Eduardo; Vaquero, Carlos

2009-12-01

This work presents the results in the analysis of the acoustic features (formants and the three suprasegmental features: tone, intensity and duration) of the vowel production in a group of 14 young speakers suffering different kinds of speech impairments due to physical and cognitive disorders. A corpus with unimpaired children's speech is used to determine the reference values for these features in speakers without any kind of speech impairment within the same domain of the impaired speakers; this is 57 isolated words. The signal processing to extract the formant and pitch values is based on a Linear Prediction Coefficients (LPCs) analysis of the segments considered as vowels in a Hidden Markov Model (HMM) based Viterbi forced alignment. Intensity and duration are also based in the outcome of the automated segmentation. As main conclusion of the work, it is shown that intelligibility of the vowel production is lowered in impaired speakers even when the vowel is perceived as correct by human labelers. The decrease in intelligibility is due to a 30% of increase in confusability in the formants map, a reduction of 50% in the discriminative power in energy between stressed and unstressed vowels and to a 50% increase of the standard deviation in the length of the vowels. On the other hand, impaired speakers keep good control of tone in the production of stressed and unstressed vowels.
Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling

Directory of Open Access Journals (Sweden)

Eric R. Edelman

2017-06-01

Full Text Available For efficient utilization of operating rooms (ORs, accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT. We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT. TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related
Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling.

Science.gov (United States)

Edelman, Eric R; van Kuijk, Sander M J; Hamaekers, Ankie E W; de Korte, Marcel J M; van Merode, Godefridus G; Buhre, Wolfgang F F A

2017-01-01

For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related benefits.
Relation between speech-in-noise threshold, hearing loss and cognition from 40-69 years of age.

Science.gov (United States)

Moore, David R; Edmondson-Jones, Mark; Dawes, Piers; Fortnum, Heather; McCormack, Abby; Pierzycki, Robert H; Munro, Kevin J

2014-01-01

Healthy hearing depends on sensitive ears and adequate brain processing. Essential aspects of both hearing and cognition decline with advancing age, but it is largely unknown how one influences the other. The current standard measure of hearing, the pure-tone audiogram is not very cognitively demanding and does not predict well the most important yet challenging use of hearing, listening to speech in noisy environments. We analysed data from UK Biobank that asked 40-69 year olds about their hearing, and assessed their ability on tests of speech-in-noise hearing and cognition. About half a million volunteers were recruited through NHS registers. Respondents completed 'whole-body' testing in purpose-designed, community-based test centres across the UK. Objective hearing (spoken digit recognition in noise) and cognitive (reasoning, memory, processing speed) data were analysed using logistic and multiple regression methods. Speech hearing in noise declined exponentially with age for both sexes from about 50 years, differing from previous audiogram data that showed a more linear decline from speech-in-noise hearing was especially dramatic among those with lower cognitive scores. Decreasing cognitive ability and increasing age were both independently associated with decreasing ability to hear speech-in-noise (0.70 and 0.89 dB, respectively) among the population studied. Men subjectively reported up to 60% higher rates of difficulty hearing than women. Workplace noise history associated with difficulty in both subjective hearing and objective speech hearing in noise. Leisure noise history was associated with subjective, but not with objective difficulty hearing. Older people have declining cognitive processing ability associated with reduced ability to hear speech in noise, measured by recognition of recorded spoken digits. Subjective reports of hearing difficulty generally show a higher prevalence than objective measures, suggesting that current objective methods could
Measures to Evaluate the Effects of DBS on Speech Production

Science.gov (United States)

Weismer, Gary; Yunusova, Yana; Bunton, Kate

2011-01-01

The purpose of this paper is to review and evaluate measures of speech production that could be used to document effects of Deep Brain Stimulation (DBS) on speech performance, especially in persons with Parkinson disease (PD). A small set of evaluative criteria for these measures is presented first, followed by consideration of several speech physiology and speech acoustic measures that have been studied frequently and reported on in the literature on normal speech production, and speech production affected by neuromotor disorders (dysarthria). Each measure is reviewed and evaluated against the evaluative criteria. Embedded within this review and evaluation is a presentation of new data relating speech motions to speech intelligibility measures in speakers with PD, amyotrophic lateral sclerosis (ALS), and control speakers (CS). These data are used to support the conclusion that at the present time the slope of second formant transitions (F2 slope), an acoustic measure, is well suited to make inferences to speech motion and to predict speech intelligibility. The use of other measures should not be ruled out, however, and we encourage further development of evaluative criteria for speech measures designed to probe the effects of DBS or any treatment with potential effects on speech production and communication skills. PMID:24932066
Relationship between pure-tone audiogram findings and speech perception among older Japanese persons.

Science.gov (United States)

Maeda, Yukihide; Takao, Soshi; Sugaya, Akiko; Kataoka, Yuko; Kariya, Shin; Tanaka, Satomi; Nagayasu, Rie; Nakagawa, Atsuko; Nishizaki, Kazunori

2018-02-01

To clarify how the pure-tone threshold (PTT) on the PTA predicts speech perception (SP) in elderly Japanese persons. Data on PTT and SP were cross-sectionally analyzed in Japanese persons (656 ears in 353 patients, aged ≥65 years). Correlations of SP and average PTT in all tested frequencies were evaluated by Pearson's correlation coefficient and simple linear regression. After adjusting for sex, laterality of ears, and age, the relationship of average and frequency-specific PTT with impaired SP ≤50% was estimated by logistic regression models. SP correlated well (r = -0.699) with the average PTT of all tested frequencies. On the other hand, the correlation between patient age and SP was weak, especially among ≤85-year-old persons (r = -0.092). Linear regression showed that the average PTT corresponding to SP of 50% was 76.4 dB nHL. Odds ratios for impaired SP were highest for PTT at 2000 Hz. Odds ratios were higher for middle (500, 1000, 2000 Hz) and high frequencies (4000, 8000 Hz) than low frequencies (125, 250 Hz). The PTT on the pure-tone audiogram (PTA) is a good predictor of SP by speech audiometry among older persons, which could provide clinically important information for hearing aid fitting and cochlear implantation.
Speech endpoint detection with non-language speech sounds for generic speech processing applications

Science.gov (United States)

McClain, Matthew; Romanowski, Brian

2009-05-01

Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known apriori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detection certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.
Cortical oscillations and entrainment in speech processing during working memory load.

Science.gov (United States)

Hjortkjaer, Jens; Märcher-Rørsted, Jonatan; Fuglsang, Søren A; Dau, Torsten

2018-02-02

Neuronal oscillations are thought to play an important role in working memory (WM) and speech processing. Listening to speech in real-life situations is often cognitively demanding but it is unknown whether WM load influences how auditory cortical activity synchronizes to speech features. Here, we developed an auditory n-back paradigm to investigate cortical entrainment to speech envelope fluctuations under different degrees of WM load. We measured the electroencephalogram, pupil dilations and behavioural performance from 22 subjects listening to continuous speech with an embedded n-back task. The speech stimuli consisted of long spoken number sequences created to match natural speech in terms of sentence intonation, syllabic rate and phonetic content. To burden different WM functions during speech processing, listeners performed an n-back task on the speech sequences in different levels of background noise. Increasing WM load at higher n-back levels was associated with a decrease in posterior alpha power as well as increased pupil dilations. Frontal theta power increased at the start of the trial and increased additionally with higher n-back level. The observed alpha-theta power changes are consistent with visual n-back paradigms suggesting general oscillatory correlates of WM processing load. Speech entrainment was measured as a linear mapping between the envelope of the speech signal and low-frequency cortical activity (level) decreased cortical speech envelope entrainment. Although entrainment persisted under high load, our results suggest a top-down influence of WM processing on cortical speech entrainment. © 2018 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Speech-associated gestures, Broca’s area, and the human mirror system

Science.gov (United States)

Skipper, Jeremy I.; Goldin-Meadow, Susan; Nusbaum, Howard C.; Small, Steven L

2009-01-01

Speech-associated gestures are hand and arm movements that not only convey semantic information to listeners but are themselves actions. Broca’s area has been assumed to play an important role both in semantic retrieval or selection (as part of a language comprehension system) and in action recognition (as part of a “mirror” or “observation–execution matching” system). We asked whether the role that Broca’s area plays in processing speech-associated gestures is consistent with the semantic retrieval/selection account (predicting relatively weak interactions between Broca’s area and other cortical areas because the meaningful information that speech-associated gestures convey reduces semantic ambiguity and thus reduces the need for semantic retrieval/selection) or the action recognition account (predicting strong interactions between Broca’s area and other cortical areas because speech-associated gestures are goal-direct actions that are “mirrored”). We compared the functional connectivity of Broca’s area with other cortical areas when participants listened to stories while watching meaningful speech-associated gestures, speech-irrelevant self-grooming hand movements, or no hand movements. A network analysis of neuroimaging data showed that interactions involving Broca’s area and other cortical areas were weakest when spoken language was accompanied by meaningful speech-associated gestures, and strongest when spoken language was accompanied by self-grooming hand movements or by no hand movements at all. Results are discussed with respect to the role that the human mirror system plays in processing speech-associated movements. PMID:17533001
[Effect of speech estimation on social anxiety].

Science.gov (United States)

Shirotsuki, Kentaro; Sasagawa, Satoko; Nomura, Shinobu

2009-02-01

This study investigates the effect of speech estimation on social anxiety to further understanding of this characteristic of Social Anxiety Disorder (SAD). In the first study, we developed the Speech Estimation Scale (SES) to assess negative estimation before giving a speech which has been reported to be the most fearful social situation in SAD. Undergraduate students (n = 306) completed a set of questionnaires, which consisted of the Short Fear of Negative Evaluation Scale (SFNE), the Social Interaction Anxiety Scale (SIAS), the Social Phobia Scale (SPS), and the SES. Exploratory factor analysis showed an adequate one-factor structure with eight items. Further analysis indicated that the SES had good reliability and validity. In the second study, undergraduate students (n = 315) completed the SFNE, SIAS, SPS, SES, and the Self-reported Depression Scale (SDS). The results of path analysis showed that fear of negative evaluation from others (FNE) predicted social anxiety, and speech estimation mediated the relationship between FNE and social anxiety. These results suggest that speech estimation might maintain SAD symptoms, and could be used as a specific target for cognitive intervention in SAD.
The Contribution of Cognitive Factors to Individual Differences in Understanding Noise-Vocoded Speech in Young and Older Adults

Directory of Open Access Journals (Sweden)

Stephanie Rosemann

2017-06-01

Full Text Available Noise-vocoded speech is commonly used to simulate the sensation after cochlear implantation as it consists of spectrally degraded speech. High individual variability exists in learning to understand both noise-vocoded speech and speech perceived through a cochlear implant (CI. This variability is partly ascribed to differing cognitive abilities like working memory, verbal skills or attention. Although clinically highly relevant, up to now, no consensus has been achieved about which cognitive factors exactly predict the intelligibility of speech in noise-vocoded situations in healthy subjects or in patients after cochlear implantation. We aimed to establish a test battery that can be used to predict speech understanding in patients prior to receiving a CI. Young and old healthy listeners completed a noise-vocoded speech test in addition to cognitive tests tapping on verbal memory, working memory, lexicon and retrieval skills as well as cognitive flexibility and attention. Partial-least-squares analysis revealed that six variables were important to significantly predict vocoded-speech performance. These were the ability to perceive visually degraded speech tested by the Text Reception Threshold, vocabulary size assessed with the Multiple Choice Word Test, working memory gauged with the Operation Span Test, verbal learning and recall of the Verbal Learning and Retention Test and task switching abilities tested by the Comprehensive Trail-Making Test. Thus, these cognitive abilities explain individual differences in noise-vocoded speech understanding and should be considered when aiming to predict hearing-aid outcome.
Rapid Statistical Learning Supporting Word Extraction From Continuous Speech.

Science.gov (United States)

Batterink, Laura J

2017-07-01

The identification of words in continuous speech, known as speech segmentation, is a critical early step in language acquisition. This process is partially supported by statistical learning, the ability to extract patterns from the environment. Given that speech segmentation represents a potential bottleneck for language acquisition, patterns in speech may be extracted very rapidly, without extensive exposure. This hypothesis was examined by exposing participants to continuous speech streams composed of novel repeating nonsense words. Learning was measured on-line using a reaction time task. After merely one exposure to an embedded novel word, learners demonstrated significant learning effects, as revealed by faster responses to predictable than to unpredictable syllables. These results demonstrate that learners gained sensitivity to the statistical structure of unfamiliar speech on a very rapid timescale. This ability may play an essential role in early stages of language acquisition, allowing learners to rapidly identify word candidates and "break in" to an unfamiliar language.
Linear and Non-linear Multi-Input Multi-Output Model Predictive Control of Continuous Stirred Tank Reactor

Directory of Open Access Journals (Sweden)

Muayad Al-Qaisy

2015-02-01

Full Text Available In this article, multi-input multi-output (MIMO linear model predictive controller (LMPC based on state space model and nonlinear model predictive controller based on neural network (NNMPC are applied on a continuous stirred tank reactor (CSTR. The idea is to have a good control system that will be able to give optimal performance, reject high load disturbance, and track set point change. In order to study the performance of the two model predictive controllers, MIMO Proportional-Integral-Derivative controller (PID strategy is used as benchmark. The LMPC, NNMPC, and PID strategies are used for controlling the residual concentration (CA and reactor temperature (T. NNMPC control shows a superior performance over the LMPC and PID controllers by presenting a smaller overshoot and shorter settling time.
Non-linear frequency scale mapping for voice conversion in text-to-speech system with cepstral description

Czech Academy of Sciences Publication Activity Database

Přibilová, Anna; Přibil, Jiří

2006-01-01

Roč. 48, č. 12 (2006), s. 1691-1703 ISSN 0167-6393 R&D Projects: GA MŠk(CZ) OC 277.001; GA AV ČR(CZ) 1QS108040569 Grant - others:MŠk(SK) 102/VTP/2000; MŠk(SK) 1/3107/06 Institutional research plan: CEZ:AV0Z20670512 Keywords : signal processing * speech processing * speech synthesis Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering Impact factor: 0.678, year: 2006
On the Evaluation of the Conversational Speech Quality in Telecommunications

Directory of Open Access Journals (Sweden)

Vincent Barriac

2008-04-01

Full Text Available We propose an objective method to assess speech quality in the conversational context by taking into account the talking and listening speech qualities and the impact of delay. This approach is applied to the results of four subjective tests on the effects of echo, delay, packet loss, and noise. The dataset is divided into training and validation sets. For the training set, a multiple linear regression is applied to determine a relationship between conversational, talking, and listening speech qualities and the delay value. The multiple linear regression leads to an accurate estimation of the conversational scores with high correlation and low error between subjective and estimated scores, both on the training and validation sets. In addition, a validation is performed on the data of a subjective test found in the literature which confirms the reliability of the regression. The relationship is then applied to an objective level by replacing talking and listening subjective scores with talking and listening objective scores provided by existing objective models, fed by speech signals recorded during the subjective tests. The conversational model achieves high performance as revealed by comparison with the test results and with the existing standard methodology Ã¢Â€ÂœE-model,Ã¢Â€Â presented in the ITU-T (International Telecommunication Union Recommendation G.107.

Specific acoustic models for spontaneous and dictated style in indonesian speech recognition

Science.gov (United States)

Vista, C. B.; Satriawan, C. H.; Lestari, D. P.; Widyantoro, D. H.

2018-03-01

The performance of an automatic speech recognition system is affected by differences in speech style between the data the model is originally trained upon and incoming speech to be recognized. In this paper, the usage of GMM-HMM acoustic models for specific speech styles is investigated. We develop two systems for the experiments; the first employs a speech style classifier to predict the speech style of incoming speech, either spontaneous or dictated, then decodes this speech using an acoustic model specifically trained for that speech style. The second system uses both acoustic models to recognise incoming speech and decides upon a final result by calculating a confidence score of decoding. Results show that training specific acoustic models for spontaneous and dictated speech styles confers a slight recognition advantage as compared to a baseline model trained on a mixture of spontaneous and dictated training data. In addition, the speech style classifier approach of the first system produced slightly more accurate results than the confidence scoring employed in the second system.
A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech.

Directory of Open Access Journals (Sweden)

John F Magnotti

2017-02-01

Full Text Available Audiovisual speech integration combines information from auditory speech (talker's voice and visual speech (talker's mouth movements to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory "ba" + visual "ga" (AbaVga, that are integrated to produce a fused percept ("da". This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba. We describe a simplified model of causal inference in multisensory speech perception (CIMS that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others.
Music and Speech Perception in Children Using Sung Speech.

Science.gov (United States)

Nie, Yingjiu; Galvin, John J; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie

2018-01-01

This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet was significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.
Should visual speech cues (speechreading) be considered when fitting hearing aids?

Science.gov (United States)

Grant, Ken

2002-05-01

When talker and listener are face-to-face, visual speech cues become an important part of the communication environment, and yet, these cues are seldom considered when designing hearing aids. Models of auditory-visual speech recognition highlight the importance of complementary versus redundant speech information for predicting auditory-visual recognition performance. Thus, for hearing aids to work optimally when visual speech cues are present, it is important to know whether the cues provided by amplification and the cues provided by speechreading complement each other. In this talk, data will be reviewed that show nonmonotonicity between auditory-alone speech recognition and auditory-visual speech recognition, suggesting that efforts designed solely to improve auditory-alone recognition may not always result in improved auditory-visual recognition. Data will also be presented showing that one of the most important speech cues for enhancing auditory-visual speech recognition performance, voicing, is often the cue that benefits least from amplification.
The Interaction of Temporal and Spectral Acoustic Information with Word Predictability on Speech Intelligibility

Science.gov (United States)

Shahsavarani, Somayeh Bahar

High-level, top-down information such as linguistic knowledge is a salient cortical resource that influences speech perception under most listening conditions. But, are all listeners able to exploit these resources for speech facilitation to the same extent? It was found that children with cochlear implants showed different patterns of benefit from contextual information in speech perception compared with their normal-haring peers. Previous studies have discussed the role of non-acoustic factors such as linguistic and cognitive capabilities to account for this discrepancy. Given the fact that the amount of acoustic information encoded and processed by auditory nerves of listeners with cochlear implants differs from normal-hearing listeners and even varies across individuals with cochlear implants, it is important to study the interaction of specific acoustic properties of the speech signal with contextual cues. This relationship has been mostly neglected in previous research. In this dissertation, we aimed to explore how different acoustic dimensions interact to affect listeners' abilities to combine top-down information with bottom-up information in speech perception beyond the known effects of linguistic and cognitive capacities shown previously. Specifically, the present study investigated whether there were any distinct context effects based on the resolution of spectral versus slowly-varying temporal information in perception of spectrally impoverished speech. To that end, two experiments were conducted. In both experiments, a noise-vocoded technique was adopted to generate spectrally-degraded speech to approximate acoustic cues delivered to listeners with cochlear implants. The frequency resolution was manipulated by varying the number of frequency channels. The temporal resolution was manipulated by low-pass filtering of amplitude envelope with varying low-pass cutoff frequencies. The stimuli were presented to normal-hearing native speakers of American
Apraxia of Speech

Science.gov (United States)

... Health Info » Voice, Speech, and Language Apraxia of Speech On this page: What is apraxia of speech? ... about apraxia of speech? What is apraxia of speech? Apraxia of speech (AOS)—also known as acquired ...
Genomic prediction based on data from three layer lines using non-linear regression models

NARCIS (Netherlands)

Huang, H.; Windig, J.J.; Vereijken, A.; Calus, M.P.L.

2014-01-01

Background - Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. Methods - In an attempt to alleviate
A Search Complexity Improvement of Vector Quantization to Immittance Spectral Frequency Coefficients in AMR-WB Speech Codec

Directory of Open Access Journals (Sweden)

Bing-Jhih Yao

2016-09-01

Full Text Available An adaptive multi-rate wideband (AMR-WB code is a speech codec developed on the basis of an algebraic code-excited linear-prediction (ACELP coding technique, and has a double advantage of low bit rates and high speech quality. This coding technique is widely used in modern mobile communication systems for a high speech quality in handheld devices. However, a major disadvantage is that a vector quantization (VQ of immittance spectral frequency (ISF coefficients occupies a significant computational load in the AMR-WB encoder. Hence, this paper presents a triangular inequality elimination (TIE algorithm combined with a dynamic mechanism and an intersection mechanism, abbreviated as the DI-TIE algorithm, to remarkably improve the complexity of ISF coefficient quantization in the AMR-WB speech codec. Both mechanisms are designed in a way that recursively enhances the performance of the TIE algorithm. At the end of this work, this proposal is experimentally validated as a superior search algorithm relative to a conventional TIE, a multiple TIE (MTIE, and an equal-average equal-variance equal-norm nearest neighbor search (EEENNS approach. With a full search algorithm as a benchmark for search load comparison, this work provides a search load reduction above 77%, a figure far beyond 36% in the TIE, 49% in the MTIE, and 68% in the EEENNS approach.
Iterated non-linear model predictive control based on tubes and contractive constraints.

Science.gov (United States)

Murillo, M; Sánchez, G; Giovanini, L

2016-05-01

This paper presents a predictive control algorithm for non-linear systems based on successive linearizations of the non-linear dynamic around a given trajectory. A linear time varying model is obtained and the non-convex constrained optimization problem is transformed into a sequence of locally convex ones. The robustness of the proposed algorithm is addressed adding a convex contractive constraint. To account for linearization errors and to obtain more accurate results an inner iteration loop is added to the algorithm. A simple methodology to obtain an outer bounding-tube for state trajectories is also presented. The convergence of the iterative process and the stability of the closed-loop system are analyzed. The simulation results show the effectiveness of the proposed algorithm in controlling a quadcopter type unmanned aerial vehicle. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
Common neural substrates support speech and non-speech vocal tract gestures.

Science.gov (United States)

Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M J; Poletto, Christopher J; Ludlow, Christy L

2009-08-01

The issue of whether speech is supported by the same neural substrates as non-speech vocal tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial activation overlap between speech and non-speech function in regions. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings posit a more general role of the previously proposed "auditory dorsal stream" in the left hemisphere--to support the production of vocal tract gestures that are not limited to speech processing.
Bayesian prediction of spatial count data using generalized linear mixed models

DEFF Research Database (Denmark)

Christensen, Ole Fredslund; Waagepetersen, Rasmus Plenge

2002-01-01

Spatial weed count data are modeled and predicted using a generalized linear mixed model combined with a Bayesian approach and Markov chain Monte Carlo. Informative priors for a data set with sparse sampling are elicited using a previously collected data set with extensive sampling. Furthermore, ...
Ordinal models of audiovisual speech perception

DEFF Research Database (Denmark)

Andersen, Tobias

2011-01-01

Audiovisual information is integrated in speech perception. One manifestation of this is the McGurk illusion in which watching the articulating face alters the auditory phonetic percept. Understanding this phenomenon fully requires a computational model with predictive power. Here, we describe...
Channel normalization technique for speech recognition in mismatched conditions

CSIR Research Space (South Africa)

Kleynhans, N

2008-11-01

Full Text Available , where one wishes to use any available training data for a variety of purposes. Research into a new channel normalization (CN) technique for channel mismatched speech recognition is presented. A process of inverse linear filtering is used in order...
No, there is no 150 ms lead of visual speech on auditory speech, but a range of audiovisual asynchronies varying from small audio lead to large audio lag.

Directory of Open Access Journals (Sweden)

Jean-Luc Schwartz

2014-07-01

Full Text Available An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call "preparatory gestures". However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call "comodulatory gestures" providing auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.
Preschool speech articulation and nonword repetition abilities may help predict eventual recovery or persistence of stuttering.

Science.gov (United States)

Spencer, Caroline; Weber-Fox, Christine

2014-09-01

In preschool children, we investigated whether expressive and receptive language, phonological, articulatory, and/or verbal working memory proficiencies aid in predicting eventual recovery or persistence of stuttering. Participants included 65 children, including 25 children who do not stutter (CWNS) and 40 who stutter (CWS) recruited at age 3;9-5;8. At initial testing, participants were administered the Test of Auditory Comprehension of Language, 3rd edition (TACL-3), Structured Photographic Expressive Language Test, 3rd edition (SPELT-3), Bankson-Bernthal Test of Phonology-Consonant Inventory subtest (BBTOP-CI), Nonword Repetition Test (NRT; Dollaghan & Campbell, 1998), and Test of Auditory Perceptual Skills-Revised (TAPS-R) auditory number memory and auditory word memory subtests. Stuttering behaviors of CWS were assessed in subsequent years, forming groups whose stuttering eventually persisted (CWS-Per; n=19) or recovered (CWS-Rec; n=21). Proficiency scores in morphosyntactic skills, consonant production, verbal working memory for known words, and phonological working memory and speech production for novel nonwords obtained at the initial testing were analyzed for each group. CWS-Per were less proficient than CWNS and CWS-Rec in measures of consonant production (BBTOP-CI) and repetition of novel phonological sequences (NRT). In contrast, receptive language, expressive language, and verbal working memory abilities did not distinguish CWS-Rec from CWS-Per. Binary logistic regression analysis indicated that preschool BBTOP-CI scores and overall NRT proficiency significantly predicted future recovery status. Results suggest that phonological and speech articulation abilities in the preschool years should be considered with other predictive factors as part of a comprehensive risk assessment for the development of chronic stuttering. At the end of this activity the reader will be able to: (1) describe the current status of nonlinguistic and linguistic predictors for
Financial Distress Prediction using Linear Discriminant Analysis and Support Vector Machine

Science.gov (United States)

Santoso, Noviyanti; Wibowo, Wahyu

2018-03-01

A financial difficulty is the early stages before the bankruptcy. Bankruptcies caused by the financial distress can be seen from the financial statements of the company. The ability to predict financial distress became an important research topic because it can provide early warning for the company. In addition, predicting financial distress is also beneficial for investors and creditors. This research will be made the prediction model of financial distress at industrial companies in Indonesia by comparing the performance of Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) combined with variable selection technique. The result of this research is prediction model based on hybrid Stepwise-SVM obtains better balance among fitting ability, generalization ability and model stability than the other models.
THE DIRECTIVE SPEECH ACTS USED IN ENGLISH SPEAKING CLASS

Directory of Open Access Journals (Sweden)

Muhammad Khatib Bayanuddin

2016-12-01

Full Text Available This research discusses about an analysis of the directive speech acts used in english speaking class at the third semester of english speaking class of english study program of IAIN STS Jambi. The aims of this research are to describe the types of directive speech acts and politeness strategies that found in English speaking class. This research used descriptive qualitative method. This method used to describe clearly about the types and politeness strategies of directive speech acts based on the data in English speaking class. The result showed that in English speaking class that there are some types and politeness strategies of directive speech acts, such as: requestives, questions, requirements, prohibitives, permissives, and advisores as types, as well as on-record indirect strategies (prediction statement, strong obligation statement, possibility statement, weaker obligation statement, volitional statement, direct strategies (imperative, performative, and nonsentential strategies as politeness strategies. The achievement of this research are hoped can be additional knowledge about linguistics study, especially in directive speech acts and can be developed for future researches. Key words: directive speech acts, types, politeness strategies.
Causes of Speech Disorders in Primary School Students of Zahedan

Directory of Open Access Journals (Sweden)

Saeed Fakhrerahimi

2013-02-01

Full Text Available Background: Since making communication with others is the most important function of speech, undoubtedly, any type of disorder in speech will affect the human communicability with others. The objective of the study was to investigate reasons behind the [high] prevalence rate of stammer, producing disorders and aglossia.Materials and Methods: This descriptive-analytical study was conducted on 118 male and female students, who were studying in a primary school in Zahedan; they had referred to the Speech Therapy Centers of Zahedan University of Medical Sciences in a period of seven months. The speech therapist examinations, diagnosis tools common in speech therapy, Spielberg Children Trait and also patients' cases were used to find the reasons behind the [high] prevalence rate of speech disorders. Results: Psychological causes had the highest rate of correlation with the speech disorders among the other factors affecting the speech disorders. After psychological causes, family history and age of the subjects are the other factors which may bring about the speech disorders (P<0.05. Bilingualism and birth order has a negative relationship with the speech disorders. Likewise, another result of this study shows that only psychological causes, social causes, hereditary causes and age of subjects can predict the speech disorders (P<0.05.Conclusion: The present study shows that the speech disorders have a strong and close relationship with the psychological causes at the first step and also history of family and age of individuals at the next steps.
The role of high-frequency envelope fluctuations for speech masking release

DEFF Research Database (Denmark)

Jørgensen, Søren; Dau, Torsten

2013-01-01

The speech-based envelope power spectrum model (sEPSM; Jørgensen and Dau, 2011; Jørgensen et al., 2013) was shown to successfully predict speech intelligibility in conditions with stationary and fluctuating interferers, reverberation, and spectral subtraction. The key element in the model...... was the multi-resolution estimation of the signal-to-noise ratio in the envelope domain (SNRenv) at the output of a modulation filterbank. The simulations suggested that mainly modulation filters centered in the range from 1-8 Hz contribute to speech intelligibility in the case of stationary maskers whereas...... modulation filters tuned to frequencies above 16 Hz might be important in the case of fluctuating maskers. In the present study, the role of high-frequency envelope fluctuations for speech masking release was further investigated in conditions of speech-on-speech masking. Simulations were compared to various...
The role of high-frequency envelope fluctuations for speech masking release

DEFF Research Database (Denmark)

Jørgensen, Søren; Dau, Torsten

2013-01-01

The speech-based envelope power spectrum model [sEPSM; Jørgensen and Dau (2011), Jørgensen et al. (2013)] was shown to successfully predict speech intelligibility in conditions with stationary and fluctuating interferers, reverberation, and spectral subtraction. The key element in the model...... was the multi-resolution estimation of the signal-to-noise ratio in the envelope domain (SNRenv) at the output of a modulation filterbank. The simulations suggested that mainly modulation filters centered in the range from 1 to 8 Hz contribute to speech intelligibility in the case of stationary maskers whereas...... modulation filters tuned to frequencies above 16 Hz might be important in the case of fluctuating maskers. In the present study, the role of high-frequency envelope fluctuations for speech masking release was further investigated in conditions of speech-on-speech masking. Simulations were compared to various...

Objective support for subjective reports of successful inner speech in two people with aphasia.

Science.gov (United States)

Hayward, William; Snider, Sarah F; Luta, George; Friedman, Rhonda B; Turkeltaub, Peter E

2016-01-01

People with aphasia frequently report being able to say a word correctly in their heads, even if they are unable to say that word aloud. It is difficult to know what is meant by these reports of "successful inner speech". We probe the experience of successful inner speech in two people with aphasia. We show that these reports are associated with correct overt speech and phonologically related nonword errors, that they relate to word characteristics associated with ease of lexical access but not ease of production, and that they predict whether or not individual words are relearned during anomia treatment. These findings suggest that reports of successful inner speech are meaningful and may be useful to study self-monitoring in aphasia, to better understand anomia, and to predict treatment outcomes. Ultimately, the study of inner speech in people with aphasia could provide critical insights that inform our understanding of normal language.
An Investigation of effective factors on nurses\\' speech errors

Directory of Open Access Journals (Sweden)

Maryam Tafaroji yeganeh

2017-03-01

Full Text Available Background : Speech errors are a branch of psycholinguistic science. Speech error or slip of tongue is a natural process that happens to everyone. The importance of this research is because of sensitivity and importance of nursing in which the speech errors may be interfere in the treatment of patients, but unfortunately no research has been done yet in this field.This research has been done to study the factors (personality, stress, fatigue and insomnia which cause speech errors happen to nurses of Ilam province. Materials and Methods: The sample of this correlation-descriptive research consists of 50 nurses working in Mustafa Khomeini Hospital of Ilam province who were selected randomly. Our data were collected using The Minnesota Multiphasic Personality Inventory, NEO-Five Factor Inventory and Expanded Nursing Stress Scale, and were analyzed using SPSS version 20, descriptive, inferential and multivariate linear regression or two-variable statistical methods (with significant level: p≤0. 05. Results: 30 (60% of nurses participating in the study were female and 19 (38% were male. In this study, all three factors (type of personality, stress and fatigue have significant effects on nurses' speech errors Conclusion: 30 (60% of nurses participating in the study were female and 19 (38% were male. In this study, all three factors (type of personality, stress and fatigue have significant effects on nurses' speech errors.
Examining speech perception in noise and cognitive functions in the elderly.

Science.gov (United States)

Meister, Hartmut; Schreitmüller, Stefan; Grugel, Linda; Beutner, Dirk; Walger, Martin; Meister, Ingo

2013-12-01

The purpose of this study was to investigate the relationship of cognitive functions (i.e., working memory [WM]) and speech recognition against different background maskers in older individuals. Speech reception thresholds (SRTs) were determined using a matrix-sentence test. Unmodulated noise, modulated noise (International Collegium for Rehabilitative Audiology [ICRA] noise 5-250), and speech fragments (International Speech Test Signal [ISTS]) were used as background maskers. Verbal WM was assessed using the Verbal Learning and Memory Test (VLMT; Helmstaedter & Durwen, 1990). Measurements were conducted with 14 normal-hearing older individuals and a control group of 12 normal-hearing young listeners. Despite their normal hearing ability, the young listeners outperformed the older individuals in all background maskers. These differences were largest for the modulated maskers. SRTs were significantly correlated with the scores of the VLMT. A linear regression model also included WM as the only significant predictor variable. The results support the assumption that WM plays an important role for speech understanding and that it might have impact on results obtained using speech audiometry. Thus, an individual's WM capacity should be considered with aural diagnosis and rehabilitation. The VLMT proved to be a clinically applicable test for WM. Further cognitive functions important with speech understanding are currently being investigated within the SAKoLA (Sprachaudiometrie und kognitive Leistungen im Alter [Speech Audiometry and Cognitive Functions in the Elderly]) project.
Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals: Effects of Adding Visual Cues to Auditory Speech Stimuli.

Science.gov (United States)

Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

2016-06-17

The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. © The Author(s) 2016.
Using NCAP to predict RFI effects in linear bipolar integrated circuits

Science.gov (United States)

Fang, T.-F.; Whalen, J. J.; Chen, G. K. C.

1980-11-01

Applications of the Nonlinear Circuit Analysis Program (NCAP) to calculate RFI effects in electronic circuits containing discrete semiconductor devices have been reported upon previously. The objective of this paper is to demonstrate that the computer program NCAP also can be used to calcuate RFI effects in linear bipolar integrated circuits (IC's). The IC's reported upon are the microA741 operational amplifier (op amp) which is one of the most widely used IC's, and a differential pair which is a basic building block in many linear IC's. The microA741 op amp was used as the active component in a unity-gain buffer amplifier. The differential pair was used in a broad-band cascode amplifier circuit. The computer program NCAP was used to predict how amplitude-modulated RF signals are demodulated in the IC's to cause undesired low-frequency responses. The predicted and measured results for radio frequencies in the 0.050-60-MHz range are in good agreement.
The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility.

Science.gov (United States)

Bentsen, Thomas; May, Tobias; Kressner, Abigail A; Dau, Torsten

2018-01-01

Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one. Therefore, both components play significant roles and by combining them, speech intelligibility improvements were obtained in a six-talker condition at a low signal-to-noise ratio.
Lexical and sublexical units in speech perception.

Science.gov (United States)

Giroux, Ibrahima; Rey, Arnaud

2009-03-01

Saffran, Newport, and Aslin (1996a) found that human infants are sensitive to statistical regularities corresponding to lexical units when hearing an artificial spoken language. Two sorts of segmentation strategies have been proposed to account for this early word-segmentation ability: bracketing strategies, in which infants are assumed to insert boundaries into continuous speech, and clustering strategies, in which infants are assumed to group certain speech sequences together into units (Swingley, 2005). In the present study, we test the predictions of two computational models instantiating each of these strategies i.e., Serial Recurrent Networks: Elman, 1990; and Parser: Perruchet & Vinter, 1998 in an experiment where we compare the lexical and sublexical recognition performance of adults after hearing 2 or 10 min of an artificial spoken language. The results are consistent with Parser's predictions and the clustering approach, showing that performance on words is better than performance on part-words only after 10 min. This result suggests that word segmentation abilities are not merely due to stronger associations between sublexical units but to the emergence of stronger lexical representations during the development of speech perception processes. Copyright © 2009, Cognitive Science Society, Inc.
Predicting recovery of cognitive function soon after stroke: differential modeling of logarithmic and linear regression.

Science.gov (United States)

Suzuki, Makoto; Sugimura, Yuko; Yamada, Sumio; Omori, Yoshitsugu; Miyamoto, Masaaki; Yamamoto, Jun-ichi

2013-01-01

Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm and linear of time. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R(2) = 0.676, PLogarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.
Predicting Attitudes toward Press- and Speech Freedom across the U.S.A.: A Test of Climato-Economic, Parasite Stress, and Life History Theories.

Science.gov (United States)

Zhang, Jinguang; Reid, Scott A; Xu, Jing

2015-01-01

National surveys reveal notable individual differences in U.S. citizens' attitudes toward freedom of expression, including freedom of the press and speech. Recent theoretical developments and empirical findings suggest that ecological factors impact censorship attitudes in addition to individual difference variables (e.g., education, conservatism), but no research has compared the explanatory power of prominent ecological theories. This study tested climato-economic, parasite stress, and life history theories using four measures of attitudes toward censoring the press and offensive speech obtained from two national surveys in the U.S.A. Neither climate demands nor its interaction with state wealth--two key variables for climato-economic theory--predicted any of the four outcome measures. Interstate parasite stress significantly predicted two, with a marginally significant effect on the third, but the effects became non-significant when the analyses were stratified for race (as a control for extrinsic risks). Teenage birth rates (a proxy of human life history) significantly predicted attitudes toward press freedom during wartime, but the effect was the opposite of what life history theory predicted. While none of the three theories provided a fully successful explanation of individual differences in attitudes toward freedom of expression, parasite stress and life history theories do show potentials. Future research should continue examining the impact of these ecological factors on human psychology by further specifying the mechanisms and developing better measures for those theories.
Comparing the information conveyed by envelope modulation for speech intelligibility, speech quality, and music quality.

Science.gov (United States)

Kates, James M; Arehart, Kathryn H

2015-10-01

This paper uses mutual information to quantify the relationship between envelope modulation fidelity and perceptual responses. Data from several previous experiments that measured speech intelligibility, speech quality, and music quality are evaluated for normal-hearing and hearing-impaired listeners. A model of the auditory periphery is used to generate envelope signals, and envelope modulation fidelity is calculated using the normalized cross-covariance of the degraded signal envelope with that of a reference signal. Two procedures are used to describe the envelope modulation: (1) modulation within each auditory frequency band and (2) spectro-temporal processing that analyzes the modulation of spectral ripple components fit to successive short-time spectra. The results indicate that low modulation rates provide the highest information for intelligibility, while high modulation rates provide the highest information for speech and music quality. The low-to-mid auditory frequencies are most important for intelligibility, while mid frequencies are most important for speech quality and high frequencies are most important for music quality. Differences between the spectral ripple components used for the spectro-temporal analysis were not significant in five of the six experimental conditions evaluated. The results indicate that different modulation-rate and auditory-frequency weights may be appropriate for indices designed to predict different types of perceptual relationships.
Speech Production and Speech Discrimination by Hearing-Impaired Children.

Science.gov (United States)

Novelli-Olmstead, Tina; Ling, Daniel

1984-01-01

Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…
Implementation of Hybrid Speech Dereverberation Systems and Proposing Dual Microphone Farsi Database in Order to Evaluating Enhancement Systems

Directory of Open Access Journals (Sweden)

Farhad Faghani

2013-01-01

Full Text Available In various applications, such as speech recognition and automatic teleconferencing, the recorded speech signals may be corrupted by both noise and reverberation. Reverberation causes a noticeable change in speech intelligibility and quality. In this research, firstly reverberation is described. There are some de-reverberation enhancement algorithms that use only one microphone. They mostly use inverse filtering and spectral subtraction as their sub-systems. On the other hand, there are many multi-microphone speech enhancement systems; Delay-and-sum beam former is the most famous amongst them. Moreover, several efficient approaches have been also reported that use linear prediction (LP residual signal, inverse filtering, and phase error. Despite the improvements and benefits gained by the use of several input microphones, considering the tradeoff between these gains and the complexity and computational cost forced by the use of more microphones, many researchers have focused on dual-microphones systems. So, a review on Microphone array signal processing is explained and then an arrangement for two microphones systems is proposed. As we want to evaluate these algorithms for Farsi speech signals, the problem of speech intelligibility assessment has been explained and a Farsi word list for Diagnostic Rhyme Test (DRT is presented.The structure of presented word list is similar to that of English DRT words. In this research, after a brief study of above-mentioned methods, we propose and implement some hybrid techniques to benefit from the advantages of several methods and achieve significant improvement in output signals. It will be shown that the proposed method performs superior to the state-of-the-art dereverberation algorithms.
Damage to the anterior arcuate fasciculus predicts non-fluent speech production in aphasia

OpenAIRE

Fridriksson, Julius; Guo, Dazhou; Fillmore, Paul; Holland, Audrey; Rorden, Chris

2013-01-01

Non-fluent aphasia implies a relatively straightforward neurological condition characterized by limited speech output. However, it is an umbrella term for different underlying impairments affecting speech production. Several studies have sought the critical lesion location that gives rise to non-fluent aphasia. The results have been mixed but typically implicate anterior cortical regions such as Broca’s area, the left anterior insula, and deep white matter regions. To provide a clearer pictur...
Live Speech Driven Head-and-Eye Motion Generators.

Science.gov (United States)

Le, Binh H; Ma, Xiaohan; Deng, Zhigang

2012-11-01

This paper describes a fully automated framework to generate realistic head motion, eye gaze, and eyelid motion simultaneously based on live (or recorded) speech input. Its central idea is to learn separate yet interrelated statistical models for each component (head motion, gaze, or eyelid motion) from a prerecorded facial motion data set: 1) Gaussian Mixture Models and gradient descent optimization algorithm are employed to generate head motion from speech features; 2) Nonlinear Dynamic Canonical Correlation Analysis model is used to synthesize eye gaze from head motion and speech features, and 3) nonnegative linear regression is used to model voluntary eye lid motion and log-normal distribution is used to describe involuntary eye blinks. Several user studies are conducted to evaluate the effectiveness of the proposed speech-driven head and eye motion generator using the well-established paired comparison methodology. Our evaluation results clearly show that this approach can significantly outperform the state-of-the-art head and eye motion generation algorithms. In addition, a novel mocap+video hybrid data acquisition technique is introduced to record high-fidelity head movement, eye gaze, and eyelid motion simultaneously.
Stuttering Frequency, Speech Rate, Speech Naturalness, and Speech Effort During the Production of Voluntary Stuttering.

Science.gov (United States)

Davidow, Jason H; Grossman, Heather L; Edge, Robin L

2018-05-01

Voluntary stuttering techniques involve persons who stutter purposefully interjecting disfluencies into their speech. Little research has been conducted on the impact of these techniques on the speech pattern of persons who stutter. The present study examined whether changes in the frequency of voluntary stuttering accompanied changes in stuttering frequency, articulation rate, speech naturalness, and speech effort. In total, 12 persons who stutter aged 16-34 years participated. Participants read four 300-syllable passages during a control condition, and three voluntary stuttering conditions that involved attempting to produce purposeful, tension-free repetitions of initial sounds or syllables of a word for two or more repetitions (i.e., bouncing). The three voluntary stuttering conditions included bouncing on 5%, 10%, and 15% of syllables read. Friedman tests and follow-up Wilcoxon signed ranks tests were conducted for the statistical analyses. Stuttering frequency, articulation rate, and speech naturalness were significantly different between the voluntary stuttering conditions. Speech effort did not differ between the voluntary stuttering conditions. Stuttering frequency was significantly lower during the three voluntary stuttering conditions compared to the control condition, and speech effort was significantly lower during two of the three voluntary stuttering conditions compared to the control condition. Due to changes in articulation rate across the voluntary stuttering conditions, it is difficult to conclude, as has been suggested previously, that voluntary stuttering is the reason for stuttering reductions found when using voluntary stuttering techniques. Additionally, future investigations should examine different types of voluntary stuttering over an extended period of time to determine their impact on stuttering frequency, speech rate, speech naturalness, and speech effort.
Comprehension of synthetic speech and digitized natural speech by adults with aphasia.

Science.gov (United States)

Hux, Karen; Knollman-Porter, Kelly; Brown, Jessica; Wallace, Sarah E

2017-09-01

Using text-to-speech technology to provide simultaneous written and auditory content presentation may help compensate for chronic reading challenges if people with aphasia can understand synthetic speech output; however, inherent auditory comprehension challenges experienced by people with aphasia may make understanding synthetic speech difficult. This study's purpose was to compare the preferences and auditory comprehension accuracy of people with aphasia when listening to sentences generated with digitized natural speech, Alex synthetic speech (i.e., Macintosh platform), or David synthetic speech (i.e., Windows platform). The methodology required each of 20 participants with aphasia to select one of four images corresponding in meaning to each of 60 sentences comprising three stimulus sets. Results revealed significantly better accuracy given digitized natural speech than either synthetic speech option; however, individual participant performance analyses revealed three patterns: (a) comparable accuracy regardless of speech condition for 30% of participants, (b) comparable accuracy between digitized natural speech and one, but not both, synthetic speech option for 45% of participants, and (c) greater accuracy with digitized natural speech than with either synthetic speech option for remaining participants. Ranking and Likert-scale rating data revealed a preference for digitized natural speech and David synthetic speech over Alex synthetic speech. Results suggest many individuals with aphasia can comprehend synthetic speech options available on popular operating systems. Further examination of synthetic speech use to support reading comprehension through text-to-speech technology is thus warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
Common neural substrates support speech and non-speech vocal tract gestures

OpenAIRE

Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M.J.; Poletto, Christopher J.; Ludlow, Christy L.

2009-01-01

The issue of whether speech is supported by the same neural substrates as non-speech vocal-tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, were compared to the production of speech sylla...
Introductory speeches

International Nuclear Information System (INIS)

2001-01-01

This CD is multimedia presentation of programme safety upgrading of Bohunice V1 NPP. This chapter consist of introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering
Linear filters as a method of real-time prediction of geomagnetic activity

International Nuclear Information System (INIS)

McPherron, R.L.; Baker, D.N.; Bargatze, L.F.

1985-01-01

Important factors controlling geomagnetic activity include the solar wind velocity, the strength of the interplanetary magnetic field (IMF), and the field orientation. Because these quantities change so much in transit through the solar wind, real-time monitoring immediately upstream of the earth provides the best input for any technique of real-time prediction. One such technique is linear prediction filtering which utilizes past histories of the input and output of a linear system to create a time-invariant filter characterizing the system. Problems of nonlinearity or temporal changes of the system can be handled by appropriate choice of input parameters and piecewise approximation in various ranges of the input. We have created prediction filters for all the standard magnetic indices and tested their efficiency. The filters show that the initial response of the magnetosphere to a southward turning of the IMF peaks in 20 minutes and then again in 55 minutes. After a northward turning, auroral zone indices and the midlatitude ASYM index return to background within 2 hours, while Dst decays exponentially with a time constant of about 8 hours. This paper describes a simple, real-time system utilizing these filters which could predict a substantial fraction of the variation in magnetic activity indices 20 to 50 minutes in advance
The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system.

Science.gov (United States)

Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo

2009-04-01

The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained

On Low-level Cognitive Components of Speech

DEFF Research Database (Denmark)

Feng, Ling; Hansen, Lars Kai

2005-01-01

In this paper we analyze speech for low-level cognitive features using linear component analysis. We demonstrate generalizable component 'fingerprints' stemming from both phonemes and speaker. Phonemes are fingerprints found at the basic analysis window time scale (20 msec), while speaker...... 'voiceprints' are found at time scales around 1000 msec. The analysis is based on homomorphic filtering features and energy based sparsification....
Exploring Australian speech-language pathologists' use and perceptions ofnon-speech oral motor exercises.

Science.gov (United States)

Rumbach, Anna F; Rose, Tanya A; Cheah, Mynn

2018-01-29

To explore Australian speech-language pathologists' use of non-speech oral motor exercises, and rationales for using/not using non-speech oral motor exercises in clinical practice. A total of 124 speech-language pathologists practising in Australia, working with paediatric and/or adult clients with speech sound difficulties, completed an online survey. The majority of speech-language pathologists reported that they did not use non-speech oral motor exercises when working with paediatric or adult clients with speech sound difficulties. However, more than half of the speech-language pathologists working with adult clients who have dysarthria reported using non-speech oral motor exercises with this population. The most frequently reported rationale for using non-speech oral motor exercises in speech sound difficulty management was to improve awareness/placement of articulators. The majority of speech-language pathologists agreed there is no clear clinical or research evidence base to support non-speech oral motor exercise use with clients who have speech sound difficulties. This study provides an overview of Australian speech-language pathologists' reported use and perceptions of non-speech oral motor exercises' applicability and efficacy in treating paediatric and adult clients who have speech sound difficulties. The research findings provide speech-language pathologists with insight into how and why non-speech oral motor exercises are currently used, and adds to the knowledge base regarding Australian speech-language pathology practice of non-speech oral motor exercises in the treatment of speech sound difficulties. Implications for Rehabilitation Non-speech oral motor exercises refer to oral motor activities which do not involve speech, but involve the manipulation or stimulation of oral structures including the lips, tongue, jaw, and soft palate. Non-speech oral motor exercises are intended to improve the function (e.g., movement, strength) of oral structures. The
Refining Stimulus Parameters in Assessing Infant Speech Perception Using Visual Reinforcement Infant Speech Discrimination: Sensation Level.

Science.gov (United States)

Uhler, Kristin M; Baca, Rosalinda; Dudas, Emily; Fredrickson, Tammy

2015-01-01

Speech perception measures have long been considered an integral piece of the audiological assessment battery. Currently, a prelinguistic, standardized measure of speech perception is missing in the clinical assessment battery for infants and young toddlers. Such a measure would allow systematic assessment of speech perception abilities of infants as well as the potential to investigate the impact early identification of hearing loss and early fitting of amplification have on the auditory pathways. To investigate the impact of sensation level (SL) on the ability of infants with normal hearing (NH) to discriminate /a-i/ and /ba-da/ and to determine if performance on the two contrasts are significantly different in predicting the discrimination criterion. The design was based on a survival analysis model for event occurrence and a repeated measures logistic model for binary outcomes. The outcome for survival analysis was the minimum SL for criterion and the outcome for the logistic regression model was the presence/absence of achieving the criterion. Criterion achievement was designated when an infant's proportion correct score was >0.75 on the discrimination performance task. Twenty-two infants with NH sensitivity participated in this study. There were 9 males and 13 females, aged 6-14 mo. Testing took place over two to three sessions. The first session consisted of a hearing test, threshold assessment of the two speech sounds (/a/ and /i/), and if time and attention allowed, visual reinforcement infant speech discrimination (VRISD). The second session consisted of VRISD assessment for the two test contrasts (/a-i/ and /ba-da/). The presentation level started at 50 dBA. If the infant was unable to successfully achieve criterion (>0.75) at 50 dBA, the presentation level was increased to 70 dBA followed by 60 dBA. Data examination included an event analysis, which provided the probability of criterion distribution across SL. The second stage of the analysis was a
Speech intelligibility for normal hearing and hearing-impaired listeners in simulated room acoustic conditions

DEFF Research Database (Denmark)

Arweiler, Iris; Dau, Torsten; Poulsen, Torben

Speech intelligibility depends on many factors such as room acoustics, the acoustical properties and location of the signal and the interferers, and the ability of the (normal and impaired) auditory system to process monaural and binaural sounds. In the present study, the effect of reverberation...... on spatial release from masking was investigated in normal hearing and hearing impaired listeners using three types of interferers: speech shaped noise, an interfering female talker and speech-modulated noise. Speech reception thresholds (SRT) were obtained in three simulated environments: a listening room......, a classroom and a church. The data from the study provide constraints for existing models of speech intelligibility prediction (based on the speech intelligibility index, SII, or the speech transmission index, STI) which have shortcomings when reverberation and/or fluctuating noise affect speech...
Model for predicting non-linear crack growth considering load sequence effects (LOSEQ)

International Nuclear Information System (INIS)

Fuehring, H.

1982-01-01

A new analytical model for predicting non-linear crack growth is presented which takes into account the retardation as well as the acceleration effects due to irregular loading. It considers not only the maximum peak of a load sequence to effect crack growth but also all other loads of the history according to a generalised memory criterion. Comparisons between crack growth predicted by using the LOSEQ-programme and experimentally observed data are presented. (orig.) [de
Optimizing acoustical conditions for speech intelligibility in classrooms

Science.gov (United States)

Yang, Wonyoung

High speech intelligibility is imperative in classrooms where verbal communication is critical. However, the optimal acoustical conditions to achieve a high degree of speech intelligibility have previously been investigated with inconsistent results, and practical room-acoustical solutions to optimize the acoustical conditions for speech intelligibility have not been developed. This experimental study validated auralization for speech-intelligibility testing, investigated the optimal reverberation for speech intelligibility for both normal and hearing-impaired listeners using more realistic room-acoustical models, and proposed an optimal sound-control design for speech intelligibility based on the findings. The auralization technique was used to perform subjective speech-intelligibility tests. The validation study, comparing auralization results with those of real classroom speech-intelligibility tests, found that if the room to be auralized is not very absorptive or noisy, speech-intelligibility tests using auralization are valid. The speech-intelligibility tests were done in two different auralized sound fields---approximately diffuse and non-diffuse---using the Modified Rhyme Test and both normal and hearing-impaired listeners. A hybrid room-acoustical prediction program was used throughout the work, and it and a 1/8 scale-model classroom were used to evaluate the effects of ceiling barriers and reflectors. For both subject groups, in approximately diffuse sound fields, when the speech source was closer to the listener than the noise source, the optimal reverberation time was zero. When the noise source was closer to the listener than the speech source, the optimal reverberation time was 0.4 s (with another peak at 0.0 s) with relative output power levels of the speech and noise sources SNS = 5 dB, and 0.8 s with SNS = 0 dB. In non-diffuse sound fields, when the noise source was between the speaker and the listener, the optimal reverberation time was 0.6 s with
Speech Perception With Combined Electric-Acoustic Stimulation: A Simulation and Model Comparison.

Science.gov (United States)

Rader, Tobias; Adel, Youssef; Fastl, Hugo; Baumann, Uwe

2015-01-01

The aim of this study is to simulate speech perception with combined electric-acoustic stimulation (EAS), verify the advantage of combined stimulation in normal-hearing (NH) subjects, and then compare it with cochlear implant (CI) and EAS user results from the authors' previous study. Furthermore, an automatic speech recognition (ASR) system was built to examine the impact of low-frequency information and is proposed as an applied model to study different hypotheses of the combined-stimulation advantage. Signal-detection-theory (SDT) models were applied to assess predictions of subject performance without the need to assume any synergistic effects. Speech perception was tested using a closed-set matrix test (Oldenburg sentence test), and its speech material was processed to simulate CI and EAS hearing. A total of 43 NH subjects and a customized ASR system were tested. CI hearing was simulated by an aurally adequate signal spectrum analysis and representation, the part-tone-time-pattern, which was vocoded at 12 center frequencies according to the MED-EL DUET speech processor. Residual acoustic hearing was simulated by low-pass (LP)-filtered speech with cutoff frequencies 200 and 500 Hz for NH subjects and in the range from 100 to 500 Hz for the ASR system. Speech reception thresholds were determined in amplitude-modulated noise and in pseudocontinuous noise. Previously proposed SDT models were lastly applied to predict NH subject performance with EAS simulations. NH subjects tested with EAS simulations demonstrated the combined-stimulation advantage. Increasing the LP cutoff frequency from 200 to 500 Hz significantly improved speech reception thresholds in both noise conditions. In continuous noise, CI and EAS users showed generally better performance than NH subjects tested with simulations. In modulated noise, performance was comparable except for the EAS at cutoff frequency 500 Hz where NH subject performance was superior. The ASR system showed similar behavior
Reviewing the connection between speech and obstructive sleep apnea.

Science.gov (United States)

Espinoza-Cuadros, Fernando; Fernández-Pozo, Rubén; Toledano, Doroteo T; Alcázar-Ramírez, José D; López-Gonzalo, Eduardo; Hernández-Gómez, Luis A

2016-02-20

Sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in OSA speakers has led to hypothesize the automatic analysis of speech for OSA assessment. In this paper we critically review several approaches using speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications. A large speech database including 426 male Spanish speakers suspected to suffer OSA and derived to a sleep disorders unit was used to study the clinical validity of several proposals using machine learning techniques to predict the apnea-hypopnea index (AHI) or classify individuals according to their OSA severity. AHI describes the severity of patients' condition. We first evaluate AHI prediction using state-of-the-art speaker recognition technologies: speech spectral information is modelled using supervectors or i-vectors techniques, and AHI is predicted through support vector regression (SVR). Using the same database we then critically review several OSA classification approaches previously proposed. The influence and possible interference of other clinical variables or characteristics available for our OSA population: age, height, weight, body mass index, and cervical perimeter, are also studied. The poor results obtained when estimating AHI using supervectors or i-vectors followed by SVR contrast with the positive results reported by previous research. This fact prompted us to a careful review of these approaches, also testing some reported results over our database. Several methodological limitations and deficiencies were detected that may have led to overoptimistic results. The methodological deficiencies observed after critically reviewing previous research can be relevant examples of potential pitfalls when using machine learning techniques for
[Improving speech comprehension using a new cochlear implant speech processor].

Science.gov (United States)

Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

2009-06-01

The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise.In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg
Speech coding

Energy Technology Data Exchange (ETDEWEB)

Ravishankar, C., Hughes Network Systems, Germantown, MD

1998-05-08

Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained to be the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of speech signal getting corrupted by noise, cross-talk and distortion Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand digital transmission is relatively immune to noise, cross-talk and distortion primarily because of the capability to faithfully regenerate digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link Hence from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modem requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often referred to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques that is often interchangeably used with speech coding is the term voice coding. This term is more generic in the sense that the
Integration of auditory and visual speech information

NARCIS (Netherlands)

Hall, M.; Smeele, P.M.T.; Kuhl, P.K.

1998-01-01

The integration of auditory and visual speech is observed when modes specify different places of articulation. Influences of auditory variation on integration were examined using consonant identifi-cation, plus quality and similarity ratings. Auditory identification predicted auditory-visual
Mechanisms of Interaction in Speech Production

Science.gov (United States)

Baese-Berk, Melissa; Goldrick, Matthew

2009-01-01

Many theories predict the presence of interactive effects involving information represented by distinct cognitive processes in speech production. There is considerably less agreement regarding the precise cognitive mechanisms that underlie these interactive effects. For example, are they driven by purely production-internal mechanisms (e.g., Dell,…
Application of linear and non-linear low-Re k-ε models in two-dimensional predictions of convective heat transfer in passages with sudden contractions

International Nuclear Information System (INIS)

Raisee, M.; Hejazi, S.H.

2007-01-01

This paper presents comparisons between heat transfer predictions and measurements for developing turbulent flow through straight rectangular channels with sudden contractions at the mid-channel section. The present numerical results were obtained using a two-dimensional finite-volume code which solves the governing equations in a vertical plane located at the lateral mid-point of the channel. The pressure field is obtained with the well-known SIMPLE algorithm. The hybrid scheme was employed for the discretization of convection in all transport equations. For modeling of the turbulence, a zonal low-Reynolds number k-ε model and the linear and non-linear low-Reynolds number k-ε models with the 'Yap' and 'NYP' length-scale correction terms have been employed. The main objective of present study is to examine the ability of the above turbulence models in the prediction of convective heat transfer in channels with sudden contraction at a mid-channel section. The results of this study show that a sudden contraction creates a relatively small recirculation bubble immediately downstream of the channel contraction. This separation bubble influences the distribution of local heat transfer coefficient and increases the heat transfer levels by a factor of three. Computational results indicate that all the turbulence models employed produce similar flow fields. The zonal k-ε model produces the wrong Nusselt number distribution by underpredicting heat transfer levels in the recirculation bubble and overpredicting them in the developing region. The linear low-Re k-ε model, on the other hand, returns the correct Nusselt number distribution in the recirculation region, although it somewhat overpredicts heat transfer levels in the developing region downstream of the separation bubble. The replacement of the 'Yap' term with the 'NYP' term in the linear low-Re k-ε model results in a more accurate local Nusselt number distribution. Moreover, the application of the non-linear k
Brain responses and looking behaviour during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life.

Directory of Open Access Journals (Sweden)

Elena V Kushnerenko

2013-07-01

Full Text Available The use of visual cues during the processing of audiovisual speech is known to be less efficient in children and adults with language difficulties and difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6-9 months to 14-16 months of age. We used eye-tracking to examine whether individual differences in visual attention during audiovisual processing of speech in 6 to 9 month old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6-9 month old infants also participated in an event-related potential (ERP audiovisual task within the same experimental session. Language development was then followed-up at the age of 14-16 months, using two measures of language development, the Preschool Language Scale (PLS and the Oxford Communicative Development Inventory (CDI. The results show that those infants who were less efficient in auditory speech processing at the age of 6-9 months had lower receptive language scores at 14-16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audio-visually incongruent stimuli at 6-9 months were both significantly associated with language development at 14-16 months. These findings add to the understanding of individual differences in neural signatures of audiovisual processing and associated looking behaviour in infants.
The analysis of speech acts patterns in two Egyptian inaugural speeches

Directory of Open Access Journals (Sweden)

Imad Hayif Sameer

2017-09-01

Full Text Available The theory of speech acts, which clarifies what people do when they speak, is not about individual words or sentences that form the basic elements of human communication, but rather about particular speech acts that are performed when uttering words. A speech act is the attempt at doing something purely by speaking. Many things can be done by speaking. Speech acts are studied under what is called speech act theory, and belong to the domain of pragmatics. In this paper, two Egyptian inaugural speeches from El-Sadat and El-Sisi, belonging to different periods were analyzed to find out whether there were differences within this genre in the same culture or not. The study showed that there was a very small difference between these two speeches which were analyzed according to Searle’s theory of speech acts. In El Sadat’s speech, commissives came to occupy the first place. Meanwhile, in El–Sisi’s speech, assertives occupied the first place. Within the speeches of one culture, we can find that the differences depended on the circumstances that surrounded the elections of the Presidents at the time. Speech acts were tools they used to convey what they wanted and to obtain support from their audiences.
Speech Problems

Science.gov (United States)

... Staying Safe Videos for Educators Search English Español Speech Problems KidsHealth / For Teens / Speech Problems What's in ... a person's ability to speak clearly. Some Common Speech and Language Disorders Stuttering is a problem that ...
Linearized and Kernelized Sparse Multitask Learning for Predicting Cognitive Outcomes in Alzheimer’s Disease

Directory of Open Access Journals (Sweden)

Xiaoli Liu

2018-01-01

Full Text Available Alzheimer’s disease (AD has been not only the substantial financial burden to the health care system but also the emotional burden to patients and their families. Predicting cognitive performance of subjects from their magnetic resonance imaging (MRI measures and identifying relevant imaging biomarkers are important research topics in the study of Alzheimer’s disease. Recently, the multitask learning (MTL methods with sparsity-inducing norm (e.g., l2,1-norm have been widely studied to select the discriminative feature subset from MRI features by incorporating inherent correlations among multiple clinical cognitive measures. However, these previous works formulate the prediction tasks as a linear regression problem. The major limitation is that they assumed a linear relationship between the MRI features and the cognitive outcomes. Some multikernel-based MTL methods have been proposed and shown better generalization ability due to the nonlinear advantage. We quantify the power of existing linear and nonlinear MTL methods by evaluating their performance on cognitive score prediction of Alzheimer’s disease. Moreover, we extend the traditional l2,1-norm to a more general lql1-norm (q≥1. Experiments on the Alzheimer’s Disease Neuroimaging Initiative database showed that the nonlinear l2,1lq-MKMTL method not only achieved better prediction performance than the state-of-the-art competitive methods but also effectively fused the multimodality data.
Alternative Speech Communication System for Persons with Severe Speech Disorders

Science.gov (United States)

Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

2009-12-01

Assistive speech-enabled systems are proposed to help both French and English speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of the Speech Quality (PESQ) value of 5% and more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
A Danish open-set speech corpus for competing-speech studies

DEFF Research Database (Denmark)

Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias

2014-01-01

Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs......) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded...... with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed...
Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

Science.gov (United States)

Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

2012-01-01

A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production

Prediction of speech masking release for fluctuating interferers based on the envelope power signal-to-noise ratio

DEFF Research Database (Denmark)

Jørgensen, Søren; Dau, Torsten

2012-01-01

-hearing listeners in conditions with additive stationary noise, reverberation, and nonlinear processing with spectral subtraction. The latter condition represents a case in which the standardized speech intelligibility index and speech transmission index fail. However, the sEPSM is limited to conditions...... for the stationary and non-stationary interferers, demonstrating further that the envelope SNR is crucial for speech comprehension....
Multispectral code excited linear prediction coding and its application in magnetic resonance images.

Science.gov (United States)

Hu, J H; Wang, Y; Cahill, P T

1997-01-01

This paper reports a multispectral code excited linear prediction (MCELP) method for the compression of multispectral images. Different linear prediction models and adaptation schemes have been compared. The method that uses a forward adaptive autoregressive (AR) model has been proven to achieve a good compromise between performance, complexity, and robustness. This approach is referred to as the MFCELP method. Given a set of multispectral images, the linear predictive coefficients are updated over nonoverlapping three-dimensional (3-D) macroblocks. Each macroblock is further divided into several 3-D micro-blocks, and the best excitation signal for each microblock is determined through an analysis-by-synthesis procedure. The MFCELP method has been applied to multispectral magnetic resonance (MR) images. To satisfy the high quality requirement for medical images, the error between the original image set and the synthesized one is further specified using a vector quantizer. This method has been applied to images from 26 clinical MR neuro studies (20 slices/study, three spectral bands/slice, 256x256 pixels/band, 12 b/pixel). The MFCELP method provides a significant visual improvement over the discrete cosine transform (DCT) based Joint Photographers Expert Group (JPEG) method, the wavelet transform based embedded zero-tree wavelet (EZW) coding method, and the vector tree (VT) coding method, as well as the multispectral segmented autoregressive moving average (MSARMA) method we developed previously.
On Low-level Cognitive Components of Speech

DEFF Research Database (Denmark)

Feng, Ling; Hansen, Lars Kai

2006-01-01

In this paper we analyze speech for low-level cognitive features using linear component analysis. We demonstrate generalizable component ‘fingerprints’ stemming from both phonemes and speakers. Phonemes are fingerprints found at the basic analysis window time scale (20 msec), while speaker...... ‘voiceprints’ are found at time scales around 1000 msec. The analysis is based on homomorphic filtering features and energy based sparsification....
Predicting Madura cattle growth curve using non-linear model

Science.gov (United States)

Widyas, N.; Prastowo, S.; Widi, T. S. M.; Baliarti, E.

2018-03-01

Madura cattle is Indonesian native. It is a composite breed that has undergone hundreds of years of selection and domestication to reach nowadays remarkable uniformity. Crossbreeding has reached the isle of Madura and the Madrasin, a cross between Madura cows and Limousine semen emerged. This paper aimed to compare the growth curve between Madrasin and one type of pure Madura cows, the common Madura cattle (Madura) using non-linear models. Madura cattles are kept traditionally thus reliable records are hardly available. Data were collected from small holder farmers in Madura. Cows from different age classes (5years) were observed, and body measurements (chest girth, body length and wither height) were taken. In total 63 Madura and 120 Madrasin records obtained. Linear model was built with cattle sub-populations and age as explanatory variables. Body weights were estimated based on the chest girth. Growth curves were built using logistic regression. Results showed that within the same age, Madrasin has significantly larger body compared to Madura (plogistic models fit better for Madura and Madrasin cattle data; with the estimated MSE for these models were 39.09 and 759.28 with prediction accuracy of 99 and 92% for Madura and Madrasin, respectively. Prediction of growth curve using logistic regression model performed well in both types of Madura cattle. However, attempts to administer accurate data on Madura cattle are necessary to better characterize and study these cattle.
Predicting Attitudes toward Press- and Speech Freedom across the U.S.A.: A Test of Climato-Economic, Parasite Stress, and Life History Theories.

Directory of Open Access Journals (Sweden)

Jinguang Zhang

Full Text Available National surveys reveal notable individual differences in U.S. citizens' attitudes toward freedom of expression, including freedom of the press and speech. Recent theoretical developments and empirical findings suggest that ecological factors impact censorship attitudes in addition to individual difference variables (e.g., education, conservatism, but no research has compared the explanatory power of prominent ecological theories. This study tested climato-economic, parasite stress, and life history theories using four measures of attitudes toward censoring the press and offensive speech obtained from two national surveys in the U.S.A. Neither climate demands nor its interaction with state wealth--two key variables for climato-economic theory--predicted any of the four outcome measures. Interstate parasite stress significantly predicted two, with a marginally significant effect on the third, but the effects became non-significant when the analyses were stratified for race (as a control for extrinsic risks. Teenage birth rates (a proxy of human life history significantly predicted attitudes toward press freedom during wartime, but the effect was the opposite of what life history theory predicted. While none of the three theories provided a fully successful explanation of individual differences in attitudes toward freedom of expression, parasite stress and life history theories do show potentials. Future research should continue examining the impact of these ecological factors on human psychology by further specifying the mechanisms and developing better measures for those theories.
Predicting Attitudes toward Press- and Speech Freedom across the U.S.A.: A Test of Climato-Economic, Parasite Stress, and Life History Theories

Science.gov (United States)

Zhang, Jinguang; Reid, Scott A.; Xu, Jing

2015-01-01

National surveys reveal notable individual differences in U.S. citizens’ attitudes toward freedom of expression, including freedom of the press and speech. Recent theoretical developments and empirical findings suggest that ecological factors impact censorship attitudes in addition to individual difference variables (e.g., education, conservatism), but no research has compared the explanatory power of prominent ecological theories. This study tested climato-economic, parasite stress, and life history theories using four measures of attitudes toward censoring the press and offensive speech obtained from two national surveys in the U.S.A. Neither climate demands nor its interaction with state wealth—two key variables for climato-economic theory—predicted any of the four outcome measures. Interstate parasite stress significantly predicted two, with a marginally significant effect on the third, but the effects became non-significant when the analyses were stratified for race (as a control for extrinsic risks). Teenage birth rates (a proxy of human life history) significantly predicted attitudes toward press freedom during wartime, but the effect was the opposite of what life history theory predicted. While none of the three theories provided a fully successful explanation of individual differences in attitudes toward freedom of expression, parasite stress and life history theories do show potentials. Future research should continue examining the impact of these ecological factors on human psychology by further specifying the mechanisms and developing better measures for those theories. PMID:26030736
Multimodal Speech Capture System for Speech Rehabilitation and Learning.

Science.gov (United States)

Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam

2017-11-01

Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we are presenting the multimodal speech capture system (MSCS) that records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. Collected speech modalities, tongue motion, lips gestures, and voice are visualized not only in real-time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities by a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective, and may vary from one SLP to another.
Audiomotor Perceptual Training Enhances Speech Intelligibility in Background Noise.

Science.gov (United States)

Whitton, Jonathon P; Hancock, Kenneth E; Shannon, Jeffrey M; Polley, Daniel B

2017-11-06

Sensory and motor skills can be improved with training, but learning is often restricted to practice stimuli. As an exception, training on closed-loop (CL) sensorimotor interfaces, such as action video games and musical instruments, can impart a broad spectrum of perceptual benefits. Here we ask whether computerized CL auditory training can enhance speech understanding in levels of background noise that approximate a crowded restaurant. Elderly hearing-impaired subjects trained for 8 weeks on a CL game that, like a musical instrument, challenged them to monitor subtle deviations between predicted and actual auditory feedback as they moved their fingertip through a virtual soundscape. We performed our study as a randomized, double-blind, placebo-controlled trial by training other subjects in an auditory working-memory (WM) task. Subjects in both groups improved at their respective auditory tasks and reported comparable expectations for improved speech processing, thereby controlling for placebo effects. Whereas speech intelligibility was unchanged after WM training, subjects in the CL training group could correctly identify 25% more words in spoken sentences or digit sequences presented in high levels of background noise. Numerically, CL audiomotor training provided more than three times the benefit of our subjects' hearing aids for speech processing in noisy listening conditions. Gains in speech intelligibility could be predicted from gameplay accuracy and baseline inhibitory control. However, benefits did not persist in the absence of continuing practice. These studies employ stringent clinical standards to demonstrate that perceptual learning on a computerized audio game can transfer to "real-world" communication challenges. Copyright © 2017 Elsevier Ltd. All rights reserved.
Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

Science.gov (United States)

van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

2007-01-01

Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…
Simultaneous Treatment of Grammatical and Speech-Comprehensibility Deficits in Children with Down Syndrome

Science.gov (United States)

Camarata, Stephen; Yoder, Paul; Camarata, Mary

2006-01-01

Children with Down syndrome often display speech-comprehensibility and grammatical deficits beyond what would be predicted based upon general mental age. Historically, speech-comprehensibility has often been treated using traditional articulation therapy and oral-motor training so there may be little or no coordination of grammatical and…
Detecting Parkinson's disease from sustained phonation and speech signals.

Directory of Open Access Journals (Sweden)

Evaldas Vaiciukynas

Full Text Available This study investigates signals from sustained phonation and text-dependent speech modalities for Parkinson's disease screening. Phonation corresponds to the vowel /a/ voicing task and speech to the pronunciation of a short sentence in Lithuanian language. Signals were recorded through two channels simultaneously, namely, acoustic cardioid (AC and smart phone (SP microphones. Additional modalities were obtained by splitting speech recording into voiced and unvoiced parts. Information in each modality is summarized by 18 well-known audio feature sets. Random forest (RF is used as a machine learning algorithm, both for individual feature sets and for decision-level fusion. Detection performance is measured by the out-of-bag equal error rate (EER and the cost of log-likelihood-ratio. Essentia audio feature set was the best using the AC speech modality and YAAFE audio feature set was the best using the SP unvoiced modality, achieving EER of 20.30% and 25.57%, respectively. Fusion of all feature sets and modalities resulted in EER of 19.27% for the AC and 23.00% for the SP channel. Non-linear projection of a RF-based proximity matrix into the 2D space enriched medical decision support by visualization.
Effects of noise, nonlinear processing, and linear filtering on perceived music quality.

Science.gov (United States)

Arehart, Kathryn H; Kates, James M; Anderson, Melinda C

2011-03-01

The purpose of this study was to determine the relative impact of different forms of hearing aid signal processing on quality ratings of music. Music quality was assessed using a rating scale for three types of music: orchestral classical music, jazz instrumental, and a female vocalist. The music stimuli were subjected to a wide range of simulated hearing aid processing conditions including, (1) noise and nonlinear processing, (2) linear filtering, and (3) combinations of noise, nonlinear, and linear filtering. Quality ratings were measured in a group of 19 listeners with normal hearing and a group of 15 listeners with sensorineural hearing impairment. Quality ratings in both groups were generally comparable, were reliable across test sessions, were impacted more by noise and nonlinear signal processing than by linear filtering, and were significantly affected by the genre of music. The average quality ratings for music were reasonably well predicted by the hearing aid speech quality index (HASQI), but additional work is needed to optimize the index to the wide range of music genres and processing conditions included in this study.
A Homogeneous and Self-Dual Interior-Point Linear Programming Algorithm for Economic Model Predictive Control

DEFF Research Database (Denmark)

Sokoler, Leo Emil; Frison, Gianluca; Skajaa, Anders

2015-01-01

We develop an efficient homogeneous and self-dual interior-point method (IPM) for the linear programs arising in economic model predictive control of constrained linear systems with linear objective functions. The algorithm is based on a Riccati iteration procedure, which is adapted to the linear...... system of equations solved in homogeneous and self-dual IPMs. Fast convergence is further achieved using a warm-start strategy. We implement the algorithm in MATLAB and C. Its performance is tested using a conceptual power management case study. Closed loop simulations show that 1) the proposed algorithm...
Co-speech iconic gestures and visuo-spatial working memory.

Science.gov (United States)

Wu, Ying Choon; Coulson, Seana

2014-11-01

Three experiments tested the role of verbal versus visuo-spatial working memory in the comprehension of co-speech iconic gestures. In Experiment 1, participants viewed congruent discourse primes in which the speaker's gestures matched the information conveyed by his speech, and incongruent ones in which the semantic content of the speaker's gestures diverged from that in his speech. Discourse primes were followed by picture probes that participants judged as being either related or unrelated to the preceding clip. Performance on this picture probe classification task was faster and more accurate after congruent than incongruent discourse primes. The effect of discourse congruency on response times was linearly related to measures of visuo-spatial, but not verbal, working memory capacity, as participants with greater visuo-spatial WM capacity benefited more from congruent gestures. In Experiments 2 and 3, participants performed the same picture probe classification task under conditions of high and low loads on concurrent visuo-spatial (Experiment 2) and verbal (Experiment 3) memory tasks. Effects of discourse congruency and verbal WM load were additive, while effects of discourse congruency and visuo-spatial WM load were interactive. Results suggest that congruent co-speech gestures facilitate multi-modal language comprehension, and indicate an important role for visuo-spatial WM in these speech-gesture integration processes. Copyright © 2014 Elsevier B.V. All rights reserved.
Speech-discrimination scores modeled as a binomial variable.

Science.gov (United States)

Thornton, A R; Raffin, M J

1978-09-01

Many studies have reported variability data for tests of speech discrimination, and the disparate results of these studies have not been given a simple explanation. Arguments over the relative merits of 25- vs 50-word tests have ignored the basic mathematical properties inherent in the use of percentage scores. The present study models performance on clinical tests of speech discrimination as a binomial variable. A binomial model was developed, and some of its characteristics were tested against data from 4120 scores obtained on the CID Auditory Test W-22. A table for determining significant deviations between scores was generated and compared to observed differences in half-list scores for the W-22 tests. Good agreement was found between predicted and observed values. Implications of the binomial characteristics of speech-discrimination scores are discussed.
Electrophysiological measures of attention during speech perception predict metalinguistic skills in children

Directory of Open Access Journals (Sweden)

Lori Astheimer

2014-01-01

Full Text Available Event-related potential (ERP evidence demonstrates that preschool-aged children selectively attend to informative moments such as word onsets during speech perception. Although this observation indicates a role for attention in language processing, it is unclear whether this type of attention is part of basic speech perception mechanisms, higher-level language skills, or general cognitive abilities. The current study examined these possibilities by measuring ERPs from 5-year-old children listening to a narrative containing attention probes presented before, during, and after word onsets as well as at random control times. Children also completed behavioral tests assessing verbal and nonverbal skills. Probes presented after word onsets elicited a more negative ERP response beginning around 100 ms after probe onset than control probes, indicating increased attention to word-initial segments. Crucially, the magnitude of this difference was correlated with performance on verbal tasks, but showed no relationship to nonverbal measures. More specifically, ERP attention effects were most strongly correlated with performance on a complex metalinguistic task involving grammaticality judgments. These results demonstrate that effective allocation of attention during speech perception supports higher-level, controlled language processing in children by allowing them to focus on relevant information at individual word and complex sentence levels.
Effects of Instantaneous Multiband Dynamic Compression on Speech Intelligibility

Directory of Open Access Journals (Sweden)

Herzke Tobias

2005-01-01

Full Text Available The recruitment phenomenon, that is, the reduced dynamic range between threshold and uncomfortable level, is attributed to the loss of instantaneous dynamic compression on the basilar membrane. Despite this, hearing aids commonly use slow-acting dynamic compression for its compensation, because this was found to be the most successful strategy in terms of speech quality and intelligibility rehabilitation. Former attempts to use fast-acting compression gave ambiguous results, raising the question as to whether auditory-based recruitment compensation by instantaneous compression is in principle applicable in hearing aids. This study thus investigates instantaneous multiband dynamic compression based on an auditory filterbank. Instantaneous envelope compression is performed in each frequency band of a gammatone filterbank, which provides a combination of time and frequency resolution comparable to the normal healthy cochlea. The gain characteristics used for dynamic compression are deduced from categorical loudness scaling. In speech intelligibility tests, the instantaneous dynamic compression scheme was compared against a linear amplification scheme, which used the same filterbank for frequency analysis, but employed constant gain factors that restored the sound level for medium perceived loudness in each frequency band. In subjective comparisons, five of nine subjects preferred the linear amplification scheme and would not accept the instantaneous dynamic compression in hearing aids. Four of nine subjects did not perceive any quality differences. A sentence intelligibility test in noise (Oldenburg sentence test showed little to no negative effects of the instantaneous dynamic compression, compared to linear amplification. A word intelligibility test in quiet (one-syllable rhyme test showed that the subjects benefit from the larger amplification at low levels provided by instantaneous dynamic compression. Further analysis showed that the increase
The Dangers of Estimating V˙O2max Using Linear, Nonexercise Prediction Models.

Science.gov (United States)

Nevill, Alan M; Cooke, Carlton B

2017-05-01

This study aimed to compare the accuracy and goodness of fit of two competing models (linear vs allometric) when estimating V˙O2max (mL·kg·min) using nonexercise prediction models. The two competing models were fitted to the V˙O2max (mL·kg·min) data taken from two previously published studies. Study 1 (the Allied Dunbar National Fitness Survey) recruited 1732 randomly selected healthy participants, 16 yr and older, from 30 English parliamentary constituencies. Estimates of V˙O2max were obtained using a progressive incremental test on a motorized treadmill. In study 2, maximal oxygen uptake was measured directly during a fatigue limited treadmill test in older men (n = 152) and women (n = 146) 55 to 86 yr old. In both studies, the quality of fit associated with estimating V˙O2max (mL·kg·min) was superior using allometric rather than linear (additive) models based on all criteria (R, maximum log-likelihood, and Akaike information criteria). Results suggest that linear models will systematically overestimate V˙O2max for participants in their 20s and underestimate V˙O2max for participants in their 60s and older. The residuals saved from the linear models were neither normally distributed nor independent of the predicted values nor age. This will probably explain the absence of a key quadratic age term in the linear models, crucially identified using allometric models. Not only does the curvilinear age decline within an exponential function follow a more realistic age decline (the right-hand side of a bell-shaped curve), but the allometric models identified either a stature-to-body mass ratio (study 1) or a fat-free mass-to-body mass ratio (study 2), both associated with leanness when estimating V˙O2max. Adopting allometric models will provide more accurate predictions of V˙O2max (mL·kg·min) using plausible, biologically sound, and interpretable models.
Measurement of speech levels in the presence of time varying background noise

Science.gov (United States)

Pearsons, K. S.; Horonjeff, R.

1982-01-01

Short-term speech level measurements which could be used to note changes in vocal effort in a time varying noise environment were studied. Knowing the changes in speech level would in turn allow prediction of intelligibility in the presence of aircraft flyover noise. Tests indicated that it is possible to use two second samples of speech to estimate long term root mean square speech levels. Other tests were also performed in which people read out loud during aircraft flyover noise. Results of these tests indicate that people do indeed raise their voice during flyovers at a rate of about 3-1/2 dB for each 10 dB increase in background level. This finding is in agreement with other tests of speech levels in the presence of steady state background noise.
Exploring the Link Between Cognitive Abilities and Speech Recognition in the Elderly Under Different Listening Conditions

Directory of Open Access Journals (Sweden)

Theresa Nuesse

2018-05-01

Full Text Available Elderly listeners are known to differ considerably in their ability to understand speech in noise. Several studies have addressed the underlying factors that contribute to these differences. These factors include audibility, and age-related changes in supra-threshold auditory processing abilities, and it has been suggested that differences in cognitive abilities may also be important. The objective of this study was to investigate associations between performance in cognitive tasks and speech recognition under different listening conditions in older adults with either age appropriate hearing or hearing-impairment. To that end, speech recognition threshold (SRT measurements were performed under several masking conditions that varied along the perceptual dimensions of dip listening, spatial separation, and informational masking. In addition, a neuropsychological test battery was administered, which included measures of verbal working and short-term memory, executive functioning, selective and divided attention, and lexical and semantic abilities. Age-matched groups of older adults with either age-appropriate hearing (ENH, n = 20 or aided hearing impairment (EHI, n = 21 participated. In repeated linear regression analyses, composite scores of cognitive test outcomes (evaluated using PCA were included to predict SRTs. These associations were different for the two groups. When hearing thresholds were controlled for, composed cognitive factors were significantly associated with the SRTs for the ENH listeners. Whereas better lexical and semantic abilities were associated with lower (better SRTs in this group, there was a negative association between attentional abilities and speech recognition in the presence of spatially separated speech-like maskers. For the EHI group, the pure-tone thresholds (averaged across 0.5, 1, 2, and 4 kHz were significantly associated with the SRTs, despite the fact that all signals were amplified and therefore in principle

Application of genetic algorithm - multiple linear regressions to predict the activity of RSK inhibitors

Directory of Open Access Journals (Sweden)

Avval Zhila Mohajeri

2015-01-01

Full Text Available This paper deals with developing a linear quantitative structure-activity relationship (QSAR model for predicting the RSK inhibition activity of some new compounds. A dataset consisting of 62 pyrazino [1,2-α] indole, diazepino [1,2-α] indole, and imidazole derivatives with known inhibitory activities was used. Multiple linear regressions (MLR technique combined with the stepwise (SW and the genetic algorithm (GA methods as variable selection tools was employed. For more checking stability, robustness and predictability of the proposed models, internal and external validation techniques were used. Comparison of the results obtained, indicate that the GA-MLR model is superior to the SW-MLR model and that it isapplicable for designing novel RSK inhibitors.
Steganalysis of recorded speech

Science.gov (United States)

Johnson, Micah K.; Lyu, Siwei; Farid, Hany

2005-03-01

Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
Enhancement of speech signals - with a focus on voiced speech models

DEFF Research Database (Denmark)

Nørholm, Sidsel Marie

This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based...... on this model. The basic model used in this thesis is the harmonic model which is a commonly used model for describing the voiced part of the speech signal. We show that it can be beneficial to extend the model to take inharmonicities or the non-stationarity of speech into account. Extending the model...
Semantic and phonetic enhancements for speech-in-noise recognition by native and non-native listeners.

Science.gov (United States)

Bradlow, Ann R; Alexander, Jennifer A

2007-04-01

Previous research has shown that speech recognition differences between native and proficient non-native listeners emerge under suboptimal conditions. Current evidence has suggested that the key deficit that underlies this disproportionate effect of unfavorable listening conditions for non-native listeners is their less effective use of compensatory information at higher levels of processing to recover from information loss at the phoneme identification level. The present study investigated whether this non-native disadvantage could be overcome if enhancements at various levels of processing were presented in combination. Native and non-native listeners were presented with English sentences in which the final word varied in predictability and which were produced in either plain or clear speech. Results showed that, relative to the low-predictability-plain-speech baseline condition, non-native listener final word recognition improved only when both semantic and acoustic enhancements were available (high-predictability-clear-speech). In contrast, the native listeners benefited from each source of enhancement separately and in combination. These results suggests that native and non-native listeners apply similar strategies for speech-in-noise perception: The crucial difference is in the signal clarity required for contextual information to be effective, rather than in an inability of non-native listeners to take advantage of this contextual information per se.
Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression

KAUST Repository

Abdul Jameel, Abdul Gani; Naser, Nimal; Emwas, Abdul-Hamid M.; Dooley, Stephen; Sarathy, Mani

2016-01-01

An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN
Intelligibility for Binaural Speech with Discarded Low-SNR Speech Components.

Science.gov (United States)

Schoenmaker, Esther; van de Par, Steven

2016-01-01

Speech intelligibility in multitalker settings improves when the target speaker is spatially separated from the interfering speakers. A factor that may contribute to this improvement is the improved detectability of target-speech components due to binaural interaction in analogy to the Binaural Masking Level Difference (BMLD). This would allow listeners to hear target speech components within specific time-frequency intervals that have a negative SNR, similar to the improvement in the detectability of a tone in noise when these contain disparate interaural difference cues. To investigate whether these negative-SNR target-speech components indeed contribute to speech intelligibility, a stimulus manipulation was performed where all target components were removed when local SNRs were smaller than a certain criterion value. It can be expected that for sufficiently high criterion values target speech components will be removed that do contribute to speech intelligibility. For spatially separated speakers, assuming that a BMLD-like detection advantage contributes to intelligibility, degradation in intelligibility is expected already at criterion values below 0 dB SNR. However, for collocated speakers it is expected that higher criterion values can be applied without impairing speech intelligibility. Results show that degradation of intelligibility for separated speakers is only seen for criterion values of 0 dB and above, indicating a negligible contribution of a BMLD-like detection advantage in multitalker settings. These results show that the spatial benefit is related to a spatial separation of speech components at positive local SNRs rather than to a BMLD-like detection improvement for speech components at negative local SNRs.
An experimental Dutch keyboard-to-speech system for the speech impaired

NARCIS (Netherlands)

Deliege, R.J.H.

1989-01-01

An experimental Dutch keyboard-to-speech system has been developed to explor the possibilities and limitations of Dutch speech synthesis in a communication aid for the speech impaired. The system uses diphones and a formant synthesizer chip for speech synthesis. Input to the system is in
Development of a Low-Cost, Noninvasive, Portable Visual Speech Recognition Program.

Science.gov (United States)

Kohlberg, Gavriel D; Gal, Ya'akov Kobi; Lalwani, Anil K

2016-09-01

Loss of speech following tracheostomy and laryngectomy severely limits communication to simple gestures and facial expressions that are largely ineffective. To facilitate communication in these patients, we seek to develop a low-cost, noninvasive, portable, and simple visual speech recognition program (VSRP) to convert articulatory facial movements into speech. A Microsoft Kinect-based VSRP was developed to capture spatial coordinates of lip movements and translate them into speech. The articulatory speech movements associated with 12 sentences were used to train an artificial neural network classifier. The accuracy of the classifier was then evaluated on a separate, previously unseen set of articulatory speech movements. The VSRP was successfully implemented and tested in 5 subjects. It achieved an accuracy rate of 77.2% (65.0%-87.6% for the 5 speakers) on a 12-sentence data set. The mean time to classify an individual sentence was 2.03 milliseconds (1.91-2.16). We have demonstrated the feasibility of a low-cost, noninvasive, portable VSRP based on Kinect to accurately predict speech from articulation movements in clinically trivial time. This VSRP could be used as a novel communication device for aphonic patients. © The Author(s) 2016.
Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets.

Science.gov (United States)

Marchese Robinson, Richard L; Palczewska, Anna; Palczewski, Jan; Kidley, Nathan

2017-08-28

The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical
Predicting oropharyngeal tumor volume throughout the course of radiation therapy from pretreatment computed tomography data using general linear models.

Science.gov (United States)

Yock, Adam D; Rao, Arvind; Dong, Lei; Beadle, Beth M; Garden, Adam S; Kudchadker, Rajat J; Court, Laurence E

2014-05-01

The purpose of this work was to develop and evaluate the accuracy of several predictive models of variation in tumor volume throughout the course of radiation therapy. Nineteen patients with oropharyngeal cancers were imaged daily with CT-on-rails for image-guided alignment per an institutional protocol. The daily volumes of 35 tumors in these 19 patients were determined and used to generate (1) a linear model in which tumor volume changed at a constant rate, (2) a general linear model that utilized the power fit relationship between the daily and initial tumor volumes, and (3) a functional general linear model that identified and exploited the primary modes of variation between time series describing the changing tumor volumes. Primary and nodal tumor volumes were examined separately. The accuracy of these models in predicting daily tumor volumes were compared with those of static and linear reference models using leave-one-out cross-validation. In predicting the daily volume of primary tumors, the general linear model and the functional general linear model were more accurate than the static reference model by 9.9% (range: -11.6%-23.8%) and 14.6% (range: -7.3%-27.5%), respectively, and were more accurate than the linear reference model by 14.2% (range: -6.8%-40.3%) and 13.1% (range: -1.5%-52.5%), respectively. In predicting the daily volume of nodal tumors, only the 14.4% (range: -11.1%-20.5%) improvement in accuracy of the functional general linear model compared to the static reference model was statistically significant. A general linear model and a functional general linear model trained on data from a small population of patients can predict the primary tumor volume throughout the course of radiation therapy with greater accuracy than standard reference models. These more accurate models may increase the prognostic value of information about the tumor garnered from pretreatment computed tomography images and facilitate improved treatment management.
Predicting oropharyngeal tumor volume throughout the course of radiation therapy from pretreatment computed tomography data using general linear models

International Nuclear Information System (INIS)

Yock, Adam D.; Kudchadker, Rajat J.; Rao, Arvind; Dong, Lei; Beadle, Beth M.; Garden, Adam S.; Court, Laurence E.

2014-01-01

Purpose: The purpose of this work was to develop and evaluate the accuracy of several predictive models of variation in tumor volume throughout the course of radiation therapy. Methods: Nineteen patients with oropharyngeal cancers were imaged daily with CT-on-rails for image-guided alignment per an institutional protocol. The daily volumes of 35 tumors in these 19 patients were determined and used to generate (1) a linear model in which tumor volume changed at a constant rate, (2) a general linear model that utilized the power fit relationship between the daily and initial tumor volumes, and (3) a functional general linear model that identified and exploited the primary modes of variation between time series describing the changing tumor volumes. Primary and nodal tumor volumes were examined separately. The accuracy of these models in predicting daily tumor volumes were compared with those of static and linear reference models using leave-one-out cross-validation. Results: In predicting the daily volume of primary tumors, the general linear model and the functional general linear model were more accurate than the static reference model by 9.9% (range: −11.6%–23.8%) and 14.6% (range: −7.3%–27.5%), respectively, and were more accurate than the linear reference model by 14.2% (range: −6.8%–40.3%) and 13.1% (range: −1.5%–52.5%), respectively. In predicting the daily volume of nodal tumors, only the 14.4% (range: −11.1%–20.5%) improvement in accuracy of the functional general linear model compared to the static reference model was statistically significant. Conclusions: A general linear model and a functional general linear model trained on data from a small population of patients can predict the primary tumor volume throughout the course of radiation therapy with greater accuracy than standard reference models. These more accurate models may increase the prognostic value of information about the tumor garnered from pretreatment computed tomography
Speech Function and Speech Role in Carl Fredricksen's Dialogue on Up Movie

OpenAIRE

Rehana, Ridha; Silitonga, Sortha

2013-01-01

One aim of this article is to show through a concrete example how speech function and speech role used in movie. The illustrative example is taken from the dialogue of Up movie. Central to the analysis proper form of dialogue on Up movie that contain of speech function and speech role; i.e. statement, offer, question, command, giving, and demanding. 269 dialogue were interpreted by actor, and it was found that the use of speech function and speech role.
Narrative Ability of Children With Speech Sound Disorders and the Prediction of Later Literacy Skills

Science.gov (United States)

Wellman, Rachel L.; Lewis, Barbara A.; Freebairn, Lisa A.; Avrich, Allison A.; Hansen, Amy J.; Stein, Catherine M.

2012-01-01

Purpose The main purpose of this study was to examine how children with isolated speech sound disorders (SSDs; n = 20), children with combined SSDs and language impairment (LI; n = 20), and typically developing children (n = 20), ages 3;3 (years;months) to 6;6, differ in narrative ability. The second purpose was to determine if early narrative ability predicts school-age (8–12 years) literacy skills. Method This study employed a longitudinal cohort design. The children completed a narrative retelling task before their formal literacy instruction began. The narratives were analyzed and compared for group differences. Performance on these early narratives was then used to predict the children’s reading decoding, reading comprehension, and written language ability at school age. Results Significant group differences were found in children’s (a) ability to answer questions about the story, (b) use of story grammars, and (c) number of correct and irrelevant utterances. Regression analysis demonstrated that measures of story structure and accuracy were the best predictors of the decoding of real words, reading comprehension, and written language. Measures of syntax and lexical diversity were the best predictors of the decoding of nonsense words. Conclusion Combined SSDs and LI, and not isolated SSDs, impact a child’s narrative abilities. Narrative retelling is a useful task for predicting which children may be at risk for later literacy problems. PMID:21969531
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

Science.gov (United States)

Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

2016-03-01

In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

NARCIS (Netherlands)

Huijbregts, M.A.H.; de Jong, Franciska M.G.

In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because
Computationally Efficient Amplitude Modulated Sinusoidal Audio Coding using Frequency-Domain Linear Prediction

DEFF Research Database (Denmark)

Christensen, M. G.; Jensen, Søren Holdt

2006-01-01

A method for amplitude modulated sinusoidal audio coding is presented that has low complexity and low delay. This is based on a subband processing system, where, in each subband, the signal is modeled as an amplitude modulated sum of sinusoids. The envelopes are estimated using frequency......-domain linear prediction and the prediction coefficients are quantized. As a proof of concept, we evaluate different configurations in a subjective listening test, and this shows that the proposed method offers significant improvements in sinusoidal coding. Furthermore, the properties of the frequency...
Robust entry guidance using linear covariance-based model predictive control

Directory of Open Access Journals (Sweden)

Jianjun Luo

2017-02-01

Full Text Available For atmospheric entry vehicles, guidance design can be accomplished by solving an optimal issue using optimal control theories. However, traditional design methods generally focus on the nominal performance and do not include considerations of the robustness in the design process. This paper proposes a linear covariance-based model predictive control method for robust entry guidance design. Firstly, linear covariance analysis is employed to directly incorporate the robustness into the guidance design. The closed-loop covariance with the feedback updated control command is initially formulated to provide the expected errors of the nominal state variables in the presence of uncertainties. Then, the closed-loop covariance is innovatively used as a component of the cost function to guarantee the robustness to reduce its sensitivity to uncertainties. After that, the models predictive control is used to solve the optimal problem, and the control commands (bank angles are calculated. Finally, a series of simulations for different missions have been completed to demonstrate the high performance in precision and the robustness with respect to initial perturbations as well as uncertainties in the entry process. The 3σ confidence region results in the presence of uncertainties which show that the robustness of the guidance has been improved, and the errors of the state variables are decreased by approximately 35%.
Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

Science.gov (United States)

Madarang, Krish J; Kang, Joo-Hyon

2014-06-01

Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.
Daily Suspended Sediment Discharge Prediction Using Multiple Linear Regression and Artificial Neural Network

Science.gov (United States)

Uca; Toriman, Ekhwan; Jaafar, Othman; Maru, Rosmini; Arfan, Amal; Saleh Ahmar, Ansari

2018-01-01

Prediction of suspended sediment discharge in a catchments area is very important because it can be used to evaluation the erosion hazard, management of its water resources, water quality, hydrology project management (dams, reservoirs, and irrigation) and to determine the extent of the damage that occurred in the catchments. Multiple Linear Regression analysis and artificial neural network can be used to predict the amount of daily suspended sediment discharge. Regression analysis using the least square method, whereas artificial neural networks using Radial Basis Function (RBF) and feedforward multilayer perceptron with three learning algorithms namely Levenberg-Marquardt (LM), Scaled Conjugate Descent (SCD) and Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton (BFGS). The number neuron of hidden layer is three to sixteen, while in output layer only one neuron because only one output target. The mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2 ) and coefficient of efficiency (CE) of the multiple linear regression (MLRg) value Model 2 (6 input variable independent) has the lowest the value of MAE and RMSE (0.0000002 and 13.6039) and highest R2 and CE (0.9971 and 0.9971). When compared between LM, SCG and RBF, the BFGS model structure 3-7-1 is the better and more accurate to prediction suspended sediment discharge in Jenderam catchment. The performance value in testing process, MAE and RMSE (13.5769 and 17.9011) is smallest, meanwhile R2 and CE (0.9999 and 0.9998) is the highest if it compared with the another BFGS Quasi-Newton model (6-3-1, 9-10-1 and 12-12-1). Based on the performance statistics value, MLRg, LM, SCG, BFGS and RBF suitable and accurately for prediction by modeling the non-linear complex behavior of suspended sediment responses to rainfall, water depth and discharge. The comparison between artificial neural network (ANN) and MLRg, the MLRg Model 2 accurately for to prediction suspended sediment discharge (kg
Intelligibility of speech of children with speech and sound disorders

OpenAIRE

Ivetac, Tina

2014-01-01

The purpose of this study is to examine speech intelligibility of children with primary speech and sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...

Bandwidth extension of speech using perceptual criteria

CERN Document Server

Berisha, Visar; Liss, Julie

2013-01-01

Bandwidth extension of speech is used in the International Telecommunication Union G.729.1 standard in which the narrowband bitstream is combined with quantized high-band parameters. Although this system produces high-quality wideband speech, the additional bits used to represent the high band can be further reduced. In addition to the algorithm used in the G.729.1 standard, bandwidth extension methods based on spectrum prediction have also been proposed. Although these algorithms do not require additional bits, they perform poorly when the correlation between the low and the high band is weak. In this book, two wideband speech coding algorithms that rely on bandwidth extension are developed. The algorithms operate as wrappers around existing narrowband compression schemes. More specifically, in these algorithms, the low band is encoded using an existing toll-quality narrowband system, whereas the high band is generated using the proposed extension techniques. The first method relies only on transmitted high-...
Prediction of IOI-HA Scores Using Speech Reception Thresholds and Speech Discrimination Scores in Quiet

DEFF Research Database (Denmark)

Brännström, K Jonas; Lantz, Johannes; Nielsen, Lars Holme

2014-01-01

), and speech discrimination scores (SDSs) in quiet or in noise are common assessments made prior to hearing aid (HA) fittings. It is not known whether SRT and SDS in quiet relate to HA outcome measured with the International Outcome Inventory for Hearing Aids (IOI-HA). PURPOSE: The aim of the present study...... COLLECTION AND ANALYSIS: The psychometric properties were evaluated and compared to previous studies using the IOI-HA. The associations and differences between the outcome scores and a number of descriptive variables (age, gender, fitted monaurally/binaurally with HA, first-time/experienced HA users, years...
A Riccati Based Homogeneous and Self-Dual Interior-Point Method for Linear Economic Model Predictive Control

DEFF Research Database (Denmark)

Sokoler, Leo Emil; Frison, Gianluca; Edlund, Kristian

2013-01-01

In this paper, we develop an efficient interior-point method (IPM) for the linear programs arising in economic model predictive control of linear systems. The novelty of our algorithm is that it combines a homogeneous and self-dual model, and a specialized Riccati iteration procedure. We test...
A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography.

Science.gov (United States)

Ozker, Muge; Schepers, Inga M; Magnotti, John F; Yoshor, Daniel; Beauchamp, Michael S

2017-06-01

Human speech can be comprehended using only auditory information from the talker's voice. However, comprehension is improved if the talker's face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl's gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech.
Acoustic and temporal analysis of speech: A potential biomarker for schizophrenia.

LENUS (Irish Health Repository)

Rapcan, Viliam

2010-11-01

Currently, there are no established objective biomarkers for the diagnosis or monitoring of schizophrenia. It has been previously reported that there are notable qualitative differences in the speech of schizophrenics. The objective of this study was to determine whether a quantitative acoustic and temporal analysis of speech may be a potential biomarker for schizophrenia. In this study, 39 schizophrenic patients and 18 controls were digitally recorded reading aloud an emotionally neutral text passage from a children\\'s story. Temporal, energy and vocal pitch features were automatically extracted from the recordings. A classifier based on linear discriminant analysis was employed to differentiate between controls and schizophrenic subjects. Processing the recordings with the algorithm developed demonstrated that it is possible to differentiate schizophrenic patients and controls with a classification accuracy of 79.4% (specificity=83.6%, sensitivity=75.2%) based on speech pause related parameters extracted from recordings carried out in standard office (non-studio) environments. Acoustic and temporal analysis of speech may represent a potential tool for the objective analysis in schizophrenia.
Detecting self-produced speech errors before and after articulation: An ERP investigation

Directory of Open Access Journals (Sweden)

Kevin Michael Trewartha

2013-11-01

Full Text Available It has been argued that speech production errors are monitored by the same neural system involved in monitoring other types of action errors. Behavioral evidence has shown that speech errors can be detected and corrected prior to articulation, yet the neural basis for such pre-articulatory speech error monitoring is poorly understood. The current study investigated speech error monitoring using a phoneme-substitution task known to elicit speech errors. Stimulus-locked event-related potential (ERP analyses comparing correct and incorrect utterances were used to assess pre-articulatory error monitoring and response-locked ERP analyses were used to assess post-articulatory monitoring. Our novel finding in the stimulus-locked analysis revealed that words that ultimately led to a speech error were associated with a larger P2 component at midline sites (FCz, Cz, and CPz. This early positivity may reflect the detection of an error in speech formulation, or a predictive mechanism to signal the potential for an upcoming speech error. The data also revealed that general conflict monitoring mechanisms are involved during this task as both correct and incorrect responses elicited an anterior N2 component typically associated with conflict monitoring. The response-locked analyses corroborated previous observations that self-produced speech errors led to a fronto-central ERN. These results demonstrate that speech errors can be detected prior to articulation, and that speech error monitoring relies on a central error monitoring mechanism.
Speech disorders - children

Science.gov (United States)

... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...
Auditory, Visual and Audiovisual Speech Processing Streams in Superior Temporal Sulcus.

Science.gov (United States)

Venezia, Jonathan H; Vaden, Kenneth I; Rong, Feng; Maddox, Dale; Saberi, Kourosh; Hickok, Gregory

2017-01-01

The human superior temporal sulcus (STS) is responsive to visual and auditory information, including sounds and facial cues during speech recognition. We investigated the functional organization of STS with respect to modality-specific and multimodal speech representations. Twenty younger adult participants were instructed to perform an oddball detection task and were presented with auditory, visual, and audiovisual speech stimuli, as well as auditory and visual nonspeech control stimuli in a block fMRI design. Consistent with a hypothesized anterior-posterior processing gradient in STS, auditory, visual and audiovisual stimuli produced the largest BOLD effects in anterior, posterior and middle STS (mSTS), respectively, based on whole-brain, linear mixed effects and principal component analyses. Notably, the mSTS exhibited preferential responses to multisensory stimulation, as well as speech compared to nonspeech. Within the mid-posterior and mSTS regions, response preferences changed gradually from visual, to multisensory, to auditory moving posterior to anterior. Post hoc analysis of visual regions in the posterior STS revealed that a single subregion bordering the mSTS was insensitive to differences in low-level motion kinematics yet distinguished between visual speech and nonspeech based on multi-voxel activation patterns. These results suggest that auditory and visual speech representations are elaborated gradually within anterior and posterior processing streams, respectively, and may be integrated within the mSTS, which is sensitive to more abstract speech information within and across presentation modalities. The spatial organization of STS is consistent with processing streams that are hypothesized to synthesize perceptual speech representations from sensory signals that provide convergent information from visual and auditory modalities.
Neurophysiology of speech differences in childhood apraxia of speech.

Science.gov (United States)

Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

2014-01-01

Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.
Prediction of Complex Human Traits Using the Genomic Best Linear Unbiased Predictor

DEFF Research Database (Denmark)

de los Campos, Gustavo; Vazquez, Ana I; Fernando, Rohan

2013-01-01

Despite important advances from Genome Wide Association Studies (GWAS), for most complex human traits and diseases, a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR......) models where phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Prediction G-BLUP, a ridge-regression type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations....... However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the erformance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage...
Reference-Free Assessment of Speech Intelligibility Using Bispectrum of an Auditory Neurogram

Science.gov (United States)

Hossain, Mohammad E.; Jassim, Wissam A.; Zilany, Muhammad S. A.

2016-01-01

Sensorineural hearing loss occurs due to damage to the inner and outer hair cells of the peripheral auditory system. Hearing loss can cause decreases in audibility, dynamic range, frequency and temporal resolution of the auditory system, and all of these effects are known to affect speech intelligibility. In this study, a new reference-free speech intelligibility metric is proposed using 2-D neurograms constructed from the output of a computational model of the auditory periphery. The responses of the auditory-nerve fibers with a wide range of characteristic frequencies were simulated to construct neurograms. The features of the neurograms were extracted using third-order statistics referred to as bispectrum. The phase coupling of neurogram bispectrum provides a unique insight for the presence (or deficit) of supra-threshold nonlinearities beyond audibility for listeners with normal hearing (or hearing loss). The speech intelligibility scores predicted by the proposed method were compared to the behavioral scores for listeners with normal hearing and hearing loss both in quiet and under noisy background conditions. The results were also compared to the performance of some existing methods. The predicted results showed a good fit with a small error suggesting that the subjective scores can be estimated reliably using the proposed neural-response-based metric. The proposed metric also had a wide dynamic range, and the predicted scores were well-separated as a function of hearing loss. The proposed metric successfully captures the effects of hearing loss and supra-threshold nonlinearities on speech intelligibility. This metric could be applied to evaluate the performance of various speech-processing algorithms designed for hearing aids and cochlear implants. PMID:26967160
Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.

Science.gov (United States)

Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina

2015-07-01

It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Secondly, they show that visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line
A sparse neural code for some speech sounds but not for others.

Directory of Open Access Journals (Sweden)

Mathias Scharinger

Full Text Available The precise neural mechanisms underlying speech sound representations are still a matter of debate. Proponents of 'sparse representations' assume that on the level of speech sounds, only contrastive or otherwise not predictable information is stored in long-term memory. Here, in a passive oddball paradigm, we challenge the neural foundations of such a 'sparse' representation; we use words that differ only in their penultimate consonant ("coronal" [t] vs. "dorsal" [k] place of articulation and for example distinguish between the German nouns Latz ([lats]; bib and Lachs ([laks]; salmon. Changes from standard [t] to deviant [k] and vice versa elicited a discernible Mismatch Negativity (MMN response. Crucially, however, the MMN for the deviant [lats] was stronger than the MMN for the deviant [laks]. Source localization showed this difference to be due to enhanced brain activity in right superior temporal cortex. These findings reflect a difference in phonological 'sparsity': Coronal [t] segments, but not dorsal [k] segments, are based on more sparse representations and elicit less specific neural predictions; sensory deviations from this prediction are more readily 'tolerated' and accordingly trigger weaker MMNs. The results foster the neurocomputational reality of 'representationally sparse' models of speech perception that are compatible with more general predictive mechanisms in auditory perception.
An improved robust model predictive control for linear parameter-varying input-output models

NARCIS (Netherlands)

Abbas, H.S.; Hanema, J.; Tóth, R.; Mohammadpour, J.; Meskin, N.

2018-01-01

This paper describes a new robust model predictive control (MPC) scheme to control the discrete-time linear parameter-varying input-output models subject to input and output constraints. Closed-loop asymptotic stability is guaranteed by including a quadratic terminal cost and an ellipsoidal terminal
The role of high-level processes for oscillatory phase entrainment to speech sound

Directory of Open Access Journals (Sweden)

Benedikt eZoefel

2015-12-01

Full Text Available Constantly bombarded with input, the brain has the need to filter out relevant information while ignoring the irrelevant rest. A powerful tool may be represented by neural oscillations which entrain their high-excitability phase to important input while their low-excitability phase attenuates irrelevant information. Indeed, the alignment between brain oscillations and speech improves intelligibility and helps dissociating speakers during a cocktail party. Although well-investigated, the contribution of low- and high-level processes to phase entrainment to speech sound has only recently begun to be understood. Here, we review those findings, and concentrate on three main results: (1 Phase entrainment to speech sound is modulated by attention or predictions, likely supported by top-down signals and indicating higher-level processes involved in the brain’s adjustment to speech. (2 As phase entrainment to speech can be observed without systematic fluctuations in sound amplitude or spectral content, it does not only reflect a passive steady-state ringing of the cochlea, but entails a higher-level process. (3 The role of intelligibility for phase entrainment is debated. Recent results suggest that intelligibility modulates the behavioral consequences of entrainment, rather than directly affecting the strength of entrainment in auditory regions. We conclude that phase entrainment to speech reflects a sophisticated mechanism: Several high-level processes interact to optimally align neural oscillations with predicted events of high relevance, even when they are hidden in a continuous stream of background noise.
Neural networks supporting audiovisual integration for speech: A large-scale lesion study.

Science.gov (United States)

Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius

2018-06-01

Auditory and visual speech information are often strongly integrated resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched, the McGurk-MacDonald effect. Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure-auditory vs visual capture-can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor related frontal regions do not appear to play a role in AV speech integration. Copyright © 2018 Elsevier Ltd. All rights reserved.
Linear predictions of supercritical flow instability in two parallel channels

International Nuclear Information System (INIS)

Shah, M.

2008-01-01

A steady state linear code that can predict thermo-hydraulic instability boundaries in a two parallel channel system under supercritical conditions has been developed. Linear and non-linear solutions of the instability boundary in a two parallel channel system are also compared. The effect of gravity on the instability boundary in a two parallel channel system, by changing the orientation of the system flow from horizontal flow to vertical up-flow and vertical down-flow has been analyzed. Vertical up-flow is found to be more unstable than horizontal flow and vertical down flow is found to be the most unstable configuration. The type of instability present in each flow-orientation of a parallel channel system has been checked and the density wave oscillation type is observed in horizontal flow and vertical up-flow, while the static type of instability is observed in a vertical down-flow for the cases studied here. The parameters affecting the instability boundary, such as the heating power, inlet temperature, inlet and outlet K-factors are varied to assess their effects. This study is important for the design of future Generation IV nuclear reactors in which supercritical light water is proposed as the primary coolant. (author)
Quantifying the predictive consequences of model error with linear subspace analysis

Science.gov (United States)

White, Jeremy T.; Doherty, John E.; Hughes, Joseph D.

2014-01-01

All computer models are simplified and imperfect simulators of complex natural systems. The discrepancy arising from simplification induces bias in model predictions, which may be amplified by the process of model calibration. This paper presents a new method to identify and quantify the predictive consequences of calibrating a simplified computer model. The method is based on linear theory, and it scales efficiently to the large numbers of parameters and observations characteristic of groundwater and petroleum reservoir models. The method is applied to a range of predictions made with a synthetic integrated surface-water/groundwater model with thousands of parameters. Several different observation processing strategies and parameterization/regularization approaches are examined in detail, including use of the Karhunen-Loève parameter transformation. Predictive bias arising from model error is shown to be prediction specific and often invisible to the modeler. The amount of calibration-induced bias is influenced by several factors, including how expert knowledge is applied in the design of parameterization schemes, the number of parameters adjusted during calibration, how observations and model-generated counterparts are processed, and the level of fit with observations achieved through calibration. Failure to properly implement any of these factors in a prediction-specific manner may increase the potential for predictive bias in ways that are not visible to the calibration and uncertainty analysis process.
Listeners Experience Linguistic Masking Release in Noise-Vocoded Speech-in-Speech Recognition

Science.gov (United States)

Viswanathan, Navin; Kokkinakis, Kostas; Williams, Brittany T.

2018-01-01

Purpose: The purpose of this study was to evaluate whether listeners with normal hearing perceiving noise-vocoded speech-in-speech demonstrate better intelligibility of target speech when the background speech was mismatched in language (linguistic release from masking [LRM]) and/or location (spatial release from masking [SRM]) relative to the…
Speech Perception and Short-Term Memory Deficits in Persistent Developmental Speech Disorder

Science.gov (United States)

Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.

2006-01-01

Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech…

Speech-language pathologist job satisfaction in school versus medical settings.

Science.gov (United States)

Kalkhoff, Nicole L; Collins, Dana R

2012-04-01

The goal of this study was to determine if job satisfaction differs between speech-language pathologists (SLPs) working in school settings and SLPs working in medical settings. The Job Satisfaction Survey (JSS) by Spector (1997) was sent via electronic mail to 250 SLPs in each of the 2 settings. Job satisfaction scores were computed from subscale category ratings and were compared between the 2 settings. Subscale category ratings for pay, promotion, supervision, benefits, contingent rewards, operating conditions, coworkers, nature of work, and communication were analyzed for differences between and within settings. Age, caseload size, and years-at-position were analyzed by linear regression to determine whether these factors might predict SLPs' job satisfaction. The survey had a response rate of 19.6% (N = 98 participants). Although SLPs in both settings were generally satisfied with their jobs, SLPs in medical settings had significantly higher total job satisfaction scores. Respondents from both settings had similar satisfaction ratings for subscale categories, with nature of work receiving the highest rating and operating conditions and promotion the lowest. Results of the linear regression analysis for age, caseload size, and years-at-position were not significant. Further research should evaluate important aspects of job satisfaction in both settings, especially nature of work operating conditions, and promotion.
A review of model predictive control: moving from linear to nonlinear design methods

International Nuclear Information System (INIS)

Nandong, J.; Samyudia, Y.; Tade, M.O.

2006-01-01

Linear model predictive control (LMPC) has now been considered as an industrial control standard in process industry. Its extension to nonlinear cases however has not yet gained wide acceptance due to many reasons, e.g. excessively heavy computational load and effort, thus, preventing its practical implementation in real-time control. The application of nonlinear MPC (NMPC) is advantageous for processes with strong nonlinearity or when the operating points are frequently moved from one set point to another due to, for instance, changes in market demands. Much effort has been dedicated towards improving the computational efficiency of NMPC as well as its stability analysis. This paper provides a review on alternative ways of extending linear MPC to the nonlinear one. We also highlight the critical issues pertinent to the applications of NMPC and discuss possible solutions to address these issues. In addition, we outline the future research trend in the area of model predictive control by emphasizing on the potential applications of multi-scale process model within NMPC
Automatic speech recognition (ASR) based approach for speech therapy of aphasic patients: A review

Science.gov (United States)

Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH

2017-09-01

This paper reviews the state-of-the-art an automatic speech recognition (ASR) based approach for speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that transfers human speech into transcript text by matching with the system's library. This is particularly useful in speech rehabilitation therapies as they provide accurate, real-time evaluation for speech input from an individual with speech disorder. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback response to their mistakes. However, the accuracy of ASR is dependent on many factors such as, phoneme recognition, speech continuity, speaker and environmental differences as well as our depth of knowledge on human language understanding. Hence, the review examines recent development of ASR technologies and its performance for individuals with speech and language disorders.
Predicting haemodynamic networks using electrophysiology: The role of non-linear and cross-frequency interactions

Science.gov (United States)

Tewarie, P.; Bright, M.G.; Hillebrand, A.; Robson, S.E.; Gascoyne, L.E.; Morris, P.G.; Meier, J.; Van Mieghem, P.; Brookes, M.J.

2016-01-01

Understanding the electrophysiological basis of resting state networks (RSNs) in the human brain is a critical step towards elucidating how inter-areal connectivity supports healthy brain function. In recent years, the relationship between RSNs (typically measured using haemodynamic signals) and electrophysiology has been explored using functional Magnetic Resonance Imaging (fMRI) and magnetoencephalography (MEG). Significant progress has been made, with similar spatial structure observable in both modalities. However, there is a pressing need to understand this relationship beyond simple visual similarity of RSN patterns. Here, we introduce a mathematical model to predict fMRI-based RSNs using MEG. Our unique model, based upon a multivariate Taylor series, incorporates both phase and amplitude based MEG connectivity metrics, as well as linear and non-linear interactions within and between neural oscillations measured in multiple frequency bands. We show that including non-linear interactions, multiple frequency bands and cross-frequency terms significantly improves fMRI network prediction. This shows that fMRI connectivity is not only the result of direct electrophysiological connections, but is also driven by the overlap of connectivity profiles between separate regions. Our results indicate that a complete understanding of the electrophysiological basis of RSNs goes beyond simple frequency-specific analysis, and further exploration of non-linear and cross-frequency interactions will shed new light on distributed network connectivity, and its perturbation in pathology. PMID:26827811
Speech and Language Delay

Science.gov (United States)

... OTC Relief for Diarrhea Home Diseases and Conditions Speech and Language Delay Condition Speech and Language Delay Share Print Table of Contents1. ... Treatment6. Everyday Life7. Questions8. Resources What is a speech and language delay? A speech and language delay ...
Plasticity in the Human Speech Motor System Drives Changes in Speech Perception

Science.gov (United States)

Lametti, Daniel R.; Rochet-Capellan, Amélie; Neufeld, Emily; Shiller, Douglas M.

2014-01-01

Recent studies of human speech motor learning suggest that learning is accompanied by changes in auditory perception. But what drives the perceptual change? Is it a consequence of changes in the motor system? Or is it a result of sensory inflow during learning? Here, subjects participated in a speech motor-learning task involving adaptation to altered auditory feedback and they were subsequently tested for perceptual change. In two separate experiments, involving two different auditory perceptual continua, we show that changes in the speech motor system that accompany learning drive changes in auditory speech perception. Specifically, we obtained changes in speech perception when adaptation to altered auditory feedback led to speech production that fell into the phonetic range of the speech perceptual tests. However, a similar change in perception was not observed when the auditory feedback that subjects' received during learning fell into the phonetic range of the perceptual tests. This indicates that the central motor outflow associated with vocal sensorimotor adaptation drives changes to the perceptual classification of speech sounds. PMID:25080594
Multitasking During Degraded Speech Recognition in School-Age Children.

Science.gov (United States)

Grieco-Calub, Tina M; Ward, Kristina M; Brehm, Laurel

2017-01-01

Multitasking requires individuals to allocate their cognitive resources across different tasks. The purpose of the current study was to assess school-age children's multitasking abilities during degraded speech recognition. Children (8 to 12 years old) completed a dual-task paradigm including a sentence recognition (primary) task containing speech that was either unprocessed or noise-band vocoded with 8, 6, or 4 spectral channels and a visual monitoring (secondary) task. Children's accuracy and reaction time on the visual monitoring task was quantified during the dual-task paradigm in each condition of the primary task and compared with single-task performance. Children experienced dual-task costs in the 6- and 4-channel conditions of the primary speech recognition task with decreased accuracy on the visual monitoring task relative to baseline performance. In all conditions, children's dual-task performance on the visual monitoring task was strongly predicted by their single-task (baseline) performance on the task. Results suggest that children's proficiency with the secondary task contributes to the magnitude of dual-task costs while multitasking during degraded speech recognition.
Automated Intelligibility Assessment of Pathological Speech Using Phonological Features

Directory of Open Access Journals (Sweden)

Catherine Middag

2009-01-01

Full Text Available It is commonly acknowledged that word or phoneme intelligibility is an important criterion in the assessment of the communication efficiency of a pathological speaker. People have therefore put a lot of effort in the design of perceptual intelligibility rating tests. These tests usually have the drawback that they employ unnatural speech material (e.g., nonsense words and that they cannot fully exclude errors due to listener bias. Therefore, there is a growing interest in the application of objective automatic speech recognition technology to automate the intelligibility assessment. Current research is headed towards the design of automated methods which can be shown to produce ratings that correspond well with those emerging from a well-designed and well-performed perceptual test. In this paper, a novel methodology that is built on previous work (Middag et al., 2008 is presented. It utilizes phonological features, automatic speech alignment based on acoustic models that were trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples. The experimental evaluation of the new system reveals that the root mean squared error of the discrepancies between perceived and computed intelligibilities can be as low as 8 on a scale of 0 to 100.
Childhood apraxia of speech: A survey of praxis and typical speech characteristics.

Science.gov (United States)

Malmenholt, Ann; Lohmander, Anette; McAllister, Anita

2017-07-01

The purpose of this study was to investigate current knowledge of the diagnosis childhood apraxia of speech (CAS) in Sweden and compare speech characteristics and symptoms to those of earlier survey findings in mainly English-speakers. In a web-based questionnaire 178 Swedish speech-language pathologists (SLPs) anonymously answered questions about their perception of typical speech characteristics for CAS. They graded own assessment skills and estimated clinical occurrence. The seven top speech characteristics reported as typical for children with CAS were: inconsistent speech production (85%), sequencing difficulties (71%), oro-motor deficits (63%), vowel errors (62%), voicing errors (61%), consonant cluster deletions (54%), and prosodic disturbance (53%). Motor-programming deficits described as lack of automatization of speech movements were perceived by 82%. All listed characteristics were consistent with the American Speech-Language-Hearing Association (ASHA) consensus-based features, Strand's 10-point checklist, and the diagnostic model proposed by Ozanne. The mode for clinical occurrence was 5%. Number of suspected cases of CAS in the clinical caseload was approximately one new patient/year and SLP. The results support and add to findings from studies of CAS in English-speaking children with similar speech characteristics regarded as typical. Possibly, these findings could contribute to cross-linguistic consensus on CAS characteristics.
Temporal Fine-Structure Coding and Lateralized Speech Perception in Normal-Hearing and Hearing-Impaired Listeners

DEFF Research Database (Denmark)

Locsei, Gusztav; Pedersen, Julie Hefting; Laugesen, Søren

2016-01-01

This study investigated the relationship between speech perception performance in spatially complex, lateralized listening scenarios and temporal fine-structure (TFS) coding at low frequencies. Young normal-hearing (NH) and two groups of elderly hearing-impaired (HI) listeners with mild or moderate...... hearing loss above 1.5 kHz participated in the study. Speech reception thresholds (SRTs) were estimated in the presence of either speech-shaped noise, two-, four-, or eight-talker babble played reversed, or a nonreversed two-talker masker. Target audibility was ensured by applying individualized linear...... threshold nor the interaural phase difference threshold tasks showed a correlation with the SRTs or with the amount of masking release due to binaural unmasking, respectively. The results suggest that, although HI listeners with normal hearing thresholds below 1.5 kHz experienced difficulties with speech...
Linear genetic programming application for successive-station monthly streamflow prediction

Science.gov (United States)

Danandeh Mehr, Ali; Kahya, Ercan; Yerdelen, Cahit

2014-09-01

In recent decades, artificial intelligence (AI) techniques have been pronounced as a branch of computer science to model wide range of hydrological phenomena. A number of researches have been still comparing these techniques in order to find more effective approaches in terms of accuracy and applicability. In this study, we examined the ability of linear genetic programming (LGP) technique to model successive-station monthly streamflow process, as an applied alternative for streamflow prediction. A comparative efficiency study between LGP and three different artificial neural network algorithms, namely feed forward back propagation (FFBP), generalized regression neural networks (GRNN), and radial basis function (RBF), has also been presented in this study. For this aim, firstly, we put forward six different successive-station monthly streamflow prediction scenarios subjected to training by LGP and FFBP using the field data recorded at two gauging stations on Çoruh River, Turkey. Based on Nash-Sutcliffe and root mean squared error measures, we then compared the efficiency of these techniques and selected the best prediction scenario. Eventually, GRNN and RBF algorithms were utilized to restructure the selected scenario and to compare with corresponding FFBP and LGP. Our results indicated the promising role of LGP for successive-station monthly streamflow prediction providing more accurate results than those of all the ANN algorithms. We found an explicit LGP-based expression evolved by only the basic arithmetic functions as the best prediction model for the river, which uses the records of the both target and upstream stations.
Speech-specific audiovisual perception affects identification but not detection of speech

DEFF Research Database (Denmark)

Eskelund, Kasper; Andersen, Tobias

Speech perception is audiovisual as evidenced by the McGurk effect in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate...... of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task and this could increase...... visual detection task. In our first experiment, observers presented with congruent and incongruent audiovisual sine-wave speech stimuli did only show a McGurk effect when informed of the speech nature of the stimulus. Performance on the secondary visual task was very good, thus supporting the finding...
Individual differences in selective attention predict speech identification at a cocktail party.

Science.gov (United States)

Oberfeld, Daniel; Klöckner-Nowotny, Felicitas

2016-08-31

Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance are individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, the performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a similar proportion of variance as the binaural sensitivity for the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise.
Speech-Language Therapy (For Parents)

Science.gov (United States)

... Staying Safe Videos for Educators Search English Español Speech-Language Therapy KidsHealth / For Parents / Speech-Language Therapy ... most kids with speech and/or language disorders. Speech Disorders, Language Disorders, and Feeding Disorders A speech ...
Digital speech processing using Matlab

CERN Document Server

Gopi, E S

2014-01-01

Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.
Developmental apraxia of speech in children. Quantitive assessment of speech characteristics

NARCIS (Netherlands)

Thoonen, G.H.J.

1998-01-01

Developmental apraxia of speech (DAS) in children is a speech disorder, supposed to have a neurological origin, which is commonly considered to result from particular deficits in speech processing (i.e., phonological planning, motor programming). However, the label DAS has often been used as
Predictive IP controller for robust position control of linear servo system.

Science.gov (United States)

Lu, Shaowu; Zhou, Fengxing; Ma, Yajie; Tang, Xiaoqi

2016-07-01

Position control is a typical application of linear servo system. In this paper, to reduce the system overshoot, an integral plus proportional (IP) controller is used in the position control implementation. To further improve the control performance, a gain-tuning IP controller based on a generalized predictive control (GPC) law is proposed. Firstly, to represent the dynamics of the position loop, a second-order linear model is used and its model parameters are estimated on-line by using a recursive least squares method. Secondly, based on the GPC law, an optimal control sequence is obtained by using receding horizon, then directly supplies the IP controller with the corresponding control parameters in the real operations. Finally, simulation and experimental results are presented to show the efficiency of proposed scheme. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems.

Science.gov (United States)

Greene, Beth G; Logan, John S; Pisoni, David B

1986-03-01

We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.
Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems

Science.gov (United States)

GREENE, BETH G.; LOGAN, JOHN S.; PISONI, DAVID B.

2012-01-01

We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered. PMID:23225916
The speech perception skills of children with and without speech sound disorder.

Science.gov (United States)

Hearnshaw, Stephanie; Baker, Elise; Munro, Natalie

To investigate whether Australian-English speaking children with and without speech sound disorder (SSD) differ in their overall speech perception accuracy. Additionally, to investigate differences in the perception of specific phonemes and the association between speech perception and speech production skills. Twenty-five Australian-English speaking children aged 48-60 months participated in this study. The SSD group included 12 children and the typically developing (TD) group included 13 children. Children completed routine speech and language assessments in addition to an experimental Australian-English lexical and phonetic judgement task based on Rvachew's Speech Assessment and Interactive Learning System (SAILS) program (Rvachew, 2009). This task included eight words across four word-initial phonemes-/k, ɹ, ʃ, s/. Children with SSD showed significantly poorer perceptual accuracy on the lexical and phonetic judgement task compared with TD peers. The phonemes /ɹ/ and /s/ were most frequently perceived in error across both groups. Additionally, the phoneme /ɹ/ was most commonly produced in error. There was also a positive correlation between overall speech perception and speech production scores. Children with SSD perceived speech less accurately than their typically developing peers. The findings suggest that an Australian-English variation of a lexical and phonetic judgement task similar to the SAILS program is promising and worthy of a larger scale study. Copyright © 2017 Elsevier Inc. All rights reserved.

A Decomposition Algorithm for Mean-Variance Economic Model Predictive Control of Stochastic Linear Systems

DEFF Research Database (Denmark)

Sokoler, Leo Emil; Dammann, Bernd; Madsen, Henrik

2014-01-01

This paper presents a decomposition algorithm for solving the optimal control problem (OCP) that arises in Mean-Variance Economic Model Predictive Control of stochastic linear systems. The algorithm applies the alternating direction method of multipliers to a reformulation of the OCP...
Timing in audiovisual speech perception: A mini review and new psychophysical data.

Science.gov (United States)

Venezia, Jonathan H; Thurman, Steven M; Matchin, William; George, Sahara E; Hickok, Gregory

2016-02-01

Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (~35 % identification of /apa/ compared to ~5 % in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.
Timing in Audiovisual Speech Perception: A Mini Review and New Psychophysical Data

Science.gov (United States)

Venezia, Jonathan H.; Thurman, Steven M.; Matchin, William; George, Sahara E.; Hickok, Gregory

2015-01-01

Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually-relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (∼35% identification of /apa/ compared to ∼5% in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually-relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (∼130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content. PMID:26669309
Speech Matters

DEFF Research Database (Denmark)

Hasse Jørgensen, Stina

2011-01-01

About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011.......About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011....
Phase effects in masking by harmonic complexes: speech recognition.

Science.gov (United States)

Deroche, Mickael L D; Culling, John F; Chatterjee, Monita

2013-12-01

Harmonic complexes that generate highly modulated temporal envelopes on the basilar membrane (BM) mask a tone less effectively than complexes that generate relatively flat temporal envelopes, because the non-linear active gain of the BM selectively amplifies a low-level tone in the dips of a modulated masker envelope. The present study examines a similar effect in speech recognition. Speech reception thresholds (SRTs) were measured for a voice masked by harmonic complexes with partials in sine phase (SP) or in random phase (RP). The masker's fundamental frequency (F0) was 50, 100 or 200 Hz. SRTs were considerably lower for SP than for RP maskers at 50-Hz F0, but the two converged at 100-Hz F0, while at 200-Hz F0, SRTs were a little higher for SP than RP maskers. The results were similar whether the target voice was male or female and whether the masker's spectral profile was flat or speech-shaped. Although listening in the masker dips has been shown to play a large role for artificial stimuli such as Schroeder-phase complexes at high levels, it contributes weakly to speech recognition in the presence of harmonic maskers with different crest factors at more moderate sound levels (65 dB SPL). Copyright © 2013 Elsevier B.V. All rights reserved.
Hate speech

Directory of Open Access Journals (Sweden)

Anne Birgitta Nilsen

2014-12-01

Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance.The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience.The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the
Speech Inconsistency in Children with Childhood Apraxia of Speech, Language Impairment, and Speech Delay: Depends on the Stimuli

Science.gov (United States)

Iuzzini-Seigel, Jenya; Hogan, Tiffany P.; Green, Jordan R.

2017-01-01

Purpose: The current research sought to determine (a) if speech inconsistency is a core feature of childhood apraxia of speech (CAS) or if it is driven by comorbid language impairment that affects a large subset of children with CAS and (b) if speech inconsistency is a sensitive and specific diagnostic marker that can differentiate between CAS and…
Rhythm Perception and Its Role in Perception and Learning of Dysrhythmic Speech.

Science.gov (United States)

Borrie, Stephanie A; Lansford, Kaitlin L; Barrett, Tyson S

2017-03-01

The perception of rhythm cues plays an important role in recognizing spoken language, especially in adverse listening conditions. Indeed, this has been shown to hold true even when the rhythm cues themselves are dysrhythmic. This study investigates whether expertise in rhythm perception provides a processing advantage for perception (initial intelligibility) and learning (intelligibility improvement) of naturally dysrhythmic speech, dysarthria. Fifty young adults with typical hearing participated in 3 key tests, including a rhythm perception test, a receptive vocabulary test, and a speech perception and learning test, with standard pretest, familiarization, and posttest phases. Initial intelligibility scores were calculated as the proportion of correct pretest words, while intelligibility improvement scores were calculated by subtracting this proportion from the proportion of correct posttest words. Rhythm perception scores predicted intelligibility improvement scores but not initial intelligibility. On the other hand, receptive vocabulary scores predicted initial intelligibility scores but not intelligibility improvement. Expertise in rhythm perception appears to provide an advantage for processing dysrhythmic speech, but a familiarization experience is required for the advantage to be realized. Findings are discussed in relation to the role of rhythm in speech processing and shed light on processing models that consider the consequence of rhythm abnormalities in dysarthria.
Temporal Context in Speech Processing and Attentional Stream Selection: A Behavioral and Neural perspective

Science.gov (United States)

Zion Golumbic, Elana M.; Poeppel, David; Schroeder, Charles E.

2012-01-01

The human capacity for processing speech is remarkable, especially given that information in speech unfolds over multiple time scales concurrently. Similarly notable is our ability to filter out of extraneous sounds and focus our attention on one conversation, epitomized by the ‘Cocktail Party’ effect. Yet, the neural mechanisms underlying on-line speech decoding and attentional stream selection are not well understood. We review findings from behavioral and neurophysiological investigations that underscore the importance of the temporal structure of speech for achieving these perceptual feats. We discuss the hypothesis that entrainment of ambient neuronal oscillations to speech’s temporal structure, across multiple time-scales, serves to facilitate its decoding and underlies the selection of an attended speech stream over other competing input. In this regard, speech decoding and attentional stream selection are examples of ‘active sensing’, emphasizing an interaction between proactive and predictive top-down modulation of neuronal dynamics and bottom-up sensory input. PMID:22285024
Using Hierarchical Linear Modelling to Examine Factors Predicting English Language Students' Reading Achievement

Science.gov (United States)

Fung, Karen; ElAtia, Samira

2015-01-01

Using Hierarchical Linear Modelling (HLM), this study aimed to identify factors such as ESL/ELL/EAL status that would predict students' reading performance in an English language arts exam taken across Canada. Using data from the 2007 administration of the Pan-Canadian Assessment Program (PCAP) along with the accompanying surveys for students and…
QSAR models for prediction study of HIV protease inhibitors using support vector machines, neural networks and multiple linear regression

Directory of Open Access Journals (Sweden)

Rachid Darnag

2017-02-01

Full Text Available Support vector machines (SVM represent one of the most promising Machine Learning (ML tools that can be applied to develop a predictive quantitative structure–activity relationship (QSAR models using molecular descriptors. Multiple linear regression (MLR and artificial neural networks (ANNs were also utilized to construct quantitative linear and non linear models to compare with the results obtained by SVM. The prediction results are in good agreement with the experimental value of HIV activity; also, the results reveal the superiority of the SVM over MLR and ANN model. The contribution of each descriptor to the structure–activity relationships was evaluated.
Clear Speech - Mere Speech? How segmental and prosodic speech reduction shape the impression that speakers create on listeners

DEFF Research Database (Denmark)

Niebuhr, Oliver

2017-01-01

of reduction levels and perceived speaker attributes in which moderate reduction can make a better impression on listeners than no reduction. In addition to its relevance in reduction models and theories, this interplay is instructive for various fields of speech application from social robotics to charisma...... whether variation in the degree of reduction also has a systematic effect on the attributes we ascribe to the speaker who produces the speech signal. A perception experiment was carried out for German in which 46 listeners judged whether or not speakers showing 3 different combinations of segmental...... and prosodic reduction levels (unreduced, moderately reduced, strongly reduced) are appropriately described by 13 physical, social, and cognitive attributes. The experiment shows that clear speech is not mere speech, and less clear speech is not just reduced either. Rather, results revealed a complex interplay...
Audiovisual Temporal Recalibration for Speech in Synchrony Perception and Speech Identification

Science.gov (United States)

Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato

We investigated whether audiovisual synchrony perception for speech could change after observation of the audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is re-calibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, participants performed an audiovisual simultaneity judgment task (i.e., a direct measurement of the audiovisual synchrony perception) in terms of the speech signal after observation of the speech stimuli which had a constant audiovisual lag. The results showed that the “simultaneous” responses (i.e., proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, we adopted the McGurk identification task (i.e., an indirect measurement of the audiovisual synchrony perception) to exclude the possibility that this modulation of synchrony perception was solely attributable to the response strategy using stimuli identical to those of Experiment 1. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, it was shown that audiovisual synchrony perception for speech could be modulated following exposure to constant lag both in direct and indirect measurement. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.
Tools for signal compression applications to speech and audio coding

CERN Document Server

Moreau, Nicolas

2013-01-01

This book presents tools and algorithms required to compress/uncompress signals such as speech and music. These algorithms are largely used in mobile phones, DVD players, HDTV sets, etc. In a first rather theoretical part, this book presents the standard tools used in compression systems: scalar and vector quantization, predictive quantization, transform quantization, entropy coding. In particular we show the consistency between these different tools. The second part explains how these tools are used in the latest speech and audio coders. The third part gives Matlab programs simulating t
Individual differences in selective attention predict speech identification at a cocktail party

Science.gov (United States)

Oberfeld, Daniel; Klöckner-Nowotny, Felicitas

2016-01-01

Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance are individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, the performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a similar proportion of variance as the binaural sensitivity for the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise. DOI: http://dx.doi.org/10.7554/eLife.16747.001 PMID:27580272
A Dynamic Speech Comprehension Test for Assessing Real-World Listening Ability.

Science.gov (United States)

Best, Virginia; Keidser, Gitte; Freeston, Katrina; Buchholz, Jörg M

2016-07-01

Many listeners with hearing loss report particular difficulties with multitalker communication situations, but these difficulties are not well predicted using current clinical and laboratory assessment tools. The overall aim of this work is to create new speech tests that capture key aspects of multitalker communication situations and ultimately provide better predictions of real-world communication abilities and the effect of hearing aids. A test of ongoing speech comprehension introduced previously was extended to include naturalistic conversations between multiple talkers as targets, and a reverberant background environment containing competing conversations. In this article, we describe the development of this test and present a validation study. Thirty listeners with normal hearing participated in this study. Speech comprehension was measured for one-, two-, and three-talker passages at three different signal-to-noise ratios (SNRs), and working memory ability was measured using the reading span test. Analyses were conducted to examine passage equivalence, learning effects, and test-retest reliability, and to characterize the effects of number of talkers and SNR. Although we observed differences in difficulty across passages, it was possible to group the passages into four equivalent sets. Using this grouping, we achieved good test-retest reliability and observed no significant learning effects. Comprehension performance was sensitive to the SNR but did not decrease as the number of talkers increased. Individual performance showed associations with age and reading span score. This new dynamic speech comprehension test appears to be valid and suitable for experimental purposes. Further work will explore its utility as a tool for predicting real-world communication ability and hearing aid benefit. American Academy of Audiology.
Conversation electrified: ERP correlates of speech act recognition in underspecified utterances.

Directory of Open Access Journals (Sweden)

Rosa S Gisladottir

Full Text Available The ability to recognize speech acts (verbal actions in conversation is critical for everyday interaction. However, utterances are often underspecified for the speech act they perform, requiring listeners to rely on the context to recognize the action. The goal of this study was to investigate the time-course of auditory speech act recognition in action-underspecified utterances and explore how sequential context (the prior action impacts this process. We hypothesized that speech acts are recognized early in the utterance to allow for quick transitions between turns in conversation. Event-related potentials (ERPs were recorded while participants listened to spoken dialogues and performed an action categorization task. The dialogues contained target utterances that each of which could deliver three distinct speech acts depending on the prior turn. The targets were identical across conditions, but differed in the type of speech act performed and how it fit into the larger action sequence. The ERP results show an early effect of action type, reflected by frontal positivities as early as 200 ms after target utterance onset. This indicates that speech act recognition begins early in the turn when the utterance has only been partially processed. Providing further support for early speech act recognition, actions in highly constraining contexts did not elicit an ERP effect to the utterance-final word. We take this to show that listeners can recognize the action before the final word through predictions at the speech act level. However, additional processing based on the complete utterance is required in more complex actions, as reflected by a posterior negativity at the final word when the speech act is in a less constraining context and a new action sequence is initiated. These findings demonstrate that sentence comprehension in conversational contexts crucially involves recognition of verbal action which begins as soon as it can.
Linear regression

CERN Document Server

Olive, David J

2017-01-01

This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
Suppressed Alpha Oscillations Predict Intelligibility of Speech and its Acoustic Details

Science.gov (United States)

Weisz, Nathan

2012-01-01

Modulations of human alpha oscillations (8–13 Hz) accompany many cognitive processes, but their functional role in auditory perception has proven elusive: Do oscillatory dynamics of alpha reflect acoustic details of the speech signal and are they indicative of comprehension success? Acoustically presented words were degraded in acoustic envelope and spectrum in an orthogonal design, and electroencephalogram responses in the frequency domain were analyzed in 24 participants, who rated word comprehensibility after each trial. First, the alpha power suppression during and after a degraded word depended monotonically on spectral and, to a lesser extent, envelope detail. The magnitude of this alpha suppression exhibited an additional and independent influence on later comprehension ratings. Second, source localization of alpha suppression yielded superior parietal, prefrontal, as well as anterior temporal brain areas. Third, multivariate classification of the time–frequency pattern across participants showed that patterns of late posterior alpha power allowed best for above-chance classification of word intelligibility. Results suggest that both magnitude and topography of late alpha suppression in response to single words can indicate a listener's sensitivity to acoustic features and the ability to comprehend speech under adverse listening conditions. PMID:22100354
Under-resourced speech recognition based on the speech manifold

CSIR Research Space (South Africa)

Sahraeian, R

2015-09-01

Full Text Available Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...

Relationship between perceptual learning in speech and statistical learning in younger and older adults

Directory of Open Access Journals (Sweden)

Thordis Marisa Neger

2014-09-01

Full Text Available Within a few sentences, listeners learn to understand severely degraded speech such as noise-vocoded speech. However, individuals vary in the amount of such perceptual learning and it is unclear what underlies these differences. The present study investigates whether perceptual learning in speech relates to statistical learning, as sensitivity to probabilistic information may aid identification of relevant cues in novel speech input. If statistical learning and perceptual learning (partly draw on the same general mechanisms, then statistical learning in a non-auditory modality using non-linguistic sequences should predict adaptation to degraded speech.In the present study, 73 older adults (aged over 60 years and 60 younger adults (aged between 18 and 30 years performed a visual artificial grammar learning task and were presented with sixty meaningful noise-vocoded sentences in an auditory recall task. Within age groups, sentence recognition performance over exposure was analyzed as a function of statistical learning performance, and other variables that may predict learning (i.e., hearing, vocabulary, attention switching control, working memory and processing speed. Younger and older adults showed similar amounts of perceptual learning, but only younger adults showed significant statistical learning. In older adults, improvement in understanding noise-vocoded speech was constrained by age. In younger adults, amount of adaptation was associated with lexical knowledge and with statistical learning ability. Thus, individual differences in general cognitive abilities explain listeners' variability in adapting to noise-vocoded speech. Results suggest that perceptual and statistical learning share mechanisms of implicit regularity detection, but that the ability to detect statistical regularities is impaired in older adults if visual sequences are presented quickly.
Evolution of non-speech sound memory in postlingual deafness: implications for cochlear implant rehabilitation.

Science.gov (United States)

Lazard, D S; Giraud, A L; Truy, E; Lee, H J

2011-07-01

Neurofunctional patterns assessed before or after cochlear implantation (CI) are informative markers of implantation outcome. Because phonological memory reorganization in post-lingual deafness is predictive of the outcome, we investigated, using a cross-sectional approach, whether memory of non-speech sounds (NSS) produced by animals or objects (i.e. non-human sounds) is also reorganized, and how this relates to speech perception after CI. We used an fMRI auditory imagery task in which sounds were evoked by pictures of noisy items for post-lingual deaf candidates for CI and for normal-hearing subjects. When deaf subjects imagined sounds, the left inferior frontal gyrus, the right posterior temporal gyrus and the right amygdala were less activated compared to controls. Activity levels in these regions decreased with duration of auditory deprivation, indicating declining NSS representations. Whole brain correlations with duration of auditory deprivation and with speech scores after CI showed an activity decline in dorsal, fronto-parietal, cortical regions, and an activity increase in ventral cortical regions, the right anterior temporal pole and the hippocampal gyrus. Both dorsal and ventral reorganizations predicted poor speech perception outcome after CI. These results suggest that post-CI speech perception relies, at least partially, on the integrity of a neural system used for processing NSS that is based on audio-visual and articulatory mapping processes. When this neural system is reorganized, post-lingual deaf subjects resort to inefficient semantic- and memory-based strategies. These results complement those of other studies on speech processing, suggesting that both speech and NSS representations need to be maintained during deafness to ensure the success of CI. Copyright © 2011 Elsevier Ltd. All rights reserved.
PRACTICING SPEECH THERAPY INTERVENTION FOR SOCIAL INTEGRATION OF CHILDREN WITH SPEECH DISORDERS

Directory of Open Access Journals (Sweden)

Martin Ofelia POPESCU

2016-11-01

Full Text Available The article presents a concise speech correction intervention program in of dyslalia in conjunction with capacity development of intra, interpersonal and social integration of children with speech disorders. The program main objectives represent: the potential increasing of individual social integration by correcting speech disorders in conjunction with intra- and interpersonal capacity, the potential growth of children and community groups for social integration by optimizing the socio-relational context of children with speech disorder. In the program were included 60 children / students with dyslalia speech disorders (monomorphic and polymorphic dyslalia, from 11 educational institutions - 6 kindergartens and 5 schools / secondary schools, joined with inter-school logopedic centre (CLI from Targu Jiu city and areas of Gorj district. The program was implemented under the assumption that therapeutic-formative intervention to correct speech disorders and facilitate the social integration will lead, in combination with correct pronunciation disorders, to social integration optimization of children with speech disorders. The results conirm the hypothesis and gives facts about the intervention program eficiency.
Schizophrenia alters intra-network functional connectivity in the caudate for detecting speech under informational speech masking conditions.

Science.gov (United States)

Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang

2018-04-04

Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to health listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its functions of suppressing masking-speech signals.
Speech disorder prevention

Directory of Open Access Journals (Sweden)

Miladis Fornaris-Méndez

2017-04-01

Full Text Available Language therapy has trafficked from a medical focus until a preventive focus. However, difficulties are evidenced in the development of this last task, because he is devoted bigger space to the correction of the disorders of the language. Because the speech disorders is the dysfunction with more frequently appearance, acquires special importance the preventive work that is developed to avoid its appearance. Speech education since early age of the childhood makes work easier for prevent the appearance of speech disorders in the children. The present work has as objective to offer different activities for the prevention of the speech disorders.
Visual Speech Fills in Both Discrimination and Identification of Non-Intact Auditory Speech in Children

Science.gov (United States)

Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Herve

2018-01-01

To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. baez) coupled to non-intact (excised onsets) auditory speech (signified…
Tackling the complexity in speech

DEFF Research Database (Denmark)

section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore......, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about...... the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations...
Speech in spinocerebellar ataxia.

Science.gov (United States)

Schalling, Ellika; Hartelius, Lena

2013-12-01

Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.
Hearing and seeing meaning in noise. Alpha, beta and gamma oscillations predict gestural enhancement of degraded speech comprehension

NARCIS (Netherlands)

Drijvers, L.; Özyürek, A.; Jensen, O.

2018-01-01

During face-to-face communication, listeners integrate speech with gestures. The semantic information conveyed by iconic gestures (e.g., a drinking gesture) can aid speech comprehension in adverse listening conditions. In this magnetoencephalography (MEG) study, we investigated the spatiotemporal
MULTIPLE LINEAR REGRESSION ANALYSIS FOR PREDICTION OF BOILER LOSSES AND BOILER EFFICIENCY

OpenAIRE

Chayalakshmi C.L

2018-01-01

MULTIPLE LINEAR REGRESSION ANALYSIS FOR PREDICTION OF BOILER LOSSES AND BOILER EFFICIENCY ABSTRACT Calculation of boiler efficiency is essential if its parameters need to be controlled for either maintaining or enhancing its efficiency. But determination of boiler efficiency using conventional method is time consuming and very expensive. Hence, it is not recommended to find boiler efficiency frequently. The work presented in this paper deals with establishing the statistical mo...
A causal test of the motor theory of speech perception: a case of impaired speech production and spared speech perception.

Science.gov (United States)

Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z

2015-01-01

The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.
On the context-dependent nature of the contribution of the ventral premotor cortex to speech perception

Science.gov (United States)

Tremblay, Pascale; Small, Steven L.

2011-01-01

What is the nature of the interface between speech perception and production, where auditory and motor representations converge? One set of explanations suggests that during perception, the motor circuits involved in producing a perceived action are in some way enacting the action without actually causing movement (covert simulation) or sending along the motor information to be used to predict its sensory consequences (i.e., efference copy). Other accounts either reject entirely the involvement of motor representations in perception, or explain their role as being more supportive than integral, and not employing the identical circuits used in production. Using fMRI, we investigated whether there are brain regions that are conjointly active for both speech perception and production, and whether these regions are sensitive to articulatory (syllabic) complexity during both processes, which is predicted by a covert simulation account. A group of healthy young adults (1) observed a female speaker produce a set of familiar words (perception), and (2) observed and then repeated the words (production). There were two types of words, varying in articulatory complexity, as measured by the presence or absence of consonant clusters. The simple words contained no consonant cluster (e.g. “palace”), while the complex words contained one to three consonant clusters (e.g. “planet”). Results indicate that the left ventral premotor cortex (PMv) was significantly active during speech perception and speech production but that activation in this region was scaled to articulatory complexity only during speech production, revealing an incompletely specified efferent motor signal during speech perception. The right planum temporal (PT) was also active during speech perception and speech production, and activation in this region was scaled to articulatory complexity during both production and perception. These findings are discussed in the context of current theories theory of
The Relationship between Speech Production and Speech Perception Deficits in Parkinson's Disease

Science.gov (United States)

De Keyser, Kim; Santens, Patrick; Bockstael, Annelies; Botteldooren, Dick; Talsma, Durk; De Vos, Stefanie; Van Cauwenberghe, Mieke; Verheugen, Femke; Corthals, Paul; De Letter, Miet

2016-01-01

Purpose: This study investigated the possible relationship between hypokinetic speech production and speech intensity perception in patients with Parkinson's disease (PD). Method: Participants included 14 patients with idiopathic PD and 14 matched healthy controls (HCs) with normal hearing and cognition. First, speech production was objectified…
The treatment of apraxia of speech : Speech and music therapy, an innovative joint effort

NARCIS (Netherlands)

Hurkmans, Josephus Johannes Stephanus

2016-01-01

Apraxia of Speech (AoS) is a neurogenic speech disorder. A wide variety of behavioural methods have been developed to treat AoS. Various therapy programmes use musical elements to improve speech production. A unique therapy programme combining elements of speech therapy and music therapy is called
Stop consonant voicing in young children's speech: Evidence from a cross-sectional study

Science.gov (United States)

Ganser, Emily

There are intuitive reasons to believe that speech-sound acquisition and language acquisition should be related in development. Surprisingly, only recently has research begun to parse just how the two might be related. This study investigated possible correlations between speech-sound acquisition and language acquisition, as part of a large-scale, longitudinal study of the relationship between different types of phonological development and vocabulary growth in the preschool years. Productions of voiced and voiceless stop-initial words were recorded from 96 children aged 28-39 months. Voice Onset Time (VOT, in ms) for each token context was calculated. A mixed-model logistic regression was calculated which predicted whether the sound was intended to be voiced or voiceless based on its VOT. This model estimated the slopes of the logistic function for each child. This slope was referred to as Robustness of Contrast (based on Holliday, Reidy, Beckman, and Edwards, 2015), defined as being the degree of categorical differentiation between the production of two speech sounds or classes of sounds, in this case, voiced and voiceless stops. Results showed a wide range of slopes for individual children, suggesting that slope-derived Robustness of Contrast could be a viable means of measuring a child's acquisition of the voicing contrast. Robustness of Contrast was then compared to traditional measures of speech and language skills to investigate whether there was any correlation between the production of stop voicing and broader measures of speech and language development. The Robustness of Contrast measure was found to correlate with all individual measures of speech and language, suggesting that it might indeed be predictive of later language skills.
Variables Predicting Foreign Language Reading Comprehension and Vocabulary Acquisition in a Linear Hypermedia Environment

Science.gov (United States)

Akbulut, Yavuz

2007-01-01

Factors predicting vocabulary learning and reading comprehension of advanced language learners of English in a linear multimedia text were investigated in the current study. Predictor variables of interest were multimedia type, reading proficiency, learning styles, topic interest and background knowledge about the topic. The outcome variables of…
A new time-adaptive discrete bionic wavelet transform for enhancing speech from adverse noise environment

Science.gov (United States)

Palaniswamy, Sumithra; Duraisamy, Prakash; Alam, Mohammad Showkat; Yuan, Xiaohui

2012-04-01

Automatic speech processing systems are widely used in everyday life such as mobile communication, speech and speaker recognition, and for assisting the hearing impaired. In speech communication systems, the quality and intelligibility of speech is of utmost importance for ease and accuracy of information exchange. To obtain an intelligible speech signal and one that is more pleasant to listen, noise reduction is essential. In this paper a new Time Adaptive Discrete Bionic Wavelet Thresholding (TADBWT) scheme is proposed. The proposed technique uses Daubechies mother wavelet to achieve better enhancement of speech from additive non- stationary noises which occur in real life such as street noise and factory noise. Due to the integration of human auditory system model into the wavelet transform, bionic wavelet transform (BWT) has great potential for speech enhancement which may lead to a new path in speech processing. In the proposed technique, at first, discrete BWT is applied to noisy speech to derive TADBWT coefficients. Then the adaptive nature of the BWT is captured by introducing a time varying linear factor which updates the coefficients at each scale over time. This approach has shown better performance than the existing algorithms at lower input SNR due to modified soft level dependent thresholding on time adaptive coefficients. The objective and subjective test results confirmed the competency of the TADBWT technique. The effectiveness of the proposed technique is also evaluated for speaker recognition task under noisy environment. The recognition results show that the TADWT technique yields better performance when compared to alternate methods specifically at lower input SNR.
A comparison of random forest regression and multiple linear regression for prediction in neuroscience.

Science.gov (United States)

Smith, Paul F; Ganesh, Siva; Liu, Ping

2013-10-30

Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR), in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R(2) values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R(2) values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.
Practical speech user interface design

CERN Document Server

Lewis, James R

2010-01-01

Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application
Motor Speech Phenotypes of Frontotemporal Dementia, Primary Progressive Aphasia, and Progressive Apraxia of Speech

Science.gov (United States)

Poole, Matthew L.; Brodtmann, Amy; Darby, David; Vogel, Adam P.

2017-01-01

Purpose: Our purpose was to create a comprehensive review of speech impairment in frontotemporal dementia (FTD), primary progressive aphasia (PPA), and progressive apraxia of speech in order to identify the most effective measures for diagnosis and monitoring, and to elucidate associations between speech and neuroimaging. Method: Speech and…

Distributed neural signatures of natural audiovisual speech and music in the human auditory cortex.

Science.gov (United States)

Salmi, Juha; Koistinen, Olli-Pekka; Glerean, Enrico; Jylänki, Pasi; Vehtari, Aki; Jääskeläinen, Iiro P; Mäkelä, Sasu; Nummenmaa, Lauri; Nummi-Kuisma, Katarina; Nummi, Ilari; Sams, Mikko

2017-08-15

During a conversation or when listening to music, auditory and visual information are combined automatically into audiovisual objects. However, it is still poorly understood how specific type of visual information shapes neural processing of sounds in lifelike stimulus environments. Here we applied multi-voxel pattern analysis to investigate how naturally matching visual input modulates supratemporal cortex activity during processing of naturalistic acoustic speech, singing and instrumental music. Bayesian logistic regression classifiers with sparsity-promoting priors were trained to predict whether the stimulus was audiovisual or auditory, and whether it contained piano playing, speech, or singing. The predictive performances of the classifiers were tested by leaving one participant at a time for testing and training the model using the remaining 15 participants. The signature patterns associated with unimodal auditory stimuli encompassed distributed locations mostly in the middle and superior temporal gyrus (STG/MTG). A pattern regression analysis, based on a continuous acoustic model, revealed that activity in some of these MTG and STG areas were associated with acoustic features present in speech and music stimuli. Concurrent visual stimulus modulated activity in bilateral MTG (speech), lateral aspect of right anterior STG (singing), and bilateral parietal opercular cortex (piano). Our results suggest that specific supratemporal brain areas are involved in processing complex natural speech, singing, and piano playing, and other brain areas located in anterior (facial speech) and posterior (music-related hand actions) supratemporal cortex are influenced by related visual information. Those anterior and posterior supratemporal areas have been linked to stimulus identification and sensory-motor integration, respectively. Copyright © 2017 Elsevier Inc. All rights reserved.
Prediction of linear B-cell epitopes of hepatitis C virus for vaccine development

Science.gov (United States)

2015-01-01

Background High genetic heterogeneity in the hepatitis C virus (HCV) is the major challenge of the development of an effective vaccine. Existing studies for developing HCV vaccines have mainly focused on T-cell immune response. However, identification of linear B-cell epitopes that can stimulate B-cell response is one of the major tasks of peptide-based vaccine development. Owing to the variability in B-cell epitope length, the prediction of B-cell epitopes is much more complex than that of T-cell epitopes. Furthermore, the motifs of linear B-cell epitopes in different pathogens are quite different (e. g. HCV and hepatitis B virus). To cope with this challenge, this work aims to propose an HCV-customized sequence-based prediction method to identify B-cell epitopes of HCV. Results This work establishes an experimentally verified dataset comprising the B-cell response of HCV dataset consisting of 774 linear B-cell epitopes and 774 non B-cell epitopes from the Immune Epitope Database. An interpretable rule mining system of B-cell epitopes (IRMS-BE) is proposed to select informative physicochemical properties (PCPs) and then extracts several if-then rule-based knowledge for identifying B-cell epitopes. A web server Bcell-HCV was implemented using an SVM with the 34 informative PCPs, which achieved a training accuracy of 79.7% and test accuracy of 70.7% better than the SVM-based methods for identifying B-cell epitopes of HCV and the two general-purpose methods. This work performs advanced analysis of the 34 informative properties, and the results indicate that the most effective property is the alpha-helix structure of epitopes, which influences the connection between host cells and the E2 proteins of HCV. Furthermore, 12 interpretable rules are acquired from top-five PCPs and achieve a sensitivity of 75.6% and specificity of 71.3%. Finally, a conserved promising vaccine candidate, PDREMVLYQE, is identified for inclusion in a vaccine against HCV. Conclusions This work
The neural processing of foreign-accented speech and its relationship to listener bias

Directory of Open Access Journals (Sweden)

Han-Gyol eYi

2014-10-01

Full Text Available Foreign-accented speech often presents a challenging listening condition. In addition to deviations from the target speech norms related to the inexperience of the nonnative speaker, listener characteristics may play a role in determining intelligibility levels. We have previously shown that an implicit visual bias for associating East Asian faces and foreignness predicts the listeners’ perceptual ability to process Korean-accented English audiovisual speech (Yi et al., 2013. Here, we examine the neural mechanism underlying the influence of listener bias to foreign faces on speech perception. In a functional magnetic resonance imaging (fMRI study, native English speakers listened to native- and Korean-accented English sentences, with or without faces. The participants’ Asian-foreign association was measured using an implicit association test (IAT, conducted outside the scanner. We found that foreign-accented speech evoked greater activity in the bilateral primary auditory cortices and the inferior frontal gyri, potentially reflecting greater computational demand. Higher IAT scores, indicating greater bias, were associated with increased BOLD response to foreign-accented speech with faces in the primary auditory cortex, the early node for spectrotemporal analysis. We conclude the following: (1 foreign-accented speech perception places greater demand on the neural systems underlying speech perception; (2 face of the talker can exaggerate the perceived foreignness of foreign-accented speech; (3 implicit Asian-foreign association is associated with decreased neural efficiency in early spectrotemporal processing.
Flexible non-linear predictive models for large-scale wind turbine diagnostics

DEFF Research Database (Denmark)

Bach-Andersen, Martin; Rømer-Odgaard, Bo; Winther, Ole

2017-01-01

We demonstrate how flexible non-linear models can provide accurate and robust predictions on turbine component temperature sensor data using data-driven principles and only a minimum of system modeling. The merits of different model architectures are evaluated using data from a large set...... of turbines operating under diverse conditions. We then go on to test the predictive models in a diagnostic setting, where the output of the models are used to detect mechanical faults in rotor bearings. Using retrospective data from 22 actual rotor bearing failures, the fault detection performance...... of the models are quantified using a structured framework that provides the metrics required for evaluating the performance in a fleet wide monitoring setup. It is demonstrated that faults are identified with high accuracy up to 45 days before a warning from the hard-threshold warning system....
Association of Velopharyngeal Insufficiency With Quality of Life and Patient-Reported Outcomes After Speech Surgery.

Science.gov (United States)

Bhuskute, Aditi; Skirko, Jonathan R; Roth, Christina; Bayoumi, Ahmed; Durbin-Johnson, Blythe; Tollefson, Travis T

2017-09-01

Patients with cleft palate and other causes of velopharyngeal insufficiency (VPI) suffer adverse effects on social interactions and communication. Measurement of these patient-reported outcomes is needed to help guide surgical and nonsurgical care. To further validate the VPI Effects on Life Outcomes (VELO) instrument, measure the change in quality of life (QOL) after speech surgery, and test the association of change in speech with change in QOL. Prospective descriptive cohort including children and young adults undergoing speech surgery for VPI in a tertiary academic center. Participants completed the validated VELO instrument before and after surgical treatment. The main outcome measures were preoperative and postoperative VELO scores and the perceptual speech assessment of speech intelligibility. The VELO scores are divided into subscale domains. Changes in VELO after surgery were analyzed using linear regression models. VELO scores were analyzed as a function of speech intelligibility adjusting for age and cleft type. The correlation between speech intelligibility rating and VELO scores was estimated using the polyserial correlation. Twenty-nine patients (13 males and 16 females) were included. Mean (SD) age was 7.9 (4.1) years (range, 4-20 years). Pharyngeal flap was used in 14 (48%) cases, Furlow palatoplasty in 12 (41%), and sphincter pharyngoplasty in 1 (3%). The mean (SD) preoperative speech intelligibility rating was 1.71 (1.08), which decreased postoperatively to 0.79 (0.93) in 24 patients who completed protocol (P Speech Intelligibility was correlated with preoperative and postoperative total VELO score (P speech intelligibility. Speech surgery improves VPI-specific quality of life. We confirmed validation in a population of untreated patients with VPI and included pharyngeal flap surgery, which had not previously been included in validation studies. The VELO instrument provides patient-specific outcomes, which allows a broader understanding of the
An analysis of the masking of speech by competing speech using self-report data.

Science.gov (United States)

Agus, Trevor R; Akeroyd, Michael A; Noble, William; Bhullar, Navjot

2009-01-01

Many of the items in the "Speech, Spatial, and Qualities of Hearing" scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.
Neural pathways for visual speech perception

Directory of Open Access Journals (Sweden)

Lynne E Bernstein

2014-12-01

Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1 The visual perception of speech relies on visual pathway representations of speech qua speech. (2 A proposed site of these representations, the temporal visual speech area (TVSA has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS. (3 Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.
Role of working memory and lexical knowledge in perceptual restoration of interrupted speech.

Science.gov (United States)

Nagaraj, Naveen K; Magimairaj, Beula M

2017-12-01

The role of working memory (WM) capacity and lexical knowledge in perceptual restoration (PR) of missing speech was investigated using the interrupted speech perception paradigm. Speech identification ability, which indexed PR, was measured using low-context sentences periodically interrupted at 1.5 Hz. PR was measured for silent gated, low-frequency speech noise filled, and low-frequency fine-structure and envelope filled interrupted conditions. WM capacity was measured using verbal and visuospatial span tasks. Lexical knowledge was assessed using both receptive vocabulary and meaning from context tests. Results showed that PR was better for speech noise filled condition than other conditions tested. Both receptive vocabulary and verbal WM capacity explained unique variance in PR for the speech noise filled condition, but were unrelated to performance in the silent gated condition. It was only receptive vocabulary that uniquely predicted PR for fine-structure and envelope filled conditions. These findings suggest that the contribution of lexical knowledge and verbal WM during PR depends crucially on the information content that replaced the silent intervals. When perceptual continuity was partially restored by filler speech noise, both lexical knowledge and verbal WM capacity facilitated PR. Importantly, for fine-structure and envelope filled interrupted conditions, lexical knowledge was crucial for PR.
Individual differences in language and working memory affect children's speech recognition in noise.

Science.gov (United States)

McCreery, Ryan W; Spratford, Meredith; Kirby, Benjamin; Brennan, Marc

2017-05-01

We examined how cognitive and linguistic skills affect speech recognition in noise for children with normal hearing. Children with better working memory and language abilities were expected to have better speech recognition in noise than peers with poorer skills in these domains. As part of a prospective, cross-sectional study, children with normal hearing completed speech recognition in noise for three types of stimuli: (1) monosyllabic words, (2) syntactically correct but semantically anomalous sentences and (3) semantically and syntactically anomalous word sequences. Measures of vocabulary, syntax and working memory were used to predict individual differences in speech recognition in noise. Ninety-six children with normal hearing, who were between 5 and 12 years of age. Higher working memory was associated with better speech recognition in noise for all three stimulus types. Higher vocabulary abilities were associated with better recognition in noise for sentences and word sequences, but not for words. Working memory and language both influence children's speech recognition in noise, but the relationships vary across types of stimuli. These findings suggest that clinical assessment of speech recognition is likely to reflect underlying cognitive and linguistic abilities, in addition to a child's auditory skills, consistent with the Ease of Language Understanding model.
The speech-based envelope power spectrum model (sEPSM) family: Development, achievements, and current challenges

DEFF Research Database (Denmark)

Relano-Iborra, Helia; Chabot-Leclerc, Alexandre; Scheidiger, Christoph

2017-01-01

have extended the predictive power of the original model to a broad range of conditions. This contribution presents the most recent developments within the sEPSM “family:” (i) A binaural extension, the B-sEPSM [Chabot-Leclerc et al. (2016). J. Acoust. Soc. Am. 140(1), 192-205] which combines better......Intelligibility models provide insights regarding the effects of target speech characteristics, transmission channels and/or auditory processing on the speech perception performance of listeners. In 2011, Jørgensen and Dau proposed the speech-based envelope power spectrum model [sEPSM, Jørgensen...
A discourse model of affect for text-to-speech synthesis

CSIR Research Space (South Africa)

Schlunz, GI

2013-12-01

Full Text Available This paper introduces a model of affect to improve prosody in text-to-speech synthesis. It operates on the discourse level of text to predict the underlying linguistic factors that contribute towards emotional appraisal, rather than any particular...
The use of artificial neural networks and multiple linear regression to predict rate of medical waste generation

International Nuclear Information System (INIS)

Jahandideh, Sepideh; Jahandideh, Samad; Asadabadi, Ebrahim Barzegari; Askarian, Mehrdad; Movahedi, Mohammad Mehdi; Hosseini, Somayyeh; Jahandideh, Mina

2009-01-01

Prediction of the amount of hospital waste production will be helpful in the storage, transportation and disposal of hospital waste management. Based on this fact, two predictor models including artificial neural networks (ANNs) and multiple linear regression (MLR) were applied to predict the rate of medical waste generation totally and in different types of sharp, infectious and general. In this study, a 5-fold cross-validation procedure on a database containing total of 50 hospitals of Fars province (Iran) were used to verify the performance of the models. Three performance measures including MAR, RMSE and R 2 were used to evaluate performance of models. The MLR as a conventional model obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as more significant parameters. On the other hand, ANNs as a more powerful model, which has not been introduced in predicting rate of medical waste generation, showed high performance measure values, especially 0.99 value of R 2 confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving which provides the opportunity for relating independent variables to dependent ones non-linearly. In conclusion, the obtained results showed that our ANN-based model approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in future.
Auditory and Cognitive Factors Underlying Individual Differences in Aided Speech-Understanding among Older Adults

Directory of Open Access Journals (Sweden)

Larry E. Humes

2013-10-01

Full Text Available This study was designed to address individual differences in aided speech understanding among a relatively large group of older adults. The group of older adults consisted of 98 adults (50 female and 48 male ranging in age from 60 to 86 (mean = 69.2. Hearing loss was typical for this age group and about 90% had not worn hearing aids. All subjects completed a battery of tests, including cognitive (6 measures, psychophysical (17 measures, and speech-understanding (9 measures, as well as the Speech, Spatial and Qualities of Hearing (SSQ self-report scale. Most of the speech-understanding measures made use of competing speech and the non-speech psychophysical measures were designed to tap phenomena thought to be relevant for the perception of speech in competing speech (e.g., stream segregation, modulation-detection interference. All measures of speech understanding were administered with spectral shaping applied to the speech stimuli to fully restore audibility through at least 4000 Hz. The measures used were demonstrated to be reliable in older adults and, when compared to a reference group of 28 young normal-hearing adults, age-group differences were observed on many of the measures. Principal-components factor analysis was applied successfully to reduce the number of independent and dependent (speech understanding measures for a multiple-regression analysis. Doing so yielded one global cognitive-processing factor and five non-speech psychoacoustic factors (hearing loss, dichotic signal detection, multi-burst masking, stream segregation, and modulation detection as potential predictors. To this set of six potential predictor variables were added subject age, Environmental Sound Identification (ESI, and performance on the text-recognition-threshold (TRT task (a visual analog of interrupted speech recognition. These variables were used to successfully predict one global aided speech-understanding factor, accounting for about 60% of the variance.
Part-of-speech effects on text-to-speech synthesis

CSIR Research Space (South Africa)

Schlunz, GI

2010-11-01

Full Text Available One of the goals of text-to-speech (TTS) systems is to produce natural-sounding synthesised speech. Towards this end various natural language processing (NLP) tasks are performed to model the prosodic aspects of the TTS voice. One of the fundamental...
Improved prediction of residue flexibility by embedding optimized amino acid grouping into RSA-based linear models.

Science.gov (United States)

Zhang, Hua; Kurgan, Lukasz

2014-12-01

Knowledge of protein flexibility is vital for deciphering the corresponding functional mechanisms. This knowledge would help, for instance, in improving computational drug design and refinement in homology-based modeling. We propose a new predictor of the residue flexibility, which is expressed by B-factors, from protein chains that use local (in the chain) predicted (or native) relative solvent accessibility (RSA) and custom-derived amino acid (AA) alphabets. Our predictor is implemented as a two-stage linear regression model that uses RSA-based space in a local sequence window in the first stage and a reduced AA pair-based space in the second stage as the inputs. This method is easy to comprehend explicit linear form in both stages. Particle swarm optimization was used to find an optimal reduced AA alphabet to simplify the input space and improve the prediction performance. The average correlation coefficients between the native and predicted B-factors measured on a large benchmark dataset are improved from 0.65 to 0.67 when using the native RSA values and from 0.55 to 0.57 when using the predicted RSA values. Blind tests that were performed on two independent datasets show consistent improvements in the average correlation coefficients by a modest value of 0.02 for both native and predicted RSA-based predictions.
Comparing models of the combined-stimulation advantage for speech recognition.

Science.gov (United States)

Micheyl, Christophe; Oxenham, Andrew J

2012-05-01

The "combined-stimulation advantage" refers to an improvement in speech recognition when cochlear-implant or vocoded stimulation is supplemented by low-frequency acoustic information. Previous studies have been interpreted as evidence for "super-additive" or "synergistic" effects in the combination of low-frequency and electric or vocoded speech information by human listeners. However, this conclusion was based on predictions of performance obtained using a suboptimal high-threshold model of information combination. The present study shows that a different model, based on Gaussian signal detection theory, can predict surprisingly large combined-stimulation advantages, even when performance with either information source alone is close to chance, without involving any synergistic interaction. A reanalysis of published data using this model reveals that previous results, which have been interpreted as evidence for super-additive effects in perception of combined speech stimuli, are actually consistent with a more parsimonious explanation, according to which the combined-stimulation advantage reflects an optimal combination of two independent sources of information. The present results do not rule out the possible existence of synergistic effects in combined stimulation; however, they emphasize the possibility that the combined-stimulation advantages observed in some studies can be explained simply by non-interactive combination of two information sources.
75 FR 26701 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

Science.gov (United States)

2010-05-12

...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities... proposed compensation rates for Interstate TRS, Speech-to-Speech Services (STS), Captioned Telephone... costs reported in the data submitted to NECA by VRS providers. In this regard, document DA 10-761 also...
[Non-speech oral motor treatment efficacy for children with developmental speech sound disorders].

Science.gov (United States)

Ygual-Fernandez, A; Cervera-Merida, J F

2016-01-01

In the treatment of speech disorders by means of speech therapy two antagonistic methodological approaches are applied: non-verbal ones, based on oral motor exercises (OME), and verbal ones, which are based on speech processing tasks with syllables, phonemes and words. In Spain, OME programmes are called 'programas de praxias', and are widely used and valued by speech therapists. To review the studies conducted on the effectiveness of OME-based treatments applied to children with speech disorders and the theoretical arguments that could justify, or not, their usefulness. Over the last few decades evidence has been gathered about the lack of efficacy of this approach to treat developmental speech disorders and pronunciation problems in populations without any neurological alteration of motor functioning. The American Speech-Language-Hearing Association has advised against its use taking into account the principles of evidence-based practice. The knowledge gathered to date on motor control shows that the pattern of mobility and its corresponding organisation in the brain are different in speech and other non-verbal functions linked to nutrition and breathing. Neither the studies on their effectiveness nor the arguments based on motor control studies recommend the use of OME-based programmes for the treatment of pronunciation problems in children with developmental language disorders.
Cognitive Bias for Learning Speech Sounds From a Continuous Signal Space Seems Nonlinguistic

Directory of Open Access Journals (Sweden)

Sabine van der Ham

2015-10-01

Full Text Available When learning language, humans have a tendency to produce more extreme distributions of speech sounds than those observed most frequently: In rapid, casual speech, vowel sounds are centralized, yet cross-linguistically, peripheral vowels occur almost universally. We investigate whether adults’ generalization behavior reveals selective pressure for communication when they learn skewed distributions of speech-like sounds from a continuous signal space. The domain-specific hypothesis predicts that the emergence of sound categories is driven by a cognitive bias to make these categories maximally distinct, resulting in more skewed distributions in participants’ reproductions. However, our participants showed more centered distributions, which goes against this hypothesis, indicating that there are no strong innate linguistic biases that affect learning these speech-like sounds. The centralization behavior can be explained by a lack of communicative pressure to maintain categories.
A simple method for HPLC retention time prediction: linear calibration using two reference substances.

Science.gov (United States)

Sun, Lei; Jin, Hong-Yu; Tian, Run-Tao; Wang, Ming-Juan; Liu, Li-Na; Ye, Liu-Ping; Zuo, Tian-Tian; Ma, Shuang-Cheng

2017-01-01

Analysis of related substances in pharmaceutical chemicals and multi-components in traditional Chinese medicines needs bulk of reference substances to identify the chromatographic peaks accurately. But the reference substances are costly. Thus, the relative retention (RR) method has been widely adopted in pharmacopoeias and literatures for characterizing HPLC behaviors of those reference substances unavailable. The problem is it is difficult to reproduce the RR on different columns due to the error between measured retention time (t R ) and predicted t R in some cases. Therefore, it is useful to develop an alternative and simple method for prediction of t R accurately. In the present study, based on the thermodynamic theory of HPLC, a method named linear calibration using two reference substances (LCTRS) was proposed. The method includes three steps, procedure of two points prediction, procedure of validation by multiple points regression and sequential matching. The t R of compounds on a HPLC column can be calculated by standard retention time and linear relationship. The method was validated in two medicines on 30 columns. It was demonstrated that, LCTRS method is simple, but more accurate and more robust on different HPLC columns than RR method. Hence quality standards using LCTRS method are easy to reproduce in different laboratories with lower cost of reference substances.

[Prediction model of health workforce and beds in county hospitals of Hunan by multiple linear regression].

Science.gov (United States)

Ling, Ru; Liu, Jiawang

2011-12-01

To construct prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires,and multiple linear regression analysis with 20 quotas selected by literature view was done. Independent variables in the multiple linear regression model on medical personnels in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the the population of aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory and fitting, and may be used for short- and mid-term forecasting.
Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity

Science.gov (United States)

Moses, David A.; Mesgarani, Nima; Leonard, Matthew K.; Chang, Edward F.

2016-10-01

Objective. The superior temporal gyrus (STG) and neighboring brain regions play a key role in human language processing. Previous studies have attempted to reconstruct speech information from brain activity in the STG, but few of them incorporate the probabilistic framework and engineering methodology used in modern speech recognition systems. In this work, we describe the initial efforts toward the design of a neural speech recognition (NSR) system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes using the high gamma band power of local field potentials in the STG and neighboring cortical areas obtained via electrocorticography. Approach. The system implements a Viterbi decoder that incorporates phoneme likelihood estimates from a linear discriminant analysis model and transition probabilities from an n-gram phonemic language model. Grid searches were used in an attempt to determine optimal parameterizations of the feature vectors and Viterbi decoder. Main results. The performance of the system was significantly improved by using spatiotemporal representations of the neural activity (as opposed to purely spatial representations) and by including language modeling and Viterbi decoding in the NSR system. Significance. These results emphasize the importance of modeling the temporal dynamics of neural responses when analyzing their variations with respect to varying stimuli and demonstrate that speech recognition techniques can be successfully leveraged when decoding speech from neural signals. Guided by the results detailed in this work, further development of the NSR system could have applications in the fields of automatic speech recognition and neural prosthetics.
75 FR 54040 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

Science.gov (United States)

2010-09-03

...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...; speech-to-speech (STS); pay-per-call (900) calls; types of calls; and equal access to interexchange... of a report, due April 16, 2011, addressing whether it is necessary for the waivers to remain in...
Auditory Verbal Working Memory as a Predictor of Speech Perception in Modulated Maskers in Listeners With Normal Hearing.

Science.gov (United States)

Millman, Rebecca E; Mattys, Sven L

2017-05-24

Background noise can interfere with our ability to understand speech. Working memory capacity (WMC) has been shown to contribute to the perception of speech in modulated noise maskers. WMC has been assessed with a variety of auditory and visual tests, often pertaining to different components of working memory. This study assessed the relationship between speech perception in modulated maskers and components of auditory verbal working memory (AVWM) over a range of signal-to-noise ratios. Speech perception in noise and AVWM were measured in 30 listeners (age range 31-67 years) with normal hearing. AVWM was estimated using forward digit recall, backward digit recall, and nonword repetition. After controlling for the effects of age and average pure-tone hearing threshold, speech perception in modulated maskers was related to individual differences in the phonological component of working memory (as assessed by nonword repetition) but only in the least favorable signal-to-noise ratio. The executive component of working memory (as assessed by backward digit) was not predictive of speech perception in any conditions. AVWM is predictive of the ability to benefit from temporal dips in modulated maskers: Listeners with greater phonological WMC are better able to correctly identify sentences in modulated noise backgrounds.
Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

Science.gov (United States)

Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

2010-01-01

A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It can enjoy a fast rate of data/text entry, small overall size, and can be lightweight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beam forming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaption, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMMbased ASR components were developed. They can help real-time ASR system designers select proper tasks when in the face of constraints in computational resources.
Comparison of Linear and Nonlinear Model Predictive Control for Optimization of Spray Dryer Operation

DEFF Research Database (Denmark)

Petersen, Lars Norbert; Poulsen, Niels Kjølstad; Niemann, Hans Henrik

2015-01-01

In this paper, we compare the performance of an economically optimizing Nonlinear Model Predictive Controller (E-NMPC) to a linear tracking Model Predictive Controller (MPC) for a spray drying plant. We find in this simulation study, that the economic performance of the two controllers are almost...... equal. We evaluate the economic performance with an industrially recorded disturbance scenario, where unmeasured disturbances and model mismatch are present. The state of the spray dryer, used in the E-NMPC and MPC, is estimated using Kalman Filters with noise covariances estimated by a maximum...
Language and motor speech skills in children with cerebral palsy

NARCIS (Netherlands)

Pirila, Sija; van der Meere, Jaap; Pentikainen, Taina; Ruusu-Niemi, Pirjo; Korpela, Raija; Kilpinen, Jenni; Nieminen, Pirkko; Ruusu-Niemin, P; Kilpinen, R

2007-01-01

The aim of the study was to investigate associations between the severity of motor limitations, cognitive difficulties, language and motor speech problems in children with cerebral palsy. Also, the predictive power of neonatal cranial ultrasound findings on later outcome was investigated. For this
Environmental Contamination of Normal Speech.

Science.gov (United States)

Harley, Trevor A.

1990-01-01

Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…
Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

Science.gov (United States)

Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

2018-05-01

Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.
Towards Automated Binding Affinity Prediction Using an Iterative Linear Interaction Energy Approach

Directory of Open Access Journals (Sweden)

C. Ruben Vosmeer

2014-01-01

Full Text Available Binding affinity prediction of potential drugs to target and off-target proteins is an essential asset in drug development. These predictions require the calculation of binding free energies. In such calculations, it is a major challenge to properly account for both the dynamic nature of the protein and the possible variety of ligand-binding orientations, while keeping computational costs tractable. Recently, an iterative Linear Interaction Energy (LIE approach was introduced, in which results from multiple simulations of a protein-ligand complex are combined into a single binding free energy using a Boltzmann weighting-based scheme. This method was shown to reach experimental accuracy for flexible proteins while retaining the computational efficiency of the general LIE approach. Here, we show that the iterative LIE approach can be used to predict binding affinities in an automated way. A workflow was designed using preselected protein conformations, automated ligand docking and clustering, and a (semi-automated molecular dynamics simulation setup. We show that using this workflow, binding affinities of aryloxypropanolamines to the malleable Cytochrome P450 2D6 enzyme can be predicted without a priori knowledge of dominant protein-ligand conformations. In addition, we provide an outlook for an approach to assess the quality of the LIE predictions, based on simulation outcomes only.
Integrating piecewise linear representation and ensemble neural network for stock price prediction

OpenAIRE

Asaduzzaman, Md.; Shahjahan, Md.; Ahmed, Fatema Johera; Islam, Md. Monirul; Murase, Kazuyuki

2014-01-01

Stock Prices are considered to be very dynamic and susceptible to quick changes because of the underlying nature of the financial domain, and in part because of the interchange between known parameters and unknown factors. Of late, several researchers have used Piecewise Linear Representation (PLR) to predict the stock market pricing. However, some improvements are needed to avoid the appropriate threshold of the trading decision, choosing the input index as well as improving the overall perf...
Multilevel Analysis in Analyzing Speech Data

Science.gov (United States)

Guddattu, Vasudeva; Krishna, Y.

2011-01-01

The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…
Perceived Liveliness and Speech Comprehensibility in Aphasia: The Effects of Direct Speech in Auditory Narratives

Science.gov (United States)

Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike

2014-01-01

Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in "healthy" communication direct speech constructions contribute to the liveliness, and indirectly to the comprehensibility, of speech.…
Speech Enhancement by MAP Spectral Amplitude Estimation Using a Super-Gaussian Speech Model

Directory of Open Access Journals (Sweden)

Lotter Thomas

2005-01-01

Full Text Available This contribution presents two spectral amplitude estimators for acoustical background noise suppression based on maximum a posteriori estimation and super-Gaussian statistical modelling of the speech DFT amplitudes. The probability density function of the speech spectral amplitude is modelled with a simple parametric function, which allows a high approximation accuracy for Laplace- or Gamma-distributed real and imaginary parts of the speech DFT coefficients. Also, the statistical model can be adapted to optimally fit the distribution of the speech spectral amplitudes for a specific noise reduction system. Based on the super-Gaussian statistical model, computationally efficient maximum a posteriori speech estimators are derived, which outperform the commonly applied Ephraim-Malah algorithm.
Exploring the role of brain oscillations in speech perception in noise: Intelligibility of isochronously retimed speech

Directory of Open Access Journals (Sweden)

Vincent Aubanel

2016-08-01

Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
Cross-modal distraction by background speech: what role for meaning?

Science.gov (United States)

Marsh, John E; Jones, Dylan M

2010-01-01

Mental tasks are susceptible to disruption by concurrent to-be-ignored speech. The goal of the present paper is to examine whether a theoretical framework successfully applied to irrelevant speech effects in serial recall-interference by process-can be extended to verbal tasks in which meaning is the basis of retrieval and to which the irrelevant sound is related to different degrees by meaning. That the semantic characteristics of the to-be-ignored sound interact with the predominance of semantic retrieval in the focal task to determine the degree of disruption is demonstrated in three settings: free recall, category-clustering and fluency. Source monitoring-the difficulty in discriminating episodic information on the basis of the sense modality (visual or auditory) in which it was presented-contributes in part to the disruption by speech. The power of alternative accounts-interference-by-content and attentional capture-to predict these outcomes is also discussed.
Ear, Hearing and Speech

DEFF Research Database (Denmark)

Poulsen, Torben

2000-01-01

An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)......An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)...
Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals

Science.gov (United States)

Lidestam, Björn; Rönnberg, Jerker

2016-01-01

The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. PMID:27317667
Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech and music

Directory of Open Access Journals (Sweden)

Hwee Ling eLee

2014-08-01

Full Text Available This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300 ±240, ±180, ±120, ±60, and 0 ms. Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech.
Who Receives Speech/Language Services by 5 Years of Age in the United States?

Science.gov (United States)

Hammer, Carol Scheffner; Farkas, George; Hillemeier, Marianne M.; Maczuga, Steve; Cook, Michael; Morano, Stephanie

2016-01-01

Purpose We sought to identify factors predictive of or associated with receipt of speech/language services during early childhood. We did so by analyzing data from the Early Childhood Longitudinal Study–Birth Cohort (ECLS-B; Andreassen & Fletcher, 2005), a nationally representative data set maintained by the U.S. Department of Education. We addressed two research questions of particular importance to speech-language pathology practice and policy. First, do early vocabulary delays increase children's likelihood of receiving speech/language services? Second, are minority children systematically less likely to receive these services than otherwise similar White children? Method Multivariate logistic regression analyses were performed for a population-based sample of 9,600 children and families participating in the ECLS-B. Results Expressive vocabulary delays by 24 months of age were strongly associated with and predictive of children's receipt of speech/language services at 24, 48, and 60 months of age (adjusted odds ratio range = 4.32–16.60). Black children were less likely to receive speech/language services than otherwise similar White children at 24, 48, and 60 months of age (adjusted odds ratio range = 0.42–0.55). Lower socioeconomic status children and those whose parental primary language was other than English were also less likely to receive services. Being born with very low birth weight also significantly increased children's receipt of services at 24, 48, and 60 months of age. Conclusion Expressive vocabulary delays at 24 months of age increase children’s risk for later speech/language services. Increased use of culturally and linguistically sensitive practices may help racial/ethnic minority children access needed services. PMID:26579989

A study on two phase flows of linear compressors for the prediction of refrigerant leakage

International Nuclear Information System (INIS)

Hwang, Il Sun; Lee, Young Lim; Oh, Won Sik; Park, Kyeong Bae

2015-01-01

Usage of linear compressors is on the rise due to their high efficiency. In this paper, leakage of a linear compressor has been studied through numerical analysis and experiments. First, nitrogen leakage for a stagnant piston with fixed cylinder pressure as well as for a moving piston with fixed cylinder pressure was analyzed to verify the validity of the two-phase flow analysis model. Next, refrigerant leakage of a linear compressor in operation was finally predicted through 3-dimensional unsteady, two phase flow CFD (Computational fluid dynamics). According to the research results, the numerical analyses for the fixed cylinder pressure models were in good agreement with the experimental results. The refrigerant leakage of the linear compressor in operation mainly occurred through the oil exit and the leakage became negligible after about 0.4s following operation where the leakage became lower than 2.0x10 -4 kg/s.
Speech comprehension aided by multiple modalities: behavioural and neural interactions

Science.gov (United States)

McGettigan, Carolyn; Faulkner, Andrew; Altarelli, Irene; Obleser, Jonas; Baverstock, Harriet; Scott, Sophie K.

2014-01-01

Speech comprehension is a complex human skill, the performance of which requires the perceiver to combine information from several sources – e.g. voice, face, gesture, linguistic context – to achieve an intelligible and interpretable percept. We describe a functional imaging investigation of how auditory, visual and linguistic information interact to facilitate comprehension. Our specific aims were to investigate the neural responses to these different information sources, alone and in interaction, and further to use behavioural speech comprehension scores to address sites of intelligibility-related activation in multifactorial speech comprehension. In fMRI, participants passively watched videos of spoken sentences, in which we varied Auditory Clarity (with noise-vocoding), Visual Clarity (with Gaussian blurring) and Linguistic Predictability. Main effects of enhanced signal with increased auditory and visual clarity were observed in overlapping regions of posterior STS. Two-way interactions of the factors (auditory × visual, auditory × predictability) in the neural data were observed outside temporal cortex, where positive signal change in response to clearer facial information and greater semantic predictability was greatest at intermediate levels of auditory clarity. Overall changes in stimulus intelligibility by condition (as determined using an independent behavioural experiment) were reflected in the neural data by increased activation predominantly in bilateral dorsolateral temporal cortex, as well as inferior frontal cortex and left fusiform gyrus. Specific investigation of intelligibility changes at intermediate auditory clarity revealed a set of regions, including posterior STS and fusiform gyrus, showing enhanced responses to both visual and linguistic information. Finally, an individual differences analysis showed that greater comprehension performance in the scanning participants (measured in a post-scan behavioural test) were associated with
Speech sound disorder at 4 years: prevalence, comorbidities, and predictors in a community cohort of children.

Science.gov (United States)

Eadie, Patricia; Morgan, Angela; Ukoumunne, Obioha C; Ttofari Eecen, Kyriaki; Wake, Melissa; Reilly, Sheena

2015-06-01

The epidemiology of preschool speech sound disorder is poorly understood. Our aims were to determine: the prevalence of idiopathic speech sound disorder; the comorbidity of speech sound disorder with language and pre-literacy difficulties; and the factors contributing to speech outcome at 4 years. One thousand four hundred and ninety-four participants from an Australian longitudinal cohort completed speech, language, and pre-literacy assessments at 4 years. Prevalence of speech sound disorder (SSD) was defined by standard score performance of ≤79 on a speech assessment. Logistic regression examined predictors of SSD within four domains: child and family; parent-reported speech; cognitive-linguistic; and parent-reported motor skills. At 4 years the prevalence of speech disorder in an Australian cohort was 3.4%. Comorbidity with SSD was 40.8% for language disorder and 20.8% for poor pre-literacy skills. Sex, maternal vocabulary, socio-economic status, and family history of speech and language difficulties predicted SSD, as did 2-year speech, language, and motor skills. Together these variables provided good discrimination of SSD (area under the curve=0.78). This is the first epidemiological study to demonstrate prevalence of SSD at 4 years of age that was consistent with previous clinical studies. Early detection of SSD at 4 years should focus on family variables and speech, language, and motor skills measured at 2 years. © 2014 Mac Keith Press.
Effect of gap detection threshold on consistency of speech in children with speech sound disorder.

Science.gov (United States)

Sayyahi, Fateme; Soleymani, Zahra; Akbari, Mohammad; Bijankhan, Mahmood; Dolatshahi, Behrooz

2017-02-01

The present study examined the relationship between gap detection threshold and speech error consistency in children with speech sound disorder. The participants were children five to six years of age who were categorized into three groups of typical speech, consistent speech disorder (CSD) and inconsistent speech disorder (ISD).The phonetic gap detection threshold test was used for this study, which is a valid test comprised six syllables with inter-stimulus intervals between 20-300ms. The participants were asked to listen to the recorded stimuli three times and indicate whether they heard one or two sounds. There was no significant difference between the typical and CSD groups (p=0.55), but there were significant differences in performance between the ISD and CSD groups and the ISD and typical groups (p=0.00). The ISD group discriminated between speech sounds at a higher threshold. Children with inconsistent speech errors could not distinguish speech sounds during time-limited phonetic discrimination. It is suggested that inconsistency in speech is a representation of inconsistency in auditory perception, which causes by high gap detection threshold. Copyright © 2016 Elsevier Ltd. All rights reserved.
Suppression of the µ rhythm during speech and non-speech discrimination revealed by independent component analysis: implications for sensorimotor integration in speech processing.

Science.gov (United States)

Bowers, Andrew; Saltuklaroglu, Tim; Harkrider, Ashley; Cuellar, Megan

2013-01-01

Constructivist theories propose that articulatory hypotheses about incoming phonetic targets may function to enhance perception by limiting the possibilities for sensory analysis. To provide evidence for this proposal, it is necessary to map ongoing, high-temporal resolution changes in sensorimotor activity (i.e., the sensorimotor μ rhythm) to accurate speech and non-speech discrimination performance (i.e., correct trials.). Sixteen participants (15 female and 1 male) were asked to passively listen to or actively identify speech and tone-sweeps in a two-force choice discrimination task while the electroencephalograph (EEG) was recorded from 32 channels. The stimuli were presented at signal-to-noise ratios (SNRs) in which discrimination accuracy was high (i.e., 80-100%) and low SNRs producing discrimination performance at chance. EEG data were decomposed using independent component analysis and clustered across participants using principle component methods in EEGLAB. ICA revealed left and right sensorimotor µ components for 14/16 and 13/16 participants respectively that were identified on the basis of scalp topography, spectral peaks, and localization to the precentral and postcentral gyri. Time-frequency analysis of left and right lateralized µ component clusters revealed significant (pFDRspeech discrimination trials relative to chance trials following stimulus offset. Findings are consistent with constructivist, internal model theories proposing that early forward motor models generate predictions about likely phonemic units that are then synthesized with incoming sensory cues during active as opposed to passive processing. Future directions and possible translational value for clinical populations in which sensorimotor integration may play a functional role are discussed.
Effects of noise on speech recognition: Challenges for communication by service members.

Science.gov (United States)

Le Prell, Colleen G; Clavier, Odile H

2017-06-01

Speech communication often takes place in noisy environments; this is an urgent issue for military personnel who must communicate in high-noise environments. The effects of noise on speech recognition vary significantly according to the sources of noise, the number and types of talkers, and the listener's hearing ability. In this review, speech communication is first described as it relates to current standards of hearing assessment for military and civilian populations. The next section categorizes types of noise (also called maskers) according to their temporal characteristics (steady or fluctuating) and perceptive effects (energetic or informational masking). Next, speech recognition difficulties experienced by listeners with hearing loss and by older listeners are summarized, and questions on the possible causes of speech-in-noise difficulty are discussed, including recent suggestions of "hidden hearing loss". The final section describes tests used by military and civilian researchers, audiologists, and hearing technicians to assess performance of an individual in recognizing speech in background noise, as well as metrics that predict performance based on a listener and background noise profile. This article provides readers with an overview of the challenges associated with speech communication in noisy backgrounds, as well as its assessment and potential impact on functional performance, and provides guidance for important new research directions relevant not only to military personnel, but also to employees who work in high noise environments. Copyright © 2016 Elsevier B.V. All rights reserved.
Speech Perception as a Multimodal Phenomenon

OpenAIRE

Rosenblum, Lawrence D.

2008-01-01

Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal s...
Poor Speech Perception Is Not a Core Deficit of Childhood Apraxia of Speech: Preliminary Findings

Science.gov (United States)

Zuk, Jennifer; Iuzzini-Seigel, Jenya; Cabbage, Kathryn; Green, Jordan R.; Hogan, Tiffany P.

2018-01-01

Purpose: Childhood apraxia of speech (CAS) is hypothesized to arise from deficits in speech motor planning and programming, but the influence of abnormal speech perception in CAS on these processes is debated. This study examined speech perception abilities among children with CAS with and without language impairment compared to those with…
Situational influences on rhythmicity in speech, music, and their interaction.

Science.gov (United States)

Hawkins, Sarah

2014-12-19

Brain processes underlying the production and perception of rhythm indicate considerable flexibility in how physical signals are interpreted. This paper explores how that flexibility might play out in rhythmicity in speech and music. There is much in common across the two domains, but there are also significant differences. Interpretations are explored that reconcile some of the differences, particularly with respect to how functional properties modify the rhythmicity of speech, within limits imposed by its structural constraints. Functional and structural differences mean that music is typically more rhythmic than speech, and that speech will be more rhythmic when the emotions are more strongly engaged, or intended to be engaged. The influence of rhythmicity on attention is acknowledged, and it is suggested that local increases in rhythmicity occur at times when attention is required to coordinate joint action, whether in talking or music-making. Evidence is presented which suggests that while these short phases of heightened rhythmical behaviour are crucial to the success of transitions in communicative interaction, their modality is immaterial: they all function to enhance precise temporal prediction and hence tightly coordinated joint action. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
Situational influences on rhythmicity in speech, music, and their interaction

Science.gov (United States)

Hawkins, Sarah

2014-01-01

Brain processes underlying the production and perception of rhythm indicate considerable flexibility in how physical signals are interpreted. This paper explores how that flexibility might play out in rhythmicity in speech and music. There is much in common across the two domains, but there are also significant differences. Interpretations are explored that reconcile some of the differences, particularly with respect to how functional properties modify the rhythmicity of speech, within limits imposed by its structural constraints. Functional and structural differences mean that music is typically more rhythmic than speech, and that speech will be more rhythmic when the emotions are more strongly engaged, or intended to be engaged. The influence of rhythmicity on attention is acknowledged, and it is suggested that local increases in rhythmicity occur at times when attention is required to coordinate joint action, whether in talking or music-making. Evidence is presented which suggests that while these short phases of heightened rhythmical behaviour are crucial to the success of transitions in communicative interaction, their modality is immaterial: they all function to enhance precise temporal prediction and hence tightly coordinated joint action. PMID:25385776
Speech perception under adverse conditions: Insights from behavioral, computational and neuroscience research

Directory of Open Access Journals (Sweden)

Sara eGuediche

2014-01-01

Full Text Available Adult speech perception reflects the long-term regularities of the native language, but it is also flexible such that it accommodates and adapts to adverse listening conditions and short-term deviations from native-language norms. The purpose of this review article is to examine how the broader neuroscience literature can inform and advance research efforts in understanding the neural basis of flexibility and adaptive plasticity in speech perception. In particular, we consider several domains of neuroscience research that offer insight into how perception can be adaptively tuned to short-term deviations while also maintaining without affecting the long-term learned regularities for mapping sensory input. We review several literatures to highlight the potential role of learning algorithms that rely on prediction error signals and discuss specific neural structures that are likely to contribute to such learning. Already, a few studies have alluded to a potential role of these mechanisms in adaptive plasticity in speech perception. Better understanding the application and limitations of these algorithms for the challenges of flexible speech perception under adverse conditions promises to inform theoretical models of speech.
Principles of speech coding

CERN Document Server

Ogunfunmi, Tokunbo

2010-01-01

It is becoming increasingly apparent that all forms of communication-including voice-will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networksOffering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the
Performance Characteristics and Prediction of Bodyweight using Linear Body Measurements in Four Strains of Broiler Chicken

OpenAIRE

I. Udeh; J.O. Isikwenu and G. Ukughere

2011-01-01

The objectives of this study were to compare the performance characteristics of four strains of broiler chicken from 2 to 8 weeks of age and predict body weight of the broilers using linear body measurements. The four strains of broiler chicken used were Anak, Arbor Acre, Ross and Marshall. The parameters recorded were bodyweight, weight gain, total feed intake, feed conversion ratio, mortality and some linear body measurements (body length, body width, breast width, drumstick length, shank l...
Plateletpheresis efficiency and mathematical correction of software-derived platelet yield prediction: A linear regression and ROC modeling approach.

Science.gov (United States)

Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David

2017-10-01

Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed cell processor accuracy of predicted platelet (PLT) yields with the goal of a better prediction of DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software predicted yield and actual PLT yield were statistically evaluated. Software prediction was optimized by linear regression analysis and its optimal cut-off to obtain a DP assessed by receiver operating characteristic curve (ROC) modeling. Three hundred and two plateletpheresis procedures were performed; in 271 (89.7%) occasions, donors were men and in 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486. P Simple correction derived from linear regression analysis accurately corrected this underestimation and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.
The Neural Bases of Difficult Speech Comprehension and Speech Production: Two Activation Likelihood Estimation (ALE) Meta-Analyses

Science.gov (United States)

Adank, Patti

2012-01-01

The role of speech production mechanisms in difficult speech comprehension is the subject of on-going debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension…
Individual differences in language and working memory affect children’s speech recognition in noise

Science.gov (United States)

McCreery, Ryan W.; Spratford, Meredith; Kirby, Benjamin; Brennan, Marc

2017-01-01

Objective We examined how cognitive and linguistic skills affect speech recognition in noise for children with normal hearing. Children with better working memory and language abilities were expected to have better speech recognition in noise than peers with poorer skills in these domains. Design As part of a prospective, cross-sectional study, children with normal hearing completed speech recognition in noise for three types of stimuli: (1) monosyllabic words, (2) syntactically correct but semantically anomalous sentences and (3) semantically and syntactically anomalous word sequences. Measures of vocabulary, syntax and working memory were used to predict individual differences in speech recognition in noise. Study sample Ninety-six children with normal hearing, who were between 5 and 12 years of age. Results Higher working memory was associated with better speech recognition in noise for all three stimulus types. Higher vocabulary abilities were associated with better recognition in noise for sentences and word sequences, but not for words. Conclusions Working memory and language both influence children’s speech recognition in noise, but the relationships vary across types of stimuli. These findings suggest that clinical assessment of speech recognition is likely to reflect underlying cognitive and linguistic abilities, in addition to a child’s auditory skills, consistent with the Ease of Language Understanding model. PMID:27981855
Metaheuristic applications to speech enhancement

CERN Document Server

Kunche, Prajna

2016-01-01

This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.
Systematic Studies of Modified Vocalization: The Effect of Speech Rate on Speech Production Measures during Metronome-Paced Speech in Persons Who Stutter

Science.gov (United States)

Davidow, Jason H.

2014-01-01

Background: Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control…
TongueToSpeech (TTS): Wearable wireless assistive device for augmented speech.

Science.gov (United States)

Marjanovic, Nicholas; Piccinini, Giacomo; Kerr, Kevin; Esmailbeigi, Hananeh

2017-07-01

Speech is an important aspect of human communication; individuals with speech impairment are unable to communicate vocally in real time. Our team has developed the TongueToSpeech (TTS) device with the goal of augmenting speech communication for the vocally impaired. The proposed device is a wearable wireless assistive device that incorporates a capacitive touch keyboard interface embedded inside a discrete retainer. This device connects to a computer, tablet or a smartphone via Bluetooth connection. The developed TTS application converts text typed by the tongue into audible speech. Our studies have concluded that an 8-contact point configuration between the tongue and the TTS device would yield the best user precision and speed performance. On average using the TTS device inside the oral cavity takes 2.5 times longer than the pointer finger using a T9 (Text on 9 keys) keyboard configuration to type the same phrase. In conclusion, we have developed a discrete noninvasive wearable device that allows the vocally impaired individuals to communicate in real time.
Social eye gaze modulates processing of speech and co-speech gesture.

Science.gov (United States)

Holler, Judith; Schubotz, Louise; Kelly, Spencer; Hagoort, Peter; Schuetze, Manuela; Özyürek, Aslı

2014-12-01

In human face-to-face communication, language comprehension is a multi-modal, situated activity. However, little is known about how we combine information from different modalities during comprehension, and how perceived communicative intentions, often signaled through visual signals, influence this process. We explored this question by simulating a multi-party communication context in which a speaker alternated her gaze between two recipients. Participants viewed speech-only or speech+gesture object-related messages when being addressed (direct gaze) or unaddressed (gaze averted to other participant). They were then asked to choose which of two object images matched the speaker's preceding message. Unaddressed recipients responded significantly more slowly than addressees for speech-only utterances. However, perceiving the same speech accompanied by gestures sped unaddressed recipients up to a level identical to that of addressees. That is, when unaddressed recipients' speech processing suffers, gestures can enhance the comprehension of a speaker's message. We discuss our findings with respect to two hypotheses attempting to account for how social eye gaze may modulate multi-modal language comprehension. Copyright © 2014 Elsevier B.V. All rights reserved.

Electrophysiological evidence for speech-specific audiovisual integration.

Science.gov (United States)

Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean

2014-01-01

Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode. © 2013 Published by Elsevier Ltd.
Free Speech Yearbook 1978.

Science.gov (United States)

Phifer, Gregg, Ed.

The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…
Visual context enhanced. The joint contribution of iconic gestures and visible speech to degraded speech comprehension.

NARCIS (Netherlands)

Drijvers, L.; Özyürek, A.

2017-01-01

Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech
Dynamics and control of quadcopter using linear model predictive control approach

Science.gov (United States)

Islam, M.; Okasha, M.; Idres, M. M.

2017-12-01

This paper investigates the dynamics and control of a quadcopter using the Model Predictive Control (MPC) approach. The dynamic model is of high fidelity and nonlinear, with six degrees of freedom that include disturbances and model uncertainties. The control approach is developed based on MPC to track different reference trajectories ranging from simple ones such as circular to complex helical trajectories. In this control technique, a linearized model is derived and the receding horizon method is applied to generate the optimal control sequence. Although MPC is computer expensive, it is highly effective to deal with the different types of nonlinearities and constraints such as actuators’ saturation and model uncertainties. The MPC parameters (control and prediction horizons) are selected by trial-and-error approach. Several simulation scenarios are performed to examine and evaluate the performance of the proposed control approach using MATLAB and Simulink environment. Simulation results show that this control approach is highly effective to track a given reference trajectory.
Prediction of protein interaction hot spots using rough set-based multiple criteria linear programming.

Science.gov (United States)

Chen, Ruoying; Zhang, Zhiwang; Wu, Di; Zhang, Peng; Zhang, Xinyang; Wang, Yong; Shi, Yong

2011-01-21

Protein-protein interactions are fundamentally important in many biological processes and it is in pressing need to understand the principles of protein-protein interactions. Mutagenesis studies have found that only a small fraction of surface residues, known as hot spots, are responsible for the physical binding in protein complexes. However, revealing hot spots by mutagenesis experiments are usually time consuming and expensive. In order to complement the experimental efforts, we propose a new computational approach in this paper to predict hot spots. Our method, Rough Set-based Multiple Criteria Linear Programming (RS-MCLP), integrates rough sets theory and multiple criteria linear programming to choose dominant features and computationally predict hot spots. Our approach is benchmarked by a dataset of 904 alanine-mutated residues and the results show that our RS-MCLP method performs better than other methods, e.g., MCLP, Decision Tree, Bayes Net, and the existing HotSprint database. In addition, we reveal several biological insights based on our analysis. We find that four features (the change of accessible surface area, percentage of the change of accessible surface area, size of a residue, and atomic contacts) are critical in predicting hot spots. Furthermore, we find that three residues (Tyr, Trp, and Phe) are abundant in hot spots through analyzing the distribution of amino acids. Copyright © 2010 Elsevier Ltd. All rights reserved.
Multisensory integration of speech sounds with letters vs. visual speech : only visual speech induces the mismatch negativity

NARCIS (Netherlands)

Stekelenburg, J.J.; Keetels, M.N.; Vroomen, J.H.M.

2018-01-01

Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect.
Speech Research

Science.gov (United States)

Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: A biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; Phonetic factors in letter detection; categorical perception; Short-term recall by deaf signers of American sign language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaires; and vowel information in postvocalic frictions.
Represented Speech in Qualitative Health Research

DEFF Research Database (Denmark)

Musaeus, Peter

2017-01-01

Represented speech refers to speech where we reference somebody. Represented speech is an important phenomenon in everyday conversation, health care communication, and qualitative research. This case will draw first from a case study on physicians’ workplace learning and second from a case study...... on nurses’ apprenticeship learning. The aim of the case is to guide the qualitative researcher to use own and others’ voices in the interview and to be sensitive to represented speech in everyday conversation. Moreover, reported speech matters to health professionals who aim to represent the voice...... of their patients. Qualitative researchers and students might learn to encourage interviewees to elaborate different voices or perspectives. Qualitative researchers working with natural speech might pay attention to how people talk and use represented speech. Finally, represented speech might be relevant...
Linking infant-directed speech and face preferences to language outcomes in infants at risk for autism spectrum disorder.

Science.gov (United States)

Droucker, Danielle; Curtin, Suzanne; Vouloumanos, Athena

2013-04-01

In this study, the authors aimed to examine whether biases for infant-directed (ID) speech and faces differ between infant siblings of children with autism spectrum disorder (ASD) (SIBS-A) and infant siblings of typically developing children (SIBS-TD), and whether speech and face biases predict language outcomes and risk group membership. Thirty-six infants were tested at ages 6, 8, 12, and 18 months. Infants heard 2 ID and 2 adult-directed (AD) speech passages paired with either a checkerboard or a face. The authors assessed expressive language at 12 and 18 months and general functioning at 12 months using the Mullen Scales of Early Learning (Mullen, 1995). Both infant groups preferred ID to AD speech and preferred faces to checkerboards. SIBS-TD demonstrated higher expressive language at 18 months than did SIBS-A, a finding that correlated with preferences for ID speech at 12 months. Although both groups looked longer to face stimuli than to the checkerboard, the magnitude of the preference was smaller in SIBS-A and predicted expressive vocabulary at 18 months in this group. Infants' preference for faces contributed to risk-group membership in a logistic regression analysis. Infants at heightened risk of ASD differ from typically developing infants in their preferences for ID speech and faces, which may underlie deficits in later language development and social communication.
Measurement of speech parameters in casual speech of dementia patients

NARCIS (Netherlands)

Ossewaarde, Roelant; Jonkers, Roel; Jalvingh, Fedor; Bastiaanse, Yvonne

Measurement of speech parameters in casual speech of dementia patients Roelant Adriaan Ossewaarde1,2, Roel Jonkers1, Fedor Jalvingh1,3, Roelien Bastiaanse1 1CLCG, University of Groningen (NL); 2HU University of Applied Sciences Utrecht (NL); 33St. Marienhospital - Vechta, Geriatric Clinic Vechta
Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression

KAUST Repository

Abdul Jameel, Abdul Gani

2016-09-14

An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN) of 71 pure hydrocarbons and 54 hydrocarbon blends were utilized as a data set to study the relationship between ignition quality and molecular structure. CN and DCN are functional equivalents and collectively referred to as D/CN, herein. The effect of molecular weight and weight percent of structural parameters such as paraffinic CH3 groups, paraffinic CH2 groups, paraffinic CH groups, olefinic CH–CH2 groups, naphthenic CH–CH2 groups, and aromatic C–CH groups on D/CN was studied. A particular emphasis on the effect of branching (i.e., methyl substitution) on the D/CN was studied, and a new parameter denoted as the branching index (BI) was introduced to quantify this effect. A new formula was developed to calculate the BI of hydrocarbon fuels using 1H NMR spectroscopy. Multiple linear regression (MLR) modeling was used to develop an empirical relationship between D/CN and the eight structural parameters. This was then used to predict the DCN of many hydrocarbon fuels. The developed model has a high correlation coefficient (R2 = 0.97) and was validated with experimentally measured DCN of twenty-two real fuel mixtures (e.g., gasolines and diesels) and fifty-nine blends of known composition, and the predicted values matched well with the experimental data.
Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

Science.gov (United States)

Drzewiecki, Wojciech

2016-12-01

In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.
A Dantzig-Wolfe decomposition algorithm for linear economic model predictive control of dynamically decoupled subsystems

DEFF Research Database (Denmark)

Sokoler, Leo Emil; Standardi, Laura; Edlund, Kristian

2014-01-01

This paper presents a warm-started Dantzig–Wolfe decomposition algorithm tailored to economic model predictive control of dynamically decoupled subsystems. We formulate the constrained optimal control problem solved at each sampling instant as a linear program with state space constraints, input...... limits, input rate limits, and soft output limits. The objective function of the linear program is related directly to the cost of operating the subsystems, and the cost of violating the soft output constraints. Simulations for large-scale economic power dispatch problems show that the proposed algorithm...... is significantly faster than both state-of-the-art linear programming solvers, and a structure exploiting implementation of the alternating direction method of multipliers. It is also demonstrated that the control strategy presented in this paper can be tuned using a weighted ℓ1-regularization term...
Self-monitoring and feedback : A new attempt to find the main cause of lexical bias in phonological speech errors

NARCIS (Netherlands)

Nooteboom, S.G.; Quené, H.

2008-01-01

This paper reports two experiments designed to investigate whether lexical bias in phonological speech errors is caused by immediate feedback of activation, by self-monitoring of inner speech, or by both. The experiments test a number of predictions derived from a model of self-monitoring of inner
Robust distributed model predictive control of linear systems with structured time-varying uncertainties

Science.gov (United States)

Zhang, Langwen; Xie, Wei; Wang, Jingcheng

2017-11-01

In this work, synthesis of robust distributed model predictive control (MPC) is presented for a class of linear systems subject to structured time-varying uncertainties. By decomposing a global system into smaller dimensional subsystems, a set of distributed MPC controllers, instead of a centralised controller, are designed. To ensure the robust stability of the closed-loop system with respect to model uncertainties, distributed state feedback laws are obtained by solving a min-max optimisation problem. The design of robust distributed MPC is then transformed into solving a minimisation optimisation problem with linear matrix inequality constraints. An iterative online algorithm with adjustable maximum iteration is proposed to coordinate the distributed controllers to achieve a global performance. The simulation results show the effectiveness of the proposed robust distributed MPC algorithm.
Development of The Viking Speech Scale to classify the speech of children with cerebral palsy.

Science.gov (United States)

Pennington, Lindsay; Virella, Daniel; Mjøen, Tone; da Graça Andrada, Maria; Murray, Janice; Colver, Allan; Himmelmann, Kate; Rackauskaite, Gija; Greitane, Andra; Prasauskiene, Audrone; Andersen, Guro; de la Cruz, Javier

2013-10-01

Surveillance registers monitor the prevalence of cerebral palsy and the severity of resulting impairments across time and place. The motor disorders of cerebral palsy can affect children's speech production and limit their intelligibility. We describe the development of a scale to classify children's speech performance for use in cerebral palsy surveillance registers, and its reliability across raters and across time. Speech and language therapists, other healthcare professionals and parents classified the speech of 139 children with cerebral palsy (85 boys, 54 girls; mean age 6.03 years, SD 1.09) from observation and previous knowledge of the children. Another group of health professionals rated children's speech from information in their medical notes. With the exception of parents, raters reclassified children's speech at least four weeks after their initial classification. Raters were asked to rate how easy the scale was to use and how well the scale described the child's speech production using Likert scales. Inter-rater reliability was moderate to substantial (k>.58 for all comparisons). Test-retest reliability was substantial to almost perfect for all groups (k>.68). Over 74% of raters found the scale easy or very easy to use; 66% of parents and over 70% of health care professionals judged the scale to describe children's speech well or very well. We conclude that the Viking Speech Scale is a reliable tool to describe the speech performance of children with cerebral palsy, which can be applied through direct observation of children or through case note review. Copyright © 2013 Elsevier Ltd. All rights reserved.
A Robust Approach For Acoustic Noise Suppression In Speech Using ANFIS

Science.gov (United States)

Martinek, Radek; Kelnar, Michal; Vanus, Jan; Bilik, Petr; Zidek, Jan

2015-11-01

The authors of this article deals with the implementation of a combination of techniques of the fuzzy system and artificial intelligence in the application area of non-linear noise and interference suppression. This structure used is called an Adaptive Neuro Fuzzy Inference System (ANFIS). This system finds practical use mainly in audio telephone (mobile) communication in a noisy environment (transport, production halls, sports matches, etc). Experimental methods based on the two-input adaptive noise cancellation concept was clearly outlined. Within the experiments carried out, the authors created, based on the ANFIS structure, a comprehensive system for adaptive suppression of unwanted background interference that occurs in audio communication and degrades the audio signal. The system designed has been tested on real voice signals. This article presents the investigation and comparison amongst three distinct approaches to noise cancellation in speech; they are LMS (least mean squares) and RLS (recursive least squares) adaptive filtering and ANFIS. A careful review of literatures indicated the importance of non-linear adaptive algorithms over linear ones in noise cancellation. It was concluded that the ANFIS approach had the overall best performance as it efficiently cancelled noise even in highly noise-degraded speech. Results were drawn from the successful experimentation, subjective-based tests were used to analyse their comparative performance while objective tests were used to validate them. Implementation of algorithms was experimentally carried out in Matlab to justify the claims and determine their relative performances.
Visual Context Enhanced: The Joint Contribution of Iconic Gestures and Visible Speech to Degraded Speech Comprehension

Science.gov (United States)

Drijvers, Linda; Ozyurek, Asli

2017-01-01

Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Method:…
Impaired Feedforward Control and Enhanced Feedback Control of Speech in Patients with Cerebellar Degeneration.

Science.gov (United States)

Parrell, Benjamin; Agnew, Zarinah; Nagarajan, Srikantan; Houde, John; Ivry, Richard B

2017-09-20

The cerebellum has been hypothesized to form a crucial part of the speech motor control network. Evidence for this comes from patients with cerebellar damage, who exhibit a variety of speech deficits, as well as imaging studies showing cerebellar activation during speech production in healthy individuals. To date, the precise role of the cerebellum in speech motor control remains unclear, as it has been implicated in both anticipatory (feedforward) and reactive (feedback) control. Here, we assess both anticipatory and reactive aspects of speech motor control, comparing the performance of patients with cerebellar degeneration and matched controls. Experiment 1 tested feedforward control by examining speech adaptation across trials in response to a consistent perturbation of auditory feedback. Experiment 2 tested feedback control, examining online corrections in response to inconsistent perturbations of auditory feedback. Both male and female patients and controls were tested. The patients were impaired in adapting their feedforward control system relative to controls, exhibiting an attenuated anticipatory response to the perturbation. In contrast, the patients produced even larger compensatory responses than controls, suggesting an increased reliance on sensory feedback to guide speech articulation in this population. Together, these results suggest that the cerebellum is crucial for maintaining accurate feedforward control of speech, but relatively uninvolved in feedback control. SIGNIFICANCE STATEMENT Speech motor control is a complex activity that is thought to rely on both predictive, feedforward control as well as reactive, feedback control. While the cerebellum has been shown to be part of the speech motor control network, its functional contribution to feedback and feedforward control remains controversial. Here, we use real-time auditory perturbations of speech to show that patients with cerebellar degeneration are impaired in adapting feedforward control of
Predicting non-linear dynamics by stable local learning in a recurrent spiking neural network.

Science.gov (United States)

Gilra, Aditya; Gerstner, Wulfram

2017-11-27

The brain needs to predict how the body reacts to motor commands, but how a network of spiking neurons can learn non-linear body dynamics using local, online and stable learning rules is unclear. Here, we present a supervised learning scheme for the feedforward and recurrent connections in a network of heterogeneous spiking neurons. The error in the output is fed back through fixed random connections with a negative gain, causing the network to follow the desired dynamics. The rule for Feedback-based Online Local Learning Of Weights (FOLLOW) is local in the sense that weight changes depend on the presynaptic activity and the error signal projected onto the postsynaptic neuron. We provide examples of learning linear, non-linear and chaotic dynamics, as well as the dynamics of a two-link arm. Under reasonable approximations, we show, using the Lyapunov method, that FOLLOW learning is uniformly stable, with the error going to zero asymptotically.

Applying linear discriminant analysis to predict groundwater redox conditions conducive to denitrification

Science.gov (United States)

Wilson, S. R.; Close, M. E.; Abraham, P.

2018-01-01

Diffuse nitrate losses from agricultural land pollute groundwater resources worldwide, but can be attenuated under reducing subsurface conditions. In New Zealand, the ability to predict where groundwater denitrification occurs is important for understanding the linkage between land use and discharges of nitrate-bearing groundwater to streams. This study assesses the application of linear discriminant analysis (LDA) for predicting groundwater redox status for Southland, a major dairy farming region in New Zealand. Data cases were developed by assigning a redox status to samples derived from a regional groundwater quality database. Pre-existing regional-scale geospatial databases were used as training variables for the discriminant functions. The predictive accuracy of the discriminant functions was slightly improved by optimising the thresholds between sample depth classes. The models predict 23% of the region as being reducing at shallow depths (water table, and low-permeability clastic sediments. The coastal plains are an area of widespread groundwater discharge, and the soil and hydrology characteristics require the land to be artificially drained to render the land suitable for farming. For the improvement of water quality in coastal areas, it is therefore important that land and water management efforts focus on understanding hydrological bypassing that may occur via artificial drainage systems.
The Relationship Between Speech, Language, and Phonological Awareness in Preschool-Age Children With Developmental Disabilities.

Science.gov (United States)

Barton-Hulsey, Andrea; Sevcik, Rose A; Romski, MaryAnn

2018-05-03

A number of intrinsic factors, including expressive speech skills, have been suggested to place children with developmental disabilities at risk for limited development of reading skills. This study examines the relationship between these factors, speech ability, and children's phonological awareness skills. A nonexperimental study design was used to examine the relationship between intrinsic skills of speech, language, print, and letter-sound knowledge to phonological awareness in 42 children with developmental disabilities between the ages of 48 and 69 months. Hierarchical multiple regression was done to determine if speech ability accounted for a unique amount of variance in phonological awareness skill beyond what would be expected by developmental skills inclusive of receptive language and print and letter-sound knowledge. A range of skill in all areas of direct assessment was found. Children with limited speech were found to have emerging skills in print knowledge, letter-sound knowledge, and phonological awareness. Speech ability did not predict a significant amount of variance in phonological awareness beyond what would be expected by developmental skills of receptive language and print and letter-sound knowledge. Children with limited speech ability were found to have receptive language and letter-sound knowledge that supported the development of phonological awareness skills. This study provides implications for practitioners and researchers concerning the factors related to early reading development in children with limited speech ability and developmental disabilities.
Speech enhancement using emotion dependent codebooks

NARCIS (Netherlands)

Naidu, D.H.R.; Srinivasan, S.

2012-01-01

Several speech enhancement approaches utilize trained models of clean speech data, such as codebooks, Gaussian mixtures, and hidden Markov models. These models are typically trained on neutral clean speech data, without any emotion. However, in practical scenarios, emotional speech is a common
Neurophysiological Evidence That Musical Training Influences the Recruitment of Right Hemispheric Homologues for Speech Perception

Directory of Open Access Journals (Sweden)

McNeel Gordon Jantzen

2014-03-01

Full Text Available Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Kraus & Chandrasekaran, 2010; Parbery-Clark, Skoe, & Kraus, 2009; Zendel & Alain, 2008; Musacchia, Sams, Skoe, & Kraus, 2007. Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast we predicted that in non-musicians processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time was presented using a dichotic listening paradigm. Subjects either indicated aural location for a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of right middle temporal gyrus (MTG and superior temporal gyrus (STG in musicians. The findings demonstrate recruitment of right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain.
Evaluation of speech transmission in open public spaces affected by combined noises.

Science.gov (United States)

Lee, Pyoung Jik; Jeon, Jin Yong

2011-07-01

In the present study, the effects of interference from combined noises on speech transmission were investigated in a simulated open public space. Sound fields for dominant noises were predicted using a typical urban square model surrounded by buildings. Then road traffic noise and two types of construction noises, corresponding to stationary and impulsive noises, were selected as background noises. Listening tests were performed on a group of adults, and the quality of speech transmission was evaluated using listening difficulty as well as intelligibility scores. During the listening tests, two factors that affect speech transmission performance were considered: (1) temporal characteristics of construction noise (stationary or impulsive) and (2) the levels of the construction and road traffic noises. The results indicated that word intelligibility scores and listening difficulty ratings were affected by the temporal characteristics of construction noise due to fluctuations in the background noise level. It was also observed that listening difficulty is unable to describe the speech transmission in noisy open public spaces showing larger variation than did word intelligibility scores. © 2011 Acoustical Society of America
Linguistic contributions to speech-on-speech masking for native and non-native listeners: Language familiarity and semantic content

Science.gov (United States)

Brouwer, Susanne; Van Engen, Kristin J.; Calandruccio, Lauren; Bradlow, Ann R.

2012-01-01

This study examined whether speech-on-speech masking is sensitive to variation in the degree of similarity between the target and the masker speech. Three experiments investigated whether speech-in-speech recognition varies across different background speech languages (English vs Dutch) for both English and Dutch targets, as well as across variation in the semantic content of the background speech (meaningful vs semantically anomalous sentences), and across variation in listener status vis-à-vis the target and masker languages (native, non-native, or unfamiliar). The results showed that the more similar the target speech is to the masker speech (e.g., same vs different language, same vs different levels of semantic content), the greater the interference on speech recognition accuracy. Moreover, the listener’s knowledge of the target and the background language modulate the size of the release from masking. These factors had an especially strong effect on masking effectiveness in highly unfavorable listening conditions. Overall this research provided evidence that that the degree of target-masker similarity plays a significant role in speech-in-speech recognition. The results also give insight into how listeners assign their resources differently depending on whether they are listening to their first or second language. PMID:22352516
Speech-specificity of two audiovisual integration effects

DEFF Research Database (Denmark)

Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

2010-01-01

Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers...... often fail to perceive as speech. While audiovisual integration in the identification task only occurred when observers were informed of the speech-like nature of SWS, integration occurred in the detection task both for informed and naïve observers. This shows that both speech-specific and general...... mechanisms underlie audiovisual integration of speech....
Recognizing speech in a novel accent: the motor theory of speech perception reframed.

Science.gov (United States)

Moulin-Frier, Clément; Arbib, Michael A

2013-08-01

The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serve as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisits claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
Advocate: A Distributed Architecture for Speech-to-Speech Translation

Science.gov (United States)

2009-01-01

tecture, are either wrapped natural-language processing ( NLP ) components or objects developed from scratch using the architecture’s API. GATE is...framework, we put together a demonstration Arabic -to- English speech translation system using both internally developed ( Arabic speech recognition and MT...conditions of our Arabic S2S demonstration system described earlier. Once again, the data size was varied and eighty identical requests were
Considering linear generator copper losses on model predictive control for a point absorber wave energy converter

International Nuclear Information System (INIS)

Montoya Andrade, Dan-El; Villa Jaén, Antonio de la; García Santana, Agustín

2014-01-01

Highlights: • We considered the linear generator copper losses in the proposed MPC strategy. • We maximized the power transferred to the generator side power converter. • The proposed MPC increases the useful average power injected into the grid. • The stress level of the PTO system can be reduced by the proposed MPC. - Abstract: The amount of energy that a wave energy converter can extract depends strongly on the control strategy applied to the power take-off system. It is well known that, ideally, the reactive control allows for maximum energy extraction from waves. However, the reactive control is intrinsically noncausal in practice and requires some kind of causal approach to be applied. Moreover, this strategy does not consider physical constraints and this could be a problem because the system could achieve unacceptable dynamic values. These, and other control techniques have focused on the wave energy extraction problem in order to maximize the energy absorbed by the power take-off device without considering the possible losses in intermediate devices. In this sense, a reactive control that considers the linear generator copper losses has been recently proposed to increase the useful power injected into the grid. Among the control techniques that have emerged recently, the model predictive control represents a promising strategy. This approach performs an optimization process on a time prediction horizon incorporating dynamic constraints associated with the physical features of the power take-off system. This paper proposes a model predictive control technique that considers the copper losses in the control optimization process of point absorbers with direct drive linear generators. This proposal makes the most of reactive control as it considers the copper losses, and it makes the most of the model predictive control, as it considers the system constraints. This means that the useful power transferred from the linear generator to the power
Linear Model-Based Predictive Control of the LHC 1.8 K Cryogenic Loop

CERN Document Server

Blanco-Viñuela, E; De Prada-Moraga, C

1999-01-01

The LHC accelerator will employ 1800 superconducting magnets (for guidance and focusing of the particle beams) in a pressurized superfluid helium bath at 1.9 K. This temperature is a severely constrained control parameter in order to avoid the transition from the superconducting to the normal state. Cryogenic processes are difficult to regulate due to their highly non-linear physical parameters (heat capacity, thermal conductance, etc.) and undesirable peculiarities like non self-regulating process, inverse response and variable dead time. To reduce the requirements on either temperature sensor or cryogenic system performance, various control strategies have been investigated on a reduced-scale LHC prototype built at CERN (String Test). Model Based Predictive Control (MBPC) is a regulation algorithm based on the explicit use of a process model to forecast the plant output over a certain prediction horizon. This predicted controlled variable is used in an on-line optimization procedure that minimizes an approp...
EXTENDED SPEECH EMOTION RECOGNITION AND PREDICTION

Directory of Open Access Journals (Sweden)

Theodoros Anagnostopoulos

2014-11-01

Full Text Available Humans are considered to reason and act rationally and that is believed to be their fundamental difference from the rest of the living entities. Furthermore, modern approaches in the science of psychology underline that humans as a thinking creatures are also sentimental and emotional organisms. There are fifteen universal extended emotions plus neutral emotion: hot anger, cold anger, panic, fear, anxiety, despair, sadness, elation, happiness, interest, boredom, shame, pride, disgust, contempt and neutral position. The scope of the current research is to understand the emotional state of a human being by capturing the speech utterances that one uses during a common conversation. It is proved that having enough acoustic evidence available the emotional state of a person can be classified by a set of majority voting classifiers. The proposed set of classifiers is based on three main classifiers: kNN, C4.5 and SVM RBF Kernel. This set achieves better performance than each basic classifier taken separately. It is compared with two other sets of classifiers: one-against-all (OAA multiclass SVM with Hybrid kernels and the set of classifiers which consists of the following two basic classifiers: C5.0 and Neural Network. The proposed variant achieves better performance than the other two sets of classifiers. The paper deals with emotion classification by a set of majority voting classifiers that combines three certain types of basic classifiers with low computational complexity. The basic classifiers stem from different theoretical background in order to avoid bias and redundancy which gives the proposed set of classifiers the ability to generalize in the emotion domain space.
Speech Planning Happens before Speech Execution: Online Reaction Time Methods in the Study of Apraxia of Speech

Science.gov (United States)

Maas, Edwin; Mailend, Marja-Liisa

2012-01-01

Purpose: The purpose of this article is to present an argument for the use of online reaction time (RT) methods to the study of apraxia of speech (AOS) and to review the existing small literature in this area and the contributions it has made to our fundamental understanding of speech planning (deficits) in AOS. Method: Following a brief…
U.S. Army Armament Research, Development and Engineering Center Grain Evaluation Software to Numerically Predict Linear Burn Regression for Solid Propellant Grain Geometries

Science.gov (United States)

2017-10-01

ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID PROPELLANT GRAIN GEOMETRIES Brian...distribution is unlimited. AD U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT AND ENGINEERING CENTER Munitions Engineering Technology Center Picatinny...U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT AND ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID
Predicting respiratory motion signals for image-guided radiotherapy using multi-step linear methods (MULIN)

International Nuclear Information System (INIS)

Ernst, Floris; Schweikard, Achim

2008-01-01

Forecasting of respiration motion in image-guided radiotherapy requires algorithms that can accurately and efficiently predict target location. Improved methods for respiratory motion forecasting were developed and tested. MULIN, a new family of prediction algorithms based on linear expansions of the prediction error, was developed and tested. Computer-generated data with a prediction horizon of 150 ms was used for testing in simulation experiments. MULIN was compared to Least Mean Squares-based predictors (LMS; normalized LMS, nLMS; wavelet-based multiscale autoregression, wLMS) and a multi-frequency Extended Kalman Filter (EKF) approach. The in vivo performance of the algorithms was tested on data sets of patients who underwent radiotherapy. The new MULIN methods are highly competitive, outperforming the LMS and the EKF prediction algorithms in real-world settings and performing similarly to optimized nLMS and wLMS prediction algorithms. On simulated, periodic data the MULIN algorithms are outperformed only by the EKF approach due to its inherent advantage in predicting periodic signals. In the presence of noise, the MULIN methods significantly outperform all other algorithms. The MULIN family of algorithms is a feasible tool for the prediction of respiratory motion, performing as well as or better than conventional algorithms while requiring significantly lower computational complexity. The MULIN algorithms are of special importance wherever high-speed prediction is required. (orig.)
Predicting respiratory motion signals for image-guided radiotherapy using multi-step linear methods (MULIN)

Energy Technology Data Exchange (ETDEWEB)

Ernst, Floris; Schweikard, Achim [University of Luebeck, Institute for Robotics and Cognitive Systems, Luebeck (Germany)

2008-06-15

Forecasting of respiration motion in image-guided radiotherapy requires algorithms that can accurately and efficiently predict target location. Improved methods for respiratory motion forecasting were developed and tested. MULIN, a new family of prediction algorithms based on linear expansions of the prediction error, was developed and tested. Computer-generated data with a prediction horizon of 150 ms was used for testing in simulation experiments. MULIN was compared to Least Mean Squares-based predictors (LMS; normalized LMS, nLMS; wavelet-based multiscale autoregression, wLMS) and a multi-frequency Extended Kalman Filter (EKF) approach. The in vivo performance of the algorithms was tested on data sets of patients who underwent radiotherapy. The new MULIN methods are highly competitive, outperforming the LMS and the EKF prediction algorithms in real-world settings and performing similarly to optimized nLMS and wLMS prediction algorithms. On simulated, periodic data the MULIN algorithms are outperformed only by the EKF approach due to its inherent advantage in predicting periodic signals. In the presence of noise, the MULIN methods significantly outperform all other algorithms. The MULIN family of algorithms is a feasible tool for the prediction of respiratory motion, performing as well as or better than conventional algorithms while requiring significantly lower computational complexity. The MULIN algorithms are of special importance wherever high-speed prediction is required. (orig.)
Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

Energy Technology Data Exchange (ETDEWEB)

Hogden, J.

1996-11-05

The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.
Subcortical processing of speech regularities underlies reading and music aptitude in children

Science.gov (United States)

2011-01-01

Background Neural sensitivity to acoustic regularities supports fundamental human behaviors such as hearing in noise and reading. Although the failure to encode acoustic regularities in ongoing speech has been associated with language and literacy deficits, how auditory expertise, such as the expertise that is associated with musical skill, relates to the brainstem processing of speech regularities is unknown. An association between musical skill and neural sensitivity to acoustic regularities would not be surprising given the importance of repetition and regularity in music. Here, we aimed to define relationships between the subcortical processing of speech regularities, music aptitude, and reading abilities in children with and without reading impairment. We hypothesized that, in combination with auditory cognitive abilities, neural sensitivity to regularities in ongoing speech provides a common biological mechanism underlying the development of music and reading abilities. Methods We assessed auditory working memory and attention, music aptitude, reading ability, and neural sensitivity to acoustic regularities in 42 school-aged children with a wide range of reading ability. Neural sensitivity to acoustic regularities was assessed by recording brainstem responses to the same speech sound presented in predictable and variable speech streams. Results Through correlation analyses and structural equation modeling, we reveal that music aptitude and literacy both relate to the extent of subcortical adaptation to regularities in ongoing speech as well as with auditory working memory and attention. Relationships between music and speech processing are specifically driven by performance on a musical rhythm task, underscoring the importance of rhythmic regularity for both language and music. Conclusions These data indicate common brain mechanisms underlying reading and music abilities that relate to how the nervous system responds to regularities in auditory input
Subcortical processing of speech regularities underlies reading and music aptitude in children.

Science.gov (United States)

Strait, Dana L; Hornickel, Jane; Kraus, Nina

2011-10-17

Neural sensitivity to acoustic regularities supports fundamental human behaviors such as hearing in noise and reading. Although the failure to encode acoustic regularities in ongoing speech has been associated with language and literacy deficits, how auditory expertise, such as the expertise that is associated with musical skill, relates to the brainstem processing of speech regularities is unknown. An association between musical skill and neural sensitivity to acoustic regularities would not be surprising given the importance of repetition and regularity in music. Here, we aimed to define relationships between the subcortical processing of speech regularities, music aptitude, and reading abilities in children with and without reading impairment. We hypothesized that, in combination with auditory cognitive abilities, neural sensitivity to regularities in ongoing speech provides a common biological mechanism underlying the development of music and reading abilities. We assessed auditory working memory and attention, music aptitude, reading ability, and neural sensitivity to acoustic regularities in 42 school-aged children with a wide range of reading ability. Neural sensitivity to acoustic regularities was assessed by recording brainstem responses to the same speech sound presented in predictable and variable speech streams. Through correlation analyses and structural equation modeling, we reveal that music aptitude and literacy both relate to the extent of subcortical adaptation to regularities in ongoing speech as well as with auditory working memory and attention. Relationships between music and speech processing are specifically driven by performance on a musical rhythm task, underscoring the importance of rhythmic regularity for both language and music. These data indicate common brain mechanisms underlying reading and music abilities that relate to how the nervous system responds to regularities in auditory input. Definition of common biological underpinnings
Subcortical processing of speech regularities underlies reading and music aptitude in children

Directory of Open Access Journals (Sweden)

Strait Dana L

2011-10-01

Full Text Available Abstract Background Neural sensitivity to acoustic regularities supports fundamental human behaviors such as hearing in noise and reading. Although the failure to encode acoustic regularities in ongoing speech has been associated with language and literacy deficits, how auditory expertise, such as the expertise that is associated with musical skill, relates to the brainstem processing of speech regularities is unknown. An association between musical skill and neural sensitivity to acoustic regularities would not be surprising given the importance of repetition and regularity in music. Here, we aimed to define relationships between the subcortical processing of speech regularities, music aptitude, and reading abilities in children with and without reading impairment. We hypothesized that, in combination with auditory cognitive abilities, neural sensitivity to regularities in ongoing speech provides a common biological mechanism underlying the development of music and reading abilities. Methods We assessed auditory working memory and attention, music aptitude, reading ability, and neural sensitivity to acoustic regularities in 42 school-aged children with a wide range of reading ability. Neural sensitivity to acoustic regularities was assessed by recording brainstem responses to the same speech sound presented in predictable and variable speech streams. Results Through correlation analyses and structural equation modeling, we reveal that music aptitude and literacy both relate to the extent of subcortical adaptation to regularities in ongoing speech as well as with auditory working memory and attention. Relationships between music and speech processing are specifically driven by performance on a musical rhythm task, underscoring the importance of rhythmic regularity for both language and music. Conclusions These data indicate common brain mechanisms underlying reading and music abilities that relate to how the nervous system responds to

Some links on this page may take you to non-federal websites. Their policies may differ from this site.