WorldWideScience

Sample records for speech neural origins

  1. Neural overlap in processing music and speech

    Science.gov (United States)

    Peretz, Isabelle; Vuvan, Dominique; Lagrois, Marie-Élaine; Armony, Jorge L.

    2015-01-01

    Neural overlap in processing music and speech, as measured by the co-activation of brain regions in neuroimaging studies, may suggest that parts of the neural circuitries established for language may have been recycled during evolution for musicality, or, vice versa, that musicality served as a springboard for language emergence. Such a perspective has important implications for several topics of general interest besides evolutionary origins. For instance, neural overlap is an important premise for the possibility of music training to influence language acquisition and literacy. However, neural overlap in processing music and speech does not entail sharing neural circuitries. Neural separability between music and speech may occur in overlapping brain regions. In this paper, we review the evidence and outline the issues faced in interpreting such neural data, and argue that converging evidence from several methodologies is needed before neural overlap is taken as evidence of sharing. PMID:25646513

  2. Neural overlap in processing music and speech.

    Science.gov (United States)

    Peretz, Isabelle; Vuvan, Dominique; Lagrois, Marie-Élaine; Armony, Jorge L

    2015-03-19

    Neural overlap in processing music and speech, as measured by the co-activation of brain regions in neuroimaging studies, may suggest that parts of the neural circuitries established for language may have been recycled during evolution for musicality, or, vice versa, that musicality served as a springboard for language emergence. Such a perspective has important implications for several topics of general interest besides evolutionary origins. For instance, neural overlap is an important premise for the possibility of music training to influence language acquisition and literacy. However, neural overlap in processing music and speech does not entail sharing neural circuitries. Neural separability between music and speech may occur in overlapping brain regions. In this paper, we review the evidence and outline the issues faced in interpreting such neural data, and argue that converging evidence from several methodologies is needed before neural overlap is taken as evidence of sharing.

  3. Neural Entrainment to Speech Modulates Speech Intelligibility

    NARCIS (Netherlands)

    Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne

    2018-01-01

    Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and...

  4. Tuning Neural Phase Entrainment to Speech.

    Science.gov (United States)

    Falk, Simone; Lanzilotti, Cosima; Schön, Daniele

    2017-08-01

    Musical rhythm positively impacts subsequent speech processing. However, the neural mechanisms underlying this phenomenon are so far unclear. We investigated whether carryover effects from a preceding musical cue to a speech stimulus result from a continuation of neural phase entrainment to periodicities that are present in both music and speech. Participants listened to and memorized French metrical sentences that contained (quasi-)periodic recurrences of accents and syllables. Speech stimuli were preceded by a rhythmically regular or irregular musical cue. Our results show that the presence of a regular cue modulates the neural response, as estimated by EEG power spectral density, intertrial coherence, and source analyses at critical frequencies, during speech processing compared with the irregular condition. Importantly, intertrial coherences for regular cues were indicative of the participants' success in memorizing the subsequent speech stimuli. These findings underscore the highly adaptive nature of neural phase entrainment across fundamentally different auditory stimuli. They also support current models of neural phase entrainment as a tool of predictive timing and attentional selection across cognitive domains.
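
    The intertrial coherence (ITC) measure named in this record can be sketched in a few lines. The snippet below (NumPy only; the sampling rate, trial counts, and toy 2 Hz signal are illustrative assumptions, not the study's data) computes ITC at one frequency as the magnitude of the mean unit phase vector across trials:

      import numpy as np

      def intertrial_coherence(trials, fs, freq):
          """ITC at one frequency: magnitude of the mean unit phase vector.
          trials: (n_trials, n_samples) array of epoched EEG."""
          n = trials.shape[1]
          freqs = np.fft.rfftfreq(n, d=1.0 / fs)
          k = int(np.argmin(np.abs(freqs - freq)))   # FFT bin nearest target freq
          spectra = np.fft.rfft(trials, axis=1)[:, k]
          phases = spectra / np.abs(spectra)         # unit-magnitude phase vectors
          return float(np.abs(phases.mean()))        # 1 = perfect phase locking

      # Toy demo: phase-locked trials yield high ITC, phase-jittered trials low ITC.
      fs, f0, n = 250.0, 2.0, 1000
      t = np.arange(n) / fs
      rng = np.random.default_rng(0)
      locked = np.sin(2 * np.pi * f0 * t) + 0.5 * rng.standard_normal((40, n))
      jitter = np.stack([np.sin(2 * np.pi * f0 * t + p)
                         for p in rng.uniform(0, 2 * np.pi, 40)])
      print(intertrial_coherence(locked, fs, f0))    # close to 1
      print(intertrial_coherence(jitter, fs, f0))    # close to 0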

  5. Automatic Speech Recognition from Neural Signals: A Focused Review

    Directory of Open Access Journals (Sweden)

    Christian Herff

    2016-09-01

    Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the need not to disturb bystanders, or an inability to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but simply to imagine saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to their low temporal resolution, but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data, with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques applied to neural signals, we discuss the Brain-to-text system.

  6. Hidden neural networks: application to speech recognition

    DEFF Research Database (Denmark)

    Riis, Søren Kamaric

    1998-01-01

    We evaluate the hidden neural network HMM/NN hybrid on two speech recognition benchmark tasks: (1) task-independent isolated word recognition on the PhoneBook database, and (2) recognition of broad phoneme classes in continuous speech from the TIMIT database. It is shown how hidden neural networks...

  7. Common neural substrates support speech and non-speech vocal tract gestures

    OpenAIRE

    Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M.J.; Poletto, Christopher J.; Ludlow, Christy L.

    2009-01-01

    The issue of whether speech is supported by the same neural substrates as non-speech vocal-tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as nonsense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech syllables without meaning...

  8. Low-dimensional recurrent neural network-based Kalman filter for speech enhancement.

    Science.gov (United States)

    Xia, Youshen; Wang, Jun

    2015-07-01

    This paper proposes a new recurrent neural network-based Kalman filter for speech enhancement, based on a noise-constrained least squares estimate. The parameters of the speech signal, modeled as an autoregressive process, are first estimated using the proposed recurrent neural network, and the speech signal is then recovered by Kalman filtering. The proposed recurrent neural network is globally asymptotically stable at the noise-constrained estimate. Because the noise-constrained estimate has a robust performance against non-Gaussian noise, the proposed recurrent neural network-based speech enhancement algorithm can minimize the estimation error of Kalman filter parameters in non-Gaussian noise. Furthermore, having a low-dimensional model feature, the proposed neural network-based speech enhancement algorithm is much faster than two existing recurrent neural network-based speech enhancement algorithms. Simulation results show that the proposed algorithm achieves good performance with fast computation and effective noise reduction.
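
    The Kalman-filtering half of this method can be illustrated independently of the network. In the minimal sketch below, the AR coefficients are assumed to be already known (in the paper they are estimated by the proposed recurrent network, which is not reproduced here), and noisy speech is filtered with an AR(p) state-space model:

      import numpy as np

      def kalman_ar_enhance(y, ar, q, r):
          """Filter noisy speech y given AR coefficients [a1..ap], process-noise
          variance q, and observation-noise variance r."""
          p = len(ar)
          F = np.zeros((p, p))
          F[0, :] = ar                      # AR dynamics in the first state row
          F[1:, :-1] = np.eye(p - 1)        # shift older samples down the state
          H = np.zeros((1, p)); H[0, 0] = 1.0
          Q = np.zeros((p, p)); Q[0, 0] = q
          x, P = np.zeros((p, 1)), np.eye(p)
          out = np.empty_like(y)
          for k, yk in enumerate(y):
              x = F @ x                                # predict
              P = F @ P @ F.T + Q
              S = (H @ P @ H.T)[0, 0] + r              # innovation variance
              K = P @ H.T / S                          # Kalman gain
              x = x + K * (yk - x[0, 0])               # update with new sample
              P = (np.eye(p) - K @ H) @ P
              out[k] = x[0, 0]                         # clean-speech estimate
          return out

      # Toy usage: an AR(2) "speech" signal observed in white noise;
      # the coefficients below are hypothetical, not estimated from data.
      rng = np.random.default_rng(1)
      a = np.array([1.6, -0.8])
      s = np.zeros(2000)
      for k in range(2, len(s)):
          s[k] = a[0] * s[k - 1] + a[1] * s[k - 2] + 0.1 * rng.standard_normal()
      noisy = s + 0.3 * rng.standard_normal(len(s))
      enhanced = kalman_ar_enhance(noisy, a, q=0.01, r=0.09)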

  9. Learning-induced neural plasticity of speech processing before birth.

    Science.gov (United States)

    Partanen, Eino; Kujala, Teija; Näätänen, Risto; Liitola, Auli; Sambeth, Anke; Huotilainen, Minna

    2013-09-10

    Learning, the foundation of adaptive and intelligent behavior, is based on plastic changes in neural assemblies, reflected by the modulation of electric brain responses. In infancy, auditory learning implicates the formation and strengthening of neural long-term memory traces, improving discrimination skills, in particular those forming the prerequisites for speech perception and understanding. Although previous behavioral observations show that newborns react differentially to unfamiliar sounds vs. familiar sound material that they were exposed to as fetuses, the neural basis of fetal learning has not thus far been investigated. Here we demonstrate direct neural correlates of human fetal learning of speech-like auditory stimuli. We presented variants of words to fetuses; unlike infants with no exposure to these stimuli, the exposed fetuses showed enhanced brain activity (mismatch responses) in response to pitch changes for the trained variants after birth. Furthermore, a significant correlation existed between the amount of prenatal exposure and brain activity, with greater activity being associated with a higher amount of prenatal speech exposure. Moreover, the learning effect was generalized to other types of similar speech sounds not included in the training material. Consequently, our results indicate neural commitment specifically tuned to the speech features heard before birth and their memory representations.

  10. Engaged listeners: shared neural processing of powerful political speeches.

    Science.gov (United States)

    Schmälzle, Ralf; Häcker, Frank E K; Honey, Christopher J; Hasson, Uri

    2015-08-01

    Powerful speeches can captivate audiences, whereas weaker speeches fail to engage their listeners. What is happening in the brains of a captivated audience? Here, we assess audience-wide functional brain dynamics during listening to speeches of varying rhetorical quality. The speeches were given by German politicians and evaluated as rhetorically powerful or weak. Listening to each of the speeches induced similar neural response time courses, as measured by inter-subject correlation analysis, in widespread brain regions involved in spoken language processing. Crucially, alignment of the time course across listeners was stronger for rhetorically powerful speeches, especially for bilateral regions of the superior temporal gyri and medial prefrontal cortex. Thus, during powerful speeches, listeners as a group are more coupled to each other, suggesting that powerful speeches are more potent in taking control of the listeners' brain responses. Weaker speeches were processed more heterogeneously, although they still prompted substantially correlated responses. These patterns of coupled neural responses bear resemblance to metaphors of resonance, which are often invoked in discussions of speech impact, and contribute to the literature on auditory attention under natural circumstances. Overall, this approach opens up possibilities for research on the neural mechanisms mediating the reception of entertaining or persuasive messages.
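
    The inter-subject correlation analysis named here is simple to state: each listener's response time course is correlated with the average time course of all other listeners. A minimal sketch, with synthetic data standing in for the recorded brain responses (array names and sizes are illustrative):

      import numpy as np

      def isc(responses):
          """responses: (n_subjects, n_timepoints). One ISC value per subject:
          correlation with the leave-one-out mean of the other subjects."""
          n = responses.shape[0]
          scores = np.empty(n)
          for i in range(n):
              others = np.delete(responses, i, axis=0).mean(axis=0)
              scores[i] = np.corrcoef(responses[i], others)[0, 1]
          return scores

      # Toy demo: a shared stimulus-driven component plus subject-specific noise.
      rng = np.random.default_rng(2)
      shared = rng.standard_normal(5000)
      listeners = shared + 1.5 * rng.standard_normal((12, 5000))
      print(isc(listeners).mean())     # clearly positive: listeners are "coupled"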

  11. Common neural substrates support speech and non-speech vocal tract gestures.

    Science.gov (United States)

    Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M J; Poletto, Christopher J; Ludlow, Christy L

    2009-08-01

    The issue of whether speech is supported by the same neural substrates as non-speech vocal tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as nonsense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial activation overlap between speech and non-speech. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings suggest a more general role of the previously proposed "auditory dorsal stream" in the left hemisphere: to support the production of vocal tract gestures that are not limited to speech processing.

  12. Deep Recurrent Convolutional Neural Network: Improving Performance For Speech Recognition

    OpenAIRE

    Zhang, Zewang; Sun, Zheng; Liu, Jiaqi; Chen, Jingwen; Huo, Zhao; Zhang, Xiao

    2016-01-01

    A deep learning approach has been widely applied in sequence modeling problems. In terms of automatic speech recognition (ASR), its performance has been significantly improved by larger speech corpora and deeper neural networks. In particular, recurrent neural networks and deep convolutional neural networks have been applied successfully in ASR. Given the arising problem of training speed, we build a novel deep recurrent convolutional network for acoustic modeling and then apply deep residual...

  13. High school music classes enhance the neural processing of speech.

    Science.gov (United States)

    Tierney, Adam; Krizman, Jennifer; Skoe, Erika; Johnston, Kathleen; Kraus, Nina

    2013-01-01

    Should music be a priority in public education? One argument for teaching music in school is that private music instruction relates to enhanced language abilities and neural function. However, the directionality of this relationship is unclear and it is unknown whether school-based music training can produce these enhancements. Here we show that 2 years of group music classes in high school enhance the neural encoding of speech. To tease apart the relationships between music and neural function, we tested high school students participating in either music or fitness-based training. These groups were matched at the onset of training on neural timing, reading ability, and IQ. Auditory brainstem responses were collected to a synthesized speech sound presented in background noise. After 2 years of training, the neural responses of the music training group were earlier than at pre-training, while the neural timing of students in the fitness training group was unchanged. These results represent the strongest evidence to date that in-school music education can cause enhanced speech encoding. The neural benefits of musical training are, therefore, not limited to expensive private instruction early in childhood but can be elicited by cost-effective group instruction during adolescence.

  14. High school music classes enhance the neural processing of speech

    Directory of Open Access Journals (Sweden)

    Adam Tierney

    2013-12-01

    Should music be a priority in public education? One argument for teaching music in school is that private music instruction relates to enhanced language abilities and neural function. However, the directionality of this relationship is unclear and it is unknown whether school-based music training can produce these enhancements. Here we show that two years of group music classes in high school enhance the subcortical encoding of speech. To tease apart the relationships between music and neural function, we tested high school students participating in either music or fitness-based training. These groups were matched at the onset of training on neural timing, reading ability, and IQ. Auditory brainstem responses were collected to a synthesized speech sound presented in background noise. After 2 years of training, the subcortical responses of the music training group were earlier than at pretraining, while the neural timing of students in the fitness training group was unchanged. These results represent the strongest evidence to date that in-school music education can cause enhanced speech encoding. The neural benefits of musical training are, therefore, not limited to expensive private instruction early in childhood but can be elicited by cost-effective group instruction during adolescence.

  15. Neural encoding of the speech envelope by children with developmental dyslexia.

    Science.gov (United States)

    Power, Alan J; Colling, Lincoln J; Mead, Natasha; Barnes, Lisa; Goswami, Usha

    2016-09-01

    Developmental dyslexia is consistently associated with difficulties in processing phonology (linguistic sound structure) across languages. One view is that dyslexia is characterised by a cognitive impairment in the "phonological representation" of word forms, which arises long before the child presents with a reading problem. Here we investigate a possible neural basis for developmental phonological impairments. We assess the neural quality of speech encoding in children with dyslexia by measuring the accuracy of low-frequency speech envelope encoding using EEG. We tested children with dyslexia and chronological age-matched (CA) and reading-level matched (RL) younger children. Participants listened to semantically-unpredictable sentences in a word report task. The sentences were noise-vocoded to increase reliance on envelope cues. Envelope reconstruction for envelopes between 0 and 10 Hz showed that the children with dyslexia had significantly poorer speech encoding in the 0-2 Hz band compared to both CA and RL controls. These data suggest that impaired neural encoding of low frequency speech envelopes, related to speech prosody, may underpin the phonological deficit that causes dyslexia across languages.
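
    A minimal sketch of the envelope extraction such a study requires: a Hilbert envelope, downsampled and low-pass filtered so that only the slow band survives. The filter order, output rate, and test signal below are illustrative assumptions, not the authors' exact pipeline:

      import numpy as np
      from scipy.signal import butter, hilbert, resample_poly, sosfiltfilt

      def lowfreq_envelope(speech, fs, fs_out=64, cutoff=2.0):
          env = np.abs(hilbert(speech))          # broadband amplitude envelope
          env = resample_poly(env, fs_out, fs)   # downsample to an EEG-like rate
          sos = butter(3, cutoff, btype="low", fs=fs_out, output="sos")
          return sosfiltfilt(sos, env)           # keep only the slow fluctuations

      # Toy usage: a 300 Hz carrier amplitude-modulated at 1.5 Hz.
      fs = 16000
      t = np.arange(2 * fs) / fs
      speechlike = (1 + np.sin(2 * np.pi * 1.5 * t)) * np.sin(2 * np.pi * 300 * t)
      env = lowfreq_envelope(speechlike, fs)     # the 1.5 Hz modulation survives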

  16. Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity

    Science.gov (United States)

    Moses, David A.; Mesgarani, Nima; Leonard, Matthew K.; Chang, Edward F.

    2016-10-01

    Objective. The superior temporal gyrus (STG) and neighboring brain regions play a key role in human language processing. Previous studies have attempted to reconstruct speech information from brain activity in the STG, but few of them incorporate the probabilistic framework and engineering methodology used in modern speech recognition systems. In this work, we describe the initial efforts toward the design of a neural speech recognition (NSR) system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes using the high gamma band power of local field potentials in the STG and neighboring cortical areas obtained via electrocorticography. Approach. The system implements a Viterbi decoder that incorporates phoneme likelihood estimates from a linear discriminant analysis model and transition probabilities from an n-gram phonemic language model. Grid searches were used in an attempt to determine optimal parameterizations of the feature vectors and Viterbi decoder. Main results. The performance of the system was significantly improved by using spatiotemporal representations of the neural activity (as opposed to purely spatial representations) and by including language modeling and Viterbi decoding in the NSR system. Significance. These results emphasize the importance of modeling the temporal dynamics of neural responses when analyzing their variations with respect to varying stimuli and demonstrate that speech recognition techniques can be successfully leveraged when decoding speech from neural signals. Guided by the results detailed in this work, further development of the NSR system could have applications in the fields of automatic speech recognition and neural prosthetics.
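
    The decoding step described in this abstract, a Viterbi search combining per-frame phoneme likelihoods with bigram transition probabilities, can be sketched compactly. The probabilities below are toy values, not estimates from neural data or the paper's language model:

      import numpy as np

      def viterbi(log_lik, log_trans, log_prior):
          """log_lik: (T, K) per-frame log-likelihoods; log_trans: (K, K) bigram
          log-probabilities; log_prior: (K,). Returns the best phoneme path."""
          T, K = log_lik.shape
          delta = log_prior + log_lik[0]
          psi = np.zeros((T, K), dtype=int)
          for t in range(1, T):
              cand = delta[:, None] + log_trans      # score of prev-to-next moves
              psi[t] = cand.argmax(axis=0)
              delta = cand.max(axis=0) + log_lik[t]
          path = np.empty(T, dtype=int)
          path[-1] = int(delta.argmax())
          for t in range(T - 2, -1, -1):             # backtrace the best path
              path[t] = psi[t + 1][path[t + 1]]
          return path

      # Toy usage: 3 "phonemes", 5 frames, random toy probabilities.
      rng = np.random.default_rng(3)
      log_lik = np.log(rng.dirichlet(np.ones(3), size=5))
      log_trans = np.log(rng.dirichlet(np.ones(3), size=3))
      print(viterbi(log_lik, log_trans, np.log(np.full(3, 1 / 3))))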

  17. A Neural Network Based Dutch Part of Speech Tagger

    NARCIS (Netherlands)

    Boschman, E.; op den Akker, Hendrikus J.A.; Nijholt, A.; Nijholt, Antinus; Pantic, Maja; Pantic, M.; Poel, M.; Poel, Mannes; Hondorp, G.H.W.

    2008-01-01

    In this paper a Neural Network is designed for Part-of-Speech Tagging of Dutch text. Our approach uses the Corpus Gesproken Nederlands (CGN), consisting of almost 9 million transcribed words of spoken Dutch, divided into 15 different categories. The outcome of the design is a Neural Network with an...

  18. Deep neural network and noise classification-based speech enhancement

    Science.gov (United States)

    Shi, Wenhua; Zhang, Xiongwei; Zou, Xia; Han, Wei

    2017-07-01

    In this paper, a speech enhancement method using noise classification and a Deep Neural Network (DNN) is proposed. A Gaussian mixture model (GMM) was employed to determine the noise type in speech-absent frames. A DNN was used to model the relationship between the noisy observation and clean speech. Once the noise type was determined, the corresponding DNN model was applied to enhance the noisy speech. The GMM was trained on mel-frequency cepstral coefficients (MFCCs), and its parameters were estimated with an iterative expectation-maximization (EM) algorithm. The noise type was updated using spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method achieves better objective speech quality and smaller distortion under both stationary and non-stationary noise conditions.
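
    The noise-classification front end can be sketched as follows: one GMM per noise type is fit to MFCC frames, and new frames are assigned to the type with the highest likelihood, which would then select the matching DNN. A minimal illustration with synthetic stand-ins for the MFCCs (the noise-type names and feature statistics are hypothetical):

      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(4)
      # Synthetic "MFCC" training frames for two hypothetical noise types.
      babble = rng.normal(0.0, 1.0, size=(500, 13))
      factory = rng.normal(2.0, 1.5, size=(500, 13))
      models = {
          "babble": GaussianMixture(n_components=4, random_state=0).fit(babble),
          "factory": GaussianMixture(n_components=4, random_state=0).fit(factory),
      }

      def classify_noise(frames):
          """Pick the noise type whose GMM assigns the highest log-likelihood."""
          return max(models, key=lambda name: models[name].score(frames))

      test_frames = rng.normal(2.0, 1.5, size=(50, 13))
      print(classify_noise(test_frames))   # "factory": selects the matching DNN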

  19. Neural entrainment to rhythmically-presented auditory, visual and audio-visual speech in children

    Directory of Open Access Journals (Sweden)

    Alan James Power

    2012-07-01

    Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal 'samples' of information from the speech stream at different rates, phase-resetting ongoing oscillations so that they are aligned with similar frequency bands in the input ('phase locking'). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase-locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically-developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable "ba", presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a talking head). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the "ba" stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a "ba" in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling...

  20. Individual differences in speech-in-noise perception parallel neural speech processing and attention in preschoolers

    Science.gov (United States)

    Thompson, Elaine C.; Carr, Kali Woodruff; White-Schwoch, Travis; Otto-Meyer, Sebastian; Kraus, Nina

    2016-01-01

    From bustling classrooms to unruly lunchrooms, school settings are noisy. To learn effectively in the unwelcome company of numerous distractions, children must clearly perceive speech in noise. In older children and adults, speech-in-noise perception is supported by sensory and cognitive processes, but the correlates underlying this critical listening skill in young children (3–5 year olds) remain undetermined. Employing a longitudinal design (two evaluations separated by ~12 months), we followed a cohort of 59 preschoolers, ages 3.0–4.9 years, assessing word-in-noise perception, cognitive abilities (intelligence, short-term memory, attention), and neural responses to speech. Results reveal that changes in word-in-noise perception parallel changes in processing of the fundamental frequency (F0), an acoustic cue known to play a central role in speaker identification and auditory scene analysis. Four unique developmental trajectories (speech-in-noise perception groups) confirm this relationship, in that improvements and declines in word-in-noise perception couple with enhancements and diminishments of F0 encoding, respectively. Improvements in word-in-noise perception also pair with gains in attention. Word-in-noise perception does not relate to strength of neural harmonic representation or short-term memory. These findings reinforce previously reported roles of F0 and attention in hearing speech in noise in older children and adults, and extend this relationship to preschool children. PMID:27864051
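
    The F0 cue tracked in this study can be estimated from a signal with a simple autocorrelation peak picker. A minimal sketch (the search range and the synthetic test tone are illustrative, not the study's stimuli):

      import numpy as np

      def estimate_f0(x, fs, fmin=80.0, fmax=400.0):
          """Autocorrelation-based pitch estimate within [fmin, fmax] Hz."""
          x = x - x.mean()
          ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags >= 0
          lo, hi = int(fs / fmax), int(fs / fmin)             # plausible pitch lags
          lag = lo + int(np.argmax(ac[lo:hi]))
          return fs / lag

      # Toy usage: a 120 Hz fundamental with one harmonic.
      fs = 8000
      t = np.arange(int(0.1 * fs)) / fs
      tone = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)
      print(estimate_f0(tone, fs))    # close to 120 Hz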

  1. The neural processing of foreign-accented speech and its relationship to listener bias

    Directory of Open Access Journals (Sweden)

    Han-Gyol Yi

    2014-10-01

    Foreign-accented speech often presents a challenging listening condition. In addition to deviations from the target speech norms related to the inexperience of the nonnative speaker, listener characteristics may play a role in determining intelligibility levels. We have previously shown that an implicit visual bias for associating East Asian faces and foreignness predicts listeners' perceptual ability to process Korean-accented English audiovisual speech (Yi et al., 2013). Here, we examine the neural mechanism underlying the influence of listener bias toward foreign faces on speech perception. In a functional magnetic resonance imaging (fMRI) study, native English speakers listened to native- and Korean-accented English sentences, with or without faces. The participants' Asian-foreign association was measured using an implicit association test (IAT), conducted outside the scanner. We found that foreign-accented speech evoked greater activity in the bilateral primary auditory cortices and the inferior frontal gyri, potentially reflecting greater computational demand. Higher IAT scores, indicating greater bias, were associated with increased BOLD response to foreign-accented speech with faces in the primary auditory cortex, the early node for spectrotemporal analysis. We conclude the following: (1) foreign-accented speech perception places greater demand on the neural systems underlying speech perception; (2) the face of the talker can exaggerate the perceived foreignness of foreign-accented speech; (3) implicit Asian-foreign association is associated with decreased neural efficiency in early spectrotemporal processing.

  2. Neural Specialization for Speech in the First Months of Life

    Science.gov (United States)

    Shultz, Sarah; Vouloumanos, Athena; Bennett, Randi H.; Pelphrey, Kevin

    2014-01-01

    How does the brain's response to speech change over the first months of life? Although behavioral findings indicate that neonates' listening biases are sharpened over the first months of life, with a species-specific preference for speech emerging by 3 months, the neural substrates underlying this developmental change are unknown. We…

  3. A common functional neural network for overt production of speech and gesture.

    Science.gov (United States)

    Marstaller, L; Burianová, H

    2015-01-22

    The perception of co-speech gestures, i.e., hand movements that co-occur with speech, has been investigated by several studies. The results show that the perception of co-speech gestures engages a core set of frontal, temporal, and parietal areas. However, no study has yet investigated the neural processes underlying the production of co-speech gestures. Specifically, it remains an open question whether Broca's area is central to the coordination of speech and gestures as has been suggested previously. The objective of this study was to use functional magnetic resonance imaging to (i) investigate the regional activations underlying overt production of speech, gestures, and co-speech gestures, and (ii) examine functional connectivity with Broca's area. We hypothesized that co-speech gesture production would activate frontal, temporal, and parietal regions that are similar to areas previously found during co-speech gesture perception and that both speech and gesture as well as co-speech gesture production would engage a neural network connected to Broca's area. Whole-brain analysis confirmed our hypothesis and showed that co-speech gesturing did engage brain areas that form part of networks known to subserve language and gesture. Functional connectivity analysis further revealed a functional network connected to Broca's area that is common to speech, gesture, and co-speech gesture production. This network consists of brain areas that play essential roles in motor control, suggesting that the coordination of speech and gesture is mediated by a shared motor control network. Our findings thus lend support to the idea that speech can influence co-speech gesture production on a motoric level.

  4. Neural Tuning to Low-Level Features of Speech throughout the Perisylvian Cortex.

    Science.gov (United States)

    Berezutskaya, Julia; Freudenburg, Zachary V; Güçlü, Umut; van Gerven, Marcel A J; Ramsey, Nick F

    2017-08-16

    Despite a large body of research, we continue to lack a detailed account of how auditory processing of continuous speech unfolds in the human brain. Previous research showed the propagation of low-level acoustic features of speech from posterior superior temporal gyrus toward anterior superior temporal gyrus in the human brain (Hullett et al., 2016). In this study, we investigate what happens to these neural representations past the superior temporal gyrus and how they engage higher-level language processing areas such as inferior frontal gyrus. We used low-level sound features to model neural responses to speech outside of the primary auditory cortex. Two complementary imaging techniques were used with human participants (both males and females): electrocorticography (ECoG) and fMRI. Both imaging techniques showed tuning of the perisylvian cortex to low-level speech features. With ECoG, we found evidence of propagation of the temporal features of speech sounds along the ventral pathway of language processing in the brain toward inferior frontal gyrus. Increasingly coarse temporal features of speech spreading from posterior superior temporal cortex toward inferior frontal gyrus were associated with linguistic features such as voice onset time, duration of the formant transitions, and phoneme, syllable, and word boundaries. The present findings provide the groundwork for a comprehensive bottom-up account of speech comprehension in the human brain. SIGNIFICANCE STATEMENT We know that, during natural speech comprehension, a broad network of perisylvian cortical regions is involved in sound and language processing. Here, we investigated the tuning to low-level sound features within these regions using neural responses to a short feature film. We also looked at whether the tuning organization along these brain regions showed any parallel to the hierarchy of language structures in continuous speech. Our results show that low-level speech features propagate throughout the...
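
    The modeling approach described here, predicting neural responses from low-level sound features, is essentially a linear encoding model. A minimal sketch with ridge regression on synthetic data (in the study the features came from speech sounds and the responses from ECoG/fMRI; the sizes and names below are illustrative):

      import numpy as np
      from sklearn.linear_model import Ridge
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(5)
      features = rng.standard_normal((2000, 48))    # e.g., spectrogram bands/frame
      weights = rng.standard_normal(48)             # the "tuning" to be recovered
      response = features @ weights + rng.standard_normal(2000)   # one site

      X_tr, X_te, y_tr, y_te = train_test_split(features, response, random_state=0)
      model = Ridge(alpha=10.0).fit(X_tr, y_tr)
      r = np.corrcoef(model.predict(X_te), y_te)[0, 1]
      print(round(float(r), 2))   # held-out prediction accuracy indexes tuning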

  5. Neural Spike-Train Analyses of the Speech-Based Envelope Power Spectrum Model

    Science.gov (United States)

    Rallapalli, Varsha H.

    2016-01-01

    Diagnosing and treating hearing impairment is challenging because people with similar degrees of sensorineural hearing loss (SNHL) often have different speech-recognition abilities. The speech-based envelope power spectrum model (sEPSM) has demonstrated that the signal-to-noise ratio in the envelope domain (SNRenv) from a modulation filter bank provides a robust speech-intelligibility measure across a wider range of degraded conditions than many long-standing models. In the sEPSM, noise (N) is assumed to: (a) reduce S + N envelope power by filling in dips within clean speech (S) and (b) introduce an envelope noise floor from intrinsic fluctuations in the noise itself. While the promise of SNRenv has been demonstrated for normal-hearing listeners, it has not been thoroughly extended to hearing-impaired listeners because of limited physiological knowledge of how SNHL affects speech-in-noise envelope coding relative to noise alone. Here, envelope coding to speech-in-noise stimuli was quantified from auditory-nerve model spike trains using shuffled correlograms, which were analyzed in the modulation-frequency domain to compute modulation-band estimates of neural SNRenv. Preliminary spike-train analyses show strong similarities to the sEPSM, demonstrating feasibility of neural SNRenv computations. Results suggest that individual differences can occur based on differential degrees of outer- and inner-hair-cell dysfunction in listeners currently diagnosed into the single audiological SNHL category. The predicted acoustic-SNR dependence in individual differences suggests that the SNR-dependent rate of susceptibility could be an important metric in diagnosing individual differences. Future measurements of the neural SNRenv in animal studies with various forms of SNHL will provide valuable insight for understanding individual differences in speech-in-noise intelligibility.
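
    The core SNRenv computation can be sketched directly from its definition: compare the envelope power of speech-plus-noise against that of noise alone within each modulation band. The band edges and demo signal below are illustrative simplifications of the acoustic sEPSM, not the paper's neural spike-train version:

      import numpy as np
      from scipy.signal import hilbert

      def env_band_power(x, fs, f_lo, f_hi):
          """Power of the amplitude envelope of x within one modulation band."""
          env = np.abs(hilbert(x))
          env = env - env.mean()                    # drop the DC component
          spec = np.abs(np.fft.rfft(env)) ** 2
          f = np.fft.rfftfreq(len(env), 1.0 / fs)
          return spec[(f >= f_lo) & (f < f_hi)].sum()

      def snr_env(speech_plus_noise, noise, fs,
                  bands=((1, 2), (2, 4), (4, 8), (8, 16), (16, 32))):
          """Per-band SNRenv: excess S+N envelope power relative to N alone."""
          out = []
          for lo, hi in bands:
              p_sn = env_band_power(speech_plus_noise, fs, lo, hi)
              p_n = env_band_power(noise, fs, lo, hi)
              out.append(max(p_sn - p_n, 0.0) / max(p_n, 1e-12))
          return np.array(out)

      # Toy demo: a 4 Hz envelope imposed on noise boosts SNRenv in the 4-8 Hz band.
      rng = np.random.default_rng(6)
      fs = 8000
      t = np.arange(2 * fs) / fs
      noise = rng.standard_normal(len(t))
      mixture = noise * (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t))
      print(snr_env(mixture, noise, fs))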

  6. The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility

    DEFF Research Database (Denmark)

    Bentsen, Thomas; May, Tobias; Kressner, Abigail Anne

    2018-01-01

    Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one.

  7. Why would musical training benefit the neural encoding of speech? The OPERA hypothesis.

    Directory of Open Access Journals (Sweden)

    Aniruddh D. Patel

    2011-06-01

    Mounting evidence suggests that musical training benefits the neural encoding of speech. This paper offers a hypothesis specifying why such benefits occur. The OPERA hypothesis proposes that such benefits are driven by adaptive plasticity in speech-processing networks, and that this plasticity occurs when five conditions are met. These are: (1) Overlap: there is anatomical overlap in the brain networks that process an acoustic feature used in both music and speech (e.g., waveform periodicity, amplitude envelope); (2) Precision: music places higher demands on these shared networks than does speech, in terms of the precision of processing; (3) Emotion: the musical activities that engage this network elicit strong positive emotion; (4) Repetition: the musical activities that engage this network are frequently repeated; and (5) Attention: the musical activities that engage this network are associated with focused attention. According to the OPERA hypothesis, when these conditions are met neural plasticity drives the networks in question to function with higher precision than needed for ordinary speech communication. Yet since speech shares these networks with music, speech processing benefits. The OPERA hypothesis is used to account for the observed superior subcortical encoding of speech in musically trained individuals, and to suggest mechanisms by which musical training might improve linguistic reading abilities.

  8. Why would Musical Training Benefit the Neural Encoding of Speech? The OPERA Hypothesis.

    Science.gov (United States)

    Patel, Aniruddh D

    2011-01-01

    Mounting evidence suggests that musical training benefits the neural encoding of speech. This paper offers a hypothesis specifying why such benefits occur. The "OPERA" hypothesis proposes that such benefits are driven by adaptive plasticity in speech-processing networks, and that this plasticity occurs when five conditions are met. These are: (1) Overlap: there is anatomical overlap in the brain networks that process an acoustic feature used in both music and speech (e.g., waveform periodicity, amplitude envelope), (2) Precision: music places higher demands on these shared networks than does speech, in terms of the precision of processing, (3) Emotion: the musical activities that engage this network elicit strong positive emotion, (4) Repetition: the musical activities that engage this network are frequently repeated, and (5) Attention: the musical activities that engage this network are associated with focused attention. According to the OPERA hypothesis, when these conditions are met neural plasticity drives the networks in question to function with higher precision than needed for ordinary speech communication. Yet since speech shares these networks with music, speech processing benefits. The OPERA hypothesis is used to account for the observed superior subcortical encoding of speech in musically trained individuals, and to suggest mechanisms by which musical training might improve linguistic reading abilities.

  9. Divergent neural responses to narrative speech in disorders of consciousness.

    Science.gov (United States)

    Iotzov, Ivan; Fidali, Brian C; Petroni, Agustin; Conte, Mary M; Schiff, Nicholas D; Parra, Lucas C

    2017-11-01

    Clinical assessment of auditory attention in patients with disorders of consciousness is often limited by motor impairment. Here, we employ intersubject correlations among electroencephalography responses to naturalistic speech in order to assay auditory attention among patients and healthy controls. Electroencephalographic data were recorded from 20 subjects with disorders of consciousness and 14 healthy controls during presentation of two narrative audio stimuli, played both forwards and time-reversed. Intersubject correlation of evoked electroencephalography signals was calculated, comparing responses of both groups to those of the healthy control subjects. This analysis was performed blinded and subsequently compared to the diagnostic status of each patient based on the Coma Recovery Scale-Revised. Subjects with disorders of consciousness exhibit significantly lower intersubject correlation than healthy controls during narrative speech. Additionally, while healthy subjects had higher intersubject correlation values in forwards versus backwards presentation, neural responses did not vary significantly with the direction of playback in subjects with disorders of consciousness. Increased intersubject correlation values in the backward speech condition were noted with improving disorder of consciousness diagnosis, both in cross-sectional analysis and in a subset of patients with longitudinal data. Intersubject correlation of neural responses to narrative speech audition differentiates healthy controls from patients and appears to index clinical diagnoses in disorders of consciousness.

  10. Aging affects hemispheric asymmetry in the neural representation of speech sounds.

    Science.gov (United States)

    Bellis, T J; Nicol, T; Kraus, N

    2000-01-15

    Hemispheric asymmetries in the processing of elemental speech sounds appear to be critical for normal speech perception. This study investigated the effects of age on hemispheric asymmetry observed in the neurophysiological responses to speech stimuli in three groups of normal hearing, right-handed subjects: children (ages 8-11 years), young adults (ages 20-25 years), and older adults (ages > 55 years). Peak-to-peak response amplitudes of the auditory cortical P1-N1 complex obtained over right and left temporal lobes were examined to determine the degree of left/right asymmetry in the neurophysiological responses elicited by synthetic speech syllables in each of the three subject groups. In addition, mismatch negativity (MMN) responses, which are elicited by acoustic change, were obtained. Whereas children and young adults demonstrated larger P1-N1-evoked response amplitudes over the left temporal lobe than over the right, responses from elderly subjects were symmetrical. In contrast, MMN responses, which reflect an echoic memory process, were symmetrical in all subject groups. The differences observed in the neurophysiological responses were accompanied by a finding of significantly poorer ability to discriminate speech syllables involving rapid spectrotemporal changes in the older adult group. This study demonstrates a biological, age-related change in the neural representation of basic speech sounds and suggests one possible underlying mechanism for the speech perception difficulties exhibited by aging adults. Furthermore, results of this study support previous findings suggesting a dissociation between neural mechanisms underlying those processes that reflect the basic representation of sound structure and those that represent auditory echoic memory and stimulus change.

  11. Sadness is unique: Neural processing of emotions in speech prosody in musicians and non-musicians

    Directory of Open Access Journals (Sweden)

    Mona Park

    2015-01-01

    Musical training has been shown to have positive effects on several aspects of speech processing; however, the effects of musical training on the neural processing of speech prosody conveying distinct emotions are not yet well understood. We used functional magnetic resonance imaging (fMRI) to investigate whether the neural responses to speech prosody conveying happiness, sadness, and fear differ between musicians and non-musicians. Differences in the processing of emotional speech prosody between the two groups were observed only when sadness was expressed. Musicians showed increased activation in the middle frontal gyrus, the anterior medial prefrontal cortex, the posterior cingulate cortex and the retrosplenial cortex. Our results suggest an increased sensitivity of emotional processing in musicians with respect to sadness expressed in speech, possibly reflecting empathic processes.

  12. Acoustic richness modulates the neural networks supporting intelligible speech processing.

    Science.gov (United States)

    Lee, Yune-Sang; Min, Nam Eun; Wingfield, Arthur; Grossman, Murray; Peelle, Jonathan E

    2016-03-01

    The information contained in a sensory signal plays a critical role in determining what neural processes are engaged. Here we used interleaved silent steady-state (ISSS) functional magnetic resonance imaging (fMRI) to explore how human listeners cope with different degrees of acoustic richness during auditory sentence comprehension. Twenty-six healthy young adults underwent scanning while hearing sentences that varied in acoustic richness (high vs. low spectral detail) and syntactic complexity (subject-relative vs. object-relative center-embedded clause structures). We manipulated acoustic richness by presenting the stimuli as unprocessed full-spectrum speech, or noise-vocoded with 24 channels. Importantly, although the vocoded sentences were spectrally impoverished, all sentences were highly intelligible. These manipulations allowed us to test how intelligible speech processing was affected by orthogonal linguistic and acoustic demands. Acoustically rich speech showed stronger activation than acoustically less-detailed speech in a bilateral temporoparietal network with more pronounced activity in the right hemisphere. By contrast, listening to sentences with greater syntactic complexity resulted in increased activation of a left-lateralized network including left posterior lateral temporal cortex, left inferior frontal gyrus, and left dorsolateral prefrontal cortex. Significant interactions between acoustic richness and syntactic complexity occurred in left supramarginal gyrus, right superior temporal gyrus, and right inferior frontal gyrus, indicating that the regions recruited for syntactic challenge differed as a function of acoustic properties of the speech. Our findings suggest that the neural systems involved in speech perception are finely tuned to the type of information available, and that reducing the richness of the acoustic signal dramatically alters the brain's response to spoken language, even when intelligibility is high.

  13. Musical intervention enhances infants' neural processing of temporal structure in music and speech.

    Science.gov (United States)

    Zhao, T Christina; Kuhl, Patricia K

    2016-05-10

    Individuals with music training in early childhood show enhanced processing of musical sounds, an effect that generalizes to speech processing. However, the conclusions drawn from previous studies are limited due to the possible confounds of predisposition and other factors affecting musicians and nonmusicians. We used a randomized design to test the effects of a laboratory-controlled music intervention on young infants' neural processing of music and speech. Nine-month-old infants were randomly assigned to music (intervention) or play (control) activities for 12 sessions. The intervention targeted temporal structure learning using triple meter in music (e.g., waltz), which is difficult for infants, and it incorporated key characteristics of typical infant music classes to maximize learning (e.g., multimodal, social, and repetitive experiences). Controls had similar multimodal, social, repetitive play, but without music. Upon completion, infants' neural processing of temporal structure was tested in both music (tones in triple meter) and speech (foreign syllable structure). Infants' neural processing was quantified by the mismatch response (MMR) measured with a traditional oddball paradigm using magnetoencephalography (MEG). The intervention group exhibited significantly larger MMRs in response to music temporal structure violations in both auditory and prefrontal cortical regions. Identical results were obtained for temporal structure changes in speech. The intervention thus enhanced temporal structure processing not only in music, but also in speech, at 9 months of age. We argue that the intervention enhanced infants' ability to extract temporal structure information and to predict future events in time, a skill affecting both music and speech processing.

  14. Understanding the role of speech production in reading: Evidence for a print-to-speech neural network using graphical analysis.

    Science.gov (United States)

    Cummine, Jacqueline; Cribben, Ivor; Luu, Connie; Kim, Esther; Bahktiari, Reyhaneh; Georgiou, George; Boliek, Carol A

    2016-05-01

    The neural circuitry associated with language processing is complex and dynamic. Graphical models are useful for studying complex neural networks as this method provides information about unique connectivity between regions within the context of the entire network of interest. Here, the authors explored the neural networks during covert reading to determine the role of feedforward and feedback loops in covert speech production. Brain activity of skilled adult readers was assessed in real word and pseudoword reading tasks with functional MRI (fMRI). The authors provide evidence for activity coherence in the feedforward system (inferior frontal gyrus-supplementary motor area) during real word reading and in the feedback system (supramarginal gyrus-precentral gyrus) during pseudoword reading. Graphical models provided evidence of an extensive, highly connected, neural network when individuals read real words that relied on coordination of the feedforward system. In contrast, when individuals read pseudowords the authors found a limited/restricted network that relied on coordination of the feedback system. Together, these results underscore the importance of considering multiple pathways and articulatory loops during language tasks and provide evidence for a print-to-speech neural network.
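
    A graphical-model connectivity analysis of this kind can be approximated with a sparse inverse-covariance estimate over region-of-interest time courses. A minimal sketch using scikit-learn's graphical lasso (the ROI labels and the synthetic coupling are hypothetical, and the authors' actual estimator may differ):

      import numpy as np
      from sklearn.covariance import GraphicalLassoCV

      rng = np.random.default_rng(7)
      rois = ["IFG", "SMA", "SMG", "PCG", "STG"]    # illustrative ROI labels
      data = rng.standard_normal((300, len(rois)))  # scans x regions
      data[:, 1] += 0.8 * data[:, 0]                # built-in IFG-SMA coupling

      model = GraphicalLassoCV().fit(data)
      prec = model.precision_                       # nonzero off-diagonals = edges
      for i in range(len(rois)):
          for j in range(i + 1, len(rois)):
              if abs(prec[i, j]) > 0.05:
                  print(f"edge: {rois[i]} -- {rois[j]}")   # unique connectivity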

  15. The neural basis of sublexical speech and corresponding nonspeech processing: a combined EEG-MEG study.

    Science.gov (United States)

    Kuuluvainen, Soila; Nevalainen, Päivi; Sorokin, Alexander; Mittag, Maria; Partanen, Eino; Putkinen, Vesa; Seppänen, Miia; Kähkönen, Seppo; Kujala, Teija

    2014-03-01

    We addressed the neural organization of speech versus nonspeech sound processing by investigating preattentive cortical auditory processing of changes in five features of a consonant-vowel syllable (consonant, vowel, sound duration, frequency, and intensity) and their acoustically matched nonspeech counterparts in a simultaneous EEG-MEG recording of mismatch negativity (MMN/MMNm). Overall, speech-sound processing was enhanced compared to nonspeech sound processing. This effect was strongest in the left hemisphere for changes which affect word meaning (consonant, vowel, and vowel duration), and was present in the right hemisphere for the vowel identity change as well. Furthermore, in the right hemisphere, speech-sound frequency and intensity changes were processed faster than their nonspeech counterparts, and there was a trend toward speech enhancement in frequency processing. In summary, the results support the proposed existence of long-term memory traces for speech sounds in the auditory cortices, and indicate at least partly distinct neural substrates for speech and nonspeech sound processing.

  16. Musical training during early childhood enhances the neural encoding of speech in noise.

    Science.gov (United States)

    Strait, Dana L; Parbery-Clark, Alexandra; Hittner, Emily; Kraus, Nina

    2012-12-01

    For children, learning often occurs in the presence of background noise. As such, there is growing desire to improve a child's access to a target signal in noise. Given adult musicians' perceptual and neural speech-in-noise enhancements, we asked whether similar effects are present in musically-trained children. We assessed the perception and subcortical processing of speech in noise and related cognitive abilities in musician and nonmusician children who were matched for a variety of overarching factors. Outcomes reveal that musicians' advantages for processing speech in noise are present during pivotal developmental years. Supported by correlations between auditory working memory and attention and auditory brainstem response properties, we propose that musicians' perceptual and neural enhancements are driven in a top-down manner by strengthened cognitive abilities with training. Our results may be considered by professionals involved in the remediation of language-based learning deficits, which are often characterized by poor speech perception in noise.

  17. Co-speech gestures influence neural activity in brain regions associated with processing semantic information.

    Science.gov (United States)

    Dick, Anthony Steven; Goldin-Meadow, Susan; Hasson, Uri; Skipper, Jeremy I; Small, Steven L

    2009-11-01

    Everyday communication is accompanied by visual information from several sources, including co-speech gestures, which provide semantic information listeners use to help disambiguate the speaker's message. Using fMRI, we examined how gestures influence neural activity in brain regions associated with processing semantic information. The BOLD response was recorded while participants listened to stories under three audiovisual conditions and one auditory-only (speech alone) condition. In the first audiovisual condition, the storyteller produced gestures that naturally accompany speech. In the second, the storyteller made semantically unrelated hand movements. In the third, the storyteller kept her hands still. In addition to inferior parietal and posterior superior and middle temporal regions, bilateral posterior superior temporal sulcus and left anterior inferior frontal gyrus responded more strongly to speech when it was further accompanied by gesture, regardless of the semantic relation to speech. However, the right inferior frontal gyrus was sensitive to the semantic import of the hand movements, demonstrating more activity when hand movements were semantically unrelated to the accompanying speech. These findings show that perceiving hand movements during speech modulates the distributed pattern of neural activation involved in both biological motion perception and discourse comprehension, suggesting listeners attempt to find meaning, not only in the words speakers produce, but also in the hand movements that accompany speech.

  18. The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility.

    Science.gov (United States)

    Bentsen, Thomas; May, Tobias; Kressner, Abigail A; Dau, Torsten

    2018-01-01

    Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one. Therefore, both components play significant roles and by combining them, speech intelligibility improvements were obtained in a six-talker condition at a low signal-to-noise ratio.
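
    The two learning objectives compared in this record are easy to state precisely: the ideal binary mask (IBM) thresholds the local SNR of each time-frequency unit, while the ideal ratio mask (IRM) assigns each unit a continuous value between zero and one. A minimal sketch (the STFT parameters, threshold, and toy signals are illustrative, not the study's configuration):

      import numpy as np
      from scipy.signal import istft, stft

      def ideal_masks(speech, noise, fs, lc_db=0.0):
          """IBM: 1 where local SNR exceeds lc_db, else 0. IRM: value in [0, 1]."""
          _, _, S = stft(speech, fs, nperseg=512)
          _, _, N = stft(noise, fs, nperseg=512)
          ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
          snr_db = 10 * np.log10((ps + 1e-12) / (pn + 1e-12))
          ibm = (snr_db > lc_db).astype(float)          # speech- vs noise-dominated
          irm = np.sqrt(ps / (ps + pn + 1e-12))         # continuous in [0, 1]
          return ibm, irm

      # Toy usage: mask the mixture's STFT with the IRM and resynthesize.
      rng = np.random.default_rng(8)
      fs = 16000
      speech = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
      noise = 0.5 * rng.standard_normal(fs)
      ibm, irm = ideal_masks(speech, noise, fs)
      _, _, M = stft(speech + noise, fs, nperseg=512)
      _, enhanced = istft(M * irm, fs, nperseg=512)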

  19. High school music classes enhance the neural processing of speech

    OpenAIRE

    Tierney, Adam; Krizman, Jennifer; Skoe, Erika; Johnston, Kathleen; Kraus, Nina

    2013-01-01

    Should music be a priority in public education? One argument for teaching music in school is that private music instruction relates to enhanced language abilities and neural function. However, the directionality of this relationship is unclear and it is unknown whether school-based music training can produce these enhancements. Here we show that two years of group music classes in high school enhance the subcortical encoding of speech. To tease apart the relationships between music and neural...

  20. A positron emission tomography study of the neural basis of informational and energetic masking effects in speech perception

    Science.gov (United States)

    Scott, Sophie K.; Rosen, Stuart; Wickham, Lindsay; Wise, Richard J. S.

    2004-02-01

    Positron emission tomography (PET) was used to investigate the neural basis of the comprehension of speech in unmodulated noise ("energetic" masking, dominated by effects at the auditory periphery), and when presented with another speaker ("informational" masking, dominated by more central effects). Each type of signal was presented at four different signal-to-noise ratios (SNRs) (+3, 0, -3, -6 dB for the speech-in-speech, +6, +3, 0, -3 dB for the speech-in-noise), with listeners instructed to listen for meaning to the target speaker. Consistent with behavioral studies, there was SNR-dependent activation associated with the comprehension of speech in noise, with no SNR-dependent activity for the comprehension of speech-in-speech (at low or negative SNRs). There was, in addition, activation in bilateral superior temporal gyri which was associated with the informational masking condition. The extent to which this activation of classical "speech" areas of the temporal lobes might delineate the neural basis of the informational masking is considered, as is the relationship of these findings to the interfering effects of unattended speech and sound on more explicit working memory tasks. This study is a novel demonstration of candidate neural systems involved in the perception of speech in noisy environments, and of the processing of multiple speakers in the dorso-lateral temporal lobes.

  1. A sparse neural code for some speech sounds but not for others.

    Directory of Open Access Journals (Sweden)

    Mathias Scharinger

    The precise neural mechanisms underlying speech sound representations are still a matter of debate. Proponents of 'sparse representations' assume that on the level of speech sounds, only contrastive or otherwise not predictable information is stored in long-term memory. Here, in a passive oddball paradigm, we challenge the neural foundations of such a 'sparse' representation; we use words that differ only in their penultimate consonant ("coronal" [t] vs. "dorsal" [k] place of articulation) and for example distinguish between the German nouns Latz ([lats]; bib) and Lachs ([laks]; salmon). Changes from standard [t] to deviant [k] and vice versa elicited a discernible Mismatch Negativity (MMN) response. Crucially, however, the MMN for the deviant [lats] was stronger than the MMN for the deviant [laks]. Source localization showed this difference to be due to enhanced brain activity in right superior temporal cortex. These findings reflect a difference in phonological 'sparsity': coronal [t] segments, but not dorsal [k] segments, are based on more sparse representations and elicit less specific neural predictions; sensory deviations from this prediction are more readily 'tolerated' and accordingly trigger weaker MMNs. The results foster the neurocomputational reality of 'representationally sparse' models of speech perception that are compatible with more general predictive mechanisms in auditory perception.

  2. Time Series Neural Network Model for Part-of-Speech Tagging Indonesian Language

    Science.gov (United States)

    Tanadi, Theo

    2018-03-01

    Part-of-speech tagging (POS tagging) is an important part of natural language processing. Many methods have been used for this task, including neural networks. This paper models a neural network that attempts to do POS tagging. A time series neural network is modelled to solve the problems that a basic neural network faces when attempting to do POS tagging. In order to enable the neural network to take text data as input, the text data is first clustered using Brown Clustering, resulting in a binary dictionary that the neural network can use. To further improve the accuracy of the neural network, other features such as the POS tag, suffix, and affix of previous words are also fed to the neural network.
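
    As a rough illustration of the approach described above, the sketch below feeds a window of Brown-cluster binary codes for the current and previous words, plus the previous predicted tag, into a small feedforward network. All sizes and the toy data are hypothetical, and the "time series" aspect is reduced here to a sliding window over preceding words.

```python
# Minimal sketch, assuming each word was already mapped to a fixed-length
# binary code by Brown clustering; dimensions are illustrative only.
import torch
import torch.nn as nn

CODE_BITS, N_TAGS, WINDOW = 16, 12, 3  # hypothetical sizes

class WindowedTagger(nn.Module):
    def __init__(self):
        super().__init__()
        # Input: binary codes of the current word and (WINDOW-1) previous
        # words, plus a one-hot encoding of the previous predicted tag.
        self.net = nn.Sequential(
            nn.Linear(WINDOW * CODE_BITS + N_TAGS, 64),
            nn.ReLU(),
            nn.Linear(64, N_TAGS),
        )

    def forward(self, word_codes, prev_tag_onehot):
        x = torch.cat([word_codes.flatten(1), prev_tag_onehot], dim=1)
        return self.net(x)  # unnormalized tag scores

tagger = WindowedTagger()
codes = torch.randint(0, 2, (1, WINDOW, CODE_BITS)).float()
prev_tag = torch.zeros(1, N_TAGS)
prev_tag[0, 0] = 1.0
print(tagger(codes, prev_tag).shape)  # torch.Size([1, 12])
```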

  3. The neural basis of speech sound discrimination from infancy to adulthood

    OpenAIRE

    Partanen, Eino

    2013-01-01

    Rapid processing of speech is facilitated by neural representations of native language phonemes. However, some disorders and developmental conditions, such as developmental dyslexia, can hamper the development of these neural memory traces, leading to language delays and poor academic achievement. While the early identification of such deficits is paramount so that interventions can be started as early as possible, there is currently no systematically used ecologically valid paradigm for the ...

  4. Speech comprehension aided by multiple modalities: behavioural and neural interactions

    Science.gov (United States)

    McGettigan, Carolyn; Faulkner, Andrew; Altarelli, Irene; Obleser, Jonas; Baverstock, Harriet; Scott, Sophie K.

    2014-01-01

    Speech comprehension is a complex human skill, the performance of which requires the perceiver to combine information from several sources – e.g. voice, face, gesture, linguistic context – to achieve an intelligible and interpretable percept. We describe a functional imaging investigation of how auditory, visual and linguistic information interact to facilitate comprehension. Our specific aims were to investigate the neural responses to these different information sources, alone and in interaction, and further to use behavioural speech comprehension scores to address sites of intelligibility-related activation in multifactorial speech comprehension. In fMRI, participants passively watched videos of spoken sentences, in which we varied Auditory Clarity (with noise-vocoding), Visual Clarity (with Gaussian blurring) and Linguistic Predictability. Main effects of enhanced signal with increased auditory and visual clarity were observed in overlapping regions of posterior STS. Two-way interactions of the factors (auditory × visual, auditory × predictability) in the neural data were observed outside temporal cortex, where positive signal change in response to clearer facial information and greater semantic predictability was greatest at intermediate levels of auditory clarity. Overall changes in stimulus intelligibility by condition (as determined using an independent behavioural experiment) were reflected in the neural data by increased activation predominantly in bilateral dorsolateral temporal cortex, as well as inferior frontal cortex and left fusiform gyrus. Specific investigation of intelligibility changes at intermediate auditory clarity revealed a set of regions, including posterior STS and fusiform gyrus, showing enhanced responses to both visual and linguistic information. Finally, an individual differences analysis showed that greater comprehension performance in the scanning participants (measured in a post-scan behavioural test) was associated with

  5. Parameter masks for close talk speech segregation using deep neural networks

    Directory of Open Access Journals (Sweden)

    Jiang Yi

    2015-01-01

    A deep neural network (DNN) based close talk speech segregation algorithm is introduced. One nearby microphone is used to collect the target speech, as close talk indicates, and another microphone is used to capture the noise in the environment. The time and energy differences between the two microphone signals are used as the segregation cue. A DNN estimator on each frequency channel is used to calculate the parameter masks. The parameter masks represent the target speech energy in each time-frequency (T-F) unit. Experimental results show the good performance of the proposed system. The signal-to-noise ratio (SNR) improvement is 8.1 dB in a 0 dB noisy environment.
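
    A minimal sketch of the segregation cue described in this abstract is given below: per time-frequency log-energy differences between the close-talk and noise-reference microphones, computed with scipy. The DNN mask estimator itself is omitted; the thresholded ratio at the end is only a crude stand-in for its output.

```python
# Sketch of the two-microphone energy-difference cue; all parameters
# and signals are illustrative, not the paper's configuration.
import numpy as np
from scipy.signal import stft

def energy_difference_features(close_mic, noise_mic, fs=16000, nperseg=512):
    _, _, Zc = stft(close_mic, fs=fs, nperseg=nperseg)
    _, _, Zn = stft(noise_mic, fs=fs, nperseg=nperseg)
    ec, en = np.abs(Zc) ** 2, np.abs(Zn) ** 2
    # Log-energy difference per T-F unit: positive where the close-talk
    # channel dominates, i.e., where target speech energy is likely.
    return 10.0 * np.log10((ec + 1e-12) / (en + 1e-12))

fs = 16000
t = np.arange(fs) / fs
close = np.sin(2 * np.pi * 300 * t) + 0.1 * np.random.randn(fs)
noise = np.random.randn(fs)
feats = energy_difference_features(close, noise, fs)
mask = (feats > 0).astype(np.float32)  # crude stand-in for the DNN output
print(feats.shape, mask.mean())
```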

  6. The Neural Basis of Speech Perception through Lipreading and Manual Cues: Evidence from Deaf Native Users of Cued Speech

    Science.gov (United States)

    Aparicio, Mario; Peigneux, Philippe; Charlier, Brigitte; Balériaux, Danielle; Kavec, Martin; Leybaert, Jacqueline

    2017-01-01

    We present here the first neuroimaging data for perception of Cued Speech (CS) by deaf adults who are native users of CS. CS is a visual mode of communicating a spoken language through a set of manual cues which accompany lipreading and disambiguate it. With CS, sublexical units of the oral language are conveyed clearly and completely through the visual modality without requiring hearing. The comparison of neural processing of CS in deaf individuals with processing of audiovisual (AV) speech in normally hearing individuals represents a unique opportunity to explore the similarities and differences in neural processing of an oral language delivered in a visuo-manual vs. an AV modality. The study included deaf adult participants who were early CS users and native hearing users of French who process speech audiovisually. Words were presented in an event-related fMRI design. Three conditions were presented to each group of participants. The deaf participants saw CS words (manual + lipread), words presented as manual cues alone, and words presented to be lipread without manual cues. The hearing group saw AV spoken words, audio-alone and lipread-alone. Three findings are highlighted. First, the middle and superior temporal gyrus (excluding Heschl’s gyrus) and left inferior frontal gyrus pars triangularis constituted a common, amodal neural basis for AV and CS perception. Second, integration was inferred in posterior parts of superior temporal sulcus for audio and lipread information in AV speech, but in the occipito-temporal junction, including MT/V5, for the manual cues and lipreading in CS. Third, the perception of manual cues showed a much greater overlap with the regions activated by CS (manual + lipreading) than lipreading alone did. This supports the notion that manual cues play a larger role than lipreading for CS processing. The present study contributes to a better understanding of the role of manual cues as support of visual speech perception in the framework

  7. Musicians' Enhanced Neural Differentiation of Speech Sounds Arises Early in Life: Developmental Evidence from Ages 3 to 30

    Science.gov (United States)

    Strait, Dana L.; O'Connell, Samantha; Parbery-Clark, Alexandra; Kraus, Nina

    2014-01-01

    The perception and neural representation of acoustically similar speech sounds underlie language development. Music training hones the perception of minute acoustic differences that distinguish sounds; this training may generalize to speech processing given that adult musicians have enhanced neural differentiation of similar speech syllables compared with nonmusicians. Here, we asked whether this neural advantage in musicians is present early in life by assessing musically trained and untrained children as young as age 3. We assessed auditory brainstem responses to the speech syllables /ba/ and /ga/ as well as auditory and visual cognitive abilities in musicians and nonmusicians across 3 developmental time-points: preschoolers, school-aged children, and adults. Cross-phase analyses objectively measured the degree to which subcortical responses differed to these speech syllables in musicians and nonmusicians for each age group. Results reveal that musicians exhibit enhanced neural differentiation of stop consonants early in life and with as little as a few years of training. Furthermore, the extent of subcortical stop consonant distinction correlates with auditory-specific cognitive abilities (i.e., auditory working memory and attention). Results are interpreted according to a corticofugal framework for auditory learning in which subcortical processing enhancements are engendered by strengthened cognitive control over auditory function in musicians. PMID:23599166

  8. Crossmodal deficit in dyslexic children: practice affects the neural timing of letter-speech sound integration

    Directory of Open Access Journals (Sweden)

    Gojko eŽarić

    2015-06-01

    A failure to build solid letter-speech sound associations may contribute to reading impairments in developmental dyslexia. Whether this reduced neural integration of letters and speech sounds changes over time within individual children and how this relates to behavioral gains in reading skills remains unknown. In this research, we examined changes in event-related potential (ERP) measures of letter-speech sound integration over a 6-month period during which 9-year-old dyslexic readers (n=17) followed training in letter-speech sound coupling alongside their regular reading curriculum. We presented the Dutch spoken vowels /a/ and /o/ as standard and deviant stimuli in one auditory and two audiovisual oddball conditions. In one audiovisual condition (AV0), the letter ‘a’ was presented simultaneously with the vowels, while in the other (AV200) it preceded vowel onset by 200 ms. Prior to the training (T1), dyslexic readers showed the expected pattern of typical auditory mismatch responses, together with the absence of letter-speech sound effects in a late negativity (LN) window. After the training (T2), our results showed earlier (and enhanced) crossmodal effects in the LN window. Most interestingly, earlier LN latency at T2 was significantly related to higher behavioral accuracy in letter-speech sound coupling. On a more general level, the timing of the earlier mismatch negativity (MMN) in the simultaneous condition (AV0) measured at T1 significantly related to reading fluency at both T1 and T2 as well as to reading gains. Our findings suggest that the reduced neural integration of letters and speech sounds in dyslexic children may show moderate improvement with reading instruction and training and that behavioral improvements relate especially to individual differences in the timing of this neural integration.

  9. Mapping Speech Spectra from Throat Microphone to Close-Speaking Microphone: A Neural Network Approach

    Directory of Open Access Journals (Sweden)

    B. Yegnanarayana

    2007-01-01

    Speech recorded from a throat microphone is robust to surrounding noise but, unlike speech recorded from a close-speaking microphone, sounds unnatural. This paper addresses the issue of improving the perceptual quality of throat microphone speech by mapping the speech spectra from the throat microphone to the close-speaking microphone. A neural network model is used to capture the speaker-dependent functional relationship between the feature vectors (cepstral coefficients) of the two speech signals. A method is proposed to ensure the stability of the all-pole synthesis filter. Objective evaluations indicate the effectiveness of the proposed mapping scheme. The advantage of this method is that the model gives a smooth estimate of the spectra of the close-speaking microphone speech. No distortions are perceived in the reconstructed speech. This mapping technique is also used for bandwidth extension of telephone speech.
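
    The mapping stage lends itself to a compact sketch. Below, a small feedforward network (hypothetical sizes, torch-based; not the paper's configuration) regresses close-speaking microphone cepstra from throat microphone cepstra, trained on toy paired frames that stand in for a real parallel corpus.

```python
# Minimal sketch of a speaker-dependent cepstral mapping network;
# layer sizes, cepstral order, and data are illustrative assumptions.
import torch
import torch.nn as nn

N_CEPS = 13  # hypothetical cepstral order

mapper = nn.Sequential(
    nn.Linear(N_CEPS, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, N_CEPS),  # estimated close-speaking cepstra
)
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)

# Toy paired frames (throat, close) standing in for a parallel corpus.
throat = torch.randn(256, N_CEPS)
close = throat + 0.3 * torch.randn(256, N_CEPS)
for _ in range(10):
    opt.zero_grad()
    loss = loss_fn(mapper(throat), close)
    loss.backward()
    opt.step()
print(float(loss))
```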

  10. Encoding of phonology in a recurrent neural model of grounded speech

    NARCIS (Netherlands)

    Alishahi, Afra; Barking, Marie; Chrupala, Grzegorz; Levy, Roger; Specia, Lucia

    2017-01-01

    We study the representation and encoding of phonemes in a recurrent neural network model of grounded speech. We use a model which processes images and their spoken descriptions, and projects the visual and auditory representations into the same semantic space. We perform a number of analyses on how

  11. Audio Classification in Speech and Music: A Comparison between a Statistical and a Neural Approach

    Directory of Open Access Journals (Sweden)

    Alessandro Bugatti

    2002-04-01

    We focus on the problem of audio classification into speech and music for multimedia applications. In particular, we present a comparison between two different techniques for speech/music discrimination. The first method is based on zero-crossing rate and Bayesian classification. It is very simple from a computational point of view and gives good results in the case of pure music or speech. The simulation results show that some performance degradation arises when the music segment also contains some speech superimposed on music, or strong rhythmic components. To overcome these problems, we propose a second method that uses more features and is based on neural networks (specifically, a multi-layer perceptron). In this case we obtain better performance, at the expense of a limited growth in computational complexity. In practice, the proposed neural network is simple to implement if a suitable polynomial is used as the activation function, and a real-time implementation is possible even if low-cost embedded systems are used.
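
    The core feature of the first method can be sketched in a few lines. Below, frame-level zero-crossing rate (ZCR) is computed and a simple dispersion threshold stands in for the paper's Bayesian classifier; the threshold and the toy signals are illustrative assumptions.

```python
# Sketch of ZCR-based speech/music discrimination; the decision rule
# is a stand-in, not the paper's Bayesian classifier.
import numpy as np

def zero_crossing_rate(x, frame_len=512, hop=256):
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len, hop)]
    # Fraction of sample-to-sample sign changes within each frame.
    return np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])

def classify(x, dispersion_threshold=0.05):
    # Speech alternates voiced/unvoiced frames, so its frame-level ZCR
    # tends to be far more dispersed than that of sustained music.
    return "speech" if np.std(zero_crossing_rate(x)) > dispersion_threshold else "music"

fs = 16000
seg = fs // 10
rng = np.random.default_rng(0)
speech_like = np.concatenate([
    np.sin(2 * np.pi * 200 * np.arange(seg) / fs) if i % 2 == 0
    else 0.5 * rng.standard_normal(seg)   # noise bursts mimic unvoiced frames
    for i in range(10)
])
tone_like = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
print(classify(speech_like), classify(tone_like))  # speech music
```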

  12. Synaptic depression in deep neural networks for speech processing.

    Science.gov (United States)

    Zhang, Wenhao; Li, Hanyu; Yang, Minda; Mesgarani, Nima

    2016-03-01

    A characteristic property of biological neurons is their ability to dynamically change the synaptic efficacy in response to variable input conditions. This mechanism, known as synaptic depression, significantly contributes to the formation of normalized representation of speech features. Synaptic depression also contributes to the robust performance of biological systems. In this paper, we describe how synaptic depression can be modeled and incorporated into deep neural network architectures to improve their generalization ability. We observed that when synaptic depression is added to the hidden layers of a neural network, it reduces the effect of changing background activity in the node activations. In addition, we show that when synaptic depression is included in a deep neural network trained for phoneme classification, the performance of the network improves under noisy conditions not included in the training phase. Our results suggest that more complete neuron models may further reduce the gap between the biological performance and artificial computing, resulting in networks that better generalize to novel signal conditions.
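
    As a rough sketch of the idea (an assumption-laden simplification, not the paper's model), the code below applies a synaptic-depression-like dynamic to hidden-unit activations across successive input frames: each unit's efficacy is depleted in proportion to its recent activity and recovers with a time constant, which suppresses steady background activity relative to transient onsets.

```python
# Simplified synaptic-depression dynamic applied to node activations;
# the update rule and constants are illustrative assumptions.
import numpy as np

def depressed_activations(acts, u=0.2, tau=10.0):
    """acts: (time, units) nonnegative activations; returns scaled acts."""
    efficacy = np.ones(acts.shape[1])
    out = np.empty_like(acts)
    for t in range(acts.shape[0]):
        out[t] = acts[t] * efficacy
        # Depletion in proportion to use, first-order recovery toward 1.
        efficacy += (1.0 - efficacy) / tau - u * efficacy * acts[t]
        efficacy = np.clip(efficacy, 0.0, 1.0)
    return out

# A constant background (unit 0) is suppressed relative to a transient
# onset (unit 1), illustrating the normalization effect.
T = 50
acts = np.zeros((T, 2))
acts[:, 0] = 1.0        # steady background activity
acts[25, 1] = 1.0       # brief onset
out = depressed_activations(acts)
print(out[40, 0], out[25, 1])  # background scaled down; onset near full
```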

  13. Speech reconstruction using a deep partially supervised neural network.

    Science.gov (United States)

    McLoughlin, Ian; Li, Jingjie; Song, Yan; Sharifzadeh, Hamid R

    2017-08-01

    Statistical speech reconstruction for larynx-related dysphonia has achieved good performance using Gaussian mixture models and, more recently, restricted Boltzmann machine arrays; however, deep neural network (DNN)-based systems have been hampered by the limited amount of training data available from individual voice-loss patients. The authors propose a novel DNN structure that allows a partially supervised training approach on spectral features from smaller data sets, yielding very good results compared with the current state-of-the-art.

  14. Temporal Context in Speech Processing and Attentional Stream Selection: A Behavioral and Neural perspective

    Science.gov (United States)

    Zion Golumbic, Elana M.; Poeppel, David; Schroeder, Charles E.

    2012-01-01

    The human capacity for processing speech is remarkable, especially given that information in speech unfolds over multiple time scales concurrently. Similarly notable is our ability to filter out extraneous sounds and focus our attention on one conversation, epitomized by the ‘Cocktail Party’ effect. Yet, the neural mechanisms underlying on-line speech decoding and attentional stream selection are not well understood. We review findings from behavioral and neurophysiological investigations that underscore the importance of the temporal structure of speech for achieving these perceptual feats. We discuss the hypothesis that entrainment of ambient neuronal oscillations to speech’s temporal structure, across multiple time-scales, serves to facilitate its decoding and underlies the selection of an attended speech stream over other competing input. In this regard, speech decoding and attentional stream selection are examples of ‘active sensing’, emphasizing an interaction between proactive and predictive top-down modulation of neuronal dynamics and bottom-up sensory input. PMID:22285024

  15. Neural correlates of phonetic convergence and speech imitation.

    Science.gov (United States)

    Garnier, Maëva; Lamalle, Laurent; Sato, Marc

    2013-01-01

    Speakers unconsciously tend to mimic their interlocutor's speech during communicative interaction. This study aims at examining the neural correlates of phonetic convergence and deliberate imitation, in order to explore whether imitation of phonetic features, deliberate, or unconscious, might reflect a sensory-motor recalibration process. Sixteen participants listened to vowels with pitch varying around the average pitch of their own voice, and then produced the identified vowels, while their speech was recorded and their brain activity was imaged using fMRI. Three degrees and types of imitation were compared (unconscious, deliberate, and inhibited) using a go-nogo paradigm, which enabled the comparison of brain activations during the whole imitation process, its active perception step, and its production. Speakers followed the pitch of voices they were exposed to, even unconsciously, without being instructed to do so. After being informed about this phenomenon, 14 participants were able to inhibit it, at least partially. The results of whole brain and ROI analyses support the fact that both deliberate and unconscious imitations are based on similar neural mechanisms and networks, involving regions of the dorsal stream, during both perception and production steps of the imitation process. While no significant difference in brain activation was found between unconscious and deliberate imitations, the degree of imitation, however, appears to be determined by processes occurring during the perception step. Four regions of the dorsal stream: bilateral auditory cortex, bilateral supramarginal gyrus (SMG), and left Wernicke's area, indeed showed an activity that correlated significantly with the degree of imitation during the perception step.

  16. High gamma oscillations in medial temporal lobe during overt production of speech and gestures.

    Science.gov (United States)

    Marstaller, Lars; Burianová, Hana; Sowman, Paul F

    2014-01-01

    The study of the production of co-speech gestures (CSGs), i.e., meaningful hand movements that often accompany speech during everyday discourse, provides an important opportunity to investigate the integration of language, action, and memory because of the semantic overlap between gesture movements and speech content. Behavioral studies of CSGs and speech suggest that they have a common base in memory and predict that overt production of both speech and CSGs would be preceded by neural activity related to memory processes. However, to date the neural correlates and timing of CSG production are still largely unknown. In the current study, we addressed these questions with magnetoencephalography and a semantic association paradigm in which participants overtly produced speech or gesture responses that were either meaningfully related to a stimulus or not. Using spectral and beamforming analyses to investigate the neural activity preceding the responses, we found a desynchronization in the beta band (15-25 Hz), which originated 900 ms prior to the onset of speech and was localized to motor and somatosensory regions in the cortex and cerebellum, as well as right inferior frontal gyrus. Beta desynchronization is often seen as an indicator of motor processing and thus reflects motor activity related to the hand movements that gestures add to speech. Furthermore, our results show oscillations in the high gamma band (50-90 Hz), which originated 400 ms prior to speech onset and were localized to the left medial temporal lobe. High gamma oscillations have previously been found to be involved in memory processes and we thus interpret them to be related to contextual association of semantic information in memory. The results of our study show that high gamma oscillations in medial temporal cortex play an important role in the binding of information in human memory during speech and CSG production.

  18. The origins of originality: the neural bases of creative thinking and originality.

    Science.gov (United States)

    Shamay-Tsoory, S G; Adler, N; Aharon-Peretz, J; Perry, D; Mayseless, N

    2011-01-01

    Although creativity has been related to prefrontal activity, recent neurological case studies postulate that patients who have left frontal and temporal degeneration involving deterioration of language abilities may actually develop de novo artistic abilities. In this study, we propose a neural and cognitive model according to which a balance between the two hemispheres affects a major aspect of creative cognition, namely, originality. In order to examine the neural basis of originality, that is, the ability to produce statistically infrequent ideas, patients with localized lesions in the medial prefrontal cortex (mPFC), inferior frontal gyrus (IFG), and posterior parietal and temporal cortex (PC) were assessed by two tasks involving divergent thinking and originality. Results indicate that lesions in the mPFC involved the most profound impairment in originality. Furthermore, precise anatomical mapping of lesions indicated that while the extent of lesion in the right mPFC was associated with impaired originality, lesions in the left PC were associated with somewhat elevated levels of originality. A positive correlation between creativity scores and left PC lesions indicated that the larger the lesion in this area, the greater the originality. On the other hand, a negative correlation was observed between originality scores and lesions in the right mPFC. It is concluded that the right mPFC is part of a right fronto-parietal network which is responsible for producing original ideas. It is possible that more linear cognitive processing, such as language, mediated by left hemisphere structures, interferes with creative cognition. Therefore, lesions in the left hemisphere may be associated with elevated levels of originality. Copyright © 2010 Elsevier Ltd. All rights reserved.

  19. Neural pathways for visual speech perception

    Directory of Open Access Journals (Sweden)

    Lynne E Bernstein

    2014-12-01

    This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread, diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to the multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

  20. Interaction matters: A perceived social partner alters the neural processing of human speech.

    Science.gov (United States)

    Rice, Katherine; Redcay, Elizabeth

    2016-04-01

    Mounting evidence suggests that social interaction changes how communicative behaviors (e.g., spoken language, gaze) are processed, but the precise neural bases by which social-interactive context may alter communication remain unknown. Various perspectives suggest that live interactions are more rewarding, more attention-grabbing, or require increased mentalizing--thinking about the thoughts of others. Dissociating between these possibilities is difficult because most extant neuroimaging paradigms examining social interaction have not directly compared live paradigms to conventional "offline" (or recorded) paradigms. We developed a novel fMRI paradigm to assess whether and how an interactive context changes the processing of speech matched in content and vocal characteristics. Participants listened to short vignettes--which contained no reference to people or mental states--believing that some vignettes were prerecorded and that others were presented over a real-time audio-feed by a live social partner. In actuality, all speech was prerecorded. Simply believing that speech was live increased activation in each participant's own mentalizing regions, defined using a functional localizer. Contrasting live to recorded speech did not reveal significant differences in attention or reward regions. Further, higher levels of autistic-like traits were associated with altered neural specialization for live interaction. These results suggest that humans engage in ongoing mentalizing about social partners, even when such mentalizing is not explicitly required, illustrating how social context shapes social cognition. Understanding communication in social context has important implications for typical and atypical social processing, especially for disorders like autism where social difficulties are more acute in live interaction. Copyright © 2015 Elsevier Inc. All rights reserved.

  1. Understanding Freedom of Speech in America: The Origin & Evolution of the 1st Amendment.

    Science.gov (United States)

    Barnes, Judy

    In this booklet the content and implications of the First Amendment are analyzed. Historical origins of free speech from ancient Greece to England before the discovery of America, free speech in colonial America, and the Bill of Rights and its meaning for free speech are outlined. The evolution of the First Amendment is described, and the…

  2. Musical Training during Early Childhood Enhances the Neural Encoding of Speech in Noise

    Science.gov (United States)

    Strait, Dana L.; Parbery-Clark, Alexandra; Hittner, Emily; Kraus, Nina

    2012-01-01

    For children, learning often occurs in the presence of background noise. As such, there is growing desire to improve a child's access to a target signal in noise. Given adult musicians' perceptual and neural speech-in-noise enhancements, we asked whether similar effects are present in musically-trained children. We assessed the perception and…

  3. [Modeling developmental aspects of sensorimotor control of speech production].

    Science.gov (United States)

    Kröger, B J; Birkholz, P; Neuschaefer-Rube, C

    2007-05-01

    Detailed knowledge of the neurophysiology of speech acquisition is important for understanding the developmental aspects of speech perception and production and for understanding developmental disorders of speech perception and production. A computer-implemented neural model of sensorimotor control of speech production was developed. The model is capable of demonstrating the neural functions of different cortical areas during speech production in detail. (i) Two sensory and two motor maps or neural representations and the appertaining neural mappings or projections establish the sensorimotor feedback control system. These maps and mappings are already formed and trained during the prelinguistic phase of speech acquisition. (ii) The feedforward sensorimotor control system comprises the lexical map (representations of sounds, syllables, and words of the first language) and the mappings from the lexical to the sensory and motor maps. The training of the appertaining mappings forms the linguistic phase of speech acquisition. (iii) Three prelinguistic learning phases--i.e., silent mouthing, quasi-stationary vocalic articulation, and realisation of articulatory protogestures--can be defined on the basis of our simulation studies using the computational neural model. These learning phases can be associated with temporal phases of prelinguistic speech acquisition obtained from natural data. The neural model illuminates the detailed function of specific cortical areas during speech production. In particular, it can be shown that developmental disorders of speech production may result from a delayed or incorrect process within one of the prelinguistic learning phases defined by the neural model.

  4. When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

    Science.gov (United States)

    Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

    2017-11-01

    Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Vocal Tract Images Reveal Neural Representations of Sensorimotor Transformation During Speech Imitation

    Science.gov (United States)

    Carey, Daniel; Miquel, Marc E.; Evans, Bronwen G.; Adank, Patti; McGettigan, Carolyn

    2017-01-01

    Imitating speech necessitates the transformation from sensory targets to vocal tract motor output, yet little is known about the representational basis of this process in the human brain. Here, we address this question by using real-time MR imaging (rtMRI) of the vocal tract and functional MRI (fMRI) of the brain in a speech imitation paradigm. Participants trained on imitating a native vowel and a similar nonnative vowel that required lip rounding. Later, participants imitated these vowels and an untrained vowel pair during separate fMRI and rtMRI runs. Univariate fMRI analyses revealed that regions including left inferior frontal gyrus were more active during sensorimotor transformation (ST) and production of nonnative vowels, compared with native vowels; further, ST for nonnative vowels activated somatomotor cortex bilaterally, compared with ST of native vowels. Using representational similarity analysis (RSA) test models constructed from participants’ vocal tract images and from stimulus formant distances, we found that RSA searchlight analyses of fMRI data showed either type of model could be represented in somatomotor, temporal, cerebellar, and hippocampal neural activation patterns during ST. We thus provide the first evidence of widespread and robust cortical and subcortical neural representation of vocal tract and/or formant parameters, during prearticulatory ST. PMID:28334401

  6. Cortical activity patterns predict robust speech discrimination ability in noise

    Science.gov (United States)

    Shetake, Jai A.; Wolf, Jordan T.; Cheung, Ryan J.; Engineer, Crystal T.; Ram, Satyananda K.; Kilgard, Michael P.

    2012-01-01

    The neural mechanisms that support speech discrimination in noisy conditions are poorly understood. In quiet conditions, spike timing information appears to be used in the discrimination of speech sounds. In this study, we evaluated the hypothesis that spike timing is also used to distinguish between speech sounds in noisy conditions that significantly degrade neural responses to speech sounds. We tested speech sound discrimination in rats and recorded primary auditory cortex (A1) responses to speech sounds in background noise of different intensities and spectral compositions. Our behavioral results indicate that rats, like humans, are able to accurately discriminate consonant sounds even in the presence of background noise that is as loud as the speech signal. Our neural recordings confirm that speech sounds evoke degraded but detectable responses in noise. Finally, we developed a novel neural classifier that mimics behavioral discrimination. The classifier discriminates between speech sounds by comparing the A1 spatiotemporal activity patterns evoked on single trials with the average spatiotemporal patterns evoked by known sounds. Unlike classifiers in most previous studies, this classifier is not provided with the stimulus onset time. Neural activity analyzed with the use of relative spike timing was well correlated with behavioral speech discrimination in quiet and in noise. Spike timing information integrated over longer intervals was required to accurately predict rat behavioral speech discrimination in noisy conditions. The similarity of neural and behavioral discrimination of speech in noise suggests that humans and rats may employ similar brain mechanisms to solve this problem. PMID:22098331
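
    The classifier concept translates into a short sketch: a single trial's spatiotemporal activity pattern (neurons x time bins) is assigned to whichever sound's trial-averaged template it matches most closely. The Euclidean distance and the toy data below are illustrative assumptions; note that, unlike this sketch, the study's classifier was not provided with the stimulus onset time.

```python
# Sketch of template matching on spatiotemporal activity patterns;
# assumes pre-aligned trials, unlike the study's onset-free classifier.
import numpy as np

def classify_trial(trial, templates):
    """trial: (neurons, bins); templates: dict label -> same-shape array."""
    dists = {label: np.linalg.norm(trial - tpl)
             for label, tpl in templates.items()}
    return min(dists, key=dists.get)

rng = np.random.default_rng(0)
true_a = rng.random((20, 40))            # latent pattern for sound "a"
true_b = rng.random((20, 40))            # latent pattern for sound "b"
templates = {"a": true_a, "b": true_b}   # stand-ins for trial averages
noisy_trial = true_a + 0.3 * rng.standard_normal((20, 40))
print(classify_trial(noisy_trial, templates))  # -> "a" for most draws
```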

  7. Neural processing of speech in children is influenced by bilingual experience

    Science.gov (United States)

    Krizman, Jennifer; Slater, Jessica; Skoe, Erika; Marian, Viorica; Kraus, Nina

    2014-01-01

    Language experience fine-tunes how the auditory system processes sound. For example, bilinguals, relative to monolinguals, have more robust evoked responses to speech that manifest as stronger neural encoding of the fundamental frequency (F0) and greater across-trial consistency. However, it is unknown whether such enhancements increase with increasing second language experience. We predict that F0 amplitude and neural consistency scale with dual-language experience during childhood, such that more years of bilingual experience lead to more robust F0 encoding and greater neural consistency. To test this hypothesis, we recorded auditory brainstem responses to the synthesized syllables ‘ba’ and ‘ga’ in two groups of bilingual children who were matched for age at test (8.4 ± 0.67 years) but differed in their age of second language acquisition. One group learned English and Spanish simultaneously from birth (n=13), while the second group learned the two languages sequentially (n=15), spending on average their first four years as monolingual Spanish speakers. We find that simultaneous bilinguals have a larger F0 response to ‘ba’ and ‘ga’ and a more consistent response to ‘ba’ compared to sequential bilinguals. We also demonstrate that these neural enhancements positively relate to years of bilingual experience. These findings support the notion that bilingualism enhances subcortical auditory processing. PMID:25445377

  8. Generating original ideas: The neural underpinning of originality.

    Science.gov (United States)

    Mayseless, Naama; Eran, Ayelet; Shamay-Tsoory, Simone G

    2015-08-01

    One of the key aspects of creativity is the ability to produce original ideas. Originality is defined in terms of the novelty and rarity of an idea and is measured by the infrequency of the idea compared to other ideas. In the current study we focused on divergent thinking (DT) - the ability to produce many alternate ideas - and assessed the neural pathways associated with originality. Considering that generation of original ideas involves both the ability to generate new associations and the ability to overcome automatic common responses, we hypothesized that originality would be associated with activations in regions related to associative thinking, including areas of the default mode network (DMN) such as medial prefrontal areas, as well as with areas involved in cognitive control and inhibition. Thirty participants were scanned while performing a DT task that required the generation of original uses for common objects. The results indicate that the ability to produce original ideas is mediated by activity in several regions that are part of the DMN including the medial prefrontal cortex (mPFC) and the posterior cingulate cortex (PCC). Furthermore, individuals who are more original exhibited enhanced activation in the ventral anterior cingulate cortex (vACC), which was also positively coupled with activity in the left occipital-temporal area. These results are in line with the dual model of creativity, according to which original ideas are a product of the interaction between a system that generates ideas and a control system that evaluates these ideas. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. Neurophysiological Influence of Musical Training on Speech Perception

    OpenAIRE

    Shahin, Antoine J.

    2011-01-01

    Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one’s ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss, who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skill...

  10. Distributed neural signatures of natural audiovisual speech and music in the human auditory cortex.

    Science.gov (United States)

    Salmi, Juha; Koistinen, Olli-Pekka; Glerean, Enrico; Jylänki, Pasi; Vehtari, Aki; Jääskeläinen, Iiro P; Mäkelä, Sasu; Nummenmaa, Lauri; Nummi-Kuisma, Katarina; Nummi, Ilari; Sams, Mikko

    2017-08-15

    During a conversation or when listening to music, auditory and visual information are combined automatically into audiovisual objects. However, it is still poorly understood how specific types of visual information shape neural processing of sounds in lifelike stimulus environments. Here we applied multi-voxel pattern analysis to investigate how naturally matching visual input modulates supratemporal cortex activity during processing of naturalistic acoustic speech, singing and instrumental music. Bayesian logistic regression classifiers with sparsity-promoting priors were trained to predict whether the stimulus was audiovisual or auditory, and whether it contained piano playing, speech, or singing. The predictive performances of the classifiers were tested by leaving out one participant at a time for testing and training the model on the remaining 15 participants. The signature patterns associated with unimodal auditory stimuli encompassed distributed locations mostly in the middle and superior temporal gyrus (STG/MTG). A pattern regression analysis, based on a continuous acoustic model, revealed that activity in some of these MTG and STG areas was associated with acoustic features present in speech and music stimuli. A concurrent visual stimulus modulated activity in bilateral MTG (speech), lateral aspect of right anterior STG (singing), and bilateral parietal opercular cortex (piano). Our results suggest that specific supratemporal brain areas are involved in processing complex natural speech, singing, and piano playing, and other brain areas located in anterior (facial speech) and posterior (music-related hand actions) supratemporal cortex are influenced by related visual information. Those anterior and posterior supratemporal areas have been linked to stimulus identification and sensory-motor integration, respectively. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Investigating the neural correlates of voice versus speech-sound directed information in pre-school children.

    Directory of Open Access Journals (Sweden)

    Nora Maria Raschle

    Studies in sleeping newborns and infants propose that the superior temporal sulcus is involved in speech processing soon after birth. Speech processing also implicitly requires the analysis of the human voice, which conveys both linguistic and extra-linguistic information. However, due to technical and practical challenges when neuroimaging young children, evidence of neural correlates of speech and/or voice processing in toddlers and young children remains scarce. In the current study, we used functional magnetic resonance imaging (fMRI) in 20 typically developing preschool children (average age = 5.8 y; range 5.2-6.8 y) to investigate brain activation during judgments about vocal identity versus the initial speech sound of spoken object words. FMRI results reveal common brain regions responsible for voice-specific and speech-sound specific processing of spoken object words, including bilateral primary and secondary language areas of the brain. Contrasting voice-specific with speech-sound specific processing predominantly activates the anterior part of the right-hemispheric superior temporal sulcus. Furthermore, the right STS is functionally correlated with left-hemispheric temporal and right-hemispheric prefrontal regions. This finding underlines the importance of the right superior temporal sulcus as a temporal voice area and indicates that this brain region is specialized, and functions similarly to adults, by the age of five. We thus extend previous knowledge of voice-specific regions and their functional connections to the young brain, which may further our understanding of the neuronal mechanisms of speech-specific processing in children with developmental disorders, such as autism or specific language impairments.

  12. Integrating machine translation and speech synthesis component for English to Dravidian language speech to speech translation system

    Directory of Open Access Journals (Sweden)

    J. SANGEETHA

    2015-02-01

    This paper presents an interface between the machine translation and speech synthesis systems for converting English speech to Tamil text in an English to Tamil speech to speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation, and text to speech synthesis. Many procedures for integrating speech recognition and machine translation have been proposed, but the speech synthesis component has not yet been considered. In this paper, we focus on the integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of the speech synthesis and machine translation components and of their integration. Here we implement a hybrid machine translation system (a combination of rule-based and statistical machine translation) and a concatenative syllable-based speech synthesis technique. In order to retain the naturalness and intelligibility of the synthesized speech, Auto Associative Neural Network (AANN) prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.

  13. Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems

    DEFF Research Database (Denmark)

    Kolbæk, Morten; Tan, Zheng-Hua; Jensen, Jesper

    2017-01-01

    In this paper, we study aspects of single microphone speech enhancement (SE) based on deep neural networks (DNNs). Specifically, we explore the generalizability capabilities of state-of-the-art DNN-based SE systems with respect to the background noise type, the gender of the target speaker...... general. Finally, we compare how a DNN-based SE system trained to be noise type general, speaker general, and SNR general performs relative to a state-of-the-art short-time spectral amplitude minimum mean square error (STSA-MMSE) based SE algorithm. We show that DNN-based SE systems, when trained...... a state-of-the-art STSA-MMSE based SE method, when tested using a range of unseen speakers and noise types. Finally, a listening test using several DNN-based SE systems tested in unseen speaker conditions shows that these systems can improve SI for some SNR and noise type configurations but degrade SI...

  14. Differences in Neural Correlates of Speech Perception in 3 Month Olds at High and Low Risk for Autism Spectrum Disorder.

    Science.gov (United States)

    Edwards, Laura A; Wagner, Jennifer B; Tager-Flusberg, Helen; Nelson, Charles A

    2017-10-01

    In this study, we investigated neural precursors of language acquisition as potential endophenotypes of autism spectrum disorder (ASD) in 3-month-old infants at high and low familial ASD risk. Infants were imaged using functional near-infrared spectroscopy while they listened to auditory stimuli containing syllable repetitions; their neural responses were analyzed over left and right temporal regions. While female low risk infants showed initial neural activation that decreased over exposure to repetition-based stimuli, potentially indicating a habituation response to repetition in speech, female high risk infants showed no changes in neural activity over exposure. This finding may indicate a potential neural endophenotype of language development or ASD specific to females at risk for the disorder.

  15. Cultural-historical and cognitive approaches to understanding the origins of development of written speech

    Directory of Open Access Journals (Sweden)

    L.F. Obukhova

    2014-08-01

    We present an analysis of the emergence and development of written speech, its relationship to oral speech, and its connections to the symbolic and modeling activities of preschool children – playing and drawing. While a child's drawing is traditionally interpreted in psychology either as a measure of intellectual development, as a projective technique, or as a criterion for the creative giftedness of the child, in this article the artistic activity is analyzed as a prerequisite for the development of written speech. The article substantiates the hypothesis that the mastery of “picture writing” – the ability to display verbal content in a schematic pictorial plan – is connected to the success of written speech at school age. Along with the classical works of L.S. Vygotsky, D.B. Elkonin, and A.R. Luria dedicated to finding the origins of writing, the article presents current Russian and foreign frameworks for forming the preconditions of writing, based on the concepts of cultural-historical theory (“higher mental functions”, “zone of proximal development”, etc.). In Western psychology, a number of pilot studies used the developmental function of drawing for teaching writing skills to children of 5-7 years old. However, in cognitive psychology, the relationship between drawing and writing is most often reduced mainly to the analysis of general motor circuits. Despite the recovery in research on writing and its origins in the last decade, in both domestic and foreign psychology written speech remains an insufficiently studied problem.

  16. Out-of-synchrony speech entrainment in developmental dyslexia.

    Science.gov (United States)

    Molinaro, Nicola; Lizarazu, Mikel; Lallier, Marie; Bourguignon, Mathieu; Carreiras, Manuel

    2016-08-01

    Developmental dyslexia is a reading disorder often characterized by reduced awareness of speech units. Whether the neural source of this phonological disorder in dyslexic readers results from the malfunctioning of the primary auditory system or damaged feedback communication between higher-order phonological regions (i.e., left inferior frontal regions) and the auditory cortex is still under dispute. Here we recorded magnetoencephalographic (MEG) signals from 20 dyslexic readers and 20 age-matched controls while they were listening to ∼10-s-long spoken sentences. Compared to controls, dyslexic readers had (1) an impaired neural entrainment to speech in the delta band (0.5-1 Hz); (2) a reduced delta synchronization in both the right auditory cortex and the left inferior frontal gyrus; and (3) an impaired feedforward functional coupling between neural oscillations in the right auditory cortex and the left inferior frontal regions. This shows that during speech listening, individuals with developmental dyslexia present reduced neural synchrony to low-frequency speech oscillations in primary auditory regions that hinders higher-order speech processing steps. The present findings, thus, strengthen proposals assuming that improper low-frequency acoustic entrainment affects speech sampling. This low speech-brain synchronization has the strong potential to cause severe consequences for both phonological and reading skills. Interestingly, the reduced speech-brain synchronization in dyslexic readers compared to normal readers (and its higher-order consequences across the speech processing network) appears preserved through the development from childhood to adulthood. Thus, the evaluation of speech-brain synchronization could possibly serve as a diagnostic tool for early detection of children at risk of dyslexia. Hum Brain Mapp 37:2767-2783, 2016. © 2016 Wiley Periodicals, Inc.

  17. Speech networks at rest and in action: interactions between functional brain networks controlling speech production

    Science.gov (United States)

    Fuertinger, Stefan

    2015-01-01

    Speech production is one of the most complex human behaviors. Although brain activation during speaking has been well investigated, our understanding of interactions between the brain regions and neural networks remains scarce. We combined seed-based interregional correlation analysis with graph theoretical analysis of functional MRI data during the resting state and sentence production in healthy subjects to investigate the interface and topology of functional networks originating from the key brain regions controlling speech, i.e., the laryngeal/orofacial motor cortex, inferior frontal and superior temporal gyri, supplementary motor area, cingulate cortex, putamen, and thalamus. During both resting and speaking, the interactions between these networks were bilaterally distributed and centered on the sensorimotor brain regions. However, speech production preferentially recruited the inferior parietal lobule (IPL) and cerebellum into the large-scale network, suggesting the importance of these regions in facilitation of the transition from the resting state to speaking. Furthermore, the cerebellum (lobule VI) was the most prominent region showing functional influences on speech-network integration and segregation. Although networks were bilaterally distributed, interregional connectivity during speaking was stronger in the left vs. right hemisphere, which may have underlined a more homogeneous overlap between the examined networks in the left hemisphere. Among these, the laryngeal motor cortex (LMC) established a core network that fully overlapped with all other speech-related networks, determining the extent of network interactions. Our data demonstrate complex interactions of large-scale brain networks controlling speech production and point to the critical role of the LMC, IPL, and cerebellum in the formation of the speech production network. PMID:25673742

  18. Neural networks engaged in short-term memory rehearsal are disrupted by irrelevant speech in human subjects.

    Science.gov (United States)

    Kopp, Franziska; Schröger, Erich; Lipka, Sigrid

    2004-01-02

    Rehearsal mechanisms in human short-term memory are increasingly understood in the light of both behavioural and neuroanatomical findings. However, little is known about the cooperation of participating brain structures and how such cooperations are affected when memory performance is disrupted. In this paper we use EEG coherence as a measure of synchronization to investigate rehearsal processes and their disruption by irrelevant speech in a delayed serial recall paradigm. Fronto-central and fronto-parietal theta (4-7.5 Hz), beta (13-20 Hz), and gamma (35-47 Hz) synchronizations are shown to be involved in our short-term memory task. Moreover, the impairment in serial recall due to irrelevant speech was preceded by a reduction of gamma band coherence. Results suggest that the irrelevant speech effect has its neural basis in the disruption of left-lateralized fronto-central networks. This stresses the importance of gamma band activity for short-term memory operations.
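
    The band-limited synchronization described above can be illustrated with a phase-locking value between two channels, a close relative of EEG coherence; the toy signals, filter order, and band edges here are assumptions for illustration only.

        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        fs = 250.0
        t = np.arange(0, 4, 1 / fs)
        frontal = np.sin(2 * np.pi * 40 * t) + 0.3 * np.random.randn(t.size)
        central = np.sin(2 * np.pi * 40 * t + 0.2) + 0.3 * np.random.randn(t.size)

        def band_phase(x, lo, hi):
            # band-pass, then instantaneous phase via the analytic signal
            b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
            return np.angle(hilbert(filtfilt(b, a, x)))

        # gamma band (35-47 Hz), as in the irrelevant-speech task above
        dphi = band_phase(frontal, 35, 47) - band_phase(central, 35, 47)
        plv = np.abs(np.mean(np.exp(1j * dphi)))
        print("fronto-central gamma synchronization (PLV):", round(plv, 3))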

  19. Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition

    OpenAIRE

    Li, Xiangang; Wu, Xihong

    2014-01-01

    Long short-term memory (LSTM) based acoustic modeling methods have recently been shown to give state-of-the-art performance on some speech recognition tasks. To achieve a further performance improvement, in this research, deep extensions on LSTM are investigated considering that deep hierarchical model has turned out to be more efficient than a shallow one. Motivated by previous research on constructing deep recurrent neural networks (RNNs), alternative deep LSTM architectures are proposed an...
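
    A minimal sketch of the kind of deep (stacked) LSTM acoustic model the paper investigates, written here in PyTorch; the feature dimension, layer count, hidden size, and output targets are illustrative assumptions, not the authors' configuration.

        import torch
        import torch.nn as nn

        class DeepLSTMAcousticModel(nn.Module):
            """Stacked-LSTM acoustic model: frames of features -> per-frame scores."""
            def __init__(self, n_features=40, hidden=512, layers=4, n_targets=3000):
                super().__init__()
                # num_layers > 1 gives the deep LSTM hierarchy discussed above
                self.lstm = nn.LSTM(n_features, hidden, num_layers=layers,
                                    batch_first=True)
                self.out = nn.Linear(hidden, n_targets)

            def forward(self, x):            # x: (batch, frames, n_features)
                h, _ = self.lstm(x)
                return self.out(h)           # (batch, frames, n_targets)

        model = DeepLSTMAcousticModel()
        scores = model(torch.randn(2, 100, 40))   # two toy utterances, 100 frames each
        print(scores.shape)                       # torch.Size([2, 100, 3000])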

  20. Speech networks at rest and in action: interactions between functional brain networks controlling speech production.

    Science.gov (United States)

    Simonyan, Kristina; Fuertinger, Stefan

    2015-04-01

    Speech production is one of the most complex human behaviors. Although brain activation during speaking has been well investigated, our understanding of interactions between the brain regions and neural networks remains scarce. We combined seed-based interregional correlation analysis with graph theoretical analysis of functional MRI data during the resting state and sentence production in healthy subjects to investigate the interface and topology of functional networks originating from the key brain regions controlling speech, i.e., the laryngeal/orofacial motor cortex, inferior frontal and superior temporal gyri, supplementary motor area, cingulate cortex, putamen, and thalamus. During both resting and speaking, the interactions between these networks were bilaterally distributed and centered on the sensorimotor brain regions. However, speech production preferentially recruited the inferior parietal lobule (IPL) and cerebellum into the large-scale network, suggesting the importance of these regions in facilitation of the transition from the resting state to speaking. Furthermore, the cerebellum (lobule VI) was the most prominent region showing functional influences on speech-network integration and segregation. Although networks were bilaterally distributed, interregional connectivity during speaking was stronger in the left vs. right hemisphere, which may have underlined a more homogeneous overlap between the examined networks in the left hemisphere. Among these, the laryngeal motor cortex (LMC) established a core network that fully overlapped with all other speech-related networks, determining the extent of network interactions. Our data demonstrate complex interactions of large-scale brain networks controlling speech production and point to the critical role of the LMC, IPL, and cerebellum in the formation of the speech production network. Copyright © 2015 the American Physiological Society.
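
    The seed-based correlation plus graph-theory approach can be sketched as follows: correlate ROI time series, threshold the matrix into a graph, and look for hub regions. The ROI labels, random data, and threshold below are placeholders, not the study's fMRI data or analysis settings.

        import numpy as np
        import networkx as nx

        rois = ["LMC", "IFG", "STG", "SMA", "cingulate", "putamen", "thalamus", "IPL"]
        ts = np.random.randn(240, len(rois))     # 240 toy volumes x 8 regions

        r = np.corrcoef(ts.T)                    # interregional correlation matrix
        np.fill_diagonal(r, 0)
        adj = (np.abs(r) > 0.1).astype(int)      # threshold into a binary graph

        G = nx.relabel_nodes(nx.from_numpy_array(adj), dict(enumerate(rois)))

        # high-degree nodes are candidate hubs of the speech network
        hubs = sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1])
        print("most connected regions:", hubs[:3])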

  1. Speech Motor Development in Childhood Apraxia of Speech : Generating Testable Hypotheses by Neurocomputational Modeling

    NARCIS (Netherlands)

    Terband, H.; Maassen, B.

    2010-01-01

    Childhood apraxia of speech (CAS) is a highly controversial clinical entity, with respect to both clinical signs and underlying neuromotor deficit. In the current paper, we advocate a modeling approach in which a computational neural model of speech acquisition and production is utilized in order to

  2. Speech motor development in childhood apraxia of speech: generating testable hypotheses by neurocomputational modeling.

    NARCIS (Netherlands)

    Terband, H.R.; Maassen, B.A.M.

    2010-01-01

    Childhood apraxia of speech (CAS) is a highly controversial clinical entity, with respect to both clinical signs and underlying neuromotor deficit. In the current paper, we advocate a modeling approach in which a computational neural model of speech acquisition and production is utilized in order to

  3. Neural representation of the sensorimotor speech-action-repository

    Directory of Open Access Journals (Sweden)

    Cornelia Eckers

    2013-04-01

    Full Text Available A speech-action-repository (SAR) or mental syllabary has been proposed as a central module for sensorimotor processing of syllables. In this approach, syllables occurring frequently within language are assumed to be stored as holistic sensorimotor patterns, while non-frequent syllables need to be assembled from sub-syllabic units. Thus, frequent syllables are processed efficiently and quickly during production or perception by a direct activation of their sensorimotor patterns. Whereas several behavioral psycholinguistic studies provided evidence in support of the existence of a syllabary, fMRI studies have failed to demonstrate its neural reality. In the present fMRI study, a reaction paradigm using homogeneous vs. heterogeneous syllable blocks was used during overt vs. covert speech production and auditory vs. visual presentation modes. Two complementary data analyses were performed: (1) in a logical conjunction, activation for syllable processing independent of input modality and response mode was assessed, in order to support the assumption of the existence of a supramodal hub within a SAR. (2) In addition, priming effects in the BOLD response in homogeneous vs. heterogeneous blocks were measured in order to identify brain regions showing reduced activity during repeated production/perception of a specific syllable, thereby determining state maps. Auditory-visual conjunction analysis revealed an activation network comprising bilateral precentral gyrus and left inferior frontal gyrus (area 44). These results are compatible with the notion of a supramodal hub within the SAR. The main effect of homogeneity priming revealed an activation pattern of areas within the frontal, temporal, and parietal lobes. These findings are taken to represent sensorimotor state maps of the SAR. In conclusion, the present study provided preliminary evidence for a SAR.

  4. Electrophysiological evidence for speech-specific audiovisual integration.

    Science.gov (United States)

    Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean

    2014-01-01

    Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode. © 2013 Published by Elsevier Ltd.

  5. Neurophysiological influence of musical training on speech perception.

    Science.gov (United States)

    Shahin, Antoine J

    2011-01-01

    Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one's ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss (HL), who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skills acquired through musical training for specific acoustical processes may transfer to, and thereby improve, speech perception. The neurophysiological mechanisms underlying the influence of musical training on speech processing, and the extent of this influence, remain a rich area to be explored. A prerequisite for such transfer is the facilitation of greater neurophysiological overlap between speech and music processing following musical training. This review first establishes a neurophysiological link between musical training and speech perception, and subsequently provides further hypotheses on the neurophysiological implications of musical training on speech perception in adverse acoustical environments and in individuals with HL.

  6. The relationship between the neural computations for speech and music perception is context-dependent: an activation likelihood estimate study

    Science.gov (United States)

    LaCroix, Arianna N.; Diaz, Alvaro F.; Rogalsky, Corianne

    2015-01-01

    The relationship between the neurobiology of speech and music has been investigated for more than a century. There remains no widespread agreement regarding how (or to what extent) music perception utilizes the neural circuitry that is engaged in speech processing, particularly at the cortical level. Prominent models such as Patel's Shared Syntactic Integration Resource Hypothesis (SSIRH) and Koelsch's neurocognitive model of music perception suggest a high degree of overlap, particularly in the frontal lobe, but also perhaps more distinct representations in the temporal lobe with hemispheric asymmetries. The present meta-analysis study used activation likelihood estimate analyses to identify the brain regions consistently activated for music as compared to speech across the functional neuroimaging (fMRI and PET) literature. Eighty music and 91 speech neuroimaging studies of healthy adult control subjects were analyzed. Peak activations reported in the music and speech studies were divided into four paradigm categories: passive listening, discrimination tasks, error/anomaly detection tasks and memory-related tasks. We then compared activation likelihood estimates within each category for music vs. speech, and each music condition with passive listening. We found that listening to music and to speech preferentially activate distinct temporo-parietal bilateral cortical networks. We also found music and speech to have shared resources in the left pars opercularis but speech-specific resources in the left pars triangularis. The extent to which music recruited speech-activated frontal resources was modulated by task. While there are certainly limitations to meta-analysis techniques particularly regarding sensitivity, this work suggests that the extent of shared resources between speech and music may be task-dependent and highlights the need to consider how task effects may be affecting conclusions regarding the neurobiology of speech and music. PMID:26321976
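
    The core of activation likelihood estimation can be sketched in a few lines: each reported peak is blurred into a 3-D Gaussian, per-study maps are formed as probabilistic unions of those kernels, and the ALE map is the union across studies. The grid, kernel width, and peak coordinates below are toy values, and the permutation-based significance testing used in practice is omitted.

        import numpy as np

        shape = (20, 24, 20)                     # coarse toy voxel grid
        sigma = 2.0                              # kernel width (voxels)
        grid = np.indices(shape).reshape(3, -1).T

        def modeled_activation(peaks):
            """Probabilistic union of Gaussian kernels at one study's peaks."""
            ma = np.zeros(grid.shape[0])
            for p in peaks:
                g = np.exp(-np.sum((grid - p) ** 2, axis=1) / (2 * sigma ** 2))
                ma = 1 - (1 - ma) * (1 - g)
            return ma

        studies = [np.array([[5, 12, 8], [14, 6, 10]]),   # toy peak coordinates
                   np.array([[6, 11, 9]])]

        # ALE map: union of the per-study modeled activation maps
        ale = 1 - np.prod([1 - modeled_activation(p) for p in studies], axis=0)
        print("max ALE score:", ale.max().round(3))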

  7. The relationship between the neural computations for speech and music perception is context-dependent: an activation likelihood estimate study

    Directory of Open Access Journals (Sweden)

    Arianna LaCroix

    2015-08-01

    Full Text Available The relationship between the neurobiology of speech and music has been investigated for more than a century. There remains no widespread agreement regarding how (or to what extent) music perception utilizes the neural circuitry that is engaged in speech processing, particularly at the cortical level. Prominent models such as Patel’s Shared Syntactic Integration Resource Hypothesis (SSIRH) and Koelsch’s neurocognitive model of music perception suggest a high degree of overlap, particularly in the frontal lobe, but also perhaps more distinct representations in the temporal lobe with hemispheric asymmetries. The present meta-analysis study used activation likelihood estimate analyses to identify the brain regions consistently activated for music as compared to speech across the functional neuroimaging (fMRI and PET) literature. Eighty music and 91 speech neuroimaging studies of healthy adult control subjects were analyzed. Peak activations reported in the music and speech studies were divided into four paradigm categories: passive listening, discrimination tasks, error/anomaly detection tasks and memory-related tasks. We then compared activation likelihood estimates within each category for music versus speech, and each music condition with passive listening. We found that listening to music and to speech preferentially activate distinct temporo-parietal bilateral cortical networks. We also found music and speech to have shared resources in the left pars opercularis but speech-specific resources in the left pars triangularis. The extent to which music recruited speech-activated frontal resources was modulated by task. While there are certainly limitations to meta-analysis techniques particularly regarding sensitivity, this work suggests that the extent of shared resources between speech and music may be task-dependent and highlights the need to consider how task effects may be affecting conclusions regarding the neurobiology of speech and music.

  8. Electrophysiological evidence for speech-specific audiovisual integration

    NARCIS (Netherlands)

    Baart, M.; Stekelenburg, J.J.; Vroomen, J.

    2014-01-01

    Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were

  9. The Functional Connectome of Speech Control.

    Directory of Open Access Journals (Sweden)

    Stefan Fuertinger

    2015-07-01

    Full Text Available In the past few years, several studies have been directed to understanding the complexity of functional interactions between different brain regions during various human behaviors. Among these, neuroimaging research established the notion that speech and language require an orchestration of brain regions for comprehension, planning, and integration of a heard sound with a spoken word. However, these studies have been largely limited to mapping the neural correlates of separate speech elements and examining distinct cortical or subcortical circuits involved in different aspects of speech control. As a result, the complexity of the brain network machinery controlling speech and language remained largely unknown. Using graph theoretical analysis of functional MRI (fMRI) data in healthy subjects, we quantified the large-scale speech network topology by constructing functional brain networks of increasing hierarchy from the resting state to motor output of meaningless syllables to complex production of real-life speech, as well as compared to non-speech-related sequential finger tapping and pure tone discrimination networks. We identified a segregated network of highly connected local neural communities (hubs) in the primary sensorimotor and parietal regions, which formed a commonly shared core hub network across the examined conditions, with the left area 4p playing an important role in speech network organization. These sensorimotor core hubs exhibited features of flexible hubs based on their participation in several functional domains across different networks and ability to adaptively switch long-range functional connectivity depending on task content, resulting in a distinct community structure of each examined network. Specifically, compared to other tasks, speech production was characterized by the formation of six distinct neural communities with specialized recruitment of the prefrontal cortex, insula, putamen, and thalamus, which collectively

  10. Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

    Science.gov (United States)

    Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-01-01

    A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production

  11. Speech activity detection for the automated speaker recognition system of critical use

    Directory of Open Access Journals (Sweden)

    M. M. Bykov

    2017-06-01

    Full Text Available In this article, the authors develop a speech activity detection method for an automated speaker recognition system of critical use, based on wavelet parameterization of the speech signal and classification of “speech”/“pause” intervals with a curvilinear neural network. The proposed wavelet parameterization method allows choosing the optimal parameters of the wavelet transform in accordance with a user-specified error in representing the speech signal. The method also allows estimating the loss of information as a function of the chosen parameters of the continuous wavelet transform (CWT), which made it possible to reduce the number of scale coefficients of the CWT of the speech signal by an order of magnitude while keeping the distortion of the local CWT spectrum within the allowable degree. An algorithm for detecting speech activity with a curvilinear neural network classifier is also proposed; it yields high-quality segmentation of speech signals into “speech”/“pause” intervals and, owing to the inherent properties of the curvilinear neural network, is robust to narrowband and technogenic noise in the speech signal.
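
    A hedged sketch of the wavelet-based speech activity detection idea: parameterize short frames with a continuous wavelet transform and classify them by a simple energy rule. This stands in for, and does not reproduce, the authors' parameterization and curvilinear-neural-network classifier; the signal, scales, and threshold are toy choices.

        import numpy as np
        import pywt

        fs = 8000
        frame = 256
        t = np.arange(4 * frame) / fs
        signal = np.concatenate([0.01 * np.random.randn(4 * frame),   # "pause"
                                 np.sin(2 * np.pi * 300 * t)])        # "speech"

        scales = np.arange(1, 33)
        energies = []
        for start in range(0, signal.size - frame + 1, frame):
            coeffs, _ = pywt.cwt(signal[start:start + frame], scales, "morl")
            energies.append(np.mean(coeffs ** 2))     # wavelet-domain frame energy

        thr = 0.5 * (min(energies) + max(energies))   # toy decision threshold
        labels = ["speech" if e > thr else "pause" for e in energies]
        print(labels)                                  # pauses first, then speech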

  12. FOXP2 and the neuroanatomy of speech and language.

    Science.gov (United States)

    Vargha-Khadem, Faraneh; Gadian, David G; Copp, Andrew; Mishkin, Mortimer

    2005-02-01

    That speech and language are innate capacities of the human brain has long been widely accepted, but only recently has an entry point into the genetic basis of these remarkable faculties been found. The discovery of a mutation in FOXP2 in a family with a speech and language disorder has enabled neuroscientists to trace the neural expression of this gene during embryological development, track the effects of this gene mutation on brain structure and function, and so begin to decipher that part of our neural inheritance that culminates in articulate speech.

  13. Biological impact of preschool music classes on processing speech in noise.

    Science.gov (United States)

    Strait, Dana L; Parbery-Clark, Alexandra; O'Connell, Samantha; Kraus, Nina

    2013-10-01

    Musicians have increased resilience to the effects of noise on speech perception and its neural underpinnings. We do not know, however, how early in life these enhancements arise. We compared auditory brainstem responses to speech in noise in 32 preschool children, half of whom were engaged in music training. Thirteen children returned for testing one year later, permitting the first longitudinal assessment of subcortical auditory function with music training. Results indicate emerging neural enhancements in musically trained preschoolers for processing speech in noise. Longitudinal outcomes reveal that children enrolled in music classes experience further increased neural resilience to background noise following one year of continued training compared to nonmusician peers. Together, these data reveal enhanced development of neural mechanisms undergirding speech-in-noise perception in preschoolers undergoing music training and may indicate a biological impact of music training on auditory function during early childhood. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Music enrichment programs improve the neural encoding of speech in at-risk children.

    Science.gov (United States)

    Kraus, Nina; Slater, Jessica; Thompson, Elaine C; Hornickel, Jane; Strait, Dana L; Nicol, Trent; White-Schwoch, Travis

    2014-09-03

    Musicians are often reported to have enhanced neurophysiological functions, especially in the auditory system. Musical training is thought to improve nervous system function by focusing attention on meaningful acoustic cues, and these improvements in auditory processing cascade to language and cognitive skills. Correlational studies have reported musician enhancements in a variety of populations across the life span. In light of these reports, educators are considering the potential for co-curricular music programs to provide auditory-cognitive enrichment to children during critical developmental years. To date, however, no studies have evaluated biological changes following participation in existing, successful music education programs. We used a randomized control design to investigate whether community music participation induces a tangible change in auditory processing. The community music training was a longstanding and successful program that provides free music instruction to children from underserved backgrounds who stand at high risk for learning and social problems. Children who completed 2 years of music training had a stronger neurophysiological distinction of stop consonants, a neural mechanism linked to reading and language skills. One year of training was insufficient to elicit changes in nervous system function; beyond 1 year, however, greater amounts of instrumental music training were associated with larger gains in neural processing. We therefore provide the first direct evidence that community music programs enhance the neural processing of speech in at-risk children, suggesting that active and repeated engagement with sound changes neural function. Copyright © 2014 the authors.

  15. A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography.

    Science.gov (United States)

    Ozker, Muge; Schepers, Inga M; Magnotti, John F; Yoshor, Daniel; Beauchamp, Michael S

    2017-06-01

    Human speech can be comprehended using only auditory information from the talker's voice. However, comprehension is improved if the talker's face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl's gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech.
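
    The Bayesian prediction mentioned above is that an optimal combination of independent auditory and visual estimates has lower variance than either estimate alone; a toy calculation, with made-up single-modality variances:

        # reliability-weighted (maximum-likelihood) cue combination
        var_a, var_v = 4.0, 9.0                      # toy unimodal variances
        var_av = 1.0 / (1.0 / var_a + 1.0 / var_v)   # combined variance
        print(var_av)                                # ~2.77 < min(var_a, var_v)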

  16. The Neural Bases of Difficult Speech Comprehension and Speech Production: Two Activation Likelihood Estimation (ALE) Meta-Analyses

    Science.gov (United States)

    Adank, Patti

    2012-01-01

    The role of speech production mechanisms in difficult speech comprehension is the subject of on-going debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension…

  17. Effects of Time-Compressed Speech Training on Multiple Functional and Structural Neural Mechanisms Involving the Left Superior Temporal Gyrus.

    Science.gov (United States)

    Maruyama, Tsukasa; Takeuchi, Hikaru; Taki, Yasuyuki; Motoki, Kosuke; Jeong, Hyeonjeong; Kotozaki, Yuka; Nakagawa, Seishu; Nouchi, Rui; Iizuka, Kunio; Yokoyama, Ryoichi; Yamamoto, Yuki; Hanawa, Sugiko; Araki, Tsuyoshi; Sakaki, Kohei; Sasaki, Yukako; Magistro, Daniele; Kawashima, Ryuta

    2018-01-01

    Time-compressed speech is an artificial form of rapidly presented speech. Training with time-compressed speech in a second language (TCSSL) leads to adaptation toward TCSSL. Here, we newly investigated the effects of 4 weeks of training with TCSSL on diverse cognitive functions and neural systems using the fractional amplitude of spontaneous low-frequency fluctuations (fALFF), resting-state functional connectivity (RSFC) with the left superior temporal gyrus (STG), fractional anisotropy (FA), and regional gray matter volume (rGMV) of young adults by magnetic resonance imaging. There were no significant differences in change of performance of measures of cognitive functions or second language skills after training with TCSSL compared with that of the active control group. However, compared with the active control group, training with TCSSL was associated with increased fALFF, RSFC, and FA and decreased rGMV involving areas in the left STG. These results lacked evidence of a far transfer effect of time-compressed speech training on a wide range of cognitive functions and second language skills in young adults. However, these results demonstrated effects of time-compressed speech training on gray and white matter structures as well as on resting-state intrinsic activity and connectivity involving the left STG, which plays a key role in listening comprehension.

  18. Neural networks applied to discriminate botanical origin of honeys.

    Science.gov (United States)

    Anjos, Ofélia; Iglesias, Carla; Peres, Fátima; Martínez, Javier; García, Ángela; Taboada, Javier

    2015-05-15

    The aim of this work is to develop a tool based on neural networks to predict the botanical origin of honeys using physical and chemical parameters. The managed database consists of 49 honey samples of 2 different classes: monofloral (almond, holm oak, sweet chestnut, eucalyptus, orange, rosemary, lavender, strawberry trees, thyme, heather, sunflower) and multifloral. The moisture content, electrical conductivity, water activity, ash content, pH, free acidity, colorimetric coordinates in CIELAB space (L*, a*, b*) and total phenols content of the honey samples were evaluated. Those properties were considered as input variables of the predictive model. The neural network is optimised through several tests with different numbers of neurons in the hidden layer and also with different input variables. The reduced error rates (5%) allow us to conclude that the botanical origin of honey can be reliably and quickly known from the colorimetric information and the electrical conductivity of honey. Copyright © 2014 Elsevier Ltd. All rights reserved.
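
    A minimal sketch of this kind of classifier using scikit-learn: a small feed-forward neural network mapping physicochemical measurements to botanical origin. The toy data below are random placeholders, not the paper's 49 samples, and the network size is an arbitrary assumption.

        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        # columns: moisture, conductivity, water activity, ash, pH, free acidity,
        #          L*, a*, b*, total phenols
        X = np.random.rand(49, 10)                     # toy measurements
        y = np.random.choice(["rosemary", "lavender", "multifloral"], size=49)

        clf = make_pipeline(StandardScaler(),
                            MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                          random_state=0))
        clf.fit(X, y)
        print(clf.predict(X[:3]))                      # predicted botanical origins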

  19. Emerging technologies with potential for objectively evaluating speech recognition skills.

    Science.gov (United States)

    Rawool, Vishakha Waman

    2016-01-01

    Work-related exposure to noise and other ototoxins can cause damage to the cochlea, synapses between the inner hair cells, the auditory nerve fibers, and higher auditory pathways, leading to difficulties in recognizing speech. Procedures designed to determine speech recognition scores (SRS) in an objective manner can be helpful in disability compensation cases where the worker claims to have poor speech perception due to exposure to noise or ototoxins. Such measures can also be helpful in determining SRS in individuals who cannot provide reliable responses to speech stimuli, including patients with Alzheimer's disease, traumatic brain injuries, and infants with and without hearing loss. Cost-effective neural monitoring hardware and software are being rapidly refined due to the high demand for neurogaming (games involving the use of brain-computer interfaces), health, and other applications. More specifically, two related advances in neuro-technology are the relative ease of recording neural activity and the availability of sophisticated analysis techniques. These techniques are reviewed in the current article and their applications for developing objective SRS procedures are proposed. Issues related to neuroaudioethics (ethics related to collection of neural data evoked by auditory stimuli including speech) and neurosecurity (preservation of a person's neural mechanisms and free will) are also discussed.

  20. Emergence of category-level sensitivities in non-native speech sound learning

    Directory of Open Access Journals (Sweden)

    Emily Myers

    2014-08-01

    Full Text Available Over the course of development, speech sounds that are contrastive in one’s native language tend to become perceived categorically: that is, listeners are unaware of variation within phonetic categories while showing excellent sensitivity to speech sounds that span linguistically meaningful phonetic category boundaries. The end stage of this developmental process is that the perceptual systems that handle acoustic-phonetic information show special tuning to native language contrasts, and as such, category-level information appears to be present at even fairly low levels of the neural processing stream. Research on adults acquiring non-native speech categories offers an avenue for investigating the interplay of category-level information and perceptual sensitivities to these sounds as speech categories emerge. In particular, one can observe the neural changes that unfold as listeners learn not only to perceive acoustic distinctions that mark non-native speech sound contrasts, but also to map these distinctions onto category-level representations. An emergent literature on the neural basis of novel and non-native speech sound learning offers new insight into this question. In this review, I will examine this literature in order to answer two key questions. First, where in the neural pathway does sensitivity to category-level phonetic information first emerge over the trajectory of speech sound learning? Second, how do frontal and temporal brain areas work in concert over the course of non-native speech sound learning? Finally, in the context of this literature I will describe a model of speech sound learning in which rapidly-adapting access to categorical information in the frontal lobes modulates the sensitivity of stable, slowly-adapting responses in the temporal lobes.

  1. Developmental apraxia of speech in children. Quantitative assessment of speech characteristics

    NARCIS (Netherlands)

    Thoonen, G.H.J.

    1998-01-01

    Developmental apraxia of speech (DAS) in children is a speech disorder, supposed to have a neurological origin, which is commonly considered to result from particular deficits in speech processing (i.e., phonological planning, motor programming). However, the label DAS has often been used as

  2. Neural Segregation of Concurrent Speech: Effects of Background Noise and Reverberation on Auditory Scene Analysis in the Ventral Cochlear Nucleus.

    Science.gov (United States)

    Sayles, Mark; Stasiak, Arkadiusz; Winter, Ian M

    2016-01-01

    Concurrent complex sounds (e.g., two voices speaking at once) are perceptually disentangled into separate "auditory objects". This neural processing often occurs in the presence of acoustic-signal distortions from noise and reverberation (e.g., in a busy restaurant). A difference in periodicity between sounds is a strong segregation cue under quiet, anechoic conditions. However, noise and reverberation exert differential effects on speech intelligibility under "cocktail-party" listening conditions. Previous neurophysiological studies have concentrated on understanding auditory scene analysis under ideal listening conditions. Here, we examine the effects of noise and reverberation on periodicity-based neural segregation of concurrent vowels /a/ and /i/, in the responses of single units in the guinea-pig ventral cochlear nucleus (VCN): the first processing station of the auditory brain stem. In line with human psychoacoustic data, we find reverberation significantly impairs segregation when vowels have an intonated pitch contour, but not when they are spoken on a monotone. In contrast, noise impairs segregation independent of intonation pattern. These results are informative for models of speech processing under ecologically valid listening conditions, where noise and reverberation abound.
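
    The periodicity cue at the heart of this work can be illustrated with a toy "double vowel": the autocorrelation of a mixture of two harmonic complexes with different fundamental frequencies shows peaks at both periods. The F0 values and harmonic counts below are illustrative assumptions, not the study's stimuli.

        import numpy as np

        fs = 16000
        t = np.arange(0, 0.1, 1 / fs)
        f0_a, f0_i = 100.0, 125.0                 # distinct F0s for the two "vowels"

        def harmonics(f0):
            return sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 11))

        mix = harmonics(f0_a) + harmonics(f0_i)   # concurrent-vowel mixture

        ac = np.correlate(mix, mix, mode="full")[mix.size - 1:]
        lags = np.arange(ac.size) / fs
        for f0 in (f0_a, f0_i):
            k = int(np.argmin(np.abs(lags - 1 / f0)))  # lag nearest each period
            print(f"normalized autocorrelation at the {f0:.0f}-Hz period: "
                  f"{ac[k] / ac[0]:.2f}")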

  3. The ctenophore genome and the evolutionary origins of neural systems

    NARCIS (Netherlands)

    Moroz, Leonid L.; Kocot, Kevin M.; Citarella, Mathew R.; Dosung, Sohn; Norekian, Tigran P.; Povolotskaya, Inna S.; Grigorenko, Anastasia P.; Dailey, Christopher; Berezikov, Eugene; Buckley, Katherine M.; Ptitsyn, Andrey; Reshetov, Denis; Mukherjee, Krishanu; Moroz, Tatiana P.; Bobkova, Yelena; Yu, Fahong; Kapitonov, Vladimir V.; Jurka, Jerzy; Bobkov, Yuri V.; Swore, Joshua J.; Girardo, David O.; Fodor, Alexander; Gusev, Fedor; Sanford, Rachel; Bruders, Rebecca; Kittler, Ellen; Mills, Claudia E.; Rast, Jonathan P.; Derelle, Romain; Solovyev, Victor V.; Kondrashov, Fyodor A.; Swalla, Billie J.; Sweedler, Jonathan V.; Rogaev, Evgeny I.; Halanych, Kenneth M.; Kohn, Andrea B.

    2014-01-01

    The origins of neural systems remain unresolved. In contrast to other basal metazoans, ctenophores (comb jellies) have both complex nervous and mesoderm-derived muscular systems. These holoplanktonic predators also have sophisticated ciliated locomotion, behaviour and distinct development. Here we

  4. Sleep Disrupts High-Level Speech Parsing Despite Significant Basic Auditory Processing.

    Science.gov (United States)

    Makov, Shiri; Sharon, Omer; Ding, Nai; Ben-Shachar, Michal; Nir, Yuval; Zion Golumbic, Elana

    2017-08-09

    The extent to which the sleeping brain processes sensory information remains unclear. This is particularly true for continuous and complex stimuli such as speech, in which information is organized into hierarchically embedded structures. Recently, novel metrics for assessing the neural representation of continuous speech have been developed using noninvasive brain recordings that have thus far only been tested during wakefulness. Here we investigated, for the first time, the sleeping brain's capacity to process continuous speech at different hierarchical levels using a newly developed Concurrent Hierarchical Tracking (CHT) approach that allows monitoring the neural representation and processing-depth of continuous speech online. Speech sequences were compiled with syllables, words, phrases, and sentences occurring at fixed time intervals such that different linguistic levels correspond to distinct frequencies. This enabled us to distinguish their neural signatures in brain activity. We compared the neural tracking of intelligible versus unintelligible (scrambled and foreign) speech across states of wakefulness and sleep using high-density EEG in humans. We found that neural tracking of stimulus acoustics was comparable across wakefulness and sleep and similar across all conditions regardless of speech intelligibility. In contrast, neural tracking of higher-order linguistic constructs (words, phrases, and sentences) was only observed for intelligible speech during wakefulness and could not be detected at all during nonrapid eye movement or rapid eye movement sleep. These results suggest that, whereas low-level auditory processing is relatively preserved during sleep, higher-level hierarchical linguistic parsing is severely disrupted, thereby revealing the capacity and limits of language processing during sleep. SIGNIFICANCE STATEMENT Despite the persistence of some sensory processing during sleep, it is unclear whether high-level cognitive processes such as speech
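
    The frequency-tagging logic behind Concurrent Hierarchical Tracking can be sketched as follows: if syllables, phrases, and sentences recur at fixed rates, neural tracking appears as spectral peaks at exactly those rates. The simulated "EEG" and the 4/2/1 Hz rates below are illustrative stand-ins, not the study's recordings.

        import numpy as np

        fs = 100.0
        t = np.arange(0, 60, 1 / fs)                 # one minute of toy EEG
        rates = {"syllable": 4.0, "phrase": 2.0, "sentence": 1.0}   # Hz
        eeg = sum(np.sin(2 * np.pi * r * t) for r in rates.values())
        eeg = eeg + np.random.randn(t.size)

        spec = np.abs(np.fft.rfft(eeg)) / t.size
        freqs = np.fft.rfftfreq(t.size, 1 / fs)
        for name, r in rates.items():
            k = int(np.argmin(np.abs(freqs - r)))
            print(f"{name} rate ({r} Hz): spectral peak {spec[k]:.3f}")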

  5. Subcortical processing of speech regularities underlies reading and music aptitude in children

    Science.gov (United States)

    2011-01-01

    Background Neural sensitivity to acoustic regularities supports fundamental human behaviors such as hearing in noise and reading. Although the failure to encode acoustic regularities in ongoing speech has been associated with language and literacy deficits, how auditory expertise, such as the expertise that is associated with musical skill, relates to the brainstem processing of speech regularities is unknown. An association between musical skill and neural sensitivity to acoustic regularities would not be surprising given the importance of repetition and regularity in music. Here, we aimed to define relationships between the subcortical processing of speech regularities, music aptitude, and reading abilities in children with and without reading impairment. We hypothesized that, in combination with auditory cognitive abilities, neural sensitivity to regularities in ongoing speech provides a common biological mechanism underlying the development of music and reading abilities. Methods We assessed auditory working memory and attention, music aptitude, reading ability, and neural sensitivity to acoustic regularities in 42 school-aged children with a wide range of reading ability. Neural sensitivity to acoustic regularities was assessed by recording brainstem responses to the same speech sound presented in predictable and variable speech streams. Results Through correlation analyses and structural equation modeling, we reveal that music aptitude and literacy both relate to the extent of subcortical adaptation to regularities in ongoing speech as well as to auditory working memory and attention. Relationships between music and speech processing are specifically driven by performance on a musical rhythm task, underscoring the importance of rhythmic regularity for both language and music. Conclusions These data indicate common brain mechanisms underlying reading and music abilities that relate to how the nervous system responds to regularities in auditory input

  6. Subcortical processing of speech regularities underlies reading and music aptitude in children.

    Science.gov (United States)

    Strait, Dana L; Hornickel, Jane; Kraus, Nina

    2011-10-17

    Neural sensitivity to acoustic regularities supports fundamental human behaviors such as hearing in noise and reading. Although the failure to encode acoustic regularities in ongoing speech has been associated with language and literacy deficits, how auditory expertise, such as the expertise that is associated with musical skill, relates to the brainstem processing of speech regularities is unknown. An association between musical skill and neural sensitivity to acoustic regularities would not be surprising given the importance of repetition and regularity in music. Here, we aimed to define relationships between the subcortical processing of speech regularities, music aptitude, and reading abilities in children with and without reading impairment. We hypothesized that, in combination with auditory cognitive abilities, neural sensitivity to regularities in ongoing speech provides a common biological mechanism underlying the development of music and reading abilities. We assessed auditory working memory and attention, music aptitude, reading ability, and neural sensitivity to acoustic regularities in 42 school-aged children with a wide range of reading ability. Neural sensitivity to acoustic regularities was assessed by recording brainstem responses to the same speech sound presented in predictable and variable speech streams. Through correlation analyses and structural equation modeling, we reveal that music aptitude and literacy both relate to the extent of subcortical adaptation to regularities in ongoing speech as well as to auditory working memory and attention. Relationships between music and speech processing are specifically driven by performance on a musical rhythm task, underscoring the importance of rhythmic regularity for both language and music. These data indicate common brain mechanisms underlying reading and music abilities that relate to how the nervous system responds to regularities in auditory input. Definition of common biological underpinnings

  7. Subcortical processing of speech regularities underlies reading and music aptitude in children

    Directory of Open Access Journals (Sweden)

    Strait Dana L

    2011-10-01

    Full Text Available Abstract Background Neural sensitivity to acoustic regularities supports fundamental human behaviors such as hearing in noise and reading. Although the failure to encode acoustic regularities in ongoing speech has been associated with language and literacy deficits, how auditory expertise, such as the expertise that is associated with musical skill, relates to the brainstem processing of speech regularities is unknown. An association between musical skill and neural sensitivity to acoustic regularities would not be surprising given the importance of repetition and regularity in music. Here, we aimed to define relationships between the subcortical processing of speech regularities, music aptitude, and reading abilities in children with and without reading impairment. We hypothesized that, in combination with auditory cognitive abilities, neural sensitivity to regularities in ongoing speech provides a common biological mechanism underlying the development of music and reading abilities. Methods We assessed auditory working memory and attention, music aptitude, reading ability, and neural sensitivity to acoustic regularities in 42 school-aged children with a wide range of reading ability. Neural sensitivity to acoustic regularities was assessed by recording brainstem responses to the same speech sound presented in predictable and variable speech streams. Results Through correlation analyses and structural equation modeling, we reveal that music aptitude and literacy both relate to the extent of subcortical adaptation to regularities in ongoing speech as well as to auditory working memory and attention. Relationships between music and speech processing are specifically driven by performance on a musical rhythm task, underscoring the importance of rhythmic regularity for both language and music. Conclusions These data indicate common brain mechanisms underlying reading and music abilities that relate to how the nervous system responds to

  8. A wireless brain-machine interface for real-time speech synthesis.

    Directory of Open Access Journals (Sweden)

    Frank H Guenther

    2009-12-01

    Full Text Available Brain-machine interfaces (BMIs) involving electrodes implanted into the human cerebral cortex have recently been developed in an attempt to restore function to profoundly paralyzed individuals. Current BMIs for restoring communication can provide important capabilities via a typing process, but unfortunately they are only capable of slow communication rates. In the current study we use a novel approach to speech restoration in which we decode continuous auditory parameters for a real-time speech synthesizer from neuronal activity in motor cortex during attempted speech. Neural signals recorded by a Neurotrophic Electrode implanted in a speech-related region of the left precentral gyrus of a human volunteer suffering from locked-in syndrome, characterized by near-total paralysis with spared cognition, were transmitted wirelessly across the scalp and used to drive a speech synthesizer. A Kalman filter-based decoder translated the neural signals generated during attempted speech into continuous parameters for controlling a synthesizer that provided immediate (within 50 ms) auditory feedback of the decoded sound. Accuracy of the volunteer's vowel productions with the synthesizer improved quickly with practice, with a 25% improvement in average hit rate (from 45% to 70%) and a 46% decrease in average endpoint error from the first to the last block of a three-vowel task. Our results support the feasibility of neural prostheses that may have the potential to provide near-conversational synthetic speech output for individuals with severely impaired speech motor control. They also provide an initial glimpse into the functional properties of neurons in speech motor cortical areas.
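
    A toy version of the decoding step: a Kalman filter tracking continuous synthesizer parameters (say, two formant frequencies) from a small population of firing rates. All matrices below are invented for illustration, not the study's fitted model.

        import numpy as np

        A = np.eye(2) * 0.99           # state transition: formants drift slowly
        H = np.random.randn(8, 2)      # observation model: 8 units encode 2 formants
        Q = np.eye(2) * 1e-3           # process noise
        R = np.eye(8) * 1e-1           # observation (firing-rate) noise

        x, P = np.zeros(2), np.eye(2)  # initial state estimate and covariance
        for _ in range(50):            # one update per neural sample, say every 20 ms
            z = H @ np.array([0.5, -0.3]) + 0.3 * np.random.randn(8)  # observed rates
            # predict
            x, P = A @ x, A @ P @ A.T + Q
            # update with the Kalman gain
            K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
            x = x + K @ (z - H @ x)
            P = (np.eye(2) - K @ H) @ P
        print("decoded formant state:", x)   # would drive the synthesizer each frame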

  9. Learning to Produce Syllabic Speech Sounds via Reward-Modulated Neural Plasticity

    Science.gov (United States)

    Warlaumont, Anne S.; Finnegan, Megan K.

    2016-01-01

    At around 7 months of age, human infants begin to reliably produce well-formed syllables containing both consonants and vowels, a behavior called canonical babbling. Over subsequent months, the frequency of canonical babbling continues to increase. How the infant’s nervous system supports the acquisition of this ability is unknown. Here we present a computational model that combines a spiking neural network, reinforcement-modulated spike-timing-dependent plasticity, and a human-like vocal tract to simulate the acquisition of canonical babbling. As in human infants, the model’s frequency of canonical babbling gradually increases. The model is rewarded when it produces a sound that is more auditorily salient than sounds it has previously produced. This is consistent with data from human infants indicating that contingent adult responses shape infant behavior and with data from deaf and tracheostomized infants indicating that hearing, including hearing one’s own vocalizations, is critical for canonical babbling development. Reward receipt increases the level of dopamine in the neural network. The neural network contains a reservoir with recurrent connections and two motor neuron groups, one agonist and one antagonist, which control the masseter and orbicularis oris muscles, promoting or inhibiting mouth closure. The model learns to increase the number of salient, syllabic sounds it produces by adjusting the base level of muscle activation and increasing their range of activity. Our results support the possibility that through dopamine-modulated spike-timing-dependent plasticity, the motor cortex learns to harness its natural oscillations in activity in order to produce syllabic sounds. It thus suggests that learning to produce rhythmic mouth movements for speech production may be supported by general cortical learning mechanisms. The model makes several testable predictions and has implications for our understanding not only of how syllabic vocalizations develop
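
    The reward-modulated plasticity rule can be caricatured in a few lines: pre/post spike coincidences tag a synapse with a decaying eligibility trace, and a later dopamine (reward) signal converts the trace into a weight change. The constants are arbitrary, and true STDP timing windows are collapsed here into same-step coincidence for brevity.

        import numpy as np

        rng = np.random.default_rng(0)
        w = 0.5                          # synaptic weight
        elig = 0.0                       # eligibility trace
        tau_e, a_plus = 50.0, 0.01       # trace time constant (ms), STDP amplitude

        for t in range(1000):            # 1-ms steps
            pre = rng.random() < 0.02    # Poisson-like pre/post spiking
            post = rng.random() < 0.02
            if pre and post:
                elig += a_plus           # coincident activity tags the synapse
            elig *= np.exp(-1.0 / tau_e) # trace decays between reward events
            # occasional reward, e.g., for a salient vocalization
            reward = 1.0 if (t % 200 == 199 and rng.random() < 0.5) else 0.0
            w = np.clip(w + reward * elig, 0.0, 1.0)   # dopamine gates plasticity
        print("final weight:", w)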

  10. Hemispheric asymmetries in speech perception: sense, nonsense and modulations.

    Directory of Open Access Journals (Sweden)

    Stuart Rosen

    Full Text Available The well-established left hemisphere specialisation for language processing has long been claimed to be based on a low-level auditory specialization for specific acoustic features in speech, particularly regarding 'rapid temporal processing'. A novel analysis/synthesis technique was used to construct a variety of sounds based on simple sentences which could be manipulated in spectro-temporal complexity, and whether they were intelligible or not. All sounds consisted of two noise-excited spectral prominences (based on the lower two formants in the original speech) which could be static or varying in frequency and/or amplitude independently. Dynamically varying both acoustic features based on the same sentence led to intelligible speech, but when either or both acoustic features were static, the stimuli were not intelligible. Using the frequency dynamics from one sentence with the amplitude dynamics of another led to unintelligible sounds of comparable spectro-temporal complexity to the intelligible ones. Positron emission tomography (PET) was used to compare which brain regions were active when participants listened to the different sounds. Neural activity to spectral and amplitude modulations sufficient to support speech intelligibility (without actually being intelligible) was seen bilaterally, with a right temporal lobe dominance. A left dominant response was seen only to intelligible sounds. It thus appears that the left hemisphere specialisation for speech is based on the linguistic properties of utterances, not on particular acoustic features.

  11. Speech perception in autism spectrum disorder: An activation likelihood estimation meta-analysis.

    Science.gov (United States)

    Tryfon, Ana; Foster, Nicholas E V; Sharda, Megha; Hyde, Krista L

    2018-02-15

    Autism spectrum disorder (ASD) is often characterized by atypical language profiles and auditory and speech processing. These can contribute to aberrant language and social communication skills in ASD. The study of the neural basis of speech perception in ASD can serve as a potential neurobiological marker of ASD early on, but mixed results across studies renders it difficult to find a reliable neural characterization of speech processing in ASD. To this aim, the present study examined the functional neural basis of speech perception in ASD versus typical development (TD) using an activation likelihood estimation (ALE) meta-analysis of 18 qualifying studies. The present study included separate analyses for TD and ASD, which allowed us to examine patterns of within-group brain activation as well as both common and distinct patterns of brain activation across the ASD and TD groups. Overall, ASD and TD showed mostly common brain activation of speech processing in bilateral superior temporal gyrus (STG) and left inferior frontal gyrus (IFG). However, the results revealed trends for some distinct activation in the TD group showing additional activation in higher-order brain areas including left superior frontal gyrus (SFG), left medial frontal gyrus (MFG), and right IFG. These results provide a more reliable neural characterization of speech processing in ASD relative to previous single neuroimaging studies and motivate future work to investigate how these brain signatures relate to behavioral measures of speech processing in ASD. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Mapping the brain's orchestration during speech comprehension: task-specific facilitation of regional synchrony in neural networks

    Directory of Open Access Journals (Sweden)

    Keil Andreas

    2004-10-01

    Full Text Available Abstract Background How does the brain convert sounds and phonemes into comprehensible speech? In the present magnetoencephalographic study we examined the hypothesis that the coherence of electromagnetic oscillatory activity within and across brain areas indicates neurophysiological processes linked to speech comprehension. Results Amplitude-modulated (sinusoidal, 41.5 Hz) auditory verbal and nonverbal stimuli served to drive steady-state oscillations in neural networks involved in speech comprehension. Stimuli were presented to 12 subjects in the following conditions: (a) an incomprehensible string of words, (b) the same string of words after being introduced as a comprehensible sentence by proper articulation, and (c) nonverbal stimulations that included a 600-Hz tone, a scale, and a melody. Coherence, defined as correlated activation of magnetic steady-state fields across brain areas and measured as simultaneous activation of current dipoles in source space (Minimum-Norm-Estimates), increased within left-temporal-posterior areas when the sound string was perceived as a comprehensible sentence. Intra-hemispheric coherence was larger within the left than the right hemisphere for the sentence (condition (b)) relative to all other conditions, and tended to be larger within the right than the left hemisphere for nonverbal stimuli (condition (c): tone and melody) relative to the other conditions, leading to a more pronounced hemispheric asymmetry for nonverbal than verbal material. Conclusions We conclude that coherent neuronal network activity may index encoding of verbal information on the sentence level and can be used as a tool to investigate auditory speech comprehension.

  13. What are artificial neural networks?

    DEFF Research Database (Denmark)

    Krogh, Anders

    2008-01-01

    Artificial neural networks have been applied to problems ranging from speech recognition to prediction of protein secondary structure, classification of cancers and gene prediction. How do they work and what might they be good for? Publication date: Feb 2008.

  14. Low-Frequency Cortical Entrainment to Speech Reflects Phoneme-Level Processing.

    Science.gov (United States)

    Di Liberto, Giovanni M; O'Sullivan, James A; Lalor, Edmund C

    2015-10-05

    The human ability to understand speech is underpinned by a hierarchical auditory system whose successive stages process increasingly complex attributes of the acoustic input. It has been suggested that to produce categorical speech perception, this system must elicit consistent neural responses to speech tokens (e.g., phonemes) despite variations in their acoustics. Here, using electroencephalography (EEG), we provide evidence for this categorical phoneme-level speech processing by showing that the relationship between continuous speech and neural activity is best described when that speech is represented using both low-level spectrotemporal information and categorical labeling of phonetic features. Furthermore, the mapping between phonemes and EEG becomes more discriminative for phonetic features at longer latencies, in line with what one might expect from a hierarchical system. Importantly, these effects are not seen for time-reversed speech. These findings may form the basis for future research on natural language processing in specific cohorts of interest and for broader insights into how brains transform acoustic input into meaning. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Neural correlates of quality perception for complex speech signals

    CERN Document Server

    Antons, Jan-Niklas

    2015-01-01

    This book interconnects two essential disciplines to study the perception of speech: Neuroscience and Quality of Experience, which to date have rarely been used together for the purposes of research on speech quality perception. In five key experiments, the book demonstrates the application of standard clinical methods in neurophysiology on the one hand, and of methods used in fields of research concerned with speech quality perception on the other. Using this combination, the book shows that speech stimuli with different lengths and different quality impairments are accompanied by physiological reactions related to quality variations, e.g., a positive peak in an event-related potential. Furthermore, it demonstrates that – in most cases – quality impairment intensity has an impact on the intensity of physiological reactions.

  16. Differential modulation of auditory responses to attended and unattended speech in different listening conditions.

    Science.gov (United States)

    Kong, Ying-Yee; Mullangi, Ala; Ding, Nai

    2014-10-01

    This study investigates how top-down attention modulates neural tracking of the speech envelope in different listening conditions. In the quiet conditions, a single speech stream was presented and the subjects paid attention to the speech stream (active listening) or watched a silent movie instead (passive listening). In the competing speaker (CS) conditions, two speakers of opposite genders were presented diotically. Ongoing electroencephalographic (EEG) responses were measured in each condition and cross-correlated with the speech envelope of each speaker at different time lags. In quiet, active and passive listening resulted in similar neural responses to the speech envelope. In the CS conditions, however, the shape of the cross-correlation function was remarkably different between the attended and unattended speech. The cross-correlation with the attended speech showed stronger N1 and P2 responses but a weaker P1 response compared to the cross-correlation with the unattended speech. Furthermore, the N1 response to the attended speech in the CS condition was enhanced and delayed compared with the active listening condition in quiet, while the P2 response to the unattended speaker in the CS condition was attenuated compared with the passive listening in quiet. Taken together, these results demonstrate that top-down attention differentially modulates envelope-tracking neural activity at different time lags and suggest that top-down attention can both enhance the neural responses to the attended sound stream and suppress the responses to the unattended sound stream. Copyright © 2014 Elsevier B.V. All rights reserved.
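
    The analysis described here, cross-correlating the ongoing EEG with each talker's speech envelope across a range of time lags, can be sketched in a few lines. The sketch below uses hypothetical data with a built-in ~100 ms tracking delay; the sampling rate, lag range, and noise-like envelope stand-in are illustrative assumptions.

    ```python
    # Minimal sketch: lagged cross-correlation of EEG with a speech envelope.
    import numpy as np

    def lagged_xcorr(eeg, envelope, fs, max_lag_ms=500):
        """Pearson correlation at each lag, with the envelope leading the EEG."""
        lags = np.arange(0, int(max_lag_ms / 1000 * fs))
        r = np.zeros(len(lags))
        for i, lag in enumerate(lags):
            x = envelope[: len(envelope) - lag] if lag else envelope
            r[i] = np.corrcoef(x, eeg[lag:])[0, 1]
        return lags / fs * 1000, r          # lags in ms

    # Hypothetical data: a noise stand-in for the envelope, and EEG that
    # tracks it with a ~100 ms delay plus additive noise.
    fs = 128
    rng = np.random.default_rng(1)
    env = rng.standard_normal(fs * 60)
    eeg = np.roll(env, int(0.1 * fs)) + rng.standard_normal(env.size)

    lags_ms, r = lagged_xcorr(eeg, env, fs)
    print(f"peak tracking at ~{lags_ms[np.argmax(r)]:.0f} ms")
    ```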

  17. Auditory-neurophysiological responses to speech during early childhood: Effects of background noise.

    Science.gov (United States)

    White-Schwoch, Travis; Davies, Evan C; Thompson, Elaine C; Woodruff Carr, Kali; Nicol, Trent; Bradlow, Ann R; Kraus, Nina

    2015-10-01

    Early childhood is a critical period of auditory learning, during which children are constantly mapping sounds to meaning. But this auditory learning rarely occurs in ideal listening conditions: children are forced to listen against a relentless din. This background noise degrades the neural coding of these critical sounds, in turn interfering with auditory learning. Despite the importance of robust and reliable auditory processing during early childhood, little is known about the neurophysiology underlying speech processing in children so young. To better understand the physiological constraints these adverse listening scenarios impose on speech sound coding during early childhood, auditory-neurophysiological responses were elicited to a consonant-vowel syllable in quiet and background noise in a cohort of typically-developing preschoolers (ages 3-5 yr). Overall, responses were degraded in noise: they were smaller, less stable across trials, slower, and there was poorer coding of spectral content and the temporal envelope. These effects were exacerbated in response to the consonant transition relative to the vowel, suggesting that the neural coding of spectrotemporally-dynamic speech features is more tenuous in noise than the coding of static features, even in children this young. Neural coding of speech temporal fine structure, however, was more resilient to the addition of background noise than coding of temporal envelope information. Taken together, these results demonstrate that noise places a neurophysiological constraint on speech processing during early childhood by causing a breakdown in neural processing of speech acoustics. These results may explain why some listeners have inordinate difficulties understanding speech in noise. Speech-elicited auditory-neurophysiological responses offer objective insight into listening skills during early childhood by reflecting the integrity of neural coding in quiet and noise; this paper documents typical response…

  18. Comment on "Monkey vocal tracts are speech-ready".

    Science.gov (United States)

    Lieberman, Philip

    2017-07-01

    Monkey vocal tracts are capable of producing monkey speech, not the full range of articulate human speech. The evolution of human speech entailed both anatomy and brains. Fitch, de Boer, Mathur, and Ghazanfar in Science Advances claim that "monkey vocal tracts are speech-ready," and conclude that "…the evolution of human speech capabilities required neural change rather than modifications of vocal anatomy." Neither premise is consistent either with the data presented and the conclusions reached by de Boer and Fitch themselves in their own published papers on the role of anatomy in the evolution of human speech or with the body of independent studies published since the 1950s.

  19. Beyond production: Brain responses during speech perception in adults who stutter

    Directory of Open Access Journals (Sweden)

    Tali Halag-Milo

    2016-01-01

    Full Text Available Developmental stuttering is a speech disorder that disrupts the ability to produce speech fluently. While stuttering is typically diagnosed based on one's behavior during speech production, some models suggest that it involves more central representations of language, and thus may affect language perception as well. Here we tested the hypothesis that developmental stuttering implicates neural systems involved in language perception, in a task that manipulates comprehensibility without an overt speech production component. We used functional magnetic resonance imaging to measure blood oxygenation level dependent (BOLD) signals in adults who do and do not stutter, while they were engaged in an incidental speech perception task. We found that speech perception evokes stronger activation in adults who stutter (AWS) compared to controls, specifically in the right inferior frontal gyrus (RIFG) and in left Heschl's gyrus (LHG). Significant differences were additionally found in the lateralization of response in the inferior frontal cortex: AWS showed bilateral inferior frontal activity, while controls showed a left-lateralized pattern of activation. These findings suggest that developmental stuttering is associated with an imbalanced neural network for speech processing, which is not limited to speech production, but also affects cortical responses during speech perception.

  20. Use of amplitude modulation cues recovered from frequency modulation for cochlear implant users when original speech cues are severely degraded.

    Science.gov (United States)

    Won, Jong Ho; Shim, Hyun Joon; Lorenzi, Christian; Rubinstein, Jay T

    2014-06-01

    Won et al. (J Acoust Soc Am 132:1113-1119, 2012) reported that cochlear implant (CI) speech processors generate amplitude-modulation (AM) cues recovered from broadband speech frequency modulation (FM) and that CI users can use these cues for speech identification in quiet. The present study was designed to extend this finding for a wide range of listening conditions, where the original speech cues were severely degraded by manipulating either the acoustic signals or the speech processor. The manipulation of the acoustic signals included the presentation of background noise, simulation of reverberation, and amplitude compression. The manipulation of the speech processor included changing the input dynamic range and the number of channels. For each of these conditions, multiple levels of speech degradation were tested. Speech identification was measured for CI users and compared for stimuli having both AM and FM information (intact condition) or FM information only (FM condition). Each manipulation degraded speech identification performance for both intact and FM conditions. Performance for the intact and FM conditions became similar for stimuli having the most severe degradations. Identification performance generally overlapped for the intact and FM conditions. Moreover, identification performance for the FM condition was better than chance performance even at the maximum level of distortion. Finally, significant correlations were found between speech identification scores for the intact and FM conditions. Altogether, these results suggest that despite poor frequency selectivity, CI users can make efficient use of AM cues recovered from speech FM in difficult listening situations.
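
    The recovered-envelope effect that this work builds on has a simple signal-processing core: a signal whose original envelope has been flattened (FM only) re-acquires amplitude modulation after narrow band-pass filtering, and a CI-style envelope detector can then read it out. The sketch below illustrates this with an assumed linear frequency sweep and a single hypothetical analysis band; it is not the study's vocoder processing.

    ```python
    # Minimal sketch: AM recovered from an FM-only signal by narrowband filtering.
    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt

    fs = 16000
    t = np.arange(0, 1, 1 / fs)

    # FM-only carrier: instantaneous frequency sweeps 800 -> 1200 Hz, flat envelope.
    inst_freq = 800 + 400 * t
    fm_only = np.cos(2 * np.pi * np.cumsum(inst_freq) / fs)

    # Narrow analysis band around 1000 Hz, standing in for a single CI channel.
    b, a = butter(4, [950 / (fs / 2), 1050 / (fs / 2)], btype="band")
    recovered_env = np.abs(hilbert(filtfilt(b, a, fm_only)))

    # Nonzero modulation depth shows the filter reintroduced AM from the FM sweep.
    print(recovered_env.std() / recovered_env.mean())
    ```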

  1. Histogram equalization with Bayesian estimation for noise robust speech recognition.

    Science.gov (United States)

    Suh, Youngjoo; Kim, Hoirin

    2018-02-01

    The histogram equalization approach is an efficient feature normalization technique for noise-robust automatic speech recognition. However, it suffers from performance degradation when some fundamental conditions are not satisfied in the test environment. To remedy these limitations of the original histogram equalization methods, a class-based histogram equalization approach has been proposed. Although this approach showed substantial performance improvement under noise environments, it still suffers from performance degradation due to overfitting when test data are insufficient. To address this issue, the proposed histogram equalization technique employs the Bayesian estimation method in estimating the test cumulative distribution function. It was reported in a previous study conducted on the Aurora-4 task that the proposed approach provided substantial performance gains in speech recognition systems based on the acoustic modeling of the Gaussian mixture model-hidden Markov model. In this work, the proposed approach was examined in speech recognition systems with the deep neural network-hidden Markov model (DNN-HMM), the current mainstream speech recognition approach, where it also showed meaningful performance improvement over the conventional maximum likelihood estimation-based method. The fusion of the proposed features with the mel-frequency cepstral coefficients provided additional performance gains in DNN-HMM systems, which otherwise suffer from performance degradation in the clean test condition.
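
    Baseline histogram equalization maps each test feature through the test data's empirical CDF and then through the inverse of a reference (training) CDF, so the normalized test features follow the training distribution. The sketch below shows only this baseline transform on synthetic one-dimensional features; the paper's contribution, a Bayesian-smoothed estimate of the test CDF, is not reproduced here.

    ```python
    # Minimal sketch of baseline histogram equalization for feature normalization.
    import numpy as np

    def histogram_equalize(test_feat, ref_feat):
        """Map test features so their empirical CDF matches the reference CDF."""
        ref_sorted = np.sort(ref_feat)
        # Rank-based empirical CDF value of each test sample.
        ranks = np.argsort(np.argsort(test_feat))
        cdf = (ranks + 0.5) / len(test_feat)
        # Inverse reference CDF by interpolation over the sorted reference values.
        probs = (np.arange(len(ref_sorted)) + 0.5) / len(ref_sorted)
        return np.interp(cdf, probs, ref_sorted)

    rng = np.random.default_rng(0)
    train = rng.normal(0.0, 1.0, 5000)      # clean training distribution
    test = rng.normal(2.0, 0.5, 300)        # shifted/compressed noisy test data
    print(histogram_equalize(test, train).mean())   # ~0 after equalization
    ```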

  2. Development of a System for Automatic Recognition of Speech

    Directory of Open Access Journals (Sweden)

    Roman Jarina

    2003-01-01

    Full Text Available The article gives a review of research on processing and automatic recognition of speech signals (ARR) at the Department of Telecommunications of the Faculty of Electrical Engineering, University of Žilina. Ongoing research is oriented to speech parametrization using 2-dimensional cepstral analysis, and to an application of HMMs and neural networks for speech recognition in the Slovak language. The article summarizes achieved results and outlines the future orientation of our research in automatic speech recognition.

  3. Upregulation of cognitive control networks in older adults’ speech comprehension

    Directory of Open Access Journals (Sweden)

    Julia eErb

    2013-12-01

    Full Text Available Speech comprehension abilities decline with age and with age-related hearing loss, but it is unclear how this decline expresses in terms of central neural mechanisms. The current study examined neural speech processing in a group of older adults (aged 56–77, n=16) with varying degrees of sensorineural hearing loss, and compared them to a cohort of young adults (aged 22–31, n=30) with self-reported normal hearing. In an fMRI experiment, listeners heard and repeated back degraded sentences (4-band vocoding, which preserves the temporal envelope of the acoustic signal while substantially degrading spectral information). Behaviourally, older adults adapted to degraded speech at the same rate as young listeners, although their overall comprehension of degraded speech was lower. Neurally, both older and young adults relied on the left anterior insula for degraded more than clear speech perception. However, anterior insula engagement in older adults was dependent on hearing acuity. Young adults additionally employed the anterior cingulate cortex (ACC). Interestingly, this age group × degradation interaction was driven by a reduced dynamic range in older adults, who displayed elevated levels of ACC activity in both conditions, consistent with a persistent upregulation in cognitive control irrespective of task difficulty. For correct speech comprehension, older adults recruited the middle frontal gyrus in addition to a core speech comprehension network on which young adults relied, suggestive of a compensatory mechanism. Taken together, the results indicate that older adults increasingly recruit cognitive control networks, even under optimal listening conditions, at the expense of these systems' dynamic range.

  4. Neural indices of phonemic discrimination and sentence-level speech intelligibility in quiet and noise: A P3 study.

    Science.gov (United States)

    Koerner, Tess K; Zhang, Yang; Nelson, Peggy B; Wang, Boxiang; Zou, Hui

    2017-07-01

    This study examined how speech babble noise differentially affected the auditory P3 responses and the associated neural oscillatory activities for consonant and vowel discrimination in relation to segmental- and sentence-level speech perception in noise. The data were collected from 16 normal-hearing participants in a double-oddball paradigm that contained a consonant (/ba/ to /da/) and vowel (/ba/ to /bu/) change in quiet and noise (speech-babble background at a -3 dB signal-to-noise ratio) conditions. Time-frequency analysis was applied to obtain inter-trial phase coherence (ITPC) and event-related spectral perturbation (ERSP) measures in delta, theta, and alpha frequency bands for the P3 response. Behavioral measures included percent correct phoneme detection and reaction time as well as percent correct IEEE sentence recognition in quiet and in noise. Linear mixed-effects models were applied to determine possible brain-behavior correlates. A significant noise-induced reduction in P3 amplitude was found, accompanied by significantly longer P3 latency and decreases in ITPC across all frequency bands of interest. There was a differential effect of noise on consonant discrimination and vowel discrimination in both ERP and behavioral measures, such that noise impacted the detection of the consonant change more than the vowel change. The P3 amplitude and some of the ITPC and ERSP measures were significant predictors of speech perception at segmental- and sentence-levels across listening conditions and stimuli. These data demonstrate that the P3 response with its associated cortical oscillations represents a potential neurophysiological marker for speech perception in noise. Copyright © 2017 Elsevier B.V. All rights reserved.
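
    Inter-trial phase coherence (ITPC), one of the time-frequency measures used here, is the length of the mean unit phase vector across trials: 1 indicates identical phase on every trial, 0 indicates random phase. Below is a minimal sketch on hypothetical single-channel trials, using a band-pass filter plus Hilbert transform rather than the wavelet decomposition a full analysis would typically use.

    ```python
    # Minimal sketch of inter-trial phase coherence (ITPC) in one frequency band.
    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt

    def itpc(trials, fs, band):
        """trials: (n_trials, n_samples). Returns ITPC over time for one band."""
        b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
        analytic = hilbert(filtfilt(b, a, trials, axis=1), axis=1)
        phases = np.angle(analytic)
        # Length of the mean unit phase vector across trials; 1 = perfect locking.
        return np.abs(np.mean(np.exp(1j * phases), axis=0))

    # Hypothetical trials with a phase-locked 5 Hz (theta) component plus noise.
    fs, n_trials, n_samp = 256, 40, 512
    t = np.arange(n_samp) / fs
    rng = np.random.default_rng(2)
    trials = np.sin(2 * np.pi * 5 * t) + rng.standard_normal((n_trials, n_samp))
    print(itpc(trials, fs, (4, 8)).mean())   # high for the phase-locked theta band
    ```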

  5. Audiovisual integration for speech during mid-childhood: Electrophysiological evidence

    Science.gov (United States)

    Kaganovich, Natalya; Schumaker, Jennifer

    2014-01-01

    Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7–8-year-olds and 10–11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception. PMID:25463815

  6. Alternative Speech Communication System for Persons with Severe Speech Disorders

    Science.gov (United States)

    Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

    2009-12-01

    Assistive speech-enabled systems are proposed to help both French- and English-speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech, making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. Improvements in the Perceptual Evaluation of Speech Quality (PESQ) value of 5% and of more than 20% are achieved by the speech synthesis systems that deal with SSDs and dysarthria, respectively.

  7. Classifying laughter and speech using audio-visual feature prediction

    NARCIS (Netherlands)

    Petridis, Stavros; Asghar, Ali; Pantic, Maja

    2010-01-01

    In this study, a system that discriminates laughter from speech by modelling the relationship between audio and visual features is presented. The underlying assumption is that this relationship is different between speech and laughter. Neural networks are trained which learn the audio-to-visual and

  8. Use of a Deep Recurrent Neural Network to Reduce Wind Noise: Effects on Judged Speech Intelligibility and Sound Quality.

    Science.gov (United States)

    Keshavarzi, Mahmoud; Goehring, Tobias; Zakis, Justin; Turner, Richard E; Moore, Brian C J

    2018-01-01

    Despite great advances in hearing-aid technology, users still experience problems with noise in windy environments. The potential benefits of using a deep recurrent neural network (RNN) for reducing wind noise were assessed. The RNN was trained using recordings of the output of the two microphones of a behind-the-ear hearing aid in response to male and female speech at various azimuths in the presence of noise produced by wind from various azimuths with a velocity of 3 m/s, using the "clean" speech as a reference. A paired-comparison procedure was used to compare all possible combinations of three conditions for subjective intelligibility and for sound quality or comfort. The conditions were unprocessed noisy speech, noisy speech processed using the RNN, and noisy speech that was high-pass filtered (which also reduced wind noise). Eighteen native English-speaking participants were tested, nine with normal hearing and nine with mild-to-moderate hearing impairment. Frequency-dependent linear amplification was provided for the latter. Processing using the RNN was significantly preferred over no processing by both subject groups for both subjective intelligibility and sound quality, although the magnitude of the preferences was small. High-pass filtering (HPF) was not significantly preferred over no processing. Although RNN was significantly preferred over HPF only for sound quality for the hearing-impaired participants, for the results as a whole, there was a preference for RNN over HPF. Overall, the results suggest that reduction of wind noise using an RNN is possible and might have beneficial effects when used in hearing aids.
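
    As a rough illustration of the kind of system evaluated here, the sketch below defines a small recurrent network that estimates a time-frequency mask from a noisy magnitude spectrogram and is trained against a clean reference. The architecture, feature sizes, and training details are assumptions made for the sketch, not the authors' trained model.

    ```python
    # Minimal sketch of a recurrent mask-based denoiser (assumed architecture).
    import torch
    import torch.nn as nn

    class RNNDenoiser(nn.Module):
        def __init__(self, n_freq=129, hidden=128):
            super().__init__()
            self.rnn = nn.GRU(n_freq, hidden, num_layers=2, batch_first=True)
            self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

        def forward(self, noisy_mag):           # (batch, frames, freq bins)
            h, _ = self.rnn(noisy_mag)
            return noisy_mag * self.mask(h)     # masked magnitude spectrogram

    model = RNNDenoiser()
    noisy = torch.rand(1, 100, 129)             # hypothetical spectrogram frames
    clean = torch.rand(1, 100, 129)             # hypothetical "clean" reference
    loss = nn.functional.mse_loss(model(noisy), clean)
    loss.backward()                             # one training step's gradients
    ```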

  9. Use of a Deep Recurrent Neural Network to Reduce Wind Noise: Effects on Judged Speech Intelligibility and Sound Quality

    Science.gov (United States)

    Keshavarzi, Mahmoud; Goehring, Tobias; Zakis, Justin; Turner, Richard E.; Moore, Brian C. J.

    2018-01-01

    Despite great advances in hearing-aid technology, users still experience problems with noise in windy environments. The potential benefits of using a deep recurrent neural network (RNN) for reducing wind noise were assessed. The RNN was trained using recordings of the output of the two microphones of a behind-the-ear hearing aid in response to male and female speech at various azimuths in the presence of noise produced by wind from various azimuths with a velocity of 3 m/s, using the “clean” speech as a reference. A paired-comparison procedure was used to compare all possible combinations of three conditions for subjective intelligibility and for sound quality or comfort. The conditions were unprocessed noisy speech, noisy speech processed using the RNN, and noisy speech that was high-pass filtered (which also reduced wind noise). Eighteen native English-speaking participants were tested, nine with normal hearing and nine with mild-to-moderate hearing impairment. Frequency-dependent linear amplification was provided for the latter. Processing using the RNN was significantly preferred over no processing by both subject groups for both subjective intelligibility and sound quality, although the magnitude of the preferences was small. High-pass filtering (HPF) was not significantly preferred over no processing. Although RNN was significantly preferred over HPF only for sound quality for the hearing-impaired participants, for the results as a whole, there was a preference for RNN over HPF. Overall, the results suggest that reduction of wind noise using an RNN is possible and might have beneficial effects when used in hearing aids. PMID:29708061

  10. Biological impact of preschool music classes on processing speech in noise

    OpenAIRE

    Strait, Dana L.; Parbery-Clark, Alexandra; O’Connell, Samantha; Kraus, Nina

    2013-01-01

    Musicians have increased resilience to the effects of noise on speech perception and its neural underpinnings. We do not know, however, how early in life these enhancements arise. We compared auditory brainstem responses to speech in noise in 32 preschool children, half of whom were engaged in music training. Thirteen children returned for testing one year later, permitting the first longitudinal assessment of subcortical auditory function with music training. Results indicate emerging neural...

  11. A longitudinal study investigating neural processing of speech envelope modulation rates in children with (a family risk for) dyslexia.

    Science.gov (United States)

    De Vos, Astrid; Vanvooren, Sophie; Vanderauwera, Jolijn; Ghesquière, Pol; Wouters, Jan

    2017-08-01

    Recent evidence suggests that a fundamental deficit in the synchronization of neural oscillations to temporal information in speech may underlie phonological processing problems in dyslexia. Since previous studies were performed cross-sectionally in school-aged children or adults, developmental aspects of neural auditory processing in relation to reading acquisition and dyslexia remain to be investigated. The present longitudinal study followed 68 children during development from pre-reader (5 years old) to beginning reader (7 years old) and more advanced reader (9 years old). Thirty-six children had a family risk for dyslexia and 14 children eventually developed dyslexia. EEG recordings of auditory steady-state responses to 4 and 20 Hz modulations, corresponding to syllable and phoneme rates, were collected at each point in time. Our results demonstrate an increase in neural synchronization to phoneme-rate modulations around the onset of reading acquisition. This effect was negatively correlated with later reading and phonological skills, indicating that children who exhibit the largest increase in neural synchronization to phoneme rates, develop the poorest reading and phonological skills. Accordingly, neural synchronization to phoneme-rate modulations was found to be significantly higher in beginning and more advanced readers with dyslexia. We found no developmental effects regarding neural synchronization to syllable rates, nor any effects of a family risk for dyslexia. Altogether, our findings suggest that the onset of reading instruction coincides with an increase in neural responsiveness to phoneme-rate modulations, and that the extent of this increase is related to (the outcome of) reading development. Hereby, dyslexic children persistently demonstrate atypically high neural synchronization to phoneme rates from the beginning of reading acquisition onwards. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Robust digital processing of speech signals

    CERN Document Server

    Kovacevic, Branko; Veinović, Mladen; Marković, Milan

    2017-01-01

    This book focuses on speech signal phenomena, presenting a robustification of the usual speech generation models with regard to the presumed types of excitation signals, which is equivalent to the introduction of a class of nonlinear models and the corresponding criterion functions for parameter estimation. Compared to the general class of nonlinear models, such as various neural networks, these models possess good properties of controlled complexity, the option of working in “online” mode, as well as a low information volume for efficient speech encoding and transmission. Providing comprehensive insights, the book is based on the authors’ research, which has already been published, supplemented by additional texts discussing general considerations of speech modeling, linear predictive analysis and robust parameter estimation.

  13. Temporal modulations in speech and music.

    Science.gov (United States)

    Ding, Nai; Patel, Aniruddh D; Chen, Lin; Butler, Henry; Luo, Cheng; Poeppel, David

    2017-10-01

    Speech and music have structured rhythms. Here we discuss a major acoustic correlate of spoken and musical rhythms, the slow (0.25–32 Hz) temporal modulations in sound intensity, and compare the modulation properties of speech and music. We analyze these modulations using over 25 h of speech and over 39 h of recordings of Western music. We show that the speech modulation spectrum is highly consistent across 9 languages (including languages with typologically different rhythmic characteristics). A different, but similarly consistent, modulation spectrum is observed for music, including classical music played by single instruments of different types, symphonic, jazz, and rock. The temporal modulations of speech and music show broad but well-separated peaks around 5 and 2 Hz, respectively. These acoustically dominant time scales may be intrinsic features of speech and music, a possibility which should be investigated using more culturally diverse samples in each domain. Distinct modulation timescales for speech and music could facilitate their perceptual analysis and its neural processing. Copyright © 2017 Elsevier Ltd. All rights reserved.
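
    A temporal modulation spectrum of the kind compared here can be approximated by extracting the amplitude envelope and taking the spectrum of that envelope over the 0.25–32 Hz range. The sketch below does this for a toy amplitude-modulated carrier; the published analysis is more elaborate (cochlear-style filterbanks and averaging across many recordings).

    ```python
    # Minimal sketch of a temporal modulation spectrum via the Hilbert envelope.
    import numpy as np
    from scipy.signal import hilbert, welch

    def modulation_spectrum(audio, fs):
        envelope = np.abs(hilbert(audio))
        f, pxx = welch(envelope - envelope.mean(), fs=fs, nperseg=fs * 8)
        keep = (f >= 0.25) & (f <= 32)      # the slow modulation range of interest
        return f[keep], pxx[keep]

    # Toy "speech-like" signal: a 500 Hz carrier amplitude-modulated at 5 Hz.
    fs = 16000
    t = np.arange(0, 8, 1 / fs)
    audio = (1 + 0.8 * np.sin(2 * np.pi * 5 * t)) * np.sin(2 * np.pi * 500 * t)

    f, pxx = modulation_spectrum(audio, fs)
    print(f"modulation peak near {f[np.argmax(pxx)]:.2f} Hz")
    ```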

  14. Neural networks supporting audiovisual integration for speech: A large-scale lesion study.

    Science.gov (United States)

    Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius

    2018-06-01

    Auditory and visual speech information are often strongly integrated, resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched (the McGurk-MacDonald effect). Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure (auditory vs visual capture) can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor related frontal regions do not appear to play a role in AV speech integration. Copyright © 2018 Elsevier Ltd. All rights reserved.

  15. [The Identification of the Origin of Chinese Wolfberry Based on Infrared Spectral Technology and the Artificial Neural Network].

    Science.gov (United States)

    Li, Zhong; Liu, Ming-de; Ji, Shou-xiang

    2016-03-01

    Fourier Transform Infrared Spectroscopy (FTIR) is established here as a method to quickly identify the geographic origins of Chinese wolfberry. In the paper, 45 samples of Chinese wolfberry from different places in Qinghai Province were surveyed by FTIR. The original FTIR data matrix was pretreated with common preprocessing and the wavelet transform. Compared with common window-shifting smoothing, standard normal variate correction, and multiplicative scatter correction, the wavelet transform is an effective method for preprocessing spectral data. Before building a model with artificial neural networks, the spectral variables were compressed by the wavelet transform so as to increase the training speed of the network, and the related parameters of the artificial neural network model are also discussed in detail. The survey shows that even if the infrared spectroscopy data are compressed to 1/8 of their original size, the spectral information and analytical accuracy do not deteriorate. The compressed spectral variables served as the input parameters of a backpropagation artificial neural network (BP-ANN) model, with the geographic origins of the Chinese wolfberry as the output. A three-layer neural network model was built to predict 10 unknown samples using the MATLAB neural network toolbox's error backpropagation network. The number of hidden-layer neurons is 5 and the number of output-layer neurons is 1. The transfer function of the hidden layer is tansig, while the transfer function of the output layer is purelin. The network training function is trainlm and the learning function for weights and thresholds is learngdm; net.trainParam.epochs = 1000, while net.trainParam.goal = 0.001. A recognition rate of 100% was achieved. It can be concluded that the method is quite suitable for quick discrimination of the producing areas of Chinese wolfberry. The infrared spectral analysis technology…
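
    The modeling recipe above is concrete enough to sketch. Since the original used MATLAB's neural network toolbox, the Python sketch below only approximates the described configuration (a single hidden layer of 5 tanh units trained to convergence); the spectra and origin labels are randomly generated stand-ins, not the paper's data.

    ```python
    # Minimal sketch approximating the described BP-ANN on stand-in spectra.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    n_train, n_vars = 35, 230            # e.g., spectra compressed to ~1/8 length
    X = rng.standard_normal((n_train, n_vars))
    y = rng.integers(0, 3, n_train)      # three hypothetical producing areas

    # 5 tanh hidden units, trained to a tight tolerance, echoing the MATLAB setup.
    clf = MLPClassifier(hidden_layer_sizes=(5,), activation="tanh",
                        solver="lbfgs", max_iter=1000, tol=1e-3, random_state=0)
    clf.fit(X, y)
    print(clf.score(X, y))               # training-set recognition rate
    ```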

  16. Behavioural, computational, and neuroimaging studies of acquired apraxia of speech

    Directory of Open Access Journals (Sweden)

    Kirrie J Ballard

    2014-11-01

    Full Text Available A critical examination of speech motor control depends on an in-depth understanding of network connectivity associated with Brodmann areas 44 and 45 and surrounding cortices. Damage to these areas has been associated with two conditions: the speech motor programming disorder apraxia of speech (AOS) and the linguistic/grammatical disorder of Broca's aphasia. Here we focus on AOS, which is most commonly associated with damage to posterior Broca's area and adjacent cortex. We provide an overview of our own studies into the nature of AOS, including behavioral and neuroimaging methods, to explore components of the speech motor network that are associated with normal and disordered speech motor programming in AOS. Behavioral, neuroimaging, and computational modeling studies are indicating that AOS is associated with impairment in learning feedforward models and/or implementing feedback mechanisms and with the functional contribution of BA6. While functional connectivity methods are not yet routinely applied to the study of AOS, we highlight the need for focusing on the functional impact of localised lesions throughout the speech network, as well as larger scale comparative studies to distinguish the unique behavioral and neurological signature of AOS. By coupling these methods with neural network models, we have a powerful set of tools to improve our understanding of the neural mechanisms that underlie AOS, and speech production generally.

  17. Detecting self-produced speech errors before and after articulation: An ERP investigation

    Directory of Open Access Journals (Sweden)

    Kevin Michael Trewartha

    2013-11-01

    Full Text Available It has been argued that speech production errors are monitored by the same neural system involved in monitoring other types of action errors. Behavioral evidence has shown that speech errors can be detected and corrected prior to articulation, yet the neural basis for such pre-articulatory speech error monitoring is poorly understood. The current study investigated speech error monitoring using a phoneme-substitution task known to elicit speech errors. Stimulus-locked event-related potential (ERP) analyses comparing correct and incorrect utterances were used to assess pre-articulatory error monitoring, and response-locked ERP analyses were used to assess post-articulatory monitoring. Our novel finding in the stimulus-locked analysis revealed that words that ultimately led to a speech error were associated with a larger P2 component at midline sites (FCz, Cz, and CPz). This early positivity may reflect the detection of an error in speech formulation, or a predictive mechanism to signal the potential for an upcoming speech error. The data also revealed that general conflict monitoring mechanisms are involved during this task, as both correct and incorrect responses elicited an anterior N2 component typically associated with conflict monitoring. The response-locked analyses corroborated previous observations that self-produced speech errors led to a fronto-central ERN. These results demonstrate that speech errors can be detected prior to articulation, and that speech error monitoring relies on a central error monitoring mechanism.

  18. The natural statistics of audiovisual speech.

    Directory of Open Access Journals (Sweden)

    Chandramouli Chandrasekaran

    2009-07-01

    Full Text Available Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it's been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both area of the mouth opening and the voice envelope are temporally modulated in the 2-7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver.

  19. Imitation and speech: commonalities within Broca's area.

    Science.gov (United States)

    Kühn, Simone; Brass, Marcel; Gallinat, Jürgen

    2013-11-01

    The so-called embodiment of communication has attracted considerable interest. Recently a growing number of studies have proposed a link between Broca's area's involvement in action processing and its involvement in speech. The present quantitative meta-analysis set out to test whether neuroimaging studies on imitation and overt speech show overlap within the inferior frontal gyrus. By means of activation likelihood estimation (ALE), we investigated concurrence of brain regions activated by object-free hand imitation studies as well as overt speech studies including simple syllable and more complex word production. We found direct overlap between imitation and speech in bilateral pars opercularis (BA 44) within Broca's area. Subtraction analyses revealed no unique localization for either speech or imitation. To verify the potential of ALE subtraction analysis to detect unique involvement within Broca's area, we contrasted the results of a meta-analysis on motor inhibition and imitation and found separable regions involved for imitation. This is the first meta-analysis to compare the neural correlates of imitation and overt speech. The results are in line with the proposed evolutionary roots of speech in imitation.

  20. Knockdown of Dyslexia-Gene Dcdc2 Interferes with Speech Sound Discrimination in Continuous Streams.

    Science.gov (United States)

    Centanni, Tracy Michelle; Booker, Anne B; Chen, Fuyi; Sloan, Andrew M; Carraway, Ryan S; Rennaker, Robert L; LoTurco, Joseph J; Kilgard, Michael P

    2016-04-27

    Dyslexia is the most common developmental language disorder and is marked by deficits in reading and phonological awareness. One theory of dyslexia suggests that the phonological awareness deficit is due to abnormal auditory processing of speech sounds. Variants in DCDC2 and several other neural migration genes are associated with dyslexia and may contribute to auditory processing deficits. In the current study, we tested the hypothesis that RNAi suppression of Dcdc2 in rats causes abnormal cortical responses to sound and impaired speech sound discrimination. Rats were subjected in utero to RNA interference targeting the gene Dcdc2 or a scrambled sequence. Primary auditory cortex (A1) responses were acquired from 11 rats (5 with Dcdc2 RNAi; DC-) before any behavioral training. A separate group of 8 rats (3 DC-) was trained on a variety of speech sound discrimination tasks, and auditory cortex responses were acquired following training. Dcdc2 RNAi nearly eliminated the ability of rats to identify specific speech sounds from a continuous train of speech sounds but did not impair performance during discrimination of isolated speech sounds. The neural responses to speech sounds in A1 were not degraded as a function of presentation rate before training. These results suggest that A1 is not directly involved in the impaired speech discrimination caused by Dcdc2 RNAi. This result contrasts with earlier results using Kiaa0319 RNAi and suggests that different dyslexia genes may cause different deficits in the speech processing circuitry, which may explain differential responses to therapy. Although dyslexia is diagnosed through reading difficulty, there is a great deal of variation in the phenotypes of these individuals. The underlying neural and genetic mechanisms causing these differences are still widely debated. In the current study, we demonstrate that suppression of a candidate-dyslexia gene causes deficits on tasks of rapid stimulus processing

  1. Sensorimotor oscillations prior to speech onset reflect altered motor networks in adults who stutter

    Directory of Open Access Journals (Sweden)

    Anna-Maria Mersov

    2016-09-01

    Full Text Available Adults who stutter (AWS) have demonstrated atypical coordination of motor and sensory regions during speech production. Yet little is known of the speech-motor network in AWS in the brief time window preceding audible speech onset. The purpose of the current study was to characterize neural oscillations in the speech-motor network during preparation for and execution of overt speech production in AWS using magnetoencephalography (MEG). Twelve AWS and twelve age-matched controls were presented with 220 words, each word embedded in a carrier phrase. Controls were presented with the same word list as their matched AWS participant. Neural oscillatory activity was localized using minimum-variance beamforming during two time periods of interest: speech preparation (prior to speech onset) and speech execution (following speech onset). Compared to controls, AWS showed stronger beta (15–25 Hz) suppression in the speech preparation stage, followed by stronger beta synchronization in the bilateral mouth motor cortex. AWS also recruited the right mouth motor cortex significantly earlier in the speech preparation stage compared to controls. Exaggerated motor preparation is discussed in the context of reduced coordination in the speech-motor network of AWS. It is further proposed that exaggerated beta synchronization may reflect a more strongly inhibited motor system that requires a stronger beta suppression to disengage prior to speech initiation. These novel findings highlight critical differences in the speech-motor network of AWS that occur prior to speech onset and emphasize the need to investigate further the speech-motor assembly in the stuttering population.

  2. Representational Similarity Analysis Reveals Heterogeneous Networks Supporting Speech Motor Control

    DEFF Research Database (Denmark)

    Zheng, Zane; Cusack, Rhodri; Johnsrude, Ingrid

    The everyday act of speaking involves the complex processes of speech motor control. One important feature of such control is regulation of articulation when auditory concomitants of speech do not correspond to the intended motor gesture. While theoretical accounts of speech monitoring posit multiple functional components required for detection of errors in speech planning (e.g., Levelt, 1983), neuroimaging studies generally indicate either single brain regions sensitive to speech production errors, or small, discrete networks. Here we demonstrate that the complex system controlling speech is supported by a complex neural network that is involved in linguistic, motoric and sensory processing. With the aid of novel real-time acoustic analyses and representational similarity analyses of fMRI signals, our data show functionally differentiated networks underlying auditory feedback control of speech.

  3. Speech and nonspeech: What are we talking about?

    Science.gov (United States)

    Maas, Edwin

    2017-08-01

    Understanding of the behavioural, cognitive and neural underpinnings of speech production is of interest theoretically, and is important for understanding disorders of speech production and how to assess and treat such disorders in the clinic. This paper addresses two claims about the neuromotor control of speech production: (1) speech is subserved by a distinct, specialised motor control system and (2) speech is holistic and cannot be decomposed into smaller primitives. Both claims have gained traction in recent literature, and are central to a task-dependent model of speech motor control. The purpose of this paper is to stimulate thinking about speech production, its disorders and the clinical implications of these claims. The paper poses several conceptual and empirical challenges for these claims, including the critical importance of defining speech. The emerging conclusion is that a task-dependent model is called into question as its two central claims are founded on ill-defined and inconsistently applied concepts. The paper concludes with discussion of methodological and clinical implications, including the potential utility of diadochokinetic (DDK) tasks in assessment of motor speech disorders and the contraindication of nonspeech oral motor exercises to improve speech function.

  4. Dynamic encoding of speech sequence probability in human temporal cortex.

    Science.gov (United States)

    Leonard, Matthew K; Bouchard, Kristofer E; Tang, Claire; Chang, Edward F

    2015-05-06

    Sensory processing involves identification of stimulus features, but also integration with the surrounding sensory and cognitive context. Previous work in animals and humans has shown fine-scale sensitivity to context in the form of learned knowledge about the statistics of the sensory environment, including relative probabilities of discrete units in a stream of sequential auditory input. These statistics are a defining characteristic of one of the most important sequential signals humans encounter: speech. For speech, extensive exposure to a language tunes listeners to the statistics of sound sequences. To address how speech sequence statistics are neurally encoded, we used high-resolution direct cortical recordings from human lateral superior temporal cortex as subjects listened to words and nonwords with varying transition probabilities between sound segments. In addition to their sensitivity to acoustic features (including contextual features, such as coarticulation), we found that neural responses dynamically encoded the language-level probability of both preceding and upcoming speech sounds. Transition probability first negatively modulated neural responses, followed by positive modulation of neural responses, consistent with coordinated predictive and retrospective recognition processes, respectively. Furthermore, transition probability encoding was different for real English words compared with nonwords, providing evidence for online interactions with high-order linguistic knowledge. These results demonstrate that sensory processing of deeply learned stimuli involves integrating physical stimulus features with their contextual sequential structure. Despite not being consciously aware of phoneme sequence statistics, listeners use this information to process spoken input and to link low-level acoustic representations with linguistic information about word identity and meaning. Copyright © 2015 the authors.
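
    The stimulus manipulation here rests on phoneme-to-phoneme transition probabilities estimated from a language corpus. A minimal sketch of that estimation, over a tiny hypothetical corpus of ARPAbet-segmented words:

    ```python
    # Minimal sketch: bigram transition probabilities over phoneme sequences.
    from collections import Counter

    corpus = [["HH", "AH", "L", "OW"], ["HH", "AH", "N", "IY"],
              ["S", "AH", "N"], ["HH", "AW", "S"]]

    bigrams = Counter()
    unigrams = Counter()
    for word in corpus:
        for a, b in zip(word, word[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1        # count of a as a non-final segment

    def transition_prob(a, b):
        """P(b | a): probability that segment b follows segment a."""
        return bigrams[(a, b)] / unigrams[a] if unigrams[a] else 0.0

    print(transition_prob("HH", "AH"))   # 2/3 in this toy corpus
    ```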

  5. A neural population model incorporating dopaminergic neurotransmission during complex voluntary behaviors.

    Directory of Open Access Journals (Sweden)

    Stefan Fürtinger

    2014-11-01

    Full Text Available Assessing brain activity during complex voluntary motor behaviors that require the recruitment of multiple neural sites is a field of active research. Our current knowledge is primarily based on human brain imaging studies that have clear limitations in terms of temporal and spatial resolution. We developed a physiologically informed non-linear multi-compartment stochastic neural model to simulate functional brain activity coupled with neurotransmitter release during complex voluntary behavior, such as speech production. Due to its state-dependent modulation of neural firing, dopaminergic neurotransmission plays a key role in the organization of functional brain circuits controlling speech and language and thus has been incorporated in our neural population model. A rigorous mathematical proof establishing existence and uniqueness of solutions to the proposed model as well as a computationally efficient strategy to numerically approximate these solutions are presented. Simulated brain activity during the resting state and sentence production was analyzed using functional network connectivity, and graph theoretical techniques were employed to highlight differences between the two conditions. We demonstrate that our model successfully reproduces characteristic changes seen in empirical data between the resting state and speech production, and dopaminergic neurotransmission evokes pronounced changes in modeled functional connectivity by acting on the underlying biological stochastic neural model. Specifically, model and data networks in both speech and rest conditions share task-specific network features: both the simulated and empirical functional connectivity networks show an increase in nodal influence and segregation in speech over the resting state. These commonalities confirm that dopamine is a key neuromodulator of the functional connectome of speech control. Based on reproducible characteristic aspects of empirical data, we suggest a number

  6. Establishment of Human Neural Progenitor Cells from Human Induced Pluripotent Stem Cells with Diverse Tissue Origins

    Directory of Open Access Journals (Sweden)

    Hayato Fukusumi

    2016-01-01

    Full Text Available Human neural progenitor cells (hNPCs) have previously been generated from limited numbers of human induced pluripotent stem cell (hiPSC) clones. Here, 21 hiPSC clones derived from human dermal fibroblasts, cord blood cells, and peripheral blood mononuclear cells were differentiated using two neural induction methods, an embryoid body (EB) formation-based method and an EB formation method using dual SMAD inhibitors (dSMADi). Our results showed that expandable hNPCs could be generated from hiPSC clones with diverse somatic tissue origins. The established hNPCs exhibited a mid/hindbrain-type neural identity and uniform expression of neural progenitor genes.

  7. Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

    Science.gov (United States)

    Borrie, Stephanie A

    2015-03-01

    This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal (the AV advantage) has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.

  8. DEVELOPMENT AND DISORDERS OF SPEECH IN CHILDHOOD.

    Science.gov (United States)

    KARLIN, ISAAC W.; AND OTHERS

    The growth, development, and abnormalities of speech in childhood are described in this text designed for pediatricians, psychologists, educators, medical students, therapists, pathologists, and parents. The normal development of speech and language is discussed, including theories on the origin of speech in man and factors influencing the normal…

  9. Dopamine Regulation of Human Speech and Bird Song: A Critical Review

    Science.gov (United States)

    Simonyan, Kristina; Horwitz, Barry; Jarvis, Erich D.

    2012-01-01

    To understand the neural basis of human speech control, extensive research has been done using a variety of methodologies in a range of experimental models. Nevertheless, several critical questions about learned vocal motor control still remain open. One of them is the mechanism(s) by which neurotransmitters, such as dopamine, modulate speech and…

  10. Magnified Neural Envelope Coding Predicts Deficits in Speech Perception in Noise.

    Science.gov (United States)

    Millman, Rebecca E; Mattys, Sven L; Gouws, André D; Prendergast, Garreth

    2017-08-09

    Verbal communication in noisy backgrounds is challenging. Understanding speech in background noise that fluctuates in intensity over time is particularly difficult for hearing-impaired listeners with a sensorineural hearing loss (SNHL). The reduction in fast-acting cochlear compression associated with SNHL exaggerates the perceived fluctuations in intensity in amplitude-modulated sounds. SNHL-induced changes in the coding of amplitude-modulated sounds may have a detrimental effect on the ability of SNHL listeners to understand speech in the presence of modulated background noise. To date, direct evidence for a link between magnified envelope coding and deficits in speech identification in modulated noise has been absent. Here, magnetoencephalography was used to quantify the effects of SNHL on phase locking to the temporal envelope of modulated noise (envelope coding) in human auditory cortex. Our results show that SNHL enhances the amplitude of envelope coding in posteromedial auditory cortex, whereas it enhances the fidelity of envelope coding in posteromedial and posterolateral auditory cortex. This dissociation was more evident in the right hemisphere, demonstrating functional lateralization in enhanced envelope coding in SNHL listeners. However, enhanced envelope coding was not perceptually beneficial. Our results also show that both hearing thresholds and, to a lesser extent, magnified cortical envelope coding in left posteromedial auditory cortex predict speech identification in modulated background noise. We propose a framework in which magnified envelope coding in posteromedial auditory cortex disrupts the segregation of speech from background noise, leading to deficits in speech perception in modulated background noise. SIGNIFICANCE STATEMENT People with hearing loss struggle to follow conversations in noisy environments. Background noise that fluctuates in intensity over time poses a particular challenge. Using magnetoencephalography, we demonstrate

  11. A hypothesis on the biological origins and social evolution of music and dance

    Directory of Open Access Journals (Sweden)

    Tianyan eWang

    2015-02-01

    The origins of music and musical emotions are still an enigma. Here I propose a comprehensive hypothesis on the origins and evolution of music, dance and speech from a biological and sociological perspective. I suggest that every pitch interval between neighboring notes in music represents a corresponding movement pattern through interpreting the Doppler effect of sound, which not only provides a possible explanation for the transposition invariance of music, but also integrates music and dance into a common form: rhythmic movements. Accordingly, investigating the origins of music poses the question: why do humans appreciate rhythmic movements? I suggest that human appreciation of rhythmic movements and rhythmic events developed from the natural selection of organisms adapting to the internal and external rhythmic environments. The perception and production of, as well as synchronization with, external and internal rhythms are so vital for an organism's survival and reproduction that animals have a rhythm-related reward and emotion (RRRE) system. The RRRE system enables the appreciation of rhythmic movements and events, and is integral to the origination of music, dance and speech. The first type of rewards and emotions (rhythm-related rewards and emotions, RRREs) are evoked by music and dance, and have biological and social functions, which in turn promote the evolution of music, dance and speech. These functions also evoke a second type of rewards and emotions, which I name society-related rewards and emotions (SRREs). The neural circuits of RRREs and SRREs develop in species formation and personal growth, with congenital and acquired characteristics, respectively; that is, music is the combination of nature and culture. This hypothesis provides probable selection pressures and outlines the evolution of music, dance and speech. The links between the Doppler effect and the RRREs and SRREs can be empirically tested, making the current hypothesis…

  12. A hypothesis on the biological origins and social evolution of music and dance.

    Science.gov (United States)

    Wang, Tianyan

    2015-01-01

    The origins of music and musical emotions are still an enigma. Here I propose a comprehensive hypothesis on the origins and evolution of music, dance, and speech from a biological and sociological perspective. I suggest that every pitch interval between neighboring notes in music represents a corresponding movement pattern through interpreting the Doppler effect of sound, which not only provides a possible explanation for the transposition invariance of music, but also integrates music and dance into a common form: rhythmic movements. Accordingly, investigating the origins of music poses the question: why do humans appreciate rhythmic movements? I suggest that human appreciation of rhythmic movements and rhythmic events developed from the natural selection of organisms adapting to the internal and external rhythmic environments. The perception and production of, as well as synchronization with, external and internal rhythms are so vital for an organism's survival and reproduction that animals have a rhythm-related reward and emotion (RRRE) system. The RRRE system enables the appreciation of rhythmic movements and events, and is integral to the origination of music, dance and speech. The first type of rewards and emotions (rhythm-related rewards and emotions, RRREs) are evoked by music and dance, and have biological and social functions, which in turn promote the evolution of music, dance and speech. These functions also evoke a second type of rewards and emotions, which I name society-related rewards and emotions (SRREs). The neural circuits of RRREs and SRREs develop in species formation and personal growth, with congenital and acquired characteristics, respectively; that is, music is the combination of nature and culture. This hypothesis provides probable selection pressures and outlines the evolution of music, dance, and speech. The links between the Doppler effect and the RRREs and SRREs can be empirically tested, making the current hypothesis scientifically…

  13. Computational neural modeling of speech motor control in childhood apraxia of speech (CAS).

    NARCIS (Netherlands)

    Terband, H.R.; Maassen, B.A.M.; Guenther, F.H.; Brumberg, J.

    2009-01-01

    PURPOSE: Childhood apraxia of speech (CAS) has been associated with a wide variety of diagnostic descriptions and has been shown to involve different symptoms during successive stages of development. In the present study, the authors attempted to associate the symptoms of CAS in a particular…

  14. Speech Pathology in Ancient India--A Review of Sanskrit Literature.

    Science.gov (United States)

    Savithri, S. R.

    1987-01-01

    The paper is a review of ancient Sanskrit literature for information on the origin and development of speech and language, speech production, normality of speech and language, and disorders of speech and language and their treatment. (DB)

  15. An Introduction to Neural Networks for Hearing Aid Noise Recognition.

    Science.gov (United States)

    Kim, Jun W.; Tyler, Richard S.

    1995-01-01

    This article introduces the use of multilayered artificial neural networks in hearing aid noise recognition. It reviews basic principles of neural networks, and offers an example of an application in which a neural network is used to identify the presence or absence of noise in speech. The ability of neural networks to "learn" the…

  16. A homology sound-based algorithm for speech signal interference

    Science.gov (United States)

    Jiang, Yi-jiao; Chen, Hou-jin; Li, Ju-peng; Zhang, Zhan-song

    2015-12-01

    Aiming at secure analog speech communication, a homology sound-based algorithm for speech signal interference is proposed in this paper. We first split the speech signal into phonetic fragments with a short-term energy method and establish an interference-noise cache library from those fragments. We then implement the homology sound interference by mixing randomly selected interferential fragments with the original speech in real time. Computer simulation results indicate that, compared with traditional noise-interference methods such as white-noise interference, the interference produced by this algorithm has the advantages of real-time operation, randomness, and high correlation with the original signal. After further study, the proposed algorithm may be readily used in secure speech communication.
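
    As a rough illustration of the pipeline just described, the Python sketch below segments a signal by short-term energy, caches the high-energy fragments, and mixes randomly drawn fragments back into the original. The frame length, energy threshold, and mixing gain are illustrative assumptions, not values from the paper.

        import numpy as np

        def short_term_energy(x, frame_len=256):
            # Energy of consecutive non-overlapping frames.
            n = len(x) // frame_len
            frames = x[:n * frame_len].reshape(n, frame_len).astype(np.float64)
            return (frames ** 2).sum(axis=1)

        def build_fragment_cache(x, frame_len=256, rel_thresh=0.1):
            # Keep frames whose energy exceeds a fraction of the peak energy:
            # a crude stand-in for the paper's phonetic segmentation.
            e = short_term_energy(x, frame_len)
            idx = np.where(e > rel_thresh * e.max())[0]
            return [x[i * frame_len:(i + 1) * frame_len].astype(np.float64) for i in idx]

        def interfere(x, cache, frame_len=256, gain=0.8, seed=0):
            # Mix a randomly drawn homologous fragment into every frame.
            rng = np.random.default_rng(seed)
            y = x.astype(np.float64).copy()
            for start in range(0, len(y) - frame_len + 1, frame_len):
                y[start:start + frame_len] += gain * cache[rng.integers(len(cache))]
            return y

    Because the interfering fragments are drawn from the speech itself, the masker is by construction highly correlated with the original signal, which is the property the abstract highlights.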

  17. Discriminative training of self-structuring hidden control neural models

    DEFF Research Database (Denmark)

    Sørensen, Helge Bjarup Dissing; Hartmann, Uwe; Hunnerup, Preben

    1995-01-01

    This paper presents a new training algorithm for self-structuring hidden control neural (SHC) models. The SHC models were trained non-discriminatively for speech recognition applications. Better recognition performance can generally be achieved if discriminative training is applied instead. Thus, we developed a discriminative training algorithm for SHC models, where each SHC model for a specific speech pattern is trained with utterances of the pattern to be recognized and with other utterances. The discriminative training of SHC neural models has been tested on the TIDIGITS database…

  18. Computational Neural Modeling of Speech Motor Control in Childhood Apraxia of Speech (CAS)

    Science.gov (United States)

    Terband, Hayo; Maassen, Ben; Guenther, Frank H.; Brumberg, Jonathan

    2009-01-01

    Purpose: Childhood apraxia of speech (CAS) has been associated with a wide variety of diagnostic descriptions and has been shown to involve different symptoms during successive stages of development. In the present study, the authors attempted to associate the symptoms of CAS in a particular developmental stage with particular…

  19. Speech perception under adverse conditions: Insights from behavioral, computational and neuroscience research

    Directory of Open Access Journals (Sweden)

    Sara eGuediche

    2014-01-01

    Adult speech perception reflects the long-term regularities of the native language, but it is also flexible in that it accommodates and adapts to adverse listening conditions and short-term deviations from native-language norms. The purpose of this review article is to examine how the broader neuroscience literature can inform and advance research efforts in understanding the neural basis of flexibility and adaptive plasticity in speech perception. In particular, we consider several domains of neuroscience research that offer insight into how perception can be adaptively tuned to short-term deviations while maintaining the long-term learned regularities for mapping sensory input. We review several literatures to highlight the potential role of learning algorithms that rely on prediction-error signals and discuss specific neural structures that are likely to contribute to such learning. Already, a few studies have alluded to a potential role of these mechanisms in adaptive plasticity in speech perception. A better understanding of the applications and limitations of these algorithms for the challenges of flexible speech perception under adverse conditions promises to inform theoretical models of speech.

  20. Recognition of In-Ear Microphone Speech Data Using Multi-Layer Neural Networks

    National Research Council Canada - National Science Library

    Bulbuller, Gokhan

    2006-01-01

    … In this study, a speech recognition system is presented, specifically an isolated word recognizer which uses speech collected from the external auditory canals of the subjects via an in-ear microphone…

  1. Song and speech: examining the link between singing talent and speech imitation ability.

    Science.gov (United States)

    Christiner, Markus; Reiterer, Susanne M

    2013-01-01

    In previous research on speech imitation, musicality and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer's sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and auditory memory with singing fitting better into the category of "speech" on the productive level and "music" on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. (1) Motor flexibility and the ability to sing improve language and musical function. (2) Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. (3) The ability to sing improves the memory span of the auditory working memory.

  2. Song and speech: examining the link between singing talent and speech imitation ability

    Directory of Open Access Journals (Sweden)

    Markus eChristiner

    2013-11-01

    In previous research on speech imitation, musicality and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer’s sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and sound memory with singing fitting better into the category of "speech" on the productive level and "music" on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. 1. Motor flexibility and the ability to sing improve language and musical function. 2. Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. 3. The ability to sing improves the memory span of the auditory short-term memory.

  3. The lateralized arcuate fasciculus in developmental pitch disorders among Mandarin amusics: left for speech and right for music.

    Science.gov (United States)

    Chen, Xizhuo; Zhao, Yanxin; Zhong, Suyu; Cui, Zaixu; Li, Jiaqi; Gong, Gaolang; Dong, Qi; Nan, Yun

    2018-05-01

    The arcuate fasciculus (AF) is a neural fiber tract that is critical to speech and music development. Although the predominant role of the left AF in speech development is relatively clear, how the AF engages in music development is not understood. Congenital amusia is a special neurodevelopmental condition, which not only affects musical pitch but also speech tone processing. Using diffusion tensor tractography, we aimed at understanding the role of AF in music and speech processing by examining the neural connectivity characteristics of the bilateral AF among thirty Mandarin amusics. Compared to age- and intelligence quotient (IQ)-matched controls, amusics demonstrated increased connectivity as reflected by the increased fractional anisotropy in the right posterior AF but decreased connectivity as reflected by the decreased volume in the right anterior AF. Moreover, greater fractional anisotropy in the left direct AF was correlated with worse performance in speech tone perception among amusics. This study is the first to examine the neural connectivity of AF in the neurodevelopmental condition of amusia as a result of disrupted music pitch and speech tone processing. We found abnormal white matter structural connectivity in the right AF for the amusic individuals. Moreover, we demonstrated that the white matter microstructural properties of the left direct AF are modulated by lexical tone deficits among the amusic individuals. These data support the notion of distinctive pitch processing systems between music and speech.

  4. Categorical speech processing in Broca's area: an fMRI study using multivariate pattern-based analysis.

    Science.gov (United States)

    Lee, Yune-Sang; Turkeltaub, Peter; Granger, Richard; Raizada, Rajeev D S

    2012-03-14

    Although much effort has been directed toward understanding the neural basis of speech processing, the neural processes involved in the categorical perception of speech have been relatively less studied, and many questions remain open. In this functional magnetic resonance imaging (fMRI) study, we probed the cortical regions mediating categorical speech perception using an advanced brain-mapping technique, whole-brain multivariate pattern-based analysis (MVPA). Normal healthy human subjects (native English speakers) were scanned while they listened to 10 consonant-vowel syllables along the /ba/-/da/ continuum. Outside of the scanner, individuals' own category boundaries were measured to divide the fMRI data into /ba/ and /da/ conditions per subject. The whole-brain MVPA revealed that Broca's area and the left pre-supplementary motor area evoked distinct neural activity patterns between the two perceptual categories (/ba/ vs /da/). Broca's area was also found when the same analysis was applied to another dataset (Raizada and Poldrack, 2007), which previously yielded the supramarginal gyrus using a univariate adaptation-fMRI paradigm. The consistent MVPA findings from two independent datasets strongly indicate that Broca's area participates in categorical speech perception, with a possible role of translating speech signals into articulatory codes. The difference in results between univariate and multivariate pattern-based analyses of the same data suggests that processes in different cortical areas along the dorsal speech perception stream are distributed on different spatial scales.

  5. A Novel Approach to Stuttered Speech Correction

    Directory of Open Access Journals (Sweden)

    Alim Sabur Ajibola

    2016-06-01

    Stuttered speech is dysfluency-rich speech, more prevalent in males than in females. It has been associated with insufficient air pressure or poor articulation, even though the root causes are more complex. Its primary features include prolonged and repetitive speech, while its secondary features include anxiety, fear, and shame. This study used LPC analysis and synthesis algorithms to reconstruct stuttered speech. The results were evaluated using cepstral distance, Itakura-Saito distance, mean square error, and likelihood ratio. These measures implied perfect speech reconstruction quality. ASR was used for further testing, and the results showed that all the reconstructed speech samples were perfectly recognized, while only three samples of the original speech were perfectly recognized.
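
    The analysis-synthesis loop referred to above can be sketched compactly. The following Python fragment is a minimal sketch under assumed settings (the authors' frame sizes, LPC order, and windowing are not given in the abstract): it computes LPC coefficients by Levinson-Durbin recursion, resynthesizes a frame from its own excitation residual, and scores the result with one of the cited measures, the Itakura-Saito distance.

        import numpy as np
        from scipy.signal import lfilter

        def lpc(x, order=12):
            # LPC coefficients a = [1, a1, ..., ap] via the Levinson-Durbin
            # recursion on the frame's autocorrelation sequence.
            r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
            a = np.zeros(order + 1)
            a[0] = 1.0
            err = r[0]
            for i in range(1, order + 1):
                k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
                a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]
                err *= 1.0 - k * k
            return a

        def resynthesize(x, order=12):
            # Inverse-filter to the residual, then drive the all-pole
            # synthesis filter 1/A(z) with that residual.
            a = lpc(x, order)
            residual = lfilter(a, [1.0], x)
            return lfilter([1.0], a, residual)

        def itakura_saito(x, y, eps=1e-12):
            # Itakura-Saito distance between the power spectra of two frames.
            px = np.abs(np.fft.rfft(x)) ** 2 + eps
            py = np.abs(np.fft.rfft(y)) ** 2 + eps
            ratio = px / py
            return float(np.mean(ratio - np.log(ratio) - 1.0))

        frame = np.random.randn(400)    # stand-in for a 25 ms speech frame
        print(itakura_saito(frame, resynthesize(frame)))   # ~0: the loop is lossless

    With unquantized coefficients and residual, this loop reconstructs the frame essentially exactly, which is consistent with the reconstruction-quality scores the abstract reports.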

  6. Neural Progenitors, Patterning and Ecology in Neocortical Origins

    Directory of Open Access Journals (Sweden)

    Francisco eAboitiz

    2013-11-01

    The anatomical organization of the mammalian neocortex stands out among vertebrates for its laminar and columnar arrangement, featuring vertically oriented, excitatory pyramidal neurons. The evolutionary origin of this structure is discussed here in relation to the brain organization of other amniotes, i.e. the sauropsids (reptiles and birds). Specifically, we address the developmental modifications that had to take place to generate the neocortex, and to what extent these modifications were shared by other amniote lineages or can be considered unique to mammals. In this article, we propose a hypothesis that combines the control of proliferation in neural progenitor pools with the specification of regional morphogenetic gradients, yielding different anatomical results by virtue of the differential modulation of these processes in each lineage. Thus, there is a highly conserved genetic and developmental battery that becomes modulated in different directions according to specific selective pressures. In the case of early mammals, ecological conditions like nocturnal habits and reproductive strategies are considered to have played a key role in the selection of the particular brain patterning mechanisms that led to the origin of the neocortex.

  7. Supervised Sequence Labelling with Recurrent Neural Networks

    CERN Document Server

    Graves, Alex

    2012-01-01

    Supervised sequence labelling is a vital area of machine learning, encompassing tasks such as speech, handwriting and gesture recognition, protein secondary structure prediction and part-of-speech tagging. Recurrent neural networks are powerful sequence learning tools, robust to input noise and distortion and able to exploit long-range contextual information, that would seem ideally suited to such problems. However, their role in large-scale sequence labelling systems has so far been auxiliary. The goal of this book is a complete framework for classifying and transcribing sequential data with recurrent neural networks only. Three main innovations are introduced in order to realise this goal. Firstly, the connectionist temporal classification output layer allows the framework to be trained with unsegmented target sequences, such as phoneme-level speech transcriptions; this is in contrast to previous connectionist approaches, which were dependent on error-prone prior segmentation. Secondly, multidimensional…
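
    The first of these innovations, the connectionist temporal classification (CTC) output layer, is easy to exercise with modern tooling. The toy PyTorch fragment below trains a bidirectional LSTM against unsegmented label sequences in the spirit the summary describes; the shapes, sizes, and random data are placeholders, not the book's experimental setup.

        import torch
        import torch.nn as nn

        T, N, F, C = 50, 4, 13, 30   # time steps, batch, features, classes (incl. blank)
        rnn = nn.LSTM(input_size=F, hidden_size=64, bidirectional=True)
        proj = nn.Linear(2 * 64, C)
        ctc_loss = nn.CTCLoss(blank=0)

        x = torch.randn(T, N, F)                    # e.g., a batch of MFCC sequences
        h, _ = rnn(x)
        log_probs = proj(h).log_softmax(dim=-1)     # (T, N, C), as CTCLoss expects

        targets = torch.randint(1, C, (N, 10))      # unsegmented target label sequences
        input_lengths = torch.full((N,), T, dtype=torch.long)
        target_lengths = torch.full((N,), 10, dtype=torch.long)

        loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
        loss.backward()   # trains end to end; no prior segmentation of the input needed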

  8. Origin-Dependent Neural Cell Identities in Differentiated Human iPSCs In Vitro and after Transplantation into the Mouse Brain

    Directory of Open Access Journals (Sweden)

    Gunnar Hargus

    2014-09-01

    The differentiation capability of induced pluripotent stem cells (iPSCs) toward certain cell types for disease modeling and drug screening assays might be influenced by their somatic cell of origin. Here, we have compared the neural induction of human iPSCs generated from fetal neural stem cells (fNSCs), dermal fibroblasts, or cord blood CD34+ hematopoietic progenitor cells. Neural progenitor cells (NPCs) and neurons could be generated at similar efficiencies from all iPSCs. Transcriptomics analysis of the whole genome and of neural genes revealed a separation of neuroectoderm-derived iPSC-NPCs from mesoderm-derived iPSC-NPCs. Furthermore, we found genes that were similarly expressed in fNSCs and neuroectoderm-derived, but not in mesoderm-derived, iPSC-NPCs. Notably, these neural signatures were retained after transplantation into the cortex of mice and paralleled by increased survival of neuroectoderm-derived cells in vivo. These results indicate distinct origin-dependent neural cell identities in differentiated human iPSCs both in vitro and in vivo.

  9. Neural correlates of audiovisual speech processing in a second language.

    Science.gov (United States)

    Barrós-Loscertales, Alfonso; Ventura-Campos, Noelia; Visser, Maya; Alsius, Agnès; Pallier, Christophe; Avila Rivera, César; Soto-Faraco, Salvador

    2013-09-01

    Neuroimaging studies of audiovisual speech processing have exclusively addressed listeners' native language (L1). Yet, several behavioural studies now show that AV processing plays an important role in non-native (L2) speech perception. The current fMRI study measured brain activity during auditory, visual, audiovisual congruent and audiovisual incongruent utterances in L1 and L2. BOLD responses to congruent AV speech in the pSTS were stronger than in either unimodal condition in both L1 and L2. Yet no differences in AV processing were expressed according to the language background in this area. Instead, the regions in the bilateral occipital lobe had a stronger congruency effect on the BOLD response (congruent higher than incongruent) in L2 as compared to L1. According to these results, language background differences are predominantly expressed in these unimodal regions, whereas the pSTS is similarly involved in AV integration regardless of language dominance. Copyright © 2013 Elsevier Inc. All rights reserved.

  10. The integration of prosodic speech in high functioning autism: a preliminary FMRI study.

    Directory of Open Access Journals (Sweden)

    Isabelle Hesling

    2010-07-01

    Autism is a neurodevelopmental disorder characterized by a specific triad of symptoms: abnormalities in social interaction, abnormalities in communication, and restricted activities and interests. While verbal autistic subjects may present a correct mastery of the formal aspects of speech, they have difficulties in prosody (the music of speech), leading to communication disorders. A few behavioural studies have revealed a prosodic impairment in children with autism, and among the few fMRI studies aiming at assessing the neural network involved in language, none has specifically studied prosodic speech. The aim of the present study was to characterize specific prosodic components, such as linguistic prosody (intonation, rhythm and emphasis) and emotional prosody, and to correlate them with the neural network underlying them. We used a behavioural test (Profiling Elements of the Prosodic System, PEPS) and fMRI to characterize prosodic deficits and investigate the neural network underlying prosodic processing. Results revealed the existence of a link between perceptive and productive prosodic deficits for some prosodic components (rhythm, emphasis and affect) in HFA, and also revealed that the neural network involved in prosodic speech perception exhibits abnormal activation in the left SMG as compared to controls (activation positively correlated with intonation and emphasis), as well as an absence of deactivation patterns in regions involved in the default mode. These prosodic impairments could result not only from abnormal activation patterns but also from an inability to adequately use the strategy of default-network inhibition; both mechanisms have to be considered when accounting for decreased task performance in High Functioning Autism.

  11. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely on the basis of a binary decision; the end-to-end performance of the digital link thus becomes essentially independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service-provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the…

  12. Adult-like processing of time-compressed speech by newborns: A NIRS study.

    Science.gov (United States)

    Issard, Cécile; Gervain, Judit

    2017-06-01

    Humans can adapt to a wide range of variations in the speech signal, maintaining an invariant representation of the linguistic information it contains. Among them, adaptation to rapid or time-compressed speech has been well studied in adults, but the developmental origin of this capacity remains unknown. Does this ability depend on experience with speech (if yes, as heard in utero or as heard postnatally), with sounds in general or is it experience-independent? Using near-infrared spectroscopy, we show that the newborn brain can discriminate between three different compression rates: normal, i.e. 100% of the original duration, moderately compressed, i.e. 60% of original duration and highly compressed, i.e. 30% of original duration. Even more interestingly, responses to normal and moderately compressed speech are similar, showing a canonical hemodynamic response in the left temporoparietal, right frontal and right temporal cortex, while responses to highly compressed speech are inverted, showing a decrease in oxyhemoglobin concentration. These results mirror those found in adults, who readily adapt to moderately compressed, but not to highly compressed speech, showing that adaptation to time-compressed speech requires little or no experience with speech, and happens at an auditory, and not at a more abstract linguistic level. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  13. From prosodic structure to acoustic saliency: A fMRI investigation of speech rate, clarity, and emphasis

    Science.gov (United States)

    Golfinopoulos, Elisa

    Acoustic variability in fluent speech can arise at many stages in speech production planning and execution. For example, at the phonological encoding stage, the grouping of phonemes into syllables determines which segments are coarticulated and, by consequence, segment-level acoustic variation. Likewise, phonetic encoding, which determines the spatiotemporal extent of articulatory gestures, will affect the acoustic detail of segments. Functional magnetic resonance imaging (fMRI) was used to measure brain activity of fluent adult speakers in four speaking conditions: fast, normal, clear, and emphatic (or stressed) speech. These speech manner changes typically result in acoustic variations that do not change the lexical or semantic identity of productions but do affect the acoustic saliency of phonemes, syllables and/or words. Acoustic responses recorded inside the scanner were assessed quantitatively using eight acoustic measures, and sentence duration was used as a covariate of non-interest in the neuroimaging analysis. Compared to normal speech, emphatic speech was characterized acoustically by a greater difference between stressed and unstressed vowels in intensity, duration, and fundamental frequency, and neurally by increased activity in right middle premotor cortex and supplementary motor area, and bilateral primary sensorimotor cortex. These findings are consistent with right-lateralized motor planning of prosodic variation in emphatic speech. Clear speech involved an increase in average vowel and sentence durations and average vowel spacing, along with increased activity in left middle premotor cortex and bilateral primary sensorimotor cortex. These findings are consistent with an increased reliance on feedforward control, resulting in hyper-articulation, under clear as compared to normal speech. Fast speech was characterized acoustically by reduced sentence duration and average vowel spacing, and neurally by increased activity in left anterior frontal…

  14. On the Perception of Speech Sounds as Biologically Significant Signals

    Science.gov (United States)

    Pisoni, David B.

    2012-01-01

    This paper reviews some of the major evidence and arguments currently available to support the view that human speech perception may require the use of specialized neural mechanisms for perceptual analysis. Experiments using synthetically produced speech signals with adults are briefly summarized, and extensions of these results to infants and other organisms are reviewed, with an emphasis on detailing those aspects of speech perception that may require specialized species-specific processors. Finally, some comments on the role of early experience in perceptual development are provided as an attempt to identify promising areas of new research in speech perception. PMID:399200

  15. Visemic Processing in Audiovisual Discrimination of Natural Speech: A Simultaneous fMRI-EEG Study

    Science.gov (United States)

    Dubois, Cyril; Otzenberger, Helene; Gounot, Daniel; Sock, Rudolph; Metz-Lutz, Marie-Noelle

    2012-01-01

    In a noisy environment, visual perception of articulatory movements improves natural speech intelligibility. Parallel to phonemic processing based on the auditory signal, visemic processing constitutes a counterpart based on "visemes", the distinctive visual units of speech. Aiming at investigating the neural substrates of visemic processing in a…

  16. Visual feedback of tongue movement for novel speech sound learning

    Directory of Open Access Journals (Sweden)

    William F Katz

    2015-11-01

    Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV) information. Second language (L2) learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals). However, little is known about the role of viewing one's own speech articulation processes during speech training. The current study investigated whether real-time visual feedback for tongue movement can improve a speaker's learning of non-native speech sounds. An interactive 3D tongue visualization system based on electromagnetic articulography (EMA) was used in a speech training experiment. Native speakers of American English produced a novel speech sound (/ɖ̠/, a voiced, coronal, palatal stop) before, during, and after trials in which they viewed their own speech movements using the 3D model. Talkers' productions were evaluated using kinematic (tongue-tip spatial positioning) and acoustic (burst spectra) measures. The results indicated a rapid gain in accuracy associated with visual feedback training. The findings are discussed with respect to neural models for multimodal speech processing.

  17. Audiovisual Speech Synchrony Measure: Application to Biometrics

    Directory of Open Access Journals (Sweden)

    Gérard Chollet

    2007-01-01

    Speech is a means of communication which is intrinsically bimodal: the audio signal originates from the dynamics of the articulators. This paper reviews recent work in the field of audiovisual speech and, more specifically, techniques developed to measure the level of correspondence between audio and visual speech. It gives an overview of the most common audio and visual speech front-end processing, of transformations performed on audio, visual, or joint audiovisual feature spaces, and of the actual measures of correspondence between audio and visual speech. Finally, the use of a synchrony measure for biometric identity verification based on talking faces is tested experimentally on the BANCA database.
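
    One concrete instance of such a correspondence measure is sketched below in Python: the correlation, maximized over a few frame lags, between short-term audio energy and a per-frame mouth-opening value. Both feature choices are illustrative assumptions; the paper surveys several alternatives.

        import numpy as np

        def av_synchrony(audio, mouth_open, sr=16000, fps=25, max_lag=5):
            # One audio-energy value per video frame, z-scored, then the best
            # correlation over small lags (visual speech often leads the audio).
            hop = sr // fps
            n = min(len(audio) // hop, len(mouth_open))
            energy = np.array([np.sum(audio[i * hop:(i + 1) * hop] ** 2.0)
                               for i in range(n)])
            e = (energy - energy.mean()) / (energy.std() + 1e-12)
            v = (mouth_open[:n] - np.mean(mouth_open[:n])) / (np.std(mouth_open[:n]) + 1e-12)
            scores = []
            for lag in range(-max_lag, max_lag + 1):
                ea = e[max(0, -lag):n - max(0, lag)]
                va = v[max(0, lag):n - max(0, -lag)]
                scores.append(float(np.mean(ea * va)))
            return max(scores)

        audio = np.random.randn(16000 * 4)   # stand-in for 4 s of audio
        mouth = np.random.rand(25 * 4)       # stand-in for a lip-opening trajectory
        print(av_synchrony(audio, mouth))

    A genuine talking face should score well above a mismatched audio/video pair, which is the basis for using such measures in biometric identity verification.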

  18. Subcortical encoding of speech cues in children with attention deficit hyperactivity disorder.

    Science.gov (United States)

    Jafari, Zahra; Malayeri, Saeed; Rostami, Reza

    2015-02-01

    There is little information about the processing of nonspeech and speech stimuli at the subcortical level in individuals with attention deficit hyperactivity disorder (ADHD). The auditory brainstem response (ABR) provides information about the function of the auditory brainstem pathways. We aimed to investigate subcortical function in the neural encoding of click and speech stimuli in children with ADHD. The subjects included 50 children with ADHD and 34 typically developing (TD) children between the ages of 8 and 12 years. Click-evoked ABR (cABR) and speech-evoked ABR (sABR) with a 40 ms synthetic /da/ syllable stimulus were recorded. Latencies of the cABR waves III and V and the duration of V-Vn (P⩽0.027), and latencies of the sABR waves A, D, E, F and O and the duration of V-A (P⩽0.034), were significantly longer in children with ADHD than in TD children. There were no apparent differences in components of the sustained frequency-following response (FFR). We conclude that children with ADHD have deficits in the temporal neural encoding of both nonspeech and speech stimuli. There is a common dysfunction in the processing of click and speech stimuli at the brainstem level in children with suspected ADHD. Copyright © 2015. Published by Elsevier Ireland Ltd.

  19. Song and speech: examining the link between singing talent and speech imitation ability

    Science.gov (United States)

    Christiner, Markus; Reiterer, Susanne M.

    2013-01-01

    In previous research on speech imitation, musicality and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer's sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and auditory memory with singing fitting better into the category of “speech” on the productive level and “music” on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. (1) Motor flexibility and the ability to sing improve language and musical function. (2) Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. (3) The ability to sing improves the memory span of the auditory working memory. PMID:24319438

  20. Neural tuning to low-level features of speech throughout the perisylvian cortex

    NARCIS (Netherlands)

    Berezutskaya, Y.; Freudenburg, Z.V.; Güçlü, U.; Gerven, M.A.J. van; Ramsey, N.F.

    2017-01-01

    Despite a large body of research, we continue to lack a detailed account of how auditory processing of continuous speech unfolds in the human brain. Previous research showed the propagation of low-level acoustic features of speech from posterior superior temporal gyrus towards anterior superior…

  1. Neural tuning to low-level features of speech throughout the perisylvian cortex

    NARCIS (Netherlands)

    Berezutskaya, Julia; Freudenburg, Zachary V.; Güçlü, Umut; van Gerven, Marcel A.J.; Ramsey, Nick F.

    2017-01-01

    Despite a large body of research, we continue to lack a detailed account of how auditory processing of continuous speech unfolds in the human brain. Previous research showed the propagation of low-level acoustic features of speech from posterior superior temporal gyrus toward anterior superior…

  2. Speech pathology in ancient India--a review of Sanskrit literature.

    Science.gov (United States)

    Savithri, S R

    1987-12-01

    This paper aims at highlighting the knowledge of the Sanskrit scholars of ancient times in the field of speech and language pathology. The information collected here is mainly from the Sanskrit texts written between 2000 B.C. and 1633 A.D. Some aspects of speech and language that have been dealt with in this review have been elaborately described in the original Sanskrit texts. The present paper, however, being limited in its scope, reviews only the essential facts, but not the details. The purpose is only to give a glimpse of the knowledge that the Sanskrit scholars of those times possessed. In brief, this paper is a review of Sanskrit literature for information on the origin and development of speech and language, speech production, normality of speech and language, and disorders of speech and language and their treatment.

  3. Long-term temporal tracking of speech rate affects spoken-word recognition.

    Science.gov (United States)

    Baese-Berk, Melissa M; Heffner, Christopher C; Dilley, Laura C; Pitt, Mark A; Morrill, Tuuli H; McAuley, J Devin

    2014-08-01

    Humans unconsciously track a wide array of distributional characteristics in their sensory environment. Recent research in spoken-language processing has demonstrated that the speech rate surrounding a target region within an utterance influences which words, and how many words, listeners hear later in that utterance. On the basis of hypotheses that listeners track timing information in speech over long timescales, we investigated the possibility that the perception of words is sensitive to speech rate over such a timescale (e.g., an extended conversation). Results demonstrated that listeners tracked variation in the overall pace of speech over an extended duration (analogous to that of a conversation that listeners might have outside the lab) and that this global speech rate influenced which words listeners reported hearing. The effects of speech rate became stronger over time. Our findings are consistent with the hypothesis that neural entrainment by speech occurs on multiple timescales, some lasting more than an hour. © The Author(s) 2014.

  4. Speech Emotion Recognition Using Modified Quadratic Discrimination Function

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    The Quadratic Discrimination Function (QDF) is commonly used in speech emotion recognition, and it proceeds on the premise that the input data are normally distributed. In this paper, we propose a transformation to normalize the emotional features, then derive a Modified QDF (MQDF) for speech emotion recognition. Features based on prosody and voice quality are extracted, and a Principal Component Analysis Neural Network (PCANN) is used to reduce the dimensionality of the feature vectors. The results show that voice quality features are an effective supplement for recognition, and that the method in this paper improves the recognition rate effectively.
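
    The classifier at the core of this approach is a per-class Gaussian quadratic discriminant. Below is a minimal Python sketch under stated substitutions: scikit-learn's ordinary PCA stands in for the paper's PCANN, the data are random placeholders, and the MQDF normalization step itself is omitted.

        import numpy as np
        from sklearn.decomposition import PCA

        def fit_qdf(X, y):
            # Per-class mean, inverse covariance, and log-determinant.
            params = {}
            for c in np.unique(y):
                Xc = X[y == c]
                mu = Xc.mean(axis=0)
                cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
                params[c] = (mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
            return params

        def qdf_score(x, mu, icov, logdet):
            # Gaussian log-likelihood discriminant (higher is better).
            d = x - mu
            return -0.5 * (d @ icov @ d) - 0.5 * logdet

        def classify(X, params):
            return np.array([max(params, key=lambda c: qdf_score(x, *params[c]))
                             for x in X])

        X = np.random.randn(200, 40)                 # stand-in prosody/voice-quality features
        y = np.random.randint(0, 4, 200)             # stand-in emotion labels
        Xp = PCA(n_components=10).fit_transform(X)   # dimensionality reduction
        pred = classify(Xp, fit_qdf(Xp, y))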

  5. Model-Based Synthesis of Visual Speech Movements from 3D Video

    Directory of Open Access Journals (Sweden)

    Edge JamesD

    2009-01-01

    We describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. Speech lip movements are captured using a 3D stereo face capture system and split up into phonetic units. A dynamic parameterisation of this data is constructed which maintains the relationship between lip shapes and velocities; within this parameterisation, a model of how lips move is built and is used in the animation of visual speech movements from speech audio input. The mapping from audio parameters to lip movements is disambiguated by selecting only the most similar stored phonetic units to the target utterance during synthesis. By combining properties of model-based synthesis (e.g., HMMs, neural nets) with unit selection, we improve the quality of our speech synthesis.

  6. Focal versus distributed temporal cortex activity for speech sound category assignment

    Science.gov (United States)

    Bouton, Sophie; Chambon, Valérian; Tyrand, Rémi; Seeck, Margitta; Karkar, Sami; van de Ville, Dimitri; Giraud, Anne-Lise

    2018-01-01

    Percepts and words can be decoded from distributed neural activity measures. However, the existence of widespread representations might conflict with the more classical notions of hierarchical processing and efficient coding, which are especially relevant in speech processing. Using fMRI and magnetoencephalography during syllable identification, we show that sensory and decisional activity colocalize to a restricted part of the posterior superior temporal gyrus (pSTG). Next, using intracortical recordings, we demonstrate that early and focal neural activity in this region distinguishes correct from incorrect decisions and can be machine-decoded to classify syllables. Crucially, significant machine decoding was possible from neuronal activity sampled across different regions of the temporal and frontal lobes, despite weak or absent sensory or decision-related responses. These findings show that speech-sound categorization relies on an efficient readout of focal pSTG neural activity, while more distributed activity patterns, although classifiable by machine learning, instead reflect collateral processes of sensory perception and decision. PMID:29363598

  7. Artificial neural network analysis to assess hypernasality in patients treated for oral or oropharyngeal cancer

    NARCIS (Netherlands)

    de Bruijn, Marieke; ten Bosch, Louis; Kuik, Dirk J.; Langendijk, Johannes A.; Leemans, C. Rene; Verdonck-de Leeuw, Irma

    2011-01-01

    Objective. Investigation of the applicability of neural network feature analysis of nasalance in speech to assess hypernasality in the speech of patients treated for oral or oropharyngeal cancer. Patients and methods. Speech recordings of 51 patients and of 18 control speakers were evaluated regarding…

  8. A Noninvasive Imaging Approach to Understanding Speech Changes following Deep Brain Stimulation in Parkinson's Disease

    Science.gov (United States)

    Narayana, Shalini; Jacks, Adam; Robin, Donald A.; Poizner, Howard; Zhang, Wei; Franklin, Crystal; Liotti, Mario; Vogel, Deanie; Fox, Peter T.

    2009-01-01

    Purpose: To explore the use of noninvasive functional imaging and "virtual" lesion techniques to study the neural mechanisms underlying motor speech disorders in Parkinson's disease. Here, we report the use of positron emission tomography (PET) and transcranial magnetic stimulation (TMS) to explain exacerbated speech impairment following…

  9. Hierarchical Organization of Auditory and Motor Representations in Speech Perception: Evidence from Searchlight Similarity Analysis.

    Science.gov (United States)

    Evans, Samuel; Davis, Matthew H

    2015-12-01

    How humans extract the identity of speech sounds from highly variable acoustic signals remains unclear. Here, we use searchlight representational similarity analysis (RSA) to localize and characterize neural representations of syllables at different levels of the hierarchically organized temporo-frontal pathways for speech perception. We asked participants to listen to spoken syllables that differed considerably in their surface acoustic form by changing speaker and degrading surface acoustics using noise-vocoding and sine wave synthesis while we recorded neural responses with functional magnetic resonance imaging. We found evidence for a graded hierarchy of abstraction across the brain. At the peak of the hierarchy, neural representations in somatomotor cortex encoded syllable identity but not surface acoustic form; at the base of the hierarchy, primary auditory cortex showed the reverse. In contrast, bilateral temporal cortex exhibited an intermediate response, encoding both syllable identity and the surface acoustic form of speech. Regions of somatomotor cortex associated with encoding syllable identity in perception were also engaged when producing the same syllables in a separate session. These findings are consistent with a hierarchical account of how variable acoustic signals are transformed into abstract representations of the identity of speech sounds. © The Author 2015. Published by Oxford University Press.
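
    The computation at the heart of searchlight RSA reduces to building and comparing representational dissimilarity matrices (RDMs). A bare-bones Python version follows; the data are random stand-ins, and the loop over searchlight sphere centers is omitted.

        import numpy as np
        from scipy.spatial.distance import pdist
        from scipy.stats import spearmanr

        def rdm(patterns):
            # Condition-by-condition dissimilarity (1 - Pearson r), condensed form.
            return pdist(patterns, metric="correlation")

        def rsa(neural_patterns, model_rdm):
            # Spearman correlation between the neural RDM and a model RDM.
            return spearmanr(rdm(neural_patterns), model_rdm).correlation

        patterns = np.random.randn(16, 100)     # 16 syllable conditions x 100 voxels
        model = pdist(np.random.randn(16, 1))   # stand-in model RDM (e.g., identity)
        print(rsa(patterns, model))

    Running this inside every searchlight sphere, once against a syllable-identity model and once against an acoustic-form model, is what lets a study like this map where identity versus surface form is encoded.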

  10. Influence of auditory attention on sentence recognition captured by the neural phase.

    Science.gov (United States)

    Müller, Jana Annina; Kollmeier, Birger; Debener, Stefan; Brand, Thomas

    2018-03-07

    The aim of this study was to investigate whether attentional influences on speech recognition are reflected in the neural phase entrained by an external modulator. Sentences were presented in 7 Hz sinusoidally modulated noise while the neural response to that modulation frequency was monitored by electroencephalogram (EEG) recordings in 21 participants. We implemented a selective attention paradigm including three different attention conditions while keeping physical stimulus parameters constant. The participants' task was either to repeat the sentence as accurately as possible (speech recognition task), to count the number of decrements implemented in modulated noise (decrement detection task), or to do both (dual task), while the EEG was recorded. Behavioural analysis revealed reduced performance in the dual task condition for decrement detection, possibly reflecting limited cognitive resources. EEG analysis revealed no significant differences in power for the 7 Hz modulation frequency, but an attention-dependent phase difference between tasks. Further phase analysis revealed a significant difference 500 ms after sentence onset between trials with correct and incorrect responses for speech recognition, indicating that speech recognition performance and the neural phase are linked via selective attention mechanisms, at least shortly after sentence onset. However, the neural phase effects identified were small and await further investigation. © 2018 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.

  11. Visual Input Enhances Selective Speech Envelope Tracking in Auditory Cortex at a ‘Cocktail Party’

    Science.gov (United States)

    Golumbic, Elana Zion; Cogan, Gregory B.; Schroeder, Charles E.; Poeppel, David

    2013-01-01

    Our ability to selectively attend to one auditory signal amidst competing input streams, epitomized by the ‘Cocktail Party’ problem, continues to stimulate research from various approaches. How this demanding perceptual feat is achieved from a neural systems perspective remains unclear and controversial. It is well established that neural responses to attended stimuli are enhanced compared to responses to ignored ones, but responses to ignored stimuli are nonetheless highly significant, leading to interference in performance. We investigated whether congruent visual input of an attended speaker enhances cortical selectivity in auditory cortex, leading to diminished representation of ignored stimuli. We recorded magnetoencephalographic (MEG) signals from human participants as they attended to segments of natural continuous speech. Using two complementary methods of quantifying the neural response to speech, we found that viewing a speaker’s face enhances the capacity of auditory cortex to track the temporal speech envelope of that speaker. This mechanism was most effective in a ‘Cocktail Party’ setting, promoting preferential tracking of the attended speaker, whereas without visual input no significant attentional modulation was observed. These neurophysiological results underscore the importance of visual input in resolving perceptual ambiguity in a noisy environment. Since visual cues in speech precede the associated auditory signals, they likely serve a predictive role in facilitating auditory processing of speech, perhaps by directing attentional resources to appropriate points in time when to-be-attended acoustic input is expected to arrive. PMID:23345218
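
    The envelope-tracking quantity referred to here can be approximated simply: extract the Hilbert envelope of the speech audio, resample it to the neural sampling rate, and correlate it with the recorded signal. The Python sketch below uses random arrays as stand-ins; published analyses typically use regularized regression or coherence rather than a single correlation.

        import numpy as np
        from scipy.signal import hilbert, resample

        def speech_envelope(audio, audio_sr, neural_sr):
            # Amplitude envelope via the analytic signal, downsampled to the
            # neural recording rate.
            env = np.abs(hilbert(audio))
            return resample(env, int(len(audio) * neural_sr / audio_sr))

        def tracking_score(envelope, neural):
            # Pearson correlation as a simple index of envelope tracking.
            n = min(len(envelope), len(neural))
            return float(np.corrcoef(envelope[:n], neural[:n])[0, 1])

        audio_sr, neural_sr = 16000, 200
        audio = np.random.randn(audio_sr * 5)    # stand-in for 5 s of speech
        meg = np.random.randn(neural_sr * 5)     # stand-in for one MEG channel
        print(tracking_score(speech_envelope(audio, audio_sr, neural_sr), meg))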

  12. Speech Evoked Auditory Brainstem Response in Stuttering

    Directory of Open Access Journals (Sweden)

    Ali Akbar Tahaei

    2014-01-01

    Auditory processing deficits have been hypothesized as an underlying mechanism for stuttering. Previous studies have demonstrated abnormal responses in subjects with persistent developmental stuttering (PDS) at the higher levels of the central auditory system using speech stimuli. Recently, the potential usefulness of speech-evoked auditory brainstem responses in central auditory processing disorders has been emphasized. The current study used the speech-evoked ABR to investigate the hypothesis that subjects with PDS have specific auditory perceptual dysfunction. Objectives. To determine whether brainstem responses to speech stimuli differ between PDS subjects and normal fluent speakers. Methods. Twenty-five subjects with PDS participated in this study. The speech-ABRs were elicited by the 5-formant synthesized syllable /da/, with a duration of 40 ms. Results. There were significant group differences for the onset and offset transient peaks. Subjects with PDS had longer latencies for the onset and offset peaks relative to the control group. Conclusions. Subjects with PDS showed deficient neural timing in the early stages of the auditory pathway, consistent with temporal processing deficits; this abnormal timing may underlie their disfluency.

  13. Relative Contributions of the Dorsal vs. Ventral Speech Streams to Speech Perception are Context Dependent: a lesion study

    Directory of Open Access Journals (Sweden)

    Corianne Rogalsky

    2014-04-01

    The neural basis of speech perception has been debated for over a century. While it is generally agreed that the superior temporal lobes are critical for the perceptual analysis of speech, a major current topic is whether the motor system contributes to speech perception, with several conflicting findings attested. In a dorsal-ventral speech stream framework (Hickok & Poeppel, 2007), this debate is essentially about the roles of the dorsal versus ventral speech processing streams. A major roadblock in characterizing the neuroanatomy of speech perception is task-specific effects. For example, much of the evidence for dorsal stream involvement comes from syllable discrimination type tasks, which have been found to behaviorally doubly dissociate from auditory comprehension tasks (Baker et al., 1981). Discrimination task deficits could be a result of difficulty perceiving the sounds themselves, which is the typical assumption, or they could be a result of failures in temporary maintenance of the sensory traces, or in the comparison and/or the decision process. Similar complications arise in perceiving sentences: the extent of inferior frontal (i.e. dorsal stream) activation during listening to sentences increases as a function of increased task demands (Love et al., 2006). Another complication is the stimulus: much evidence for dorsal stream involvement uses speech samples lacking semantic context (CVs, non-words). The present study addresses these issues in a large-scale lesion-symptom mapping study. 158 patients with focal cerebral lesions from the Multi-site Aphasia Research Consortium underwent a structural MRI or CT scan, as well as an extensive psycholinguistic battery. Voxel-based lesion-symptom mapping was used to compare the neuroanatomy involved in the following speech perception tasks with varying phonological, semantic, and task loads: (i) two discrimination tasks of syllables (non-words and words, respectively), and (ii) two auditory comprehension tasks…

  14. Speech perception as an active cognitive process

    Directory of Open Access Journals (Sweden)

    Shannon eHeald

    2014-03-01

    One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided, but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, whether through masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine the cognitive resources recruited during perception, including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated, either through augmentation or…

  15. Effect of attentional load on audiovisual speech perception: Evidence from ERPs

    Directory of Open Access Journals (Sweden)

    Agnès Alsius

    2014-07-01

    Full Text Available Seeing articulatory movements influences perception of auditory speech. This is often reflected in a shortened latency of auditory event-related potentials (ERPs) generated in the auditory cortex. The present study addressed whether this early neural correlate of audiovisual interaction is modulated by attention. We recorded ERPs in 15 subjects while they were presented with auditory, visual and audiovisual spoken syllables. Audiovisual stimuli consisted of incongruent auditory and visual components known to elicit a McGurk effect, i.e. a visually driven alteration in the auditory speech percept. In a Dual task condition, participants were asked to identify spoken syllables whilst monitoring a rapid visual stream of pictures for targets, i.e., they had to divide their attention. In a Single task condition, participants identified the syllables without any other tasks, i.e., they were asked to ignore the pictures and focus their attention fully on the spoken syllables. The McGurk effect was weaker in the Dual task than in the Single task condition, indicating an effect of attentional load on audiovisual speech perception. Early auditory ERP components, N1 and P2, peaked earlier to audiovisual stimuli than to auditory stimuli when attention was fully focused on syllables, indicating neurophysiological audiovisual interaction. This latency decrement was reduced when attention was loaded, suggesting that attention influences early neural processing of audiovisual speech. We conclude that reduced attention weakens the interaction between vision and audition in speech.

  16. Effect of attentional load on audiovisual speech perception: evidence from ERPs.

    Science.gov (United States)

    Alsius, Agnès; Möttönen, Riikka; Sams, Mikko E; Soto-Faraco, Salvador; Tiippana, Kaisa

    2014-01-01

    Seeing articulatory movements influences perception of auditory speech. This is often reflected in a shortened latency of auditory event-related potentials (ERPs) generated in the auditory cortex. The present study addressed whether this early neural correlate of audiovisual interaction is modulated by attention. We recorded ERPs in 15 subjects while they were presented with auditory, visual, and audiovisual spoken syllables. Audiovisual stimuli consisted of incongruent auditory and visual components known to elicit a McGurk effect, i.e., a visually driven alteration in the auditory speech percept. In a Dual task condition, participants were asked to identify spoken syllables whilst monitoring a rapid visual stream of pictures for targets, i.e., they had to divide their attention. In a Single task condition, participants identified the syllables without any other tasks, i.e., they were asked to ignore the pictures and focus their attention fully on the spoken syllables. The McGurk effect was weaker in the Dual task than in the Single task condition, indicating an effect of attentional load on audiovisual speech perception. Early auditory ERP components, N1 and P2, peaked earlier to audiovisual stimuli than to auditory stimuli when attention was fully focused on syllables, indicating neurophysiological audiovisual interaction. This latency decrement was reduced when attention was loaded, suggesting that attention influences early neural processing of audiovisual speech. We conclude that reduced attention weakens the interaction between vision and audition in speech.

  17. Noise and pitch interact during the cortical segregation of concurrent speech.

    Science.gov (United States)

    Bidelman, Gavin M; Yellamsetty, Anusha

    2017-08-01

    Behavioral studies reveal listeners exploit intrinsic differences in voice fundamental frequency (F0) to segregate concurrent speech sounds-the so-called "F0-benefit." More favorable signal-to-noise ratio (SNR) in the environment, an extrinsic acoustic factor, similarly benefits the parsing of simultaneous speech. Here, we examined the neurobiological substrates of these two cues in the perceptual segregation of concurrent speech mixtures. We recorded event-related brain potentials (ERPs) while listeners performed a speeded double-vowel identification task. Listeners heard two concurrent vowels whose F0 differed by zero or four semitones presented in either clean (no noise) or noise-degraded (+5 dB SNR) conditions. Behaviorally, listeners were more accurate in correctly identifying both vowels for larger F0 separations but F0-benefit was more pronounced at more favorable SNRs (i.e., pitch × SNR interaction). Analysis of the ERPs revealed that only the P2 wave (∼200 ms) showed a similar F0 × SNR interaction as behavior and was correlated with listeners' perceptual F0-benefit. Neural classifiers applied to the ERPs further suggested that speech sounds are segregated neurally within 200 ms based on SNR whereas segregation based on pitch occurs later in time (400-700 ms). The earlier timing of extrinsic SNR compared to intrinsic F0-based segregation implies that the cortical extraction of speech from noise is more efficient than differentiating speech based on pitch cues alone, which may recruit additional cortical processes. Findings indicate that noise and pitch differences interact relatively early in cerebral cortex and that the brain arrives at the identities of concurrent speech mixtures as early as ∼200 ms. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Man machine interface based on speech recognition

    International Nuclear Information System (INIS)

    Jorge, Carlos A.F.; Aghina, Mauricio A.C.; Mol, Antonio C.A.; Pereira, Claudio M.N.A.

    2007-01-01

    This work reports the development of a Man Machine Interface based on speech recognition. The system must recognize spoken commands and execute the desired tasks without manual intervention by operators. The range of applications goes from the execution of commands in an industrial plant's control room, to navigation and interaction in virtual environments. Results are reported for isolated word recognition, the isolated words corresponding to the spoken commands. In the pre-processing stage, relevant parameters are extracted from the speech signals using the cepstral analysis technique; these parameters serve as the inputs to an artificial neural network that performs the recognition task. (author)
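
    As background for the cepstral pre-processing stage described above: the real cepstrum is the inverse Fourier transform of the log magnitude spectrum, and its low-quefrency coefficients summarize the vocal-tract envelope that is then fed to the neural network. A minimal sketch, with framing parameters chosen for illustration rather than taken from the paper:

```python
import numpy as np

def cepstral_features(signal, sr=16000, frame_len=400, hop=160, n_coeffs=13):
    """Real-cepstrum features per frame; the low coefficients capture the
    spectral envelope used as neural-network inputs for word recognition."""
    window = np.hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame)) + 1e-10  # avoid log(0)
        cepstrum = np.fft.irfft(np.log(spectrum))      # real cepstrum
        frames.append(cepstrum[:n_coeffs])
    return np.array(frames)  # shape: (n_frames, n_coeffs)
```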

  19. Quadcopter Control Using Speech Recognition

    Science.gov (United States)

    Malik, H.; Darma, S.; Soekirno, S.

    2018-04-01

    This research reports a comparison of the success rates of speech recognition systems using two types of databases, an existing database and a newly created one, implemented as motion control for a quadcopter. The speech recognition system used the Mel-frequency cepstral coefficient (MFCC) method for feature extraction and was trained using the recursive neural network (RNN) method. MFCC is one of the feature extraction methods most used for speech recognition, with reported success rates of 80%-95%. The existing database was used to measure the success rate of the RNN method. The new database was created in the Indonesian language, and its success rate was then compared with the results from the existing database. Sound input from the microphone was processed on a DSP module with the MFCC method to obtain the characteristic values. These characteristic values were then classified by the trained RNN, whose output was a command. The command became a control input to a single-board computer (SBC), whose output drove the movement of the quadcopter. On the SBC, we used the robot operating system (ROS) as the kernel (operating system).
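
    To make the MFCC-plus-RNN pipeline concrete, here is a hedged sketch using librosa for feature extraction and tf.keras for a small recurrent classifier; the layer sizes, five-command vocabulary, and training details are assumptions, not the authors' configuration:

```python
import librosa
import numpy as np
import tensorflow as tf

N_COMMANDS = 5  # e.g., up / down / left / right / land (hypothetical vocabulary)

def mfcc_sequence(path, sr=16000, n_mfcc=13):
    """Load a command recording and return its MFCC frames over time."""
    audio, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc).T  # (T, n_mfcc)

model = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(None, 13)),
    tf.keras.layers.SimpleRNN(64),  # recurrent layer over MFCC frames
    tf.keras.layers.Dense(N_COMMANDS, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(padded_mfcc_batches, command_labels, epochs=30)  # data assumed
```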

  20. Evaluating deep learning architectures for Speech Emotion Recognition.

    Science.gov (United States)

    Fayek, Haytham M; Lech, Margaret; Cavedon, Lawrence

    2017-08-01

    Speech Emotion Recognition (SER) can be regarded as a static or dynamic classification problem, which makes SER an excellent test bed for investigating and comparing various deep learning architectures. We describe a frame-based formulation to SER that relies on minimal speech processing and end-to-end deep learning to model intra-utterance dynamics. We use the proposed SER system to empirically explore feed-forward and recurrent neural network architectures and their variants. Experiments conducted illuminate the advantages and limitations of these architectures in paralinguistic speech recognition and emotion recognition in particular. As a result of our exploration, we report state-of-the-art results on the IEMOCAP database for speaker-independent SER and present quantitative and qualitative assessments of the models' performances. Copyright © 2017 Elsevier Ltd. All rights reserved.
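
    A frame-based formulation of the kind described can be sketched as follows: a feed-forward network classifies each spectral frame, and the per-frame posteriors are averaged into an utterance-level emotion decision. The feature dimensionality and four-class setup below are assumptions, not the authors' exact architecture:

```python
import numpy as np
import tensorflow as tf

N_EMOTIONS, N_FEATS = 4, 40  # hypothetical: 4 emotions, 40-dim log-mel frames

frame_model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(N_FEATS,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(N_EMOTIONS, activation="softmax"),
])
frame_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

def classify_utterance(frames):
    """Average per-frame posteriors over the utterance (frames: (T, N_FEATS))."""
    posteriors = frame_model.predict(frames, verbose=0)
    return int(np.argmax(posteriors.mean(axis=0)))
```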

  1. Noise-robust cortical tracking of attended speech in real-world acoustic scenes

    DEFF Research Database (Denmark)

    Fuglsang, Søren; Dau, Torsten; Hjortkjær, Jens

    2017-01-01

    Selectively attending to one speaker in a multi-speaker scenario is thought to synchronize low-frequency cortical activity to the attended speech signal. In recent studies, reconstruction of speech from single-trial electroencephalogram (EEG) data has been used to decode which talker a listener is attending to in a two-talker situation. It is currently unclear how this generalizes to more complex sound environments. Behaviorally, speech perception is robust to the acoustic distortions that listeners typically encounter in everyday life, but it is unknown whether this is mirrored by a noise-robust neural tracking of attended speech. Here we used advanced acoustic simulations to recreate real-world acoustic scenes in the laboratory. In virtual acoustic realities with varying amounts of reverberation and number of interfering talkers, listeners selectively attended to the speech stream...
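
    Single-trial reconstruction of attended speech from EEG is commonly implemented as a backward model: a regularized linear mapping from time-lagged EEG channels to the speech envelope, with the reconstruction then correlated against each talker's envelope to decode attention. A minimal ridge-regression sketch (the lag range and regularization strength are assumptions):

```python
import numpy as np

def lag_matrix(eeg, max_lag):
    """Stack time-lagged copies of EEG channels: (T, C) -> (T, C * max_lag)."""
    T, C = eeg.shape
    X = np.zeros((T, C * max_lag))
    for lag in range(max_lag):
        X[lag:, lag * C:(lag + 1) * C] = eeg[:T - lag]
    return X

def train_decoder(eeg, envelope, max_lag=32, ridge=1e3):
    """Fit ridge-regularized weights mapping lagged EEG to the envelope."""
    X = lag_matrix(eeg, max_lag)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]),
                           X.T @ envelope)

def decode_attention(eeg, env_a, env_b, weights, max_lag=32):
    """Attended talker = whichever envelope correlates best with the
    reconstruction from this trial's EEG."""
    recon = lag_matrix(eeg, max_lag) @ weights
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return "A" if r_a > r_b else "B"
```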

  2. Relationships between music training, speech processing, and word learning: a network perspective.

    Science.gov (United States)

    Elmer, Stefan; Jäncke, Lutz

    2018-03-15

    Numerous studies have documented the behavioral advantages conferred on professional musicians and children undergoing music training in processing speech sounds varying in the spectral and temporal dimensions. These beneficial effects have previously often been associated with local functional and structural changes in the auditory cortex (AC). However, this perspective is oversimplified, in that it does not take into account the intrinsic organization of the human brain, namely, neural networks and oscillatory dynamics. Therefore, we propose a new framework for extending these previous findings to a network perspective by integrating multimodal imaging, electrophysiology, and neural oscillations. In particular, we provide concrete examples of how functional and structural connectivity can be used to model simple neural circuits exerting a modulatory influence on AC activity. In addition, we describe how such a network approach can be used for better comprehending the beneficial effects of music training on more complex speech functions, such as word learning. © 2018 New York Academy of Sciences.

  3. Identifying Cortical Lateralization of Speech Processing in Infants Using Near-Infrared Spectroscopy

    Science.gov (United States)

    Bortfeld, Heather; Fava, Eswen; Boas, David A.

    2010-01-01

    We investigate the utility of near-infrared spectroscopy (NIRS) as an alternative technique for studying infant speech processing. NIRS is an optical imaging technology that uses relative changes in total hemoglobin concentration and oxygenation as an indicator of neural activation. Procedurally, NIRS has the advantage over more common methods (e.g., fMRI) in that it can be used to study the neural responses of behaviorally active infants. Older infants (aged 6–9 months) were allowed to sit on their caretakers’ laps during stimulus presentation to determine relative differences in focal activity in the temporal region of the brain during speech processing. Results revealed a dissociation of sensory-specific processing in two cortical regions, the left and right temporal lobes. These findings are consistent with those obtained using other neurophysiological methods and point to the utility of NIRS as a means of establishing neural correlates of language development in older (and more active) infants. PMID:19142766

  4. Specialization in audiovisual speech perception: a replication study

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual as evidenced by bimodal integration in the McGurk effect. This integration effect may be specific to speech or be applied to all stimuli in general. To investigate this, Tuomainen et al. (2005) used sine-wave speech, which naïve observers may perceive as non-speech, but hear as speech once informed of the linguistic origin of the signal. Combinations of sine-wave speech and incongruent video of the talker elicited a McGurk effect only for informed observers. This indicates that the audiovisual integration effect is specific to speech perception. However, observers...... that observers did look near the mouth. We conclude that eye-movements did not influence the results of Tuomainen et al. and that their results thus can be taken as evidence of a speech-specific mode of audiovisual integration underlying the McGurk illusion....

  5. The role of high-level processes for oscillatory phase entrainment to speech sound

    Directory of Open Access Journals (Sweden)

    Benedikt Zoefel

    2015-12-01

    Full Text Available Constantly bombarded with input, the brain has the need to filter out relevant information while ignoring the irrelevant rest. A powerful tool may be represented by neural oscillations which entrain their high-excitability phase to important input while their low-excitability phase attenuates irrelevant information. Indeed, the alignment between brain oscillations and speech improves intelligibility and helps dissociating speakers during a cocktail party. Although well-investigated, the contribution of low- and high-level processes to phase entrainment to speech sound has only recently begun to be understood. Here, we review those findings, and concentrate on three main results: (1) Phase entrainment to speech sound is modulated by attention or predictions, likely supported by top-down signals and indicating higher-level processes involved in the brain’s adjustment to speech. (2) As phase entrainment to speech can be observed without systematic fluctuations in sound amplitude or spectral content, it does not only reflect a passive steady-state ringing of the cochlea, but entails a higher-level process. (3) The role of intelligibility for phase entrainment is debated. Recent results suggest that intelligibility modulates the behavioral consequences of entrainment, rather than directly affecting the strength of entrainment in auditory regions. We conclude that phase entrainment to speech reflects a sophisticated mechanism: Several high-level processes interact to optimally align neural oscillations with predicted events of high relevance, even when they are hidden in a continuous stream of background noise.
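
    Phase entrainment of the kind reviewed above is typically quantified by the consistency of oscillatory phase across trials, e.g., the intertrial phase coherence of band-passed activity. A minimal sketch using the Hilbert transform (the theta-band edges are an assumption for illustration):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def intertrial_phase_coherence(trials, sr, band=(4.0, 8.0)):
    """trials: (n_trials, n_samples) array of single-trial signals.
    Returns coherence over time in the given band: 1 = phase perfectly
    aligned across trials, 0 = random phase."""
    nyq = sr / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
    filtered = filtfilt(b, a, trials, axis=1)          # band-pass each trial
    phases = np.angle(hilbert(filtered, axis=1))       # instantaneous phase
    return np.abs(np.exp(1j * phases).mean(axis=0))    # length of mean vector
```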

  6. Mild developmental foreign accent syndrome and psychiatric comorbidity: Altered white matter integrity in speech and emotion regulation networks

    Directory of Open Access Journals (Sweden)

    Marcelo L Berthier

    2016-08-01

    Full Text Available Foreign accent syndrome (FAS) is a speech disorder that is defined by the emergence of a peculiar manner of articulation and intonation which is perceived as foreign. In most cases of acquired FAS (AFAS) the new accent is secondary to small focal lesions involving components of the bilaterally distributed neural network for speech production. In the past few years FAS has also been described in different psychiatric conditions (conversion disorder, bipolar disorder, schizophrenia) as well as in developmental disorders (specific language impairment, apraxia of speech). In the present study, two adult males, one with atypical phonetic production and the other one with cluttering, reported having developmental FAS (DFAS) since their adolescence. Perceptual analysis by naïve judges could not confirm the presence of foreign accent, possibly due to the mildness of the speech disorder. However, detailed linguistic analysis provided evidence of prosodic and segmental errors previously reported in AFAS cases. Cognitive testing showed reduced communication in activities of daily living and mild deficits related to psychiatric disorders. Psychiatric evaluation revealed long-lasting internalizing disorders (neuroticism, anxiety, obsessive-compulsive disorder, social phobia, depression, alexithymia, hopelessness, and apathy) in both subjects. Diffusion tensor imaging (DTI) data from each subject with DFAS were compared with data from a group of 21 age- and gender-matched healthy control subjects. Diffusion parameters (MD, AD, and RD) in predefined regions of interest showed changes of white matter microstructure in regions previously related with AFAS and psychiatric disorders. In conclusion, the present findings militate against the possibility that these two subjects have FAS of psychogenic origin. Rather, our findings provide evidence that mild DFAS occurring in the context of subtle, yet persistent, developmental speech disorders may be associated with

  7. The development and validation of the speech quality instrument.

    Science.gov (United States)

    Chen, Stephanie Y; Griffin, Brianna M; Mancuso, Dean; Shiau, Stephanie; DiMattia, Michelle; Cellum, Ilana; Harvey Boyd, Kelly; Prevoteau, Charlotte; Kohlberg, Gavriel D; Spitzer, Jaclyn B; Lalwani, Anil K

    2017-12-08

    Although speech perception tests are available to evaluate hearing, there is no standardized validated tool to quantify speech quality. The objective of this study is to develop a validated tool to measure quality of speech heard. Prospective instrument validation study of 35 normal-hearing adults recruited at a tertiary referral center. Participants listened to 44 speech clips of male/female voices reciting the Rainbow Passage. Speech clips included original and manipulated excerpts capturing goal qualities such as mechanical and garbled. Listeners rated clips on a 10-point visual analog scale (VAS) of 18 characteristics (e.g. cartoonish, garbled). Skewed distribution analysis identified mean ratings in the upper and lower 2-point limits of the VAS (ratings of 8-10, 0-2, respectively); items with inconsistent responses were eliminated. The test was pruned to a final instrument of nine speech clips that clearly define qualities of interest: speech-like, male/female, cartoonish, echo-y, garbled, tinny, mechanical, rough, breathy, soothing, hoarse, like, pleasant, natural. Mean ratings were highest for original female clips (8.8) and lowest for not-speech manipulation (2.1). Factor analysis identified two subsets of characteristics: internal consistency demonstrated Cronbach's alpha of 0.95 and 0.82 per subset. Test-retest reliability of total scores was high, with an intraclass correlation coefficient of 0.76. The Speech Quality Instrument (SQI) is a concise, valid tool for assessing speech quality as an indicator for hearing performance. SQI may be a valuable outcome measure for cochlear implant recipients who, despite achieving excellent speech perception, often experience poor speech quality. Level of Evidence: 2b. Laryngoscope, 2017. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.
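
    For reference, the internal-consistency statistic reported for the SQI subsets, Cronbach's alpha, is straightforward to compute from a listeners-by-items score matrix. A minimal sketch of the standard formula (variable names are hypothetical):

```python
import numpy as np

def cronbach_alpha(ratings):
    """ratings: (n_listeners, n_items) matrix of VAS scores.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1).sum()
    total_var = ratings.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)
```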

  8. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ...-Speech Services for Individuals with Hearing and Speech Disabilities, Report and Order (Order), document...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...

  9. Intonational speech prosody encoding in the human auditory cortex.

    Science.gov (United States)

    Tang, C; Hamilton, L S; Chang, E F

    2017-08-25

    Speakers of all human languages regularly use intonational pitch to convey linguistic meaning, such as to emphasize a particular word. Listeners extract pitch movements from speech and evaluate the shape of intonation contours independent of each speaker's pitch range. We used high-density electrocorticography to record neural population activity directly from the brain surface while participants listened to sentences that varied in intonational pitch contour, phonetic content, and speaker. Cortical activity at single electrodes over the human superior temporal gyrus selectively represented intonation contours. These electrodes were intermixed with, yet functionally distinct from, sites that encoded different information about phonetic features or speaker identity. Furthermore, the representation of intonation contours directly reflected the encoding of speaker-normalized relative pitch but not absolute pitch. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

  10. Partially overlapping sensorimotor networks underlie speech praxis and verbal short-term memory: evidence from apraxia of speech following acute stroke.

    Science.gov (United States)

    Hickok, Gregory; Rogalsky, Corianne; Chen, Rong; Herskovits, Edward H; Townsley, Sarah; Hillis, Argye E

    2014-01-01

    We tested the hypothesis that motor planning and programming of speech articulation and verbal short-term memory (vSTM) depend on partially overlapping networks of neural regions. We evaluated this proposal by testing 76 individuals with acute ischemic stroke for impairment in motor planning of speech articulation (apraxia of speech, AOS) and vSTM in the first day of stroke, before the opportunity for recovery or reorganization of structure-function relationships. We also evaluated areas of both infarct and low blood flow that might have contributed to AOS or impaired vSTM in each person. We found that AOS was associated with tissue dysfunction in motor-related areas (posterior primary motor cortex, pars opercularis; premotor cortex, insula) and sensory-related areas (primary somatosensory cortex, secondary somatosensory cortex, parietal operculum/auditory cortex); while impaired vSTM was associated with primarily motor-related areas (pars opercularis and pars triangularis, premotor cortex, and primary motor cortex). These results are consistent with the hypothesis, also supported by functional imaging data, that both speech praxis and vSTM rely on partially overlapping networks of brain regions.

  11. Why the Left Hemisphere Is Dominant for Speech Production: Connecting the Dots

    Directory of Open Access Journals (Sweden)

    Harvey Martin Sussman

    2015-12-01

    Full Text Available Evidence from seemingly disparate areas of speech/language research is reviewed to form a unified theoretical account for why the left hemisphere is specialized for speech production. Research findings from studies investigating hemispheric lateralization of infant babbling, the primacy of the syllable in phonological structure, rhyming performance in split-brain patients, rhyming ability and phonetic categorization in children diagnosed with developmental apraxia of speech, rules governing exchange errors in spoonerisms, organizational principles of neocortical control of learned motor behaviors, and multi-electrode recordings of human neuronal responses to speech sounds are described and common threads highlighted. It is suggested that the emergence, in developmental neurogenesis, of a hard-wired, syllabically-organized, neural substrate representing the phonemic sound elements of one’s language, particularly the vocalic nucleus, is the crucial factor underlying the left hemisphere’s dominance for speech production.

  12. Deep Learning Neural Networks in Cybersecurity - Managing Malware with AI

    OpenAIRE

    Rayle, Keith

    2017-01-01

    There’s a lot of talk about the benefits of deep learning (neural networks) and how it’s the new electricity that will power us into the future. Medical diagnosis, computer vision and speech recognition are all examples of use-cases where neural networks are being applied in our everyday business environment. This begs the question…what are the uses of neural-network applications for cyber security? How does the AI process work when applying neural networks to detect malicious software bombar...

  13. Cascaded bidirectional recurrent neural networks for protein secondary structure prediction.

    Science.gov (United States)

    Chen, Jinmiao; Chaudhari, Narendra

    2007-01-01

    Protein secondary structure (PSS) prediction is an important topic in bioinformatics. Our study on a large set of non-homologous proteins shows that long-range interactions commonly exist and negatively affect PSS prediction. Besides, we also reveal strong correlations between secondary structure (SS) elements. In order to take into account the long-range interactions and SS-SS correlations, we propose a novel prediction system based on cascaded bidirectional recurrent neural network (BRNN). We compare the cascaded BRNN against another two BRNN architectures, namely the original BRNN architecture used for speech recognition as well as Pollastri's BRNN that was proposed for PSS prediction. Our cascaded BRNN achieves an overall three state accuracy Q3 of 74.38%, and reaches a high Segment OVerlap (SOV) of 66.0455. It outperforms the original BRNN and Pollastri's BRNN in both Q3 and SOV. Specifically, it improves the SOV score by 4-6%.
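
    The cascaded architecture itself is not reproduced here, but the bidirectional recurrent building block it stacks can be sketched with tf.keras; one stage is shown, mapping a one-hot residue sequence to per-position secondary-structure probabilities (layer sizes and the GRU cell are assumptions, not the paper's design):

```python
import tensorflow as tf

N_AA, N_SS = 20, 3  # 20 amino acids in, 3 secondary-structure states out

# One bidirectional recurrent stage; the paper's cascaded variant stacks a
# second BRNN on the first stage's outputs to capture SS-SS correlations.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, N_AA)),  # one-hot residue sequence
    tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(64, return_sequences=True)),
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(N_SS, activation="softmax")),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```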

  14. The DIVA model: A neural theory of speech acquisition and production.

    Science.gov (United States)

    Tourville, Jason A; Guenther, Frank H

    2011-01-01

    The DIVA model of speech production provides a computationally and neuroanatomically explicit account of the network of brain regions involved in speech acquisition and production. An overview of the model is provided along with descriptions of the computations performed in the different brain regions represented in the model. The latest version of the model, which contains a new right-lateralized feedback control map in ventral premotor cortex, will be described, and experimental results that motivated this new model component will be discussed. Application of the model to the study and treatment of communication disorders will also be briefly described.

  15. A universal multilingual weightless neural network tagger via quantitative linguistics.

    Science.gov (United States)

    Carneiro, Hugo C C; Pedreira, Carlos E; França, Felipe M G; Lima, Priscila M V

    2017-07-01

    In the last decade, given the availability of corpora in several distinct languages, research on multilingual part-of-speech tagging started to grow. Amongst the novelties there is mWANN-Tagger (multilingual weightless artificial neural network tagger), a weightless neural part-of-speech tagger capable of being used for mostly-suffix-oriented languages. The tagger was subjected to corpora in eight languages of quite distinct natures and had a remarkable accuracy with very low sample deviation in every one of them, indicating the robustness of weightless neural systems for part-of-speech tagging tasks. However, mWANN-Tagger needed to be tuned for every new corpus, since each one required a different parameter configuration. For mWANN-Tagger to be truly multilingual, it should be usable for any new language with no need of parameter tuning. This article proposes a study that aims to find a relation between the lexical diversity of a language and the parameter configuration that would produce the best performing mWANN-Tagger instance. Preliminary analyses suggested that a single parameter configuration may be applied to the eight aforementioned languages. The mWANN-Tagger instance produced by this configuration was as accurate as the language-dependent ones obtained through tuning. Afterwards, the weightless neural tagger was further subjected to new corpora in languages that range from very isolating to polysynthetic ones. The best performing instances of mWANN-Tagger are again the ones produced by the universal parameter configuration. Hence, mWANN-Tagger can be applied to new corpora with no need of parameter tuning, making it a universal multilingual part-of-speech tagger. Further experiments with Universal Dependencies treebanks reveal that mWANN-Tagger may be extended and that it has potential to outperform most state-of-the-art part-of-speech taggers if better word representations are provided. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Binaural speech discrimination under noise in hearing-impaired listeners

    Science.gov (United States)

    Kumar, K. V.; Rao, A. B.

    1988-01-01

    This paper presents the results of an assessment of speech discrimination by hearing-impaired listeners (sensorineural, conductive, and mixed groups) under binaural free-field listening in the presence of background noise. Subjects with pure-tone thresholds greater than 20 dB at 0.5, 1.0, and 2.0 kHz were presented with a version of the W-22 list of phonetically balanced words under three conditions: (1) 'quiet', with the chamber noise below 28 dB and speech at 60 dB; (2) at a constant S/N ratio of +10 dB, with a background white noise at 70 dB; and (3) same as condition (2), but with the background noise at 80 dB. The mean speech discrimination scores decreased significantly with noise in all groups. However, the decrease in binaural speech discrimination scores with an increase in hearing impairment was less for material presented under the noise conditions than for the material presented in quiet.

  17. Age-related changes in the functional neuroanatomy of overt speech production.

    Science.gov (United States)

    Sörös, Peter; Bose, Arpita; Sokoloff, Lisa Guttman; Graham, Simon J; Stuss, Donald T

    2011-08-01

    Alterations of existing neural networks during healthy aging, resulting in behavioral deficits and changes in brain activity, have been described for cognitive, motor, and sensory functions. To investigate age-related changes in the neural circuitry underlying overt non-lexical speech production, functional MRI was performed in 14 healthy younger (21-32 years) and 14 healthy older individuals (62-84 years). The experimental task involved the acoustically cued overt production of the vowel /a/ and the polysyllabic utterance /pataka/. In younger and older individuals, overt speech production was associated with the activation of a widespread articulo-phonological network, including the primary motor cortex, the supplementary motor area, the cingulate motor areas, and the posterior superior temporal cortex, similar in the /a/ and /pataka/ condition. An analysis of variance with the factors age and condition revealed a significant main effect of age. Irrespective of the experimental condition, significantly greater activation was found in the bilateral posterior superior temporal cortex, the posterior temporal plane, and the transverse temporal gyri in younger compared to older individuals. Significantly greater activation was found in the bilateral middle temporal gyri, medial frontal gyri, middle frontal gyri, and inferior frontal gyri in older vs. younger individuals. The analysis of variance did not reveal a significant main effect of condition and no significant interaction of age and condition. These results suggest a complex reorganization of neural networks dedicated to the production of speech during healthy aging. Copyright © 2009 Elsevier Inc. All rights reserved.

  18. Neuroscience-inspired computational systems for speech recognition under noisy conditions

    Science.gov (United States)

    Schafer, Phillip B.

    Humans routinely recognize speech in challenging acoustic environments with background music, engine sounds, competing talkers, and other acoustic noise. However, today's automatic speech recognition (ASR) systems perform poorly in such environments. In this dissertation, I present novel methods for ASR designed to approach human-level performance by emulating the brain's processing of sounds. I exploit recent advances in auditory neuroscience to compute neuron-based representations of speech, and design novel methods for decoding these representations to produce word transcriptions. I begin by considering speech representations modeled on the spectrotemporal receptive fields of auditory neurons. These representations can be tuned to optimize a variety of objective functions, which characterize the response properties of a neural population. I propose an objective function that explicitly optimizes the noise invariance of the neural responses, and find that it gives improved performance on an ASR task in noise compared to other objectives. The method as a whole, however, fails to significantly close the performance gap with humans. I next consider speech representations that make use of spiking model neurons. The neurons in this method are feature detectors that selectively respond to spectrotemporal patterns within short time windows in speech. I consider a number of methods for training the response properties of the neurons. In particular, I present a method using linear support vector machines (SVMs) and show that this method produces spikes that are robust to additive noise. I compute the spectrotemporal receptive fields of the neurons for comparison with previous physiological results. To decode the spike-based speech representations, I propose two methods designed to work on isolated word recordings. The first method uses a classical ASR technique based on the hidden Markov model. The second method is a novel template-based recognition scheme that takes
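
    The SVM-trained feature detectors described above can be approximated with off-the-shelf tools: a linear SVM learns a spectrotemporal template over a short window, and a "spike" is emitted wherever the decision function crosses a threshold as the template slides over time. A minimal scikit-learn sketch (window length and threshold are assumptions):

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_detector(windows, labels):
    """windows: (n_examples, freq * time) flattened spectrotemporal patches;
    labels: 1 where the target pattern is present, 0 elsewhere."""
    clf = LinearSVC(C=1.0)
    clf.fit(windows, labels)
    return clf

def spike_times(clf, spectrogram, win=10, threshold=0.0):
    """Slide the learned template over time; emit a spike at every window
    whose SVM decision value exceeds the threshold."""
    n_freq, n_time = spectrogram.shape
    spikes = []
    for t in range(n_time - win):
        patch = spectrogram[:, t:t + win].reshape(1, -1)
        if clf.decision_function(patch)[0] > threshold:
            spikes.append(t)
    return spikes
```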

  19. Speech-to-Speech Relay Service

    Science.gov (United States)

    Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...

  20. Altered resting-state network connectivity in stroke patients with and without apraxia of speech.

    Science.gov (United States)

    New, Anneliese B; Robin, Donald A; Parkinson, Amy L; Duffy, Joseph R; McNeil, Malcolm R; Piguet, Olivier; Hornberger, Michael; Price, Cathy J; Eickhoff, Simon B; Ballard, Kirrie J

    2015-01-01

    Motor speech disorders, including apraxia of speech (AOS), account for over 50% of the communication disorders following stroke. Given its prevalence and impact, and the need to understand its neural mechanisms, we used resting state functional MRI to examine functional connectivity within a network of regions previously hypothesized as being associated with AOS (bilateral anterior insula (aINS), inferior frontal gyrus (IFG), and ventral premotor cortex (PM)) in a group of 32 left hemisphere stroke patients and 18 healthy, age-matched controls. Two expert clinicians rated severity of AOS, dysarthria and nonverbal oral apraxia of the patients. Fifteen individuals were categorized as AOS and 17 were AOS-absent. Comparison of connectivity in patients with and without AOS demonstrated that AOS patients had reduced connectivity between bilateral PM, and this reduction correlated with the severity of AOS impairment. In addition, AOS patients had negative connectivity between the left PM and right aINS and this effect decreased with increasing severity of non-verbal oral apraxia. These results highlight left PM involvement in AOS, begin to differentiate its neural mechanisms from those of other motor impairments following stroke, and help inform us of the neural mechanisms driving differences in speech motor planning and programming impairment following stroke.

  1. Altered resting-state network connectivity in stroke patients with and without apraxia of speech

    Directory of Open Access Journals (Sweden)

    Anneliese B. New

    2015-01-01

    Full Text Available Motor speech disorders, including apraxia of speech (AOS), account for over 50% of the communication disorders following stroke. Given its prevalence and impact, and the need to understand its neural mechanisms, we used resting state functional MRI to examine functional connectivity within a network of regions previously hypothesized as being associated with AOS (bilateral anterior insula (aINS), inferior frontal gyrus (IFG), and ventral premotor cortex (PM)) in a group of 32 left hemisphere stroke patients and 18 healthy, age-matched controls. Two expert clinicians rated severity of AOS, dysarthria and nonverbal oral apraxia of the patients. Fifteen individuals were categorized as AOS and 17 were AOS-absent. Comparison of connectivity in patients with and without AOS demonstrated that AOS patients had reduced connectivity between bilateral PM, and this reduction correlated with the severity of AOS impairment. In addition, AOS patients had negative connectivity between the left PM and right aINS and this effect decreased with increasing severity of non-verbal oral apraxia. These results highlight left PM involvement in AOS, begin to differentiate its neural mechanisms from those of other motor impairments following stroke, and help inform us of the neural mechanisms driving differences in speech motor planning and programming impairment following stroke.

  2. Inner Speech: Development, Cognitive Functions, Phenomenology, and Neurobiology

    Science.gov (United States)

    2015-01-01

    Inner speech—also known as covert speech or verbal thinking—has been implicated in theories of cognitive development, speech monitoring, executive function, and psychopathology. Despite a growing body of knowledge on its phenomenology, development, and function, approaches to the scientific study of inner speech have remained diffuse and largely unintegrated. This review examines prominent theoretical approaches to inner speech and methodological challenges in its study, before reviewing current evidence on inner speech in children and adults from both typical and atypical populations. We conclude by considering prospects for an integrated cognitive science of inner speech, and present a multicomponent model of the phenomenon informed by developmental, cognitive, and psycholinguistic considerations. Despite its variability among individuals and across the life span, inner speech appears to perform significant functions in human cognition, which in some cases reflect its developmental origins and its sharing of resources with other cognitive processes. PMID:26011789

  3. Neural Processing of What and Who Information in Speech

    Science.gov (United States)

    Chandrasekaran, Bharath; Chan, Alice H. D.; Wong, Patrick C. M.

    2011-01-01

    Human speech is composed of two types of information, related to content (lexical information, i.e., "what" is being said [e.g., words]) and to the speaker (indexical information, i.e., "who" is talking [e.g., voices]). The extent to which lexical versus indexical information is represented separately or integrally in the brain is unresolved. In…

  4. Development of a Low-Cost, Noninvasive, Portable Visual Speech Recognition Program.

    Science.gov (United States)

    Kohlberg, Gavriel D; Gal, Ya'akov Kobi; Lalwani, Anil K

    2016-09-01

    Loss of speech following tracheostomy and laryngectomy severely limits communication to simple gestures and facial expressions that are largely ineffective. To facilitate communication in these patients, we seek to develop a low-cost, noninvasive, portable, and simple visual speech recognition program (VSRP) to convert articulatory facial movements into speech. A Microsoft Kinect-based VSRP was developed to capture spatial coordinates of lip movements and translate them into speech. The articulatory speech movements associated with 12 sentences were used to train an artificial neural network classifier. The accuracy of the classifier was then evaluated on a separate, previously unseen set of articulatory speech movements. The VSRP was successfully implemented and tested in 5 subjects. It achieved an accuracy rate of 77.2% (65.0%-87.6% for the 5 speakers) on a 12-sentence data set. The mean time to classify an individual sentence was 2.03 milliseconds (1.91-2.16). We have demonstrated the feasibility of a low-cost, noninvasive, portable VSRP based on Kinect to accurately predict speech from articulation movements in clinically trivial time. This VSRP could be used as a novel communication device for aphonic patients. © The Author(s) 2016.
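
    A classifier of the kind the record describes, mapping lip-landmark trajectories to one of 12 sentences, can be sketched with scikit-learn; the landmark count, trajectory resampling, and network size below are assumptions rather than the authors' Kinect configuration:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

N_LANDMARKS = 18  # hypothetical number of tracked lip points per frame
N_FRAMES = 30     # trajectories resampled to a fixed length

def flatten_trajectory(coords):
    """coords: (frames, N_LANDMARKS, 3) lip coordinates over one utterance.
    Resample to N_FRAMES frames and flatten to a fixed-length vector."""
    coords = np.asarray(coords)
    idx = np.linspace(0, len(coords) - 1, N_FRAMES).astype(int)
    return coords[idx].reshape(-1)

clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500)
# X = np.stack([flatten_trajectory(c) for c in recordings])
# y = sentence_ids  # closed set of 12 sentences, as in the record
# clf.fit(X, y)     # training data assumed
```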

  5. Language/Culture Modulates Brain and Gaze Processes in Audiovisual Speech Perception.

    Science.gov (United States)

    Hisanaga, Satoko; Sekiyama, Kaoru; Igasaki, Tomohiko; Murayama, Nobuki

    2016-10-13

    Several behavioural studies have shown that the interplay between voice and face information in audiovisual speech perception is not universal. Native English speakers (ESs) are influenced by visual mouth movement to a greater degree than native Japanese speakers (JSs) when listening to speech. However, the biological basis of these group differences is unknown. Here, we demonstrate the time-varying processes of group differences in terms of event-related brain potentials (ERP) and eye gaze for audiovisual and audio-only speech perception. On a behavioural level, while congruent mouth movement shortened the ESs' response time for speech perception, the opposite effect was observed in JSs. Eye-tracking data revealed a gaze bias to the mouth for the ESs but not the JSs, especially before the audio onset. Additionally, the ERP P2 amplitude indicated that ESs processed multisensory speech more efficiently than auditory-only speech; however, the JSs exhibited the opposite pattern. Taken together, the ESs' early visual attention to the mouth was likely to promote phonetic anticipation, which was not the case for the JSs. These results clearly indicate the impact of language and/or culture on multisensory speech processing, suggesting that linguistic/cultural experiences lead to the development of unique neural systems for audiovisual speech perception.

  6. Evolution of non-speech sound memory in postlingual deafness: implications for cochlear implant rehabilitation.

    Science.gov (United States)

    Lazard, D S; Giraud, A L; Truy, E; Lee, H J

    2011-07-01

    Neurofunctional patterns assessed before or after cochlear implantation (CI) are informative markers of implantation outcome. Because phonological memory reorganization in post-lingual deafness is predictive of the outcome, we investigated, using a cross-sectional approach, whether memory of non-speech sounds (NSS) produced by animals or objects (i.e. non-human sounds) is also reorganized, and how this relates to speech perception after CI. We used an fMRI auditory imagery task in which sounds were evoked by pictures of noisy items for post-lingual deaf candidates for CI and for normal-hearing subjects. When deaf subjects imagined sounds, the left inferior frontal gyrus, the right posterior temporal gyrus and the right amygdala were less activated compared to controls. Activity levels in these regions decreased with duration of auditory deprivation, indicating declining NSS representations. Whole brain correlations with duration of auditory deprivation and with speech scores after CI showed an activity decline in dorsal, fronto-parietal, cortical regions, and an activity increase in ventral cortical regions, the right anterior temporal pole and the hippocampal gyrus. Both dorsal and ventral reorganizations predicted poor speech perception outcome after CI. These results suggest that post-CI speech perception relies, at least partially, on the integrity of a neural system used for processing NSS that is based on audio-visual and articulatory mapping processes. When this neural system is reorganized, post-lingual deaf subjects resort to inefficient semantic- and memory-based strategies. These results complement those of other studies on speech processing, suggesting that both speech and NSS representations need to be maintained during deafness to ensure the success of CI. Copyright © 2011 Elsevier Ltd. All rights reserved.

  7. Brainstem encoding of speech and musical stimuli in congenital amusia: Evidence from Cantonese speakers

    Directory of Open Access Journals (Sweden)

    Fang Liu

    2015-01-01

    Full Text Available Congenital amusia is a neurodevelopmental disorder of musical processing that also impacts subtle aspects of speech processing. It remains debated at what stage(s) of auditory processing deficits in amusia arise. In this study, we investigated whether amusia originates from impaired subcortical encoding of speech (in quiet and noise) and musical sounds in the brainstem. Fourteen Cantonese-speaking amusics and 14 matched controls passively listened to six Cantonese lexical tones in quiet, two Cantonese tones in noise (signal-to-noise ratios at 0 and 20 dB), and two cello tones in quiet while their frequency-following responses (FFRs) to these tones were recorded. All participants also completed a behavioral lexical tone identification task. The results indicated normal brainstem encoding of pitch in speech (in quiet and noise) and musical stimuli in amusics relative to controls, as measured by FFR pitch strength, pitch error, and stimulus-to-response correlation. There was also no group difference in neural conduction time or FFR amplitudes. Both groups demonstrated better FFRs to speech (in quiet and noise) than to musical stimuli. However, a significant group difference was observed for tone identification, with amusics showing significantly lower accuracy than controls. Analysis of the tone confusion matrices suggested that amusics were more likely than controls to confuse between tones that shared similar acoustic features. Interestingly, this deficit in lexical tone identification was not coupled with brainstem abnormality for either speech or musical stimuli. Together, our results suggest that the amusic brainstem is not functioning abnormally, although higher-order linguistic pitch processing is impaired in amusia. This finding has significant implications for theories of central auditory processing, requiring further investigations into how different stages of auditory processing interact in the human brain.

  8. Brainstem encoding of speech and musical stimuli in congenital amusia: evidence from Cantonese speakers.

    Science.gov (United States)

    Liu, Fang; Maggu, Akshay R; Lau, Joseph C Y; Wong, Patrick C M

    2014-01-01

    Congenital amusia is a neurodevelopmental disorder of musical processing that also impacts subtle aspects of speech processing. It remains debated at what stage(s) of auditory processing deficits in amusia arise. In this study, we investigated whether amusia originates from impaired subcortical encoding of speech (in quiet and noise) and musical sounds in the brainstem. Fourteen Cantonese-speaking amusics and 14 matched controls passively listened to six Cantonese lexical tones in quiet, two Cantonese tones in noise (signal-to-noise ratios at 0 and 20 dB), and two cello tones in quiet while their frequency-following responses (FFRs) to these tones were recorded. All participants also completed a behavioral lexical tone identification task. The results indicated normal brainstem encoding of pitch in speech (in quiet and noise) and musical stimuli in amusics relative to controls, as measured by FFR pitch strength, pitch error, and stimulus-to-response correlation. There was also no group difference in neural conduction time or FFR amplitudes. Both groups demonstrated better FFRs to speech (in quiet and noise) than to musical stimuli. However, a significant group difference was observed for tone identification, with amusics showing significantly lower accuracy than controls. Analysis of the tone confusion matrices suggested that amusics were more likely than controls to confuse between tones that shared similar acoustic features. Interestingly, this deficit in lexical tone identification was not coupled with brainstem abnormality for either speech or musical stimuli. Together, our results suggest that the amusic brainstem is not functioning abnormally, although higher-order linguistic pitch processing is impaired in amusia. This finding has significant implications for theories of central auditory processing, requiring further investigations into how different stages of auditory processing interact in the human brain.

  9. Brainstem encoding of speech and musical stimuli in congenital amusia: evidence from Cantonese speakers

    Science.gov (United States)

    Liu, Fang; Maggu, Akshay R.; Lau, Joseph C. Y.; Wong, Patrick C. M.

    2015-01-01

    Congenital amusia is a neurodevelopmental disorder of musical processing that also impacts subtle aspects of speech processing. It remains debated at what stage(s) of auditory processing deficits in amusia arise. In this study, we investigated whether amusia originates from impaired subcortical encoding of speech (in quiet and noise) and musical sounds in the brainstem. Fourteen Cantonese-speaking amusics and 14 matched controls passively listened to six Cantonese lexical tones in quiet, two Cantonese tones in noise (signal-to-noise ratios at 0 and 20 dB), and two cello tones in quiet while their frequency-following responses (FFRs) to these tones were recorded. All participants also completed a behavioral lexical tone identification task. The results indicated normal brainstem encoding of pitch in speech (in quiet and noise) and musical stimuli in amusics relative to controls, as measured by FFR pitch strength, pitch error, and stimulus-to-response correlation. There was also no group difference in neural conduction time or FFR amplitudes. Both groups demonstrated better FFRs to speech (in quiet and noise) than to musical stimuli. However, a significant group difference was observed for tone identification, with amusics showing significantly lower accuracy than controls. Analysis of the tone confusion matrices suggested that amusics were more likely than controls to confuse between tones that shared similar acoustic features. Interestingly, this deficit in lexical tone identification was not coupled with brainstem abnormality for either speech or musical stimuli. Together, our results suggest that the amusic brainstem is not functioning abnormally, although higher-order linguistic pitch processing is impaired in amusia. This finding has significant implications for theories of central auditory processing, requiring further investigations into how different stages of auditory processing interact in the human brain. PMID:25646077
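
    The FFR pitch-strength measure used in these studies is commonly computed as the peak of the normalized autocorrelation of the response waveform within the plausible pitch-period range. A minimal sketch (the F0 search range is an assumption):

```python
import numpy as np

def ffr_pitch_strength(response, sr, f0_range=(80.0, 400.0)):
    """Peak of the normalized autocorrelation within the plausible
    pitch-period range; values near 1 indicate strong, periodic
    phase-locking to the stimulus F0."""
    x = response - response.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags
    ac = ac / ac[0]                                    # lag-0 normalized to 1
    lo = int(sr / f0_range[1])                         # shortest pitch period
    hi = int(sr / f0_range[0])                         # longest pitch period
    return ac[lo:hi].max()
```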

  10. Processing of prosodic changes in natural speech stimuli in school-age children.

    Science.gov (United States)

    Lindström, R; Lepistö, T; Makkonen, T; Kujala, T

    2012-12-01

    Speech prosody conveys information about important aspects of communication: the meaning of the sentence and the emotional state or intention of the speaker. The present study addressed processing of emotional prosodic changes in natural speech stimuli in school-age children (mean age 10 years) by recording the electroencephalogram, facial electromyography, and behavioral responses. The stimulus was a semantically neutral Finnish word uttered with four different emotional connotations: neutral, commanding, sad, and scornful. In the behavioral sound-discrimination task the reaction times were fastest for the commanding stimulus and longest for the scornful stimulus, and faster for the neutral than for the sad stimulus. EEG and EMG responses were measured during a non-attentive oddball paradigm. Prosodic changes elicited a negative-going, fronto-centrally distributed neural response peaking at about 500 ms from the onset of the stimulus, followed by a fronto-central positive deflection, peaking at about 740 ms. For the commanding stimulus, a rapid negative deflection peaking at about 290 ms from stimulus onset was also elicited. No reliable stimulus-type-specific rapid facial reactions were found. The results show that prosodic changes in natural speech stimuli activate pre-attentive neural change-detection mechanisms in school-age children. However, the results do not support the suggestion of automaticity of emotion-specific facial muscle responses to non-attended emotional speech stimuli in children. Copyright © 2012 Elsevier B.V. All rights reserved.

  11. Concurrent Speech Segregation Problems in Hearing Impaired Children

    Directory of Open Access Journals (Sweden)

    Hossein Talebi

    2014-04-01

    Full Text Available Objective: This study was a basic investigation of the ability to segregate concurrent speech in hearing-impaired children. Concurrent segregation is one of the fundamental components of auditory scene analysis and plays an important role in speech perception. In the present study, we compared auditory late responses (ALRs) between hearing-impaired and normal children. Materials & Methods: Auditory late potentials in response to 12 double vowels were recorded in 10 children with moderate to severe sensorineural hearing loss and 10 normal children. Double vowels (pairs of synthetic vowels) were presented concurrently and binaurally. The fundamental frequency (F0) of these vowels was 100 Hz, and the difference in F0 between the vowels was 0.5 semitones. Results: Comparison of N1-P2 amplitudes showed statistically significant differences between hearing-impaired and normal children for some stimuli (P<0.05). This complex, which indexes vowel-change detection and reflects central auditory speech representation without active client participation, was reduced in hearing-impaired children. Conclusion: This study showed problems in concurrent speech segregation in hearing-impaired children, as evidenced by ALRs. These findings indicate deficiencies in the bottom-up processing of speech characteristics based on F0 and its differences in these children.

  12. Knockdown of the dyslexia-associated gene Kiaa0319 impairs temporal responses to speech stimuli in rat primary auditory cortex.

    Science.gov (United States)

    Centanni, T M; Booker, A B; Sloan, A M; Chen, F; Maher, B J; Carraway, R S; Khodaparast, N; Rennaker, R; LoTurco, J J; Kilgard, M P

    2014-07-01

    One in 15 school-age children has dyslexia, which is characterized by phoneme-processing problems and difficulty learning to read. Dyslexia is associated with mutations in the gene KIAA0319. It is not known whether reduced expression of KIAA0319 can degrade the brain's ability to process phonemes. In the current study, we used RNA interference (RNAi) to reduce expression of Kiaa0319 (the rat homolog of the human gene KIAA0319) and evaluate the effect in a rat model of phoneme discrimination. Speech discrimination thresholds in normal rats are nearly identical to human thresholds. We recorded multiunit neural responses to isolated speech sounds in primary auditory cortex (A1) of rats that received in utero RNAi of Kiaa0319. Reduced expression of Kiaa0319 increased the trial-by-trial variability of speech responses and reduced the neural discrimination ability of speech sounds. Intracellular recordings from affected neurons revealed that reduced expression of Kiaa0319 increased neural excitability and input resistance. These results provide the first evidence that decreased expression of the dyslexia-associated gene Kiaa0319 can alter cortical responses and impair phoneme processing in auditory cortex. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. Perceived Conventionality in Co-speech Gestures Involves the Fronto-Temporal Language Network

    Directory of Open Access Journals (Sweden)

    Dhana Wolf

    2017-11-01

    Full Text Available Face-to-face communication is multimodal; it encompasses spoken words, facial expressions, gaze, and co-speech gestures. In contrast to linguistic symbols (e.g., spoken words or signs in sign language) relying on mostly explicit conventions, gestures vary in their degree of conventionality. Bodily signs may have a generally accepted or conventionalized meaning (e.g., a head shake) or less so (e.g., self-grooming). We hypothesized that subjective perception of conventionality in co-speech gestures relies on the classical language network, i.e., the left hemispheric inferior frontal gyrus (IFG, Broca's area) and the posterior superior temporal gyrus (pSTG, Wernicke's area), and studied 36 subjects watching video-recorded story retellings during a behavioral and a functional magnetic resonance imaging (fMRI) experiment. It is well documented that neural correlates of such naturalistic videos emerge as intersubject covariance (ISC) in fMRI even without involving a stimulus (model-free analysis). The subjects attended either to perceived conventionality or to a control condition (any hand movements or gesture-speech relations). Such tasks modulate ISC in contributing neural structures and thus we studied ISC changes to task demands in language networks. Indeed, the conventionality task significantly increased covariance of the button press time series and neuronal synchronization in the left IFG over the comparison with other tasks. In the left IFG, synchronous activity was observed during the conventionality task only. In contrast, the left pSTG exhibited correlated activation patterns during all conditions with an increase in the conventionality task at the trend level only. Conceivably, the left IFG can be considered a core region for the processing of perceived conventionality in co-speech gestures similar to spoken language. In general, the interpretation of conventionalized signs may rely on neural mechanisms that engage during language comprehension.

  14. A functional near-infrared spectroscopic investigation of speech production during reading.

    Science.gov (United States)

    Wan, Nick; Hancock, Allison S; Moon, Todd K; Gillam, Ronald B

    2018-03-01

    This study was designed to test the extent to which speaking processes related to articulation and voicing influence Functional Near Infrared Spectroscopy (fNIRS) measures of cortical hemodynamics and functional connectivity. Participants read passages in three conditions (oral reading, silent mouthing, and silent reading) while undergoing fNIRS imaging. Area under the curve (AUC) analyses of the oxygenated and deoxygenated hemodynamic response function concentration values were compared for each task across five regions of interest. There were significant region main effects for both oxy and deoxy AUC analyses, and a significant region × task interaction for deoxy AUC favoring the oral reading condition over the silent reading condition for two nonmotor regions. Assessment of functional connectivity using Granger Causality revealed stronger networks between motor areas during oral reading and stronger networks between language areas during silent reading. There was no evidence that the hemodynamic flow from motor areas during oral reading compromised measures of language-related neural activity in nonmotor areas. However, speech movements had small, but measurable effects on fNIRS measures of neural connections between motor and nonmotor brain areas across the perisylvian region, even after wavelet filtering. Therefore, researchers studying speech processes with fNIRS should use wavelet filtering during preprocessing to reduce speech motion artifacts, incorporate a nonspeech communication or language control task into the research design, and conduct a connectivity analysis to adequately assess the impact of functional speech on the hemodynamic response across the perisylvian region. © 2017 Wiley Periodicals, Inc.
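
    The Granger-causality connectivity analysis used here asks whether one channel's past improves prediction of another channel's future. A minimal sketch with statsmodels on synthetic signals standing in for fNIRS channels (the lag structure and coefficients are invented):

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(1)
n = 500
motor = rng.standard_normal(n)          # stand-in motor-area channel
language = np.zeros(n)                  # stand-in language-area channel
for t in range(2, n):                   # language channel lags motor by 2 samples
    language[t] = 0.6 * motor[t - 2] + 0.3 * rng.standard_normal()

# statsmodels convention: test whether the SECOND column Granger-causes the FIRST.
data = np.column_stack([language, motor])
results = grangercausalitytests(data, maxlag=3, verbose=False)
for lag, (tests, _) in results.items():
    print(f"lag {lag}: ssr F-test p = {tests['ssr_ftest'][1]:.3g}")
```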

  15. Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem.

    Directory of Open Access Journals (Sweden)

    Cai Wingfield

    2017-09-01

    There is widespread interest in the relationship between the neurobiological systems supporting human cognition and emerging computational systems capable of emulating these capacities. Human speech comprehension, poorly understood as a neurobiological process, is an important case in point. Automatic Speech Recognition (ASR) systems with near-human levels of performance are now available, which provide a computationally explicit solution for the recognition of words in continuous speech. This research aims to bridge the gap between speech recognition processes in humans and machines, using novel multivariate techniques to compare incremental 'machine states', generated as the ASR analysis progresses over time, to the incremental 'brain states', measured using combined electro- and magneto-encephalography (EMEG), generated as the same inputs are heard by human listeners. This direct comparison of dynamic human and machine internal states, as they respond to the same incrementally delivered sensory input, revealed a significant correspondence between neural response patterns in human superior temporal cortex and the structural properties of ASR-derived phonetic models. Spatially coherent patches in human temporal cortex responded selectively to individual phonetic features defined on the basis of machine-extracted regularities in the speech-to-lexicon mapping process. These results demonstrate the feasibility of relating human and ASR solutions to the problem of speech recognition, and suggest the potential for further studies relating complex neural computations in human speech comprehension to the rapidly evolving ASR systems that address the same problem domain.
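
    The multivariate comparison of incremental machine states and brain states is in the spirit of representational similarity analysis: each system is reduced to a matrix of pairwise pattern dissimilarities over the same stimuli, and the two matrices are correlated. A schematic sketch with random stand-in data (not the authors' EMEG pipeline):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_stimuli = 40
brain = rng.standard_normal((n_stimuli, 120))      # e.g., EMEG source patterns
machine = brain @ rng.standard_normal((120, 50))   # ASR-derived features,
                                                   # correlated by construction

# Representational dissimilarity matrices (condensed upper-triangle form).
rdm_brain = pdist(brain, metric="correlation")
rdm_machine = pdist(machine, metric="correlation")

rho, p = spearmanr(rdm_brain, rdm_machine)
print(f"brain-machine correspondence: Spearman rho = {rho:.3f} (p = {p:.3g})")
```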

  16. Spatial resolution dependence on spectral frequency in human speech cortex electrocorticography

    Science.gov (United States)

    Muller, Leah; Hamilton, Liberty S.; Edwards, Erik; Bouchard, Kristofer E.; Chang, Edward F.

    2016-10-01

    Objective. Electrocorticography (ECoG) has become an important tool in human neuroscience and has tremendous potential for emerging applications in neural interface technology. Electrode array design parameters are outstanding issues for both research and clinical applications, and these parameters depend critically on the nature of the neural signals to be recorded. Here, we investigate the functional spatial resolution of neural signals recorded at the human cortical surface. We empirically derive spatial spread functions to quantify the shared neural activity for each frequency band of the electrocorticogram. Approach. Five subjects with high-density (4 mm center-to-center spacing) ECoG grid implants participated in speech perception and production tasks while neural activity was recorded from the speech cortex, including superior temporal gyrus, precentral gyrus, and postcentral gyrus. The cortical surface field potential was decomposed into traditional EEG frequency bands. Signal similarity between electrode pairs for each frequency band was quantified using a Pearson correlation coefficient. Main results. The correlation of neural activity between electrode pairs was inversely related to the distance between the electrodes; this relationship was used to quantify spatial falloff functions for cortical subdomains. As expected, lower frequencies remained correlated over larger distances than higher frequencies. However, both the envelope and phase of gamma and high gamma frequencies (30-150 Hz) are largely uncorrelated (<90%) at 4 mm, the smallest spacing of the high-density arrays. Thus, ECoG arrays smaller than 4 mm have significant promise for increasing signal resolution at high frequencies, whereas less additional gain is achieved for lower frequencies. Significance. Our findings quantitatively demonstrate the dependence of ECoG spatial resolution on the neural frequency of interest. We demonstrate that this relationship is consistent across patients and
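
    The spatial spread functions described here can be estimated by correlating every electrode pair within a frequency band and fitting correlation against inter-electrode distance. A toy version with synthetic grid data and an assumed exponential falloff (grid size, length scale, and fitting form are invented for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)
# 4 mm-pitch 8x8 grid; signals decorrelate with distance by construction.
coords = np.array([(4.0 * i, 4.0 * j) for i in range(8) for j in range(8)])
n_ch, n_t, length_scale = len(coords), 2000, 6.0
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
cov = np.exp(-dist / length_scale)               # ground-truth spatial covariance
signals = np.linalg.cholesky(cov + 1e-9 * np.eye(n_ch)) @ rng.standard_normal((n_ch, n_t))

r = np.corrcoef(signals)                         # pairwise signal similarity
iu = np.triu_indices(n_ch, k=1)
d, rho = dist[iu], r[iu]

def falloff(x, s):                               # one-parameter exponential falloff
    return np.exp(-x / s)

(s_hat,), _ = curve_fit(falloff, d, rho, p0=[5.0])
print(f"estimated spatial length scale: {s_hat:.2f} mm")
```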

  17. Student Internet Speech: Where Does the Schoolyard End in the Cyberworld?

    Science.gov (United States)

    Denny, Thomas D.

    2013-01-01

    This study examines student internet speech that originates off-campus but results in discipline from school. The history of the issue of student speech is explored to set the foundation for the current issue. In the absence of a Supreme Court ruling on student off-campus internet speech, cases reaching the Federal level are explored in search of…

  18. Partially Overlapping Sensorimotor Networks Underlie Speech Praxis and Verbal Short-Term Memory: Evidence from Apraxia of Speech Following Acute Stroke

    Directory of Open Access Journals (Sweden)

    Gregory Hickok

    2014-08-01

    We tested the hypothesis that motor planning and programming of speech articulation and verbal short-term memory (vSTM) depend on partially overlapping networks of neural regions. We evaluated this proposal by testing 76 individuals with acute ischemic stroke for impairment in motor planning of speech articulation (apraxia of speech, AOS) and vSTM in the first day of stroke, before the opportunity for recovery or reorganization of structure-function relationships. We also evaluated areas of both infarct and low blood flow that might have contributed to AOS or impaired vSTM in each person. We found that AOS was associated with tissue dysfunction in motor-related areas (posterior primary motor cortex, pars opercularis, premotor cortex, insula) and sensory-related areas (primary somatosensory cortex, secondary somatosensory cortex, parietal operculum/auditory cortex), while impaired vSTM was associated with primarily motor-related areas (pars opercularis and pars triangularis, premotor cortex, and primary motor cortex). These results are consistent with the hypothesis, also supported by functional imaging data, that both speech praxis and vSTM rely on partially overlapping networks of brain regions.

  19. Dissociating speech perception and comprehension at reduced levels of awareness

    Science.gov (United States)

    Davis, Matthew H.; Coleman, Martin R.; Absalom, Anthony R.; Rodd, Jennifer M.; Johnsrude, Ingrid S.; Matta, Basil F.; Owen, Adrian M.; Menon, David K.

    2007-01-01

    We used functional MRI and the anesthetic agent propofol to assess the relationship among neural responses to speech, successful comprehension, and conscious awareness. Volunteers were scanned while listening to sentences containing ambiguous words, matched sentences without ambiguous words, and signal-correlated noise (SCN). During three scanning sessions, participants were nonsedated (awake), lightly sedated (a slowed response to conversation), and deeply sedated (no conversational response, rousable by loud command). Bilateral temporal-lobe responses for sentences compared with signal-correlated noise were observed at all three levels of sedation, although prefrontal and premotor responses to speech were absent at the deepest level of sedation. Additional inferior frontal and posterior temporal responses to ambiguous sentences provide a neural correlate of semantic processes critical for comprehending sentences containing ambiguous words. However, this additional response was absent during light sedation, suggesting a marked impairment of sentence comprehension. A significant decline in postscan recognition memory for sentences also suggests that sedation impaired encoding of sentences into memory, with left inferior frontal and temporal lobe responses during light sedation predicting subsequent recognition memory. These findings suggest a graded degradation of cognitive function in response to sedation such that “higher-level” semantic and mnemonic processes can be impaired at relatively low levels of sedation, whereas perceptual processing of speech remains resilient even during deep sedation. These results have important implications for understanding the relationship between speech comprehension and awareness in the healthy brain, in patients receiving sedation, and in patients with disorders of consciousness. PMID:17938125

  20. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System.

    Science.gov (United States)

    Partila, Pavol; Voznak, Miroslav; Tovarek, Jaromir

    2015-01-01

    The impact of classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the computational complexity of the system, a step that is necessary especially for systems to be deployed in real-time applications. The motivation for developing and improving speech emotion recognition systems is their wide applicability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. Classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of an accurate and efficient speech emotion recognition system.
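
    The classifier comparison described in this record can be prototyped with scikit-learn. The sketch below contrasts a small neural network, k-nearest neighbours, and per-class Gaussian mixture models on stand-in feature vectors; a real system would extract prosodic, spectral, and voice-quality features (e.g., MFCCs) from the Berlin recordings instead:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Stand-in feature vectors for four emotion classes.
X, y = make_classification(n_samples=600, n_features=24, n_informative=12,
                           n_classes=4, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0).fit(Xtr, ytr)
knn = KNeighborsClassifier(n_neighbors=5).fit(Xtr, ytr)

# GMM as a generative classifier: one mixture per class, predict by max log-likelihood.
gmms = {c: GaussianMixture(n_components=2, random_state=0).fit(Xtr[ytr == c])
        for c in np.unique(ytr)}
loglik = np.column_stack([gmms[c].score_samples(Xte) for c in sorted(gmms)])
gmm_acc = (loglik.argmax(axis=1) == yte).mean()

print(f"ANN: {ann.score(Xte, yte):.3f}  kNN: {knn.score(Xte, yte):.3f}  GMM: {gmm_acc:.3f}")
```

    Treating the GMM as a generative classifier (one mixture per class, prediction by maximum log-likelihood) mirrors its typical use in emotion and speaker recognition.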

  1. Neural origins of psychosocial functioning impairments in major depression.

    Science.gov (United States)

    Pulcu, Erdem; Elliott, Rebecca

    2015-09-01

    Major depressive disorder, a complex neuropsychiatric condition, is associated with psychosocial functioning impairments that can become chronic even after symptoms remit. Social functioning impairments in patients can also pose coping difficulties for the individuals around them. In this Personal View, we trace the potential neurobiological origins of these impairments to three candidate domains, namely social perception and emotion processing, motivation and reward value processing, and social decision making. We argue that the neural basis of abnormalities in these domains could be detectable at different temporal stages during social interactions (e.g., before and after decision stages), particularly within frontomesolimbic networks (i.e., frontostriatal and amygdala-striatal circuitries). We review some of the experimental designs used to probe these circuits and suggest novel, integrative approaches. We propose that an understanding of the interactions between these domains could provide valuable insights for the clinical stratification of major depressive disorder subtypes and might, in turn, inform future development of novel treatment options. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. Neural network models of categorical perception.

    Science.gov (United States)

    Damper, R I; Harnad, S R

    2000-05-01

    Studies of the categorical perception (CP) of sensory continua have a long and rich history in psychophysics. In 1977, Macmillan, Kaplan, and Creelman introduced the use of signal detection theory to CP studies. Anderson and colleagues simultaneously proposed the first neural model for CP, yet this line of research has been less well explored. In this paper, we assess the ability of neural-network models of CP to predict the psychophysical performance of real observers with speech sounds and artificial/novel stimuli. We show that a variety of neural mechanisms are capable of generating the characteristics of CP. Hence, CP may not be a special mode of perception but an emergent property of any sufficiently powerful general learning system.

  3. Self-organizing map classifier for stressed speech recognition

    Science.gov (United States)

    Partila, Pavol; Tovarek, Jaromir; Voznak, Miroslav

    2016-05-01

    This paper presents a method for detecting speech under stress using self-organizing maps. Most people who are exposed to stressful situations cannot respond adequately to stimuli. The army, police, and fire services account for the largest share of occupational environments with an increased number of stressful situations, and personnel in action are directed from a control center. Control commands should therefore be adapted to the psychological state of the person in the field. It is known that psychological changes in the human body are also reflected physiologically, which in turn means that stress affects speech. It is therefore clear that a system for recognizing stress in speech is needed in the security forces. One possible classifier, popular for its flexibility, is the self-organizing map (SOM), a type of artificial neural network. Flexibility here means that the classifier is independent of the character of the input data, a feature well suited to speech processing. Human stress can be seen as a kind of emotional state. Mel-frequency cepstral coefficients, LPC coefficients, and prosody features were selected as input data because of their sensitivity to emotional changes. The parameters were calculated from speech recordings divided into two classes, namely stress-state recordings and normal-state recordings. The contribution of the experiment is a method using a SOM classifier for stressed-speech detection. Results showed the advantage of this method, namely its flexibility with respect to the input data.
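
    A SOM classifier of the kind proposed here can be sketched with the third-party minisom package: train the map on feature vectors, label each map unit by the majority class of the training vectors it wins, and classify new frames by their best-matching unit. Feature values below are synthetic placeholders, not the paper's recordings:

```python
import numpy as np
from collections import Counter, defaultdict
from minisom import MiniSom   # third-party package: pip install minisom

rng = np.random.default_rng(4)
# Stand-in MFCC/LPC/prosody feature vectors: class 0 = normal, class 1 = stressed.
X = np.vstack([rng.normal(0.0, 1.0, (200, 16)), rng.normal(1.5, 1.0, (200, 16))])
y = np.array([0] * 200 + [1] * 200)

som = MiniSom(8, 8, 16, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 5000)

# Label each map unit by the majority class of the samples it wins.
votes = defaultdict(Counter)
for xi, yi in zip(X, y):
    votes[som.winner(xi)][yi] += 1
unit_label = {u: c.most_common(1)[0][0] for u, c in votes.items()}

def classify(x):
    # Fall back to "normal" for units never activated during training.
    return unit_label.get(som.winner(x), 0)

acc = np.mean([classify(xi) == yi for xi, yi in zip(X, y)])
print(f"training-set accuracy of the SOM classifier: {acc:.3f}")
```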

  4. An ALE meta-analysis on the audiovisual integration of speech signals.

    Science.gov (United States)

    Erickson, Laura C; Heeg, Elizabeth; Rauschecker, Josef P; Turkeltaub, Peter E

    2014-11-01

    The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain (1) equivalent, complementary signals (validating AV speech) or (2) inconsistent, different signals (conflicting AV speech). This simple framework may allow the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation (ALE) meta-analysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining "conflicting" versus "validating" AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sublexical to sentence). Colocalization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal stream areas likely involved in the resolution of conflicting sensory signals. Copyright © 2014 Wiley Periodicals, Inc.

  5. Musical intervention enhances infants’ neural processing of temporal structure in music and speech

    OpenAIRE

    Zhao, T. Christina; Kuhl, Patricia K.

    2016-01-01

    Musicians show enhanced musical pitch and meter processing, effects that generalize to speech. Yet potential differences between musicians and nonmusicians limit conclusions. We examined the effects of a randomized laboratory-controlled music intervention on music and speech processing in 9-mo-old infants. The Intervention exposed infants to music in triple meter (the waltz) in a social environment. Controls engaged in similar social play without music. After 12 sessions, infants’ temporal in...

  6. TIMLOGORO - AN INTERACTIVE PLATFORM DESIGN FOR SPEECH THERAPY

    Directory of Open Access Journals (Sweden)

    Georgeta PÂNIȘOARĂ

    2016-12-01

    This article presents some technical and pedagogical features of an interactive platform used for language therapy. The Timlogoro project demonstrates that technology is an effective tool in learning and, in particular, a viable solution for improving speech disorders present at different ages. A digital platform for different categories of users with speech impairments (children and adults) is well grounded in pedagogical principles. In speech therapy, the computer was originally used to assess deficiencies; nowadays it has become a useful tool in language rehabilitation. A few Romanian speech therapists create digital applications to be used in therapy for recovery. This work was supported by a grant of the Romanian National Authority for Scientific Research (UEFISCDI).

  7. Altered resting-state network connectivity in stroke patients with and without apraxia of speech

    OpenAIRE

    New, Anneliese B.; Robin, Donald A.; Parkinson, Amy L.; Duffy, Joseph R.; McNeil, Malcom R.; Piguet, Olivier; Hornberger, Michael; Price, Cathy J.; Eickhoff, Simon B.; Ballard, Kirrie J.

    2015-01-01

    Motor speech disorders, including apraxia of speech (AOS), account for over 50% of the communication disorders following stroke. Given its prevalence and impact, and the need to understand its neural mechanisms, we used resting state functional MRI to examine functional connectivity within a network of regions previously hypothesized as being associated with AOS (bilateral anterior insula (aINS), inferior frontal gyrus (IFG), and ventral premotor cortex (PM)) in a group of 32 left hemisphere ...

  8. The equilibrium point hypothesis and its application to speech motor control.

    Science.gov (United States)

    Perrier, P; Ostry, D J; Laboissière, R

    1996-04-01

    In this paper, we address a number of issues in speech research in the context of the equilibrium point hypothesis of motor control. The hypothesis suggests that movements arise from shifts in the equilibrium position of the limb or the speech articulator. The equilibrium is a consequence of the interaction of central neural commands, reflex mechanisms, muscle properties, and external loads, but it is under the control of central neural commands. These commands act to shift the equilibrium via centrally specified signals acting at the level of the motoneurone (MN) pool. In the context of a model of sagittal plane jaw and hyoid motion based on the lambda version of the equilibrium point hypothesis, we consider the implications of this hypothesis for the notion of articulatory targets. We suggest that simple linear control signals may underlie smooth articulatory trajectories. We explore as well the phenomenon of intraarticulator coarticulation in jaw movement. We suggest that, even when no account is taken of upcoming context, apparent anticipatory changes in movement amplitude and duration may arise due to dynamics. We also present a number of simulations that show in different ways how variability in measured kinematics can arise in spite of constant magnitude speech control signals.
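
    The core claim, that smooth articulatory trajectories can emerge from simple, piecewise-linear shifts of a centrally specified equilibrium, is easy to illustrate with a damped mass-spring model of a single articulator (parameter values are illustrative and not fitted to jaw data):

```python
import numpy as np

# Second-order articulator dynamics: m*x'' = -k*(x - x_eq(t)) - b*x'
m, k, b = 1.0, 60.0, 10.0            # mass, stiffness, damping (arbitrary units)
dt, T = 1e-3, 0.6
t = np.arange(0.0, T, dt)

# Equilibrium command: linear shift from 0 to 1 between 100 ms and 250 ms.
x_eq = np.clip((t - 0.10) / 0.15, 0.0, 1.0)

x, v = 0.0, 0.0
traj = np.empty_like(t)
for i in range(len(t)):              # forward-Euler integration
    a = (-k * (x - x_eq[i]) - b * v) / m
    v += a * dt
    x += v * dt
    traj[i] = x

# Despite the piecewise-linear command, traj is a smooth movement
# with a bell-shaped velocity profile.
peak_v = np.max(np.abs(np.diff(traj) / dt))
print(f"final position: {traj[-1]:.3f}, peak velocity: {peak_v:.2f} units/s")
```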

  9. DEVELOPMENT OF AUTOMATED SPEECH RECOGNITION SYSTEM FOR EGYPTIAN ARABIC PHONE CONVERSATIONS

    Directory of Open Access Journals (Sweden)

    A. N. Romanenko

    2016-07-01

    The paper describes several speech recognition systems for Egyptian Colloquial Arabic. The research is based on the CALLHOME Egyptian corpus. Both systems are described: a classic one based on hidden Markov and Gaussian mixture models, and a state-of-the-art one based on deep neural network acoustic models. We demonstrate the contribution of speaker-dependent bottleneck features, for whose extraction three neural-network-based extractors were trained using datasets in several languages: Russian, English, and different Arabic dialects. We also studied the possibility of applying a small Modern Standard Arabic (MSA) corpus to derive phonetic transcriptions. The experiments showed that applying the extractor trained on the Russian dataset significantly increases the quality of Arabic speech recognition, whereas using phonetic transcriptions based on Modern Standard Arabic decreases recognition quality. Nevertheless, the system's results remain applicable in practice. In addition, we studied the application of the obtained models to the keyword search problem. The resulting systems demonstrate good results compared to those published before. Some ways to improve speech recognition are offered.

  10. Hidden Neural Networks: A Framework for HMM/NN Hybrids

    DEFF Research Database (Denmark)

    Riis, Søren Kamaric; Krogh, Anders Stærmose

    1997-01-01

    This paper presents a general framework for hybrids of hidden Markov models (HMM) and neural networks (NN). In the new framework called hidden neural networks (HNN) the usual HMM probability parameters are replaced by neural network outputs. To ensure a probabilistic interpretation the HNN is nor...... HMMs on TIMIT continuous speech recognition benchmarks. On the task of recognizing five broad phoneme classes an accuracy of 84% is obtained compared to 76% for a standard HMM. Additionally, we report a preliminary result of 69% accuracy on the TIMIT 39 phoneme task...

  11. Noise-robust speech triage.

    Science.gov (United States)

    Bartos, Anthony L; Cipr, Tomas; Nelson, Douglas J; Schwarz, Petr; Banowetz, John; Jerabek, Ladislav

    2018-04-01

    A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
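
    A speech activity detector for low-SNR conditions can be approximated by tracking frame energy against an estimated noise floor. The sketch below is a deliberately simple envelope-based SAD, not the authors' algorithm; frame sizes and thresholds are arbitrary choices:

```python
import numpy as np

def simple_sad(x, sr, frame_ms=25, hop_ms=10, margin_db=6.0):
    """Flag frames whose energy exceeds the noise floor by `margin_db`.

    The noise floor is estimated as a low percentile of frame energies,
    which tolerates recordings that are mostly speech-free.
    """
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame) // hop)
    energy_db = np.array([
        10 * np.log10(np.mean(x[i * hop:i * hop + frame] ** 2) + 1e-12)
        for i in range(n_frames)
    ])
    noise_floor = np.percentile(energy_db, 10)
    return energy_db > noise_floor + margin_db

sr = 16000
rng = np.random.default_rng(5)
noise = 0.05 * rng.standard_normal(sr * 2)                  # 2 s of noise
tone = 0.3 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)   # 1 s "voiced" burst
x = noise.copy(); x[sr // 2: sr // 2 + sr] += tone
flags = simple_sad(x, sr)
print(f"{flags.mean():.0%} of frames flagged as speech")
```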

  12. The Beginnings of Danish Speech Perception

    DEFF Research Database (Denmark)

    Østerbye, Torkil

    Little is known about the perception of speech sounds by native Danish listeners. However, the Danish sound system differs in several interesting ways from the sound systems of other languages. For instance, Danish is characterized, among other features, by a rich vowel inventory and by different reductions of speech sounds evident in the pronunciation of the language. This book (originally a PhD thesis) consists of three studies based on the results of two experiments. The experiments were designed to provide knowledge of the perception of Danish speech sounds by Danish adults and infants, in the light of the rich and complex Danish sound system. The first two studies report on native adults' perception of Danish speech sounds in quiet and noise. The third study examined the development of language-specific perception in native Danish infants at 6, 9 and 12 months of age. The book points…

  13. Speech-like rhythm in a voiced and voiceless orangutan call.

    Directory of Open Access Journals (Sweden)

    Adriano R Lameira

    The evolutionary origins of speech remain obscure. Recently, it was proposed that speech derived from monkey facial signals which exhibit a speech-like rhythm of ∼5 open-close lip cycles per second. In monkeys, these signals may also be vocalized, offering a plausible evolutionary stepping stone towards speech. Three essential predictions remain, however, to be tested to assess this hypothesis' validity: (i) great apes, our closest relatives, should likewise produce 5 Hz-rhythm signals; (ii) speech-like rhythm should involve calls articulatorily similar to consonants and vowels, given that speech rhythm is the direct product of stringing together these two basic elements; and (iii) speech-like rhythm should be experience-based. Via cinematic analyses we demonstrate that an ex-entertainment orangutan produces two calls at a speech-like rhythm, coined "clicks" and "faux-speech." Like voiceless consonants, clicks required no vocal fold action, but did involve independent manoeuvring over lips and tongue. In parallel to vowels, faux-speech showed harmonic and formant modulations, implying vocal fold and supralaryngeal action. This rhythm was several times faster than orangutan chewing rates, as observed in monkeys and humans. Critically, this rhythm was seven-fold faster, and contextually distinct, than any other known rhythmic calls described to date in the largest database of the orangutan repertoire ever assembled. The first two predictions advanced by this study are validated and, based on parsimony and exclusion of potential alternative explanations, initial support is given to the third prediction. Irrespective of the putative origins of these calls and underlying mechanisms, our findings demonstrate irrevocably that great apes are not respiratorily, articulatorily, or neurologically constrained for the production of consonant- and vowel-like calls at speech rhythm. Orangutan clicks and faux-speech confirm the importance of rhythmic speech

  14. ChemNet: A Transferable and Generalizable Deep Neural Network for Small-Molecule Property Prediction

    Energy Technology Data Exchange (ETDEWEB)

    Goh, Garrett B.; Siegel, Charles M.; Vishnu, Abhinav; Hodas, Nathan O.

    2017-12-08

    With access to large datasets, deep neural networks through representation learning have been able to identify patterns from raw data, achieving human-level accuracy in image and speech recognition tasks. However, in chemistry, availability of large standardized and labelled datasets is scarce, and with a multitude of chemical properties of interest, chemical data is inherently small and fragmented. In this work, we explore transfer learning techniques in conjunction with the existing Chemception CNN model, to create a transferable and generalizable deep neural network for small-molecule property prediction. Our latest model, ChemNet learns in a semi-supervised manner from inexpensive labels computed from the ChEMBL database. When fine-tuned to the Tox21, HIV and FreeSolv dataset, which are 3 separate chemical tasks that ChemNet was not originally trained on, we demonstrate that ChemNet exceeds the performance of existing Chemception models, contemporary MLP models that trains on molecular fingerprints, and it matches the performance of the ConvGraph algorithm, the current state-of-the-art. Furthermore, as ChemNet has been pre-trained on a large diverse chemical database, it can be used as a universal “plug-and-play” deep neural network, which accelerates the deployment of deep neural networks for the prediction of novel small-molecule chemical properties.
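
    The transfer-learning recipe described here (pre-train on a large source task, then fine-tune on a small target task) follows a standard pattern: freeze the transferred feature extractor and train a new task head. A generic PyTorch sketch; the layer sizes and the toxicity-style target are invented and this is not the ChemNet architecture:

```python
import torch
import torch.nn as nn

# Stand-in pretrained feature extractor; real weights would be loaded with
# backbone.load_state_dict(...) from the source-task checkpoint.
backbone = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, 128), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad = False          # freeze transferred layers

head = nn.Linear(128, 1)             # new property-prediction head (e.g., toxicity)
model = nn.Sequential(backbone, head)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)   # train only the head

x, y = torch.randn(32, 2048), torch.rand(32, 1)      # stand-in features and labels
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.4f}")
```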

  15. The application of sparse linear prediction dictionary to compressive sensing in speech signals

    Directory of Open Access Journals (Sweden)

    YOU Hanxu

    2016-04-01

    Applying compressive sensing (CS), which theoretically guarantees that signal sampling and signal compression can be achieved simultaneously, to audio and speech signal processing is one of the most popular research topics of recent years. In this paper, the K-SVD algorithm was employed to learn a sparse linear prediction dictionary serving as the sparse basis of the underlying speech signals. Compressed signals were obtained by applying a random Gaussian matrix to sample the original speech frames. Orthogonal matching pursuit (OMP) and compressive sampling matching pursuit (CoSaMP) were adopted to recover the original signals from the compressed ones. A number of experiments were carried out to investigate the impact of speech frame length, compression ratio, sparse basis, and reconstruction algorithm on CS performance. Results show that the sparse linear prediction dictionary can improve the performance of speech signal reconstruction compared with a discrete cosine transform (DCT) matrix.
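
    The measurement-and-recovery loop evaluated in this paper can be reproduced in miniature with scikit-learn's OMP solver, using a DCT dictionary as a stand-in for the learned K-SVD dictionary (frame length, measurement count, and sparsity level are arbitrary):

```python
import numpy as np
from scipy.fft import idct
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(6)
n, m, k = 256, 80, 10                 # frame length, measurements, sparsity

D = idct(np.eye(n), axis=0, norm="ortho")   # DCT synthesis dictionary (K-SVD stand-in)
s = np.zeros(n); s[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x = D @ s                              # synthetic "speech frame", k-sparse in D

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian sensing matrix
y = Phi @ x                            # compressed measurements

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(Phi @ D, y)
x_hat = D @ omp.coef_                  # reconstruct from recovered sparse coefficients

snr = 10 * np.log10(np.sum(x**2) / np.sum((x - x_hat)**2))
print(f"reconstruction SNR: {snr:.1f} dB from {m}/{n} measurements")
```

    Swapping the DCT matrix for a dictionary learned with K-SVD on speech frames is exactly the substitution the paper evaluates.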

  16. An analysis of machine translation and speech synthesis in speech-to-speech translation system

    OpenAIRE

    Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.

    2011-01-01

    This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...

  18. Detection of cardiac activity changes from human speech

    Science.gov (United States)

    Tovarek, Jaromir; Partila, Pavol; Voznak, Miroslav; Mikulec, Martin; Mehic, Miralem

    2015-05-01

    The detection of changes in blood pressure and pulse from human speech is addressed in this article. The symptoms of increased physical activity are an elevated pulse and changes in systolic and diastolic pressure. There are many methods for measuring and indicating these parameters, but the measurements must be carried out using devices that are not part of everyday life. In most cases, blood pressure and pulse are measured only after health problems or other adverse feelings have appeared. Nowadays, research teams are trying to design and implement modern measurement methods within ordinary human activities. The main objective of the proposal is to reduce the delay between an adverse change in blood pressure and its detection, ahead of the warning signs and feelings mentioned above. Speaking is a common and frequent human activity, and it is known that the function of the vocal tract can be affected by changes in heart activity; speech can therefore be a useful parameter for detecting physiological changes. A method for detecting human physiological changes by speech processing and artificial neural network classification is described. Changes in pulse and blood pressure were induced by physical exercise in this experiment. The set of measured subjects consisted of ten healthy volunteers of both sexes; none of the subjects was a professional athlete. The experiment was divided into phases before, during, and after physical training, and pulse, systolic pressure, and diastolic pressure were measured and voice activity recorded after each phase. The results of this experiment describe a method for detecting increased cardiac activity from human speech using an artificial neural network.

  19. Speech and non-speech processing in children with phonological disorders: an electrophysiological study

    Directory of Open Access Journals (Sweden)

    Isabela Crivellaro Gonçalves

    2011-01-01

    OBJECTIVE: To determine whether neurophysiological auditory brainstem responses to clicks and repeated speech stimuli differ between typically developing children and children with phonological disorders. INTRODUCTION: Phonological disorders are language impairments resulting from inadequate use of adult phonological language rules and are among the most common speech and language disorders in children (prevalence: 8-9%). Our hypothesis is that children with phonological disorders differ fundamentally in the way their brains encode acoustic signals at the brainstem level when compared to normal counterparts. METHODS: We recorded click- and speech-evoked auditory brainstem responses in 18 typically developing children (control group) and in 18 children clinically diagnosed with phonological disorders (research group). The age range of the children was 7-11 years. RESULTS: The research group exhibited significantly longer latency responses to click stimuli (waves I, III and V) and speech stimuli (waves V and A) when compared to the control group. DISCUSSION: These results suggest that the abnormal encoding of speech sounds may be a biological marker of phonological disorders. However, these results cannot define the biological origins of phonological problems. We also observed that speech-evoked auditory brainstem responses had a higher specificity/sensitivity for identifying phonological disorders than click-evoked auditory brainstem responses. CONCLUSIONS: Early stages of auditory pathway processing of an acoustic stimulus are not similar in typically developing children and those with phonological disorders. These findings suggest brainstem auditory pathway abnormalities in children with phonological disorders.

  20. Differentiation between non-neural and neural contributors to ankle joint stiffness in cerebral palsy

    NARCIS (Netherlands)

    De Gooijer-van de Groep, K.L.; De Vlugt, E.; De Groot, J.H.; Van der Heijden-Maessen, H.C.M.; Wielheesen, D.H.M.; Van Wijlen-Hempel, R.M.S.; Arendzen, J.H.; Meskers, C.G.M.

    2013-01-01

    Background Spastic paresis in cerebral palsy (CP) is characterized by increased joint stiffness that may be of neural origin, i.e., improper muscle activation caused by, e.g., hyperreflexia, or of non-neural origin, i.e., altered tissue viscoelastic properties (clinically: “spasticity” vs. “contracture”).

  1. Comparative analysis of rhythmic abilities in children with and without speech and language disorders

    OpenAIRE

    Frangež, Mateja

    2014-01-01

    Music and speech both consist of sound (acoustic waves) and contain structural patterns of pitch, duration, and magnitude. The processing of both musical and prosodic patterns takes place in some shared neural areas, but there are also significant differences in how speech and music are represented in the brain. Music and language differ markedly in their form, function, and use of syntactic structures. The complex structure and functioning of the brain enable music to extend its impact to other areas ...

  2. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    Energy Technology Data Exchange (ETDEWEB)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C., E-mail: calexandre@ien.gov.b, E-mail: mol@ien.gov.b, E-mail: cmnap@ien.gov.b, E-mail: mag@ien.gov.b [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil); Nomiya, Diogo V., E-mail: diogonomiya@gmail.co [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil)

    2009-07-01

    This work reports the implementation of a man-system interface based on automatic speech recognition and its integration into a virtual nuclear power plant control desk. The latter aims to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, substituting for the computer keyboard and mouse: users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface is based on a well-known signal processing technique named cepstral analysis, and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)
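
    The cepstral front end named here is straightforward to compute: the real cepstrum is the inverse Fourier transform of the log-magnitude spectrum, and its low-quefrency coefficients form a compact feature vector for the neural-network recognizer. A minimal sketch (window length and coefficient count are arbitrary choices):

```python
import numpy as np

def real_cepstrum(frame, n_coeffs=13):
    """Return the first `n_coeffs` real-cepstrum coefficients of a frame."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-10)   # epsilon avoids log(0)
    ceps = np.fft.irfft(log_mag)
    return ceps[:n_coeffs]

sr = 8000
t = np.arange(int(0.032 * sr)) / sr              # one 32 ms frame
frame = np.sin(2 * np.pi * 150 * t) + 0.3 * np.sin(2 * np.pi * 450 * t)
features = real_cepstrum(frame)
print(np.round(features, 3))                     # feature vector fed to the ANN
```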

  4. Neural dynamics of audiovisual speech integration under variable listening conditions: an individual participant analysis.

    Science.gov (United States)

    Altieri, Nicholas; Wenger, Michael J

    2013-01-01

    Speech perception engages both auditory and visual modalities. Limitations of traditional accuracy-only approaches in the investigation of audiovisual speech perception have motivated the use of new methodologies. In an audiovisual speech identification task, we utilized capacity (Townsend and Nozawa, 1995), a dynamic measure of efficiency, to quantify audiovisual integration. Capacity was used to compare RT distributions from audiovisual trials to RT distributions from auditory-only and visual-only trials across three listening conditions: clear auditory signal, S/N ratio of -12 dB, and S/N ratio of -18 dB. The purpose was to obtain EEG recordings in conjunction with capacity to investigate how a late ERP co-varies with integration efficiency. Results showed efficient audiovisual integration for low auditory S/N ratios, but inefficient audiovisual integration when the auditory signal was clear. The ERP analyses showed evidence for greater audiovisual amplitude compared to the unisensory signals for lower auditory S/N ratios (higher capacity/efficiency) compared to the high S/N ratio (low capacity/inefficient integration). The data are consistent with an interactive framework of integration, where auditory recognition is influenced by speech-reading as a function of signal clarity.

  5. Cognitive Processes Underlying Nonnative Speech Production: The Significance of Recurrent Sequences.

    Science.gov (United States)

    Oppenheim, Nancy

    This study was designed to identify whether advanced nonnative speakers of English rely on recurrent sequences to produce fluent speech in conformance with neural network theories and symbolic network theories; participants were 6 advanced, speaking and listening university students, aged 18-37 years (their native countries being Korea, Japan,…

  6. Evidence-Based Systematic Review: Effects of Neuromuscular Electrical Stimulation on Swallowing and Neural Activation

    Science.gov (United States)

    Clark, Heather; Lazarus, Cathy; Arvedson, Joan; Schooling, Tracy; Frymark, Tobi

    2009-01-01

    Purpose: To systematically review the literature examining the effects of neuromuscular electrical stimulation (NMES) on swallowing and neural activation. The review was conducted as part of a series examining the effects of oral motor exercises (OMEs) on speech, swallowing, and neural activation. Method: A systematic search was conducted to…

  7. The role of human parietal area 7A as a link between sequencing in hand actions and in overt speech production

    Directory of Open Access Journals (Sweden)

    Stefan Heim

    2012-12-01

    Research on the evolutionary basis of the human language faculty has proposed the mirror neuron system as a link between motor processing and speech development. Consequently, most work has focussed on the left inferior frontal cortex, in particular Broca's region, and the left inferior parietal cortex. However, the direct link between the planning of hand motor and speech actions remains to be elucidated. Thus, the present study investigated whether sequencing of hand motor actions vs. speech motor actions has a common neural denominator. For the hand motor task, 25 subjects performed single, repeated, or sequenced button presses with either the left or right hand. The speech task was analogous: the same subjects produced the syllable "po" once or repeatedly, or a sequence of different syllables (po-pi-po). Speech motor vs. hand motor effectors resulted in increased perisylvian activation including Broca's region (left area 44) and areas medially adjacent to left area 45. In contrast, common activation for sequenced vs. repeated production of button presses and syllables revealed the effector-independent involvement of left area 7A in the superior parietal lobule (SPL) in sequencing. These data demonstrate that sequencing of vocal gestures, an important precondition for ordered utterances and ultimately human speech, shares area 7A, rather than inferior parietal regions, as a common cortical module with hand motor sequencing. Interestingly, area 7A has previously also been shown to be involved in the observation of hand and non-hand actions. In combination with the literature, the present data thus suggest a distinction between area 44, which is specifically recruited for (cognitive) aspects of speech, and SPL area 7A for general aspects of motor sequencing. In sum, the study demonstrates a yet little considered role of the superior parietal lobule in the origins of speech, and may be discussed in the light of embodiment of speech and language in the

  8. Hybrid model decomposition of speech and noise in a radial basis function neural model framework

    DEFF Research Database (Denmark)

    Sørensen, Helge Bjarup Dissing; Hartmann, Uwe

    1994-01-01

    The aim of the paper is to focus on a new approach to automatic speech recognition in noisy environments where the noise has either stationary or non-stationary statistical characteristics. The aim is to perform automatic recognition of speech in the presence of additive car noise. The technique...

  9. Speech rate in Parkinson's disease: A controlled study.

    Science.gov (United States)

    Martínez-Sánchez, F; Meilán, J J G; Carro, J; Gómez Íñiguez, C; Millian-Morell, L; Pujante Valverde, I M; López-Alburquerque, T; López, D E

    2016-09-01

    Speech disturbances will affect most patients with Parkinson's disease (PD) over the course of the disease. The origin and severity of these symptoms are of clinical and diagnostic interest. The aim was to evaluate the clinical pattern of speech impairment in PD patients and identify significant differences in speech rate and articulation compared to control subjects. Speech rate and articulation in a reading task were measured using an automatic analytical method. A total of 39 PD patients in the 'on' state and 45 age- and sex-matched asymptomatic controls participated in the study. None of the patients experienced dyskinesias or motor fluctuations during the test. The patients with PD displayed a significant reduction in speech and articulation rates; there were no significant correlations between the studied speech parameters and patient characteristics such as L-dopa dose, duration of the disorder, age, UPDRS III scores, or the Hoehn & Yahr scale. Patients with PD show a characteristic pattern of declining speech rate. These results suggest that, in PD, disfluencies are the result of the movement disorder affecting the physiology of the speech production system. Copyright © 2014 Sociedad Española de Neurología. Publicado por Elsevier España, S.L.U. All rights reserved.
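
    Automatic speech-rate measurement of the kind used in such studies is often approximated by counting syllable nuclei as peaks in a smoothed energy envelope and dividing by duration. A crude stand-in sketch (thresholds are arbitrary; production tools, e.g. Praat-based scripts, are considerably more careful):

```python
import numpy as np
from scipy.signal import find_peaks

def estimate_speech_rate(x, sr, hop_ms=10, smooth_frames=5):
    """Estimate syllables/second from peaks of a smoothed energy envelope."""
    hop = int(sr * hop_ms / 1000)
    frames = len(x) // hop
    env = np.array([np.sqrt(np.mean(x[i*hop:(i+1)*hop] ** 2)) for i in range(frames)])
    env = np.convolve(env, np.ones(smooth_frames) / smooth_frames, mode="same")
    # Peaks must stand out from the median level and be >= 100 ms apart.
    peaks, _ = find_peaks(env, height=1.5 * np.median(env), distance=10)
    return len(peaks) / (len(x) / sr)

sr = 16000
t = np.arange(sr * 2) / sr                      # 2 s of synthetic "speech"
envelope = 0.5 * (1 + np.sin(2 * np.pi * 4 * t - np.pi / 2))   # 4 "syllables"/s
x = envelope * np.sin(2 * np.pi * 180 * t)
print(f"estimated rate: {estimate_speech_rate(x, sr):.1f} syllables/s")
```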

  10. Rule-Based Storytelling Text-to-Speech (TTS) Synthesis

    Directory of Open Access Journals (Sweden)

    Ramli Izzad

    2016-01-01

    In recent years, various real-life applications such as talking books, gadgets, and humanoid robots have drawn attention to research in the area of expressive speech synthesis. Speech synthesis is widely used in various applications, but there is a growing need for expressive synthesis, especially for communication and robotics. In this paper, global and local rules are developed to convert neutral speech to storytelling-style speech for the Malay language. To generate the rules, modifications of prosodic parameters such as pitch, intensity, duration, tempo, and pauses are considered. The modification of prosodic parameters is determined by performing prosodic analysis on stories collected from an experienced female and an experienced male storyteller. The global and local rules are applied at the sentence level and the speech is synthesized using HNM. Subjective tests are conducted to evaluate the quality of the synthesized storytelling speech for both rule sets in terms of naturalness, intelligibility, and similarity to the original storytelling speech. The results showed that the global rules give better results than the local rules.
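
    Rule-based prosody conversion of this sort amounts to scaling pitch, tempo, and intensity on selected spans. A toy illustration with librosa's effects module (the scaling factors are invented and do not reproduce the paper's Malay storytelling rules):

```python
import numpy as np
import librosa

# Stand-in "neutral" utterance: 1 s, 200 Hz tone at 22.05 kHz.
sr = 22050
y = 0.3 * np.sin(2 * np.pi * 200 * np.arange(sr) / sr).astype(np.float32)

# Hypothetical storytelling rules: raise pitch 2 semitones,
# slow tempo by 10 %, and boost intensity by 3 dB.
y_story = librosa.effects.pitch_shift(y, sr=sr, n_steps=2.0)
y_story = librosa.effects.time_stretch(y_story, rate=0.9)
y_story *= 10 ** (3.0 / 20.0)

print(f"neutral: {len(y)/sr:.2f} s, storytelling: {len(y_story)/sr:.2f} s")
```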

  11. Articulatory mediation of speech perception: a causal analysis of multi-modal imaging data.

    Science.gov (United States)

    Gow, David W; Segawa, Jennifer A

    2009-02-01

    The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causality analyses of high spatiotemporal resolution neural activation data derived from the integration of magnetic resonance imaging, magnetoencephalography and electroencephalography, to examine the role of lexical and articulatory mediation in listeners' ability to use phonetic context to compensate for place assimilation. Listeners heard two-word phrases such as pen pad and then saw two pictures, from which they had to select the one that depicted the phrase. Assimilation, lexical competitor environment and the phonological validity of assimilation context were all manipulated. Behavioral data showed an effect of context on the interpretation of assimilated segments. Analysis of 40 Hz gamma phase locking patterns identified a large distributed neural network including 16 distinct regions of interest (ROIs) spanning portions of both hemispheres in the first 200 ms of post-assimilation context. Granger analyses of individual conditions showed differing patterns of causal interaction between ROIs during this interval, with hypothesized lexical and articulatory structures and pathways driving phonetic activation in the posterior superior temporal gyrus in assimilation conditions, but not in phonetically unambiguous conditions. These results lend strong support for the motor theory of speech perception, and clarify the role of lexical mediation in the phonetic processing of assimilated speech.

  12. Altered time course of amygdala activation during speech anticipation in social anxiety disorder.

    Science.gov (United States)

    Davies, Carolyn D; Young, Katherine; Torre, Jared B; Burklund, Lisa J; Goldin, Philippe R; Brown, Lily A; Niles, Andrea N; Lieberman, Matthew D; Craske, Michelle G

    2017-02-01

    Exaggerated anticipatory anxiety is common in social anxiety disorder (SAD). Neuroimaging studies have revealed altered neural activity in response to social stimuli in SAD, but fewer studies have examined neural activity during anticipation of feared social stimuli in SAD. The current study examined the time course and magnitude of activity in threat processing brain regions during speech anticipation in socially anxious individuals and healthy controls (HC). Participants (SAD n=58; HC n=16) underwent functional magnetic resonance imaging (fMRI) during which they completed a 90s control anticipation task and 90s speech anticipation task. Repeated measures multi-level modeling analyses were used to examine group differences in time course activity during speech vs. control anticipation for regions of interest, including bilateral amygdala, insula, ventral striatum, and dorsal anterior cingulate cortex. The time course of amygdala activity was more prolonged and less variable throughout speech anticipation in SAD participants compared to HCs, whereas the overall magnitude of amygdala response did not differ between groups. Magnitude and time course of activity was largely similar between groups across other regions of interest. Analyses were restricted to regions of interest and task order was the same across participants due to the nature of deception instructions. Sustained amygdala time course during anticipation may uniquely reflect heightened detection of threat or deficits in emotion regulation in socially anxious individuals. Findings highlight the importance of examining temporal dynamics of amygdala responding. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces.

    Directory of Open Access Journals (Sweden)

    Florent Bocquelet

    2016-11-01

    Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained, as assessed by a perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open the way to future speech BCI applications using such an articulatory-based speech synthesizer.
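
    The articulatory-to-acoustic mapping at the heart of such a synthesizer is a regression from articulator positions to acoustic parameters. A scikit-learn stand-in for the paper's DNN, with synthetic EMA-like inputs and invented dimensionalities:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
# Synthetic "EMA" sensor coordinates (tongue, jaw, lips, velum): 12 dims.
ema = rng.standard_normal((5000, 12))
# Synthetic "acoustic" targets (e.g., 25 vocoder parameters): a smooth
# nonlinear function of articulation, plus measurement noise.
W1, W2 = rng.standard_normal((12, 64)), rng.standard_normal((64, 25))
acoustic = np.tanh(ema @ W1) @ W2 + 0.1 * rng.standard_normal((5000, 25))

Xtr, Xte, ytr, yte = train_test_split(ema, acoustic, test_size=0.2, random_state=0)
dnn = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500,
                   random_state=0).fit(Xtr, ytr)
print(f"held-out R^2 of articulatory-to-acoustic mapping: {dnn.score(Xte, yte):.3f}")
```

    In the real system the predicted acoustic parameters would be streamed to a vocoder to produce audio with low latency.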

  14. Precision Scaling of Neural Networks for Efficient Audio Processing

    OpenAIRE

    Ko, Jong Hwan; Fromm, Josh; Philipose, Matthai; Tashev, Ivan; Zarar, Shuayb

    2017-01-01

    While deep neural networks have shown powerful performance in many audio applications, their large computation and memory demand has been a challenge for real-time processing. In this paper, we study the impact of scaling the precision of neural networks on the performance of two common audio processing tasks, namely, voice-activity detection and single-channel speech enhancement. We determine the optimal pair of weight/neuron bit precision by exploring its impact on both the performance and ...
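
    Precision-scaling experiments of this kind reduce weights (and/or activations) to k-bit fixed point and track the induced error. A minimal uniform-quantization sketch in NumPy (a symmetric quantizer; the paper's scheme may differ):

```python
import numpy as np

def quantize(w, bits):
    """Uniform symmetric quantization of an array to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(8)
w = rng.standard_normal(10000) * 0.1          # stand-in network weights

for bits in (16, 8, 4, 2):
    err = w - quantize(w, bits)
    sqnr = 10 * np.log10(np.mean(w**2) / np.mean(err**2))
    print(f"{bits:2d}-bit weights: SQNR = {sqnr:5.1f} dB")
```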

  15. Transfer Effect of Speech-sound Learning on Auditory-motor Processing of Perceived Vocal Pitch Errors.

    Science.gov (United States)

    Chen, Zhaocong; Wong, Francis C K; Jones, Jeffery A; Li, Weifeng; Liu, Peng; Chen, Xi; Liu, Hanjun

    2015-08-17

    Speech perception and production are intimately linked. There is evidence that speech motor learning results in changes to auditory processing of speech. Whether speech motor control benefits from perceptual learning in speech, however, remains unclear. This event-related potential study investigated whether speech-sound learning can modulate the processing of feedback errors during vocal pitch regulation. Mandarin speakers were trained to perceive five Thai lexical tones while learning to associate pictures with spoken words over 5 days. Before and after training, participants produced sustained vowel sounds while they heard their vocal pitch feedback unexpectedly perturbed. As compared to the pre-training session, the magnitude of vocal compensation significantly decreased for the control group, but remained consistent for the trained group at the post-training session. However, the trained group had smaller and faster N1 responses to pitch perturbations and exhibited enhanced P2 responses that correlated significantly with their learning performance. These findings indicate that the cortical processing of vocal pitch regulation can be shaped by learning new speech-sound associations, suggesting that perceptual learning in speech can produce transfer effects that facilitate the neural mechanisms underlying the online monitoring of auditory feedback regarding vocal production.

  16. Learning spectral-temporal features with 3D CNNs for speech emotion recognition

    NARCIS (Netherlands)

    Kim, Jaebok; Truong, Khiet; Englebienne, Gwenn; Evers, Vanessa

    2017-01-01

    In this paper, we propose to use deep 3-dimensional convolutional networks (3D CNNs) in order to address the challenge of modelling spectro-temporal dynamics for speech emotion recognition (SER). Compared to a hybrid of Convolutional Neural Network and Long-Short-Term-Memory (CNN-LSTM), our proposed

  17. Training to Improve Hearing Speech in Noise: Biological Mechanisms

    Science.gov (United States)

    Song, Judy H.; Skoe, Erika; Banai, Karen

    2012-01-01

    We investigated training-related improvements in listening in noise and the biological mechanisms mediating these improvements. Training-related malleability was examined using a program that incorporates cognitively based listening exercises to improve speech-in-noise perception. Before and after training, auditory brainstem responses to a speech syllable were recorded in quiet and multitalker noise from adults who ranged in their speech-in-noise perceptual ability. Controls did not undergo training but were tested at intervals equivalent to the trained subjects. Trained subjects exhibited significant improvements in speech-in-noise perception that were retained 6 months later. Subcortical responses in noise demonstrated training-related enhancements in the encoding of pitch-related cues (the fundamental frequency and the second harmonic), particularly for the time-varying portion of the syllable that is most vulnerable to perceptual disruption (the formant transition region). Subjects with the largest strength of pitch encoding at pretest showed the greatest perceptual improvement. Controls exhibited neither neurophysiological nor perceptual changes. We provide the first demonstration that short-term training can improve the neural representation of cues important for speech-in-noise perception. These results implicate and delineate biological mechanisms contributing to learning success, and they provide a conceptual advance to our understanding of the kind of training experiences that can influence sensory processing in adulthood. PMID:21799207

  18. Neural Correlates of Phonological Processing in Speech Sound Disorder: A Functional Magnetic Resonance Imaging Study

    Science.gov (United States)

    Tkach, Jean A.; Chen, Xu; Freebairn, Lisa A.; Schmithorst, Vincent J.; Holland, Scott K.; Lewis, Barbara A.

    2011-01-01

    Speech sound disorders (SSD) are the largest group of communication disorders observed in children. One explanation for these disorders is that children with SSD fail to form stable phonological representations when acquiring the speech sound system of their language due to poor phonological memory (PM). The goal of this study was to examine PM in…

  19. Specific acoustic models for spontaneous and dictated style in Indonesian speech recognition

    Science.gov (United States)

    Vista, C. B.; Satriawan, C. H.; Lestari, D. P.; Widyantoro, D. H.

    2018-03-01

    The performance of an automatic speech recognition system is affected by differences in speech style between the data the model was originally trained on and the incoming speech to be recognized. In this paper, the usage of GMM-HMM acoustic models for specific speech styles is investigated. We develop two systems for the experiments; the first employs a speech style classifier to predict the style of incoming speech, either spontaneous or dictated, then decodes this speech using an acoustic model specifically trained for that style. The second system uses both acoustic models to recognise incoming speech and decides upon a final result by comparing the confidence scores of the two decodings. Results show that training specific acoustic models for spontaneous and dictated speech styles confers a slight recognition advantage as compared to a baseline model trained on a mixture of spontaneous and dictated training data. In addition, the speech style classifier approach of the first system produced slightly more accurate results than the confidence scoring employed in the second system.
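
    The second system's decision rule can be sketched as follows; decode_with and confidence are hypothetical stand-ins for an actual GMM-HMM decoder and its score, used here only to make the selection logic concrete.

        def recognise(speech, spontaneous_model, dictated_model,
                      decode_with, confidence):
            """Decode with both style-specific acoustic models and keep
            the hypothesis with the higher decoding confidence."""
            hyp_spon = decode_with(spontaneous_model, speech)
            hyp_dict = decode_with(dictated_model, speech)
            if confidence(hyp_spon) >= confidence(hyp_dict):
                return hyp_spon
            return hyp_dict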

  20. Electrophysiological Correlates of Semantic Dissimilarity Reflect the Comprehension of Natural, Narrative Speech.

    Science.gov (United States)

    Broderick, Michael P; Anderson, Andrew J; Di Liberto, Giovanni M; Crosse, Michael J; Lalor, Edmund C

    2018-03-05

    People routinely hear and understand speech at rates of 120-200 words per minute [1, 2]. Thus, speech comprehension must involve rapid, online neural mechanisms that process words' meanings in an approximately time-locked fashion. However, electrophysiological evidence for such time-locked processing has been lacking for continuous speech. Although valuable insights into semantic processing have been provided by the "N400 component" of the event-related potential [3-6], this literature has been dominated by paradigms using incongruous words within specially constructed sentences, with less emphasis on natural, narrative speech comprehension. Building on the discovery that cortical activity "tracks" the dynamics of running speech [7-9] and psycholinguistic work demonstrating [10-12] and modeling [13-15] how context impacts on word processing, we describe a new approach for deriving an electrophysiological correlate of natural speech comprehension. We used a computational model [16] to quantify the meaning carried by words based on how semantically dissimilar they were to their preceding context and then regressed this measure against electroencephalographic (EEG) data recorded from subjects as they listened to narrative speech. This produced a prominent negativity at a time lag of 200-600 ms on centro-parietal EEG channels, characteristics common to the N400. Applying this approach to EEG datasets involving time-reversed speech, cocktail party attention, and audiovisual speech-in-noise demonstrated that this response was very sensitive to whether or not subjects understood the speech they heard. These findings demonstrate that, when successfully comprehending natural speech, the human brain responds to the contextual semantic content of each word in a relatively time-locked fashion. Copyright © 2018 Elsevier Ltd. All rights reserved.
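
    A simplified version of the dissimilarity measure and regression step might look like this; the random embeddings, crude lag handling, and single EEG channel are placeholder assumptions rather than the authors' exact pipeline.

        import numpy as np
        from sklearn.linear_model import Ridge

        def semantic_dissimilarity(word_vecs):
            """1 minus cosine similarity between each word vector and the
            mean of all preceding word vectors in the narrative."""
            out = np.zeros(len(word_vecs))
            for i in range(1, len(word_vecs)):
                ctx = word_vecs[:i].mean(axis=0)
                w = word_vecs[i]
                out[i] = 1 - (w @ ctx) / (np.linalg.norm(w) * np.linalg.norm(ctx))
            return out

        rng = np.random.default_rng(2)
        vecs = rng.standard_normal((50, 300))   # hypothetical word embeddings
        stimulus = semantic_dissimilarity(vecs)

        # Crude regression of one EEG channel on lagged copies of the
        # regressor, standing in for a temporal response function fit.
        eeg = rng.standard_normal(50)
        lagged = np.column_stack([np.roll(stimulus, k) for k in range(4)])
        trf = Ridge(alpha=1.0).fit(lagged, eeg)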

  1. Speaker diarization system using HXLPS and deep neural network

    Directory of Open Access Journals (Sweden)

    V. Subba Ramaiah

    2018-03-01

    Full Text Available In general, speaker diarization is defined as the process of segmenting an input speech signal and grouping homogeneous regions according to speaker identity. The main idea behind such a system is that it can discriminate between speakers by assigning a label to each speaker's signal. With the rapid growth of broadcast and meeting audio, speaker diarization has become essential for enhancing the readability of speech transcriptions. To address this problem, Holoentropy with the eXtended Linear Prediction using autocorrelation Snapshot (HXLPS) and a deep neural network (DNN) are proposed for the speaker diarization system. The HXLPS extraction method is newly developed by incorporating Holoentropy with the XLPS. Once the features are obtained, speech and non-speech signals are detected by a Voice Activity Detection (VAD) method. Then, an i-vector representation of every segmented signal is obtained using a Universal Background Model (UBM). Consequently, the DNN is utilized to assign a label to each speaker signal, which is then clustered according to the speaker label. Performance is analysed using evaluation metrics such as tracking distance, false alarm rate, and diarization error rate. The proposed method ensures better diarization performance, achieving a DER as low as 1.36% depending on the lambda value and 2.23% depending on the frame length. Keywords: Speaker diarization, HXLPS feature extraction, Voice activity detection, Deep neural network, Speaker clustering, Diarization Error Rate (DER)
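
    The final clustering stage can be illustrated with generic segment embeddings; the synthetic two-speaker data and scikit-learn's agglomerative clustering are assumptions standing in for the paper's HXLPS/i-vector front end and DNN labelling.

        import numpy as np
        from sklearn.cluster import AgglomerativeClustering

        rng = np.random.default_rng(3)
        # Hypothetical per-segment embeddings for two speakers,
        # 10 segments each, 16 dimensions.
        segments = np.vstack([rng.normal(m, 0.3, (10, 16)) for m in (0.0, 2.0)])

        labels = AgglomerativeClustering(n_clusters=2).fit_predict(segments)
        print(labels)   # one speaker label per segment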

  2. Hemispheric speech lateralisation in the developing brain is related to motor praxis ability

    Directory of Open Access Journals (Sweden)

    Jessica C. Hodgson

    2016-12-01

    Full Text Available Commonly displayed functional asymmetries such as hand dominance and hemispheric speech lateralisation are well researched in adults. However, there is debate about when such functions become lateralised in the typically developing brain. This study examined whether patterns of speech laterality and hand dominance were related and whether they varied with age in typically developing children. 148 children aged 3–10 years performed an electronic pegboard task to determine hand dominance; a subset of 38 of these children also underwent functional Transcranial Doppler (fTCD) imaging to derive a lateralisation index (LI) for hemispheric activation during speech production using an animation description paradigm. There was no main effect of age on speech laterality scores; however, younger children showed a greater difference in performance between their hands on the motor task. Furthermore, this between-hand performance difference significantly interacted with the direction of speech laterality, with a smaller between-hand difference relating to increased left-hemisphere activation. These data show that both handedness and speech lateralisation appear relatively determined by age 3, but that atypical cerebral lateralisation is linked to greater performance differences in hand skill, irrespective of age. Results are discussed in terms of the common neural systems underpinning handedness and speech lateralisation.
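
    For reference, an fTCD lateralisation index is commonly derived from the difference in blood-flow velocity between hemispheres during the activation window; the formulation below is one common convention, not necessarily the exact one used in this study.

        import numpy as np

        def laterality_index(left_velocity, right_velocity):
            """Signed percent difference between hemispheric velocities;
            positive values indicate left lateralisation (one common
            convention)."""
            diff = (left_velocity - right_velocity).mean()
            return 100 * diff / (left_velocity.mean() + right_velocity.mean())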

  3. A Research of Speech Emotion Recognition Based on Deep Belief Network and SVM

    Directory of Open Access Journals (Sweden)

    Chenchen Huang

    2014-01-01

    Full Text Available Feature extraction is a very important part of speech emotion recognition. Addressing the feature extraction problem, this paper proposes a new method that uses DBNs to extract emotional features from the speech signal automatically. A 5-layer DBN is trained to extract speech emotion features, and multiple consecutive frames are combined to form a high-dimensional feature. The features produced by the DBN are the input to a nonlinear SVM classifier, yielding a multi-classifier speech emotion recognition system. The speech emotion recognition rate of the system reached 86.5%, which was 7% higher than the original method.
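
    The feature-learning-plus-classifier pattern can be sketched with a single RBM layer feeding an SVM; the one-layer stand-in, data shapes, and scikit-learn components are illustrative assumptions in place of the 5-layer DBN described above.

        import numpy as np
        from sklearn.neural_network import BernoulliRBM
        from sklearn.pipeline import Pipeline
        from sklearn.svm import SVC

        rng = np.random.default_rng(4)
        X = rng.random((200, 60))      # stacked consecutive frames of features
        y = rng.integers(0, 6, 200)    # six emotion classes

        # One RBM layer stands in for the deeper DBN; its hidden
        # activations become the input to a nonlinear SVM.
        model = Pipeline([("rbm", BernoulliRBM(n_components=100, n_iter=10)),
                          ("svm", SVC(kernel="rbf"))])
        model.fit(X, y)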

  4. Event-related potential evidence of form and meaning coding during online speech recognition.

    Science.gov (United States)

    Friedrich, Claudia K; Kotz, Sonja A

    2007-04-01

    It is still a matter of debate whether initial analysis of speech is independent of contextual influences or whether meaning can modulate word activation directly. Utilizing event-related brain potentials (ERPs), we tested the neural correlates of speech recognition by presenting sentences that ended with incomplete words, such as "To light up the dark she needed her can-". Immediately following the incomplete words, subjects saw visual words that (i) matched form and meaning, such as "candle"; (ii) matched meaning but not form, such as "lantern"; (iii) matched form but not meaning, such as "candy"; or (iv) mismatched form and meaning, such as "number". We report ERP evidence for two distinct cohorts of lexical tokens: (a) a left-lateralized effect, the P250, differentiates form-matching words (i, iii) from form-mismatching words (ii, iv); (b) a right-lateralized effect, the P220, differentiates words that match in form and/or meaning (i, ii, iii) from mismatching words (iv). Lastly, fully matching words (i) reduce the amplitude of the N400. These results accommodate bottom-up and top-down accounts of human speech recognition. They suggest that neural representations of form and meaning are activated independently early on and are integrated at a later stage during sentence comprehension.

  5. Speech/Nonspeech Detection Using Minimal Walsh Basis Functions

    Directory of Open Access Journals (Sweden)

    Pwint Moe

    2007-01-01

    Full Text Available This paper presents a new method to detect speech/nonspeech components of a given noisy signal. Employing a combination of binary Walsh basis functions and an analysis-synthesis scheme, the original noisy speech signal is modified first. From the modified signals, the speech components are distinguished from the nonspeech components by using a simple decision scheme. The minimal number of Walsh basis functions to be applied is determined using singular value decomposition (SVD). The main advantages of the proposed method are low computational complexity, few parameters to be adjusted, and simple implementation. It is observed that the use of Walsh basis functions makes the proposed algorithm efficiently applicable in real-world situations where processing time is crucial. Simulation results indicate that the proposed algorithm achieves high speech and nonspeech detection rates while maintaining a low error rate for different noisy conditions.
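
    The analysis-synthesis step with Walsh-type basis functions, and an SVD-based count of how many functions to keep, can be sketched as follows; Hadamard rows serve here as unordered Walsh functions, and the 95% energy criterion is an illustrative assumption.

        import numpy as np
        from scipy.linalg import hadamard

        N = 64
        H = hadamard(N)                  # rows act as binary basis functions
        rng = np.random.default_rng(5)
        frame = rng.standard_normal(N)
        coeffs = H @ frame / N           # analysis: project onto the basis
        reconstructed = H.T @ coeffs     # synthesis (exact: H is orthogonal)

        # Rank basis functions by the energy they capture across many
        # frames, then keep the minimal number reaching 95% of the total.
        frames = rng.standard_normal((100, N))
        _, s, _ = np.linalg.svd(frames @ H.T / N, full_matrices=False)
        energy = np.cumsum(s ** 2) / np.sum(s ** 2)
        k = int(np.searchsorted(energy, 0.95)) + 1
        print(f"{k} basis functions capture 95% of the energy")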

  6. Patients with hippocampal amnesia successfully integrate gesture and speech.

    Science.gov (United States)

    Hilverman, Caitlin; Clough, Sharice; Duff, Melissa C; Cook, Susan Wagner

    2018-06-19

    During conversation, people integrate information from co-speech hand gestures with information in spoken language. For example, after hearing the sentence, "A piece of the log flew up and hit Carl in the face" while viewing a gesture directed at the nose, people tend to later report that the log hit Carl in the nose (information only in gesture) rather than in the face (information in speech). The cognitive and neural mechanisms that support the integration of gesture with speech are unclear. One possibility is that the hippocampus - known for its role in relational memory and information integration - is necessary for integrating gesture and speech. To test this possibility, we examined how patients with hippocampal amnesia and healthy and brain-damaged comparison participants express information from gesture in a narrative retelling task. Participants watched videos of an experimenter telling narratives that included hand gestures that contained supplementary information. Participants were asked to retell the narratives, and their spoken retellings were assessed for the presence of information from gesture. For features that had been accompanied by supplementary gesture, patients with amnesia retold fewer of these features overall and produced fewer retellings that matched the speech from the narrative. Yet their retellings included features whose information had been present uniquely in gesture, in amounts that were not reliably different from the comparison groups. Thus, a functioning hippocampus is not necessary for gesture-speech integration over short timescales. Providing unique information in gesture may enhance communication for individuals with declarative memory impairment, possibly via non-declarative memory mechanisms. Copyright © 2018. Published by Elsevier Ltd.

  7. Neu-IR: The SIGIR 2016 Workshop on Neural Information Retrieval

    NARCIS (Netherlands)

    Craswell, N.; Croft, W.B.; Guo, J.; Mitra, B.; de Rijke, M.

    2016-01-01

    In recent years, deep neural networks have yielded significant performance improvements on speech recognition and computer vision tasks, as well as led to exciting breakthroughs in novel application areas such as automatic voice translation, image captioning, and conversational agents. Despite

  8. Psychogenic stuttering and other acquired nonorganic speech and language abnormalities.

    Science.gov (United States)

    Binder, Laurence M; Spector, Jack; Youngjohn, James R

    2012-08-01

    Three cases are presented of peculiar speech and language abnormalities that were evaluated in the context of personal injury lawsuits or workers' compensation claims of brain dysfunction after mild traumatic brain injuries. Neuropsychological measures of effort and motivation showed evidence of suboptimal motivation or outright malingering. The speech and language abnormalities of these cases were probably not consistent with neurogenic features of dysfluent speech, including stuttering or aphasia. We propose that severe dysfluency or language abnormalities persisting after a single, uncomplicated, mild traumatic brain injury are unusual and should elicit suspicion of a psychogenic origin.

  9. Using principal component analysis to capture individual differences within a unified neuropsychological model of chronic post-stroke aphasia: Revealing the unique neural correlates of speech fluency, phonology and semantics.

    Science.gov (United States)

    Halai, Ajay D; Woollams, Anna M; Lambon Ralph, Matthew A

    2017-01-01

    Individual differences in the performance profiles of neuropsychologically-impaired patients are pervasive yet there is still no resolution on the best way to model and account for the variation in their behavioural impairments and the associated neural correlates. To date, researchers have generally taken one of three different approaches: a single-case study methodology in which each case is considered separately; a case-series design in which all individual patients from a small coherent group are examined and directly compared; or, group studies, in which a sample of cases are investigated as one group with the assumption that they are drawn from a homogenous category and that performance differences are of no interest. In recent research, we have developed a complementary alternative through the use of principal component analysis (PCA) of individual data from large patient cohorts. This data-driven approach not only generates a single unified model for the group as a whole (expressed in terms of the emergent principal components) but is also able to capture the individual differences between patients (in terms of their relative positions along the principal behavioural axes). We demonstrate the use of this approach by considering speech fluency, phonology and semantics in aphasia diagnosis and classification, as well as their unique neural correlates. PCA of the behavioural data from 31 patients with chronic post-stroke aphasia resulted in four statistically-independent behavioural components reflecting phonological, semantic, executive-cognitive and fluency abilities. Even after accounting for lesion volume, entering the four behavioural components simultaneously into a voxel-based correlational methodology (VBCM) analysis revealed that speech fluency (speech quanta) was uniquely correlated with left motor cortex and underlying white matter (including the anterior section of the arcuate fasciculus and the frontal aslant tract), phonological skills with
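
    In outline, the PCA step reduces a patients-by-tests score matrix to a few orthogonal behavioural axes; the sketch below uses random data and omits the rotation and lesion-correlation stages of the published analysis.

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(6)
        scores = rng.standard_normal((31, 20))   # 31 patients x 20 tests

        pca = PCA(n_components=4)
        positions = pca.fit_transform(scores)    # each patient's coordinates
                                                 # on the 4 behavioural axes
        print(pca.explained_variance_ratio_)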

  10. Electrolarynx Voice Recognition Utilizing Pulse Coupled Neural Network

    Directory of Open Access Journals (Sweden)

    Fatchul Arifin

    2010-08-01

    Full Text Available Laryngectomy patients cannot speak normally because their vocal cords have been removed. The easiest option for such patients to speak again is electrolarynx speech. This tool is placed on the lower chin, and vibration of the neck while speaking is used to produce sound. Meanwhile, voice recognition technology has been growing very rapidly, and it is expected that it can also be used by laryngectomy patients who use an electrolarynx. This paper describes a system for electrolarynx speech recognition. The two main parts of the system are feature extraction and pattern recognition. A Pulse Coupled Neural Network (PCNN) is used to extract the features and characteristics of electrolarynx speech; the β parameter of the PCNN was also varied. A multilayer perceptron is used to recognize the sound patterns. Two kinds of recognition were conducted in this paper: speech recognition and speaker recognition. Speech recognition recognizes specific speech from every person, whereas speaker recognition recognizes specific speech from a specific person. The system ran well. The electrolarynx speech recognition was tested by recognizing "A" and "not A" voices, and the results showed a validation rate of 94.4%. The electrolarynx speaker recognition was tested by recognizing the word "saya" spoken by several different speakers, and the results showed a validation rate of 92.2%. The best β parameter of the PCNN for electrolarynx recognition is 3.
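
    A heavily simplified 1-D pulse-coupled neural network, showing where the β linking parameter enters, might look like this; the neighbour coupling, decay constants, and use of the per-step firing count as a feature are illustrative assumptions, not the paper's configuration.

        import numpy as np

        def pcnn_firing_series(signal, beta=3.0, steps=30,
                               a_theta=0.2, v_theta=20.0):
            """Return the number of neurons firing at each step; the
            resulting time series can serve as a feature vector."""
            s = signal / np.max(np.abs(signal))   # one neuron per sample
            fired = np.zeros_like(s)
            theta = np.ones_like(s)               # dynamic firing threshold
            series = []
            for _ in range(steps):
                # linking input: each neuron couples to its two neighbours
                link = np.roll(fired, 1) + np.roll(fired, -1)
                u = s * (1 + beta * link)         # internal activity
                fired = (u > theta).astype(float)
                theta = np.exp(-a_theta) * theta + v_theta * fired
                series.append(fired.sum())
            return np.array(series)

    The firing series would then be fed to the multilayer perceptron for classification.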

  11. Inner Speech's Relationship With Overt Speech in Poststroke Aphasia.

    Science.gov (United States)

    Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A

    2017-09-18

    Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Al, 2004). The group with poor overt speech showed a significant relationship between inner speech and overt naming (r = .95, p < .05), whereas correlations between inner speech and the language and cognition measures were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.

  12. The Role of Corticostriatal Systems in Speech Category Learning.

    Science.gov (United States)

    Yi, Han-Gyol; Maddox, W Todd; Mumford, Jeanette A; Chandrasekaran, Bharath

    2016-04-01

    One of the most difficult category learning problems for humans is learning nonnative speech categories. While feedback-based category training can enhance speech learning, the mechanisms underlying these benefits are unclear. In this functional magnetic resonance imaging study, we investigated neural and computational mechanisms underlying feedback-dependent speech category learning in adults. Positive feedback activated a large corticostriatal network including the dorsolateral prefrontal cortex, inferior parietal lobule, middle temporal gyrus, caudate, putamen, and the ventral striatum. Successful learning was contingent upon the activity of domain-general category learning systems: the fast-learning reflective system, involving the dorsolateral prefrontal cortex that develops and tests explicit rules based on the feedback content, and the slow-learning reflexive system, involving the putamen in which the stimuli are implicitly associated with category responses based on the reward value in feedback. Computational modeling of response strategies revealed significant use of reflective strategies early in training and greater use of reflexive strategies later in training. Reflexive strategy use was associated with increased activation in the putamen. Our results demonstrate a critical role for the reflexive corticostriatal learning system as a function of response strategy and proficiency during speech category learning. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. Assessment of speech intelligibility in background noise and reverberation

    DEFF Research Database (Denmark)

    Nielsen, Jens Bo

    Reliable methods for assessing speech intelligibility are essential within hearing research, audiology, and related areas. Such methods can be used for obtaining a better understanding of how speech intelligibility is affected by, e.g., various environmental factors or different types of hearing impairment. In this thesis, two sentence-based tests for speech intelligibility in Danish were developed. The first test is the Conversational Language Understanding Evaluation (CLUE), which is based on the principles of the original American-English Hearing in Noise Test (HINT). The second test is a modified version of CLUE in which the speech material and the scoring rules have been reconsidered. An extensive validation of the modified test was conducted with both normal-hearing and hearing-impaired listeners. The validation showed that the test produces reliable results for both groups of listeners...

  14. Dissociating speech perception and comprehension at reduced levels of awareness

    OpenAIRE

    Davis, Matthew H.; Coleman, Martin R.; Absalom, Anthony R.; Rodd, Jennifer M.; Johnsrude, Ingrid S.; Matta, Basil F.; Owen, Adrian M.; Menon, David K.

    2007-01-01

    We used functional MRI and the anesthetic agent propofol to assess the relationship among neural responses to speech, successful comprehension, and conscious awareness. Volunteers were scanned while listening to sentences containing ambiguous words, matched sentences without ambiguous words, and signal-correlated noise (SCN). During three scanning sessions, participants were nonsedated (awake), lightly sedated (a slowed response to conversation), and deeply sedated (no conversational response...

  15. Musician advantage for speech-on-speech perception

    NARCIS (Netherlands)

    Başkent, Deniz; Gaudrain, Etienne

    Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level

  16. On the origin of reproducible sequential activity in neural circuits

    Science.gov (United States)

    Afraimovich, V. S.; Zhigulin, V. P.; Rabinovich, M. I.

    2004-12-01

    Robustness and reproducibility of sequential spatio-temporal responses is an essential feature of many neural circuits in sensory and motor systems of animals. The most common mathematical images of dynamical regimes in neural systems are fixed points, limit cycles, chaotic attractors, and continuous attractors (attractive manifolds of neutrally stable fixed points). These are not suitable for the description of reproducible transient sequential neural dynamics. In this paper we present the concept of a stable heteroclinic sequence (SHS), which is not an attractor. The SHS opens the way for understanding and modeling of transient sequential activity in neural circuits. We show that this new mathematical object can be used to describe robust and reproducible sequential neural dynamics. Using the framework of a generalized high-dimensional Lotka-Volterra model, which describes the dynamics of firing rates in an inhibitory network, we present analytical results on the existence of the SHS in the phase space of the network. With the help of numerical simulations we confirm its robustness in the presence of noise, in spite of the transient nature of the corresponding trajectories. Finally, by referring to several recent neurobiological experiments, we discuss possible applications of this new concept to several problems in neuroscience.

  17. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    International Nuclear Information System (INIS)

    Wang Xiaojia; Mao Qirong; Zhan Yongzhao

    2008-01-01

    There are many emotion features. If all of these features are employed to recognize emotions, redundant features may exist; furthermore, the recognition results are unsatisfactory and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on a contribution analysis algorithm for neural networks (NN) is presented. The emotion features are selected from the 95 extracted features using the contribution analysis algorithm. Cluster analysis is applied to assess the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the feature extraction time.
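
    One simple realization of a contribution score, ranking inputs by their total absolute first-layer weight in a trained network, is sketched below; the random data, network size, and scoring rule itself are assumptions, with only the 95-feature input and 24-feature cut mirroring the abstract.

        import numpy as np
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(7)
        X = rng.standard_normal((300, 95))   # 95 candidate emotion features
        y = rng.integers(0, 6, 300)          # six emotion classes

        net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300)
        net.fit(X, y)

        # Contribution of each input feature: summed absolute weight
        # of its connections into the first hidden layer.
        contribution = np.abs(net.coefs_[0]).sum(axis=1)
        top24 = np.argsort(contribution)[::-1][:24]
        print(top24)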

  18. Amusia results in abnormal brain activity following inappropriate intonation during speech comprehension.

    Science.gov (United States)

    Jiang, Cunmei; Hamm, Jeff P; Lim, Vanessa K; Kirk, Ian J; Chen, Xuhai; Yang, Yufang

    2012-01-01

    Pitch processing is a critical ability on which humans' tonal musical experience depends, and which is also of paramount importance for decoding prosody in speech. Congenital amusia refers to deficits in the ability to properly process musical pitch, and recent evidence has suggested that this musical pitch disorder may impact upon the processing of speech sounds. Here we present the first electrophysiological evidence demonstrating that individuals with amusia who speak Mandarin Chinese are impaired in classifying prosody as appropriate or inappropriate during a speech comprehension task. When presented with inappropriate prosody stimuli, control participants elicited a larger P600 and smaller N100 relative to the appropriate condition. In contrast, amusics did not show significant differences between the appropriate and inappropriate conditions in either the N100 or the P600 component. This provides further evidence that the pitch perception deficits associated with amusia may also affect intonation processing during speech comprehension in those who speak a tonal language such as Mandarin, and suggests music and language share some cognitive and neural resources.

  20. Modeling speech imitation and ecological learning of auditory-motor maps

    Directory of Open Access Journals (Sweden)

    Claudia Canevari

    2013-06-01

    Full Text Available Classical models of speech consider an antero-posterior distinction between perceptive and productive functions. However, the selective alteration of neural activity in speech motor centers, via transcranial magnetic stimulation, was shown to affect speech discrimination. On the automatic speech recognition (ASR) side, the recognition systems have classically relied solely on acoustic data, achieving rather good performance in optimal listening conditions. The main limitations of current ASR are mainly evident in the realistic use of such systems. These limitations can be partly reduced by using normalization strategies that minimize inter-speaker variability by either explicitly removing speakers’ peculiarities or adapting different speakers to a reference model. In this paper we aim at modeling a motor-based imitation learning mechanism in ASR. We tested the utility of a speaker normalization strategy that uses motor representations of speech and compare it with strategies that ignore the motor domain. Specifically, we first trained a regressor through state-of-the-art machine learning techniques to build an auditory-motor mapping, in a sense mimicking a human learner that tries to reproduce utterances produced by other speakers. This auditory-motor mapping maps the speech acoustics of a speaker into the motor plans of a reference speaker. Since, during recognition, only speech acoustics are available, the mapping is necessary to recover motor information. Subsequently, in a phone classification task, we tested the system on either one of the speakers that was used during training or a new one. Results show that in both cases the motor-based speaker normalization strategy almost always outperforms all other strategies where only acoustics is taken into account.

  1. Auditory-motor interactions in pediatric motor speech disorders: neurocomputational modeling of disordered development.

    Science.gov (United States)

    Terband, H; Maassen, B; Guenther, F H; Brumberg, J

    2014-01-01

    Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between neurological deficits in auditory and motor processes using computational modeling with the DIVA model. In a series of computer simulations, we investigated the effect of a motor processing deficit alone (MPD), and the effect of a motor processing deficit in combination with an auditory processing deficit (MPD+APD) on the trajectory and endpoint of speech motor development in the DIVA model. Simulation results showed that a motor programming deficit predominantly leads to deterioration on the phonological level (phonemic mappings) when auditory self-monitoring is intact, and on the systemic level (systemic mapping) if auditory self-monitoring is impaired. These findings suggest a close relation between quality of auditory self-monitoring and the involvement of phonological vs. motor processes in children with pediatric motor speech disorders. It is suggested that MPD+APD might be involved in typically apraxic speech output disorders and MPD in pediatric motor speech disorders that also have a phonological component. Possibilities to verify these hypotheses using empirical data collected from human subjects are discussed. The reader will be able to: (1) identify the difficulties in studying disordered speech motor development; (2) describe the differences in speech motor characteristics between SSD and subtype CAS; (3) describe the different types of learning that occur in the sensory-motor system during babbling and early speech acquisition; (4) identify the neural control subsystems involved in speech production; (5) describe the potential role of auditory self-monitoring in developmental speech disorders. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. Brain activity underlying the recovery of meaning from degraded speech: A functional near-infrared spectroscopy (fNIRS) study.

    Science.gov (United States)

    Wijayasiri, Pramudi; Hartley, Douglas E H; Wiggins, Ian M

    2017-08-01

    The purpose of this study was to establish whether functional near-infrared spectroscopy (fNIRS), an emerging brain-imaging technique based on optical principles, is suitable for studying the brain activity that underlies effortful listening. In an event-related fNIRS experiment, normally-hearing adults listened to sentences that were either clear or degraded (noise vocoded). These sentences were presented simultaneously with a non-speech distractor, and on each trial participants were instructed to attend either to the speech or to the distractor. The primary region of interest for the fNIRS measurements was the left inferior frontal gyrus (LIFG), a cortical region involved in higher-order language processing. The fNIRS results confirmed findings previously reported in the functional magnetic resonance imaging (fMRI) literature. Firstly, the LIFG exhibited an elevated response to degraded versus clear speech, but only when attention was directed towards the speech. This attention-dependent increase in frontal brain activation may be a neural marker for effortful listening. Secondly, during attentive listening to degraded speech, the haemodynamic response peaked significantly later in the LIFG than in superior temporal cortex, possibly reflecting the engagement of working memory to help reconstruct the meaning of degraded sentences. The homologous region in the right hemisphere may play an equivalent role to the LIFG in some left-handed individuals. In conclusion, fNIRS holds promise as a flexible tool to examine the neural signature of effortful listening. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. A Digital Liquid State Machine With Biologically Inspired Learning and Its Application to Speech Recognition.

    Science.gov (United States)

    Zhang, Yong; Li, Peng; Jin, Yingyezhe; Choe, Yoonsuck

    2015-11-01

    This paper presents a bioinspired digital liquid-state machine (LSM) for low-power very-large-scale-integration (VLSI)-based machine learning applications. To the best of the authors' knowledge, this is the first work that employs a bioinspired spike-based learning algorithm for the LSM. With the proposed online learning, the LSM extracts information from input patterns on the fly without needing intermediate data storage as required in offline learning methods such as ridge regression. The proposed learning rule is local such that each synaptic weight update is based only upon the firing activities of the corresponding presynaptic and postsynaptic neurons without incurring global communications across the neural network. Compared with the backpropagation-based learning, the locality of computation in the proposed approach lends itself to efficient parallel VLSI implementation. We use subsets of the TI46 speech corpus to benchmark the bioinspired digital LSM. To reduce the complexity of the spiking neural network model without performance degradation for speech recognition, we study the impacts of synaptic models on the fading memory of the reservoir and hence the network performance. Moreover, we examine the tradeoffs between synaptic weight resolution, reservoir size, and recognition performance and present techniques to further reduce the overhead of hardware implementation. Our simulation results show that in terms of isolated word recognition evaluated using the TI46 speech corpus, the proposed digital LSM rivals the state-of-the-art hidden Markov-model-based recognizer Sphinx-4 and outperforms all other reported recognizers including the ones that are based upon the LSM or neural networks.

  4. Fastest learning in small-world neural networks

    International Nuclear Information System (INIS)

    Simard, D.; Nadeau, L.; Kroeger, H.

    2005-01-01

    We investigate supervised learning in neural networks. We consider a multi-layered feed-forward network with back propagation. We find that a network with small-world connectivity reduces the learning error and learning time compared with networks of regular or random connectivity. Our study has potential applications in the domains of data-mining, image processing, speech recognition, and pattern recognition.
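
    The connectivity regimes compared here can be generated with the Watts-Strogatz model, where a rewiring probability p sweeps from a regular ring lattice (p=0) through small-world territory to a random graph (p=1); the networkx usage below is an illustrative aside, not the authors' code.

        import networkx as nx

        for p in (0.0, 0.1, 1.0):   # regular, small-world, random
            g = nx.connected_watts_strogatz_graph(n=100, k=4, p=p, seed=8)
            print(p, nx.average_shortest_path_length(g),
                  nx.average_clustering(g))

    Small-world graphs combine the short path lengths of random graphs with the high clustering of regular lattices, which is the property the study links to faster learning.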

  5. Developmental changes in brain activation involved in the production of novel speech sounds in children.

    Science.gov (United States)

    Hashizume, Hiroshi; Taki, Yasuyuki; Sassa, Yuko; Thyreau, Benjamin; Asano, Michiko; Asano, Kohei; Takeuchi, Hikaru; Nouchi, Rui; Kotozaki, Yuka; Jeong, Hyeonjeong; Sugiura, Motoaki; Kawashima, Ryuta

    2014-08-01

    Older children are more successful at producing unfamiliar, non-native speech sounds than younger children during the initial stages of learning. To reveal the neuronal underpinning of the age-related increase in the accuracy of non-native speech production, we examined the developmental changes in activation involved in the production of novel speech sounds using functional magnetic resonance imaging. Healthy right-handed children (aged 6-18 years) were scanned while performing an overt repetition task and a perceptual task involving aurally presented non-native and native syllables. Productions of non-native speech sounds were recorded and evaluated by native speakers. The mouth regions in the bilateral primary sensorimotor areas were activated more significantly during the repetition task relative to the perceptual task. The hemodynamic response in the left inferior frontal gyrus pars opercularis (IFG pOp) specific to non-native speech sound production (defined by prior hypothesis) increased with age. Additionally, the accuracy of non-native speech sound production increased with age. These results provide the first evidence of developmental changes in the neural processes underlying the production of novel speech sounds. Our data further suggest that the recruitment of the left IFG pOp during the production of novel speech sounds was possibly enhanced due to the maturation of the neuronal circuits needed for speech motor planning. This, in turn, would lead to improvement in the ability to immediately imitate non-native speech. Copyright © 2014 Wiley Periodicals, Inc.

  6. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    Science.gov (United States)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.
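
    The LSS/NLSS segment classification can be approximated with a two-state Gaussian HMM over acoustic feature frames; hmmlearn, the synthetic feature shapes, and the two-state reading are assumptions for illustration rather than the authors' model.

        import numpy as np
        from hmmlearn.hmm import GaussianHMM

        rng = np.random.default_rng(10)
        # Hypothetical 13-dimensional acoustic features: 100 frames of
        # language speech followed by 50 frames of NLSS-like audio.
        frames = np.vstack([rng.normal(0.0, 1.0, (100, 13)),
                            rng.normal(3.0, 1.0, (50, 13))])

        # Two hidden states, intended to capture LSS vs. NLSS regions.
        hmm = GaussianHMM(n_components=2, n_iter=20).fit(frames)
        states = hmm.predict(frames)   # per-frame LSS/NLSS labelling
        print(states)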

  7. Modular representation of layered neural networks.

    Science.gov (United States)

    Watanabe, Chihiro; Hiramatsu, Kaoru; Kashino, Kunio

    2018-01-01

    Layered neural networks have greatly improved the performance of various applications including image processing, speech recognition, natural language processing, and bioinformatics. However, it is still difficult to discover or interpret knowledge from the inference provided by a layered neural network, since its internal representation has many nonlinear and complex parameters embedded in hierarchical layers. Therefore, it becomes important to establish a new methodology by which layered neural networks can be understood. In this paper, we propose a new method for extracting a global and simplified structure from a layered neural network. Based on network analysis, the proposed method detects communities or clusters of units with similar connection patterns. We show its effectiveness by applying it to three use cases. (1) Network decomposition: it can decompose a trained neural network into multiple small independent networks thus dividing the problem and reducing the computation time. (2) Training assessment: the appropriateness of a trained result with a given hyperparameter or randomly chosen initial parameters can be evaluated by using a modularity index. And (3) data analysis: in practical data it reveals the community structure in the input, hidden, and output layers, which serves as a clue for discovering knowledge from a trained neural network. Copyright © 2017 Elsevier Ltd. All rights reserved.
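
    The community-detection idea can be sketched on a single layer's weight matrix, treating units as graph nodes joined by edges weighted with connection magnitude; the greedy modularity routine and tiny random layer below are placeholders for the paper's actual method.

        import numpy as np
        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities

        rng = np.random.default_rng(9)
        w = rng.standard_normal((8, 8))   # one layer's trained weights

        # Bipartite graph: units 0-7 in layer l, units 8-15 in layer l+1.
        g = nx.Graph()
        for i in range(8):
            for j in range(8):
                g.add_edge(i, 8 + j, weight=abs(w[i, j]))

        communities = greedy_modularity_communities(g, weight="weight")
        print([sorted(c) for c in communities])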

  8. Motivational Projections of Russian Spontaneous Speech

    Directory of Open Access Journals (Sweden)

    Galina M. Shipitsina

    2017-06-01

    Full Text Available The article deals with the semantic, pragmatic, and structural features of the motivation of words, phrases, and dialogues in contemporary Russian popular speech. These structural features are characterized by originality and unconventional use. The language material is the result of the authors' direct observation of spontaneous verbal communication between people of different social and age groups. The words and remarks were analyzed in relation to the communication system of the national Russian language and the cultural background of popular speech. The study found that spoken discourse offers additional ways of increasing the expressiveness of an utterance. It is important to note that spontaneous speech reveals lacunae in the nominative system and vocabulary of the language. Prefixation is also shown to be an effective and regular way of presenting the same action. The most typical forms, ways, and means of updating language resources through the linguistic creativity of native speakers were identified.

  9. Learning speaker-specific characteristics with a deep neural architecture.

    Science.gov (United States)

    Chen, Ke; Salman, Ahmad

    2011-11-01

    Speech signals convey various yet mixed information ranging from linguistic to speaker-specific information. However, most acoustic representations characterize all of these different kinds of information as a whole, which could hinder either a speech or a speaker recognition (SR) system from producing better performance. In this paper, we propose a novel deep neural architecture (DNA) especially for learning speaker-specific characteristics from mel-frequency cepstral coefficients, an acoustic representation commonly used in both speech recognition and SR, which results in a speaker-specific overcomplete representation. In order to learn intrinsic speaker-specific characteristics, we come up with an objective function consisting of contrastive losses in terms of speaker similarity/dissimilarity and data reconstruction losses used as regularization to normalize the interference of non-speaker-related information. Moreover, we employ a hybrid learning strategy for learning the parameters of the deep neural networks: local yet greedy layerwise unsupervised pretraining for initialization and global supervised learning for the ultimate discriminative goal. With four Linguistic Data Consortium (LDC) benchmarks and two non-English corpora, we demonstrate that our overcomplete representation is robust in characterizing various speakers, no matter whether their utterances have been used in training our DNA, and highly insensitive to text and languages spoken. Extensive comparative studies suggest that our approach yields favorable results in speaker verification and segmentation. Finally, we discuss several issues concerning our proposed approach.
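
    The objective described, contrastive losses on speaker similarity plus reconstruction losses as regularization, can be written out as follows; the margin form, weighting, and numpy formulation are standard choices assumed for illustration, not the paper's exact equations.

        import numpy as np

        def speaker_objective(h1, h2, x1, x2, r1, r2,
                              same_speaker, margin=1.0, lam=0.1):
            """Contrastive loss on hidden codes h1/h2, plus reconstruction
            losses (inputs x* vs. reconstructions r*) as regularization."""
            d2 = np.sum((h1 - h2) ** 2)
            if same_speaker:
                contrastive = d2                  # pull same speakers together
            else:                                 # push different speakers apart
                contrastive = max(0.0, margin - np.sqrt(d2)) ** 2
            reconstruction = np.sum((x1 - r1) ** 2) + np.sum((x2 - r2) ** 2)
            return contrastive + lam * reconstruction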

  10. A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

    OpenAIRE

    Zhang, Xinyu; Das, Srinjoy; Neopane, Ojash; Kreutz-Delgado, Ken

    2017-01-01

    In recent years deep learning algorithms have shown extremely high performance on machine learning tasks such as image classification and speech recognition. In support of such applications, various FPGA accelerator architectures have been proposed for convolutional neural networks (CNNs) that enable high performance for classification tasks at lower power than CPU and GPU processors. However, to date, there has been little research on the use of FPGA implementations of deconvolutional neural...

  11. Greater BOLD variability in older compared with younger adults during audiovisual speech perception.

    Directory of Open Access Journals (Sweden)

    Sarah H Baum

    Full Text Available Older adults exhibit decreased performance and increased trial-to-trial variability on a range of cognitive tasks, including speech perception. We used blood oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) to search for neural correlates of these behavioral phenomena. We compared brain responses to simple speech stimuli (audiovisual syllables) in 24 healthy older adults (53 to 70 years old) and 14 younger adults (23 to 39 years old) using two independent analysis strategies: region-of-interest (ROI) and voxel-wise whole-brain analysis. While mean response amplitudes were moderately greater in younger adults, older adults had much greater within-subject variability. The greatly increased variability in older adults was observed both for individual voxels in the whole-brain analysis and for ROIs in the left superior temporal sulcus, the left auditory cortex, and the left visual cortex. Increased variability in older adults could not be attributed to differences in head movements between the groups. Increased neural variability may be related to the performance declines and increased behavioral variability that occur with aging.

  12. Cortical oscillations in auditory perception and speech: evidence for two temporal windows in human auditory cortex

    Directory of Open Access Journals (Sweden)

    Huan Luo

    2012-05-01

    Full Text Available Natural sounds, including vocal communication sounds, contain critical information at multiple time scales. Two essential temporal modulation rates in speech have been argued to be in the low gamma band (~20-80 ms duration information) and the theta band (~150-300 ms), corresponding to segmental and syllabic modulation rates, respectively. On one hypothesis, auditory cortex implements temporal integration using time constants closely related to these values. The neural correlates of a proposed dual temporal window mechanism in human auditory cortex remain poorly understood. We recorded MEG responses from participants listening to non-speech auditory stimuli with different temporal structures, created by concatenating frequency-modulated segments of varied segment durations. We show that these non-speech stimuli with temporal structure matching speech-relevant scales (~25 ms and ~200 ms) elicit reliable phase tracking in the corresponding oscillatory frequencies (low gamma and theta bands). In contrast, stimuli with non-matching temporal structure do not. Furthermore, the topography of theta band phase tracking shows rightward lateralization while gamma band phase tracking occurs bilaterally. The results support the hypothesis that there exists multi-time resolution processing in cortex on discontinuous scales and provide evidence for an asymmetric organization of temporal analysis (asymmetrical sampling in time, AST). The data argue for a macroscopic-level neural mechanism underlying multi-time resolution processing: the sliding and resetting of intrinsic temporal windows on privileged time scales.

  13. Music and Speech Perception in Children Using Sung Speech.

    Science.gov (United States)

    Nie, Yingjiu; Galvin, John J; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie

    2018-01-01

    This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training, participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet were significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.

  14. Multisensory speech perception without the left superior temporal sulcus.

    Science.gov (United States)

    Baum, Sarah H; Martin, Randi C; Hamilton, A Cris; Beauchamp, Michael S

    2012-09-01

    Converging evidence suggests that the left superior temporal sulcus (STS) is a critical site for multisensory integration of auditory and visual information during speech perception. We report a patient, SJ, who suffered a stroke that damaged the left temporo-parietal area, resulting in mild anomic aphasia. Structural MRI showed complete destruction of the left middle and posterior STS, as well as damage to adjacent areas in the temporal and parietal lobes. Surprisingly, SJ demonstrated preserved multisensory integration measured with two independent tests. First, she perceived the McGurk effect, an illusion that requires integration of auditory and visual speech. Second, her perception of morphed audiovisual speech with ambiguous auditory or visual information was significantly influenced by the opposing modality. To understand the neural basis for this preserved multisensory integration, blood-oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) was used to examine brain responses to audiovisual speech in SJ and 23 healthy age-matched controls. In controls, bilateral STS activity was observed. In SJ, no activity was observed in the damaged left STS, but in the right STS, more cortex was active in SJ than in any of the normal controls. Further, the amplitude of the BOLD response to McGurk stimuli in the right STS was significantly greater in SJ than in controls. The simplest explanation of these results is a reorganization of SJ's cortical language networks such that the right STS now subserves multisensory integration of speech. Copyright © 2012 Elsevier Inc. All rights reserved.

  15. From Persuasive to Authoritative Speech Genres

    DEFF Research Database (Denmark)

    Nørreklit, Hanne; Scapens, Robert

    2014-01-01

    Purpose: The purpose of this paper is to contrast the speech genres in the original and the published versions of an article written by academic researchers and published in the US practitioner-oriented journal, Strategic Finance. The original version, submitted by the researchers, was rewritten ... "a world". Research limitations/implications: The choice of language and argumentation should be given careful attention when the authors craft the accounting frameworks and research papers, and especially when the authors seek to communicate the findings of the research to practitioners. However...

  16. The Gestural Theory of Language Origins

    Science.gov (United States)

    Armstrong, David F.

    2008-01-01

    The idea that iconic visible gesture had something to do with the origin of language, particularly speech, is a frequent element in speculation about this phenomenon and appears early in its history. Socrates hypothesizes about the origins of Greek words in Plato's satirical dialogue, "Cratylus", and his speculation includes a possible…

  17. Apraxia of Speech

    Science.gov (United States)

    This health-information page explains what apraxia of speech is and what is known about it. Apraxia of speech (AOS), also known as acquired apraxia of speech, is a motor speech disorder.

  18. Milton's "Areopagitica" Freedom of Speech on Campus

    Science.gov (United States)

    Sullivan, Daniel F.

    2006-01-01

    The author discusses the content in John Milton's "Areopagitica: A Speech for the Liberty of Unlicensed Printing to the Parliament of England" (1985) and provides parallelism to censorship practiced in higher education. Originally published in 1644, "Areopagitica" makes a powerful--and precocious--argument for freedom of speech…

  19. Studies of Speech Disorders in Schizophrenia. History and State-of-the-art

    Directory of Open Access Journals (Sweden)

    Shedovskiy E. F.

    2015-08-01

    The article reviews studies of speech disorders in schizophrenia. The authors trace the historical course and character of research in several areas: the properly psychopathological (speech disorders as psychopathological symptoms, their description and taxonomy) and the psychological and pathopsychological perspectives; some modern foreign works, covering a variety of approaches to the study of speech disorders in endogenous mental disorders, are analyzed separately. Disorders and features of speech are among the most striking manifestations of schizophrenia, along with impaired thinking (Savitskaya A. V., Mikirtumov B. E.). For all the variety of symptoms, speech disorders in schizophrenia can be classified and organized. The few clinical-psychological studies of speech activity in schizophrenia include work on the generation of standard speech utterances, on features of the verbal associative process, and on the speed parameters of speech utterances. Special attention is given to integrated research in the mainstream of biological psychiatry and genetic trends. It is shown that, over more than a half-century history, the distinctiveness of speech pathology in schizophrenia has received some coverage in the psychiatric and psychological literature and continues to generate interest in the modern integrated multidisciplinary approach.

  20. The application of neural networks for fault diagnosis in nuclear reactors

    International Nuclear Information System (INIS)

    Jalel, N.A.; Nicholson, H.

    1990-11-01

    In recent years considerable work has been done in the field of neural networks, owing to the recent development of effective learning algorithms, and the results of their applications suggest that they can provide useful tools for solving practical problems. Artificial neural networks are mathematical models of theorized mind and brain activity. They aim to explore and reproduce human information-processing tasks such as speech, vision, knowledge processing and control. The possibility of using artificial neural networks for fault and accident diagnosis in the Loss Of Fluid Test (LOFT) reactor, a small scale pressurised water reactor, is examined and explained in the paper. (author)
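
    The record above stays high-level, but the underlying idea is a standard pattern-recognition setup: map a vector of plant sensor readings to a fault class. Below is a minimal sketch in Python with scikit-learn and entirely synthetic data; the feature dimension and fault labels are invented for illustration and are not taken from the LOFT study.

        # Minimal sketch: a small feedforward network classifying synthetic
        # "sensor" snapshots into hypothetical fault categories.
        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(0)
        FAULTS = ["normal", "small_break_loca", "pump_trip"]  # invented classes

        X = rng.normal(size=(600, 8))            # 8 synthetic sensor channels
        y = rng.integers(len(FAULTS), size=600)  # random fault labels
        X += y[:, None] * 0.8                    # shift classes apart so they are learnable

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
        net.fit(X_tr, y_tr)
        print("held-out accuracy:", net.score(X_te, y_te))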

  1. Apraxia of Speech: Concepts and Controversies

    Science.gov (United States)

    Ziegler, Wolfram; Aichert, Ingrid; Staiger, Anja

    2012-01-01

    Purpose: This article was written as an editorial to a collection of original articles on apraxia of speech (AOS) in which some of the more recent advancements in the understanding of this syndrome are discussed. It covers controversial issues concerning the theoretical foundations of AOS. Our approach was motivated by a change of perspective on…

  2. Deep Neural Network-Based Chinese Semantic Role Labeling

    Institute of Scientific and Technical Information of China (English)

    ZHENG Xiaoqing; CHEN Jun; SHANG Guoqiang

    2017-01-01

    A recent trend in machine learning is to use deep architectures to discover multiple levels of features from data, which has achieved impressive results on various natural language processing (NLP) tasks. We propose a deep neural network-based solution to Chinese semantic role labeling (SRL) with its application on message analysis. The solution adopts a six-step strategy: text normalization, named entity recognition (NER), Chinese word segmentation and part-of-speech (POS) tagging, theme classification, SRL, and slot filling. For each step, a novel deep neural network-based model is designed and optimized, particularly for smart phone applications. Experiment results on all the NLP sub-tasks of the solution show that the proposed neural networks achieve state-of-the-art performance with the minimal computational cost. The speed advantage of deep neural networks makes them more competitive for large-scale applications or applications requiring real-time response, highlighting the potential of the proposed solution for practical NLP systems.
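
    As a rough illustration of the six-step strategy, the sketch below wires stub stages into a single pipeline. Every function name and signature here is an assumption made for illustration; each stub stands in for one of the trained neural models described in the abstract.

        # Skeleton of the six-step message-analysis strategy; each stage is a stub.
        def normalize(text):                  # 1. text normalization
            return text.strip()

        def recognize_entities(text):         # 2. named entity recognition
            return []                         # stub: would return (span, type) pairs

        def segment_and_tag(text):            # 3. word segmentation + POS tagging
            return [(tok, "n") for tok in text.split()]

        def classify_theme(tokens):           # 4. theme classification
            return "unknown_theme"

        def label_roles(tokens):              # 5. semantic role labeling
            return [(tok, "O") for tok, _ in tokens]

        def fill_slots(roles, theme):         # 6. slot filling
            return {"theme": theme, "slots": [r for r in roles if r[1] != "O"]}

        def analyze(message):
            text = normalize(message)
            _entities = recognize_entities(text)
            tokens = segment_and_tag(text)
            return fill_slots(label_roles(tokens), classify_theme(tokens))

        print(analyze("set an alarm for seven"))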

  3. Speech Compression

    Directory of Open Access Journals (Sweden)

    Jerry D. Gibson

    2016-06-01

    Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
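
    Because linear prediction is the backbone of the coders surveyed here, a compact worked example may help: the Levinson-Durbin recursion fitting an order-p all-pole predictor to one frame. This is a generic textbook construction, not code from the cited survey; the synthetic AR(2) signal stands in for a windowed frame of voiced speech.

        import numpy as np

        def lpc(frame, order):
            """Levinson-Durbin: predictor coefficients a and residual energy."""
            r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            a = np.zeros(order + 1)
            a[0] = 1.0
            err = r[0]
            for i in range(1, order + 1):
                acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
                k = -acc / err                     # reflection coefficient
                a[1:i] = a[1:i] + k * a[i - 1:0:-1]
                a[i] = k
                err *= 1.0 - k * k                 # remaining prediction error
            return a, err

        rng = np.random.default_rng(1)
        sig = np.zeros(400)
        for n in range(2, 400):                    # synthetic AR(2) "voiced" frame
            sig[n] = 1.6 * sig[n - 1] - 0.8 * sig[n - 2] + 0.1 * rng.normal()
        a, err = lpc(sig * np.hamming(400), order=10)
        print("predictor:", np.round(a, 3), "residual energy:", round(float(err), 3))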

  4. Neurophysiological Evidence That Musical Training Influences the Recruitment of Right Hemispheric Homologues for Speech Perception

    Directory of Open Access Journals (Sweden)

    McNeel Gordon Jantzen

    2014-03-01

    Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Kraus & Chandrasekaran, 2010; Parbery-Clark, Skoe, & Kraus, 2009; Zendel & Alain, 2008; Musacchia, Sams, Skoe, & Kraus, 2007). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast, we predicted that in non-musicians processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated aural location for a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of the right middle temporal gyrus (MTG) and superior temporal gyrus (STG) in musicians. The findings demonstrate recruitment of the right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain.

  5. Speech interaction strategies for a humanoid assistant

    Directory of Open Access Journals (Sweden)

    Stüker Sebastian

    2018-01-01

    The goal of SecondHands, an H2020 project, is to design a robot that can offer help to a maintenance technician in a proactive manner. The robot is to act as a second pair of hands that can assist the technician when he is in need of help. In order for the robot to be of real help to the technician, it needs to understand his needs and follow his commands. Interaction via speech is a crucial part of this. Due to the nature of the situations in which these interactions take place (the technician often needs to speak to the robot while under stress, performing strenuous physical labor), classical turn-based interaction schemes need to be transformed into dialogue systems that perform stream processing, anticipating user intentions and correcting themselves as more information becomes available, in order to respond rapidly. To meet these demands, we are developing low-latency, streaming-based automatic speech recognition systems combined with recurrent neural network-based natural language understanding systems that perform slot filling and intent recognition, so that the robot can provide assistance rapidly, partly on the basis of speculative classifications that are then refined as more speech becomes available.
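
    The stream-processing idea can be made concrete with a toy sketch: emit an intent hypothesis after every partial transcript and revise it as more words arrive. The intents, keywords, and phrases below are invented placeholders; the project's actual system uses recurrent-network NLU rather than keyword lookup.

        PARTIALS = ["could you", "could you hand", "could you hand me the torque wrench"]
        KEYWORDS = {"hand": "fetch_tool", "hold": "stabilize_part", "stop": "halt"}

        def classify(words):
            # naive stand-ins for intent recognition and slot filling
            intent = next((KEYWORDS[w] for w in words if w in KEYWORDS), "unknown")
            slot = words[-1] if intent == "fetch_tool" and len(words) > 3 else None
            return intent, slot

        hypothesis = None
        for partial in PARTIALS:                   # words arriving over time
            intent, slot = classify(partial.split())
            if (intent, slot) != hypothesis:       # speculative result, later refined
                hypothesis = (intent, slot)
                print(f"after {partial!r}: intent={intent}, slot={slot}")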

  6. Audiovisual speech integration in the superior temporal region is dysfunctional in dyslexia.

    Science.gov (United States)

    Ye, Zheng; Rüsseler, Jascha; Gerth, Ivonne; Münte, Thomas F

    2017-07-25

    Dyslexia is an impairment of reading and spelling that affects both children and adults even after many years of schooling. Dyslexic readers have deficits in the integration of auditory and visual inputs but the neural mechanisms of the deficits are still unclear. This fMRI study examined the neural processing of auditorily presented German numbers 0-9 and videos of lip movements of a German native speaker voicing numbers 0-9 in unimodal (auditory or visual) and bimodal (always congruent) conditions in dyslexic readers and their matched fluent readers. We confirmed results of previous studies that the superior temporal gyrus/sulcus plays a critical role in audiovisual speech integration: fluent readers showed greater superior temporal activations for combined audiovisual stimuli than auditory-/visual-only stimuli. Importantly, such an enhancement effect was absent in dyslexic readers. Moreover, the auditory network (bilateral superior temporal regions plus medial PFC) was dynamically modulated during audiovisual integration in fluent, but not in dyslexic readers. These results suggest that superior temporal dysfunction may underlie poor audiovisual speech integration in readers with dyslexia. Copyright © 2017 IBRO. Published by Elsevier Ltd. All rights reserved.

  7. Modeling auditory processing and speech perception in hearing-impaired listeners

    DEFF Research Database (Denmark)

    Jepsen, Morten Løve

    in a diagnostic rhyme test. The framework was constructed such that discrimination errors originating from the front-end and the back-end were separated. The front-end was fitted to individual listeners with cochlear hearing loss according to non-speech data, and speech data were obtained in the same listeners......A better understanding of how the human auditory system represents and analyzes sounds and how hearing impairment affects such processing is of great interest for researchers in the fields of auditory neuroscience, audiology, and speech communication as well as for applications in hearing......-instrument and speech technology. In this thesis, the primary focus was on the development and evaluation of a computational model of human auditory signal-processing and perception. The model was initially designed to simulate the normal-hearing auditory system with particular focus on the nonlinear processing...

  8. Low-level neural auditory discrimination dysfunctions in specific language impairment—A review on mismatch negativity findings

    Directory of Open Access Journals (Sweden)

    Teija Kujala

    2017-12-01

    In specific language impairment (SLI), there is a delay in the child's oral language skills when compared with nonverbal cognitive abilities. The problems typically relate to phonological and morphological processing and word learning. This article reviews studies which have used the mismatch negativity (MMN) in investigating low-level neural auditory dysfunctions in this disorder. With the MMN, it is possible to tap the accuracy of neural sound discrimination and sensory memory functions. These studies have found smaller response amplitudes and longer latencies for speech and non-speech sound changes in children with SLI than in typically developing children, suggesting impaired and slow auditory discrimination in SLI. Furthermore, they suggest shortened sensory memory duration and vulnerability of the sensory memory to masking effects. Importantly, some studies reported associations between MMN parameters and language test measures. In addition, it was found that language intervention can influence the abnormal MMN in children with SLI, enhancing its amplitude. These results suggest that the MMN can shed light on the neural basis of various auditory and memory impairments in SLI, which are likely to influence speech perception. Keywords: Specific language impairment, Auditory processing, Mismatch negativity (MMN)

  9. Typical versus delayed speech onset influences verbal reporting of autistic interests.

    Science.gov (United States)

    Chiodo, Liliane; Majerus, Steve; Mottron, Laurent

    2017-01-01

    The distinction between autism and Asperger syndrome has been abandoned in the DSM-5. However, this clinical categorization largely overlaps with the presence or absence of a speech onset delay, which is associated with clinical, cognitive, and neural differences. It is unknown whether these different speech development pathways and associated cognitive differences are involved in the heterogeneity of the restricted interests that characterize autistic adults. This study tested the hypothesis that speech onset delay, or conversely, early mastery of speech, orients the nature and verbal reporting of adult autistic interests. The occurrence of a priori defined descriptors for perceptual and thematic dimensions was determined, as well as the perceived function and benefits, in the responses of autistic people to a semi-structured interview on their intense interests. The number of words, grammatical categories, and proportion of perceptual / thematic descriptors were computed and compared between groups by variance analyses. The participants comprised 40 autistic adults grouped according to the presence (N = 20) or absence (N = 20) of speech onset delay, as well as 20 non-autistic adults, also with intense interests, matched for non-verbal intelligence using Raven's Progressive Matrices. The overall nature, function, and benefit of intense interests were similar across autistic subgroups, and between autistic and non-autistic groups. However, autistic participants with a history of speech onset delay used more perceptual than thematic descriptors when talking about their interests, whereas the opposite was true for autistic individuals without speech onset delay. This finding remained significant after controlling for linguistic differences observed between the two groups. Verbal reporting, but not the nature or positive function, of intense interests differed between adult autistic individuals depending on their speech acquisition history: oral reporting of

  10. Speech Disfluency-dependent Amygdala Activity in Adults Who Stutter: Neuroimaging of Interpersonal Communication in MRI Scanner Environment.

    Science.gov (United States)

    Toyomura, Akira; Fujii, Tetsunoshin; Yokosawa, Koichi; Kuriki, Shinya

    2018-03-15

    Affective states, such as anticipatory anxiety, critically influence speech communication behavior in adults who stutter. However, there is currently little evidence regarding the involvement of the limbic system in speech disfluency during interpersonal communication. We designed this neuroimaging study and experimental procedure to sample neural activity during interpersonal communication between human participants, and to investigate the relationship between amygdala activity and speech disfluency. Participants were required to engage in live communication with a stranger of the opposite sex in the MRI scanner environment. In the gaze condition, the stranger gazed at the participant without speaking, while in the live conversation condition, the stranger asked questions that the participant was required to answer. The stranger continued to gaze silently at the participant while the participant answered. Adults who stutter reported significantly higher discomfort than fluent controls during the experiment. Activity in the right amygdala, a key anatomical region in the limbic system involved in emotion, was significantly correlated with stuttering occurrences in adults who stutter. Right amygdala activity from pooled data of all participants also showed a significant correlation with discomfort level during the experiment. Activity in the prefrontal cortex, which forms emotion regulation neural circuitry with the amygdala, was lower in adults who stutter than in fluent controls. This is the first study to demonstrate that amygdala activity during interpersonal communication is involved in disfluent speech in adults who stutter. Copyright © 2018 IBRO. Published by Elsevier Ltd. All rights reserved.

  11. Using DEDICOM for completely unsupervised part-of-speech tagging.

    Energy Technology Data Exchange (ETDEWEB)

    Chew, Peter A.; Bader, Brett William; Rozovskaya, Alla (University of Illinois, Urbana, IL)

    2009-02-01

    A standard and widespread approach to part-of-speech tagging is based on Hidden Markov Models (HMMs). An alternative approach, pioneered by Schuetze (1993), induces parts of speech from scratch using singular value decomposition (SVD). We introduce DEDICOM as an alternative to SVD for part-of-speech induction. DEDICOM retains the advantages of SVD in that it is completely unsupervised: no prior knowledge is required to induce either the tagset or the associations of terms with tags. However, unlike SVD, it is also fully compatible with the HMM framework, in that it can be used to estimate emission- and transition-probability matrices which can then be used as the input for an HMM. We apply the DEDICOM method to the CONLL corpus (CONLL 2000) and compare the output of DEDICOM to the part-of-speech tags given in the corpus, and find that the correlation (almost 0.5) is quite high. Using DEDICOM, we also estimate part-of-speech ambiguity for each term, and find that these estimates correlate highly with part-of-speech ambiguity as measured in the original corpus (around 0.88). Finally, we show how the output of DEDICOM can be evaluated and compared against the more familiar output of supervised HMM-based tagging.
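
    For readers unfamiliar with DEDICOM: it factors a square matrix X as X ≈ A R Aᵀ, where A holds per-term loadings on k latent tags and the small asymmetric matrix R holds tag-to-tag affinities (the analogue of HMM transition structure). The sketch below fits the factors with an exact least-squares update for R and a normalized gradient step for A on random toy data; the fitting procedure used in the cited work is more elaborate, so treat this as an illustrative stand-in.

        import numpy as np

        rng = np.random.default_rng(0)
        n, k = 30, 4
        X = rng.random((n, n))             # stand-in for a word co-occurrence matrix
        A = rng.random((n, k))             # term-by-tag loadings

        for step in range(2000):
            P = np.linalg.pinv(A)
            R = P @ X @ P.T                # exact least-squares update for R given A
            E = X - A @ R @ A.T            # residual
            G = E @ A @ R.T + E.T @ A @ R  # gradient of the fit with respect to A
            A += 0.01 * G / (np.linalg.norm(G) + 1e-12)

        rel = np.linalg.norm(X - A @ R @ A.T) / np.linalg.norm(X)
        print("relative reconstruction error:", round(float(rel), 3))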

  12. Neural Correlates of Temporal Complexity and Synchrony during Audiovisual Correspondence Detection.

    Science.gov (United States)

    Baumann, Oliver; Vromen, Joyce M G; Cheung, Allen; McFadyen, Jessica; Ren, Yudan; Guo, Christine C

    2018-01-01

    We often perceive real-life objects as multisensory cues through space and time. A key challenge for audiovisual integration is to match neural signals that not only originate from different sensory modalities but also that typically reach the observer at slightly different times. In humans, complex, unpredictable audiovisual streams lead to higher levels of perceptual coherence than predictable, rhythmic streams. In addition, perceptual coherence for complex signals seems less affected by increased asynchrony between visual and auditory modalities than for simple signals. Here, we used functional magnetic resonance imaging to determine the human neural correlates of audiovisual signals with different levels of temporal complexity and synchrony. Our study demonstrated that greater perceptual asynchrony and lower signal complexity impaired performance in an audiovisual coherence-matching task. Differences in asynchrony and complexity were also underpinned by a partially different set of brain regions. In particular, our results suggest that, while regions in the dorsolateral prefrontal cortex (DLPFC) were modulated by differences in memory load due to stimulus asynchrony, areas traditionally thought to be involved in speech production and recognition, such as the inferior frontal and superior temporal cortex, were modulated by the temporal complexity of the audiovisual signals. Our results, therefore, indicate specific processing roles for different subregions of the fronto-temporal cortex during audiovisual coherence detection.

  13. From birdsong to human speech recognition: bayesian inference on a hierarchy of nonlinear dynamical systems.

    Science.gov (United States)

    Yildiz, Izzet B; von Kriegstein, Katharina; Kiebel, Stefan J

    2013-01-01

    Our knowledge about the computational mechanisms underlying human learning and recognition of sound sequences, especially speech, is still very limited. One difficulty in deciphering the exact means by which humans recognize speech is that there are scarce experimental findings at a neuronal, microscopic level. Here, we show that our neuronal-computational understanding of speech learning and recognition may be vastly improved by looking at an animal model, i.e., the songbird, which faces the same challenge as humans: to learn and decode complex auditory input, in an online fashion. Motivated by striking similarities between the human and songbird neural recognition systems at the macroscopic level, we assumed that the human brain uses the same computational principles at a microscopic level and translated a birdsong model into a novel human sound learning and recognition model with an emphasis on speech. We show that the resulting Bayesian model with a hierarchy of nonlinear dynamical systems can learn speech samples such as words rapidly and recognize them robustly, even in adverse conditions. In addition, we show that recognition can be performed even when words are spoken by different speakers and with different accents--an everyday situation in which current state-of-the-art speech recognition models often fail. The model can also be used to qualitatively explain behavioral data on human speech learning and derive predictions for future experiments.
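
    The phrase "hierarchy of nonlinear dynamical systems" can be made concrete with a small generative sketch: a slow chaotic system modulates a control parameter of a fast one, so the top level paces the "syllable sequence" while the bottom level generates the fast trajectory. The Lorenz-style equations and constants below are illustrative assumptions rather than the authors' model, and recognition (Bayesian inversion of such a hierarchy) is beyond this sketch.

        import numpy as np

        def lorenz_step(state, rho, dt):
            """One Euler step of a Lorenz system with control parameter rho."""
            x, y, z = state
            return state + dt * np.array([10.0 * (y - x),
                                          x * (rho - z) - y,
                                          x * y - (8.0 / 3.0) * z])

        dt = 0.01
        slow = np.array([1.0, 1.0, 25.0])              # top level: slow dynamics
        fast = np.array([1.0, 1.0, 25.0])              # bottom level: fast dynamics
        trace = []
        for t in range(5000):
            slow = lorenz_step(slow, rho=28.0, dt=dt / 20)   # 20x slower timescale
            rho_fast = 28.0 + 0.4 * slow[0]            # slow state sets the fast
            fast = lorenz_step(fast, rho=rho_fast, dt=dt)    # system's control parameter
            trace.append(fast[0])
        print("first fast-level samples:", np.round(trace[:5], 3))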

  14. Methodology of Neural Design: Applications in Microwave Engineering

    Directory of Open Access Journals (Sweden)

    Z. Raida

    2006-06-01

    In the paper, an original methodology for the automatic creation of neural models of microwave structures is proposed and verified. Following the methodology, neural models of the prescribed accuracy are built within minimum CPU time. The validity of the proposed methodology is verified by developing neural models of selected microwave structures. The functionality of the neural models is verified in a design task: a neural model is joined with a genetic algorithm to find the global minimum of a formulated objective function. The objective function is minimized using different versions of genetic algorithms and their mutual combinations. The verified methodology for the automated creation of accurate neural models of microwave structures, and their association with global optimization routines, are the most important original features of the paper.
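
    The workflow described (a trained neural surrogate searched by a genetic algorithm) can be illustrated compactly. In the sketch below, an analytic test function stands in for a full-wave electromagnetic solver, scikit-learn's MLP serves as the neural model, and a bare-bones GA (selection plus mutation only) searches the surrogate; all names and constants are assumptions for illustration.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        def simulate(x):                        # stand-in for an EM solver
            return np.sum((x - 0.3) ** 2, axis=-1)

        rng = np.random.default_rng(0)
        X = rng.random((400, 3))                # sampled design parameters
        surrogate = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000,
                                 random_state=0).fit(X, simulate(X))

        pop = rng.random((40, 3))               # GA: evolve designs on the surrogate
        for gen in range(60):
            fitness = surrogate.predict(pop)
            parents = pop[np.argsort(fitness)[:10]]        # keep the best designs
            children = parents[rng.integers(10, size=30)]  # clone ...
            children = children + rng.normal(0.0, 0.05, children.shape)  # ... and mutate
            pop = np.vstack([parents, np.clip(children, 0.0, 1.0)])

        best = pop[np.argmin(surrogate.predict(pop))]
        print("best design:", np.round(best, 3), "true objective:", simulate(best))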

  15. Speech-To-Text Conversion (STT) System Using Hidden Markov Model (HMM)

    Directory of Open Access Journals (Sweden)

    Su Myat Mon

    2015-06-01

    Speech is the easiest way to communicate with each other. Speech processing is widely used in many applications like security devices, household appliances, cellular phones, ATM machines, and computers. Human-computer interfaces have been developed to allow convenient communication or interaction for those suffering from some kind of disability. Speech-to-Text Conversion (STT) systems have a lot of benefits for deaf or dumb people and find applications in our daily lives. In the same way, the aim of the system is to convert input speech signals into text output for deaf or dumb students in the educational field. This paper presents an approach to extract features from the speech signals of isolated spoken words by using Mel Frequency Cepstral Coefficients (MFCC). A Hidden Markov Model (HMM) method is then applied to train and test the audio files and obtain the recognized spoken word. The speech database is created using MATLAB. The original speech signals are preprocessed, and the feature vectors extracted from these speech samples are used as the observation sequences of the Hidden Markov Model (HMM) recognizer. The feature vectors are analyzed in the HMM depending on the number of states.
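
    A minimal modern re-creation of this recipe (one Gaussian HMM per vocabulary word, trained on MFCC sequences, recognition by maximum log-likelihood) is sketched below, with librosa and hmmlearn standing in for the paper's MATLAB pipeline and synthetic tones standing in for recorded words.

        import numpy as np
        import librosa
        from hmmlearn import hmm

        SR = 16000

        def fake_word(f0, seed):
            """A noisy gliding tone standing in for one spoken-word recording."""
            rng = np.random.default_rng(seed)
            t = np.arange(SR // 2) / SR
            return np.sin(2 * np.pi * (f0 + 200 * t) * t) + 0.05 * rng.normal(size=t.size)

        def mfcc_seq(y):
            return librosa.feature.mfcc(y=y, sr=SR, n_mfcc=13).T   # (frames, 13)

        models = {}
        for word, f0 in [("yes", 300), ("no", 800)]:
            seqs = [mfcc_seq(fake_word(f0, s)) for s in range(5)]  # 5 "repetitions"
            X, lengths = np.vstack(seqs), [len(s) for s in seqs]
            m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=25)
            models[word] = m.fit(X, lengths)

        test = mfcc_seq(fake_word(800, seed=99))
        print(max(models, key=lambda w: models[w].score(test)))    # expect "no"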

  16. Speech Language Assessments in Te Reo in a Primary School Maori Immersion Unit

    Science.gov (United States)

    Naidoo, Kershni

    2012-01-01

    This research originated from the need for a speech and language therapy assessment in te reo Maori for a particular child who attended a Maori immersion unit. A Speech and Language Therapy te reo assessment had already been developed but it needed to be revised and normative data collected. Discussions and assessments were carried out in a…

  17. Speech Production and Speech Discrimination by Hearing-Impaired Children.

    Science.gov (United States)

    Novelli-Olmstead, Tina; Ling, Daniel

    1984-01-01

    Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…

  18. Stuttering Frequency, Speech Rate, Speech Naturalness, and Speech Effort During the Production of Voluntary Stuttering.

    Science.gov (United States)

    Davidow, Jason H; Grossman, Heather L; Edge, Robin L

    2018-05-01

    Voluntary stuttering techniques involve persons who stutter purposefully interjecting disfluencies into their speech. Little research has been conducted on the impact of these techniques on the speech pattern of persons who stutter. The present study examined whether changes in the frequency of voluntary stuttering accompanied changes in stuttering frequency, articulation rate, speech naturalness, and speech effort. In total, 12 persons who stutter aged 16-34 years participated. Participants read four 300-syllable passages during a control condition, and three voluntary stuttering conditions that involved attempting to produce purposeful, tension-free repetitions of initial sounds or syllables of a word for two or more repetitions (i.e., bouncing). The three voluntary stuttering conditions included bouncing on 5%, 10%, and 15% of syllables read. Friedman tests and follow-up Wilcoxon signed ranks tests were conducted for the statistical analyses. Stuttering frequency, articulation rate, and speech naturalness were significantly different between the voluntary stuttering conditions. Speech effort did not differ between the voluntary stuttering conditions. Stuttering frequency was significantly lower during the three voluntary stuttering conditions compared to the control condition, and speech effort was significantly lower during two of the three voluntary stuttering conditions compared to the control condition. Due to changes in articulation rate across the voluntary stuttering conditions, it is difficult to conclude, as has been suggested previously, that voluntary stuttering is the reason for stuttering reductions found when using voluntary stuttering techniques. Additionally, future investigations should examine different types of voluntary stuttering over an extended period of time to determine their impact on stuttering frequency, speech rate, speech naturalness, and speech effort.

  19. Comprehension of synthetic speech and digitized natural speech by adults with aphasia.

    Science.gov (United States)

    Hux, Karen; Knollman-Porter, Kelly; Brown, Jessica; Wallace, Sarah E

    2017-09-01

    Using text-to-speech technology to provide simultaneous written and auditory content presentation may help compensate for chronic reading challenges if people with aphasia can understand synthetic speech output; however, inherent auditory comprehension challenges experienced by people with aphasia may make understanding synthetic speech difficult. This study's purpose was to compare the preferences and auditory comprehension accuracy of people with aphasia when listening to sentences generated with digitized natural speech, Alex synthetic speech (i.e., Macintosh platform), or David synthetic speech (i.e., Windows platform). The methodology required each of 20 participants with aphasia to select one of four images corresponding in meaning to each of 60 sentences comprising three stimulus sets. Results revealed significantly better accuracy given digitized natural speech than either synthetic speech option; however, individual participant performance analyses revealed three patterns: (a) comparable accuracy regardless of speech condition for 30% of participants, (b) comparable accuracy between digitized natural speech and one, but not both, synthetic speech option for 45% of participants, and (c) greater accuracy with digitized natural speech than with either synthetic speech option for remaining participants. Ranking and Likert-scale rating data revealed a preference for digitized natural speech and David synthetic speech over Alex synthetic speech. Results suggest many individuals with aphasia can comprehend synthetic speech options available on popular operating systems. Further examination of synthetic speech use to support reading comprehension through text-to-speech technology is thus warranted. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Multivoxel Patterns Reveal Functionally Differentiated Networks Underlying Auditory Feedback Processing of Speech

    DEFF Research Database (Denmark)

    Zheng, Zane Z.; Vicente-Grabovetsky, Alejandro; MacDonald, Ewen N.

    2013-01-01

    The everyday act of speaking involves the complex processes of speech motor control. An important component of control is monitoring, detection, and processing of errors when auditory feedback does not correspond to the intended motor gesture. Here we show, using fMRI and converging operations...... within a multivoxel pattern analysis framework, that this sensorimotor process is supported by functionally differentiated brain networks. During scanning, a real-time speech-tracking system was used to deliver two acoustically different types of distorted auditory feedback or unaltered feedback while...... human participants were vocalizing monosyllabic words, and to present the same auditory stimuli while participants were passively listening. Whole-brain analysis of neural-pattern similarity revealed three functional networks that were differentially sensitive to distorted auditory feedback during...

  2. Introductory speeches

    International Nuclear Information System (INIS)

    2001-01-01

    This CD is a multimedia presentation of the programme for safety upgrading of the Bohunice V1 NPP. This chapter consists of an introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; (4) Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering

  3. Brain responses and looking behaviour during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life.

    Directory of Open Access Journals (Sweden)

    Elena V Kushnerenko

    2013-07-01

    The use of visual cues during the processing of audiovisual speech is known to be less efficient in children and adults with language difficulties, and such difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6-9 months to 14-16 months of age. We used eye-tracking to examine whether individual differences in visual attention during audiovisual processing of speech in 6 to 9 month old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6-9 month old infants also participated in an event-related potential (ERP) audiovisual task within the same experimental session. Language development was then followed up at the age of 14-16 months, using two measures of language development: the Preschool Language Scale (PLS) and the Oxford Communicative Development Inventory (CDI). The results show that those infants who were less efficient in auditory speech processing at the age of 6-9 months had lower receptive language scores at 14-16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audio-visually incongruent stimuli at 6-9 months were both significantly associated with language development at 14-16 months. These findings add to the understanding of individual differences in neural signatures of audiovisual processing and associated looking behaviour in infants.

  4. Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated...... to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating
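
    For reference, the sEPSM decision metric is commonly written as an envelope-power signal-to-noise ratio per audio channel i and modulation channel j, combined across channels; the notation below follows common presentations of the model and is not quoted from this abstract:

        % Envelope SNR in one (audio x modulation) channel, and its combination:
        \mathrm{SNR}_{\mathrm{env}}(i,j) =
            \frac{P_{\mathrm{env},S+N}(i,j) - P_{\mathrm{env},N}(i,j)}{P_{\mathrm{env},N}(i,j)},
        \qquad
        \mathrm{SNR}_{\mathrm{env}}^{\mathrm{total}} =
            \sqrt{\textstyle\sum_{i,j} \mathrm{SNR}_{\mathrm{env}}^{2}(i,j)}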

  5. Whole-exome sequencing supports genetic heterogeneity in childhood apraxia of speech

    OpenAIRE

    Worthey, Elizabeth A; Raca, Gordana; Laffin, Jennifer J; Wilk, Brandon M; Harris, Jeremy M; Jakielski, Kathy J; Dimmock, David P; Strand, Edythe A; Shriberg, Lawrence D

    2013-01-01

    Background Childhood apraxia of speech (CAS) is a rare, severe, persistent pediatric motor speech disorder with associated deficits in sensorimotor, cognitive, language, learning and affective processes. Among other neurogenetic origins, CAS is the disorder segregating with a mutation in FOXP2 in a widely studied, multigenerational London family. We report the first whole-exome sequencing (WES) findings from a cohort of 10 unrelated participants, ages 3 to 19 years, with well-characterized CA...

  6. Exploring Australian speech-language pathologists' use and perceptions of non-speech oral motor exercises.

    Science.gov (United States)

    Rumbach, Anna F; Rose, Tanya A; Cheah, Mynn

    2018-01-29

    To explore Australian speech-language pathologists' use of non-speech oral motor exercises, and rationales for using/not using non-speech oral motor exercises in clinical practice. A total of 124 speech-language pathologists practising in Australia, working with paediatric and/or adult clients with speech sound difficulties, completed an online survey. The majority of speech-language pathologists reported that they did not use non-speech oral motor exercises when working with paediatric or adult clients with speech sound difficulties. However, more than half of the speech-language pathologists working with adult clients who have dysarthria reported using non-speech oral motor exercises with this population. The most frequently reported rationale for using non-speech oral motor exercises in speech sound difficulty management was to improve awareness/placement of articulators. The majority of speech-language pathologists agreed there is no clear clinical or research evidence base to support non-speech oral motor exercise use with clients who have speech sound difficulties. This study provides an overview of Australian speech-language pathologists' reported use and perceptions of non-speech oral motor exercises' applicability and efficacy in treating paediatric and adult clients who have speech sound difficulties. The research findings provide speech-language pathologists with insight into how and why non-speech oral motor exercises are currently used, and adds to the knowledge base regarding Australian speech-language pathology practice of non-speech oral motor exercises in the treatment of speech sound difficulties. Implications for Rehabilitation Non-speech oral motor exercises refer to oral motor activities which do not involve speech, but involve the manipulation or stimulation of oral structures including the lips, tongue, jaw, and soft palate. Non-speech oral motor exercises are intended to improve the function (e.g., movement, strength) of oral structures. The

  7. Music and speech listening enhance the recovery of early sensory processing after stroke.

    Science.gov (United States)

    Särkämö, Teppo; Pihko, Elina; Laitinen, Sari; Forsblom, Anita; Soinila, Seppo; Mikkonen, Mikko; Autti, Taina; Silvennoinen, Heli M; Erkkilä, Jaakko; Laine, Matti; Peretz, Isabelle; Hietanen, Marja; Tervaniemi, Mari

    2010-12-01

    Our surrounding auditory environment has a dramatic influence on the development of basic auditory and cognitive skills, but little is known about how it influences the recovery of these skills after neural damage. Here, we studied the long-term effects of daily music and speech listening on auditory sensory memory after middle cerebral artery (MCA) stroke. In the acute recovery phase, 60 patients who had middle cerebral artery stroke were randomly assigned to a music listening group, an audio book listening group, or a control group. Auditory sensory memory, as indexed by the magnetic MMN (MMNm) response to changes in sound frequency and duration, was measured 1 week (baseline), 3 months, and 6 months after the stroke with whole-head magnetoencephalography recordings. Fifty-four patients completed the study. Results showed that the amplitude of the frequency MMNm increased significantly more in both music and audio book groups than in the control group during the 6-month poststroke period. In contrast, the duration MMNm amplitude increased more in the audio book group than in the other groups. Moreover, changes in the frequency MMNm amplitude correlated significantly with the behavioral improvement of verbal memory and focused attention induced by music listening. These findings demonstrate that merely listening to music and speech after neural damage can induce long-term plastic changes in early sensory processing, which, in turn, may facilitate the recovery of higher cognitive functions. The neural mechanisms potentially underlying this effect are discussed.

  8. [Improving speech comprehension using a new cochlear implant speech processor].

    Science.gov (United States)

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise. In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  9. The analysis of speech acts patterns in two Egyptian inaugural speeches

    Directory of Open Access Journals (Sweden)

    Imad Hayif Sameer

    2017-09-01

    The theory of speech acts, which clarifies what people do when they speak, is not about the individual words or sentences that form the basic elements of human communication, but rather about the particular speech acts that are performed when uttering words. A speech act is the attempt at doing something purely by speaking. Many things can be done by speaking. Speech acts are studied under what is called speech act theory, and belong to the domain of pragmatics. In this paper, two Egyptian inaugural speeches from El-Sadat and El-Sisi, belonging to different periods, were analyzed to find out whether there were differences within this genre in the same culture or not. The study showed that there was a very small difference between these two speeches, which were analyzed according to Searle's theory of speech acts. In El-Sadat's speech, commissives came to occupy the first place. Meanwhile, in El-Sisi's speech, assertives occupied the first place. Within the speeches of one culture, we can find that the differences depended on the circumstances that surrounded the elections of the Presidents at the time. Speech acts were tools they used to convey what they wanted and to obtain support from their audiences.

  10. ISOLATED SPEECH RECOGNITION SYSTEM FOR TAMIL LANGUAGE USING STATISTICAL PATTERN MATCHING AND MACHINE LEARNING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    VIMALA C.

    2015-05-01

    In recent years, speech technology has become a vital part of our daily lives. Various techniques have been proposed for developing Automatic Speech Recognition (ASR) systems and have achieved great success in many applications. Among them, template matching techniques like Dynamic Time Warping (DTW), statistical pattern matching techniques such as the Hidden Markov Model (HMM) and Gaussian Mixture Models (GMM), and machine learning techniques such as Neural Networks (NN), Support Vector Machines (SVM), and Decision Trees (DT) are the most popular. The main objective of this paper is to design and develop a speaker-independent isolated speech recognition system for the Tamil language using the above speech recognition techniques. The background of ASR systems, the steps involved in ASR, the merits and demerits of the conventional and machine learning algorithms, and the observations made based on the experiments are presented in this paper. For the developed system, the highest word recognition accuracy is achieved with the HMM technique: it offered 100% accuracy during the training process and 97.92% during testing.
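
    Of the techniques listed, dynamic time warping is the easiest to show in full: it aligns two feature sequences of unequal length and returns a cumulative distance that a recognizer uses to pick the closest stored template. The sketch below uses toy one-dimensional sequences; a real system would compare frames of MFCC vectors instead.

        import numpy as np

        def dtw(a, b):
            """Cumulative alignment cost between sequences a and b."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(a[i - 1] - b[j - 1])       # local distance
                    D[i, j] = cost + min(D[i - 1, j],     # insertion
                                         D[i, j - 1],     # deletion
                                         D[i - 1, j - 1]) # match
            return D[n, m]

        template = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
        utterance = np.array([0.0, 0.9, 1.1, 2.1, 1.0, 0.1])  # time-warped version
        print("DTW distance:", round(float(dtw(template, utterance)), 3))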

  11. Speech Problems

    Science.gov (United States)

    This KidsHealth (For Teens) page describes speech problems that affect a person's ability to speak clearly and covers common speech and language disorders, such as stuttering.

  12. The effect of filtered speech feedback on the frequency of stuttering

    Science.gov (United States)

    Rami, Manish Krishnakant

    2000-10-01

    whispered speech conditions all decreased the frequency of stuttering while the approximate glottal source did not. It is suggested that articulatory events, chiefly the encoded speech output of vocal tract origin, afford effective cues and induce fluent speech in people who stutter.

  13. A Danish open-set speech corpus for competing-speech studies

    DEFF Research Database (Denmark)

    Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias

    2014-01-01

    Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs......) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded...... with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed...

  14. Multimodal Speech Capture System for Speech Rehabilitation and Learning.

    Science.gov (United States)

    Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam

    2017-11-01

    Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we are presenting the multimodal speech capture system (MSCS) that records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. Collected speech modalities, tongue motion, lips gestures, and voice are visualized not only in real-time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities by a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective, and may vary from one SLP to another.

  15. Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

    Science.gov (United States)

    Schafer, Phillip B; Jin, Dezhe Z

    2014-03-01

    Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
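
    The template-based decoding step can be shown directly: score a test spike sequence against a stored template by the length of their longest common subsequence (LCS). The neuron labels below are invented, and normalizing the LCS length into a similarity score is an assumption made for illustration.

        def lcs_length(s, t):
            """Length of the longest common subsequence of s and t."""
            prev = [0] * (len(t) + 1)
            for a in s:
                cur = [0]
                for j, b in enumerate(t, 1):
                    cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
                prev = cur
            return prev[-1]

        def similarity(s, t):
            return 2 * lcs_length(s, t) / (len(s) + len(t))

        test = ["n3", "n1", "n7", "n2", "n5"]      # spike order in a test utterance
        template = ["n3", "n7", "n2", "n4", "n5"]  # clean-speech template
        print(similarity(test, template))          # 0.8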

  16. Direct speech quotations promote low relative-clause attachment in silent reading of English.

    Science.gov (United States)

    Yao, Bo; Scheepers, Christoph

    2018-07-01

    The implicit prosody hypothesis (Fodor, 1998, 2002) proposes that silent reading coincides with a default, implicit form of prosody to facilitate sentence processing. Recent research demonstrated that a more vivid form of implicit prosody is mentally simulated during silent reading of direct speech quotations (e.g., Mary said, "This dress is beautiful"), with neural and behavioural consequences (e.g., Yao, Belin, & Scheepers, 2011; Yao & Scheepers, 2011). Here, we explored the relation between 'default' and 'simulated' implicit prosody in the context of relative-clause (RC) attachment in English. Apart from confirming a general low RC-attachment preference in both production (Experiment 1) and comprehension (Experiments 2 and 3), we found that during written sentence completion (Experiment 1) or when reading silently (Experiment 2), the low RC-attachment preference was reliably enhanced when the critical sentences were embedded in direct speech quotations as compared to indirect speech or narrative sentences. However, when reading aloud (Experiment 3), direct speech did not enhance the general low RC-attachment preference. The results from Experiments 1 and 2 suggest a quantitative boost to implicit prosody (via auditory perceptual simulation) during silent production/comprehension of direct speech. By contrast, when reading aloud (Experiment 3), prosody becomes equally salient across conditions due to its explicit nature; indirect speech and narrative sentences thus become as susceptible to prosody-induced syntactic biases as direct speech. The present findings suggest a shared cognitive basis between default implicit prosody and simulated implicit prosody, providing a new platform for studying the effects of implicit prosody on sentence processing. Copyright © 2018 Elsevier B.V. All rights reserved.

  17. Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

    Science.gov (United States)

    van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

    2007-01-01

    Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…

  18. The pathways for intelligible speech: multivariate and univariate perspectives.

    Science.gov (United States)

    Evans, S; Kyong, J S; Rosen, S; Golestani, N; Warren, J E; McGettigan, C; Mourão-Miranda, J; Wise, R J S; Scott, S K

    2014-09-01

    An anterior pathway, concerned with extracting meaning from sound, has been identified in nonhuman primates. An analogous pathway has been suggested in humans, but controversy exists concerning the degree of lateralization and the precise location where responses to intelligible speech emerge. We have demonstrated that the left anterior superior temporal sulcus (STS) responds preferentially to intelligible speech (Scott SK, Blank CC, Rosen S, Wise RJS. 2000. Identification of a pathway for intelligible speech in the left temporal lobe. Brain. 123:2400-2406.). A functional magnetic resonance imaging study in Cerebral Cortex used equivalent stimuli and univariate and multivariate analyses to argue for the greater importance of bilateral posterior when compared with the left anterior STS in responding to intelligible speech (Okada K, Rong F, Venezia J, Matchin W, Hsieh IH, Saberi K, Serences JT, Hickok G. 2010. Hierarchical organization of human auditory cortex: evidence from acoustic invariance in the response to intelligible speech. Cerebral Cortex. 20:2486-2495.). Here, we also replicate our original study, demonstrating that the left anterior STS exhibits the strongest univariate response and, in decoding using the bilateral temporal cortex, contains the most informative voxels showing an increased response to intelligible speech. In contrast, in classifications using local "searchlights" and a whole brain analysis, we find greater classification accuracy in posterior rather than anterior temporal regions. Thus, we show that the precise nature of the multivariate analysis used will emphasize different response profiles associated with complex sound to speech processing. © The Author 2013. Published by Oxford University Press.

  19. Influence of evoked compound action potential on speech perception in cochlear implant users

    Directory of Open Access Journals (Sweden)

    Mariana Cardoso Guedes

    2007-08-01

    Full Text Available The electrically evoked compound action potential reflects the activity of the auditory nerve and can be recorded through the electrodes of the cochlear implant. Determining which neural elements are stimulable may help explain the variability in performance among implanted individuals. AIM: To compare performance on speech perception tests between patients who did and did not present an electrically evoked compound action potential intraoperatively. MATERIAL AND METHOD: Prospective study in which 100 users of the Nucleus 24 cochlear implant were divided into two groups according to the presence or absence of the intraoperative action potential. After 6 months of device use, speech perception test results were compared between the groups. RESULTS: The potential was observed in 72% of the patients. Perception on the open-set sentence test was better in individuals with a present potential (mean 82.8% vs. 41.0%, p = 0.005). There was an association between absence of the potential and deafness caused by meningitis. CONCLUSION: Absence of the intraoperative neural potential was associated with poorer speech perception performance and with meningitis as the etiology of deafness. Conversely, the presence of the intraoperative action potential suggests an excellent prognosis.

  20. From persuasive to authoritative speech genres Writing accounting research for a practitioner audience

    NARCIS (Netherlands)

    Norreklit, Hanne; Scapens, Robert W.

    2014-01-01

    Purpose - The purpose of this paper is to contrast the speech genres in the original and the published versions of an article written by academic researchers and published in the US practitioner-oriented journal, Strategic Finance. The original version, submitted by the researchers, was rewritten by

  1. Differentiation between non-neural and neural contributors to ankle joint stiffness in cerebral palsy.

    Science.gov (United States)

    de Gooijer-van de Groep, Karin L; de Vlugt, Erwin; de Groot, Jurriaan H; van der Heijden-Maessen, Hélène C M; Wielheesen, Dennis H M; van Wijlen-Hempel, Rietje M S; Arendzen, J Hans; Meskers, Carel G M

    2013-07-23

    Spastic paresis in cerebral palsy (CP) is characterized by increased joint stiffness that may be of neural origin, i.e. improper muscle activation caused by e.g. hyperreflexia or non-neural origin, i.e. altered tissue viscoelastic properties (clinically: "spasticity" vs. "contracture"). Differentiation between these components is hard to achieve by common manual tests. We applied an assessment instrument to obtain quantitative measures of neural and non-neural contributions to ankle joint stiffness in CP. Twenty-three adolescents with CP and eleven healthy subjects were seated with their foot fixated to an electrically powered single axis footplate. Passive ramp-and-hold rotations were applied over full ankle range of motion (RoM) at low and high velocities. Subject specific tissue stiffness, viscosity and reflexive torque were estimated from ankle angle, torque and triceps surae EMG activity using a neuromuscular model. In CP, triceps surae reflexive torque was on average 5.7 times larger (p = .002) and tissue stiffness 2.1 times larger (p = .018) compared to controls. High tissue stiffness was associated with reduced RoM (p therapy.
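
    As a rough illustration of how such a decomposition can work, the toy Python sketch below (synthetic signals; the authors' actual neuromuscular model is more elaborate) regresses joint torque on angle, velocity, and rectified EMG to separate stiffness, viscosity, and reflexive contributions:

        # Toy decomposition of measured torque into tissue (stiffness, viscosity)
        # and reflexive (EMG-driven) parts; signals and gains are synthetic.
        import numpy as np

        t = np.linspace(0, 4, 2000)
        theta = np.clip(t - 1, 0, 1) * 0.3                 # ramp-and-hold rotation (rad)
        dtheta = np.gradient(theta, t)
        emg = np.maximum(0, dtheta) * 2.0                  # rectified EMG rises with stretch velocity

        k_true, b_true, g_true = 40.0, 3.0, 8.0
        noise = np.random.default_rng(1).normal(0, 0.05, t.size)
        torque = k_true * theta + b_true * dtheta + g_true * emg + noise

        # Least-squares regression separates the three contributions
        A = np.column_stack([theta, dtheta, emg])
        (k, b, g), *_ = np.linalg.lstsq(A, torque, rcond=None)
        print(f"tissue stiffness k={k:.1f}, viscosity b={b:.1f}, reflexive gain g={g:.1f}")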

  2. Enhancement of speech signals - with a focus on voiced speech models

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie

    This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based on this model. The basic model used in this thesis is the harmonic model which is a commonly used model for describing the voiced part of the speech signal. We show that it can be beneficial to extend the model to take inharmonicities or the non-stationarity of speech into account. Extending the model…

  3. Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study.

    Science.gov (United States)

    Kumar, G Vinodh; Halder, Tamesh; Jaiswal, Amit K; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan

    2016-01-01

    Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk-effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, if and how the network across the whole brain participates during multisensory perception processing remains an open question. We posit that large-scale functional connectivity among the neural populations situated in distributed brain sites may provide valuable insights into the processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs were computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300-600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times along with pre-stimulus decreases of lower frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus, our
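
    A simplified stand-in for the connectivity measure (Welch-averaged coherence over all sensor pairs, rather than the authors' time-frequency global coherence; data and band limits are illustrative) can be sketched in Python as:

        # Mean pairwise coherence in a band, a simplified stand-in for
        # time-frequency global coherence; the "EEG" here is synthetic.
        import numpy as np
        from itertools import combinations
        from scipy.signal import coherence

        fs = 250.0
        rng = np.random.default_rng(2)
        n_sensors, n_samples = 8, 4 * int(fs)
        common = rng.normal(size=n_samples)               # shared broadband drive
        eeg = rng.normal(size=(n_sensors, n_samples)) + 0.5 * common

        band = (30, 45)                                   # gamma band of interest
        vals = []
        for i, j in combinations(range(n_sensors), 2):
            f, cxy = coherence(eeg[i], eeg[j], fs=fs, nperseg=256)
            m = (f >= band[0]) & (f <= band[1])
            vals.append(cxy[m].mean())
        print(f"mean pairwise gamma coherence: {np.mean(vals):.3f}")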

  4. Effects of sounds of locomotion on speech perception

    Directory of Open Access Journals (Sweden)

    Matz Larsson

    2015-01-01

    Full Text Available Human locomotion typically creates noise, a possible consequence of which is the masking of sound signals originating in the surroundings. When walking side by side, people often subconsciously synchronize their steps. The neurophysiological and evolutionary background of this behavior is unclear. The present study investigated the potential of sound created by walking to mask perception of speech and compared the masking produced by walking in step with that produced by unsynchronized walking. The masking sound (footsteps on gravel) and the target sound (speech) were presented through the same speaker to 15 normal-hearing subjects. The original recorded walking sound was modified to mimic the sound of two individuals walking in step or walking out of synchrony. The participants were instructed to adjust the sound level of the target sound until they could just comprehend the speech signal (the "just follow conversation" or JFC level) when presented simultaneously with synchronized or unsynchronized walking sound at 40 dBA, 50 dBA, 60 dBA, or 70 dBA. Synchronized walking sounds produced slightly less masking of speech than did unsynchronized sound. The median JFC threshold in the synchronized condition was 38.5 dBA, while the corresponding value for the unsynchronized condition was 41.2 dBA. Combined results at all sound pressure levels showed an improvement in the signal-to-noise ratio (SNR) for synchronized footsteps; the median difference was 2.7 dB and the mean difference was 1.2 dB [P < 0.001, repeated-measures analysis of variance (RM-ANOVA)]. The difference was significant for masker levels of 50 dBA and 60 dBA, but not for 40 dBA or 70 dBA. This study provides evidence that synchronized walking may reduce the masking potential of footsteps.

  5. Intelligibility for Binaural Speech with Discarded Low-SNR Speech Components.

    Science.gov (United States)

    Schoenmaker, Esther; van de Par, Steven

    2016-01-01

    Speech intelligibility in multitalker settings improves when the target speaker is spatially separated from the interfering speakers. A factor that may contribute to this improvement is the improved detectability of target-speech components due to binaural interaction in analogy to the Binaural Masking Level Difference (BMLD). This would allow listeners to hear target speech components within specific time-frequency intervals that have a negative SNR, similar to the improvement in the detectability of a tone in noise when these contain disparate interaural difference cues. To investigate whether these negative-SNR target-speech components indeed contribute to speech intelligibility, a stimulus manipulation was performed where all target components were removed when local SNRs were smaller than a certain criterion value. It can be expected that for sufficiently high criterion values target speech components will be removed that do contribute to speech intelligibility. For spatially separated speakers, assuming that a BMLD-like detection advantage contributes to intelligibility, degradation in intelligibility is expected already at criterion values below 0 dB SNR. However, for collocated speakers it is expected that higher criterion values can be applied without impairing speech intelligibility. Results show that degradation of intelligibility for separated speakers is only seen for criterion values of 0 dB and above, indicating a negligible contribution of a BMLD-like detection advantage in multitalker settings. These results show that the spatial benefit is related to a spatial separation of speech components at positive local SNRs rather than to a BMLD-like detection improvement for speech components at negative local SNRs.
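
    The stimulus manipulation can be approximated with a short-time Fourier transform: zero out target time-frequency cells whose local SNR falls below the criterion, then resynthesize. A minimal Python sketch (white-noise stand-ins for both talkers; window settings are arbitrary):

        # Remove target time-frequency cells below a local-SNR criterion,
        # then resynthesize; white noise stands in for both talkers.
        import numpy as np
        from scipy.signal import stft, istft

        fs = 16000
        rng = np.random.default_rng(3)
        target = rng.normal(size=fs)             # stand-in for target speech
        masker = rng.normal(size=fs)             # stand-in for the interferer

        f, t, T = stft(target, fs=fs, nperseg=512)
        _, _, M = stft(masker, fs=fs, nperseg=512)

        criterion_db = 0.0                       # discard target cells below this local SNR
        local_snr = 20 * np.log10(np.abs(T) / (np.abs(M) + 1e-12))
        T_pruned = np.where(local_snr >= criterion_db, T, 0)

        _, target_pruned = istft(T_pruned, fs=fs, nperseg=512)
        mix = target_pruned[:masker.size] + masker   # pruned target presented with the masker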

  6. An experimental Dutch keyboard-to-speech system for the speech impaired

    NARCIS (Netherlands)

    Deliege, R.J.H.

    1989-01-01

    An experimental Dutch keyboard-to-speech system has been developed to explore the possibilities and limitations of Dutch speech synthesis in a communication aid for the speech impaired. The system uses diphones and a formant synthesizer chip for speech synthesis. Input to the system is in

  7. Functional magnetic resonance imaging exploration of combined hand and speech movements in Parkinson's disease.

    Science.gov (United States)

    Pinto, Serge; Mancini, Laura; Jahanshahi, Marjan; Thornton, John S; Tripoliti, Elina; Yousry, Tarek A; Limousin, Patricia

    2011-10-01

    Among the repertoire of motor functions, although hand movement and speech production tasks have been investigated widely by functional neuroimaging, paradigms combining both movements have been studied less so. Such paradigms are of particular interest in Parkinson's disease, in which patients have specific difficulties performing two movements simultaneously. In 9 unmedicated patients with Parkinson's disease and 15 healthy control subjects, externally cued tasks (i.e., hand movement, speech production, and combined hand movement and speech production) were performed twice in a random order and functional magnetic resonance imaging detected cerebral activations relative to rest. F-statistics tested within-group (significant activations at P values 10 voxels). For control subjects, the combined task activations comprised the sum of those obtained during hand movement and speech production performed separately, reflecting the neural correlates of performing movements sharing similar programming modalities. In patients with Parkinson's disease, only activations underlying hand movement were observed during the combined task. We interpreted this phenomenon as patients' potential inability to recruit facilitatory activations while performing two movements simultaneously. This lost capacity could be related to a functional prioritization of one movement (i.e., hand movement), in comparison with the other (i.e., speech production). Our observation could also reflect the inability of patients with Parkinson's disease to intrinsically engage the motor coordination necessary to perform a combined task. Copyright © 2011 Movement Disorder Society.

  8. Development of the nervus terminalis: origin and migration.

    Science.gov (United States)

    Whitlock, Kathleen E

    2004-09-01

    The origin of the nervus terminalis is one of the least well understood developmental events involved in generating the cranial ganglia of the forebrain in vertebrate animals. This cranial nerve forms at the formidable interface of the anteriormost limits of migrating cranial neural crest cells, the terminal end of the neural tube and the differentiating olfactory and adenohypophyseal placodes. The complex cellular interactions that give rise to the various structures associated with the sensory placode (olfactory) and endocrine placode (adenohypophysis) surround and engulf this enigmatic cranial nerve. The tortured history of nervus terminalis development (see von Bartheld, this issue, pages 13-24) reflects the lack of consensus on the origin (or origins), as well as the experimental difficulties in uncovering the origin, of the nervus terminalis. Recent technical advances have allowed us to make headway in understanding the origin(s) of this nerve. The emergence of the externally fertilized zebrafish embryo as a model system for developmental biology and genetics has shed new light on this century-old problem. Coupled with new developmental models are techniques that allow us to trace lineage, visualize gene expression, and genetically ablate cells, adding to our experimental tools with which to follow up on studies provided by our scientific predecessors. Through these techniques, a picture is emerging in which the origin of at least a subset of the nervus terminalis cells lies in the cranial neural crest. In this review, the data surrounding this finding will be discussed in light of recent findings on neural crest and placode origins. Copyright 2004 Wiley-Liss, Inc.

  9. Interpretations of Frequency Domain Analyses of Neural Entrainment: Periodicity, Fundamental Frequency, and Harmonics.

    Science.gov (United States)

    Zhou, Hong; Melloni, Lucia; Poeppel, David; Ding, Nai

    2016-01-01

    Brain activity can follow the rhythms of dynamic sensory stimuli, such as speech and music, a phenomenon called neural entrainment. It has been hypothesized that low-frequency neural entrainment in the delta and theta bands provides a potential mechanism to represent and integrate temporal information. Low-frequency neural entrainment is often studied using periodically changing stimuli and is analyzed in the frequency domain using Fourier analysis. Fourier analysis decomposes a periodic signal into harmonically related sinusoids. However, it is not intuitive how these harmonically related components are related to the response waveform. Here, we explain the interpretation of response harmonics, with a special focus on very low-frequency neural entrainment near 1 Hz. It is illustrated why neural responses repeating at f Hz do not necessarily generate any neural response at f Hz in the Fourier spectrum. A strong neural response at f Hz indicates that the time scales of the neural response waveform within each cycle match the time scales of the stimulus rhythm. Therefore, neural entrainment at very low frequency implies not only that the neural response repeats at f Hz but also that each period of the neural response is a slow wave matching the time scale of a f Hz sinusoid.
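
    The point about the fundamental is easy to verify numerically: a response that repeats at 1 Hz as a brief pulse spreads its power across many harmonics, whereas a slow 1 Hz wave concentrates it at the fundamental. A small Python demonstration (parameters chosen only for illustration):

        # A 1 Hz pulse train spreads power over many harmonics; a 1 Hz slow
        # wave concentrates it at the fundamental. Parameters are illustrative.
        import numpy as np

        fs, dur = 100, 20                            # sampling rate (Hz), duration (s)
        t = np.arange(0, dur, 1 / fs)
        pulses = ((t % 1.0) < 0.05).astype(float)    # repeats at 1 Hz as a brief pulse
        slow = np.sin(2 * np.pi * 1.0 * t)           # each cycle is a slow 1 Hz wave

        freqs = np.fft.rfftfreq(t.size, 1 / fs)
        k1 = np.argmin(np.abs(freqs - 1.0))          # index of the 1 Hz bin
        for name, x in [("pulse train", pulses), ("slow wave", slow)]:
            spec = np.abs(np.fft.rfft(x - x.mean()))
            print(f"{name}: share of power at 1 Hz = {spec[k1]**2 / (spec**2).sum():.2f}")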

  10. Speech Function and Speech Role in Carl Fredricksen's Dialogue on Up Movie

    OpenAIRE

    Rehana, Ridha; Silitonga, Sortha

    2013-01-01

    One aim of this article is to show through a concrete example how speech function and speech role used in movie. The illustrative example is taken from the dialogue of Up movie. Central to the analysis proper form of dialogue on Up movie that contain of speech function and speech role; i.e. statement, offer, question, command, giving, and demanding. 269 dialogue were interpreted by actor, and it was found that the use of speech function and speech role.

  11. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    Science.gov (United States)

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
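
    All of these indices share the same skeleton: estimate an apparent band SNR, clip it, map it to a 0-1 audibility value, and sum across bands with importance weights. A schematic Python sketch (band levels and weights are invented for the example; the STI/SII standards tabulate their own values):

        # Skeleton shared by STI/SII-style indices: clip band SNRs, map to
        # 0..1, weight by band importance. Levels and weights are invented;
        # the standards tabulate their own values.
        import numpy as np

        speech = np.array([60, 62, 58, 55, 50, 45])    # band speech levels (dB)
        noise = np.array([55, 50, 52, 54, 48, 46])     # band noise levels (dB)
        weights = np.array([0.10, 0.20, 0.25, 0.20, 0.15, 0.10])

        snr = np.clip(speech - noise, -15, 15)         # apparent band SNR, clipped to +/-15 dB
        audibility = (snr + 15) / 30                   # map to 0..1 per band
        index = float(weights @ audibility)            # importance-weighted sum
        print(f"intelligibility index ~ {index:.2f}")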

  12. Bio-Inspired Neural Model for Learning Dynamic Models

    Science.gov (United States)

    Duong, Tuan; Duong, Vu; Suri, Ronald

    2009-01-01

    A neural-network mathematical model that, relative to prior such models, places greater emphasis on some of the temporal aspects of real neural physical processes, has been proposed as a basis for massively parallel, distributed algorithms that learn dynamic models of possibly complex external processes by means of learning rules that are local in space and time. The algorithms could be made to perform such functions as recognition and prediction of words in speech and of objects depicted in video images. The approach embodied in this model is said to be "hardware-friendly" in the following sense: The algorithms would be amenable to execution by special-purpose computers implemented as very-large-scale integrated (VLSI) circuits that would operate at relatively high speeds and low power demands.

  13. Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; de Jong, Franciska M.G.

    In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because

  14. Experiments on Automatic Recognition of Nonnative Arabic Speech

    Directory of Open Access Journals (Sweden)

    Douglas O'Shaughnessy

    2008-05-01

    Full Text Available The automatic recognition of foreign-accented Arabic speech is a challenging task since it involves a large number of nonnative accents. As well, the nonnative speech data available for training are generally insufficient. Moreover, as compared to other languages, the Arabic language has sparked a relatively small number of research efforts. In this paper, we are concerned with the problem of nonnative speech in a speaker-independent, large-vocabulary speech recognition system for modern standard Arabic (MSA). We analyze some major differences at the phonetic level in order to determine which phonemes have a significant part in the recognition performance for both native and nonnative speakers. Special attention is given to specific Arabic phonemes. The performance of an HMM-based Arabic speech recognition system is analyzed with respect to speaker gender and native origin. The WestPoint modern standard Arabic database from the Linguistic Data Consortium (LDC) and the Hidden Markov Model Toolkit (HTK) are used throughout all experiments. Our study shows that the best performance in the overall phoneme recognition is obtained when nonnative speakers are involved in both training and testing phases. This is not the case when a language model and phonetic lattice networks are incorporated in the system. At the phonetic level, the results show that female nonnative speakers perform better than nonnative male speakers, and that emphatic phonemes yield a significant decrease in performance when they are uttered by both male and female nonnative speakers.
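
    The HMM-based setup can be caricatured with per-phoneme Gaussian HMMs over MFCC-like frames, classifying a token by maximum log-likelihood. A toy Python sketch using hmmlearn on synthetic features (actual HTK-style training on the WestPoint corpus is far more involved):

        # Toy per-phoneme Gaussian HMMs over MFCC-like frames; a token is
        # classified by maximum log-likelihood. All features are synthetic.
        import numpy as np
        from hmmlearn import hmm

        rng = np.random.default_rng(4)

        def make_tokens(offset, n_tokens=20, n_frames=30):
            return [rng.normal(offset, 1.0, size=(n_frames, 13)) for _ in range(n_tokens)]

        classes = {"/s/": make_tokens(0.0), "/s'/": make_tokens(0.8)}   # plain vs. emphatic
        models = {}
        for label, tokens in classes.items():
            X = np.vstack(tokens)
            lengths = [tok.shape[0] for tok in tokens]
            m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
            m.fit(X, lengths)
            models[label] = m

        test = rng.normal(0.8, 1.0, size=(30, 13))       # an emphatic-like token
        print(max(models, key=lambda k: models[k].score(test)))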

  15. Experiments on Automatic Recognition of Nonnative Arabic Speech

    Directory of Open Access Journals (Sweden)

    Selouani Sid-Ahmed

    2008-01-01

    Full Text Available The automatic recognition of foreign-accented Arabic speech is a challenging task since it involves a large number of nonnative accents. As well, the nonnative speech data available for training are generally insufficient. Moreover, as compared to other languages, the Arabic language has sparked a relatively small number of research efforts. In this paper, we are concerned with the problem of nonnative speech in a speaker-independent, large-vocabulary speech recognition system for modern standard Arabic (MSA). We analyze some major differences at the phonetic level in order to determine which phonemes have a significant part in the recognition performance for both native and nonnative speakers. Special attention is given to specific Arabic phonemes. The performance of an HMM-based Arabic speech recognition system is analyzed with respect to speaker gender and native origin. The WestPoint modern standard Arabic database from the Linguistic Data Consortium (LDC) and the Hidden Markov Model Toolkit (HTK) are used throughout all experiments. Our study shows that the best performance in the overall phoneme recognition is obtained when nonnative speakers are involved in both training and testing phases. This is not the case when a language model and phonetic lattice networks are incorporated in the system. At the phonetic level, the results show that female nonnative speakers perform better than nonnative male speakers, and that emphatic phonemes yield a significant decrease in performance when they are uttered by both male and female nonnative speakers.

  16. The speech-based envelope power spectrum model (sEPSM) family: Development, achievements, and current challenges

    DEFF Research Database (Denmark)

    Relano-Iborra, Helia; Chabot-Leclerc, Alexandre; Scheidiger, Christoph

    2017-01-01

    Intelligibility models provide insights regarding the effects of target speech characteristics, transmission channels and/or auditory processing on the speech perception performance of listeners. In 2011, Jørgensen and Dau proposed the speech-based envelope power spectrum model [sEPSM, Jørgensen…] … have extended the predictive power of the original model to a broad range of conditions. This contribution presents the most recent developments within the sEPSM “family:” (i) a binaural extension, the B-sEPSM [Chabot-Leclerc et al. (2016). J. Acoust. Soc. Am. 140(1), 192-205], which combines better…

  17. Feature Fusion Algorithm for Multimodal Emotion Recognition from Speech and Facial Expression Signal

    Directory of Open Access Journals (Sweden)

    Han Zhiyan

    2016-01-01

    Full Text Available In order to overcome the limitations of single-mode emotion recognition, this paper describes a novel multimodal emotion recognition algorithm that takes the speech signal and the facial expression signal as its research subjects. First, we fuse the speech signal feature and the facial expression signal feature, obtain sample sets by bootstrap sampling (sampling with replacement), and then train classifiers with a BP neural network (BPNN). Second, we measure the difference between two classifiers by a double error difference selection strategy. Finally, we obtain the final recognition result by the majority voting rule. Experiments show the method improves the accuracy of emotion recognition by giving full play to the advantages of decision-level fusion and feature-level fusion, making the whole fusion process closer to human emotion recognition, with a recognition rate of 90.4%.
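
    The ensemble scheme, stripped to its essentials, is: fuse the two feature streams, train several networks on bootstrap resamples, and majority-vote their predictions. A Python sketch with synthetic features and scikit-learn's MLP as a stand-in for the paper's BP network:

        # Fuse two feature streams, train networks on bootstrap resamples,
        # majority-vote their outputs; data are synthetic and the MLP is a
        # scikit-learn stand-in for the paper's BP network.
        import numpy as np
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(5)
        n, d_speech, d_face = 200, 20, 30
        y = rng.integers(0, 4, size=n)                     # four emotion classes
        X = np.hstack([rng.normal(size=(n, d_speech)) + 0.2 * y[:, None],
                       rng.normal(size=(n, d_face)) + 0.1 * y[:, None]])   # feature-level fusion

        clfs = []
        for _ in range(5):
            idx = rng.integers(0, n, size=n)               # sampling with replacement
            clfs.append(MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X[idx], y[idx]))

        votes = np.stack([c.predict(X) for c in clfs])
        majority = np.apply_along_axis(lambda v: np.bincount(v, minlength=4).argmax(), 0, votes)
        print("training-set vote accuracy:", (majority == y).mean())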

  18. Intelligibility of speech of children with speech and sound disorders

    OpenAIRE

    Ivetac, Tina

    2014-01-01

    The purpose of this study is to examine speech intelligibility of children with primary speech and sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...

  19. Speech disorders - children

    Science.gov (United States)

    ... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...

  20. The contribution of dynamic visual cues to audiovisual speech perception.

    Science.gov (United States)

    Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

    2015-08-01

    Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues; two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli, and with audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Neurophysiology of speech differences in childhood apraxia of speech.

    Science.gov (United States)

    Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.

  2. Whole-exome sequencing supports genetic heterogeneity in childhood apraxia of speech.

    Science.gov (United States)

    Worthey, Elizabeth A; Raca, Gordana; Laffin, Jennifer J; Wilk, Brandon M; Harris, Jeremy M; Jakielski, Kathy J; Dimmock, David P; Strand, Edythe A; Shriberg, Lawrence D

    2013-10-02

    Childhood apraxia of speech (CAS) is a rare, severe, persistent pediatric motor speech disorder with associated deficits in sensorimotor, cognitive, language, learning and affective processes. Among other neurogenetic origins, CAS is the disorder segregating with a mutation in FOXP2 in a widely studied, multigenerational London family. We report the first whole-exome sequencing (WES) findings from a cohort of 10 unrelated participants, ages 3 to 19 years, with well-characterized CAS. As part of a larger study of children and youth with motor speech sound disorders, 32 participants were classified as positive for CAS on the basis of a behavioral classification marker using auditory-perceptual and acoustic methods that quantify the competence, precision and stability of a speaker's speech, prosody and voice. WES of 10 randomly selected participants was completed using the Illumina Genome Analyzer IIx Sequencing System. Image analysis, base calling, demultiplexing, read mapping, and variant calling were performed using Illumina software. Software developed in-house was used for variant annotation, prioritization and interpretation to identify those variants likely to be deleterious to neurodevelopmental substrates of speech-language development. Among potentially deleterious variants, clinically reportable findings of interest occurred on a total of five chromosomes (Chr3, Chr6, Chr7, Chr9 and Chr17), which included six genes either strongly associated with CAS (FOXP1 and CNTNAP2) or associated with disorders with phenotypes overlapping CAS (ATP13A4, CNTNAP1, KIAA0319 and SETX). A total of 8 (80%) of the 10 participants had clinically reportable variants in one or two of the six genes, with variants in ATP13A4, KIAA0319 and CNTNAP2 being the most prevalent. Similar to the results reported in emerging WES studies of other complex neurodevelopmental disorders, our findings from this first WES study of CAS are interpreted as support for heterogeneous genetic origins of

  3. Neural Networks in R Using the Stuttgart Neural Network Simulator: RSNNS

    Directory of Open Access Journals (Sweden)

    Christopher Bergmeir

    2012-01-01

    Full Text Available Neural networks are important standard machine learning procedures for classification and regression. We describe the R package RSNNS that provides a convenient interface to the popular Stuttgart Neural Network Simulator SNNS. The main features are (a) encapsulation of the relevant SNNS parts in a C++ class, for sequential and parallel usage of different networks, (b) accessibility of all of the SNNS algorithmic functionality from R using a low-level interface, and (c) a high-level interface for convenient, R-style usage of many standard neural network procedures. The package also includes functions for visualization and analysis of the models and the training procedures, as well as functions for data input/output from/to the original SNNS file formats.

  4. Atypical speech lateralization in adults with developmental coordination disorder demonstrated using functional transcranial Doppler ultrasound.

    Science.gov (United States)

    Hodgson, Jessica C; Hudson, John M

    2017-03-01

    Research using clinical populations to explore the relationship between hemispheric speech lateralization and handedness has focused on individuals with speech and language disorders, such as dyslexia or specific language impairment (SLI). Such work reveals atypical patterns of cerebral lateralization and handedness in these groups compared to controls. There are few studies that examine this relationship in people with motor coordination impairments but without speech or reading deficits, which is a surprising omission given the prevalence of theories suggesting a common neural network underlying both functions. We use an emerging imaging technique in cognitive neuroscience, functional transcranial Doppler (fTCD) ultrasound, to assess whether individuals with developmental coordination disorder (DCD) display reduced left-hemisphere lateralization for speech production compared to control participants. Twelve adult control participants and 12 adults with DCD, but no other developmental/cognitive impairments, performed a word-generation task whilst undergoing fTCD imaging to establish a hemispheric lateralization index for speech production. All participants also completed an electronic peg-moving task to determine hand skill. As predicted, the DCD group showed a significantly reduced left lateralization pattern for the speech production task compared to controls. Performance on the motor skill task showed a clear preference for the dominant hand across both groups; however, the DCD group mean movement times were significantly higher for the non-dominant hand. This is the first study of its kind to assess hand skill and speech lateralization in DCD. The results reveal a reduced leftwards asymmetry for speech and a slower motor performance. This fits alongside previous work showing atypical cerebral lateralization in DCD for other cognitive processes (e.g., executive function and short-term memory) and thus speaks to debates on theories of the links between motor

  5. A Framework for Automated Marmoset Vocalization Detection And Classification

    Science.gov (United States)

    2016-09-08

    for studying the origins and neural basis of human language. Vocalizations belonging to the same species, or Conspecific Vocalizations (CVs), are...applications including automatic speech recognition [17], speech enhancement [18], voice activity detection [19], hyper-nasality detection [20], and emotion ...vocalizations. The feature sets chosen have the desirable property of capturing characteristics of the signals that are useful in both identifying and

  6. On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech

    Directory of Open Access Journals (Sweden)

    Wesley Mattheyses

    2009-01-01

    Full Text Available Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches between these two information streams requires experimental exploration since it could degrade the quality of the output. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as being more synchronous than those of systems in which both modes are not intrinsically coherent. Furthermore, it is shown that the degree of coherence between the auditory mode and the visual mode has an influence on the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived visual signal's quality.

  7. Cortical Representations of Speech in a Multitalker Auditory Scene.

    Science.gov (United States)

    Puvvada, Krishna C; Simon, Jonathan Z

    2017-09-20

    The ability to parse a complex auditory scene into perceptual objects is facilitated by a hierarchical auditory system. Successive stages in the hierarchy transform an auditory scene of multiple overlapping sources, from peripheral tonotopically based representations in the auditory nerve, into perceptually distinct auditory-object-based representations in the auditory cortex. Here, using magnetoencephalography recordings from men and women, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in distinct hierarchical stages of the auditory cortex. Using systems-theoretic methods of stimulus reconstruction, we show that the primary-like areas in the auditory cortex contain dominantly spectrotemporal-based representations of the entire auditory scene. Here, both attended and ignored speech streams are represented with almost equal fidelity, and a global representation of the full auditory scene with all its streams is a better candidate neural representation than that of individual streams being represented separately. We also show that higher-order auditory cortical areas, by contrast, represent the attended stream separately and with significantly higher fidelity than unattended streams. Furthermore, the unattended background streams are more faithfully represented as a single unsegregated background object rather than as separated objects. Together, these findings demonstrate the progression of the representations and processing of a complex acoustic scene up through the hierarchy of the human auditory cortex. SIGNIFICANCE STATEMENT Using magnetoencephalography recordings from human listeners in a simulated cocktail party environment, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in separate hierarchical stages of the auditory cortex. We show that the primary-like areas in the auditory cortex use a dominantly spectrotemporal-based representation of the entire auditory
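
    Stimulus reconstruction of this kind is commonly implemented as a time-lagged linear backward model from the sensors to the speech envelope. A minimal Python sketch with simulated data (ridge regression over 0-200 ms of sensor history; all parameters are illustrative, not the authors' pipeline):

        # Backward (reconstruction) model: predict the speech envelope from
        # time-lagged sensor data by ridge regression; all signals simulated.
        import numpy as np
        from sklearn.linear_model import Ridge

        rng = np.random.default_rng(6)
        fs, n_sensors, n_samples = 100, 20, 3000
        env = np.convolve(np.abs(rng.normal(size=n_samples)),
                          np.ones(20) / 20, mode="same")   # stand-in speech envelope
        meg = rng.normal(size=(n_samples, n_sensors))
        meg[:, :5] += 0.5 * env[:, None]                   # a few sensors track the envelope

        lags = range(int(0.2 * fs))                        # 0-200 ms of sensor history
        X = np.hstack([np.roll(meg, lag, axis=0) for lag in lags])
        model = Ridge(alpha=10.0).fit(X[200:], env[200:])  # skip wrap-around edge samples
        r = np.corrcoef(model.predict(X[200:]), env[200:])[0, 1]
        print(f"reconstruction accuracy r = {r:.2f}")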

  8. A glimpsing account of the role of temporal fine structure information in speech recognition.

    Science.gov (United States)

    Apoux, Frédéric; Healy, Eric W

    2013-01-01

    Many behavioral studies have reported a significant decrease in intelligibility when the temporal fine structure (TFS) of a sound mixture is replaced with noise or tones (i.e., vocoder processing). This finding has led to the conclusion that TFS information is critical for speech recognition in noise. How the normal auditory system takes advantage of the original TFS, however, remains unclear. Three experiments on the role of TFS in noise are described. All three experiments measured speech recognition in various backgrounds while manipulating the envelope, TFS, or both. One experiment tested the hypothesis that vocoder processing may artificially increase the apparent importance of TFS cues. Another experiment evaluated the relative contribution of the target and masker TFS by disturbing only the TFS of the target or that of the masker. Finally, a last experiment evaluated the relative contribution of envelope and TFS information. In contrast to previous studies, however, the original envelope and TFS were both preserved - to some extent - in all conditions. Overall, the experiments indicate a limited influence of TFS and suggest that little speech information is extracted from the TFS. Concomitantly, these experiments confirm that most speech information is carried by the temporal envelope in real-world conditions. When interpreted within the framework of the glimpsing model, the results of these experiments suggest that TFS is primarily used as a grouping cue to select the time-frequency regions corresponding to the target speech signal.
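
    The envelope/TFS split underlying such manipulations is typically obtained per auditory band from the Hilbert analytic signal, with vocoding replacing the TFS by a noise carrier. A minimal single-band Python sketch (noise stand-in for speech; band edges arbitrary):

        # Single-band envelope/TFS split via the Hilbert analytic signal,
        # with vocoder-style resynthesis; noise stands in for speech.
        import numpy as np
        from scipy.signal import butter, hilbert, sosfiltfilt

        fs = 16000
        rng = np.random.default_rng(7)
        x = rng.normal(size=fs)                        # stand-in for one auditory band's input
        sos = butter(4, [500, 1000], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)

        analytic = hilbert(band)
        env = np.abs(analytic)                         # temporal envelope
        tfs = np.cos(np.angle(analytic))               # temporal fine structure

        # Keep the envelope, replace the TFS with a band-limited noise carrier
        carrier = sosfiltfilt(sos, rng.normal(size=fs))
        carrier /= np.sqrt(np.mean(carrier ** 2))
        vocoded = env * carrier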

  9. Neural overlap of L1 and L2 semantic representations in speech : A decoding approach

    NARCIS (Netherlands)

    Van De Putte, Eowyn; De Baene, W.; Brass, Marcel; Duyck, Wouter

    2017-01-01

    Although research has now converged towards a consensus that both languages of a bilingual are represented in at least partly shared systems for language comprehension, it remains unclear whether both languages are represented in the same neural populations for production. We investigated the neural

  10. Proceedings of the workshop cum symposium on applications of neural networks in nuclear science and industry

    International Nuclear Information System (INIS)

    1993-01-01

    The Workshop cum Symposium on Application of Neural Networks in Nuclear Science and Industry was held at Bombay during November 24-26, 1993. The past decade has seen many important advances in the design and technology of artificial neural networks in research and industry. Neural networks form an interdisciplinary field covering a broad spectrum of applications in surveillance, diagnosis of nuclear power plants, nuclear spectroscopy, speech and written text recognition, robotic control, signal processing etc. The objective of the symposium was to promote awareness of advances in neural network research and applications. It was also aimed at reviewing the present status and giving direction for future technological developments. Contributed papers have been organized into the following groups: (a) neural network architectures, learning algorithms and modelling, (b) computer vision and image processing, (c) signal processing, (d) neural networks and fuzzy systems, (e) nuclear applications and (f) neural networks and allied applications. Papers relevant to INIS are indexed separately. (M.K.V.)

  11. Neural initialization of audiovisual integration in prereaders at varying risk for developmental dyslexia.

    Science.gov (United States)

    I Karipidis, Iliana; Pleisch, Georgette; Röthlisberger, Martina; Hofstetter, Christoph; Dornbierer, Dario; Stämpfli, Philipp; Brem, Silvia

    2017-02-01

    Learning letter-speech sound correspondences is a major step in reading acquisition and is severely impaired in children with dyslexia. Up to now, it remains largely unknown how quickly neural networks adopt specific functions during audiovisual integration of linguistic information when prereading children learn letter-speech sound correspondences. Here, we simulated the process of learning letter-speech sound correspondences in 20 prereading children (6.13-7.17 years) at varying risk for dyslexia by training artificial letter-speech sound correspondences within a single experimental session. Subsequently, we acquired simultaneously event-related potentials (ERP) and functional magnetic resonance imaging (fMRI) scans during implicit audiovisual presentation of trained and untrained pairs. Audiovisual integration of trained pairs correlated with individual learning rates in right superior temporal, left inferior temporal, and bilateral parietal areas and with phonological awareness in left temporal areas. In correspondence, a differential left-lateralized parietooccipitotemporal ERP at 400 ms for trained pairs correlated with learning achievement and familial risk. Finally, a late (650 ms) posterior negativity indicating audiovisual congruency of trained pairs was associated with increased fMRI activation in the left occipital cortex. Taken together, a short training session initializes audiovisual integration in neural systems that are responsible for processing linguistic information in proficient readers. To conclude, the ability to learn grapheme-phoneme correspondences, the familial history of reading disability, and phonological awareness of prereading children account for the degree of audiovisual integration in a distributed brain network. Such findings on emerging linguistic audiovisual integration could allow for distinguishing between children with typical and atypical reading development. Hum Brain Mapp 38:1038-1055, 2017. © 2016 Wiley Periodicals, Inc.

  12. Listeners Experience Linguistic Masking Release in Noise-Vocoded Speech-in-Speech Recognition

    Science.gov (United States)

    Viswanathan, Navin; Kokkinakis, Kostas; Williams, Brittany T.

    2018-01-01

    Purpose: The purpose of this study was to evaluate whether listeners with normal hearing perceiving noise-vocoded speech-in-speech demonstrate better intelligibility of target speech when the background speech was mismatched in language (linguistic release from masking [LRM]) and/or location (spatial release from masking [SRM]) relative to the…

  13. Neural correlates of gesture processing across human development.

    Science.gov (United States)

    Wakefield, Elizabeth M; James, Thomas W; James, Karin H

    2013-01-01

    Co-speech gesture facilitates learning to a greater degree in children than in adults, suggesting that the mechanisms underlying the processing of co-speech gesture differ as a function of development. We suggest that this may be partially due to children's lack of experience producing gesture, leading to differences in the recruitment of sensorimotor networks when comparing adults to children. Here, we investigated the neural substrates of gesture processing in a cross-sectional sample of 5-, 7.5-, and 10-year-old children and adults and focused on relative recruitment of a sensorimotor system that included the precentral gyrus (PCG) and the posterior middle temporal gyrus (pMTG). Children and adults were presented with videos in which communication occurred through different combinations of speech and gesture during a functional magnetic resonance imaging (fMRI) session. Results demonstrated that the PCG and pMTG were recruited to different extents in the two populations. We interpret these novel findings as supporting the idea that gesture perception (pMTG) is affected by a history of gesture production (PCG), revealing the importance of considering gesture processing as a sensorimotor process.

  14. State-of-the-art of applications of neural networks in the nuclear industry

    International Nuclear Information System (INIS)

    Zwingelstein, G.; Masson, M.H.

    1990-01-01

    Artificial neural net models have been extensively studied for many years in various laboratories in an attempt to simulate the performance of the human brain with computer programs. The first applications were developed in the fields of speech and image recognition. The aim of these studies was mainly to rapidly classify patterns that are corrupted by noise or partly missing. With the development of new net topologies and algorithms, and of parallel computing hardware and software, neural networks are today very promising for applications in many industries. In the introduction, this paper presents the anticipated benefits of the use of neural networks for industrial applications. Then a brief overview of the main neural networks is provided. Finally a short review of neural network applications in the nuclear industry is given. It covers domains such as: predictive maintenance for vibratory surveillance of rotating machinery, signal processing, operator guidance and eddy current inspection. In conclusion, recommendations are made for using neural networks efficiently in practical applications. In particular the need for supercomputing will be pinpointed. (author)

  15. Effect of an 8-week practice of externally triggered speech on basal ganglia activity of stuttering and fluent speakers.

    Science.gov (United States)

    Toyomura, Akira; Fujii, Tetsunoshin; Kuriki, Shinya

    2015-04-01

    The neural mechanisms underlying stuttering are not well understood. It is known that stuttering appears when persons who stutter speak in a self-paced manner, but speech fluency is temporarily increased when they speak in unison with an external trigger such as a metronome. This phenomenon is very similar to the behavioral improvement induced by external pacing in patients with Parkinson's disease. Recent imaging studies have also suggested that the basal ganglia are involved in the etiology of stuttering. In addition, previous studies have shown that the basal ganglia are involved in self-paced movement. The present study therefore focused on the basal ganglia and explored whether long-term speech practice using external triggers can modify the basal ganglia activity of stuttering speakers. Our functional magnetic resonance imaging study revealed that stuttering speakers showed significantly lower activity in the basal ganglia than fluent speakers before practice, especially when their speech was self-paced. After an 8-week practice of externally triggered speech using a metronome, the significant difference in activity between the two groups disappeared. The cerebellar vermis of stuttering speakers showed significantly decreased activity during self-paced speech in the second experiment compared to the first. The speech fluency and naturalness of the stuttering speakers also improved. These results suggest that stuttering is associated with defective motor control during self-paced speech, and that the basal ganglia and the cerebellum are involved in the improvement of speech fluency in stuttering through the use of an external trigger. Copyright © 2015 Elsevier Inc. All rights reserved.

  16. Performance of wavelet analysis and neural networks for pathological voices identification

    Science.gov (United States)

    Salhi, Lotfi; Talbi, Mourad; Abid, Sabeur; Cherif, Adnane

    2011-09-01

    Within the medical environment, diverse techniques exist to assess the state of the voice of the patient. The inspection technique is inconvenient for a number of reasons, such as its high cost, the duration of the inspection, and above all, the fact that it is an invasive technique. This study focuses on a robust, rapid and accurate system for automatic identification of pathological voices. This system employs a non-invasive, inexpensive and fully automated method based on a hybrid approach: wavelet transform analysis and a neural network classifier. First, we present the results obtained in our previous study while using classic feature parameters. These results allow visual identification of pathological voices. Second, quantified parameters derived from the wavelet analysis are proposed to characterise the speech sample. On the other hand, a system of multilayer neural networks (MNNs) has been developed which carries out the automatic detection of pathological voices. The developed method was evaluated using a voice database composed of recorded voice samples (continuous speech) from normophonic or dysphonic speakers. The dysphonic speakers were patients of the RABTA National Hospital in Tunis, Tunisia, and a University Hospital in Brussels, Belgium. Experimental results indicate a success rate ranging between 75% and 98.61% for discrimination of normal and pathological voices using the proposed parameters and neural network classifier. We also compared the average classification rate based on the MNN, Gaussian mixture model and support vector machines.
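
    The hybrid scheme can be sketched as wavelet subband energies fed to a small multilayer network. A toy Python version (synthetic "voices" in which pathology is mimicked by added noise; this is not the authors' feature set):

        # Wavelet subband log-energies fed to a small MLP; "pathology" is
        # mimicked by added noise, so features and data are illustrative only.
        import numpy as np
        import pywt
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(8)

        def subband_energies(x, wavelet="db4", level=5):
            coeffs = pywt.wavedec(x, wavelet, level=level)
            return np.log([np.mean(c ** 2) + 1e-12 for c in coeffs])

        def frames(noise, n=60, length=2048, f0=120.0, sr=8000):
            t = np.arange(length) / sr
            return [np.sin(2 * np.pi * f0 * t) + noise * rng.normal(size=length)
                    for _ in range(n)]

        X = np.array([subband_energies(f) for f in frames(0.05) + frames(0.5)])
        y = np.array([0] * 60 + [1] * 60)              # 0 = normal, 1 = pathological

        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000).fit(X, y)
        print("training accuracy:", clf.score(X, y))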

  17. Speech Perception and Short-Term Memory Deficits in Persistent Developmental Speech Disorder

    Science.gov (United States)

    Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.

    2006-01-01

    Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech…

  18. Perceiving Speech Rhythm in Music: Listeners Classify Instrumental Songs According to Language of Origin

    Science.gov (United States)

    Hannon, Eric E.

    2009-01-01

    Recent evidence suggests that the musical rhythm of a particular culture may parallel the speech rhythm of that culture's language (Patel, A. D., & Daniele, J. R. (2003). "An empirical comparison of rhythm in language and music." "Cognition, 87," B35-B45). The present experiments aimed to determine whether listeners actually perceive such rhythmic…

  19. Automatic speech recognition (ASR) based approach for speech therapy of aphasic patients: A review

    Science.gov (United States)

    Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH

    2017-09-01

    This paper reviews state-of-the-art automatic speech recognition (ASR) based approaches for speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from a speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that converts human speech into transcribed text by matching it against the system's library. This is particularly useful in speech rehabilitation therapies as it provides accurate, real-time evaluation of speech input from an individual with a speech disorder. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback on their mistakes. However, the accuracy of ASR is dependent on many factors such as phoneme recognition, speech continuity, speaker and environmental differences, as well as the depth of our knowledge of human language understanding. Hence, the review examines recent developments in ASR technologies and their performance for individuals with speech and language disorders.

  20. Convolutional Neural Networks with Batch Normalization for Classifying Hi-hat, Snare, and Bass Percussion Sound Samples

    DEFF Research Database (Denmark)

    Gajhede, Nicolai; Beck, Oliver; Purwins, Hendrik

    2016-01-01

    After having revolutionized image and speech processing, convolutional neural networks (CNN) are now starting to become more and more successful in music information retrieval as well. We compare four CNN types for classifying a dataset of more than 3000 acoustic and synthesized samples…

  1. Cell of Origin and Cancer Stem Cell Phenotype in Medulloblastomas

    Science.gov (United States)

    2017-09-01

    AWARD NUMBER: W81XWH-14-1-0115. TITLE: Cell of Origin and Cancer Stem Cell Phenotype in Medulloblastomas. PRINCIPAL INVESTIGATOR: Kyuson Yun. To test this hypothesis, we originally proposed to transform neural stem cells (NSCs) and neural progenitor cells (NPCs) in vivo by expressing an activated form…

  2. Speech disorders in Parkinson's disease: early diagnostics and effects of medication and brain stimulation.

    Science.gov (United States)

    Brabenec, L; Mekyska, J; Galaz, Z; Rektorova, Irena

    2017-03-01

    Hypokinetic dysarthria (HD) occurs in 90% of Parkinson's disease (PD) patients. It manifests specifically in the areas of articulation, phonation, prosody, speech fluency, and faciokinesis. We aimed to systematically review papers on HD in PD with a special focus on (1) early PD diagnosis and monitoring of disease progression using acoustic voice and speech analysis, (2) functional imaging studies exploring neural correlates of HD in PD, and (3) clinical studies using acoustic analysis to evaluate effects of dopaminergic medication and brain stimulation. A systematic literature search of articles written in English before March 2016 was conducted in the Web of Science, PubMed, SpringerLink, and IEEE Xplore databases using and combining specific relevant keywords. Articles were categorized into three groups: (1) articles focused on neural correlates of HD in PD using functional imaging (n = 13); (2) articles dealing with the acoustic analysis of HD in PD (n = 52); and (3) articles concerning specifically dopaminergic and brain stimulation-related effects as assessed by acoustic analysis (n = 31); the groups were then reviewed. We identified 14 combinations of speech tasks and acoustic features that can be recommended for use in describing the main features of HD in PD. While only a few acoustic parameters correlate with limb motor symptoms and can be partially relieved by dopaminergic medication, HD in PD seems to be mainly related to non-dopaminergic deficits and associated particularly with non-motor symptoms. Future studies should combine non-invasive brain stimulation with voice behavior approaches to achieve the best treatment effects by enhancing auditory-motor integration.
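
    One representative task/feature combination of this kind, sustained-phonation F0 variability (reduced variability suggesting the monopitch typical of HD), can be sketched in Python with librosa's pYIN tracker on a synthetic tone (the signal and the feature choice are illustrative, not one of the paper's 14 recommended combinations):

        # F0 variability from a sustained phonation, tracked with librosa's
        # pYIN; the tone is synthetic and the feature choice is illustrative.
        import numpy as np
        import librosa

        sr = 22050
        t = np.arange(sr * 2) / sr
        inst_f = 120 + 2 * np.sin(2 * np.pi * 0.5 * t)           # ~120 Hz with slow drift
        y = np.sin(2 * np.pi * np.cumsum(inst_f) / sr).astype(np.float32)

        f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
        f0 = f0[voiced]                                          # keep voiced frames only
        dev = 12 * np.log2(f0 / np.median(f0))                   # deviation in semitones
        print(f"F0 variability: {dev.std():.2f} semitones")      # low values suggest monopitch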

  3. Speech and Language Delay

    Science.gov (United States)

    Consumer health information page covering diagnosis, treatment, everyday life, questions, and resources. ... What is a speech and language delay? A speech and language delay ...

  4. Deep Visual Attributes vs. Hand-Crafted Audio Features on Multidomain Speech Emotion Recognition

    Directory of Open Access Journals (Sweden)

    Michalis Papakostas

    2017-06-01

    Full Text Available Emotion recognition from speech may play a crucial role in many applications related to human–computer interaction or understanding the affective state of users in certain tasks, where other modalities such as video or physiological parameters are unavailable. In general, a human's emotions may be recognized using several modalities, such as analyzing facial expressions, speech, or physiological parameters (e.g., electroencephalograms, electrocardiograms, etc.). However, measuring these modalities may be difficult, obtrusive, or require expensive hardware. In that context, speech may be the best alternative modality in many practical applications. In this work we present an approach that uses a Convolutional Neural Network (CNN) functioning as a visual feature extractor and trained using raw speech information. In contrast to traditional machine learning approaches, CNNs are responsible for identifying the important features of the input, making hand-crafted feature engineering optional in many tasks. In this paper no extra features are required other than the spectrogram representations, and hand-crafted features were extracted only for validation purposes of our method. Moreover, the approach does not require any linguistic model and is not specific to any particular language. We compare the proposed approach using cross-language datasets and demonstrate that it is able to provide superior results vs. traditional approaches that use hand-crafted features.
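
    A log-spectrogram front end of the kind described can be computed with standard tools. The SciPy sketch below uses an arbitrary window length and a synthetic sine wave standing in for a real speech clip; the result is the frequency-by-time "image" a CNN of this sort would consume.

      import numpy as np
      from scipy.signal import spectrogram

      fs = 16000                               # sampling rate (Hz)
      t = np.arange(fs) / fs                   # 1 s of audio
      wave = np.sin(2 * np.pi * 440 * t)       # stand-in for a speech clip

      f, times, S = spectrogram(wave, fs=fs, nperseg=512, noverlap=256)
      log_S = np.log(S + 1e-10)                # log power, used as CNN input
      print(log_S.shape)                       # (257, 61): frequency x time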

  5. Plasticity in the Human Speech Motor System Drives Changes in Speech Perception

    Science.gov (United States)

    Lametti, Daniel R.; Rochet-Capellan, Amélie; Neufeld, Emily; Shiller, Douglas M.

    2014-01-01

    Recent studies of human speech motor learning suggest that learning is accompanied by changes in auditory perception. But what drives the perceptual change? Is it a consequence of changes in the motor system? Or is it a result of sensory inflow during learning? Here, subjects participated in a speech motor-learning task involving adaptation to altered auditory feedback, and they were subsequently tested for perceptual change. In two separate experiments, involving two different auditory perceptual continua, we show that changes in the speech motor system that accompany learning drive changes in auditory speech perception. Specifically, we obtained changes in speech perception when adaptation to altered auditory feedback led to speech production that fell into the phonetic range of the speech perceptual tests. However, a similar change in perception was not observed when the auditory feedback that subjects received during learning fell into the phonetic range of the perceptual tests. This indicates that the central motor outflow associated with vocal sensorimotor adaptation drives changes to the perceptual classification of speech sounds. PMID:25080594

  6. Hearing and seeing meaning in noise: Alpha, beta, and gamma oscillations predict gestural enhancement of degraded speech comprehension.

    Science.gov (United States)

    Drijvers, Linda; Özyürek, Asli; Jensen, Ole

    2018-05-01

    During face-to-face communication, listeners integrate speech with gestures. The semantic information conveyed by iconic gestures (e.g., a drinking gesture) can aid speech comprehension in adverse listening conditions. In this magnetoencephalography (MEG) study, we investigated the spatiotemporal neural oscillatory activity associated with gestural enhancement of degraded speech comprehension. Participants watched videos of an actress uttering clear or degraded speech, accompanied by a gesture or not, and completed a cued-recall task after watching each video. When gestures semantically disambiguated degraded speech comprehension, an alpha and beta power suppression and a gamma power increase revealed engagement and active processing in the hand-area of the motor cortex, the extended language network (LIFG/pSTS/STG/MTG), medial temporal lobe, and occipital regions. These observed low- and high-frequency oscillatory modulations in these areas support general unification, integration and lexical access processes during online language comprehension, and simulation of and increased visual attention to manual gestures over time. All individual oscillatory power modulations associated with gestural enhancement of degraded speech comprehension predicted a listener's correct disambiguation of the degraded verb after watching the videos. Our results thus go beyond the previously proposed role of oscillatory dynamics in unimodal degraded speech comprehension and provide the first evidence for the role of low- and high-frequency oscillations in predicting the integration of auditory and visual information at a semantic level. © 2018 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.

  8. Hearing Feelings: Affective Categorization of Music and Speech in Alexithymia, an ERP Study

    Science.gov (United States)

    Goerlich, Katharina Sophia; Witteman, Jurriaan; Aleman, André; Martens, Sander

    2011-01-01

    Background Alexithymia, a condition characterized by deficits in interpreting and regulating feelings, is a risk factor for a variety of psychiatric conditions. Little is known about how alexithymia influences the processing of emotions in music and speech. Appreciation of such emotional qualities in auditory material is fundamental to human experience and has profound consequences for functioning in daily life. We investigated the neural signature of such emotional processing in alexithymia by means of event-related potentials. Methodology Affective music and speech prosody were presented as targets following affectively congruent or incongruent visual word primes in two conditions. In two further conditions, affective music and speech prosody served as primes and visually presented words with affective connotations were presented as targets. Thirty-two participants (16 male) judged the affective valence of the targets. We tested the influence of alexithymia on cross-modal affective priming and on N400 amplitudes, indicative of individual sensitivity to an affective mismatch between words, prosody, and music. Our results indicate that the affective priming effect for prosody targets tended to be reduced with increasing scores on alexithymia, while no behavioral differences were observed for music and word targets. At the electrophysiological level, alexithymia was associated with significantly smaller N400 amplitudes in response to affectively incongruent music and speech targets, but not to incongruent word targets. Conclusions Our results suggest a reduced sensitivity for the emotional qualities of speech and music in alexithymia during affective categorization. This deficit becomes evident primarily in situations in which a verbalization of emotional information is required. PMID:21573026

  9. Childhood apraxia of speech: A survey of praxis and typical speech characteristics.

    Science.gov (United States)

    Malmenholt, Ann; Lohmander, Anette; McAllister, Anita

    2017-07-01

    The purpose of this study was to investigate current knowledge of the diagnosis of childhood apraxia of speech (CAS) in Sweden and compare speech characteristics and symptoms to those of earlier survey findings in mainly English speakers. In a web-based questionnaire, 178 Swedish speech-language pathologists (SLPs) anonymously answered questions about their perception of typical speech characteristics for CAS. They graded their own assessment skills and estimated clinical occurrence. The seven top speech characteristics reported as typical for children with CAS were: inconsistent speech production (85%), sequencing difficulties (71%), oro-motor deficits (63%), vowel errors (62%), voicing errors (61%), consonant cluster deletions (54%), and prosodic disturbance (53%). Motor-programming deficits, described as lack of automatization of speech movements, were perceived by 82%. All listed characteristics were consistent with the American Speech-Language-Hearing Association (ASHA) consensus-based features, Strand's 10-point checklist, and the diagnostic model proposed by Ozanne. The mode for clinical occurrence was 5%. The number of suspected cases of CAS in the clinical caseload was approximately one new patient per year per SLP. The results support and add to findings from studies of CAS in English-speaking children, with similar speech characteristics regarded as typical. Possibly, these findings could contribute to cross-linguistic consensus on CAS characteristics.

  10. A Translational Approach to Vocalization Deficits and Neural Recovery after Behavioral Treatment in Parkinson Disease

    Science.gov (United States)

    Ciucci, Michelle R.; Vinney, Lisa; Wahoske, Emerald J.; Connor, Nadine P.

    2010-01-01

    Parkinson disease is characterized by a complex neuropathological profile that primarily affects dopaminergic neural pathways in the basal ganglia, including pathways that modulate cranial sensorimotor functions such as swallowing, voice and speech. Prior work from our lab has shown that the rat model of unilateral 6-hydroxydopamine infusion to…

  11. Application of an artificial neural network in the enumeration of yeasts and bacteria adhering to solid substrata

    NARCIS (Netherlands)

    Wit, P; Busscher, HJ

    Artificial neural networks (ANNs) combined with automated image processing are being used in a growing number of applications, ranging from car license plate identification to speech recognition. ANN analysis is capable of handling complicated images that cannot be dealt with using conventional

  12. Reference-Free Assessment of Speech Intelligibility Using Bispectrum of an Auditory Neurogram

    Science.gov (United States)

    Hossain, Mohammad E.; Jassim, Wissam A.; Zilany, Muhammad S. A.

    2016-01-01

    Sensorineural hearing loss occurs due to damage to the inner and outer hair cells of the peripheral auditory system. Hearing loss can cause decreases in audibility, dynamic range, frequency and temporal resolution of the auditory system, and all of these effects are known to affect speech intelligibility. In this study, a new reference-free speech intelligibility metric is proposed using 2-D neurograms constructed from the output of a computational model of the auditory periphery. The responses of the auditory-nerve fibers with a wide range of characteristic frequencies were simulated to construct neurograms. The features of the neurograms were extracted using third-order statistics referred to as bispectrum. The phase coupling of neurogram bispectrum provides a unique insight for the presence (or deficit) of supra-threshold nonlinearities beyond audibility for listeners with normal hearing (or hearing loss). The speech intelligibility scores predicted by the proposed method were compared to the behavioral scores for listeners with normal hearing and hearing loss both in quiet and under noisy background conditions. The results were also compared to the performance of some existing methods. The predicted results showed a good fit with a small error suggesting that the subjective scores can be estimated reliably using the proposed neural-response-based metric. The proposed metric also had a wide dynamic range, and the predicted scores were well-separated as a function of hearing loss. The proposed metric successfully captures the effects of hearing loss and supra-threshold nonlinearities on speech intelligibility. This metric could be applied to evaluate the performance of various speech-processing algorithms designed for hearing aids and cochlear implants. PMID:26967160
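
    The bispectrum feature named above is the third-order spectrum B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)]. A direct FFT-based estimator can be sketched as follows; the segment length and the synthetic test signal are illustrative choices, and the paper applies the statistic to 2-D neurogram rows rather than to a raw waveform as here.

      import numpy as np

      def bispectrum(x, nseg=64):
          """Direct bispectrum estimate, averaged over signal segments."""
          segs = np.reshape(x[: len(x) // nseg * nseg], (-1, nseg))
          segs = segs - segs.mean(axis=1, keepdims=True)   # remove DC per segment
          X = np.fft.fft(segs, axis=1)
          n = nseg // 2
          B = np.zeros((n, n), dtype=complex)
          for f1 in range(n):
              for f2 in range(n):
                  B[f1, f2] = np.mean(X[:, f1] * X[:, f2] *
                                      np.conj(X[:, f1 + f2]))
          return B

      rng = np.random.default_rng(0)
      x = np.sin(2 * np.pi * 0.1 * np.arange(4096)) + rng.normal(size=4096)
      print(np.abs(bispectrum(x)).max())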

  13. Speech-specific audiovisual perception affects identification but not detection of speech

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual as evidenced by the McGurk effect in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate...... of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task and this could increase...... visual detection task. In our first experiment, observers presented with congruent and incongruent audiovisual sine-wave speech stimuli showed a McGurk effect only when informed of the speech nature of the stimulus. Performance on the secondary visual task was very good, thus supporting the finding...

  14. The Interpretation Of Speech Code In A Communication Ethnographic Context For Outsider Students Of Graduate Communication Science Universitas Sumatera Utara In Medan

    Directory of Open Access Journals (Sweden)

    Fauzi Eka Putra

    2017-06-01

    Full Text Available Interpreting the typical Medan speech code is something unique and distinctive, and it can create confusion for outsider students because of the similarities and differences among speech codes in Medan. Therefore, graduate students of communication science at Universitas Sumatera Utara who originate from outside North Sumatera need to learn, comprehend, and be aware of these codes in order to communicate effectively. The purpose of this research is to discover how graduate students of communication science at Universitas Sumatera Utara who originate from outside North Sumatera interpret speech codes while adapting to life in Medan. This research uses a qualitative method drawing on the ethnography of communication and acculturation. The subjects of this research are graduate students of communication science at Universitas Sumatera Utara who originate from outside North Sumatera and are adapting to life in Medan. Data were collected through interviews, observation, and documentation. The conclusion of this research shows that the speech code interpretation of students from outside North Sumatera adapting to life in Medan leads to an acculturation process of assimilation and integration.

  15. Speech-Language Therapy (For Parents)

    Science.gov (United States)

    Parent-oriented information page on speech-language therapy. ... most kids with speech and/or language disorders. Speech Disorders, Language Disorders, and Feeding Disorders. A speech ...

  16. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  17. Hemispheric lateralization of linguistic prosody recognition in comparison to speech and speaker recognition.

    Science.gov (United States)

    Kreitewolf, Jens; Friederici, Angela D; von Kriegstein, Katharina

    2014-11-15

    Hemispheric specialization for linguistic prosody is a controversial issue. While it is commonly assumed that linguistic prosody and emotional prosody are preferentially processed in the right hemisphere, neuropsychological work directly comparing processes of linguistic prosody and emotional prosody suggests a predominant role of the left hemisphere for linguistic prosody processing. Here, we used two functional magnetic resonance imaging (fMRI) experiments to clarify the role of left and right hemispheres in the neural processing of linguistic prosody. In the first experiment, we sought to confirm previous findings showing that linguistic prosody processing compared to other speech-related processes predominantly involves the right hemisphere. Unlike previous studies, we controlled for stimulus influences by employing a prosody and speech task using the same speech material. The second experiment was designed to investigate whether a left-hemispheric involvement in linguistic prosody processing is specific to contrasts between linguistic prosody and emotional prosody or whether it also occurs when linguistic prosody is contrasted against other non-linguistic processes (i.e., speaker recognition). Prosody and speaker tasks were performed on the same stimulus material. In both experiments, linguistic prosody processing was associated with activity in temporal, frontal, parietal and cerebellar regions. Activation in temporo-frontal regions showed differential lateralization depending on whether the control task required recognition of speech or speaker: recognition of linguistic prosody predominantly involved right temporo-frontal areas when it was contrasted against speech recognition; when contrasted against speaker recognition, recognition of linguistic prosody predominantly involved left temporo-frontal areas. The results show that linguistic prosody processing involves functions of both hemispheres and suggest that recognition of linguistic prosody is based on

  18. Precision of working memory for speech sounds.

    Science.gov (United States)

    Joseph, Sabine; Iverson, Paul; Manohar, Sanjay; Fox, Zoe; Scott, Sophie K; Husain, Masud

    2015-01-01

    Memory for speech sounds is a key component of models of verbal working memory (WM). But how good is verbal WM? Most investigations assess this using binary report measures to derive a fixed number of items that can be stored. However, recent findings in visual WM have challenged such "quantized" views by employing measures of recall precision with an analogue response scale. WM for speech sounds might rely on both continuous and categorical storage mechanisms. Using a novel speech matching paradigm, we measured WM recall precision for phonemes. Vowel qualities were sampled from a formant space continuum. A probe vowel had to be adjusted to match the vowel quality of a target on a continuous, analogue response scale. Crucially, this provided an index of the variability of a memory representation around its true value and thus allowed us to estimate how memories were distorted from the original sounds. Memory load affected the quality of speech sound recall in two ways. First, there was a gradual decline in recall precision with increasing number of items, consistent with the view that WM representations of speech sounds become noisier with an increase in the number of items held in memory, just as for vision. Based on multidimensional scaling (MDS), the level of noise appeared to be reflected in distortions of the formant space. Second, as memory load increased, there was evidence of greater clustering of participants' responses around particular vowels. A mixture model captured both continuous and categorical responses, demonstrating a shift from continuous to categorical memory with increasing WM load. This suggests that direct acoustic storage can be used for single items, but when more items must be stored, categorical representations must be used.
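
    The mixture-model idea in this abstract can be illustrated with a toy fit. In the sketch below, simulated recall responses in a one-dimensional formant space are modeled as a two-component Gaussian mixture: one component centred on the true target (continuous storage) and one on the nearest vowel prototype (categorical storage). The prototype locations, noise level, and simulated data are all invented for illustration; this is not the authors' model.

      import numpy as np
      from scipy.optimize import minimize
      from scipy.stats import norm

      prototypes = np.array([300.0, 500.0, 700.0])   # hypothetical F1 anchors (Hz)
      rng = np.random.default_rng(1)
      targets = rng.uniform(300, 700, size=200)
      nearest = prototypes[np.abs(targets[:, None] - prototypes).argmin(axis=1)]
      pulled = rng.random(200) < 0.3                 # 30% categorical responses
      responses = np.where(pulled, nearest, targets) + rng.normal(0, 30, size=200)

      def neg_log_lik(params):
          w, sigma = params                          # categorical weight, noise SD
          like = ((1 - w) * norm.pdf(responses, targets, sigma)
                  + w * norm.pdf(responses, nearest, sigma))
          return -np.log(like + 1e-300).sum()

      fit = minimize(neg_log_lik, x0=[0.5, 50.0],
                     bounds=[(0.001, 0.999), (1.0, 200.0)])
      print(fit.x)                                   # recovered weight and noise SD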

  19. Lexical decoder for continuous speech recognition: sequential neural network approach

    International Nuclear Information System (INIS)

    Iooss, Christine

    1991-01-01

    The work presented in this dissertation concerns the study of a connectionist architecture for treating sequential inputs. In this context, the model proposed by J.L. Elman, a recurrent multilayer network, is used. Its abilities and its limits are evaluated. Modifications are made in order to treat erroneous or noisy sequential inputs and to classify patterns. The application context of this study concerns the realisation of a lexical decoder for analytical multi-speaker continuous speech recognition. Lexical decoding is performed on lattices of phonemes obtained after an acoustic-phonetic decoding stage relying on a K Nearest Neighbours search technique. Tests are done on sentences formed from a lexicon of 20 words. The results obtained show the ability of the proposed connectionist model to take sequentiality into account at the input level, to memorize the context, and to treat noisy or erroneous inputs. (author) [fr]
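
    Elman's recurrent architecture survives in modern toolkits: torch.nn.RNN implements exactly this simple recurrence. The sketch below shows a hypothetical lexical decoder over a 20-word vocabulary in the spirit of the dissertation; the phoneme inventory, embedding size, and hidden size are arbitrary choices.

      import torch
      import torch.nn as nn

      class ElmanLexicalDecoder(nn.Module):
          def __init__(self, n_phonemes=40, n_words=20, hidden=64):
              super().__init__()
              self.embed = nn.Embedding(n_phonemes, 32)
              self.rnn = nn.RNN(32, hidden, batch_first=True)   # Elman recurrence
              self.out = nn.Linear(hidden, n_words)

          def forward(self, phoneme_ids):            # (batch, seq_len) int tensor
              h, _ = self.rnn(self.embed(phoneme_ids))
              return self.out(h[:, -1])              # classify from the final state

      decoder = ElmanLexicalDecoder()
      print(decoder(torch.randint(0, 40, (4, 7))).shape)   # torch.Size([4, 20])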

  20. Neurophysiology and neural engineering: a review.

    Science.gov (United States)

    Prochazka, Arthur

    2017-08-01

    Neurophysiology is the branch of physiology concerned with understanding the function of neural systems. Neural engineering (also known as neuroengineering) is a discipline within biomedical engineering that uses engineering techniques to understand, repair, replace, enhance, or otherwise exploit the properties and functions of neural systems. In most cases neural engineering involves the development of an interface between electronic devices and living neural tissue. This review describes the origins of neural engineering, the explosive development of methods and devices commencing in the late 1950s, and the present-day devices that have resulted. The barriers to interfacing electronic devices with living neural tissues are many and varied, and consequently there have been numerous stops and starts along the way. Representative examples are discussed. None of this could have happened without a basic understanding of the relevant neurophysiology. I also consider examples of how neural engineering is repaying the debt to basic neurophysiology with new knowledge and insight. Copyright © 2017 the American Physiological Society.

  1. Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems.

    Science.gov (United States)

    Greene, Beth G; Logan, John S; Pisoni, David B

    1986-03-01

    We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.

  3. The speech perception skills of children with and without speech sound disorder.

    Science.gov (United States)

    Hearnshaw, Stephanie; Baker, Elise; Munro, Natalie

    To investigate whether Australian-English speaking children with and without speech sound disorder (SSD) differ in their overall speech perception accuracy. Additionally, to investigate differences in the perception of specific phonemes and the association between speech perception and speech production skills. Twenty-five Australian-English speaking children aged 48-60 months participated in this study. The SSD group included 12 children and the typically developing (TD) group included 13 children. Children completed routine speech and language assessments in addition to an experimental Australian-English lexical and phonetic judgement task based on Rvachew's Speech Assessment and Interactive Learning System (SAILS) program (Rvachew, 2009). This task included eight words across four word-initial phonemes: /k, ɹ, ʃ, s/. Children with SSD showed significantly poorer perceptual accuracy on the lexical and phonetic judgement task compared with TD peers. The phonemes /ɹ/ and /s/ were most frequently perceived in error across both groups. Additionally, the phoneme /ɹ/ was most commonly produced in error. There was also a positive correlation between overall speech perception and speech production scores. Children with SSD perceived speech less accurately than their typically developing peers. The findings suggest that an Australian-English variation of a lexical and phonetic judgement task similar to the SAILS program is promising and worthy of a larger scale study. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. Speech Matters

    DEFF Research Database (Denmark)

    Hasse Jørgensen, Stina

    2011-01-01

    About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavilion, the Venice Biennale 2011.

  5. Neural Correlates of Early Sound Encoding and their Relationship to Speech-in-Noise Perception

    Directory of Open Access Journals (Sweden)

    Emily B. J. Coffey

    2017-08-01

    Full Text Available Speech-in-noise (SIN) perception is a complex cognitive skill that affects social, vocational, and educational activities. Poor SIN ability particularly affects young and elderly populations, yet varies considerably even among healthy young adults with normal hearing. Although SIN skills are known to be influenced by top-down processes that can selectively enhance lower-level sound representations, the complementary role of feed-forward mechanisms and their relationship to musical training is poorly understood. Using a paradigm that minimizes the main top-down factors that have been implicated in SIN performance, such as working memory, we aimed to better understand how robust encoding of periodicity in the auditory system (as measured by the frequency-following response) contributes to SIN perception. Using magnetoencephalography, we found that the strength of encoding at the fundamental frequency in the brainstem, thalamus, and cortex is correlated with SIN accuracy. The amplitude of the slower cortical P2 wave was previously also shown to be related to SIN accuracy and FFR strength; we use MEG source localization to show that the P2 wave originates in a temporal region anterior to that of the cortical FFR. We also confirm that the observed enhancements were related to the extent and timing of musicianship. These results are consistent with the hypothesis that basic feed-forward sound encoding affects SIN perception by providing better information to later processing stages, and that modifying this process may be one mechanism through which musical training might enhance the auditory networks that subserve both musical and language functions.

  6. Eyes and ears: Using eye tracking and pupillometry to understand challenges to speech recognition.

    Science.gov (United States)

    Van Engen, Kristin J; McLaughlin, Drew J

    2018-05-04

    Although human speech recognition is often experienced as relatively effortless, a number of common challenges can render the task more difficult. Such challenges may originate in talkers (e.g., unfamiliar accents, varying speech styles), the environment (e.g. noise), or in listeners themselves (e.g., hearing loss, aging, different native language backgrounds). Each of these challenges can reduce the intelligibility of spoken language, but even when intelligibility remains high, they can place greater processing demands on listeners. Noisy conditions, for example, can lead to poorer recall for speech, even when it has been correctly understood. Speech intelligibility measures, memory tasks, and subjective reports of listener difficulty all provide critical information about the effects of such challenges on speech recognition. Eye tracking and pupillometry complement these methods by providing objective physiological measures of online cognitive processing during listening. Eye tracking records the moment-to-moment direction of listeners' visual attention, which is closely time-locked to unfolding speech signals, and pupillometry measures the moment-to-moment size of listeners' pupils, which dilate in response to increased cognitive load. In this paper, we review the uses of these two methods for studying challenges to speech recognition. Copyright © 2018. Published by Elsevier B.V.

  7. Why do I hear but not understand? Stochastic undersampling as a model of degraded neural encoding of speech

    Directory of Open Access Journals (Sweden)

    Enrique A Lopez-Poveda

    2014-10-01

    Full Text Available Hearing impairment is a serious disease with increasing prevalence. It is defined based on increased audiometric thresholds but increased thresholds are only partly responsible for the greater difficulty understanding speech in noisy environments experienced by some older listeners or by hearing-impaired listeners. Identifying the additional factors and mechanisms that impair intelligibility is fundamental to understanding hearing impairment but these factors remain uncertain. Traditionally, these additional factors have been sought in the way the speech spectrum is encoded in the pattern of impaired mechanical cochlear responses. Recent studies, however, are steering the focus toward impaired encoding of the speech waveform in the auditory nerve. In our recent work, we gave evidence that a significant factor might be the loss of afferent auditory nerve fibers, a pathology that comes with aging or noise overexposure. Our approach was based on a signal-processing analogy whereby the auditory nerve may be regarded as a stochastic sampler of the sound waveform and deafferentation may be described in terms of waveform undersampling. We showed that stochastic undersampling simultaneously degrades the encoding of soft and rapid waveform features, and that this degrades speech intelligibility in noise more than in quiet without significant increases in audiometric thresholds. Here, we review our recent work in a broader context and argue that the stochastic undersampling analogy may be extended to study the perceptual consequences of various different hearing pathologies and their treatment.
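
    The sampling analogy is easy to simulate. In the toy Python sketch below, the auditory nerve is modeled as a random sampler of the waveform; as the number of "fibres" (random samples) drops, reconstruction error grows. The fibre counts and the amplitude-modulated test tone are arbitrary choices, not values from the paper.

      import numpy as np

      fs = 16000
      t = np.arange(fs) / fs
      wave = np.sin(2 * np.pi * 200 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))

      rng = np.random.default_rng(0)
      for n_fibres in (2000, 200, 20):               # healthy -> deafferented
          idx = np.sort(rng.choice(fs, size=n_fibres, replace=False))
          recon = np.interp(t, t[idx], wave[idx])    # rebuild from sparse samples
          rmse = np.sqrt(np.mean((wave - recon) ** 2))
          print(f"{n_fibres:5d} fibres -> RMSE {rmse:.3f}")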

  8. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-12-01

    Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance.The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience.The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  9. Speech Inconsistency in Children with Childhood Apraxia of Speech, Language Impairment, and Speech Delay: Depends on the Stimuli

    Science.gov (United States)

    Iuzzini-Seigel, Jenya; Hogan, Tiffany P.; Green, Jordan R.

    2017-01-01

    Purpose: The current research sought to determine (a) if speech inconsistency is a core feature of childhood apraxia of speech (CAS) or if it is driven by comorbid language impairment that affects a large subset of children with CAS and (b) if speech inconsistency is a sensitive and specific diagnostic marker that can differentiate between CAS and…

  10. Acoustic-phonetic and artificial neural network feature analysis to assess speech quality of stop consonants produced by patients treated for oral or oropharyngeal cancer

    NARCIS (Netherlands)

    de Bruijn, Marieke J.; ten Bosch, Louis; Kuik, Dirk J.; Witte, Birgit I.; Langendijk, Johannes A.; Leemans, C. Rene; Verdonck-de Leeuw, Irma M.

    Speech impairment often occurs in patients after treatment for head and neck cancer. A specific speech characteristic that influences intelligibility and speech quality is voice-onset-time (VOT) in stop consonants. VOT is one of the functionally most relevant parameters that distinguishes voiced and

  11. Neural Architectures for Control

    Science.gov (United States)

    Peterson, James K.

    1991-01-01

    The cerebellar model articulated controller (CMAC) neural architectures are shown to be viable for the purposes of real-time learning and control. Software tools for the exploration of CMAC performance are developed for three hardware platforms, the Macintosh, the IBM PC, and the SUN workstation. All algorithm development was done using the C programming language. These software tools were then used to implement an adaptive critic neuro-control design that learns in real time how to back up a trailer truck. The truck backer-upper experiment is a standard performance measure in the neural network literature, but previously the training of the controllers was done off-line. With the CMAC neural architectures, it was possible to train the neuro-controllers on-line in real time on an MS-DOS PC 386. CMAC neural architectures are also used in conjunction with a hierarchical planning approach to find collision-free paths over 2-D analog-valued obstacle fields. The method constructs a coarse-resolution version of the original problem and then finds the corresponding coarse optimal path using multipass dynamic programming. CMAC artificial neural architectures are used to estimate the analog transition costs that dynamic programming requires. The CMAC architectures are trained in real time for each obstacle field presented. The coarse optimal path is then used as a baseline for the construction of a fine-scale optimal path through the original obstacle array. These results are a very good indication of the potential power of the neural architectures in control design. In order to reach as wide an audience as possible, we have run a seminar on neuro-control that has met once per week since 20 May 1991. This seminar has thoroughly discussed the CMAC architecture, relevant portions of classical control, back propagation through time, and adaptive critic designs.

  12. Clear Speech - Mere Speech? How segmental and prosodic speech reduction shape the impression that speakers create on listeners

    DEFF Research Database (Denmark)

    Niebuhr, Oliver

    2017-01-01

    of reduction levels and perceived speaker attributes in which moderate reduction can make a better impression on listeners than no reduction. In addition to its relevance in reduction models and theories, this interplay is instructive for various fields of speech application from social robotics to charisma...... whether variation in the degree of reduction also has a systematic effect on the attributes we ascribe to the speaker who produces the speech signal. A perception experiment was carried out for German in which 46 listeners judged whether or not speakers showing 3 different combinations of segmental...... and prosodic reduction levels (unreduced, moderately reduced, strongly reduced) are appropriately described by 13 physical, social, and cognitive attributes. The experiment shows that clear speech is not mere speech, and less clear speech is not just reduced either. Rather, results revealed a complex interplay...

  13. Audiovisual Temporal Recalibration for Speech in Synchrony Perception and Speech Identification

    Science.gov (United States)

    Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato

    We investigated whether audiovisual synchrony perception for speech could change after observation of the audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is re-calibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, participants performed an audiovisual simultaneity judgment task (i.e., a direct measurement of the audiovisual synchrony perception) in terms of the speech signal after observation of the speech stimuli which had a constant audiovisual lag. The results showed that the “simultaneous” responses (i.e., proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, we adopted the McGurk identification task (i.e., an indirect measurement of the audiovisual synchrony perception) to exclude the possibility that this modulation of synchrony perception was solely attributable to the response strategy using stimuli identical to those of Experiment 1. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, it was shown that audiovisual synchrony perception for speech could be modulated following exposure to constant lag both in direct and indirect measurement. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.

  14. Neural network wavelet technology: A frontier of automation

    Science.gov (United States)

    Szu, Harold

    1994-01-01

    Neural networks are an outgrowth of interdisciplinary studies concerning the brain. These studies are guiding the field of Artificial Intelligence towards the so-called 6th Generation Computer. Enormous amounts of resources have been poured into R&D. Wavelet Transforms (WT) have replaced Fourier Transforms (FT) for wideband transient cases since the discovery of WT in 1985. The list of successful applications includes the following: earthquake prediction; radar identification; speech recognition; stock market forecasting; FBI fingerprint image compression; and telecommunication ISDN data compression.

  15. Under-resourced speech recognition based on the speech manifold

    CSIR Research Space (South Africa)

    Sahraeian, R

    2015-09-01

    Full Text Available Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...

  16. PRACTICING SPEECH THERAPY INTERVENTION FOR SOCIAL INTEGRATION OF CHILDREN WITH SPEECH DISORDERS

    Directory of Open Access Journals (Sweden)

    Martin Ofelia POPESCU

    2016-11-01

    Full Text Available The article presents a concise speech-correction intervention program for dyslalia, in conjunction with the development of intrapersonal, interpersonal, and social-integration capacities of children with speech disorders. The program's main objectives are: increasing the potential for individual social integration by correcting speech disorders in conjunction with intra- and interpersonal capacities, and increasing the potential of children and community groups for social integration by optimizing the socio-relational context of children with speech disorders. The program included 60 children/students with dyslalic speech disorders (monomorphic and polymorphic dyslalia) from 11 educational institutions - 6 kindergartens and 5 schools/secondary schools - affiliated with the inter-school logopedic centre (CLI) of Targu Jiu city and areas of Gorj district. The program was implemented under the assumption that a therapeutic-formative intervention to correct speech disorders and facilitate social integration would, in combination with the correction of pronunciation disorders, optimize the social integration of children with speech disorders. The results confirm the hypothesis and give facts about the intervention program's efficiency.

  17. Schizophrenia alters intra-network functional connectivity in the caudate for detecting speech under informational speech masking conditions.

    Science.gov (United States)

    Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang

    2018-04-04

    Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to healthy listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed an altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its functions of suppressing masking-speech signals.
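
    Intra-network functional connectivity of the kind reported here is typically computed as the average pairwise correlation of time series within a region. The sketch below shows that generic computation with random data standing in for caudate voxel time series; it is an illustration of the measure, not the authors' pipeline.

      import numpy as np

      rng = np.random.default_rng(0)
      ts = rng.normal(size=(50, 200))        # 50 ROI voxels x 200 fMRI volumes

      r = np.corrcoef(ts)                    # voxel-by-voxel correlation matrix
      iu = np.triu_indices_from(r, k=1)      # unique off-diagonal voxel pairs
      intra_fc = np.tanh(np.arctanh(r[iu]).mean())   # Fisher-z averaged FC
      print(f"intra-network FC = {intra_fc:.3f}")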

  18. Speech disorder prevention

    Directory of Open Access Journals (Sweden)

    Miladis Fornaris-Méndez

    2017-04-01

    Full Text Available Language therapy has moved from a medical focus to a preventive focus. However, difficulties are evident in the development of this latter task, because more space is devoted to the correction of language disorders. Because speech disorders are the most frequently appearing dysfunction, the preventive work carried out to avoid their appearance acquires special importance. Speech education from an early age makes it easier to prevent the appearance of speech disorders in children. The present work aims to offer different activities for the prevention of speech disorders.

  19. Speech and Speech-Related Quality of Life After Late Palate Repair: A Patient's Perspective.

    Science.gov (United States)

    Schönmeyr, Björn; Wendby, Lisa; Sharma, Mitali; Jacobson, Lia; Restrepo, Carolina; Campbell, Alex

    2015-07-01

    Many patients with cleft palate deformities worldwide receive treatment at a later age than is recommended for normal speech to develop. The outcomes after late palate repairs in terms of speech and quality of life (QOL) still remain largely unstudied. In the current study, questionnaires were used to assess the patients' perception of speech and QOL before and after primary palate repair. All of the patients were operated on at a cleft center in northeast India and had a cleft palate with a normal lip or with a cleft lip that had been previously repaired. A total of 134 patients (7-35 years) were interviewed preoperatively and 46 patients (7-32 years) were assessed in the postoperative survey. The survey showed that scores based on the speech handicap index, concerning speech and speech-related QOL, did not improve postoperatively. In fact, the questionnaires indicated that the speech became more unpredictable (P ...), yet patients reported that their self-confidence had improved after the operation. Thus, the majority of interviewed patients who underwent late primary palate repair were satisfied with the surgery. At the same time, speech and speech-related QOL did not improve according to the speech handicap index-based survey. Speech predictability may even become worse and nasal regurgitation may increase after late palate repair, according to these results.

  20. Visual Speech Fills in Both Discrimination and Identification of Non-Intact Auditory Speech in Children

    Science.gov (United States)

    Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Herve

    2018-01-01

    To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. baez) coupled to non-intact (excised onsets) auditory speech (signified…

  1. James Weldon Johnson and the Speech Lab Recordings

    Directory of Open Access Journals (Sweden)

    Chris Mustazza

    2016-03-01

    Full Text Available On December 24, 1935, James Weldon Johnson read thirteen of his poems at Columbia University, in a recording session engineered by Columbia Professor of Speech George W. Hibbitt and Barnard colleague Professor W. Cabell Greet, pioneers in the field that became sociolinguistics. Interested in American dialects, Greet and Hibbitt used early sound recording technologies to preserve dialect samples. In the same lab where they recorded T.S. Eliot, Gertrude Stein, and others, James Weldon Johnson read a selection of poems that included several from his seminal collection God’s Trombones and some dialect poems. Mustazza has digitized these and made them publicly available in the PennSound archive. In this essay, Mustazza contextualizes the collection, considering the recordings as sonic inscriptions alongside their textual manifestations. He argues that the collection must be heard within the frames of its production conditions—especially its recording in a speech lab—and that the sound recordings are essential elements in an hermeneutic analysis of the poems. He reasons that the poems’ original topics are reframed and refocused when historicized and contextualized within the frame of The Speech Lab Recordings.

  2. Tackling the complexity in speech

    DEFF Research Database (Denmark)

    section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore......, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about...... the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations...

  3. Preattentive cortical-evoked responses to pure tones, harmonic tones, and speech: influence of music training.

    Science.gov (United States)

    Nikjeh, Dee A; Lister, Jennifer J; Frisch, Stefan A

    2009-08-01

    presented as a standard and a deviant in separate blocks. P1-N1-P2 was elicited before each oddball task by presenting each auditory stimulus alone in single blocks. All cortical auditory evoked potentials were recorded in a passive listening condition. Incidental findings revealed that musicians had longer P1 latencies for pure tones and smaller P1 amplitudes for harmonic tones than nonmusicians. There were no P1 group differences for speech stimuli. Musicians compared with nonmusicians had shorter MMN latencies for all deviances (harmonic tones, pure tones, and speech). Musicians had shorter P3a latencies to harmonic tones and speech but not to pure tones. MMN and P3a amplitude were modulated by deviant frequency but not by group membership. Formally trained musicians compared with nonmusicians showed more efficient neural detection of pure tones and harmonic tones; demonstrated superior auditory sensory-memory traces for acoustic features of pure tones, harmonic tones, and speech; and revealed enhanced sensitivity to acoustic changes of spectrally rich stimuli (i.e., harmonic tones and speech). Findings support a general influence of music training on central auditory function and illustrate experience-facilitated modulation of the auditory neural system.
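
    The MMN latencies discussed above are derived from a deviant-minus-standard difference wave. The following sketch on simulated epochs shows that generic computation; the sampling rate, deflection shape, and 100-250 ms search window are assumptions, not parameters from the study.

      import numpy as np

      fs = 500                                    # samples per second
      t = np.arange(-0.1, 0.5, 1 / fs)            # epoch: -100 to 500 ms
      rng = np.random.default_rng(0)

      def epochs(amp, n=100):
          """Simulated trials with a Gaussian deflection around 150 ms."""
          erp = amp * np.exp(-((t - 0.15) ** 2) / (2 * 0.03 ** 2))
          return erp + rng.normal(0, 2.0, size=(n, t.size))

      diff = epochs(-3.0).mean(axis=0) - epochs(0.0).mean(axis=0)  # deviant - standard
      win = (t >= 0.1) & (t <= 0.25)              # typical MMN search window
      latency = t[win][np.argmin(diff[win])]
      print(f"MMN peak latency ~ {latency * 1000:.0f} ms")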

  4. Speech in spinocerebellar ataxia.

    Science.gov (United States)

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.

  5. Logopenic and nonfluent variants of primary progressive aphasia are differentiated by acoustic measures of speech production.

    Directory of Open Access Journals (Sweden)

    Kirrie J Ballard

    Full Text Available Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r² = 0.47), with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word
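
    The two analysis steps, an acoustic measure plus a discriminant function, can be sketched generically. The fragment below computes relative vowel duration from labelled segment durations and fits scikit-learn's LDA on simulated values; the group means and the consonant/vowel segment labels are invented for illustration and are not the study's data.

      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

      def relative_vowel_duration(segments):
          """Vowel time as a proportion of total word duration."""
          total = sum(d for _, d in segments)
          vowel = sum(d for label, d in segments if label == "V")
          return vowel / total

      rng = np.random.default_rng(0)
      controls = rng.normal(0.45, 0.03, size=(20, 1))   # invented group means
      nfvppa = rng.normal(0.60, 0.05, size=(20, 1))
      X = np.vstack([controls, nfvppa])
      y = np.array([0] * 20 + [1] * 20)

      lda = LinearDiscriminantAnalysis().fit(X, y)
      print("discriminant accuracy:", lda.score(X, y))
      print("example word:", relative_vowel_duration(
          [("C", 0.08), ("V", 0.22), ("C", 0.07), ("V", 0.18)]))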

  6. Decoding Speech With Integrated Hybrid Signals Recorded From the Human Ventral Motor Cortex

    Directory of Open Access Journals (Sweden)

    Kenji Ibayashi

    2018-04-01

    Full Text Available Restoration of speech communication for locked-in patients by means of brain computer interfaces (BCIs) is currently an important area of active research. Among the neural signals obtained from intracranial recordings, single/multi-unit activity (SUA/MUA), local field potential (LFP), and electrocorticography (ECoG) are good candidates for an input signal for BCIs. However, the question of which signal or which combination of the three signal modalities is best suited for decoding speech production remains unverified. In order to record SUA, LFP, and ECoG simultaneously from a highly localized area of human ventral sensorimotor cortex (vSMC), we fabricated an electrode, measuring 7 by 13 mm, that contains sparsely arranged microneedle and conventional macro contacts. We determined which signal modality is the most capable of decoding speech production, and tested if the combination of these signals could improve the decoding accuracy of spoken phonemes. Feature vectors were constructed from spike frequency obtained from SUAs and event-related spectral perturbation derived from ECoG and LFP signals, then input to the decoder. The results showed that the decoding accuracy for five spoken vowels was highest when features from multiple signals were combined and optimized for each subject, and reached 59% when averaged across all six subjects. This result suggests that multi-scale signals convey complementary information for speech articulation. The current study demonstrated that simultaneous recording of multi-scale neuronal activities could raise decoding accuracy even though the recording area is limited to a small portion of cortex, which is advantageous for future implementation of speech-assisting BCIs.
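
    The feature-fusion step described above lends itself to a compact sketch: spike-rate and spectral-perturbation features are concatenated and passed to a classifier. The array shapes, the synthetic data, and the choice of linear discriminant analysis are illustrative assumptions, not the authors' exact decoder.

    ```python
    # Sketch of the multi-scale feature-fusion idea: concatenate spike-rate
    # features (SUA) with spectral-perturbation features (LFP/ECoG) and
    # classify five spoken vowels. Shapes and classifier are assumptions.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_trials = 200
    sua_rates = rng.poisson(5.0, size=(n_trials, 12)).astype(float)  # 12 units
    lfp_ersp = rng.normal(size=(n_trials, 16 * 6))    # 16 contacts x 6 bands
    ecog_ersp = rng.normal(size=(n_trials, 8 * 6))    # 8 contacts x 6 bands
    labels = rng.integers(0, 5, size=n_trials)        # five vowels

    features = np.hstack([sua_rates, lfp_ersp, ecog_ersp])  # fuse all scales
    clf = LinearDiscriminantAnalysis()
    print(cross_val_score(clf, features, labels, cv=5).mean())
    ```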

  7. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    Science.gov (United States)

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  8. A causal test of the motor theory of speech perception: a case of impaired speech production and spared speech perception.

    Science.gov (United States)

    Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z

    2015-01-01

    The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that form a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.

  9. Improvement of electrolaryngeal speech quality using a supraglottal voice source with compensation of vocal tract characteristics.

    Science.gov (United States)

    Wu, Liang; Wan, Congying; Wang, Supin; Wan, Mingxi

    2013-07-01

    Electrolarynx (EL) is a medical speech-recovery device designed for patients who have lost their original voice box due to laryngeal cancer. As a substitute for the human larynx, the current commercial EL voice source cannot reconstruct natural EL speech under laryngectomy conditions. To eliminate the abnormal acoustic properties of EL speech, a supraglottal voice source with compensation of vocal tract characteristics was proposed and provided through an experimental EL (SGVS-EL) system. Acoustic analyses of simulated EL speech and reconstructed EL speech produced with different voice sources were performed in a normal subject and a laryngectomee. The results indicated that the supraglottal voice source was successful in improving the acoustic properties of EL speech by enhancing low-frequency energy, correcting the shifted formants to the normal range, and eliminating the visible spectral zeros. Both the normal subject and the laryngectomee also produced more natural vowels using SGVS-EL than commercial EL, even if the vocal tract parameter was substituted and the supraglottal voice source was biased to a certain degree. Therefore, the supraglottal voice source is a feasible and effective approach to improving the acoustic quality of EL speech.

  10. The Relationship between Speech Production and Speech Perception Deficits in Parkinson's Disease

    Science.gov (United States)

    De Keyser, Kim; Santens, Patrick; Bockstael, Annelies; Botteldooren, Dick; Talsma, Durk; De Vos, Stefanie; Van Cauwenberghe, Mieke; Verheugen, Femke; Corthals, Paul; De Letter, Miet

    2016-01-01

    Purpose: This study investigated the possible relationship between hypokinetic speech production and speech intensity perception in patients with Parkinson's disease (PD). Method: Participants included 14 patients with idiopathic PD and 14 matched healthy controls (HCs) with normal hearing and cognition. First, speech production was objectified…

  11. Atypical lateralization of ERP response to native and non-native speech in infants at risk for autism spectrum disorder.

    Science.gov (United States)

    Seery, Anne M; Vogel-Farley, Vanessa; Tager-Flusberg, Helen; Nelson, Charles A

    2013-07-01

    Language impairment is common in autism spectrum disorders (ASD) and is often accompanied by atypical neural lateralization. However, it is unclear when in development language impairment or atypical lateralization first emerges. To address these questions, we recorded event-related potentials (ERPs) to native and non-native speech contrasts longitudinally in infants at risk for ASD (HRA) over the first year of life to determine whether atypical lateralization is present as an endophenotype early in development and whether these infants show delay in a very basic precursor of language acquisition: phonemic perceptual narrowing. The ERP response of the HRA group to a non-native speech contrast revealed a trajectory of perceptual narrowing similar to a group of low-risk controls (LRC), suggesting that phonemic perceptual narrowing does not appear to be delayed in these high-risk infants. In contrast, there were significant group differences in the development of lateralized ERP response to speech: between 6 and 12 months the LRC group displayed a lateralized response to the speech sounds, while the HRA group failed to display this pattern. We suggest the possibility that atypical lateralization to speech may be an ASD endophenotype over the first year of life. Copyright © 2012 Elsevier Ltd. All rights reserved.

  12. Paving the Way for Speech: Voice-Training-Induced Plasticity in Chronic Aphasia and Apraxia of Speech—Three Single Cases

    Directory of Open Access Journals (Sweden)

    Monika Jungblut

    2014-01-01

    Full Text Available Difficulties with temporal coordination or sequencing of speech movements are frequently reported in aphasia patients with concomitant apraxia of speech (AOS). Our major objective was to investigate the effects of specific rhythmic-melodic voice training on brain activation of those patients. Three patients with severe chronic nonfluent aphasia and AOS were included in this study. Before and after therapy, patients underwent the same fMRI procedure as 30 healthy control subjects in our prestudy, which investigated the neural substrates of sung vowel changes in untrained rhythm sequences. A main finding was that post- minus pretreatment imaging data yielded significant perilesional activations in all patients, for example in the left superior temporal gyrus, whereas the reverse subtraction revealed either no significant activation or right hemisphere activation. Likewise, pre- and posttreatment assessments of patients’ vocal rhythm production, language, and speech motor performance yielded significant improvements for all patients. Our results suggest that changes in brain activation due to the applied training might indicate specific processes of reorganization, for example, improved temporal sequencing of sublexical speech components. In this context, a training that focuses on rhythmic singing, with complexity levels that vary in their motor and cognitive demands, seems to help pave the way for speech.

  13. The treatment of apraxia of speech : Speech and music therapy, an innovative joint effort

    NARCIS (Netherlands)

    Hurkmans, Josephus Johannes Stephanus

    2016-01-01

    Apraxia of Speech (AoS) is a neurogenic speech disorder. A wide variety of behavioural methods have been developed to treat AoS. Various therapy programmes use musical elements to improve speech production. A unique therapy programme combining elements of speech therapy and music therapy is called

  14. Practical speech user interface design

    CERN Document Server

    Lewis, James R

    2010-01-01

    Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application

  15. Motor Speech Phenotypes of Frontotemporal Dementia, Primary Progressive Aphasia, and Progressive Apraxia of Speech

    Science.gov (United States)

    Poole, Matthew L.; Brodtmann, Amy; Darby, David; Vogel, Adam P.

    2017-01-01

    Purpose: Our purpose was to create a comprehensive review of speech impairment in frontotemporal dementia (FTD), primary progressive aphasia (PPA), and progressive apraxia of speech in order to identify the most effective measures for diagnosis and monitoring, and to elucidate associations between speech and neuroimaging. Method: Speech and…

  16. Can you hear me now? Musical training shapes functional brain networks for selective auditory attention and hearing speech in noise

    Directory of Open Access Journals (Sweden)

    Dana L Strait

    2011-06-01

    Full Text Available Even in the quietest of rooms, our senses are perpetually inundated by a barrage of sounds, requiring the auditory system to adapt to a variety of listening conditions in order to extract signals of interest (e.g., one speaker’s voice amidst others). Brain networks that promote selective attention are thought to sharpen the neural encoding of a target signal, suppressing competing sounds and enhancing perceptual performance. Here, we ask: does musical training benefit cortical mechanisms that underlie selective attention to speech? To answer this question, we assessed the impact of selective auditory attention on cortical auditory-evoked response variability in musicians and nonmusicians. Outcomes indicate strengthened brain networks for selective auditory attention in musicians: musicians, but not nonmusicians, demonstrated decreased prefrontal response variability with auditory attention. Results are interpreted in the context of previous work from our laboratory documenting perceptual and subcortical advantages in musicians for the hearing and neural encoding of speech in background noise. Musicians’ neural proficiency for selectively engaging and sustaining auditory attention to language indicates a potential benefit of music for auditory training. Given the importance of auditory attention for the development of language-related skills, musical training may aid in the prevention, habilitation and remediation of children with a wide range of attention-based language and learning impairments.

  17. An analysis of the masking of speech by competing speech using self-report data.

    Science.gov (United States)

    Agus, Trevor R; Akeroyd, Michael A; Noble, William; Bhullar, Navjot

    2009-01-01

    Many of the items in the "Speech, Spatial, and Qualities of Hearing" scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.

  18. Application of radial basis neural network for state estimation of ...

    African Journals Online (AJOL)

    An original application of radial basis function (RBF) neural network for power system state estimation is proposed in this paper. The property of massive parallelism of neural networks is employed for this. The application of RBF neural network for state estimation is investigated by testing its applicability on an IEEE 14-bus ...
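
    For readers unfamiliar with RBF networks, a minimal regression sketch follows: Gaussian hidden units with fixed centers and a linear readout fitted by least squares. The sizes, width, and toy target below are assumptions; the paper's state-estimation formulation is not reproduced here.

    ```python
    # Minimal RBF-network regression sketch (not the paper's exact setup):
    # Gaussian hidden units with fixed centers, linear output weights
    # fitted by least squares -- the usual two-stage RBF training.
    import numpy as np

    def rbf_design(X, centers, width):
        """Matrix of Gaussian activations, one column per center."""
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * width ** 2))

    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(300, 4))     # e.g. a measurement vector
    y = np.sin(X).sum(axis=1)                 # stand-in for a state variable
    centers = X[rng.choice(len(X), 20, replace=False)]
    Phi = rbf_design(X, centers, width=0.5)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    print(np.abs(Phi @ w - y).mean())         # training error of the readout
    ```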

  19. Part-of-speech effects on text-to-speech synthesis

    CSIR Research Space (South Africa)

    Schlunz, GI

    2010-11-01

    Full Text Available One of the goals of text-to-speech (TTS) systems is to produce natural-sounding synthesised speech. Towards this end various natural language processing (NLP) tasks are performed to model the prosodic aspects of the TTS voice. One of the fundamental...

  20. Speech processing: from peripheral to hemispheric asymmetry of the auditory system.

    Science.gov (United States)

    Lazard, Diane S; Collette, Jean-Louis; Perrot, Xavier

    2012-01-01

    Language processing from the cochlea to auditory association cortices shows side-dependent specificities with an apparent left hemispheric dominance. The aim of this article was to propose to nonspeech specialists a didactic review of two complementary theories about hemispheric asymmetry in speech processing. Starting from anatomico-physiological and clinical observations of auditory asymmetry and interhemispheric connections, this review then presents behavioral (dichotic listening paradigm) as well as functional (functional magnetic resonance imaging and positron emission tomography) experiments that assessed hemispheric specialization for speech processing. Even though speech at an early phonological level is regarded as being processed bilaterally, a left-hemispheric dominance exists for higher-level processing. This asymmetry may arise from a segregation of the speech signal, broken apart within nonprimary auditory areas in two distinct temporal integration windows--a fast one on the left and a slower one on the right--modeled through the asymmetric sampling in time theory or a spectro-temporal trade-off, with a higher temporal resolution in the left hemisphere and a higher spectral resolution in the right hemisphere, modeled through the spectral/temporal resolution trade-off theory. Both theories deal with the concept that lower-order tuning principles for acoustic signal might drive higher-order organization for speech processing. However, the precise nature, mechanisms, and origin of speech processing asymmetry are still being debated. Finally, an example of hemispheric asymmetry alteration, which has direct clinical implications, is given through the case of auditory aging that mixes peripheral disorder and modifications of central processing. Copyright © 2011 The American Laryngological, Rhinological, and Otological Society, Inc.

  1. Neural decoding of attentional selection in multi-speaker environments without access to clean sources

    Science.gov (United States)

    O'Sullivan, James; Chen, Zhuo; Herrero, Jose; McKhann, Guy M.; Sheth, Sameer A.; Mehta, Ashesh D.; Mesgarani, Nima

    2017-10-01

    Objective. People who suffer from hearing impairments can find it difficult to follow a conversation in a multi-speaker environment. Current hearing aids can suppress background noise; however, there is little that can be done to help a user attend to a single conversation amongst many without knowing which speaker the user is attending to. Cognitively controlled hearing aids that use auditory attention decoding (AAD) methods are the next step in offering help. Translating the successes in AAD research to real-world applications poses a number of challenges, including the lack of access to the clean sound sources in the environment with which to compare with the neural signals. We propose a novel framework that combines single-channel speech separation algorithms with AAD. Approach. We present an end-to-end system that (1) receives a single audio channel containing a mixture of speakers that is heard by a listener along with the listener’s neural signals, (2) automatically separates the individual speakers in the mixture, (3) determines the attended speaker, and (4) amplifies the attended speaker’s voice to assist the listener. Main results. Using invasive electrophysiology recordings, we identified the regions of the auditory cortex that contribute to AAD. Given appropriate electrode locations, our system is able to decode the attention of subjects and amplify the attended speaker using only the mixed audio. Our quality assessment of the modified audio demonstrates a significant improvement in both subjective and objective speech quality measures. Significance. Our novel framework for AAD bridges the gap between the most recent advancements in speech processing technologies and speech prosthesis research and moves us closer to the development of cognitively controlled hearable devices for the hearing impaired.
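
    The attention-decoding step of such a system is often implemented as stimulus reconstruction: a linear decoder maps neural channels to an estimated speech envelope, which is then correlated with the envelope of each separated source. The sketch below illustrates that idea on synthetic data and is far simpler than the authors' end-to-end system.

    ```python
    # Simplified sketch of the attention-decoding step: reconstruct the
    # attended envelope from neural data with a linear decoder, then pick
    # whichever separated source correlates best. Synthetic data only.
    import numpy as np

    def pick_attended(neural, source_envs, decoder):
        """neural: (T, channels); source_envs: list of (T,) envelopes."""
        recon = neural @ decoder                    # reconstructed envelope
        corrs = [np.corrcoef(recon, env)[0, 1] for env in source_envs]
        return int(np.argmax(corrs)), corrs

    rng = np.random.default_rng(2)
    T = 5000
    env_a = np.abs(rng.normal(size=T)).cumsum() % 1.0   # toy envelopes
    env_b = np.abs(rng.normal(size=T)).cumsum() % 1.0
    # 32 neural channels that partly track the attended envelope (env_a)
    neural = np.outer(env_a, rng.normal(size=32)) + 0.5 * rng.normal(size=(T, 32))
    decoder = np.linalg.lstsq(neural, env_a, rcond=None)[0]
    print(pick_attended(neural, [env_a, env_b], decoder))   # -> speaker 0
    ```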

  2. Saltatory Evolution of the Ectodermal Neural Cortex Gene Family at the Vertebrate Origin

    Science.gov (United States)

    Feiner, Nathalie; Murakami, Yasunori; Breithut, Lisa; Mazan, Sylvie; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    The ectodermal neural cortex (ENC) gene family, whose members are implicated in neurogenesis, is part of the kelch repeat superfamily. To date, ENC genes have been identified only in osteichthyans, although other kelch repeat-containing genes are prevalent throughout bilaterians. The lack of elaborate molecular phylogenetic analysis with exhaustive taxon sampling has obscured the possible link of the establishment of this gene family with vertebrate novelties. In this study, we identified ENC homologs in diverse vertebrates by means of database mining and polymerase chain reaction screens. Our analysis revealed that the ENC3 ortholog was lost in the basal eutherian lineage through single-gene deletion and that the triplication between ENC1, -2, and -3 occurred early in vertebrate evolution. Including our original data on the catshark and the zebrafish, our comparison revealed high conservation of the pleiotropic expression pattern of ENC1 and shuffling of expression domains between ENC1, -2, and -3. Compared with many other gene families including developmental key regulators, the ENC gene family is unique in that conventional molecular phylogenetic inference could identify no obvious invertebrate ortholog. This suggests a composite nature of the vertebrate-specific gene repertoire, consisting not only of de novo genes introduced at the vertebrate origin but also of long-standing genes with no apparent invertebrate orthologs. Some of the latter, including the ENC gene family, may be too rapidly evolving to provide sufficient phylogenetic signals marking orthology to their invertebrate counterparts. Such gene families that experienced saltatory evolution likely remain to be explored and might also have contributed to phenotypic evolution of vertebrates. PMID:23843192

  3. 75 FR 26701 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

    Science.gov (United States)

    2010-05-12

    ...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities... proposed compensation rates for Interstate TRS, Speech-to-Speech Services (STS), Captioned Telephone... costs reported in the data submitted to NECA by VRS providers. In this regard, document DA 10-761 also...

  4. Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification

    OpenAIRE

    Hwang, Kyuyeon; Sung, Wonyong

    2015-01-01

    Connectionist temporal classification (CTC) based supervised sequence training of recurrent neural networks (RNNs) has shown great success in many machine learning areas including end-to-end speech and handwritten character recognition. For the CTC training, however, it is required to unroll (or unfold) the RNN by the length of an input sequence. This unrolling requires a lot of memory and hinders a small footprint implementation of online learning or adaptation. Furthermore, the length of tr...
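
    The memory cost described above arises because the CTC loss is defined over the network's entire length-T output, so the RNN must be unrolled across all T steps before the gradient can be computed. A minimal sketch using PyTorch's torch.nn.CTCLoss follows; the model sizes and random data are placeholders.

    ```python
    # Sketch of why CTC training requires full unrolling: the loss is
    # computed over the entire length-T output sequence of the RNN.
    import torch
    import torch.nn as nn

    T, N, C = 50, 4, 28          # timesteps, batch, classes (27 labels + blank)
    rnn = nn.LSTM(input_size=13, hidden_size=64)
    head = nn.Linear(64, C)
    ctc = nn.CTCLoss(blank=0)

    x = torch.randn(T, N, 13)                    # e.g. MFCC frames
    out, _ = rnn(x)                              # unrolled over all T steps
    log_probs = head(out).log_softmax(dim=-1)    # (T, N, C)

    targets = torch.randint(1, C, (N, 10))       # label sequences (no blanks)
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.full((N,), 10, dtype=torch.long)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    loss.backward()                              # gradients flow through all T
    print(float(loss))
    ```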

  5. Predicting automatic speech recognition performance over communication channels from instrumental speech quality and intelligibility scores

    NARCIS (Netherlands)

    Gallardo, L.F.; Möller, S.; Beerends, J.

    2017-01-01

    The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility

  6. [Non-speech oral motor treatment efficacy for children with developmental speech sound disorders].

    Science.gov (United States)

    Ygual-Fernandez, A; Cervera-Merida, J F

    2016-01-01

    In the treatment of speech disorders by means of speech therapy, two antagonistic methodological approaches are applied: non-verbal ones, based on oral motor exercises (OME), and verbal ones, which are based on speech processing tasks with syllables, phonemes and words. In Spain, OME programmes are called 'programas de praxias', and are widely used and valued by speech therapists. Our aim was to review the studies conducted on the effectiveness of OME-based treatments applied to children with speech disorders and the theoretical arguments that could justify, or not, their usefulness. Over the last few decades, evidence has been gathered about the lack of efficacy of this approach for treating developmental speech disorders and pronunciation problems in populations without any neurological alteration of motor functioning. The American Speech-Language-Hearing Association has advised against its use, taking into account the principles of evidence-based practice. The knowledge gathered to date on motor control shows that the pattern of mobility and its corresponding organisation in the brain are different in speech and in other non-verbal functions linked to nutrition and breathing. Neither the studies on their effectiveness nor the arguments based on motor control studies recommend the use of OME-based programmes for the treatment of pronunciation problems in children with developmental language disorders.

  7. Evaluation of Short-Term Cepstral Based Features for Detection of Parkinson’s Disease Severity Levels through Speech signals

    Science.gov (United States)

    Oung, Qi Wei; Nisha Basah, Shafriza; Muthusamy, Hariharan; Vijean, Vikneswaran; Lee, Hoileong

    2018-03-01

    Parkinson’s disease (PD) is a progressive neurodegenerative disorder of the motor system caused by the death of dopamine-generating cells in the substantia nigra, a region of the human midbrain. PD normally affects people over 60 years of age and currently affects a large part of the worldwide population. Many recent studies have investigated the connection between PD and speech disorders, revealing that speech signals may be a suitable biomarker for distinguishing people with Parkinson’s (PWP) from healthy subjects. Early diagnosis of PD through speech signals can therefore be considered for this aim. In this research, speech data are acquired and used as the biomarker for differentiating PD severity levels (mild and moderate) from healthy subjects. The feature extraction algorithms applied are Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), and Weighted Linear Prediction Cepstral Coefficients (WLPCC). For classification, two types of classifiers are used: k-Nearest Neighbour (KNN) and Probabilistic Neural Network (PNN). The experimental results demonstrated that the PNN and KNN classifiers achieve the best average classification performance of 92.63% and 88.56%, respectively, under 10-fold cross-validation. The suggested techniques thus show promise as tools for PD detection.
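
    A minimal sketch of the MFCC-plus-KNN arm of this pipeline, assuming librosa for feature extraction and scikit-learn for 10-fold cross-validation; the file names and labels are placeholders, and PNN is omitted because scikit-learn ships no stock implementation.

    ```python
    # Sketch: per-recording MFCC statistics classified with k-NN under
    # 10-fold cross-validation. File paths and labels are placeholders.
    import numpy as np
    import librosa
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    def mfcc_stats(path, sr=16000, n_mfcc=13):
        """Summarize a recording as per-coefficient MFCC means and stds."""
        y, sr = librosa.load(path, sr=sr)
        m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
        return np.hstack([m.mean(axis=1), m.std(axis=1)])

    # Placeholders (0 = healthy, 1 = mild, 2 = moderate); a real run needs
    # many recordings per class for 10-fold CV to be valid.
    files = ["subj01.wav", "subj02.wav"]   # one path per recording
    labels = [0, 1]                        # one label per recording

    X = np.vstack([mfcc_stats(f) for f in files])
    y = np.array(labels)
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=10)
    print(scores.mean())
    ```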

  8. 75 FR 54040 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

    Science.gov (United States)

    2010-09-03

    ...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...; speech-to-speech (STS); pay-per-call (900) calls; types of calls; and equal access to interexchange... of a report, due April 16, 2011, addressing whether it is necessary for the waivers to remain in...

  9. Comparison of the HiFocus Mid-Scala and HiFocus 1J Electrode Array: Angular Insertion Depths and Speech Perception Outcomes.

    Science.gov (United States)

    van der Jagt, M Annerie; Briaire, Jeroen J; Verbist, Berit M; Frijns, Johan H M

    2016-01-01

    The HiFocus Mid-Scala (MS) electrode array has recently been introduced onto the market. This precurved design, with a targeted mid-scalar intracochlear position, aims at an atraumatic insertion and an optimal distance for neural stimulation. In this study, we prospectively examined the angular insertion depth achieved and speech perception outcomes resulting from the HiFocus MS electrode array for 6 months after implantation, and retrospectively compared these with the HiFocus 1J lateral wall electrode array. The mean angular insertion depth within the MS population (n = 96) was 470°. This was 50° shallower but more consistent than the 1J electrode array (n = 110). Audiological evaluation within a subgroup, including only postlingual, unilaterally implanted, adult cochlear implant recipients who were matched on preoperative speech perception scores and the duration of deafness (MS = 32, 1J = 32), showed no difference in speech perception outcomes between the MS and 1J groups. Furthermore, speech perception outcome was not affected by the angular insertion depth or frequency mismatch. © 2016 S. Karger AG, Basel.

  10. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    Science.gov (United States)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exist in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface improves both crewmember usability and operational efficiency: it offers a fast rate of data/text entry, a small overall size, and low weight. In addition, this design frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when facing computational resource constraints.
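
    The first stage listed above, multichannel beamforming, can be illustrated with a plain delay-and-sum beamformer; the microphone geometry, steering delays, and noise levels below are assumptions for illustration only.

    ```python
    # Sketch of delay-and-sum beamforming: advance each microphone signal
    # so the target arrivals align, then average to suppress diffuse noise.
    import numpy as np

    def delay_and_sum(channels, delays_samples):
        """channels: (n_mics, T) signals; delays: integer samples per mic."""
        out = np.zeros(channels.shape[1])
        for sig, d in zip(channels, delays_samples):
            out += np.roll(sig, -d)        # advance each mic to align target
        return out / len(channels)

    rng = np.random.default_rng(3)
    fs = 16000
    clean = np.sin(2 * np.pi * 220 * np.arange(8000) / fs)
    mics = np.stack([np.roll(clean, d) + 0.3 * rng.normal(size=8000)
                     for d in (0, 3, 6, 9)])       # 4 mics, staggered arrival
    enhanced = delay_and_sum(mics, [0, 3, 6, 9])
    print(np.corrcoef(enhanced, clean)[0, 1])      # closer to 1 than any mic
    ```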

  11. Environmental Contamination of Normal Speech.

    Science.gov (United States)

    Harley, Trevor A.

    1990-01-01

    Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…

  12. Structured Semantic Knowledge Can Emerge Automatically from Predicting Word Sequences in Child-Directed Speech

    Science.gov (United States)

    Huebner, Philip A.; Willits, Jon A.

    2018-01-01

    Previous research has suggested that distributional learning mechanisms may contribute to the acquisition of semantic knowledge. However, distributional learning mechanisms, statistical learning, and contemporary “deep learning” approaches have been criticized for being incapable of learning the kind of abstract and structured knowledge that many think is required for acquisition of semantic knowledge. In this paper, we show that recurrent neural networks, trained on noisy naturalistic speech to children, do in fact learn what appears to be abstract and structured knowledge. We trained two types of recurrent neural networks (Simple Recurrent Network, and Long Short-Term Memory) to predict word sequences in a 5-million-word corpus of speech directed to children ages 0–3 years old, and assessed what semantic knowledge they acquired. We found that learned internal representations are encoding various abstract grammatical and semantic features that are useful for predicting word sequences. Assessing the organization of semantic knowledge in terms of the similarity structure, we found evidence of emergent categorical and hierarchical structure in both models. We found that the Long Short-term Memory (LSTM) and SRN are both learning very similar kinds of representations, but the LSTM achieved higher levels of performance on a quantitative evaluation. We also trained a non-recurrent neural network, Skip-gram, on the same input to compare our results to the state-of-the-art in machine learning. We found that Skip-gram achieves relatively similar performance to the LSTM, but is representing words more in terms of thematic compared to taxonomic relations, and we provide reasons why this might be the case. Our findings show that a learning system that derives abstract, distributed representations for the purpose of predicting sequential dependencies in naturalistic language may provide insight into emergence of many properties of the developing semantic system.
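
    A minimal sketch of the modeling setup: a recurrent network trained to predict the next word in child-directed utterances. Vocabulary, layer sizes, and the toy sequence are placeholders, not the paper's corpus or hyperparameters.

    ```python
    # Sketch of next-word prediction with an LSTM over child-directed speech.
    import torch
    import torch.nn as nn

    vocab = ["<pad>", "look", "at", "the", "doggy", "ball"]  # toy vocabulary
    stoi = {w: i for i, w in enumerate(vocab)}

    class NextWordLSTM(nn.Module):
        def __init__(self, v, emb=32, hid=64):
            super().__init__()
            self.emb = nn.Embedding(v, emb)
            self.lstm = nn.LSTM(emb, hid, batch_first=True)
            self.out = nn.Linear(hid, v)
        def forward(self, x):
            h, _ = self.lstm(self.emb(x))
            return self.out(h)            # logits for the next word per step

    model = NextWordLSTM(len(vocab))
    seq = torch.tensor([[stoi["look"], stoi["at"], stoi["the"], stoi["doggy"]]])
    logits = model(seq[:, :-1])           # predict each following word
    loss = nn.functional.cross_entropy(logits.reshape(-1, len(vocab)),
                                       seq[:, 1:].reshape(-1))
    loss.backward()
    print(float(loss))
    ```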

  14. Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

    Science.gov (United States)

    Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

    2018-05-01

    Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether listeners can exploit the emotional conditioning of a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) to enhance target-speech recognition under speech-on-speech masking conditions. In this study, we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, electrodermal (skin conductance) responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening effort when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.

  15. Electrophysiological and Kinematic Correlates of Communicative Intent in the Planning and Production of Pointing Gestures and Speech.

    Science.gov (United States)

    Peeters, David; Chu, Mingyuan; Holler, Judith; Hagoort, Peter; Özyürek, Aslı

    2015-12-01

    In everyday human communication, we often express our communicative intentions by manually pointing out referents in the material world around us to an addressee, often in tight synchronization with referential speech. This study investigated whether and how the kinematic form of index finger pointing gestures is shaped by the gesturer's communicative intentions and how this is modulated by the presence of concurrently produced speech. Furthermore, we explored the neural mechanisms underpinning the planning of communicative pointing gestures and speech. Two experiments were carried out in which participants pointed at referents for an addressee while the informativeness of their gestures and speech was varied. Kinematic and electrophysiological data were recorded online. It was found that participants prolonged the duration of the stroke and poststroke hold phase of their gesture to be more communicative, in particular when the gesture was carrying the main informational burden in their multimodal utterance. Frontal and P300 effects in the ERPs suggested the importance of intentional and modality-independent attentional mechanisms during the planning phase of informative pointing gestures. These findings contribute to a better understanding of the complex interplay between action, attention, intention, and language in the production of pointing gestures, a communicative act core to human interaction.

  16. The role of phase synchronisation between low frequency amplitude modulations in child phonology and morphology speech tasks.

    Science.gov (United States)

    Flanagan, Sheila; Goswami, Usha

    2018-03-01

    Recent models of the neural encoding of speech suggest a core role for amplitude modulation (AM) structure, particularly regarding AM phase alignment. Accordingly, speech tasks that measure linguistic development in children may exhibit systematic properties regarding AM structure. Here, the acoustic structure of spoken items in child phonological and morphological tasks, phoneme deletion and plural elicitation, was investigated. The phase synchronisation index (PSI), reflecting the degree of phase alignment between pairs of AMs, was computed for 3 AM bands (delta, theta, beta/low gamma; 0.9-2.5 Hz, 2.5-12 Hz, 12-40 Hz, respectively), for five spectral bands covering 100-7250 Hz. For phoneme deletion, data from 94 child participants with and without dyslexia were used to relate AM structure to behavioural performance. Results revealed that a significant change in magnitude of the phase synchronisation index (ΔPSI) of slower AMs (delta-theta) systematically accompanied both phoneme deletion and plural elicitation. Further, children with dyslexia made more linguistic errors as the delta-theta ΔPSI increased. Accordingly, ΔPSI between slower temporal modulations in the speech signal systematically distinguished test items from accurate responses and predicted task performance. This may suggest that sensitivity to slower AM information in speech is a core aspect of phonological and morphological development.
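
    A sketch of how an n:m phase synchronisation index between two AM bands can be computed with a Hilbert transform; the band edges follow the record, but the filter order, the n:m ratio, and the toy envelope are illustrative assumptions.

    ```python
    # Sketch: n:m phase synchronisation index between two amplitude
    # envelopes (e.g. delta- and theta-rate AMs).
    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt

    def band_am(x, fs, lo, hi):
        """Bandpass the envelope signal to one AM band."""
        b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, x)

    def psi(am_slow, am_fast, n=1, m=1):
        """|<exp(i(n*phi_slow - m*phi_fast))>|, in [0, 1]."""
        p1 = np.angle(hilbert(am_slow))
        p2 = np.angle(hilbert(am_fast))
        return np.abs(np.exp(1j * (n * p1 - m * p2)).mean())

    fs = 100                                   # envelope sample rate, Hz
    t = np.arange(0, 10, 1 / fs)
    env = np.cos(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 4 * t + 0.3)
    delta = band_am(env, fs, 0.9, 2.5)         # bands as defined in the record
    theta = band_am(env, fs, 2.5, 12)
    print(psi(delta, theta, n=2, m=1))         # near 1 for phase-locked AMs
    ```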

  17. Embedding recurrent neural networks into predator-prey models.

    Science.gov (United States)

    Moreau, Yves; Louiès, Stephane; Vandewalle, Joos; Brenig, Leon

    1999-03-01

    We study changes of coordinates that allow the embedding of ordinary differential equations describing continuous-time recurrent neural networks into differential equations describing predator-prey models--also called Lotka-Volterra systems. We transform the equations for the neural network first into quasi-monomial form (Brenig, L. (1988). Complete factorization and analytic solutions of generalized Lotka-Volterra equations. Physics Letters A, 133(7-8), 378-382), where we express the vector field of the dynamical system as a linear combination of products of powers of the variables. In practice, this transformation is possible only if the activation function is the hyperbolic tangent or the logistic sigmoid. From this quasi-monomial form, we can directly transform the system further into Lotka-Volterra equations. The resulting Lotka-Volterra system is of higher dimension than the original system, but the behavior of its first variables is equivalent to the behavior of the original neural network. We expect that this transformation will permit the application of existing techniques for the analysis of Lotka-Volterra systems to recurrent neural networks. Furthermore, our results show that Lotka-Volterra systems are universal approximators of dynamical systems, just as are continuous-time neural networks.
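
    In outline (notation assumed here, following the quasi-monomial literature, not copied from the paper), the transformation chain runs:

    ```latex
    % Quasi-monomial form: the vector field is a linear combination of
    % products of powers of the variables,
    \[
      \dot{x}_i \;=\; x_i \sum_{j=1}^{m} A_{ij} \prod_{k=1}^{n} x_k^{B_{jk}},
      \qquad i = 1,\dots,n .
    \]
    % Introducing one new variable per distinct monomial,
    % $u_j = \prod_k x_k^{B_{jk}}$, yields a higher-dimensional
    % Lotka--Volterra system in the $u_j$ alone:
    \[
      \dot{u}_j \;=\; u_j \sum_{l=1}^{m} M_{jl}\, u_l ,
      \qquad M = B A .
    \]
    ```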

  18. Multilevel Analysis in Analyzing Speech Data

    Science.gov (United States)

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  19. Representation of neural networks as Lotka-Volterra systems

    International Nuclear Information System (INIS)

    Moreau, Yves; Vandewalle, Joos; Louies, Stephane; Brenig, Leon

    1999-01-01

    We study changes of coordinates that allow the representation of the ordinary differential equations describing continuous-time recurrent neural networks as differential equations describing predator-prey models--also called Lotka-Volterra systems. We transform the equations for the neural network first into quasi-monomial form, where we express the vector field of the dynamical system as a linear combination of products of powers of the variables. In practice, this transformation is possible only if the activation function is the hyperbolic tangent or the logistic sigmoid. From this quasi-monomial form, we can directly transform the system further into Lotka-Volterra equations. The resulting Lotka-Volterra system is of higher dimension than the original system, but the behavior of its first variables is equivalent to the behavior of the original neural network.

  20. Perceived Liveliness and Speech Comprehensibility in Aphasia: The Effects of Direct Speech in Auditory Narratives

    Science.gov (United States)

    Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike

    2014-01-01

    Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in "healthy" communication direct speech constructions contribute to the liveliness, and indirectly to the comprehensibility, of speech.…

  1. Separate neural systems support representations for actions and objects during narrative speech in post-stroke aphasia

    Directory of Open Access Journals (Sweden)

    Ezequiel Gleichgerrcht

    2016-01-01

    Conclusions: The finding that the two major grammatical classes in human speech rely on two dissociable networks has both important theoretical implications for the neurobiology of language and clinical implications for the assessment and potential rehabilitation and treatment of patients with chronic aphasia due to stroke.

  2. Speech Enhancement by MAP Spectral Amplitude Estimation Using a Super-Gaussian Speech Model

    Directory of Open Access Journals (Sweden)

    Lotter Thomas

    2005-01-01

    Full Text Available This contribution presents two spectral amplitude estimators for acoustical background noise suppression based on maximum a posteriori estimation and super-Gaussian statistical modelling of the speech DFT amplitudes. The probability density function of the speech spectral amplitude is modelled with a simple parametric function, which allows a high approximation accuracy for Laplace- or Gamma-distributed real and imaginary parts of the speech DFT coefficients. Also, the statistical model can be adapted to optimally fit the distribution of the speech spectral amplitudes for a specific noise reduction system. Based on the super-Gaussian statistical model, computationally efficient maximum a posteriori speech estimators are derived, which outperform the commonly applied Ephraim-Malah algorithm.
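
    The general shape of the estimator family described here, with the prior written in a commonly used parametric super-Gaussian form (an assumption; the record's exact parameterization is not reproduced):

    ```latex
    % MAP estimate of the clean spectral amplitude in DFT bin k:
    \[
      \hat{A}_k \;=\; \arg\max_{A_k}\; p(Y_k \mid A_k)\, p(A_k),
    \]
    % with $Y_k$ the noisy DFT coefficient, $p(Y_k \mid A_k)$ a Gaussian
    % noise likelihood, and $p(A_k)$ a parametric super-Gaussian amplitude
    % prior whose shape can be tuned toward Laplace- or Gamma-distributed
    % real/imaginary parts, e.g.
    \[
      p(A_k) \;\propto\; A_k^{\nu} \exp\!\bigl(-\mu\, A_k / \sigma_{S,k}\bigr).
    \]
    ```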

  3. Exploring the role of brain oscillations in speech perception in noise: Intelligibility of isochronously retimed speech

    Directory of Open Access Journals (Sweden)

    Vincent Aubanel

    2016-08-01

    Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
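
    One of the two anchor-point types, amplitude envelope peaks, can be extracted roughly as follows; the low-pass cutoff and minimum peak spacing are illustrative assumptions.

    ```python
    # Sketch: find amplitude-envelope peaks as retiming anchor points,
    # using a smoothed Hilbert envelope.
    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt, find_peaks

    def envelope_peaks(x, fs, lp_hz=10.0, min_gap_s=0.1):
        env = np.abs(hilbert(x))                       # instantaneous amplitude
        b, a = butter(2, lp_hz / (fs / 2), btype="low")
        env = filtfilt(b, a, env)                      # smooth to syllable rate
        peaks, _ = find_peaks(env, distance=int(min_gap_s * fs))
        return peaks / fs                              # anchor times in seconds

    fs = 16000
    t = np.arange(0, 1.0, 1 / fs)
    toy = np.sin(2 * np.pi * 150 * t) * (1 + np.sin(2 * np.pi * 4 * t))  # ~4 Hz AM
    print(envelope_peaks(toy, fs))    # roughly isochronous anchors near 4 Hz
    ```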

  4. Early and parallel processing of pragmatic and semantic information in speech acts: neurophysiological evidence

    Directory of Open Access Journals (Sweden)

    Natalia eEgorova

    2013-03-01

    Full Text Available Although language is a tool for communication, most research in the neuroscience of language has focused on studying words and sentences, while little is known about the brain mechanisms of speech acts, or communicative functions, for which words and sentences are used as tools. Here the neural processing of two types of speech acts, Naming and Requesting, was addressed using the time-resolved event-related potential (ERP) technique. The brain responses for Naming and Request diverged as early as ~120 ms after the onset of the critical words, at the same time as, or even before, the earliest brain manifestations of semantic word properties could be detected. Request-evoked potentials were generally larger in amplitude than those for Naming. The use of identical words in closely matched settings for both speech acts rules out explanation of the difference in terms of phonological, lexical, semantic properties or word expectancy. The cortical sources underlying the ERP enhancement for Requests were found in the fronto-central cortex, consistent with the activation of action knowledge, as well as in right temporo-parietal junction, possibly reflecting additional implications of speech acts for social interaction and theory of mind. These results provide the first evidence for surprisingly early access to pragmatic and social interactive knowledge, which possibly occurs in parallel with other types of linguistic processing, and thus supports the near-simultaneous access to different subtypes of psycholinguistic information.

  5. The development of multisensory speech perception continues into the late childhood years.

    Science.gov (United States)

    Ross, Lars A; Molholm, Sophie; Blanco, Daniella; Gomez-Ramirez, Manuel; Saint-Amour, Dave; Foxe, John J

    2011-06-01

    Observing a speaker's articulations substantially improves the intelligibility of spoken speech, especially under noisy listening conditions. This multisensory integration of speech inputs is crucial to effective communication. Appropriate development of this ability has major implications for children in classroom and social settings, and deficits in it have been linked to a number of neurodevelopmental disorders, especially autism. It is clear from structural imaging studies that there is a prolonged maturational course within regions of the perisylvian cortex that persists into late childhood, and these regions have been firmly established as being crucial to speech and language functions. Given this protracted maturational timeframe, we reasoned that multisensory speech processing might well show a similarly protracted developmental course. Previous work in adults has shown that audiovisual enhancement in word recognition is most apparent within a restricted range of signal-to-noise ratios (SNRs). Here, we investigated when these properties emerge during childhood by testing multisensory speech recognition abilities in typically developing children aged between 5 and 14 years, and comparing them with those of adults. By parametrically varying SNRs, we found that children benefited significantly less from observing visual articulations, displaying considerably less audiovisual enhancement. The findings suggest that improvement in the ability to recognize speech-in-noise and in audiovisual integration during speech perception continues quite late into the childhood years. The implication is that a considerable amount of multisensory learning remains to be achieved during the later schooling years, and that explicit efforts to accommodate this learning may well be warranted. European Journal of Neuroscience © 2011 Federation of European Neuroscience Societies and Blackwell Publishing Ltd. No claim to original US government works.

  6. Ear, Hearing and Speech

    DEFF Research Database (Denmark)

    Poulsen, Torben

    2000-01-01

    An introduction is given to the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001).

  7. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech and music

    Directory of Open Access Journals (Sweden)

    Hwee Ling eLee

    2014-08-01

    Full Text Available This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech.

  8. Effect of gap detection threshold on consistency of speech in children with speech sound disorder.

    Science.gov (United States)

    Sayyahi, Fateme; Soleymani, Zahra; Akbari, Mohammad; Bijankhan, Mahmood; Dolatshahi, Behrooz

    2017-02-01

    The present study examined the relationship between gap detection threshold and speech error consistency in children with speech sound disorder. The participants were children five to six years of age who were categorized into three groups: typical speech, consistent speech disorder (CSD) and inconsistent speech disorder (ISD). The phonetic gap detection threshold test was used for this study; it is a valid test comprising six syllables with inter-stimulus intervals between 20 and 300 ms. The participants were asked to listen to the recorded stimuli three times and indicate whether they heard one or two sounds. There was no significant difference between the typical and CSD groups (p=0.55), but there were significant differences in performance between the ISD and CSD groups and between the ISD and typical groups (p=0.00). The ISD group discriminated between speech sounds at a higher threshold. Children with inconsistent speech errors could not distinguish speech sounds during time-limited phonetic discrimination. It is suggested that inconsistency in speech is a representation of inconsistency in auditory perception, which is caused by a high gap detection threshold. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Spasmodic dysphonia: a laryngeal control disorder specific to speech.

    Science.gov (United States)

    Ludlow, Christy L

    2011-01-19

    Spasmodic dysphonia (SD) is a rare neurological disorder that emerges in middle age, is usually sporadic, and affects intrinsic laryngeal muscle control only during speech. Spasmodic bursts in particular laryngeal muscles disrupt voluntary control during vowel sounds in adductor SD and interfere with voice onset after voiceless consonants in abductor SD. Little is known about its origins; it is classified as a focal dystonia secondary to an unknown neurobiological mechanism that produces a chronic abnormality of laryngeal motor neuron regulation during speech. It develops primarily in females and does not interfere with breathing, crying, laughter, and shouting. Recent postmortem studies have implicated the accumulation of clusters in the parenchyma and perivascular regions with inflammatory changes in the brainstem in one to two cases. A few cases with single mutations in THAP1, a gene involved in transcription regulation, suggest that a weak genetic predisposition may contribute to mechanisms causing a nonprogressive abnormality in laryngeal motor neuron control for speech but not for vocal emotional expression. Research is needed to address the basic cellular and proteomic mechanisms that produce this disorder to provide intervention that could target the pathogenesis of the disorder rather than only providing temporary symptom relief.

  10. Neural representation in the auditory midbrain of the envelope of vocalizations based on a peripheral ear model

    Directory of Open Access Journals (Sweden)

    Thilo eRode

    2013-10-01

    Full Text Available The auditory midbrain implant (AMI) consists of a single shank array (20 sites) for stimulation along the tonotopic axis of the central nucleus of the inferior colliculus (ICC) and has been safely implanted in deaf patients who cannot benefit from a cochlear implant (CI). The AMI improves lip-reading abilities and environmental awareness in the implanted patients. However, the AMI cannot achieve the high levels of speech perception possible with the CI. It appears the AMI can transmit sufficient spectral cues but with limited temporal cues required for speech understanding. Currently, the AMI uses a CI-based strategy, which was originally designed to stimulate each frequency region along the cochlea with amplitude-modulated pulse trains matching the envelope of the bandpass-filtered sound components. However, it is unclear if this type of stimulation with only a single site within each frequency lamina of the ICC can elicit sufficient temporal cues for speech perception. At least speech understanding in quiet is still possible with envelope cues as low as 50 Hz. Therefore, we investigated how ICC neurons follow the bandpass-filtered envelope structure of natural stimuli in ketamine-anesthetized guinea pigs. We identified a subset of ICC neurons that could closely follow the envelope structure (up to ~100 Hz) of a diverse set of species-specific calls, which was revealed by using a peripheral ear model to estimate the true bandpass-filtered envelopes observed by the brain. Although previous studies have suggested a complex neural transformation from the auditory nerve to the ICC, our data suggest that the brain maintains a robust temporal code in a subset of ICC neurons matching the envelope structure of natural stimuli. Clinically, these findings suggest that a CI-based strategy may still be effective for the AMI if the appropriate neurons are entrained to the envelope of the acoustic stimulus and can transmit sufficient temporal cues to higher

  11. Speech Perception as a Multimodal Phenomenon

    OpenAIRE

    Rosenblum, Lawrence D.

    2008-01-01

    Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself…

  12. Electrophysiological and hemodynamic mismatch responses in rats listening to human speech syllables.

    Directory of Open Access Journals (Sweden)

    Mahdi Mahmoudzadeh

    Full Text Available Speech is a complex auditory stimulus that is processed on several time scales. Whereas consonant discrimination requires resolving rapid acoustic events, voice perception relies on slower cues. Humans, from preterm ages onward, are particularly efficient at encoding temporal cues. To compare the capacities of preterm infants with those observed in other mammals, we tested anesthetized adult rats using exactly the same paradigm as that used in preterm neonates. We simultaneously recorded neural responses (using ECoG) and hemodynamic responses (using fNIRS) to series of human speech syllables and investigated the brain response to a change of consonant (ba vs. ga) and to a change of voice (male vs. female). Both methods revealed concordant results, although ECoG measures were more sensitive than fNIRS. Responses to syllables were bilateral, but with marked right-hemispheric lateralization. Responses to voice changes were observed with both methods, while only ECoG was sensitive to consonant changes. These results suggest that rats processed the speech envelope more effectively than fine temporal cues, in contrast with human preterm neonates, in whom the opposite effects were observed. Cross-species comparisons constitute a very valuable tool for defining the singularities of the human brain and the species-specific biases that may help human infants learn their native language.
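
    The change-detection measure in oddball paradigms such as this is conventionally a difference wave: the averaged response to rare deviant stimuli (e.g., a consonant or voice change) minus the averaged response to the repeated standard. A minimal sketch of that computation, with a hypothetical trials-by-samples array layout (the study's actual preprocessing is not described here):

        import numpy as np

        def mismatch_response(epochs, labels):
            """Difference wave: mean deviant response minus mean standard response.

            epochs: array of shape (n_trials, n_samples), one row per epoch;
            labels: sequence of 'standard'/'deviant' strings, one per trial.
            """
            labels = np.asarray(labels)
            standard = epochs[labels == "standard"].mean(axis=0)
            deviant = epochs[labels == "deviant"].mean(axis=0)
            return deviant - standard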

  13. Speech-specific tuning of neurons in human superior temporal gyrus.

    Science.gov (United States)

    Chan, Alexander M; Dykstra, Andrew R; Jayaram, Vinay; Leonard, Matthew K; Travis, Katherine E; Gygi, Brian; Baker, Janet M; Eskandar, Emad; Hochberg, Leigh R; Halgren, Eric; Cash, Sydney S

    2014-10-01

    How the brain extracts words from auditory signals is an unanswered question. We recorded approximately 150 single and multi-units from the left anterior superior temporal gyrus of a patient during multiple auditory experiments. Against low background activity, 45% of units robustly fired to particular spoken words with little or no response to pure tones, noise-vocoded speech, or environmental sounds. Many units were tuned to complex but specific sets of phonemes, which were influenced by local context but invariant to speaker, and suppressed during self-produced speech. The firing of several units to specific visual letters was correlated with their response to the corresponding auditory phonemes, providing the first direct neural evidence for phonological recoding during reading. Maximal decoding of individual phoneme and word identities was attained using firing rates from approximately 5 neurons within 200 ms after word onset. Thus, neurons in human superior temporal gyrus use sparse, spatially organized population encoding of complex acoustic-phonetic features to help recognize auditory and visual words. © The Author 2013. Published by Oxford University Press. All rights reserved.
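
    Decoding word identity from the firing rates of a handful of neurons, as reported above, amounts to classifying a short trial-by-neuron rate vector. The abstract does not specify the decoder, so the sketch below uses a generic linear discriminant on synthetic spike counts purely to illustrate the pipeline; rates, words, and all sizes are hypothetical placeholders.

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score

        # rates: (n_trials, n_neurons) spike counts in the first 200 ms after
        # word onset; words: (n_trials,) word labels. Random placeholder data
        # stands in for recordings of the kind described above.
        rng = np.random.default_rng(0)
        rates = rng.poisson(5.0, size=(200, 5)).astype(float)
        words = rng.integers(0, 10, size=200)

        decoder = LinearDiscriminantAnalysis()
        acc = cross_val_score(decoder, rates, words, cv=5).mean()
        # On real data, above-chance accuracy indicates word information in the
        # rates; the random placeholder data here will sit near chance (~0.1).
        print(f"cross-validated word-decoding accuracy: {acc:.2f}")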

  14. Poor Speech Perception Is Not a Core Deficit of Childhood Apraxia of Speech: Preliminary Findings

    Science.gov (United States)

    Zuk, Jennifer; Iuzzini-Seigel, Jenya; Cabbage, Kathryn; Green, Jordan R.; Hogan, Tiffany P.

    2018-01-01

    Purpose: Childhood apraxia of speech (CAS) is hypothesized to arise from deficits in speech motor planning and programming, but the influence of abnormal speech perception in CAS on these processes is debated. This study examined speech perception abilities among children with CAS with and without language impairment compared to those with…

  15. Weaving and neural complexity in symmetric quantum states

    Science.gov (United States)

    Susa, Cristian E.; Girolami, Davide

    2018-04-01

    We study the behaviour of two different measures of the complexity of multipartite correlation patterns, weaving and neural complexity, for symmetric quantum states. Weaving is the weighted sum of genuine multipartite correlations of any order, where the weights are proportional to the correlation order. The neural complexity, originally introduced to characterize correlation patterns in classical neural networks, is here extended to the quantum scenario. We derive closed formulas for the two quantities for GHZ states mixed with white noise.
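
    Schematically, the weaving measure described above is a weighted sum over correlation orders. Writing C_k for the genuine k-partite correlations of an N-party state ρ, with weights growing in proportion to the order k (the precise correlation measure and normalization follow the cited paper and are not reproduced here):

        % Weaving as a weighted sum of genuine k-partite correlations
        \[
          W(\rho) \;=\; \sum_{k=2}^{N} \omega_k \, C_k(\rho),
          \qquad \omega_k \propto k .
        \]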

  16. Principles of speech coding

    CERN Document Server

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication, including voice, will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. The book outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networks. Offering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the…

  17. A chimpanzee recognizes synthetic speech with significantly reduced acoustic cues to phonetic content.

    Science.gov (United States)

    Heimbauer, Lisa A; Beran, Michael J; Owren, Michael J

    2011-07-26

    A long-standing debate concerns whether humans are specialized for speech perception, which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words, asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuographic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users. Experiment 2 tested "impossibly unspeechlike" sine-wave (SW) synthesis, which reduces speech to just three moving tones. Although receiving only intermittent and noncontingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate but improved in experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human. Copyright © 2011 Elsevier Ltd. All rights reserved.
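
    Sine-wave (SW) synthesis, as used in experiment 2, reduces an utterance to three time-varying sinusoids that track the center frequencies and amplitudes of the first three formants. A minimal sketch, assuming the formant tracks have already been estimated (e.g., by LPC analysis); the function name and array layout are illustrative:

        import numpy as np

        def sinewave_speech(formant_freqs, formant_amps, fs):
            """Synthesize sine-wave speech: one sinusoid per formant track.

            formant_freqs, formant_amps: arrays of shape (3, n_samples) holding
            per-sample formant center frequencies (Hz) and amplitudes; the
            tracks are assumed to be estimated beforehand.
            """
            out = np.zeros(formant_freqs.shape[1])
            for f, a in zip(formant_freqs, formant_amps):
                # Integrate the instantaneous frequency to get a smooth phase track.
                phase = 2.0 * np.pi * np.cumsum(f) / fs
                out += a * np.sin(phase)
            # Normalize to avoid clipping on playback.
            return out / max(1e-9, np.max(np.abs(out)))

    The result preserves the coarse spectro-temporal trajectory of the utterance while discarding the fine structure, which is what makes SW speech sound "impossibly unspeechlike" yet recognizable.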

  18. Interface between psychoanalysis and speech language and hearing sciences: a literature review

    Directory of Open Access Journals (Sweden)

    Edinalva Neves Nascimento

    Full Text Available The aim of this study was to survey Brazilian and international scientific production correlating Speech-Language and Hearing Sciences and Psychoanalysis. A literature review was performed using the BVS, SciELO, Scopus and PubMed databases. The descriptors used were “Fonoaudiologia”, “Psicanálise”, “Comunicação”, “Speech Therapy”, “Psychoanalysis” and “Communication”, identifying 65 full articles published between 1980 and 2015. The analysis was performed using a protocol for article classification. Original articles were the most frequently published type, with Scopus and BVS the most common databases. Articles in Portuguese predominated, followed by English, French and German. Several specialties of Speech-Language and Hearing Sciences showed an interface with Psychoanalysis, especially Language and Neuropsychology. The studies were published mainly in Psychology journals, but also in audiology and interdisciplinary journals. This review showed the influence of psychoanalysis on speech-language and hearing clinical practice, highlighting the need for further studies correlating the two areas, which may contribute to the work of these professionals and, consequently, improve the quality of life of psychic subjects.

  19. Metaheuristic applications to speech enhancement

    CERN Document Server

    Kunche, Prajna

    2016-01-01

    This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. Its major goal is to explain the basic concepts of optimization methods and their use in heuristic optimization for speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why developing new enhancement algorithms that improve the quality and intelligibility of degraded speech has been a challenging problem for researchers. They present powerful optimization methods for speech enhancement that can help solve noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, see how speech enhancement algorithms are implemented using optimization methods, and acquire the tools to develop new algorithms. The authors also provide a comprehensive literature survey of the topic.
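
    The recurring pattern behind metaheuristics in this setting is propose-evaluate-keep-the-best over an enhancement parameter space. As one deliberately tiny illustration (not an algorithm from the book), the sketch below uses a (1+1) evolution-strategy loop to tune the over-subtraction factor of basic spectral subtraction against a caller-supplied quality objective; all names and the choice of objective are hypothetical:

        import numpy as np

        def enhance(noisy_mag, noise_mag, alpha):
            """Spectral subtraction: subtract alpha * noise estimate, floor at zero."""
            return np.maximum(noisy_mag - alpha * noise_mag, 0.0)

        def tune_alpha(noisy_mag, noise_mag, objective, iters=100, seed=0):
            """Minimal (1+1) evolution strategy over the over-subtraction factor.

            objective: callable scoring an enhanced magnitude spectrogram
            (e.g., an estimated SNR or intelligibility proxy); higher is better.
            """
            rng = np.random.default_rng(seed)
            best_a = 1.0
            best_score = objective(enhance(noisy_mag, noise_mag, best_a))
            for _ in range(iters):
                # Propose a perturbed candidate, evaluate it, keep it if better.
                cand = abs(best_a + rng.normal(scale=0.3))
                score = objective(enhance(noisy_mag, noise_mag, cand))
                if score > best_score:
                    best_a, best_score = cand, score
            return best_a

    Swapping the loop for PSO or a genetic algorithm changes only how candidates are proposed; the evaluate-and-select structure stays the same.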

  20. Systematic Studies of Modified Vocalization: The Effect of Speech Rate on Speech Production Measures during Metronome-Paced Speech in Persons Who Stutter

    Science.gov (United States)

    Davidow, Jason H.

    2014-01-01

    Background: Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control…