WorldWideScience

Sample records for noise robust speech

  1. Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

    Science.gov (United States)

    Schafer, Phillip B; Jin, Dezhe Z

    2014-03-01

    Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
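    As a concrete illustration of the template-based decoding described in this record, here is a minimal sketch (our own, not the authors' code): it assumes each utterance has been reduced to the ordered list of IDs of the feature-detecting neurons that fired, and classifies a test sequence by its normalized longest-common-subsequence (LCS) similarity to clean templates.

    ```python
    # Minimal sketch of LCS-based template decoding of spike sequences.
    # Assumption: an utterance is encoded as the temporal order of
    # feature-detector firings, i.e., a list of neuron IDs.

    def lcs_length(a, b):
        """Dynamic-programming length of the longest common subsequence."""
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                if a[i - 1] == b[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
        return dp[-1][-1]

    def similarity(seq, template):
        # Normalize so longer templates are not unfairly favored.
        return lcs_length(seq, template) / max(len(seq), len(template))

    def recognize(seq, templates):
        """templates: dict mapping word label -> list of clean spike sequences."""
        return max(templates,
                   key=lambda w: max(similarity(seq, t) for t in templates[w]))

    # Toy usage: two "words", one clean template each.
    templates = {"one": [[3, 7, 7, 2, 9]], "two": [[5, 1, 4, 4, 8]]}
    print(recognize([3, 7, 2, 6, 9], templates))  # -> "one"
    ```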

  2. Noise-robust speech triage.

    Science.gov (United States)

    Bartos, Anthony L; Cipr, Tomas; Nelson, Douglas J; Schwarz, Petr; Banowetz, John; Jerabek, Ladislav

    2018-04-01

A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR-dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNRs of the testing and training data were close or identical. In the current effort, multiple i-vector algorithms were used, greatly improving both processing throughput and equal-error-rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization was significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
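    At test time, the SNR-lattice idea reduces to a simple lookup. Below is a hypothetical sketch, assuming models pre-trained at a grid of SNRs; the SNR estimator and the model interface are placeholders of ours, not from the paper.

    ```python
    # Hypothetical sketch of SNR-matched model selection.
    # models: dict mapping training SNR (dB) -> pre-trained SID model.
    # estimate_snr_db(signal) is assumed to exist (e.g., from a noise tracker).

    def select_model(models, snr_db):
        """Pick the model whose training SNR is closest to the estimated test SNR."""
        return models[min(models, key=lambda trained_snr: abs(trained_snr - snr_db))]

    # Usage: score a test utterance with the SNR-matched model.
    # model = select_model(models, estimate_snr_db(test_signal))
    # speaker_scores = model.score(test_signal)
    ```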

  3. Noise-robust cortical tracking of attended speech in real-world acoustic scenes

    DEFF Research Database (Denmark)

    Fuglsang, Søren; Dau, Torsten; Hjortkjær, Jens

    2017-01-01

Selectively attending to one speaker in a multi-speaker scenario is thought to synchronize low-frequency cortical activity to the attended speech signal. In recent studies, reconstruction of speech from single-trial electroencephalogram (EEG) data has been used to decode which talker a listener is attending to in a two-talker situation. It is currently unclear how this generalizes to more complex sound environments. Behaviorally, speech perception is robust to the acoustic distortions that listeners typically encounter in everyday life, but it is unknown whether this is mirrored by a noise-robust neural tracking of attended speech. Here we used advanced acoustic simulations to recreate real-world acoustic scenes in the laboratory. In virtual acoustic realities with varying amounts of reverberation and number of interfering talkers, listeners selectively attended to the speech stream...

  4. Histogram equalization with Bayesian estimation for noise robust speech recognition.

    Science.gov (United States)

    Suh, Youngjoo; Kim, Hoirin

    2018-02-01

The histogram equalization approach is an efficient feature normalization technique for noise-robust automatic speech recognition. However, it suffers from performance degradation when some fundamental conditions are not satisfied in the test environment. To remedy these limitations of the original histogram equalization methods, a class-based histogram equalization approach was proposed. Although this approach showed substantial performance improvement under noise environments, it still suffers from performance degradation due to overfitting when test data are insufficient. To address this issue, the proposed histogram equalization technique employs Bayesian estimation when estimating the test cumulative distribution function. A previous study on the Aurora-4 task reported that the proposed approach provided substantial performance gains in speech recognition systems based on Gaussian mixture model-hidden Markov model acoustic modeling. In this work, the proposed approach was examined in speech recognition systems with the deep neural network-hidden Markov model (DNN-HMM), the current mainstream speech recognition approach, where it also showed meaningful performance improvement over the conventional maximum likelihood estimation-based method. Fusing the proposed features with mel-frequency cepstral coefficients provided additional performance gains in DNN-HMM systems, which otherwise suffer from performance degradation in the clean test condition.
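    As a rough illustration of the idea (not the paper's exact estimator): histogram equalization maps the test features' quantiles onto the training distribution, and a Bayesian-style estimate can be mimicked by shrinking the empirical test CDF toward the training CDF with a pseudo-count, so short utterances are normalized conservatively. The function names, the shrinkage form, and the pseudo-count below are our assumptions.

    ```python
    import numpy as np

    # Illustrative stand-in for histogram equalization of one feature dimension
    # with a Bayesian-style smoothed test CDF: the empirical test CDF is shrunk
    # toward the training CDF, damping overfitting on short test utterances,
    # then each sample is mapped through the inverse training CDF.

    def heq(test_feat, train_feat, pseudo_count=100.0):
        train_sorted = np.sort(train_feat)
        n_train = len(train_sorted)
        n = len(test_feat)

        ranks = np.argsort(np.argsort(test_feat))      # 0..n-1 per sample
        emp_cdf = (ranks + 0.5) / n                    # empirical test CDF
        # Training CDF evaluated at the test samples.
        prior_cdf = np.searchsorted(train_sorted, test_feat) / n_train
        # Pseudo-count shrinkage of the test CDF toward the prior.
        w = n / (n + pseudo_count)
        cdf = w * emp_cdf + (1.0 - w) * prior_cdf
        # Map through the inverse training CDF (quantile lookup).
        train_q = (np.arange(n_train) + 0.5) / n_train
        return np.interp(cdf, train_q, train_sorted)

    # Example: equalize one cepstral coefficient track of a noisy utterance.
    rng = np.random.default_rng(0)
    train = rng.normal(0.0, 1.0, 20000)    # "clean" reference distribution
    test = rng.normal(1.5, 2.0, 300)       # shifted/scaled "noisy" features
    print(heq(test, train).std())          # pulled toward unit variance
    ```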

  5. Cortical activity patterns predict robust speech discrimination ability in noise

    Science.gov (United States)

    Shetake, Jai A.; Wolf, Jordan T.; Cheung, Ryan J.; Engineer, Crystal T.; Ram, Satyananda K.; Kilgard, Michael P.

    2012-01-01

    The neural mechanisms that support speech discrimination in noisy conditions are poorly understood. In quiet conditions, spike timing information appears to be used in the discrimination of speech sounds. In this study, we evaluated the hypothesis that spike timing is also used to distinguish between speech sounds in noisy conditions that significantly degrade neural responses to speech sounds. We tested speech sound discrimination in rats and recorded primary auditory cortex (A1) responses to speech sounds in background noise of different intensities and spectral compositions. Our behavioral results indicate that rats, like humans, are able to accurately discriminate consonant sounds even in the presence of background noise that is as loud as the speech signal. Our neural recordings confirm that speech sounds evoke degraded but detectable responses in noise. Finally, we developed a novel neural classifier that mimics behavioral discrimination. The classifier discriminates between speech sounds by comparing the A1 spatiotemporal activity patterns evoked on single trials with the average spatiotemporal patterns evoked by known sounds. Unlike classifiers in most previous studies, this classifier is not provided with the stimulus onset time. Neural activity analyzed with the use of relative spike timing was well correlated with behavioral speech discrimination in quiet and in noise. Spike timing information integrated over longer intervals was required to accurately predict rat behavioral speech discrimination in noisy conditions. The similarity of neural and behavioral discrimination of speech in noise suggests that humans and rats may employ similar brain mechanisms to solve this problem. PMID:22098331

  6. Histogram Equalization to Model Adaptation for Robust Speech Recognition

    Directory of Open Access Journals (Sweden)

    Suh Youngjoo

    2010-01-01

We propose a new model adaptation method based on the histogram equalization technique for providing robustness in noisy environments. The trained acoustic mean models of a speech recognizer are adapted to environmentally matched conditions by applying the histogram equalization algorithm on a single-utterance basis. For more robust speech recognition in heavily noisy conditions, trained acoustic covariance models are efficiently adapted by signal-to-noise-ratio-dependent linear interpolation between the trained covariance models and utterance-level sample covariance models. Speech recognition experiments on both the digit-based Aurora2 task and a large-vocabulary task showed that the proposed model adaptation approach provides significant performance improvements over the baseline speech recognizer trained on clean speech data.
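    The covariance adaptation described above is essentially a one-line interpolation. A minimal sketch follows, in which the SNR-to-weight mapping (a linear ramp between two SNR endpoints) is our illustrative choice, not necessarily the paper's:

    ```python
    import numpy as np

    def adapt_covariance(cov_trained, cov_utterance, snr_db,
                         snr_low=0.0, snr_high=20.0):
        # Illustrative SNR-dependent weight: trust the trained covariance at
        # high SNR, lean on the utterance-level sample covariance as SNR drops.
        # The linear ramp between snr_low and snr_high is our assumption.
        rho = np.clip((snr_db - snr_low) / (snr_high - snr_low), 0.0, 1.0)
        return rho * cov_trained + (1.0 - rho) * cov_utterance
    ```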

  7. Eigennoise Speech Recovery in Adverse Environments with Joint Compensation of Additive and Convolutive Noise

    Directory of Open Access Journals (Sweden)

    Trung-Nghia Phung

    2015-01-01

The learning-based speech recovery approach using statistical spectral conversion has been applied to certain kinds of distorted speech, such as alaryngeal speech and body-conducted (bone-conducted) speech. This approach attempts to recover clean (undistorted) speech from noisy (distorted) speech by converting the statistical models of noisy speech into those of clean speech, without prior knowledge of the characteristics and distributions of the noise source. To date, the approach has attracted few researchers in general noisy speech enhancement because of two major problems: the difficulty of noise adaptation and the lack of noise-robust synthesizable features across different noisy environments. In this paper, we adapt state-of-the-art voice conversion and speaker adaptation methods from speech recognition to the proposed speech recovery approach, applying it in different kinds of noisy environments, especially adverse environments with joint compensation of additive and convolutive noise. We propose to use decorrelated wavelet packet coefficients as a low-dimensional synthesizable feature that is robust under noisy environments, and a noise adaptation scheme for speech recovery built around an "eigennoise", analogous to the eigenvoice in voice conversion. The experimental results showed that the proposed approach substantially outperformed traditional non-learning-based approaches.

  8. Influence of musical training on understanding voiced and whispered speech in noise.

    Science.gov (United States)

    Ruggles, Dorea R; Freyman, Richard L; Oxenham, Andrew J

    2014-01-01

    This study tested the hypothesis that the previously reported advantage of musicians over non-musicians in understanding speech in noise arises from more efficient or robust coding of periodic voiced speech, particularly in fluctuating backgrounds. Speech intelligibility was measured in listeners with extensive musical training, and in those with very little musical training or experience, using normal (voiced) or whispered (unvoiced) grammatically correct nonsense sentences in noise that was spectrally shaped to match the long-term spectrum of the speech, and was either continuous or gated with a 16-Hz square wave. Performance was also measured in clinical speech-in-noise tests and in pitch discrimination. Musicians exhibited enhanced pitch discrimination, as expected. However, no systematic or statistically significant advantage for musicians over non-musicians was found in understanding either voiced or whispered sentences in either continuous or gated noise. Musicians also showed no statistically significant advantage in the clinical speech-in-noise tests. Overall, the results provide no evidence for a significant difference between young adult musicians and non-musicians in their ability to understand speech in noise.

  9. A Robust Approach For Acoustic Noise Suppression In Speech Using ANFIS

    Science.gov (United States)

    Martinek, Radek; Kelnar, Michal; Vanus, Jan; Bilik, Petr; Zidek, Jan

    2015-11-01

The authors of this article deal with the implementation of a combination of fuzzy-system and artificial intelligence techniques in the application area of non-linear noise and interference suppression. The structure used is called an Adaptive Neuro-Fuzzy Inference System (ANFIS). This system finds practical use mainly in audio telephone (mobile) communication in noisy environments (transport, production halls, sports matches, etc.). The experimental method, based on the two-input adaptive noise cancellation concept, is clearly outlined. Within the experiments carried out, the authors created, based on the ANFIS structure, a comprehensive system for adaptive suppression of the unwanted background interference that occurs in audio communication and degrades the audio signal. The system was tested on real voice signals. This article presents an investigation and comparison of three distinct approaches to noise cancellation in speech: LMS (least mean squares) and RLS (recursive least squares) adaptive filtering, and ANFIS. A careful review of the literature indicated the importance of non-linear adaptive algorithms over linear ones in noise cancellation. It was concluded that the ANFIS approach had the best overall performance, as it efficiently cancelled noise even in highly noise-degraded speech. Results were drawn from the successful experimentation; subjective tests were used to analyse comparative performance, while objective tests were used to validate them. The algorithms were implemented in Matlab to justify the claims and determine their relative performance.
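    For reference, the two-input adaptive noise cancellation baseline against which ANFIS was compared can be sketched with a plain LMS filter; the filter length and step size below are arbitrary illustrative values, and stability depends on the reference signal's power.

    ```python
    import numpy as np

    def lms_cancel(primary, reference, n_taps=32, mu=0.01):
        """Two-input adaptive noise cancellation with LMS.

        primary:   speech + noise picked up at the main microphone
        reference: noise-only signal from a second (reference) microphone
        Returns the error signal, which converges toward the clean speech.
        """
        w = np.zeros(n_taps)
        out = np.zeros(len(primary))
        for n in range(n_taps, len(primary)):
            x = reference[n - n_taps:n][::-1]   # most recent samples first
            y = w @ x                           # filtered reference noise
            e = primary[n] - y                  # cancel it from the primary
            w += 2 * mu * e * x                 # LMS weight update
            out[n] = e
        return out
    ```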

  10. Speech production in amplitude-modulated noise

    DEFF Research Database (Denmark)

    Macdonald, Ewen N; Raufer, Stefan

    2013-01-01

The Lombard effect refers to the phenomenon where talkers automatically increase their level of speech in a noisy environment. While many studies have characterized how the Lombard effect influences different measures of speech production (e.g., F0, spectral tilt, etc.), few have investigated the consequences of temporally fluctuating noise. In the present study, 20 talkers produced speech in a variety of noise conditions, including both steady-state and amplitude-modulated white noise. While listening to noise over headphones, talkers produced randomly generated five-word sentences. Similar... of noisy environments and will alter their speech accordingly...

  11. The impact of musicianship on the cortical mechanisms related to separating speech from background noise.

    Science.gov (United States)

    Zendel, Benjamin Rich; Tremblay, Charles-David; Belleville, Sylvie; Peretz, Isabelle

    2015-05-01

    Musicians have enhanced auditory processing abilities. In some studies, these abilities are paralleled by an improved understanding of speech in noisy environments, partially due to more robust encoding of speech signals in noise at the level of the brainstem. Little is known about the impact of musicianship on attention-dependent cortical activity related to lexical access during a speech-in-noise task. To address this issue, we presented musicians and nonmusicians with single words mixed with three levels of background noise, across two conditions, while monitoring electrical brain activity. In the active condition, listeners repeated the words aloud, and in the passive condition, they ignored the words and watched a silent film. When background noise was most intense, musicians repeated more words correctly compared with nonmusicians. Auditory evoked responses were attenuated and delayed with the addition of background noise. In musicians, P1 amplitude was marginally enhanced during active listening and was related to task performance in the most difficult listening condition. By comparing ERPs from the active and passive conditions, we isolated an N400 related to lexical access. The amplitude of the N400 was not influenced by the level of background noise in musicians, whereas N400 amplitude increased with the level of background noise in nonmusicians. In nonmusicians, the increase in N400 amplitude was related to a reduction in task performance. In musicians only, there was a rightward shift of the sources contributing to the N400 as the level of background noise increased. This pattern of results supports the hypothesis that encoding of speech in noise is more robust in musicians and suggests that this facilitates lexical access. Moreover, the shift in sources suggests that musicians, to a greater extent than nonmusicians, may increasingly rely on acoustic cues to understand speech in noise.

  12. Auditory-neurophysiological responses to speech during early childhood: Effects of background noise.

    Science.gov (United States)

    White-Schwoch, Travis; Davies, Evan C; Thompson, Elaine C; Woodruff Carr, Kali; Nicol, Trent; Bradlow, Ann R; Kraus, Nina

    2015-10-01

Early childhood is a critical period of auditory learning, during which children are constantly mapping sounds to meaning. But this auditory learning rarely occurs in ideal listening conditions: children are forced to listen against a relentless din. This background noise degrades the neural coding of these critical sounds, in turn interfering with auditory learning. Despite the importance of robust and reliable auditory processing during early childhood, little is known about the neurophysiology underlying speech processing in children so young. To better understand the physiological constraints these adverse listening scenarios impose on speech sound coding during early childhood, auditory-neurophysiological responses were elicited to a consonant-vowel syllable in quiet and background noise in a cohort of typically-developing preschoolers (ages 3-5 yr). Overall, responses were degraded in noise: they were smaller, less stable across trials, slower, and there was poorer coding of spectral content and the temporal envelope. These effects were exacerbated in response to the consonant transition relative to the vowel, suggesting that the neural coding of spectrotemporally-dynamic speech features is more tenuous in noise than the coding of static features, even in children this young. Neural coding of speech temporal fine structure, however, was more resilient to the addition of background noise than coding of temporal envelope information. Taken together, these results demonstrate that noise places a neurophysiological constraint on speech processing during early childhood by causing a breakdown in neural processing of speech acoustics. These results may explain why some listeners have inordinate difficulties understanding speech in noise. Speech-elicited auditory-neurophysiological responses offer objective insight into listening skills during early childhood by reflecting the integrity of neural coding in quiet and noise; this paper documents typical response...

  13. Robust audio-visual speech recognition under noisy audio-video conditions.

    Science.gov (United States)

    Stewart, Darryl; Seymour, Rowan; Pass, Adrian; Ming, Ji

    2014-02-01

This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments, where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added in either/both the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach in both clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.

  14. Interdependent processing and encoding of speech and concurrent background noise.

    Science.gov (United States)

    Cooper, Angela; Brouwer, Susanne; Bradlow, Ann R

    2015-05-01

    Speech processing can often take place in adverse listening conditions that involve the mixing of speech and background noise. In this study, we investigated processing dependencies between background noise and indexical speech features, using a speeded classification paradigm (Garner, 1974; Exp. 1), and whether background noise is encoded and represented in memory for spoken words in a continuous recognition memory paradigm (Exp. 2). Whether or not the noise spectrally overlapped with the speech signal was also manipulated. The results of Experiment 1 indicated that background noise and indexical features of speech (gender, talker identity) cannot be completely segregated during processing, even when the two auditory streams are spectrally nonoverlapping. Perceptual interference was asymmetric, whereby irrelevant indexical feature variation in the speech signal slowed noise classification to a greater extent than irrelevant noise variation slowed speech classification. This asymmetry may stem from the fact that speech features have greater functional relevance to listeners, and are thus more difficult to selectively ignore than background noise. Experiment 2 revealed that a recognition cost for words embedded in different types of background noise on the first and second occurrences only emerged when the noise and the speech signal were spectrally overlapping. Together, these data suggest integral processing of speech and background noise, modulated by the level of processing and the spectral separation of the speech and noise.

  15. Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients

    Directory of Open Access Journals (Sweden)

    TjongWan Sen

    2009-11-01

To improve the performance of phoneme-based Automatic Speech Recognition (ASR) in noisy environments, we developed a new technique that adds robustness to clean phoneme features. These robust features are obtained from Complex Wavelet Packet Transform (CWPT) coefficients. Since the CWPT coefficients represent all the different frequency bands of the input signal, decomposing the input signal into a complete CWPT tree covers all frequencies involved in the recognition process. For time-overlapping signals with different frequency contents, e.g., a phoneme signal with noise, the CWPT coefficients are the combination of the CWPT coefficients of the phoneme signal and those of the noise. The CWPT coefficients of the phoneme signal change according to the frequency components contained in the noise. Since the number of phonemes in any language is relatively small and already well known, one can easily derive principal component vectors from a clean training dataset using Principal Component Analysis (PCA). These principal component vectors can then be used to add robustness and minimize noise effects in the testing phase. Simulation results, using Alpha Numeric 4 (AN4) from Carnegie Mellon University and NOISEX-92 examples from Rice University, showed that this new technique can serve as a feature extractor that improves the robustness of phoneme-based ASR systems in various adverse noisy conditions while preserving performance in clean environments.
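    The PCA step described above amounts to projecting noisy feature vectors onto the subspace spanned by clean-speech principal components. A minimal sketch (our own, with feature frames as matrix rows):

    ```python
    import numpy as np

    def fit_pca(clean_feats, k):
        """Principal directions of clean training features (rows = frames)."""
        mean = clean_feats.mean(axis=0)
        _, _, vt = np.linalg.svd(clean_feats - mean, full_matrices=False)
        return mean, vt[:k]                 # top-k principal axes

    def denoise(noisy_feats, mean, axes):
        """Reconstruct features from the clean principal subspace only;
        components outside it (largely noise-induced) are discarded."""
        centered = noisy_feats - mean
        return centered @ axes.T @ axes + mean
    ```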

  16. Sound quality measures for speech in noise through a commercial hearing aid implementing digital noise reduction.

    Science.gov (United States)

    Ricketts, Todd A; Hornsby, Benjamin W Y

    2005-05-01

This brief report discusses the effect of digital noise reduction (DNR) processing on aided speech recognition and sound quality measures in 14 adults fitted with a commercial hearing aid. Measures of speech recognition and sound quality were obtained in two different speech-in-noise conditions (71 dBA speech, +6 dB SNR and 75 dBA speech, +1 dB SNR). The results revealed that the presence or absence of DNR processing did not impact speech recognition in noise (either positively or negatively). Paired comparisons of sound quality for the same speech-in-noise signals, however, revealed a strong preference for DNR processing. These data suggest that at least one implementation of DNR processing is capable of providing improved sound quality, for speech in noise, in the absence of improved speech recognition.

  17. Voice Activity Detection. Fundamentals and Speech Recognition System Robustness

    OpenAIRE

    Ramirez, J.; Gorriz, J. M.; Segura, J. C.

    2007-01-01

This chapter gives an overview of the main challenges in robust speech detection and a review of the state of the art and applications. VADs are frequently used in a number of applications including speech coding, speech enhancement, and speech recognition. A precise VAD extracts a set of discriminative speech features from the noisy speech and formulates the decision in terms of a well-defined rule. The chapter summarizes three robust VAD methods that yield high speech/non-speech discri...

  18. Robust Frequency Invariant Beamforming with Low Sidelobe for Speech Enhancement

    Science.gov (United States)

    Zhu, Yiting; Pan, Xiang

    2018-01-01

Frequency invariant beamformers (FIBs) are widely used in speech enhancement and source localization. There are two traditional optimization methods for FIB design. The first is convex optimization, which is simple, but the frequency-invariant characteristic of the beam pattern is poor over a frequency band of five octaves. The least squares (LS) approach using a spatial response variation (SRV) constraint is another optimization method. Although it can provide good frequency-invariant properties, it usually cannot be used in speech enhancement because it lacks a weight-norm constraint, which is related to the robustness of a beamformer. In this paper, a robust wideband beamforming method with a constant beamwidth is proposed. The frequency-invariant beam pattern is achieved by solving an optimization problem with an SRV constraint covering the speech frequency band. By controlling the sidelobe level, the frequency-invariant beamformer (FIB) can prevent distortion from interference arriving from undesirable directions. The approach is implemented in the time domain by placing tapped delay lines (TDLs) and finite impulse response (FIR) filters at the output of each sensor, which is more convenient than the Frost processor. By invoking the weight-norm constraint, the robustness of the beamformer is further improved against random errors. Experimental results show that the proposed method has a constant beamwidth and almost the same white noise gain as a traditional delay-and-sum (DAS) beamformer.

  19. Deep neural network and noise classification-based speech enhancement

    Science.gov (United States)

    Shi, Wenhua; Zhang, Xiongwei; Zou, Xia; Han, Wei

    2017-07-01

In this paper, a speech enhancement method using noise classification and a Deep Neural Network (DNN) is proposed. A Gaussian mixture model (GMM) is employed to determine the noise type in speech-absent frames, and a DNN is used to model the relationship between the noisy observation and clean speech. Once the noise type is determined, the corresponding DNN model is applied to enhance the noisy speech. The GMM is trained on mel-frequency cepstrum coefficients (MFCCs), with parameters estimated by an iterative expectation-maximization (EM) algorithm. The noise type is updated by spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method achieves better objective speech quality and smaller distortion under both stationary and non-stationary conditions.
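    A compact sketch of the classify-then-enhance pipeline follows, using scikit-learn's GaussianMixture (which fits by EM) as a stand-in; the per-noise-type enhancers, the MFCC extractor, and the VAD are assumed to exist and are not from the paper.

    ```python
    from sklearn.mixture import GaussianMixture

    # Sketch of the classify-then-enhance pipeline. One GMM per noise type is
    # trained on MFCCs from speech-absent frames; `enhancers` maps noise
    # type -> a pre-trained DNN enhancer and is assumed to exist, as are the
    # MFCC extractor and the VAD that supplies speech-absent frames.

    def train_noise_gmms(mfccs_by_type, n_components=8):
        gmms = {}
        for noise_type, mfccs in mfccs_by_type.items():
            gmms[noise_type] = GaussianMixture(n_components).fit(mfccs)
        return gmms

    def classify_noise(gmms, noise_mfccs):
        # Average log-likelihood of the speech-absent frames under each GMM.
        return max(gmms, key=lambda t: gmms[t].score(noise_mfccs))

    # noise_type = classify_noise(gmms, mfcc(speech_absent_frames))
    # enhanced = enhancers[noise_type].enhance(noisy_speech)
    ```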

  20. Listeners Experience Linguistic Masking Release in Noise-Vocoded Speech-in-Speech Recognition

    Science.gov (United States)

    Viswanathan, Navin; Kokkinakis, Kostas; Williams, Brittany T.

    2018-01-01

    Purpose: The purpose of this study was to evaluate whether listeners with normal hearing perceiving noise-vocoded speech-in-speech demonstrate better intelligibility of target speech when the background speech was mismatched in language (linguistic release from masking [LRM]) and/or location (spatial release from masking [SRM]) relative to the…

  1. Individual differences in speech-in-noise perception parallel neural speech processing and attention in preschoolers

    Science.gov (United States)

    Thompson, Elaine C.; Carr, Kali Woodruff; White-Schwoch, Travis; Otto-Meyer, Sebastian; Kraus, Nina

    2016-01-01

From bustling classrooms to unruly lunchrooms, school settings are noisy. To learn effectively in the unwelcome company of numerous distractions, children must clearly perceive speech in noise. In older children and adults, speech-in-noise perception is supported by sensory and cognitive processes, but the correlates underlying this critical listening skill in young children (3–5 year olds) remain undetermined. Employing a longitudinal design (two evaluations separated by ~12 months), we followed a cohort of 59 preschoolers, ages 3.0–4.9, assessing word-in-noise perception, cognitive abilities (intelligence, short-term memory, attention), and neural responses to speech. Results reveal changes in word-in-noise perception parallel changes in processing of the fundamental frequency (F0), an acoustic cue known to play a central role in speaker identification and auditory scene analysis. Four unique developmental trajectories (speech-in-noise perception groups) confirm this relationship, in that improvements and declines in word-in-noise perception couple with enhancements and diminishments of F0 encoding, respectively. Improvements in word-in-noise perception also pair with gains in attention. Word-in-noise perception does not relate to strength of neural harmonic representation or short-term memory. These findings reinforce previously-reported roles of F0 and attention in hearing speech in noise in older children and adults, and extend this relationship to preschool children. PMID:27864051

  2. Speech perception in noise in unilateral hearing loss

    OpenAIRE

    Mondelli, Maria Fernanda Capoani Garcia; dos Santos, Marina de Marchi; José, Maria Renata

    2016-01-01

INTRODUCTION: Unilateral hearing loss is characterized by a decrease of hearing in one ear only. In the presence of ambient noise, individuals with unilateral hearing loss are faced with greater difficulties understanding speech than normal listeners. OBJECTIVE: To evaluate the speech perception of individuals with unilateral hearing loss with and without competing noise, before and after the hearing aid fitting process. METHODS: The study included 30 adu...

  3. Binaural speech discrimination under noise in hearing-impaired listeners

    Science.gov (United States)

    Kumar, K. V.; Rao, A. B.

    1988-01-01

This paper presents the results of an assessment of speech discrimination by hearing-impaired listeners (sensori-neural, conductive, and mixed groups) under binaural free-field listening in the presence of background noise. Subjects with pure-tone thresholds greater than 20 dB at 0.5, 1.0, and 2.0 kHz were presented with a version of the W-22 list of phonetically balanced words under three conditions: (1) 'quiet', with the chamber noise below 28 dB and speech at 60 dB; (2) at a constant S/N ratio of +10 dB, and with a background white noise at 70 dB; and (3) same as condition (2), but with the background noise at 80 dB. The mean speech discrimination scores decreased significantly with noise in all groups. However, the decrease in binaural speech discrimination scores with an increase in hearing impairment was less for material presented under the noise conditions than for the material presented in quiet.

  4. Training to Improve Hearing Speech in Noise: Biological Mechanisms

    Science.gov (United States)

    Song, Judy H.; Skoe, Erika; Banai, Karen

    2012-01-01

    We investigated training-related improvements in listening in noise and the biological mechanisms mediating these improvements. Training-related malleability was examined using a program that incorporates cognitively based listening exercises to improve speech-in-noise perception. Before and after training, auditory brainstem responses to a speech syllable were recorded in quiet and multitalker noise from adults who ranged in their speech-in-noise perceptual ability. Controls did not undergo training but were tested at intervals equivalent to the trained subjects. Trained subjects exhibited significant improvements in speech-in-noise perception that were retained 6 months later. Subcortical responses in noise demonstrated training-related enhancements in the encoding of pitch-related cues (the fundamental frequency and the second harmonic), particularly for the time-varying portion of the syllable that is most vulnerable to perceptual disruption (the formant transition region). Subjects with the largest strength of pitch encoding at pretest showed the greatest perceptual improvement. Controls exhibited neither neurophysiological nor perceptual changes. We provide the first demonstration that short-term training can improve the neural representation of cues important for speech-in-noise perception. These results implicate and delineate biological mechanisms contributing to learning success, and they provide a conceptual advance to our understanding of the kind of training experiences that can influence sensory processing in adulthood. PMID:21799207

  5. Improvement of intelligibility of ideal binary-masked noisy speech by adding background noise.

    Science.gov (United States)

    Cao, Shuyang; Li, Liang; Wu, Xihong

    2011-04-01

When a target-speech/masker mixture is processed with the ideal binary mask (IBM) signal-separation technique, the intelligibility of target speech is remarkably improved in both normal-hearing and hearing-impaired listeners. Intelligibility of speech can also be improved by filling in speech gaps with unmodulated broadband noise. This study investigated whether intelligibility of target speech in the IBM-treated target-speech/masker mixture can be further improved by adding a broadband-noise background. The results show that following the IBM manipulation, which remarkably released target speech from speech-spectrum noise, foreign-speech, or native-speech masking (experiment 1), adding a broadband-noise background with a signal-to-noise ratio of no less than 4 dB significantly improved intelligibility of target speech when the masker was either noise (experiment 2) or speech (experiment 3). The results suggest that because adding the noise background fills in the areas of silence in the time-frequency domain of the IBM-treated target-speech/masker mixture, abrupt transient changes in the mixture are smoothed and the perceived continuity of target-speech components is enhanced, leading to improved target-speech intelligibility. The findings are useful for advancing computational auditory scene analysis, hearing-aid/cochlear-implant designs, and understanding of speech perception under "cocktail-party" conditions.
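    The two manipulations studied here are straightforward to sketch: compute the IBM from separately available speech and masker signals, apply it to the mixture, then add a broadband background at a chosen SNR. The 0 dB local criterion and the STFT settings below are our illustrative choices, not necessarily the study's.

    ```python
    import numpy as np
    from scipy.signal import stft, istft

    def ibm_enhance(speech, masker, fs, lc_db=0.0, nperseg=512):
        """Apply the ideal binary mask (local SNR > lc_db) to the mixture."""
        _, _, S = stft(speech, fs, nperseg=nperseg)
        _, _, N = stft(masker, fs, nperseg=nperseg)
        _, _, Y = stft(speech + masker, fs, nperseg=nperseg)
        mask = (20 * np.log10(np.abs(S) + 1e-12)
                - 20 * np.log10(np.abs(N) + 1e-12)) > lc_db
        _, out = istft(Y * mask, fs, nperseg=nperseg)
        return out

    def add_background(signal, snr_db, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        noise = rng.standard_normal(len(signal))
        # Scale the noise so the signal-to-noise ratio equals snr_db.
        scale = np.sqrt(np.mean(signal**2)
                        / (np.mean(noise**2) * 10 ** (snr_db / 10)))
        return signal + scale * noise

    # masked = ibm_enhance(speech, masker, fs)
    # stimulus = add_background(masked, snr_db=4.0)  # +4 dB background
    ```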

  6. Investigating the effects of noise-estimation errors in simulated cochlear implant speech intelligibility

    DEFF Research Database (Denmark)

    Kressner, Abigail Anne; May, Tobias; Malik Thaarup Høegh, Rasmus

    2017-01-01

A recent study suggested that the most important factor for obtaining high speech intelligibility in noise with cochlear implant recipients is to preserve the low-frequency amplitude modulations of speech across time and frequency by, for example, minimizing the amount of noise in the speech gaps. In contrast, other studies have argued that the transients provide the most information. Thus, the present study investigates the relative impact of these two factors in the framework of noise reduction by systematically correcting noise-estimation errors within speech segments, speech gaps, and the transitions between them. Speech intelligibility in noise was measured using a cochlear implant simulation tested on normal-hearing listeners. The results suggest that minimizing noise in the speech gaps can substantially improve intelligibility, especially in modulated noise. However, significantly larger...

  7. Comparison of Speech Perception in Background Noise with Acceptance of Background Noise in Aided and Unaided Conditions.

    Science.gov (United States)

    Nabelek, Anna K.; Tampas, Joanna W.; Burchfield, Samuel B.

    2004-01-01

Background noise is a significant factor influencing hearing-aid satisfaction and is a major reason for rejection of hearing aids. Attempts have been made by previous researchers to relate the use of hearing aids to speech perception in noise (SPIN), with an expectation of improved speech perception followed by an…

  8. Acceptable noise level with Danish, Swedish, and non-semantic speech materials

    DEFF Research Database (Denmark)

    Brännström, K Jonas; Lantz, Johannes; Nielsen, Lars Holme

    2012-01-01

Objective: Acceptable noise level (ANL) has been established as a method to quantify the acceptance of background noise while listening to speech presented at the most comfortable level. The aim of the present study was to generate Danish, Swedish, and non-semantic versions of the ANL test and investigate normal-hearing Danish and Swedish subjects' performance on these tests. Design: ANL was measured using Danish and Swedish running speech with two different noises: speech-weighted amplitude-modulated noise and multitalker speech babble. ANL was also measured using the non... reported results from American studies. Generally, significant differences were seen between test conditions using different types of noise within ears in each population. Significant differences were seen for ANL across populations, also when the non-semantic ISTS was used as speech signal. Conclusions...
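    For context, ANL is computed as the most comfortable listening level (MCL) for speech minus the highest accepted background noise level (BNL): ANL = MCL - BNL. For example, a listener with an MCL of 63 dB HL who accepts babble up to 56 dB HL has an ANL of 7 dB; lower values indicate greater acceptance of background noise.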

  9. Working memory training to improve speech perception in noise across languages.

    Science.gov (United States)

    Ingvalson, Erin M; Dhar, Sumitrajit; Wong, Patrick C M; Liu, Hanjun

    2015-06-01

    Working memory capacity has been linked to performance on many higher cognitive tasks, including the ability to perceive speech in noise. Current efforts to train working memory have demonstrated that working memory performance can be improved, suggesting that working memory training may lead to improved speech perception in noise. A further advantage of working memory training to improve speech perception in noise is that working memory training materials are often simple, such as letters or digits, making them easily translatable across languages. The current effort tested the hypothesis that working memory training would be associated with improved speech perception in noise and that materials would easily translate across languages. Native Mandarin Chinese and native English speakers completed ten days of reversed digit span training. Reading span and speech perception in noise both significantly improved following training, whereas untrained controls showed no gains. These data suggest that working memory training may be used to improve listeners' speech perception in noise and that the materials may be quickly adapted to a wide variety of listeners.

  10. Hearing loss and speech perception in noise difficulties in Fanconi anemia.

    Science.gov (United States)

    Verheij, Emmy; Oomen, Karin P Q; Smetsers, Stephanie E; van Zanten, Gijsbert A; Speleman, Lucienne

    2017-10-01

Fanconi anemia is a hereditary chromosomal instability disorder. Hearing loss and ear abnormalities are among the many manifestations reported in this disorder. In addition, Fanconi anemia patients often complain about hearing difficulties in situations with background noise (speech perception in noise difficulties). Our study aimed to describe the prevalence of hearing loss and speech perception in noise difficulties in Dutch Fanconi anemia patients. A retrospective chart review was conducted at a Dutch tertiary care center. All patients with Fanconi anemia at clinical follow-up in our hospital were included. Medical files were reviewed to collect data on hearing loss and speech perception in noise difficulties. In total, 49 Fanconi anemia patients were included. Audiograms were available in 29 patients and showed hearing loss in 16 patients (55%). Conductive hearing loss was present in 24.1%, sensorineural in 20.7%, and mixed in 10.3%. A speech-in-noise test was performed in 17 patients; speech perception in noise was subnormal in nine patients (52.9%) and abnormal in two patients (11.7%). Hearing loss and speech perception in noise abnormalities are common in Fanconi anemia. Therefore, pure-tone audiograms and speech-in-noise tests should be performed, preferably already at a young age, because hearing aids or assistive listening devices could be very valuable in developing language and communication skills. Level of Evidence: 4. Laryngoscope, 127:2358-2361, 2017.

  11. The Effect of Background Noise on Intelligibility of Dysphonic Speech

    Science.gov (United States)

    Ishikawa, Keiko; Boyce, Suzanne; Kelchner, Lisa; Powell, Maria Golla; Schieve, Heidi; de Alarcon, Alessandro; Khosla, Sid

    2017-01-01

    Purpose: The aim of this study is to determine the effect of background noise on the intelligibility of dysphonic speech and to examine the relationship between intelligibility in noise and an acoustic measure of dysphonia--cepstral peak prominence (CPP). Method: A study of speech perception was conducted using speech samples from 6 adult speakers…

  12. The importance for speech intelligibility of random fluctuations in "steady" background noise.

    Science.gov (United States)

    Stone, Michael A; Füllgrabe, Christian; Mackinnon, Robert C; Moore, Brian C J

    2011-11-01

    Spectrally shaped steady noise is commonly used as a masker of speech. The effects of inherent random fluctuations in amplitude of such a noise are typically ignored. Here, the importance of these random fluctuations was assessed by comparing two cases. For one, speech was mixed with steady speech-shaped noise and N-channel tone vocoded, a process referred to as signal-domain mixing (SDM); this preserved the random fluctuations of the noise. For the second, the envelope of speech alone was extracted for each vocoder channel and a constant was added corresponding to the root-mean-square value of the noise envelope for that channel. This is referred to as envelope-domain mixing (EDM); it removed the random fluctuations of the noise. Sinusoidally modulated noise and a single talker were also used as backgrounds, with both SDM and EDM. Speech intelligibility was measured for N = 12, 19, and 30, with the target-to-background ratio fixed at -7 dB. For SDM, performance was best for the speech background and worst for the steady noise. For EDM, this pattern was reversed. Intelligibility with steady noise was consistently very poor for SDM, but near-ceiling for EDM, demonstrating that the random fluctuations in steady noise have a large effect.
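    The distinction between the two mixing schemes can be made concrete for a single vocoder channel; the filter design below is an illustrative choice of ours, not the study's exact implementation (which used 12- to 30-channel tone vocoders).

    ```python
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def band_envelope(x, fs, lo, hi):
        """Hilbert envelope of one analysis band."""
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        return np.abs(hilbert(sosfiltfilt(sos, x)))

    def channel_envelopes(speech, noise, fs, lo, hi):
        # Signal-domain mixing (SDM): envelope of the mixture, which keeps
        # the noise's inherent random fluctuations.
        env_sdm = band_envelope(speech + noise, fs, lo, hi)
        # Envelope-domain mixing (EDM): speech envelope plus a constant equal
        # to the RMS of the noise envelope, so the fluctuations are removed.
        env_noise = band_envelope(noise, fs, lo, hi)
        env_edm = band_envelope(speech, fs, lo, hi) + np.sqrt(np.mean(env_noise**2))
        return env_sdm, env_edm

    # Each envelope then modulates a tone at the channel center frequency,
    # and the channels are summed to produce the vocoded stimulus.
    ```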

  13. The Galker test of speech reception in noise

    DEFF Research Database (Denmark)

    Lauritsen, Maj-Britt Glenn; Söderström, Margareta; Kreiner, Svend

    2016-01-01

PURPOSE: We tested "the Galker test", a speech reception in noise test developed for primary care for Danish preschool children, to explore whether the children's ability to hear and understand speech was associated with gender, age, middle ear status, and the level of background noise. METHODS: The Galker test is a 35-item audio-visual, computerized word discrimination test in background noise. Included were 370 normally developed children attending day care centers. The children were examined with the Galker test, tympanometry, audiometry, and the Reynell test of verbal comprehension. Parents and daycare teachers completed questionnaires on the children's ability to hear and understand speech. As most of the variables were not assessed using interval scales, non-parametric statistics (Goodman-Kruskal's gamma) were used for analyzing associations with the Galker test score. For comparisons...

  14. Robust digital processing of speech signals

    CERN Document Server

    Kovacevic, Branko; Veinović, Mladen; Marković, Milan

    2017-01-01

    This book focuses on speech signal phenomena, presenting a robustification of the usual speech generation models with regard to the presumed types of excitation signals, which is equivalent to the introduction of a class of nonlinear models and the corresponding criterion functions for parameter estimation. Compared to the general class of nonlinear models, such as various neural networks, these models possess good properties of controlled complexity, the option of working in “online” mode, as well as a low information volume for efficient speech encoding and transmission. Providing comprehensive insights, the book is based on the authors’ research, which has already been published, supplemented by additional texts discussing general considerations of speech modeling, linear predictive analysis and robust parameter estimation.

  16. Speech recognition in natural background noise.

    Science.gov (United States)

    Meyer, Julien; Dentel, Laure; Meunier, Fanny

    2013-01-01

In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances between 11 and 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB). Our results showed that in such conditions, the identity of vowels is mostly preserved, with the striking peculiarity of the absence of confusion in vowels. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns, and associated acoustic cues revealed that sonorant, sibilant, and burst properties were the most important parameters influencing phoneme recognition. Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending on their position in the word (onset vs. coda). Finally, our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for future studies.

  17. Magnified Neural Envelope Coding Predicts Deficits in Speech Perception in Noise.

    Science.gov (United States)

    Millman, Rebecca E; Mattys, Sven L; Gouws, André D; Prendergast, Garreth

    2017-08-09

Verbal communication in noisy backgrounds is challenging. Understanding speech in background noise that fluctuates in intensity over time is particularly difficult for hearing-impaired listeners with a sensorineural hearing loss (SNHL). The reduction in fast-acting cochlear compression associated with SNHL exaggerates the perceived fluctuations in intensity in amplitude-modulated sounds. SNHL-induced changes in the coding of amplitude-modulated sounds may have a detrimental effect on the ability of SNHL listeners to understand speech in the presence of modulated background noise. To date, direct evidence for a link between magnified envelope coding and deficits in speech identification in modulated noise has been absent. Here, magnetoencephalography was used to quantify the effects of SNHL on phase locking to the temporal envelope of modulated noise (envelope coding) in human auditory cortex. Our results show that SNHL enhances the amplitude of envelope coding in posteromedial auditory cortex, whereas it enhances the fidelity of envelope coding in posteromedial and posterolateral auditory cortex. This dissociation was more evident in the right hemisphere, demonstrating functional lateralization in enhanced envelope coding in SNHL listeners. However, enhanced envelope coding was not perceptually beneficial. Our results also show that both hearing thresholds and, to a lesser extent, magnified cortical envelope coding in left posteromedial auditory cortex predict speech identification in modulated background noise. We propose a framework in which magnified envelope coding in posteromedial auditory cortex disrupts the segregation of speech from background noise, leading to deficits in speech perception in modulated background noise. SIGNIFICANCE STATEMENT People with hearing loss struggle to follow conversations in noisy environments. Background noise that fluctuates in intensity over time poses a particular challenge. Using magnetoencephalography, we demonstrate...

  18. Analysis and removing noise from speech using wavelet transform

    Science.gov (United States)

    Tomala, Karel; Voznak, Miroslav; Partila, Pavol; Rezac, Filip; Safarik, Jakub

    2013-05-01

The paper discusses the use of the Discrete Wavelet Transform (DWT) and the Stationary Wavelet Transform (SWT) in removing noise from voice samples and evaluates the impact on speech quality. One significant part of Quality of Service (QoS) in communication technology is speech quality assessment. However, this part is often overlooked, as telecommunication providers tend to focus on increasing network capacity, expanding the services offered, and enforcing them in the market. Among the fundamental factors affecting the transmission properties of the communication chain is noise, whether at the transmitter or the receiver side. The wavelet transform (WT) is a modern tool for signal processing, and one of its most significant applications is the suppression of noise in signals. To remove noise from the voice samples in our experiment, we used a reference speech segment distorted by Gaussian white noise. The impact on speech quality was evaluated with the intrusive objective algorithm Perceptual Evaluation of Speech Quality (PESQ). DWT and SWT transformations were applied to voice samples degraded by Gaussian white noise, and the effectiveness of each was then determined by means of PESQ. The decisive criterion for the quality of a voice sample after noise removal was the Mean Opinion Score (MOS) obtained from PESQ. The contribution of this work lies in the evaluation of the efficiency of wavelet transforms in suppressing noise in voice samples.
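    A minimal sketch of the DWT-based denoising step, using PyWavelets; the wavelet family, decomposition level, and universal soft threshold are illustrative choices of ours (the paper also evaluated the stationary transform, available as pywt.swt, and scored outputs with PESQ):

    ```python
    import numpy as np
    import pywt

    def dwt_denoise(x, wavelet="db8", level=4):
        coeffs = pywt.wavedec(x, wavelet, level=level)
        # Noise level estimated from the finest detail coefficients.
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745
        thresh = sigma * np.sqrt(2 * np.log(len(x)))
        # Soft-threshold the detail coefficients; keep the approximation.
        coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
        return pywt.waverec(coeffs, wavelet)
    ```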

  19. Effects of noise on speech recognition: Challenges for communication by service members.

    Science.gov (United States)

    Le Prell, Colleen G; Clavier, Odile H

    2017-06-01

Speech communication often takes place in noisy environments; this is an urgent issue for military personnel who must communicate in high-noise environments. The effects of noise on speech recognition vary significantly according to the sources of noise, the number and types of talkers, and the listener's hearing ability. In this review, speech communication is first described as it relates to current standards of hearing assessment for military and civilian populations. The next section categorizes types of noise (also called maskers) according to their temporal characteristics (steady or fluctuating) and perceptive effects (energetic or informational masking). Next, speech recognition difficulties experienced by listeners with hearing loss and by older listeners are summarized, and questions on the possible causes of speech-in-noise difficulty are discussed, including recent suggestions of "hidden hearing loss". The final section describes tests used by military and civilian researchers, audiologists, and hearing technicians to assess performance of an individual in recognizing speech in background noise, as well as metrics that predict performance based on a listener and background noise profile. This article provides readers with an overview of the challenges associated with speech communication in noisy backgrounds, as well as its assessment and potential impact on functional performance, and provides guidance for important new research directions relevant not only to military personnel, but also to employees who work in high noise environments.

  20. Evaluation of speech transmission in open public spaces affected by combined noises.

    Science.gov (United States)

    Lee, Pyoung Jik; Jeon, Jin Yong

    2011-07-01

    In the present study, the effects of interference from combined noises on speech transmission were investigated in a simulated open public space. Sound fields for dominant noises were predicted using a typical urban square model surrounded by buildings. Road traffic noise and two types of construction noise, corresponding to stationary and impulsive noises, were then selected as background noises. Listening tests were performed on a group of adults, and the quality of speech transmission was evaluated using listening difficulty as well as intelligibility scores. During the listening tests, two factors that affect speech transmission performance were considered: (1) the temporal characteristics of the construction noise (stationary or impulsive) and (2) the levels of the construction and road traffic noises. The results indicated that word intelligibility scores and listening difficulty ratings were affected by the temporal characteristics of the construction noise due to fluctuations in the background noise level. It was also observed that listening difficulty ratings, which showed larger variation than word intelligibility scores, cannot on their own describe speech transmission in noisy open public spaces. © 2011 Acoustical Society of America.

  1. Measurement of speech levels in the presence of time varying background noise

    Science.gov (United States)

    Pearsons, K. S.; Horonjeff, R.

    1982-01-01

    Short-term speech level measurements which could be used to note changes in vocal effort in a time-varying noise environment were studied. Knowing the changes in speech level would in turn allow prediction of intelligibility in the presence of aircraft flyover noise. Tests indicated that it is possible to use two-second samples of speech to estimate long-term root-mean-square speech levels. Other tests were also performed in which people read aloud during aircraft flyover noise. Results of these tests indicate that people do indeed raise their voices during flyovers, at a rate of about 3-1/2 dB for each 10 dB increase in background level. This finding is in agreement with other tests of speech levels in the presence of steady-state background noise.
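
    The reported growth rate lends itself to a quick back-of-envelope predictor. In the sketch below only the roughly 3.5 dB rise per 10 dB of noise comes from the study; the baseline speech and noise levels are invented placeholders.

```python
def predicted_speech_level(noise_db, base_speech_db=60.0, base_noise_db=45.0, slope=0.35):
    """Long-term RMS speech level predicted from the background noise level,
    using the ~3.5 dB rise per 10 dB of noise reported in the study."""
    return base_speech_db + slope * max(0.0, noise_db - base_noise_db)

for noise in (45, 55, 65, 75):
    print(noise, round(predicted_speech_level(noise), 1))
```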

  2. Microscopic prediction of speech intelligibility in spatially distributed speech-shaped noise for normal-hearing listeners.

    Science.gov (United States)

    Geravanchizadeh, Masoud; Fallah, Ali

    2015-12-01

    A binaural and psychoacoustically motivated intelligibility model, based on a well-known monaural microscopic model, is proposed. This model simulates a phoneme recognition task in the presence of spatially distributed speech-shaped noise in anechoic scenarios. In the proposed model, binaural advantage effects are considered by generating a feature vector for a dynamic-time-warping speech recognizer. This vector consists of three subvectors incorporating two monaural subvectors to model the better-ear hearing, and a binaural subvector to simulate the binaural unmasking effect. The binaural unit of the model is based on equalization-cancellation theory. This model operates blindly, which means separate recordings of speech and noise are not required for the predictions. Speech intelligibility tests were conducted with 12 normal-hearing listeners by collecting speech reception thresholds (SRTs) in the presence of single and multiple sources of speech-shaped noise. The comparison of the model predictions with the measured binaural SRTs, and with the predictions of a macroscopic binaural model called extended equalization-cancellation, shows that this approach predicts the intelligibility in anechoic scenarios with good precision. The square of the correlation coefficient (r²) and the mean absolute error between the model predictions and the measurements are 0.98 and 0.62 dB, respectively.
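
    Since the model feeds its feature vectors to a dynamic-time-warping recognizer, a generic DTW template matcher is sketched below. This is the textbook formulation for illustration, not the authors' implementation, and the feature arrays are random stand-ins.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two feature sequences
    (frames x dimensions), with the standard symmetric step pattern."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Classify a test utterance by its nearest template (random stand-ins here).
rng = np.random.default_rng(0)
templates = {"ba": rng.random((40, 13)), "da": rng.random((45, 13))}
test = rng.random((42, 13))
print(min(templates, key=lambda k: dtw_distance(test, templates[k])))
```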

  3. Biological impact of preschool music classes on processing speech in noise.

    Science.gov (United States)

    Strait, Dana L; Parbery-Clark, Alexandra; O'Connell, Samantha; Kraus, Nina

    2013-10-01

    Musicians have increased resilience to the effects of noise on speech perception and its neural underpinnings. We do not know, however, how early in life these enhancements arise. We compared auditory brainstem responses to speech in noise in 32 preschool children, half of whom were engaged in music training. Thirteen children returned for testing one year later, permitting the first longitudinal assessment of subcortical auditory function with music training. Results indicate emerging neural enhancements in musically trained preschoolers for processing speech in noise. Longitudinal outcomes reveal that children enrolled in music classes experience further increased neural resilience to background noise following one year of continued training compared to nonmusician peers. Together, these data reveal enhanced development of neural mechanisms undergirding speech-in-noise perception in preschoolers undergoing music training and may indicate a biological impact of music training on auditory function during early childhood. Copyright © 2013 Elsevier Ltd. All rights reserved.

  4. Development of Trivia Game for speech understanding in background noise.

    Science.gov (United States)

    Schwartz, Kathryn; Ringleb, Stacie I; Sandberg, Hilary; Raymer, Anastasia; Watson, Ginger S

    2015-01-01

    Listening in noise is an everyday activity and poses a challenge for many people. To improve the ability to understand speech in noise, a computerized auditory rehabilitation game was developed. In Trivia Game players are challenged to answer trivia questions spoken aloud. As players progress through the game, the level of background noise increases. A study using Trivia Game was conducted as a proof-of-concept investigation in healthy participants. College students with normal hearing were randomly assigned to a control (n = 13) or a treatment (n = 14) group. Treatment participants played Trivia Game 12 times over a 4-week period. All participants completed objective (auditory-only and audiovisual formats) and subjective listening in noise measures at baseline and 4 weeks later. There were no statistical differences between the groups at baseline. At post-test, the treatment group significantly improved their overall speech understanding in noise in the audiovisual condition and reported significant benefits in their functional listening abilities. Playing Trivia Game improved speech understanding in noise in healthy listeners. Significant findings for the audiovisual condition suggest that participants improved face-reading abilities. Trivia Game may be a platform for investigating changes in speech understanding in individuals with sensory, linguistic and cognitive impairments.

  5. Individual differences in language and working memory affect children's speech recognition in noise.

    Science.gov (United States)

    McCreery, Ryan W; Spratford, Meredith; Kirby, Benjamin; Brennan, Marc

    2017-05-01

    We examined how cognitive and linguistic skills affect speech recognition in noise for children with normal hearing. Children with better working memory and language abilities were expected to have better speech recognition in noise than peers with poorer skills in these domains. As part of a prospective, cross-sectional study, children with normal hearing completed speech recognition in noise for three types of stimuli: (1) monosyllabic words, (2) syntactically correct but semantically anomalous sentences and (3) semantically and syntactically anomalous word sequences. Measures of vocabulary, syntax and working memory were used to predict individual differences in speech recognition in noise. Ninety-six children with normal hearing, who were between 5 and 12 years of age. Higher working memory was associated with better speech recognition in noise for all three stimulus types. Higher vocabulary abilities were associated with better recognition in noise for sentences and word sequences, but not for words. Working memory and language both influence children's speech recognition in noise, but the relationships vary across types of stimuli. These findings suggest that clinical assessment of speech recognition is likely to reflect underlying cognitive and linguistic abilities, in addition to a child's auditory skills, consistent with the Ease of Language Understanding model.

  6. Improved Noise Minimum Statistics Estimation Algorithm for Using in a Speech-Passing Noise-Rejecting Headset

    Directory of Open Access Journals (Sweden)

    Seyedtabaee Saeed

    2010-01-01

    Full Text Available This paper deals with the configuration of an algorithm to be used in a speech-passing, angle-grinder-noise-canceling headset. Angle grinder noise is annoying and interrupts ordinary oral communication, which means the headset must operate under low-SNR conditions. Since variation in the angle grinder's working condition changes the noise statistics, the noise is non-stationary, with possible jumps in its power. Studies were conducted to pick an appropriate algorithm; a modified version of the well-known spectral subtraction showed superior performance against alternative methods. The noise estimate is calculated through a multi-band, fast-adapting scheme. The algorithm adapts very quickly to the non-stationary noise environment while introducing minimal musical noise and speech distortion in the processed signal. Objective and subjective measures illustrating the performance of the proposed method are introduced.
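
    To make the approach concrete, the following sketch implements a bare-bones magnitude spectral subtraction with a recursively tracked noise estimate. The single-band update rule and all constants are assumptions for illustration; the paper's multi-band, fast-adapting scheme is more elaborate.

```python
import numpy as np

def spectral_subtract(noisy, frame=256, hop=128, alpha=0.98, floor=0.05):
    """Magnitude spectral subtraction with a recursively tracked noise PSD;
    overlap-add resynthesis with a Hann window (unnormalized, sketch only)."""
    win = np.hanning(frame)
    noise_psd = None
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame, hop):
        spec = np.fft.rfft(noisy[start:start + frame] * win)
        mag, phase = np.abs(spec), np.angle(spec)
        if noise_psd is None:
            noise_psd = mag ** 2                     # assume the first frame is noise-only
        # Slow recursive update, capped so speech frames do not inflate the estimate.
        noise_psd = alpha * noise_psd + (1 - alpha) * np.minimum(mag ** 2, 4 * noise_psd)
        clean_mag = np.sqrt(np.maximum(mag ** 2 - noise_psd, (floor * mag) ** 2))
        out[start:start + frame] += np.fft.irfft(clean_mag * np.exp(1j * phase)) * win
    return out

rng = np.random.default_rng(0)
enhanced = spectral_subtract(rng.standard_normal(4000))   # toy input
```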

  7. Modification of computational auditory scene analysis (CASA) for noise-robust acoustic feature

    Science.gov (United States)

    Kwon, Minseok

    While there have been many attempts to mitigate interference from background noise, the performance of automatic speech recognition (ASR) can still be degraded with ease by various factors. Normal-hearing listeners, however, can accurately perceive the sounds they attend to, which is believed to be a result of Auditory Scene Analysis (ASA). As a first attempt, a simulation of human auditory processing, called computational auditory scene analysis (CASA), was developed through physiological and psychological investigations of ASA. The CASA front end comprises the Zilany-Bruce auditory model, followed by fundamental-frequency tracking for voiced segmentation and detection of onset/offset pairs at each characteristic frequency (CF) for unvoiced segmentation. The resulting time-frequency (T-F) representation of the acoustic stimulation was converted into an acoustic feature, gammachirp-tone frequency cepstral coefficients (GFCC). Eleven keywords recorded under various environmental conditions were used, and the robustness of GFCC was evaluated by spectral distance (SD) and dynamic time warping distance (DTW). In "clean" and "noisy" conditions, the application of CASA generally improved the noise robustness of the acoustic feature compared to a conventional method with or without noise suppression using an MMSE estimator. The initial study, however, not only showed a noise-type dependency at low SNR but also called the evaluation methods into question. Some modifications were made to capture better spectral continuity from an acoustic feature matrix, to obtain faster processing speed, and to describe the human auditory system more precisely. The proposed framework includes: 1) multi-scale integration to capture more accurate continuity in feature extraction, 2) contrast enhancement (CE) of each CF by competition with neighboring frequency bands, and 3) auditory model modifications. The model modifications contain the introduction of a higher Q factor and a middle ear filter more analogous to the human auditory system
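
    The contrast-enhancement idea, each CF band competing with its neighbors, can be sketched as a simple lateral-inhibition pass; the kernel and gain below are assumptions for illustration, not the exact formulation proposed here.

```python
import numpy as np

def contrast_enhance(tf, gain=0.5):
    """Sharpen each frequency band of a T-F map (bands x frames) by
    subtracting a weighted average of its immediate neighbors."""
    padded = np.pad(tf, ((1, 1), (0, 0)), mode="edge")
    neighbors = 0.5 * (padded[:-2] + padded[2:])
    return np.maximum(tf - gain * neighbors, 0.0)

rng = np.random.default_rng(0)
tf_map = np.abs(rng.standard_normal((32, 100)))   # stand-in for a CASA T-F map
sharpened = contrast_enhance(tf_map)
```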

  8. Some Neurocognitive Correlates of Noise-Vocoded Speech Perception in Children With Normal Hearing: A Replication and Extension of ).

    Science.gov (United States)

    Roman, Adrienne S; Pisoni, David B; Kronenberger, William G; Faulkner, Kathleen F

    recognized lexically easy words better than lexically hard words in sentences. Older children perceived noise-vocoded speech better than younger children. Finally, we found that measures of AA and short-term memory capacity were significantly correlated with a child's ability to perceive noise-vocoded isolated words and sentences. First, we successfully replicated the major findings from the ) study. Because familiarity, phonological distinctiveness and lexical competition affect word recognition, these findings provide additional support for the proposal that several foundational elementary neurocognitive processes underlie the perception of spectrally degraded speech. Second, we found strong and significant correlations between performance on neurocognitive measures and children's ability to recognize words and sentences noise-vocoded to four spectral channels. These findings extend earlier research suggesting that perception of spectrally degraded speech reflects early peripheral auditory processes, as well as additional contributions of executive function, specifically, selective attention and short-term memory processes in spoken word recognition. The present findings suggest that AA and short-term memory support robust spoken word recognition in children with NH even under compromised and challenging listening conditions. These results are relevant to research carried out with listeners who have hearing loss, because they are routinely required to encode, process, and understand spectrally degraded acoustic signals.

  9. The effects of noise exposure and musical training on suprathreshold auditory processing and speech perception in noise.

    Science.gov (United States)

    Yeend, Ingrid; Beach, Elizabeth Francis; Sharma, Mridula; Dillon, Harvey

    2017-09-01

    Recent animal research has shown that exposure to single episodes of intense noise causes cochlear synaptopathy without affecting hearing thresholds. It has been suggested that the same may occur in humans. If so, it is hypothesized that this would result in impaired encoding of sound and lead to difficulties hearing at suprathreshold levels, particularly in challenging listening environments. The primary aim of this study was to investigate the effect of noise exposure on auditory processing, including the perception of speech in noise, in adult humans. A secondary aim was to explore whether musical training might improve some aspects of auditory processing and thus counteract or ameliorate any negative impacts of noise exposure. In a sample of 122 participants (63 female) aged 30-57 years with normal or near-normal hearing thresholds, we conducted audiometric tests, including tympanometry, audiometry, acoustic reflexes, otoacoustic emissions and medial olivocochlear responses. We also assessed temporal and spectral processing by determining thresholds for detection of amplitude modulation and temporal fine structure. We assessed speech-in-noise perception, and conducted tests of attention, memory and sentence closure. We also calculated participants' accumulated lifetime noise exposure and administered questionnaires to assess self-reported listening difficulty and musical training. The results showed no clear link between participants' lifetime noise exposure and performance on any of the auditory processing or speech-in-noise tasks. Musical training was associated with better performance on the auditory processing tasks, but not on the speech-in-noise perception tasks. The results indicate that sentence closure skills, working memory, attention, extended high frequency hearing thresholds and medial olivocochlear suppression strength are important factors that are related to the ability to process speech in noise. Crown Copyright © 2017.

  10. The Effects of Background Noise on the Performance of an Automatic Speech Recogniser

    Science.gov (United States)

    Littlefield, Jason; HashemiSakhtsari, Ahmad

    2002-11-01

    Ambient or environmental noise is a major factor that affects the performance of an automatic speech recognizer. Large-vocabulary, speaker-dependent, continuous speech recognizers are commercially available. Speech recognizers perform well in a quiet environment but poorly in a noisy environment. Speaker-dependent speech recognizers require training prior to being tested, and the level of background noise in both phases affects the performance of the recognizer. This study aims to determine whether the best performance of a speech recognizer occurs when the levels of background noise during the training and test phases are the same, and how the performance is affected when the levels of background noise during the training and test phases differ. The relationship between the performance of the speech recognizer and upgrading the computer speed, amount of memory, and software version was also investigated.
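
    The matched-versus-mismatched conditions hinge on mixing noise into speech at controlled SNRs. A generic helper for that operation might look like the following (NumPy assumed; this is not the study's tooling):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then return the mixture."""
    noise = np.resize(noise, speech.shape)            # loop/trim noise to length
    scale = np.sqrt(np.mean(speech ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
mixture = mix_at_snr(rng.standard_normal(16000), rng.standard_normal(8000), snr_db=5.0)
```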

  11. SII-Based Speech Preprocessing for Intelligibility Improvement in Noise

    DEFF Research Database (Denmark)

    Taal, Cees H.; Jensen, Jesper

    2013-01-01

    filter sets certain frequency bands to zero when they do not contribute to intelligibility anymore. Experiments show large intelligibility improvements with the proposed method when used in stationary speech-shaped noise. However, it was also found that the method does not perform well for speech...... corrupted by a competing speaker. This is due to the fact that the SII is not a reliable intelligibility predictor for fluctuating noise sources. MATLAB code is provided....

  12. Difficulty understanding speech in noise by the hearing impaired: underlying causes and technological solutions.

    Science.gov (United States)

    Healy, Eric W; Yoho, Sarah E

    2016-08-01

    A primary complaint of hearing-impaired individuals involves poor speech understanding when background noise is present. Hearing aids and cochlear implants often allow good speech understanding in quiet backgrounds. But hearing-impaired individuals are highly noise intolerant, and existing devices are not very effective at combating background noise. As a result, speech understanding in noise is often quite poor. In accord with the significance of the problem, considerable effort has been expended toward understanding and remedying this issue. Fortunately, our understanding of the underlying issues is reasonably good. In sharp contrast, effective solutions have remained elusive. One solution that seems promising involves a single-microphone machine-learning algorithm to extract speech from background noise. Data from our group indicate that the algorithm is capable of producing vast increases in speech understanding by hearing-impaired individuals. This paper will first provide an overview of the speech-in-noise problem and outline why hearing-impaired individuals are so noise intolerant. An overview of our approach to solving this problem will follow.

  13. Audio-Visual Speech in Noise Perception in Dyslexia

    Science.gov (United States)

    van Laarhoven, Thijs; Keetels, Mirjam; Schakel, Lemmy; Vroomen, Jean

    2018-01-01

    Individuals with developmental dyslexia (DD) may experience, besides reading problems, other speech-related processing deficits. Here, we examined the influence of visual articulatory information (lip-read speech) at various levels of background noise on auditory word recognition in children and adults with DD. We found that children with a…

  14. Individual differences in language and working memory affect children’s speech recognition in noise

    Science.gov (United States)

    McCreery, Ryan W.; Spratford, Meredith; Kirby, Benjamin; Brennan, Marc

    2017-01-01

    Objective We examined how cognitive and linguistic skills affect speech recognition in noise for children with normal hearing. Children with better working memory and language abilities were expected to have better speech recognition in noise than peers with poorer skills in these domains. Design As part of a prospective, cross-sectional study, children with normal hearing completed speech recognition in noise for three types of stimuli: (1) monosyllabic words, (2) syntactically correct but semantically anomalous sentences and (3) semantically and syntactically anomalous word sequences. Measures of vocabulary, syntax and working memory were used to predict individual differences in speech recognition in noise. Study sample Ninety-six children with normal hearing, who were between 5 and 12 years of age. Results Higher working memory was associated with better speech recognition in noise for all three stimulus types. Higher vocabulary abilities were associated with better recognition in noise for sentences and word sequences, but not for words. Conclusions Working memory and language both influence children’s speech recognition in noise, but the relationships vary across types of stimuli. These findings suggest that clinical assessment of speech recognition is likely to reflect underlying cognitive and linguistic abilities, in addition to a child’s auditory skills, consistent with the Ease of Language Understanding model. PMID:27981855

  15. Relationship between Speech Intelligibility and Speech Comprehension in Babble Noise

    Science.gov (United States)

    Fontan, Lionel; Tardieu, Julien; Gaillard, Pascal; Woisard, Virginie; Ruiz, Robert

    2015-01-01

    Purpose: The authors investigated the relationship between the intelligibility and comprehension of speech presented in babble noise. Method: Forty participants listened to French imperative sentences (commands for moving objects) in a multitalker babble background for which intensity was experimentally controlled. Participants were instructed to…

  16. The relationship between the intelligibility of time-compressed speech and speech in noise in young and elderly listeners

    Science.gov (United States)

    Versfeld, Niek J.; Dreschler, Wouter A.

    2002-01-01

    A conventional measure to determine the ability to understand speech in noisy backgrounds is the so-called speech reception threshold (SRT) for sentences. It yields the signal-to-noise ratio (in dB) for which half of the sentences are correctly perceived. The SRT defines to what degree speech must be audible to a listener in order to become just intelligible. There are indications that elderly listeners have greater difficulty in understanding speech in adverse listening conditions than young listeners. This may be partly due to the differences in hearing sensitivity (presbycusis), hence audibility, but other factors, such as temporal acuity, may also play a significant role. A potential measure for the temporal acuity may be the threshold to which speech can be accelerated, or compressed in time. A new test is introduced where the speech rate is varied adaptively. In analogy to the SRT, the time-compression threshold (or TCT) is then defined as the speech rate (expressed in syllables per second) for which half of the sentences are correctly perceived. In experiment I, the TCT test is introduced and normative data are provided. In experiment II, four groups of subjects (young and elderly normal-hearing and hearing-impaired subjects) participated, and the SRTs in stationary and fluctuating speech-shaped noise were determined, as well as the TCT. The results show that the SRT in fluctuating noise and the TCT are highly correlated. All tests indicate that, even after correction for the hearing loss, elderly normal-hearing subjects perform worse than young normal-hearing subjects. The results indicate that the use of the TCT test or the SRT test in fluctuating noise is preferred over the SRT test in stationary noise.
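
    Both the SRT and the TCT rest on an adaptive track that converges on the 50%-correct point. A minimal 1-up/1-down sketch follows; the step size, trial count, and simulated listener are all illustrative assumptions.

```python
import math
import random

def adaptive_threshold(respond, start=0.0, step=2.0, trials=40):
    """Track a level with a 1-up/1-down rule; `respond(level)` returns True
    when the sentence was repeated correctly. The mean of the last ten
    levels estimates the 50%-correct threshold."""
    level, track = start, []
    for _ in range(trials):
        level += -step if respond(level) else step   # harder after a hit
        track.append(level)
    return sum(track[-10:]) / 10

# Simulated listener whose true threshold is -6 (e.g., dB SNR).
def p_correct(snr, threshold=-6.0, slope=2.0):
    return 1.0 / (1.0 + math.exp(-(snr - threshold) / slope))

print(adaptive_threshold(lambda snr: random.random() < p_correct(snr)))
```

    A 1-up/1-down rule targets 50% correct by construction; real implementations usually shrink the step size after the first reversals, which this sketch omits for brevity.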

  17. Speech intelligibility of normal listeners and persons with impaired hearing in traffic noise

    Science.gov (United States)

    Aniansson, G.; Peterson, Y.

    1983-10-01

    Speech intelligibility (PB words) in traffic-like noise was investigated in a laboratory situation simulating three common listening situations, indoors at 1 and 4 m and outdoors at 1 m. The maximum noise levels still permitting 75% intelligibility of PB words in these three listening situations were also defined. A total of 269 persons were examined. Forty-six had normal hearing, 90 a presbycusis-type hearing loss, 95 a noise-induced hearing loss and 38 a conductive hearing loss. In the indoor situation the majority of the groups with impaired hearing retained good speech intelligibility in 40 dB(A) masking noise. Lowering the noise level to less than 40 dB(A) resulted in a minor, usually insignificant, improvement in speech intelligibility. Listeners with normal hearing maintained good speech intelligibility in the outdoor listening situation at noise levels up to 60 dB(A), without lip-reading (i.e., using non-auditory information). For groups with impaired hearing due to age and/or noise, representing 8% of the population in Sweden, the noise level outdoors had to be lowered to less than 50 dB(A), in order to achieve good speech intelligibility at 1 m without lip-reading.

  18. Improving speech perception in noise with current focusing in cochlear implant users.

    Science.gov (United States)

    Srinivasan, Arthi G; Padilla, Monica; Shannon, Robert V; Landsberger, David M

    2013-05-01

    Cochlear implant (CI) users typically have excellent speech recognition in quiet but struggle with understanding speech in noise. It is thought that broad current spread from stimulating electrodes causes adjacent electrodes to activate overlapping populations of neurons which results in interactions across adjacent channels. Current focusing has been studied as a way to reduce spread of excitation, and therefore, reduce channel interactions. In particular, partial tripolar stimulation has been shown to reduce spread of excitation relative to monopolar stimulation. However, the crucial question is whether this benefit translates to improvements in speech perception. In this study, we compared speech perception in noise with experimental monopolar and partial tripolar speech processing strategies. The two strategies were matched in terms of number of active electrodes, microphone, filterbanks, stimulation rate and loudness (although both strategies used a lower stimulation rate than typical clinical strategies). The results of this study showed a significant improvement in speech perception in noise with partial tripolar stimulation. All subjects benefited from the current focused speech processing strategy. There was a mean improvement in speech recognition threshold of 2.7 dB in a digits in noise task and a mean improvement of 3 dB in a sentences in noise task with partial tripolar stimulation relative to monopolar stimulation. Although the experimental monopolar strategy was worse than the clinical, presumably due to different microphones, frequency allocations and stimulation rates, the experimental partial-tripolar strategy, which had the same changes, showed no acute deficit relative to the clinical. Copyright © 2013 Elsevier B.V. All rights reserved.

  19. Exploring the role of brain oscillations in speech perception in noise: Intelligibility of isochronously retimed speech

    Directory of Open Access Journals (Sweden)

    Vincent Aubanel

    2016-08-01

    Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
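
    The retiming operation amounts to mapping detected anchor times onto an equally spaced grid. The toy helper below computes such target times; the anchor values are invented, and a real implementation would still need a time-scale modification step (for example WSOLA) to move the signal to those targets.

```python
import numpy as np

def isochronous_targets(anchors, rate_hz):
    """Map anchor times (in seconds) onto an isochronous grid at rate_hz,
    keeping the first anchor fixed."""
    anchors = np.asarray(anchors, dtype=float)
    return anchors[0] + np.arange(len(anchors)) / rate_hz

onsets = [0.12, 0.48, 0.81, 1.30, 1.62]   # e.g., stressed-vowel onsets (made up)
print(isochronous_targets(onsets, rate_hz=2.5))
```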

  20. Speech perception for adult cochlear implant recipients in a realistic background noise: effectiveness of preprocessing strategies and external options for improving speech recognition in noise.

    Science.gov (United States)

    Gifford, René H; Revit, Lawrence J

    2010-01-01

    Although cochlear implant patients are achieving increasingly higher levels of performance, speech perception in noise continues to be problematic. The newest generations of implant speech processors are equipped with preprocessing and/or external accessories that are purported to improve listening in noise. Most speech perception measures in the clinical setting, however, do not provide a close approximation to real-world listening environments. To assess speech perception for adult cochlear implant recipients in the presence of a realistic restaurant simulation generated by an eight-loudspeaker (R-SPACE) array in order to determine whether commercially available preprocessing strategies and/or external accessories yield improved sentence recognition in noise. Single-subject, repeated-measures design with two groups of participants: Advanced Bionics and Cochlear Corporation recipients. Thirty-four subjects, ranging in age from 18 to 90 yr (mean 54.5 yr), participated in this prospective study. Fourteen subjects were Advanced Bionics recipients, and 20 subjects were Cochlear Corporation recipients. Speech reception thresholds (SRTs) in semidiffuse restaurant noise originating from an eight-loudspeaker array were assessed with the subjects' preferred listening programs as well as with the addition of either Beam preprocessing (Cochlear Corporation) or the T-Mic accessory option (Advanced Bionics). In Experiment 1, adaptive SRTs with the Hearing in Noise Test sentences were obtained for all 34 subjects. For Cochlear Corporation recipients, SRTs were obtained with their preferred everyday listening program as well as with the addition of Focus preprocessing. For Advanced Bionics recipients, SRTs were obtained with the integrated behind-the-ear (BTE) mic as well as with the T-Mic. Statistical analysis using a repeated-measures analysis of variance (ANOVA) evaluated the effects of the preprocessing strategy or external accessory in reducing the SRT in noise. In addition

  1. Perceptual effects of noise reduction by time-frequency masking of noisy speech.

    Science.gov (United States)

    Brons, Inge; Houben, Rolph; Dreschler, Wouter A

    2012-10-01

    Time-frequency masking is a method for noise reduction that is based on the time-frequency representation of a speech in noise signal. Depending on the estimated signal-to-noise ratio (SNR), each time-frequency unit is either attenuated or not. A special type of a time-frequency mask is the ideal binary mask (IBM), which has access to the real SNR (ideal). The IBM either retains or removes each time-frequency unit (binary mask). The IBM provides large improvements in speech intelligibility and is a valuable tool for investigating how different factors influence intelligibility. This study extends the standard outcome measure (speech intelligibility) with additional perceptual measures relevant for noise reduction: listening effort, noise annoyance, speech naturalness, and overall preference. Four types of time-frequency masking were evaluated: the original IBM, a tempered version of the IBM (called ITM) which applies limited and non-binary attenuation, and non-ideal masking (also tempered) with two different types of noise-estimation algorithms. The results from ideal masking imply that there is a trade-off between intelligibility and sound quality, which depends on the attenuation strength. Additionally, the results for non-ideal masking suggest that subjective measures can show effects of noise reduction even if noise reduction does not lead to differences in intelligibility.
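
    In code, the IBM reduces to thresholding the local SNR of each T-F unit. The sketch below assumes separate access to the speech and noise signals (the "ideal" part); the STFT parameters and the 0 dB criterion are conventional choices, not values taken from this study.

```python
import numpy as np

def stft(x, frame=256, hop=128):
    win = np.hanning(frame)
    frames = [x[i:i + frame] * win for i in range(0, len(x) - frame + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def ideal_binary_mask(speech, noise, criterion_db=0.0):
    """1 = retain a T-F unit (local SNR above criterion), 0 = remove it."""
    S, N = stft(speech), stft(noise)
    local_snr_db = 20 * (np.log10(np.abs(S) + 1e-12) - np.log10(np.abs(N) + 1e-12))
    return (local_snr_db > criterion_db).astype(float)

rng = np.random.default_rng(0)
mask = ideal_binary_mask(rng.standard_normal(4000), rng.standard_normal(4000))
```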

  2. Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2016-01-01

    In this paper, single channel speech enhancement in the time domain is considered. We address the problem of modelling non-stationary speech by describing the voiced speech parts by a harmonic linear chirp model instead of using the traditional harmonic model. This means that the speech signal...... through simulations on synthetic and speech signals, that the chirp versions of the filters perform better than their harmonic counterparts in terms of output signal-to-noise ratio (SNR) and signal reduction factor. For synthetic signals, the output SNR for the harmonic chirp APES based filter...... is increased 3 dB compared to the harmonic APES based filter at an input SNR of 10 dB, and at the same time the signal reduction factor is decreased. For speech signals, the increase is 1.5 dB along with a decrease in the signal reduction factor of 0.7. As an implicit part of the APES filter, a noise...
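
    To see what the harmonic linear chirp model describes, one can synthesize a signal whose k-th partial tracks k times a linearly swept fundamental. The parameter values below are arbitrary illustrations, not fitted speech values.

```python
import numpy as np

def harmonic_chirp(f0=150.0, sweep=30.0, harmonics=8, fs=8000, dur=0.5):
    """Sum of harmonics of a linearly swept fundamental: the k-th partial has
    phase k * 2*pi * (f0*t + 0.5*sweep*t**2), i.e. frequency k*(f0 + sweep*t)."""
    t = np.arange(int(fs * dur)) / fs
    phase = 2 * np.pi * (f0 * t + 0.5 * sweep * t ** 2)
    return sum(np.cos(k * phase) / k for k in range(1, harmonics + 1))

x = harmonic_chirp()   # a voiced-like segment whose pitch rises 15 Hz over 0.5 s
```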

  3. Musical training during early childhood enhances the neural encoding of speech in noise.

    Science.gov (United States)

    Strait, Dana L; Parbery-Clark, Alexandra; Hittner, Emily; Kraus, Nina

    2012-12-01

    For children, learning often occurs in the presence of background noise. As such, there is growing desire to improve a child's access to a target signal in noise. Given adult musicians' perceptual and neural speech-in-noise enhancements, we asked whether similar effects are present in musically-trained children. We assessed the perception and subcortical processing of speech in noise and related cognitive abilities in musician and nonmusician children that were matched for a variety of overarching factors. Outcomes reveal that musicians' advantages for processing speech in noise are present during pivotal developmental years. Supported by correlations between auditory working memory and attention and auditory brainstem response properties, we propose that musicians' perceptual and neural enhancements are driven in a top-down manner by strengthened cognitive abilities with training. Our results may be considered by professionals involved in the remediation of language-based learning deficits, which are often characterized by poor speech perception in noise. Copyright © 2012 Elsevier Inc. All rights reserved.

  4. Speech Denoising in White Noise Based on Signal Subspace Low-rank Plus Sparse Decomposition

    Directory of Open Access Journals (Sweden)

    yuan Shuai

    2017-01-01

    Full Text Available In this paper, a new subspace speech enhancement method using low-rank and sparse decomposition is presented. In the proposed method, we first structure the corrupted data as a Toeplitz matrix and estimate its effective rank for the underlying human speech signal. Then the low-rank and sparse decomposition is performed with the guidance of the speech rank value to remove the noise. Extensive experiments have been carried out under white Gaussian noise conditions, and the experimental results show that the proposed method performs better than conventional speech enhancement methods in terms of yielding less residual noise and lower speech distortion.
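
    The low-rank half of the method can be sketched with a Hankel-type embedding and a truncated SVD (NumPy and SciPy assumed). The energy-based rank rule and matrix sizes are assumptions, and the paper's full method adds a sparse term on top of this step.

```python
import numpy as np
from scipy.linalg import hankel

def low_rank_denoise(frame, ncols=64, energy=0.95):
    """Embed a frame in a Hankel matrix, keep the smallest rank holding
    `energy` of the squared singular values, and average anti-diagonals
    back into a 1-D signal."""
    H = hankel(frame[: len(frame) - ncols + 1], frame[-ncols:])
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    r = int(np.searchsorted(np.cumsum(s ** 2) / np.sum(s ** 2), energy)) + 1
    Hr = (U[:, :r] * s[:r]) @ Vt[:r]
    return np.array([np.fliplr(Hr).diagonal(k).mean()
                     for k in range(Hr.shape[1] - 1, -Hr.shape[0], -1)])

rng = np.random.default_rng(0)
t = np.arange(512) / 8000
noisy = np.sin(2 * np.pi * 200 * t) + 0.3 * rng.standard_normal(512)
denoised = low_rank_denoise(noisy)
```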

  5. Noise and pitch interact during the cortical segregation of concurrent speech.

    Science.gov (United States)

    Bidelman, Gavin M; Yellamsetty, Anusha

    2017-08-01

    Behavioral studies reveal listeners exploit intrinsic differences in voice fundamental frequency (F0) to segregate concurrent speech sounds, the so-called "F0-benefit." More favorable signal-to-noise ratio (SNR) in the environment, an extrinsic acoustic factor, similarly benefits the parsing of simultaneous speech. Here, we examined the neurobiological substrates of these two cues in the perceptual segregation of concurrent speech mixtures. We recorded event-related brain potentials (ERPs) while listeners performed a speeded double-vowel identification task. Listeners heard two concurrent vowels whose F0 differed by zero or four semitones presented in either clean (no noise) or noise-degraded (+5 dB SNR) conditions. Behaviorally, listeners were more accurate in correctly identifying both vowels for larger F0 separations but F0-benefit was more pronounced at more favorable SNRs (i.e., pitch × SNR interaction). Analysis of the ERPs revealed that only the P2 wave (∼200 ms) showed a similar F0 × SNR interaction as behavior and was correlated with listeners' perceptual F0-benefit. Neural classifiers applied to the ERPs further suggested that speech sounds are segregated neurally within 200 ms based on SNR whereas segregation based on pitch occurs later in time (400-700 ms). The earlier timing of extrinsic SNR compared to intrinsic F0-based segregation implies that the cortical extraction of speech from noise is more efficient than differentiating speech based on pitch cues alone, which may recruit additional cortical processes. Findings indicate that noise and pitch differences interact relatively early in cerebral cortex and that the brain arrives at the identities of concurrent speech mixtures as early as ∼200 ms. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Examining speech perception in noise and cognitive functions in the elderly.

    Science.gov (United States)

    Meister, Hartmut; Schreitmüller, Stefan; Grugel, Linda; Beutner, Dirk; Walger, Martin; Meister, Ingo

    2013-12-01

    The purpose of this study was to investigate the relationship between cognitive functions (i.e., working memory [WM]) and speech recognition against different background maskers in older individuals. Speech reception thresholds (SRTs) were determined using a matrix-sentence test. Unmodulated noise, modulated noise (International Collegium for Rehabilitative Audiology [ICRA] noise 5-250), and speech fragments (International Speech Test Signal [ISTS]) were used as background maskers. Verbal WM was assessed using the Verbal Learning and Memory Test (VLMT; Helmstaedter & Durwen, 1990). Measurements were conducted with 14 normal-hearing older individuals and a control group of 12 normal-hearing young listeners. Despite their normal hearing ability, the young listeners outperformed the older individuals in all background maskers. These differences were largest for the modulated maskers. SRTs were significantly correlated with the scores of the VLMT. A linear regression model also included WM as the only significant predictor variable. The results support the assumption that WM plays an important role in speech understanding and that it might have an impact on results obtained using speech audiometry. Thus, an individual's WM capacity should be considered in aural diagnosis and rehabilitation. The VLMT proved to be a clinically applicable test for WM. Further cognitive functions important for speech understanding are currently being investigated within the SAKoLA (Sprachaudiometrie und kognitive Leistungen im Alter [Speech Audiometry and Cognitive Functions in the Elderly]) project.

  7. The Effect of Age and Type of Noise on Speech Perception under Conditions of Changing Context and Noise Levels.

    Science.gov (United States)

    Taitelbaum-Swead, Riki; Fostick, Leah

    2016-01-01

    Everyday life includes fluctuating noise levels, resulting in continuously changing speech intelligibility. The study aims were: (1) to quantify the decrease in age-related speech perception as a result of increasing noise level, and (2) to test the effect of age on context usage at the word level (smaller amount of contextual cues). A total of 24 young adults (age 20-30 years) and 20 older adults (age 60-75 years) were tested. Meaningful and nonsense one-syllable consonant-vowel-consonant words were presented with the background noise types of speech noise (SpN), babble noise (BN), and white noise (WN), with signal-to-noise ratios (SNRs) of 0 and -5 dB. Older adults had lower accuracy at SNR = 0, with WN being the most difficult condition for all participants. Measuring the change in speech perception when SNR decreased showed a reduction of 18.6-61.5% in intelligibility, with an age effect only for BN. Both young and older adults used less phonemic context with WN, as compared to other conditions. Older adults are more affected by an increasing level of fluctuating informational noise than by steady-state noise. They also use fewer contextual cues when perceiving monosyllabic words. Further studies should take into consideration that when the stimulus is presented differently (changes in noise level, fewer contextual cues), other perceptual and cognitive processes are involved. © 2016 S. Karger AG, Basel.

  8. Auditory Brainstem Response to Complex Sounds Predicts Self-Reported Speech-in-Noise Performance

    Science.gov (United States)

    Anderson, Samira; Parbery-Clark, Alexandra; White-Schwoch, Travis; Kraus, Nina

    2013-01-01

    Purpose: To compare the ability of the auditory brainstem response to complex sounds (cABR) to predict subjective ratings of speech understanding in noise on the Speech, Spatial, and Qualities of Hearing Scale (SSQ; Gatehouse & Noble, 2004) relative to the predictive ability of the Quick Speech-in-Noise test (QuickSIN; Killion, Niquette,…

  9. The role of periodicity in perceiving speech in quiet and in background noise.

    Science.gov (United States)

    Steinmetzger, Kurt; Rosen, Stuart

    2015-12-01

    The ability of normal-hearing listeners to perceive sentences in quiet and in background noise was investigated in a variety of conditions mixing the presence and absence of periodicity (i.e., voicing) in both target and masker. Experiment 1 showed that in quiet, aperiodic noise-vocoded speech and speech with a natural amount of periodicity were equally intelligible, while fully periodic speech was much harder to understand. In Experiments 2 and 3, speech reception thresholds for these targets were measured in the presence of four different maskers: speech-shaped noise, harmonic complexes with a dynamically varying F0 contour, and 10 Hz amplitude-modulated versions of both. For experiment 2, results of experiment 1 were used to identify conditions with equal intelligibility in quiet, while in experiment 3 target intelligibility in quiet was near ceiling. In the presence of a masker, periodicity in the target speech mattered little, but listeners strongly benefited from periodicity in the masker. Substantial fluctuating-masker benefits required the target speech to be almost perfectly intelligible in quiet. In summary, results suggest that the ability to exploit periodicity cues may be an even more important factor when attempting to understand speech embedded in noise than the ability to benefit from masker fluctuations.

  10. Comparison of two speech privacy measurements, articulation index (AI) and speech privacy noise isolation class (NIC'), in open workplaces

    Science.gov (United States)

    Yoon, Heakyung C.; Loftness, Vivian

    2002-05-01

    Lack of speech privacy has been reported to be the main source of dissatisfaction among occupants in open workplaces, according to workplace surveys. Two speech privacy measurements, the Articulation Index (AI), standardized by the American National Standards Institute in 1969, and the Speech Privacy Noise Isolation Class (NIC', Noise Isolation Class Prime), adapted from the Noise Isolation Class (NIC) by the U.S. General Services Administration (GSA) in 1979, have been claimed as objective tools to measure speech privacy in open offices. To evaluate which of them, normal privacy for AI or satisfied privacy for NIC', is a better tool in terms of speech privacy in a dynamic open office environment, measurements were taken in the field. AIs and NIC's were measured for different partition heights and workplace configurations, following ASTM E1130 (Standard Test Method for Objective Measurement of Speech Privacy in Open Offices Using Articulation Index) and GSA tests PBS-C.1 (Method for the Direct Measurement of Speech-Privacy Potential (SPP) Based on Subjective Judgments) and PBS-C.2 (Public Building Service Standard Method of Test for the Sufficient Verification of Speech-Privacy Potential (SPP) Based on Objective Measurements Including Methods for the Rating of Functional Interzone Attenuation and NC-Background), respectively.

  11. Air Traffic Controllers’ Long-Term Speech-in-Noise Training Effects: A Control Group Study

    Science.gov (United States)

    Zaballos, María T.P.; Plasencia, Daniel P.; González, María L.Z.; de Miguel, Angel R.; Macías, Ángel R.

    2016-01-01

    Introduction: Speech perception in noise relies on the capacity of the auditory system to process complex sounds using sensory and cognitive skills. The possibility that these can be trained during adulthood is of special interest in auditory disorders, where speech in noise perception becomes compromised. Air traffic controllers (ATC) are constantly exposed to radio communication, a situation that seems to produce auditory learning. The objective of this study has been to quantify this effect. Subjects and Methods: 19 ATC and 19 normal hearing individuals underwent a speech in noise test with three signal to noise ratios: 5, 0 and −5 dB. Noise and speech were presented through two different loudspeakers in azimuth position. Speech tokens were presented at 65 dB SPL, while white noise files were presented at 60, 65 and 70 dB, respectively. Results: Air traffic controllers outperform the control group in all conditions [P<0.05 in ANOVA and Mann-Whitney U tests]. Group differences were largest in the most difficult condition, SNR=−5 dB. However, no correlation between experience and performance was found for any of the conditions tested. The reason might be that ceiling performance is achieved much faster than the minimum experience time recorded, 5 years, although intrinsic cognitive abilities cannot be disregarded. Discussion: ATC demonstrated enhanced ability to hear speech in challenging listening environments. This study provides evidence that long-term auditory training is indeed useful in achieving better speech-in-noise understanding even in adverse conditions, although good cognitive qualities are likely to be a basic requirement for this training to be effective. Conclusion: Our results show that ATC outperform the control group in all conditions. Thus, this study provides evidence that long-term auditory training is indeed useful in achieving better speech-in-noise understanding even in adverse conditions. PMID:27991470

  12. Air traffic controllers' long-term speech-in-noise training effects: A control group study.

    Science.gov (United States)

    Zaballos, Maria T P; Plasencia, Daniel P; González, María L Z; de Miguel, Angel R; Macías, Ángel R

    2016-01-01

    Speech perception in noise relies on the capacity of the auditory system to process complex sounds using sensory and cognitive skills. The possibility that these can be trained during adulthood is of special interest in auditory disorders, where speech in noise perception becomes compromised. Air traffic controllers (ATC) are constantly exposed to radio communication, a situation that seems to produce auditory learning. The objective of this study has been to quantify this effect. 19 ATC and 19 normal hearing individuals underwent a speech in noise test with three signal to noise ratios: 5, 0 and -5 dB. Noise and speech were presented through two different loudspeakers in azimuth position. Speech tokens were presented at 65 dB SPL, while white noise files were presented at 60, 65 and 70 dB, respectively. Air traffic controllers outperform the control group in all conditions [P<0.05 in ANOVA and Mann-Whitney U tests]. Group differences were largest in the most difficult condition, SNR=-5 dB. However, no correlation between experience and performance was found for any of the conditions tested. The reason might be that ceiling performance is achieved much faster than the minimum experience time recorded, 5 years, although intrinsic cognitive abilities cannot be disregarded. ATC demonstrated enhanced ability to hear speech in challenging listening environments. This study provides evidence that long-term auditory training is indeed useful in achieving better speech-in-noise understanding even in adverse conditions, although good cognitive qualities are likely to be a basic requirement for this training to be effective. Our results show that ATC outperform the control group in all conditions. Thus, this study provides evidence that long-term auditory training is indeed useful in achieving better speech-in-noise understanding even in adverse conditions.

  13. Temporal and speech processing skills in normal hearing individuals exposed to occupational noise.

    Science.gov (United States)

    Kumar, U Ajith; Ameenudin, Syed; Sangamanatha, A V

    2012-01-01

    Prolonged exposure to high levels of occupational noise can cause damage to hair cells in the cochlea and result in permanent noise-induced cochlear hearing loss. Consequences of cochlear hearing loss on speech perception and psychophysical abilities have been well documented. The primary goal of this research was to explore temporal processing and speech perception skills in individuals who are exposed to occupational noise of more than 80 dBA and have not yet incurred clinically significant threshold shifts. The contribution of temporal processing skills to speech perception in adverse listening situations was also evaluated. A total of 118 participants took part in this research. Participants comprised three groups of train drivers in the age ranges of 30-40 (n = 13), 41-50 (n = 9), and 51-60 (n = 6) years and their non-noise-exposed counterparts (n = 30 in each age group). Participants of all the groups, including the train drivers, had hearing sensitivity within 25 dB HL at the octave frequencies between 250 Hz and 8 kHz. Temporal processing was evaluated using gap detection, modulation detection, and duration pattern tests. Speech recognition was tested in the presence of multi-talker babble at -5 dB SNR. Differences between experimental and control groups were analyzed using ANOVA and independent-sample t-tests. Results showed a trend of reduced temporal processing skills in individuals with noise exposure. These deficits were observed despite normal peripheral hearing sensitivity. Speech recognition scores in the presence of noise were also significantly poorer in the noise-exposed group. Furthermore, poor temporal processing skills partially accounted for the speech recognition difficulties exhibited by the noise-exposed individuals. These results suggest that noise can cause significant distortions in the processing of suprathreshold temporal cues which may add to difficulties in hearing in adverse listening conditions.

  14. Temporal and speech processing skills in normal hearing individuals exposed to occupational noise

    Directory of Open Access Journals (Sweden)

    U Ajith Kumar

    2012-01-01

    Full Text Available Prolonged exposure to high levels of occupational noise can cause damage to hair cells in the cochlea and result in permanent noise-induced cochlear hearing loss. Consequences of cochlear hearing loss on speech perception and psychophysical abilities have been well documented. The primary goal of this research was to explore temporal processing and speech perception skills in individuals who are exposed to occupational noise of more than 80 dBA and have not yet incurred clinically significant threshold shifts. The contribution of temporal processing skills to speech perception in adverse listening situations was also evaluated. A total of 118 participants took part in this research. Participants comprised three groups of train drivers in the age ranges of 30-40 (n = 13), 41-50 (n = 9), and 51-60 (n = 6) years and their non-noise-exposed counterparts (n = 30 in each age group). Participants of all the groups, including the train drivers, had hearing sensitivity within 25 dB HL at the octave frequencies between 250 Hz and 8 kHz. Temporal processing was evaluated using gap detection, modulation detection, and duration pattern tests. Speech recognition was tested in the presence of multi-talker babble at -5 dB SNR. Differences between experimental and control groups were analyzed using ANOVA and independent-sample t-tests. Results showed a trend of reduced temporal processing skills in individuals with noise exposure. These deficits were observed despite normal peripheral hearing sensitivity. Speech recognition scores in the presence of noise were also significantly poorer in the noise-exposed group. Furthermore, poor temporal processing skills partially accounted for the speech recognition difficulties exhibited by the noise-exposed individuals. These results suggest that noise can cause significant distortions in the processing of suprathreshold temporal cues which may add to difficulties in hearing in adverse listening conditions.

  15. Impact of noise and other factors on speech recognition in anaesthesia

    DEFF Research Database (Denmark)

    Alapetite, Alexandre

    2008-01-01

    ...... operations. Objective: The aim of the experiment is to evaluate the relative impact of several factors affecting speech recognition when used in operating rooms, such as the type or loudness of background noises, type of microphone, type of recognition mode (free speech versus command mode), and type of training. Methods: Eight volunteers read aloud a total of about 3 600 typical short anaesthesia comments to be transcribed by a continuous speech recognition system. Background noises were collected in an operating room and reproduced. A regression analysis and descriptive statistics were done to evaluate the relative effect of various factors. Results: Some factors have a major impact, such as the words to be recognised, the type of recognition, and participants. The type of microphone is especially significant when combined with the type of noise. While loud noises in the operating room can have a predominant

  16. Source Separation via Spectral Masking for Speech Recognition Systems

    Directory of Open Access Journals (Sweden)

    Gustavo Fernandes Rodrigues

    2012-12-01

    Full Text Available In this paper we present an insight into the use of spectral masking techniques in the time-frequency domain as a preprocessing step for speech signal recognition. Speech recognition systems have their performance negatively affected in noisy environments or in the presence of other speech signals. The limits of these masking techniques for different levels of the signal-to-noise ratio are discussed. We show the robustness of the spectral masking techniques against four types of noise: white, pink, brown, and human speech noise (babble noise). The main contribution of this work is to analyze the performance limits of recognition systems using spectral masking. We obtain an increase of 18% in the speech hit rate when the speech signals were corrupted by other speech signals or babble noise, at signal-to-noise ratios of approximately 1, 10 and 20 dB. On the other hand, applying the ideal binary masks to mixtures corrupted by white, pink and brown noise results in an average increase of 9% in the speech hit rate, at the same signal-to-noise ratios. The experimental results suggest that the spectral masking techniques are more suitable for babble noise, which is produced by human speech, than for white, pink and brown noise.

  17. Robust Speaker Authentication Based on Combined Speech and Voiceprint Recognition

    Science.gov (United States)

    Malcangi, Mario

    2009-08-01

    Personal authentication is becoming increasingly important in many applications that have to protect proprietary data. Passwords and personal identification numbers (PINs) prove not to be robust enough to ensure that unauthorized people do not use them. Biometric authentication technology may offer a secure, convenient, accurate solution but sometimes fails due to its intrinsically fuzzy nature. This research aims to demonstrate that combining two basic speech processing methods, voiceprint identification and speech recognition, can provide a very high degree of robustness, especially if fuzzy decision logic is used.

  18. Superlinearly scalable noise robustness of redundant coupled dynamical systems.

    Science.gov (United States)

    Kohar, Vivek; Kia, Behnam; Lindner, John F; Ditto, William L

    2016-03-01

    We illustrate through theory and numerical simulations that redundant coupled dynamical systems can be extremely robust against local noise in comparison to uncoupled dynamical systems evolving in the same noisy environment. Previous studies have shown that the noise robustness of redundant coupled dynamical systems is linearly scalable and deviations due to noise can be minimized by increasing the number of coupled units. Here, we demonstrate that the noise robustness can actually be scaled superlinearly if some conditions are met and very high noise robustness can be realized with very few coupled units. We discuss these conditions and show that this superlinear scalability depends on the nonlinearity of the individual dynamical units. The phenomenon is demonstrated in discrete as well as continuous dynamical systems. This superlinear scalability not only provides us an opportunity to exploit the nonlinearity of physical systems without being bogged down by noise but may also help us in understanding the functional role of coupled redundancy found in many biological systems. Moreover, engineers can exploit superlinear noise suppression by starting a coupled system near (not necessarily at) the appropriate initial condition.
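
    A toy simulation conveys the baseline effect the abstract builds on. Below, N redundant logistic-map units evolve with independent local noise and are coupled by simple averaging, so the per-step deviation from the noiseless update shrinks roughly as 1/sqrt(N) (the linear variance scaling); the superlinear regime discussed in the paper additionally depends on the units' nonlinearity and initial conditions. The map parameter and noise level are illustrative assumptions.

```python
# Redundant coupled logistic maps: averaging N noisy copies suppresses noise.
import numpy as np

def mean_step_deviation(n_units, n_steps=10000, r=3.8, sigma=1e-3, seed=0):
    """Average |coupled noisy update - noiseless update| per iteration."""
    rng = np.random.default_rng(seed)
    x, dev = 0.3, 0.0
    for _ in range(n_steps):
        clean = r * x * (1.0 - x)                              # noiseless update
        units = clean + sigma * rng.standard_normal(n_units)   # noisy redundant copies
        x = float(np.clip(np.mean(units), 0.0, 1.0))           # coupling by averaging
        dev += abs(x - clean)
    return dev / n_steps

for n in (1, 4, 16, 64):
    print(n, mean_step_deviation(n))   # deviation shrinks roughly as 1/sqrt(n)
```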

  19. Patient-reported speech in noise difficulties and hyperacusis symptoms and correlation with test results.

    Science.gov (United States)

    Spyridakou, Chrysa; Luxon, Linda M; Bamiou, Doris E

    2012-07-01

    To compare self-reported symptoms of difficulty hearing speech in noise and hyperacusis in adults with auditory processing disorders (APDs) and normal controls; and to compare self-reported symptoms to objective test results (speech in babble test, transient evoked otoacoustic emission [TEOAE] suppression test using contralateral noise). A prospective case-control pilot study. Twenty-two participants were recruited in the study: 10 patients with reported hearing difficulty, normal audiometry, and a clinical diagnosis of APD; and 12 normal age-matched controls with no reported hearing difficulty. All participants completed the validated Amsterdam Inventory for Auditory Disability questionnaire, a hyperacusis questionnaire, a speech in babble test, and a TEOAE suppression test using contralateral noise. Patients had significantly worse scores than controls in all domains of the Amsterdam Inventory questionnaire (with the exception of sound detection) and the hyperacusis questionnaire. Significant correlations were found between self-reported symptoms of difficulty hearing speech in noise and speech in babble test results in the right ear (ρ = 0.624, P = .002), and between self-reported symptoms of hyperacusis and TEOAE suppression test results in the right ear (ρ = -0.597, P = .003). There was no significant correlation between the two tests. A strong correlation was observed between right ear speech in babble and patient-reported intelligibility of speech in noise, and right ear TEOAE suppression by contralateral noise and hyperacusis questionnaire. Copyright © 2012 The American Laryngological, Rhinological, and Otological Society, Inc.

  20. Musical Experience and the Aging Auditory System: Implications for Cognitive Abilities and Hearing Speech in Noise

    Science.gov (United States)

    Parbery-Clark, Alexandra; Strait, Dana L.; Anderson, Samira; Hittner, Emily; Kraus, Nina

    2011-01-01

    Much of our daily communication occurs in the presence of background noise, compromising our ability to hear. While understanding speech in noise is a challenge for everyone, it becomes increasingly difficult as we age. Although aging is generally accompanied by hearing loss, this perceptual decline cannot fully account for the difficulties experienced by older adults for hearing in noise. Decreased cognitive skills concurrent with reduced perceptual acuity are thought to contribute to the difficulty older adults experience understanding speech in noise. Given that musical experience positively impacts speech perception in noise in young adults (ages 18–30), we asked whether musical experience benefits an older cohort of musicians (ages 45–65), potentially offsetting the age-related decline in speech-in-noise perceptual abilities and associated cognitive function (i.e., working memory). Consistent with performance in young adults, older musicians demonstrated enhanced speech-in-noise perception relative to nonmusicians along with greater auditory, but not visual, working memory capacity. By demonstrating that speech-in-noise perception and related cognitive function are enhanced in older musicians, our results imply that musical training may reduce the impact of age-related auditory decline. PMID:21589653

  1. Musical experience and the aging auditory system: implications for cognitive abilities and hearing speech in noise.

    Science.gov (United States)

    Parbery-Clark, Alexandra; Strait, Dana L; Anderson, Samira; Hittner, Emily; Kraus, Nina

    2011-05-11

    Much of our daily communication occurs in the presence of background noise, compromising our ability to hear. While understanding speech in noise is a challenge for everyone, it becomes increasingly difficult as we age. Although aging is generally accompanied by hearing loss, this perceptual decline cannot fully account for the difficulties experienced by older adults for hearing in noise. Decreased cognitive skills concurrent with reduced perceptual acuity are thought to contribute to the difficulty older adults experience understanding speech in noise. Given that musical experience positively impacts speech perception in noise in young adults (ages 18-30), we asked whether musical experience benefits an older cohort of musicians (ages 45-65), potentially offsetting the age-related decline in speech-in-noise perceptual abilities and associated cognitive function (i.e., working memory). Consistent with performance in young adults, older musicians demonstrated enhanced speech-in-noise perception relative to nonmusicians along with greater auditory, but not visual, working memory capacity. By demonstrating that speech-in-noise perception and related cognitive function are enhanced in older musicians, our results imply that musical training may reduce the impact of age-related auditory decline.

  2. Musical experience and the aging auditory system: implications for cognitive abilities and hearing speech in noise.

    Directory of Open Access Journals (Sweden)

    Alexandra Parbery-Clark

    Full Text Available Much of our daily communication occurs in the presence of background noise, compromising our ability to hear. While understanding speech in noise is a challenge for everyone, it becomes increasingly difficult as we age. Although aging is generally accompanied by hearing loss, this perceptual decline cannot fully account for the difficulties experienced by older adults for hearing in noise. Decreased cognitive skills concurrent with reduced perceptual acuity are thought to contribute to the difficulty older adults experience understanding speech in noise. Given that musical experience positively impacts speech perception in noise in young adults (ages 18-30), we asked whether musical experience benefits an older cohort of musicians (ages 45-65), potentially offsetting the age-related decline in speech-in-noise perceptual abilities and associated cognitive function (i.e., working memory). Consistent with performance in young adults, older musicians demonstrated enhanced speech-in-noise perception relative to nonmusicians along with greater auditory, but not visual, working memory capacity. By demonstrating that speech-in-noise perception and related cognitive function are enhanced in older musicians, our results imply that musical training may reduce the impact of age-related auditory decline.

  3. Low-dimensional recurrent neural network-based Kalman filter for speech enhancement.

    Science.gov (United States)

    Xia, Youshen; Wang, Jun

    2015-07-01

    This paper proposes a new recurrent neural network-based Kalman filter for speech enhancement, based on a noise-constrained least squares estimate. The parameters of the speech signal, modeled as an autoregressive process, are first estimated using the proposed recurrent neural network, and the speech signal is then recovered by Kalman filtering. The proposed recurrent neural network is globally asymptotically stable at the noise-constrained estimate. Because the noise-constrained estimate is robust against non-Gaussian noise, the proposed recurrent neural network-based speech enhancement algorithm can minimize the estimation error of the Kalman filter parameters in non-Gaussian noise. Furthermore, owing to its low-dimensional model structure, the proposed neural network-based speech enhancement algorithm is much faster than two existing recurrent neural network-based speech enhancement algorithms. Simulation results show that the proposed recurrent neural network-based speech enhancement algorithm achieves good performance with fast computation and effective noise reduction. Copyright © 2015 Elsevier Ltd. All rights reserved.
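
    For orientation, the classical Kalman-filter back end that such a system drives can be sketched as follows. Here the AR coefficients come from an ordinary Yule-Walker fit rather than the paper's recurrent network, and the process- and observation-noise variances are placeholder assumptions.

```python
# Classical AR-model Kalman filtering for speech enhancement (sketch).
import numpy as np

def ar_coefficients(frame, order=10):
    """Yule-Walker estimate of AR predictor coefficients for one frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R + 1e-6 * np.eye(order), r[1:order + 1])

def kalman_enhance(noisy, order=10, noise_var=1e-2):
    """Scalar-observation Kalman filter over an AR(order) speech model."""
    a = ar_coefficients(noisy, order)
    F = np.zeros((order, order))           # state transition in companion form
    F[0, :] = a
    F[1:, :-1] = np.eye(order - 1)
    H = np.zeros((1, order))
    H[0, 0] = 1.0                          # observe newest sample plus noise
    Q = np.zeros((order, order))
    Q[0, 0] = 0.1 * np.var(noisy)          # process-noise variance (assumed)
    x = np.zeros((order, 1))
    P = np.eye(order)
    out = np.zeros(len(noisy))
    for t, y in enumerate(noisy):
        x = F @ x                          # predict
        P = F @ P @ F.T + Q
        K = P @ H.T / (H @ P @ H.T + noise_var)   # Kalman gain
        x = x + K * (y - (H @ x))          # update with the noisy observation
        P = (np.eye(order) - K @ H) @ P
        out[t] = x[0, 0]
    return out
```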

  4. The Speech multi features fusion perceptual hash algorithm based on tensor decomposition

    Science.gov (United States)

    Huang, Y. B.; Fan, M. H.; Zhang, Q. Y.

    2018-03-01

    With constant progress in modern speech communication technologies, speech data are prone to being attacked by noise or malicious tampering. To give a speech perceptual hash algorithm strong robustness and high efficiency, this paper puts forward a speech perceptual hash algorithm based on tensor decomposition and multiple features. The algorithm analyses perceptual speech features by acquiring the wavelet packet decomposition of each speech component; the LPCC, LSP and ISP features of each component are extracted to constitute a speech feature tensor. Speech authentication is done by generating hash values through feature-matrix quantification using the median value. Experimental results show that the proposed algorithm is robust to content-preserving operations compared with similar algorithms and is able to resist attack by common background noise. The algorithm is also computationally efficient, able to meet the real-time requirements of speech communication and to complete speech authentication quickly.
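
    The quantification step lends itself to a brief sketch. Below, a feature matrix is binarised against per-dimension median values and two hashes are compared by bit error rate; the wavelet packet decomposition, LPCC/LSP/ISP extraction, and tensor decomposition that would produce the features are elided, and the toy feature matrices are random placeholders.

```python
# Median-quantised perceptual hash and a bit-error-rate comparison (sketch).
import numpy as np

def perceptual_hash(features):
    """Binarise each feature dimension against its median value."""
    med = np.median(features, axis=1, keepdims=True)
    return (features > med).astype(np.uint8).ravel()

def bit_error_rate(h1, h2):
    return np.mean(h1 != h2)

# Two feature matrices from "the same content" should yield a low BER.
rng = np.random.default_rng(1)
f1 = rng.standard_normal((12, 100))
f2 = f1 + 0.05 * rng.standard_normal((12, 100))   # mild distortion
print(bit_error_rate(perceptual_hash(f1), perceptual_hash(f2)))
```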

  5. Biological impact of preschool music classes on processing speech in noise

    OpenAIRE

    Strait, Dana L.; Parbery-Clark, Alexandra; O’Connell, Samantha; Kraus, Nina

    2013-01-01

    Musicians have increased resilience to the effects of noise on speech perception and its neural underpinnings. We do not know, however, how early in life these enhancements arise. We compared auditory brainstem responses to speech in noise in 32 preschool children, half of whom were engaged in music training. Thirteen children returned for testing one year later, permitting the first longitudinal assessment of subcortical auditory function with music training. Results indicate emerging neural...

  6. Updating working memory in aircraft noise and speech noise causes different fMRI activations.

    Science.gov (United States)

    Saetrevik, Bjørn; Sörqvist, Patrik

    2015-02-01

    The present study used fMRI/BOLD neuroimaging to investigate how visual-verbal working memory is updated when exposed to three different background-noise conditions: speech noise, aircraft noise and silence. The number-updating task that was used can distinguish between "substitution processes," which involve adding new items to the working memory representation and suppressing old items, and "exclusion processes," which involve rejecting new items and maintaining an intact memory set. The current findings supported the findings of a previous study by showing that substitution activated the dorsolateral prefrontal cortex, the posterior medial frontal cortex and the parietal lobes, whereas exclusion activated the anterior medial frontal cortex. Moreover, the prefrontal cortex was activated more by substitution processes when exposed to background speech than when exposed to aircraft noise. These results indicate that (a) the prefrontal cortex plays a special role when task-irrelevant materials should be denied access to working memory and (b) that, when compensating for different types of noise, either different cognitive mechanisms are involved or those cognitive mechanisms that are involved are involved to different degrees. © 2014 The Authors. Scandinavian Journal of Psychology published by Scandinavian Psychological Associations and John Wiley & Sons Ltd.

  7. DFT-Domain Based Single-Microphone Noise Reduction for Speech Enhancement

    DEFF Research Database (Denmark)

    C. Hendriks, Richard; Gerkmann, Timo; Jensen, Jesper

    As speech processing devices like mobile phones, voice controlled devices, and hearing aids have increased in popularity, people expect them to work anywhere and at any time without user intervention. However, the presence of acoustical disturbances limits the use of these applications, degrades their performance, or causes the user difficulties in understanding the conversation or appreciating the device. A common way to reduce the effects of such disturbances is through the use of single-microphone noise reduction algorithms for speech enhancement. The field of single-microphone noise reduction......

  8. The benefit obtained from visually displayed text from an automatic speech recognizer during listening to speech presented in noise

    NARCIS (Netherlands)

    Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.

    2008-01-01

    OBJECTIVES: The aim of this study was to evaluate the benefit that listeners obtain from visually presented output from an automatic speech recognition (ASR) system during listening to speech in noise. DESIGN: Auditory-alone and audiovisual speech reception thresholds (SRTs) were measured. The SRT

  9. Effects of Background Noise on Cortical Encoding of Speech in Autism Spectrum Disorders

    Science.gov (United States)

    Russo, Nicole; Zecker, Steven; Trommer, Barbara; Chen, Julia; Kraus, Nina

    2009-01-01

    This study provides new evidence of deficient auditory cortical processing of speech in noise in autism spectrum disorders (ASD). Speech-evoked responses (approximately 100-300 ms) in quiet and background noise were evaluated in typically-developing (TD) children and children with ASD. ASD responses showed delayed timing (both conditions) and…

  10. Simulation of modified hybrid noise reduction algorithm to enhance the speech quality

    International Nuclear Information System (INIS)

    Waqas, A.; Muhammad, T.; Jamal, H.

    2013-01-01

    Speech is the most essential means of human communication. Mobile telephony, hearing aids, and hands-free sets are specific applications in this respect. The performance of these communication devices can be affected by the distortions that accompany the speech. Two essential sorts of distortion may be recognized, namely convolutive and additive noise. These distortions contaminate the clean speech and make it unsatisfactory to human listeners, i.e., the perceptual quality and intelligibility of the speech signal diminish. The objective of speech enhancement systems is to improve the quality and intelligibility of speech to make it more acceptable to listeners. This paper recommends a modified hybrid approach for single-channel devices to process noisy signals, considering only the effect of background noise. It is a combination of a pre-processing relative spectral (RASTA) filter, approximated by a straightforward 4th-order band-pass filter, and a conventional minimum mean square error short-time spectral amplitude (MMSE STSA85) estimator. To analyze the performance of the algorithm, an objective parameter called Perceptual Evaluation of Speech Quality (PESQ) is measured. The results show that the modified algorithm performs well in removing background noise. A SIMULINK implementation is also performed and its profile report has been generated to observe the execution time. (author)
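
    The pre-processing stage described above can be approximated in a few lines. The sketch below band-pass filters the log-magnitude trajectory of each frequency bin, which is the essence of the RASTA operation; the modulation pass band of roughly 1-12 Hz and the framing parameters are assumptions, and the MMSE-STSA estimator that follows in the hybrid is omitted.

```python
# RASTA-style modulation filtering of a log-magnitude spectrogram (sketch).
import numpy as np
from scipy.signal import stft, butter, filtfilt

def rasta_log_spectrum(signal, fs, nperseg=256, noverlap=192):
    """Band-pass filter the log-magnitude trajectory of each frequency bin."""
    _, _, X = stft(signal, fs, nperseg=nperseg, noverlap=noverlap)
    frame_rate = fs / (nperseg - noverlap)          # spectral frames per second
    log_mag = np.log(np.abs(X) + 1e-12)
    # butter(2, ..., 'bandpass') yields a 4th-order band-pass filter overall.
    b, a = butter(2, [1.0, 12.0], btype="bandpass", fs=frame_rate)
    return filtfilt(b, a, log_mag, axis=1)          # removes slow and fast modulations

fs = 8000
x = np.random.randn(2 * fs)                         # placeholder noisy speech
filtered_logmag = rasta_log_spectrum(x, fs)
```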

  11. Children's Speech Perception in Noise: Evidence for Dissociation From Language and Working Memory.

    Science.gov (United States)

    Magimairaj, Beula M; Nagaraj, Naveen K; Benafield, Natalie J

    2018-05-17

    We examined the association between speech perception in noise (SPIN), language abilities, and working memory (WM) capacity in school-age children. Existing studies supporting the Ease of Language Understanding (ELU) model suggest that WM capacity plays a significant role in adverse listening situations. Eighty-three children between the ages of 7 and 11 years participated. The sample represented a continuum of individual differences in attention, memory, and language abilities. All children had normal-range hearing and normal-range nonverbal IQ. Children completed the Bamford-Kowal-Bench Speech-in-Noise Test (BKB-SIN; Etymotic Research, 2005), a selective auditory attention task, and multiple measures of language and WM. Partial correlations (controlling for age) showed significant positive associations among attention, memory, and language measures. However, BKB-SIN did not correlate significantly with any of the other measures. Principal component analysis revealed a distinct WM factor and a distinct language factor. BKB-SIN loaded robustly as a distinct 3rd factor with minimal secondary loading from sentence recall and short-term memory. Nonverbal IQ loaded as a 4th factor. Results did not support an association between SPIN and WM capacity in children. However, in this study, a single SPIN measure was used. Future studies using multiple SPIN measures are warranted. Evidence from the current study supports the use of the BKB-SIN as a clinical measure of speech perception ability because it was not influenced by variation in children's language and memory abilities. More large-scale studies in school-age children are needed to replicate the proposed role played by WM in adverse listening situations.

  12. Effects of irrelevant speech and traffic noise on speech perception and cognitive performance in elementary school children.

    Science.gov (United States)

    Klatte, Maria; Meis, Markus; Sukowski, Helga; Schick, August

    2007-01-01

    The effects of background noise of moderate intensity on short-term storage and processing of verbal information were analyzed in 6- to 8-year-old children. In line with adult studies on the "irrelevant sound effect" (ISE), serial recall of visually presented digits was severely disrupted by background speech that the children did not understand. Train noises of equal intensity, however, had no effect. Similar results were demonstrated with tasks requiring storage and processing of heard information. Memory for nonwords, execution of oral instructions, and categorizing speech sounds were significantly disrupted by irrelevant speech. The affected functions play a fundamental role in the acquisition of spoken and written language. Implications concerning current models of the ISE and the acoustic conditions in schools and kindergartens are discussed.

  13. Reliability of Interaural Time Difference-Based Localization Training in Elderly Individuals with Speech-in-Noise Perception Disorder.

    Science.gov (United States)

    Delphi, Maryam; Lotfi, M-Yones; Moossavi, Abdollah; Bakhshi, Enayatollah; Banimostafa, Maryam

    2017-09-01

    Previous studies have shown that interaural-time-difference (ITD) training can improve localization ability. Surprisingly little is, however, known about localization training vis-à-vis speech perception in noise based on interaural time difference in the envelope (ITD ENV). We sought to investigate the reliability of an ITD ENV-based training program in speech-in-noise perception among elderly individuals with normal hearing and speech-in-noise disorder. The present interventional study was performed during 2016. Sixteen elderly men between 55 and 65 years of age with a clinical diagnosis of normal hearing up to 2000 Hz and speech-in-noise perception disorder participated in this study. The localization training program was based on changes in ITD ENV. In order to evaluate the reliability of the training program, we performed speech-in-noise tests before the training program, immediately afterward, and then at 2 months' follow-up. The reliability of the training program was analyzed using the Friedman test and the SPSS software. Significant statistical differences were shown in the mean scores of speech-in-noise perception between the 3 time points (P=0.001). The results also indicated no difference in the mean scores of speech-in-noise perception between the 2 time points of immediately after the training program and 2 months' follow-up (P=0.212). The present study showed the reliability of an ITD ENV-based localization training program in elderly individuals with speech-in-noise perception disorder.

  14. Auditory Peripheral Processing of Degraded Speech

    National Research Council Canada - National Science Library

    Ghitza, Oded

    2003-01-01

    ...". The underlying thesis is that the auditory periphery contributes to the robust performance of humans in speech reception in noise through a concerted contribution of the efferent feedback system...

  15. Modeling speech intelligibility based on the signal-to-noise envelope power ratio

    DEFF Research Database (Denmark)

    Jørgensen, Søren

    The thesis combines the concept of modulation frequency selectivity in the auditory processing of sound with a decision metric for intelligibility that is based on the signal-to-noise envelope power ratio (SNRenv). The proposed speech-based envelope power spectrum model (sEPSM) is demonstrated to account for the effects of stationary...... through three commercially available mobile phones. The model successfully accounts for the performance across the phones in conditions with a stationary speech-shaped background noise, whereas deviations were observed in conditions with “Traffic” and “Pub” noise. Overall, the results of this thesis
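
    The decision metric can be illustrated with a simplified computation: the envelope power of the noisy speech and of the noise alone is compared within a modulation band, and the speech contribution is taken as the excess envelope power. Hilbert envelopes and a single 2-8 Hz modulation band are simplifications relative to the full sEPSM front end.

```python
# Simplified signal-to-noise envelope power ratio (SNRenv) computation.
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def envelope_band_power(x, fs, band):
    env = np.abs(hilbert(x))                   # Hilbert envelope
    env = env - np.mean(env)                   # keep only the modulations
    sos = butter(2, band, btype="bandpass", fs=fs, output="sos")
    return np.mean(sosfiltfilt(sos, env) ** 2)

def snr_env_db(noisy_speech, noise, fs, band=(2.0, 8.0)):
    p_mix = envelope_band_power(noisy_speech, fs, band)
    p_noise = envelope_band_power(noise, fs, band)
    p_speech = max(p_mix - p_noise, 1e-12)     # excess envelope power from speech
    return 10.0 * np.log10(p_speech / p_noise)
```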

  16. Auditory and Cognitive Factors Associated with Speech-in-Noise Complaints following Mild Traumatic Brain Injury.

    Science.gov (United States)

    Hoover, Eric C; Souza, Pamela E; Gallun, Frederick J

    2017-04-01

    Auditory complaints following mild traumatic brain injury (MTBI) are common, but few studies have addressed the role of auditory temporal processing in speech recognition complaints. In this study, deficits understanding speech in a background of speech noise following MTBI were evaluated with the goal of comparing the relative contributions of auditory and nonauditory factors. A matched-groups design was used in which a group of listeners with a history of MTBI were compared to a group matched in age and pure-tone thresholds, as well as a control group of young listeners with normal hearing (YNH). Of the 33 listeners who participated in the study, 13 were included in the MTBI group (mean age = 46.7 yr), 11 in the Matched group (mean age = 49 yr), and 9 in the YNH group (mean age = 20.8 yr). Speech-in-noise deficits were evaluated using subjective measures as well as monaural word (Words-in-Noise test) and sentence (Quick Speech-in-Noise test) tasks, and a binaural spatial release task. Performance on these measures was compared to psychophysical tasks that evaluate monaural and binaural temporal fine structure and spectral resolution. Cognitive measures of attention, processing speed, and working memory were evaluated as possible causes of differences between MTBI and Matched groups that might contribute to speech-in-noise perception deficits. A high proportion of listeners in the MTBI group reported difficulty understanding speech in noise (84%) compared to the Matched group (9.1%), and listeners who reported difficulty were more likely to have abnormal results on objective measures of speech in noise. No significant group differences were found between the MTBI and Matched listeners on any of the measures reported, but the number of abnormal tests differed across groups. Regression analysis revealed that a combination of auditory and auditory processing factors contributed to monaural speech-in-noise scores, but the benefit of spatial separation was

  17. White noise speech illusion and psychosis expression: An experimental investigation of psychosis liability.

    Science.gov (United States)

    Pries, Lotta-Katrin; Guloksuz, Sinan; Menne-Lothmann, Claudia; Decoster, Jeroen; van Winkel, Ruud; Collip, Dina; Delespaul, Philippe; De Hert, Marc; Derom, Catherine; Thiery, Evert; Jacobs, Nele; Wichers, Marieke; Simons, Claudia J P; Rutten, Bart P F; van Os, Jim

    2017-01-01

    An association between white noise speech illusion and psychotic symptoms has been reported in patients and their relatives. This supports the theory that bottom-up and top-down perceptual processes are involved in the mechanisms underlying perceptual abnormalities. However, findings in nonclinical populations have been conflicting. The aim of this study was to examine the association between white noise speech illusion and subclinical expression of psychotic symptoms in a nonclinical sample. Findings were compared to previous results to investigate potential methodology dependent differences. In a general population adolescent and young adult twin sample (n = 704), the association between white noise speech illusion and subclinical psychotic experiences, using the Structured Interview for Schizotypy-Revised (SIS-R) and the Community Assessment of Psychic Experiences (CAPE), was analyzed using multilevel logistic regression analyses. Perception of any white noise speech illusion was not associated with either positive or negative schizotypy in the general population twin sample, using the method by Galdos et al. (2011) (positive: ORadjusted: 0.82, 95% CI: 0.6-1.12, p = 0.217; negative: ORadjusted: 0.75, 95% CI: 0.56-1.02, p = 0.065) and the method by Catalan et al. (2014) (positive: ORadjusted: 1.11, 95% CI: 0.79-1.57, p = 0.557). No association was found between CAPE scores and speech illusion (ORadjusted: 1.25, 95% CI: 0.88-1.79, p = 0.220). For the Catalan et al. (2014) but not the Galdos et al. (2011) method, a negative association was apparent between positive schizotypy and speech illusion with positive or negative affective valence (ORadjusted: 0.44, 95% CI: 0.24-0.81, p = 0.008). Contrary to findings in clinical populations, white noise speech illusion may not be associated with psychosis proneness in nonclinical populations.

  18. CAR2 - Czech Database of Car Speech

    Directory of Open Access Journals (Sweden)

    P. Sovka

    1999-12-01

    Full Text Available This paper presents a new Czech-language two-channel (stereo) speech database recorded in a car environment. The created database was designed for experiments with speech enhancement for communication purposes and for the study and design of robust speech recognition systems. Tools for automated phoneme labelling based on Baum-Welch re-estimation were realised. A noise analysis of the car background environment was done.

  19. CAR2 - Czech Database of Car Speech

    OpenAIRE

    Pollak, P.; Vopicka, J.; Hanzl, V.; Sovka, Pavel

    1999-01-01

    This paper presents a new Czech-language two-channel (stereo) speech database recorded in a car environment. The created database was designed for experiments with speech enhancement for communication purposes and for the study and design of robust speech recognition systems. Tools for automated phoneme labelling based on Baum-Welch re-estimation were realised. A noise analysis of the car background environment was done.

  20. Reliability of Interaural Time Difference-Based Localization Training in Elderly Individuals with Speech-in-Noise Perception Disorder

    Directory of Open Access Journals (Sweden)

    Maryam Delphi

    2017-09-01

    Full Text Available Background: Previous studies have shown that interaural-time-difference (ITD) training can improve localization ability. Surprisingly little is, however, known about localization training vis-à-vis speech perception in noise based on interaural time difference in the envelope (ITD ENV). We sought to investigate the reliability of an ITD ENV-based training program in speech-in-noise perception among elderly individuals with normal hearing and speech-in-noise disorder. Methods: The present interventional study was performed during 2016. Sixteen elderly men between 55 and 65 years of age with a clinical diagnosis of normal hearing up to 2000 Hz and speech-in-noise perception disorder participated in this study. The localization training program was based on changes in ITD ENV. In order to evaluate the reliability of the training program, we performed speech-in-noise tests before the training program, immediately afterward, and then at 2 months’ follow-up. The reliability of the training program was analyzed using the Friedman test and the SPSS software. Results: Significant statistical differences were shown in the mean scores of speech-in-noise perception between the 3 time points (P=0.001). The results also indicated no difference in the mean scores of speech-in-noise perception between the 2 time points of immediately after the training program and 2 months’ follow-up (P=0.212). Conclusion: The present study showed the reliability of an ITD ENV-based localization training program in elderly individuals with speech-in-noise perception disorder.

  1. Some Neurocognitive Correlates of Noise-Vocoded Speech Perception in Children with Normal Hearing: A Replication and Extension of Eisenberg et al., 2002

    Science.gov (United States)

    Roman, Adrienne S.; Pisoni, David B.; Kronenberger, William G.; Faulkner, Kathleen F.

    2016-01-01

    Lexically easy words were recognized better than lexically hard words in sentences. Older children perceived noise-vocoded speech better than younger children. Finally, we found that measures of auditory attention and short-term memory capacity were significantly correlated with a child’s ability to perceive noise-vocoded isolated words and sentences. Conclusions: First, we successfully replicated the major findings from the Eisenberg et al. (2002) study. Because familiarity, phonological distinctiveness and lexical competition affect word recognition, these findings provide additional support for the proposal that several foundational elementary neurocognitive processes underlie the perception of spectrally-degraded speech. Second, we found strong and significant correlations between performance on neurocognitive measures and children’s ability to recognize words and sentences noise-vocoded to four spectral channels. These findings extend earlier research suggesting that perception of spectrally-degraded speech reflects early peripheral auditory processes as well as additional contributions of executive function, specifically, selective attention and short-term memory processes in spoken word recognition. The present findings suggest that auditory attention and short-term memory support robust spoken word recognition in children with NH even under compromised and challenging listening conditions. These results are relevant to research carried out with listeners who have hearing loss, since they are routinely required to encode, process and understand spectrally-degraded acoustic signals. PMID:28045787

  2. Speech Recognition in Real-Life Background Noise by Young and Middle-Aged Adults with Normal Hearing

    OpenAIRE

    Lee, Ji Young; Lee, Jin Tae; Heo, Hye Jeong; Choi, Chul-Hee; Choi, Seong Hee; Lee, Kyungjae

    2015-01-01

    Background and Objectives People usually converse in real-life background noise. They experience more difficulty understanding speech in noise than in a quiet environment. The present study investigated how speech recognition in real-life background noise is affected by the type of noise, signal-to-noise ratio (SNR), and age. Subjects and Methods Eighteen young adults and fifteen middle-aged adults with normal hearing participated in the present study. Three types of noise [subway noise, vacu...

  3. Relation between speech-in-noise threshold, hearing loss and cognition from 40-69 years of age.

    Science.gov (United States)

    Moore, David R; Edmondson-Jones, Mark; Dawes, Piers; Fortnum, Heather; McCormack, Abby; Pierzycki, Robert H; Munro, Kevin J

    2014-01-01

    Healthy hearing depends on sensitive ears and adequate brain processing. Essential aspects of both hearing and cognition decline with advancing age, but it is largely unknown how one influences the other. The current standard measure of hearing, the pure-tone audiogram, is not very cognitively demanding and does not predict well the most important yet challenging use of hearing, listening to speech in noisy environments. We analysed data from UK Biobank that asked 40-69 year olds about their hearing, and assessed their ability on tests of speech-in-noise hearing and cognition. About half a million volunteers were recruited through NHS registers. Respondents completed 'whole-body' testing in purpose-designed, community-based test centres across the UK. Objective hearing (spoken digit recognition in noise) and cognitive (reasoning, memory, processing speed) data were analysed using logistic and multiple regression methods. Speech hearing in noise declined exponentially with age for both sexes from about 50 years, differing from previous audiogram data that showed a more linear decline. The decline in speech-in-noise hearing was especially dramatic among those with lower cognitive scores. Decreasing cognitive ability and increasing age were both independently associated with decreasing ability to hear speech in noise (0.70 and 0.89 dB, respectively) among the population studied. Men subjectively reported up to 60% higher rates of difficulty hearing than women. A history of workplace noise was associated with difficulty in both subjective hearing and objective speech hearing in noise. A history of leisure noise was associated with subjective, but not with objective, difficulty hearing. Older people have declining cognitive processing ability associated with reduced ability to hear speech in noise, measured by recognition of recorded spoken digits. Subjective reports of hearing difficulty generally show a higher prevalence than objective measures, suggesting that current objective methods could

  4. Cortical Mechanisms of Speech Perception in Noise

    Science.gov (United States)

    Wong, Patrick C. M.; Uppunda, Ajith K.; Parrish, Todd B.; Dhar, Sumitrajit

    2008-01-01

    Purpose: The present study examines the brain basis of listening to spoken words in noise, which is a ubiquitous characteristic of communication, with the focus on the dorsal auditory pathway. Method: English-speaking young adults identified single words in 3 listening conditions while their hemodynamic response was measured using fMRI: speech in…

  5. Effects of noise and working memory capacity on memory processing of speech for hearing-aid users.

    Science.gov (United States)

    Ng, Elaine Hoi Ning; Rudner, Mary; Lunner, Thomas; Pedersen, Michael Syskind; Rönnberg, Jerker

    2013-07-01

    It has been shown that noise reduction algorithms can reduce the negative effects of noise on memory processing in persons with normal hearing. The objective of the present study was to investigate whether a similar effect can be obtained for persons with hearing impairment and whether such an effect is dependent on individual differences in working memory capacity. A sentence-final word identification and recall (SWIR) test was conducted in two noise backgrounds with and without noise reduction as well as in quiet. Working memory capacity was measured using a reading span (RS) test. Twenty-six experienced hearing-aid users with moderate to moderately severe sensorineural hearing loss. Noise impaired recall performance. Competing speech disrupted memory performance more than speech-shaped noise. For late list items the disruptive effect of the competing speech background was virtually cancelled out by noise reduction for persons with high working memory capacity. Noise reduction can reduce the adverse effect of noise on memory for speech for persons with good working memory capacity. We argue that the mechanism behind this is faster word identification that enhances encoding into working memory.

  6. Deep Learning-Based Noise Reduction Approach to Improve Speech Intelligibility for Cochlear Implant Recipients.

    Science.gov (United States)

    Lai, Ying-Hui; Tsao, Yu; Lu, Xugang; Chen, Fei; Su, Yu-Ting; Chen, Kuang-Chao; Chen, Yu-Hsuan; Chen, Li-Ching; Po-Hung Li, Lieber; Lee, Chin-Hui

    2018-01-20

    We investigate the clinical effectiveness of a novel deep learning-based noise reduction (NR) approach under noisy conditions with challenging noise types at low signal to noise ratio (SNR) levels for Mandarin-speaking cochlear implant (CI) recipients. The deep learning-based NR approach used in this study consists of two modules: noise classifier (NC) and deep denoising autoencoder (DDAE), thus termed (NC + DDAE). In a series of comprehensive experiments, we conduct qualitative and quantitative analyses on the NC module and the overall NC + DDAE approach. Moreover, we evaluate the speech recognition performance of the NC + DDAE NR and classical single-microphone NR approaches for Mandarin-speaking CI recipients under different noisy conditions. The testing set contains Mandarin sentences corrupted by two types of maskers, two-talker babble noise, and a construction jackhammer noise, at 0 and 5 dB SNR levels. Two conventional NR techniques and the proposed deep learning-based approach are used to process the noisy utterances. We qualitatively compare the NR approaches by the amplitude envelope and spectrogram plots of the processed utterances. Quantitative objective measures include (1) normalized covariance measure to test the intelligibility of the utterances processed by each of the NR approaches; and (2) speech recognition tests conducted by nine Mandarin-speaking CI recipients. These nine CI recipients use their own clinical speech processors during testing. The experimental results of objective evaluation and listening test indicate that under challenging listening conditions, the proposed NC + DDAE NR approach yields higher intelligibility scores than the two compared classical NR techniques, under both matched and mismatched training-testing conditions. When compared to the two well-known conventional NR techniques under challenging listening condition, the proposed NC + DDAE NR approach has superior noise suppression capabilities and gives less distortion
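
    As a structural sketch of the DDAE module (the noise-classifier stage is omitted), the following fragment maps noisy log-magnitude frames to clean targets. The use of PyTorch, the layer sizes, and the frame dimension are assumptions for illustration, since the abstract does not specify the architecture.

```python
# Minimal deep denoising autoencoder (DDAE) for spectral frames (sketch).
import torch
import torch.nn as nn

class DDAE(nn.Module):
    def __init__(self, n_bins=257, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bins),          # reconstruct clean log-magnitude
        )

    def forward(self, noisy_logmag):
        return self.net(noisy_logmag)

model = DDAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One training step on paired (noisy, clean) spectral frames.
noisy = torch.randn(32, 257)   # placeholder batch of noisy frames
clean = torch.randn(32, 257)   # placeholder batch of clean targets
optimizer.zero_grad()
loss = loss_fn(model(noisy), clean)
loss.backward()
optimizer.step()
```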

  7. When cognition kicks in: Working memory and speech understanding in noise

    Directory of Open Access Journals (Sweden)

    Jerker Ronnberg

    2010-01-01

    Full Text Available Perceptual load and cognitive load can be separately manipulated and dissociated in their effects on speech understanding in noise. The Ease of Language Understanding model assumes a theoretical position where perceptual task characteristics interact with the individual's implicit capacities to extract the phonological elements of speech. Phonological precision and speed of lexical access are important determinants for listening in adverse conditions. If there are mismatches between the phonological elements perceived and phonological representations in long-term memory, explicit working memory (WM)-related capacities will be continually invoked to reconstruct and infer the contents of the ongoing discourse. Whether this induces a high cognitive load or not will in turn depend on the individual's storage and processing capacities in WM. Data suggest that modulated noise maskers may serve as triggers for speech maskers and therefore induce a WM, explicit mode of processing. Individuals with high WM capacity benefit more than low WM-capacity individuals from fast amplitude compression at low or negative input speech-to-noise ratios. The general conclusion is that there is an overarching interaction between the focal purpose of processing in the primary listening task and the extent to which a secondary, distracting task taps into these processes.

  8. Contribution of resolved and unresolved harmonic regions to brainstem speech-evoked responses in quiet and in background noise

    Directory of Open Access Journals (Sweden)

    M. Laroche

    2011-03-01

    Full Text Available Speech auditory brainstem responses (speech ABR) reflect activity that is phase-locked to the harmonics of the fundamental frequency (F0) up to at least the first formant (F1). Recent evidence suggests that responses at F0 in the presence of noise are more robust than responses at F1, and are also dissociated in some learning-impaired children. Peripheral auditory processing can be broadly divided into resolved and unresolved harmonic regions. This study investigates the contribution of these two regions to the speech ABR, and their susceptibility to noise. We recorded, in quiet and in background white noise, evoked responses in twelve normal-hearing adults in response to three variants of a synthetic vowel: (i) Allformants, which contains all first three formants; (ii) F1Only, which is dominated by resolved harmonics; and (iii) F2&F3Only, which is dominated by unresolved harmonics. There were no statistically significant differences in the response at F0 due to the three variants of the stimulus in quiet, nor did the noise affect this response with the Allformants and F1Only variants. On the other hand, the response at F0 with the F2&F3Only variant was significantly weaker in noise than with the two other variants (p<0.001). With the response at F1, there was no difference between the Allformants and F1Only variants in quiet, but the response was expectedly weaker with the F2&F3Only variant (p<0.01). The addition of noise significantly weakened the response at F1 with the F1Only variant (p<0.05), but this weakening only tended towards significance with the Allformants variant (p=0.07). The results of this study indicate that resolved and unresolved harmonics are processed in different but interacting pathways that converge in the upper brainstem. The results also support earlier work on the differential susceptibility of responses at F0 and F1 to added noise.

  9. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence the end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques and is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the
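
    A concrete instance of coding speech "as a set of parameters" is linear predictive coding. The sketch below estimates LPC coefficients for one frame with the Levinson-Durbin recursion and shows the analysis/synthesis filter pair that a parametric vocoder builds on; the frame length, sampling rate, and model order are illustrative choices.

```python
# LPC analysis/synthesis for one speech frame (sketch).
import numpy as np
from scipy.signal import lfilter

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for LPC coefficients a (a[0] = 1)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err                       # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]
        err *= (1.0 - k * k)                 # prediction-error power shrinks
    return a, err

# One 30 ms frame at 8 kHz; order-12 model (both illustrative).
frame = np.sin(2 * np.pi * 150 * np.arange(240) / 8000) * np.hamming(240)
r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
a, err = levinson_durbin(r, order=12)
residual = lfilter(a, [1.0], frame)          # analysis: whitened excitation
resynthesis = lfilter([1.0], a, residual)    # synthesis: reproduces the frame
```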

  10. Speech-in-noise perception deficit in adults with dyslexia: effects of background type and listening configuration.

    Science.gov (United States)

    Dole, Marjorie; Hoen, Michel; Meunier, Fanny

    2012-06-01

    Developmental dyslexia is associated with impaired speech-in-noise perception. The goal of the present research was to further characterize this deficit in dyslexic adults. In order to specify the mechanisms and processing strategies used by adults with dyslexia during speech-in-noise perception, we explored the influence of background type, presenting single target words against backgrounds made of cocktail-party sounds, modulated speech-derived noise, or stationary noise. We also evaluated the effect of three listening configurations differing in terms of the amount of spatial processing required. In a monaural condition, signal and noise were presented to the same ear, while in a dichotic situation, target and concurrent sound were presented to two different ears; finally, in a spatialised configuration, target and competing signals were presented as if they originated from slightly differing positions in the auditory scene. Our results confirm the presence of a speech-in-noise perception deficit in dyslexic adults, in particular when the competing signal is also speech, and when both signals are presented to the same ear, an observation potentially relating to phonological accounts of dyslexia. However, adult dyslexics demonstrated better spatial release from masking than normal-reading controls when the background was speech, suggesting that they are well able to rely on denoising strategies based on spatial auditory scene analysis. Copyright © 2012 Elsevier Ltd. All rights reserved.

  11. Masking Period Patterns & Forward Masking for Speech-Shaped Noise: Age-related effects

    Science.gov (United States)

    Grose, John H.; Menezes, Denise C.; Porter, Heather L.; Griz, Silvana

    2015-01-01

    Objective: The purpose of this study was to assess age-related changes in temporal resolution in listeners with relatively normal audiograms. The hypothesis was that increased susceptibility to non-simultaneous masking contributes to the hearing difficulties experienced by older listeners in complex fluctuating backgrounds. Design: Participants included younger (n = 11), middle-aged (n = 12), and older (n = 11) listeners with relatively normal audiograms. The first phase of the study measured masking period patterns for speech-shaped noise maskers and signals. From these data, temporal window shapes were derived. The second phase measured forward-masking functions, and assessed how well the temporal window fits accounted for these data. Results: The masking period patterns demonstrated increased susceptibility to backward masking in the older listeners, compatible with a more symmetric temporal window in this group. The forward-masking functions exhibited an age-related decline in recovery to baseline thresholds, and there was also an increase in the variability of the temporal window fits to these data. Conclusions: This study demonstrated an age-related increase in susceptibility to non-simultaneous masking, supporting the hypothesis that exacerbated non-simultaneous masking contributes to age-related difficulties understanding speech in fluctuating noise. Further support for this hypothesis comes from limited speech-in-noise data suggesting an association between susceptibility to forward masking and speech understanding in modulated noise. PMID:26230495

  12. Robust image authentication in the presence of noise

    CERN Document Server

    2015-01-01

    This book addresses the problems that hinder image authentication in the presence of noise. It considers the advantages and disadvantages of existing algorithms for image authentication and shows new approaches and solutions for robust image authentication. The state of the art algorithms are compared and, furthermore, innovative approaches and algorithms are introduced. The introduced algorithms are applied to improve image authentication, watermarking and biometry.    Aside from presenting new directions and algorithms for robust image authentication in the presence of noise, as well as image correction, this book also:   Provides an overview of the state of the art algorithms for image authentication in the presence of noise and modifications, as well as a comparison of these algorithms, Presents novel algorithms for robust image authentication, whereby the image is tried to be corrected and authenticated, Examines different views for the solution of problems connected to image authentication in the pre...

  13. A new time-adaptive discrete bionic wavelet transform for enhancing speech from adverse noise environment

    Science.gov (United States)

    Palaniswamy, Sumithra; Duraisamy, Prakash; Alam, Mohammad Showkat; Yuan, Xiaohui

    2012-04-01

    Automatic speech processing systems are widely used in everyday life, such as in mobile communication, speech and speaker recognition, and for assisting the hearing impaired. In speech communication systems, the quality and intelligibility of speech are of utmost importance for ease and accuracy of information exchange. To obtain a speech signal that is intelligible and more pleasant to listen to, noise reduction is essential. In this paper a new Time-Adaptive Discrete Bionic Wavelet Thresholding (TADBWT) scheme is proposed. The proposed technique uses a Daubechies mother wavelet to achieve better enhancement of speech corrupted by additive non-stationary noises which occur in real life, such as street noise and factory noise. Due to the integration of a human auditory system model into the wavelet transform, the bionic wavelet transform (BWT) has great potential for speech enhancement, which may lead to a new path in speech processing. In the proposed technique, at first, the discrete BWT is applied to noisy speech to derive TADBWT coefficients. Then the adaptive nature of the BWT is captured by introducing a time-varying linear factor which updates the coefficients at each scale over time. This approach has shown better performance than existing algorithms at lower input SNR due to modified soft level-dependent thresholding of the time-adaptive coefficients. The objective and subjective test results confirmed the competency of the TADBWT technique. The effectiveness of the proposed technique is also evaluated for a speaker recognition task under noisy environments. The recognition results show that the TADBWT technique yields better performance when compared to alternate methods, specifically at lower input SNR.
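
    For context, ordinary wavelet-threshold denoising with a Daubechies mother wavelet looks as follows. The paper's time-adaptive bionic wavelet transform replaces this fixed discrete transform with coefficients that adapt over time, so the sketch below (using PyWavelets and the universal threshold) is only the non-adaptive baseline.

```python
# Baseline wavelet denoising: Daubechies DWT plus soft thresholding (sketch).
import numpy as np
import pywt

def wavelet_denoise(noisy, wavelet="db8", level=5):
    coeffs = pywt.wavedec(noisy, wavelet, level=level)
    # Universal threshold from the noise estimate in the finest detail band.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(noisy)))
    denoised = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)

fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 200 * t)              # tonal stand-in for speech
noisy = clean + 0.2 * np.random.randn(fs)
enhanced = wavelet_denoise(noisy)
```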

  14. Requirements for the evaluation of computational speech segregation systems

    DEFF Research Database (Denmark)

    May, Tobias; Dau, Torsten

    2014-01-01

    Recent studies on computational speech segregation reported improved speech intelligibility in noise when estimating and applying an ideal binary mask with supervised learning algorithms. However, an important requirement for such systems in technical applications is their robustness to acoustic...... associated with perceptual attributes in speech segregation. The results could help establish a framework for a systematic evaluation of future segregation systems....

  15. Assessment of speech intelligibility in background noise and reverberation

    DEFF Research Database (Denmark)

    Nielsen, Jens Bo

    Reliable methods for assessing speech intelligibility are essential within hearing research, audiology, and related areas. Such methods can be used for obtaining a better understanding of how speech intelligibility is affected by, e.g., various environmental factors or different types of hearing impairment. In this thesis, two sentence-based tests for speech intelligibility in Danish were developed. The first test is the Conversational Language Understanding Evaluation (CLUE), which is based on the principles of the original American-English Hearing in Noise Test (HINT). The second test is a modified version of CLUE where the speech material and the scoring rules have been reconsidered. An extensive validation of the modified test was conducted with both normal-hearing and hearing-impaired listeners. The validation showed that the test produces reliable results for both groups of listeners...
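
    Sentence tests of this kind typically estimate the speech reception threshold (SRT) with a simple adaptive rule, sketched below: the SNR decreases after a correct response and increases after an error, converging on 50% intelligibility. The step size, trial count, and simulated listener are illustrative assumptions, not the exact CLUE or HINT protocol.

```python
# Adaptive 1-up/1-down SRT tracking with a simulated listener (sketch).
import random

def adaptive_srt(present_sentence, n_trials=20, start_snr=0.0, step=2.0):
    """present_sentence(snr) -> True if the listener repeats it correctly."""
    snr, track = start_snr, []
    for _ in range(n_trials):
        correct = present_sentence(snr)
        track.append(snr)
        snr += -step if correct else step     # harder after success, easier after error
    return sum(track[4:]) / len(track[4:])    # average SNR after a warm-up

def simulated_listener(snr, true_srt=-3.0, slope=0.5):
    """Logistic psychometric function: 50% correct at true_srt."""
    p = 1.0 / (1.0 + 10.0 ** (-slope * (snr - true_srt)))
    return random.random() < p

print(adaptive_srt(simulated_listener))       # should land near -3 dB
```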

  16. The Effect of Noise on Relationships Between Speech Intelligibility and Self-Reported Communication Measures in Tracheoesophageal Speakers.

    Science.gov (United States)

    Eadie, Tanya L; Otero, Devon Sawin; Bolt, Susan; Kapsner-Smith, Mara; Sullivan, Jessica R

    2016-08-01

    The purpose of this study was to examine how sentence intelligibility relates to self-reported communication in tracheoesophageal speakers when speech intelligibility is measured in quiet and noise. Twenty-four tracheoesophageal speakers who were at least 1 year postlaryngectomy provided audio recordings of 5 sentences from the Sentence Intelligibility Test. Speakers also completed self-reported measures of communication-the Voice Handicap Index-10 and the Communicative Participation Item Bank short form. Speech recordings were presented to 2 groups of inexperienced listeners who heard sentences in quiet or noise. Listeners transcribed the sentences to yield speech intelligibility scores. Very weak relationships were found between intelligibility in quiet and measures of voice handicap and communicative participation. Slightly stronger, but still weak and nonsignificant, relationships were observed between measures of intelligibility in noise and both self-reported measures. However, 12 speakers who were more than 65% intelligible in noise showed strong and statistically significant relationships with both self-reported measures (R2 = .76-.79). Speech intelligibility in quiet is a weak predictor of self-reported communication measures in tracheoesophageal speakers. Speech intelligibility in noise may be a better metric of self-reported communicative function for speakers who demonstrate higher speech intelligibility in noise.

  17. Speech Enhancement of Mobile Devices Based on the Integration of a Dual Microphone Array and a Background Noise Elimination Algorithm.

    Science.gov (United States)

    Chen, Yung-Yue

    2018-05-08

    Mobile devices are often used in our daily lives for the purposes of speech and communication. The speech quality of mobile devices is always degraded due to the environmental noises surrounding mobile device users. Regretfully, an effective background noise reduction solution cannot easily be developed for this speech enhancement problem. Due to these depicted reasons, a methodology is systematically proposed to eliminate the effects of background noises for the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm. The proposed background noise elimination algorithm includes a whitening process, a speech modelling method and an H₂ estimator. Due to the adoption of the dual microphone array, a low-cost design can be obtained for the speech enhancement of mobile devices. Practical tests have proven that this proposed method is immune to random background noises, and noiseless speech can be obtained after executing this denoise process.

  18. Speech Enhancement of Mobile Devices Based on the Integration of a Dual Microphone Array and a Background Noise Elimination Algorithm

    Directory of Open Access Journals (Sweden)

    Yung-Yue Chen

    2018-05-01

    Full Text Available Mobile devices are often used in our daily lives for the purposes of speech and communication. The speech quality of mobile devices is always degraded due to the environmental noises surrounding mobile device users. Regretfully, an effective background noise reduction solution cannot easily be developed for this speech enhancement problem. Due to these depicted reasons, a methodology is systematically proposed to eliminate the effects of background noises for the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm. The proposed background noise elimination algorithm includes a whitening process, a speech modelling method and an H2 estimator. Due to the adoption of the dual microphone array, a low-cost design can be obtained for the speech enhancement of mobile devices. Practical tests have proven that this proposed method is immune to random background noises, and noiseless speech can be obtained after executing this denoise process.
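
    A classical two-microphone noise canceller gives a feel for what a dual-array front end enables. The sketch below uses a normalised LMS adaptive filter to model the acoustic path from the noise-reference microphone into the speech microphone and subtracts the estimate; this classical scheme stands in for, and is simpler than, the paper's whitening-plus-H2 estimation chain.

```python
# Two-microphone adaptive noise cancellation with NLMS (sketch).
import numpy as np

def nlms_cancel(primary, reference, taps=64, mu=0.5, eps=1e-8):
    """primary: speech + filtered noise; reference: noise-only microphone."""
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        x = reference[n - taps:n][::-1]          # most recent reference samples
        e = primary[n] - np.dot(w, x)            # error = enhanced speech sample
        w += mu * e * x / (np.dot(x, x) + eps)   # NLMS weight update
        out[n] = e
    return out

# Example with a synthetic noise path into the primary microphone.
fs = 8000
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs)
speech = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)
primary = speech + np.convolve(noise, [0.6, 0.3, 0.1], mode="same")
enhanced = nlms_cancel(primary, noise)
```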

  19. Audiomotor Perceptual Training Enhances Speech Intelligibility in Background Noise.

    Science.gov (United States)

    Whitton, Jonathon P; Hancock, Kenneth E; Shannon, Jeffrey M; Polley, Daniel B

    2017-11-06

    Sensory and motor skills can be improved with training, but learning is often restricted to practice stimuli. As an exception, training on closed-loop (CL) sensorimotor interfaces, such as action video games and musical instruments, can impart a broad spectrum of perceptual benefits. Here we ask whether computerized CL auditory training can enhance speech understanding in levels of background noise that approximate a crowded restaurant. Elderly hearing-impaired subjects trained for 8 weeks on a CL game that, like a musical instrument, challenged them to monitor subtle deviations between predicted and actual auditory feedback as they moved their fingertip through a virtual soundscape. We performed our study as a randomized, double-blind, placebo-controlled trial by training other subjects in an auditory working-memory (WM) task. Subjects in both groups improved at their respective auditory tasks and reported comparable expectations for improved speech processing, thereby controlling for placebo effects. Whereas speech intelligibility was unchanged after WM training, subjects in the CL training group could correctly identify 25% more words in spoken sentences or digit sequences presented in high levels of background noise. Numerically, CL audiomotor training provided more than three times the benefit of our subjects' hearing aids for speech processing in noisy listening conditions. Gains in speech intelligibility could be predicted from gameplay accuracy and baseline inhibitory control. However, benefits did not persist in the absence of continuing practice. These studies employ stringent clinical standards to demonstrate that perceptual learning on a computerized audio game can transfer to "real-world" communication challenges. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Robustness against parametric noise of nonideal holonomic gates

    International Nuclear Information System (INIS)

    Lupo, Cosmo; Aniello, Paolo; Napolitano, Mario; Florio, Giuseppe

    2007-01-01

    Holonomic gates for quantum computation are commonly considered to be robust against certain kinds of parametric noise, the cause of this robustness being the geometric character of the transformation achieved in the adiabatic limit. On the other hand, the effects of decoherence are expected to become more and more relevant when the adiabatic limit is approached. Starting from the system described by Florio et al. [Phys. Rev. A 73, 022327 (2006)], here we discuss the behavior of nonideal holonomic gates at finite operational time, i.e., long before the adiabatic limit is reached. We have considered several models of parametric noise and studied the robustness of finite-time gates. The results obtained suggest that the finite-time gates present some effects of cancellation of the perturbations introduced by the noise which mimic the geometrical cancellation effect of standard holonomic gates. Nevertheless, a careful analysis of the results leads to the conclusion that these effects are related to a dynamical instead of a geometrical feature.

  1. Robustness against parametric noise of nonideal holonomic gates

    Science.gov (United States)

    Lupo, Cosmo; Aniello, Paolo; Napolitano, Mario; Florio, Giuseppe

    2007-07-01

    Holonomic gates for quantum computation are commonly considered to be robust against certain kinds of parametric noise, the cause of this robustness being the geometric character of the transformation achieved in the adiabatic limit. On the other hand, the effects of decoherence are expected to become more and more relevant when the adiabatic limit is approached. Starting from the system described by Florio [Phys. Rev. A 73, 022327 (2006)], here we discuss the behavior of nonideal holonomic gates at finite operational time, i.e., long before the adiabatic limit is reached. We have considered several models of parametric noise and studied the robustness of finite-time gates. The results obtained suggest that the finite-time gates present some effects of cancellation of the perturbations introduced by the noise which mimic the geometrical cancellation effect of standard holonomic gates. Nevertheless, a careful analysis of the results leads to the conclusion that these effects are related to a dynamical instead of a geometrical feature.

  2. Masking Period Patterns and Forward Masking for Speech-Shaped Noise: Age-Related Effects.

    Science.gov (United States)

    Grose, John H; Menezes, Denise C; Porter, Heather L; Griz, Silvana

    2016-01-01

    The purpose of this study was to assess age-related changes in temporal resolution in listeners with relatively normal audiograms. The hypothesis was that increased susceptibility to nonsimultaneous masking contributes to the hearing difficulties experienced by older listeners in complex fluctuating backgrounds. Participants included younger (n = 11), middle-age (n = 12), and older (n = 11) listeners with relatively normal audiograms. The first phase of the study measured masking period patterns for speech-shaped noise maskers and signals. From these data, temporal window shapes were derived. The second phase measured forward-masking functions and assessed how well the temporal window fits accounted for these data. The masking period patterns demonstrated increased susceptibility to backward masking in the older listeners, compatible with a more symmetric temporal window in this group. The forward-masking functions exhibited an age-related decline in recovery to baseline thresholds, and there was also an increase in the variability of the temporal window fits to these data. This study demonstrated an age-related increase in susceptibility to nonsimultaneous masking, supporting the hypothesis that exacerbated nonsimultaneous masking contributes to age-related difficulties understanding speech in fluctuating noise. Further support for this hypothesis comes from limited speech-in-noise data, suggesting an association between susceptibility to forward masking and speech understanding in modulated noise.

  3. Hybrid model decomposition of speech and noise in a radial basis function neural model framework

    DEFF Research Database (Denmark)

    Sørensen, Helge Bjarup Dissing; Hartmann, Uwe

    1994-01-01

    The paper focuses on a new approach to automatic speech recognition in noisy environments where the noise has either stationary or non-stationary statistical characteristics, with the aim of recognizing speech in the presence of additive car noise. The technique...

  4. How much does language proficiency by non-native listeners influence speech audiometric tests in noise?

    Science.gov (United States)

    Warzybok, Anna; Brand, Thomas; Wagener, Kirsten C; Kollmeier, Birger

    2015-01-01

    The current study investigates the extent to which the linguistic complexity of three commonly employed speech recognition tests and second language proficiency influence speech recognition thresholds (SRTs) in noise in non-native listeners. SRTs were measured for non-natives and natives using three German speech recognition tests: the digit triplet test (DTT), the Oldenburg sentence test (OLSA), and the Göttingen sentence test (GÖSA). Sixty-four non-native and eight native listeners participated. Non-natives can show native-like SRTs in noise only for the linguistically easy speech material (DTT). Furthermore, the limitation of phonemic-acoustical cues in digit triplets affects speech recognition to the same extent in non-natives and natives. For more complex and less familiar speech materials, non-natives, ranging from basic to advanced proficiency in German, require on average a 3 dB better signal-to-noise ratio for the OLSA and a 6 dB better ratio for the GÖSA to obtain 50% speech recognition compared to native listeners. In clinical audiology, SRT measurements with a closed-set speech test (i.e., the DTT for screening or the OLSA for clinical purposes) should be used with non-native listeners rather than open-set speech tests (such as the GÖSA or HINT), especially if a closed-set version in the patient's own native language is available.
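
    SRTs of this kind are typically measured with an adaptive staircase that converges on the SNR yielding 50% intelligibility. Below is a minimal 1-down/1-up sketch of that idea (real tests such as the OLSA use refined, word-scoring-based step rules; the function and its parameters are illustrative assumptions):

    ```python
    def adaptive_srt(trial_correct, snr_start=0.0, step_db=2.0, n_trials=20):
        """1-down/1-up staircase converging on the SNR of 50% correct.

        trial_correct -- callable: presents one sentence at the given SNR
                         (dB) and returns True if it was repeated correctly.
        """
        snr = snr_start
        track = []
        for _ in range(n_trials):
            track.append(snr)
            snr += -step_db if trial_correct(snr) else step_db  # harder after a hit
        tail = track[n_trials // 2:]   # discard the initial approach to threshold
        return sum(tail) / len(tail)   # SRT estimate: mean SNR of the last half
    ```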

  5. The Contribution of Cognitive Factors to Individual Differences in Understanding Noise-Vocoded Speech in Young and Older Adults

    Directory of Open Access Journals (Sweden)

    Stephanie Rosemann

    2017-06-01

    Full Text Available Noise-vocoded speech is commonly used to simulate the sensation after cochlear implantation as it consists of spectrally degraded speech. High individual variability exists in learning to understand both noise-vocoded speech and speech perceived through a cochlear implant (CI). This variability is partly ascribed to differing cognitive abilities like working memory, verbal skills or attention. Although clinically highly relevant, up to now, no consensus has been achieved about which cognitive factors exactly predict the intelligibility of speech in noise-vocoded situations in healthy subjects or in patients after cochlear implantation. We aimed to establish a test battery that can be used to predict speech understanding in patients prior to receiving a CI. Young and old healthy listeners completed a noise-vocoded speech test in addition to cognitive tests tapping on verbal memory, working memory, lexicon and retrieval skills as well as cognitive flexibility and attention. Partial-least-squares analysis revealed that six variables were important to significantly predict vocoded-speech performance. These were the ability to perceive visually degraded speech tested by the Text Reception Threshold, vocabulary size assessed with the Multiple Choice Word Test, working memory gauged with the Operation Span Test, verbal learning and recall of the Verbal Learning and Retention Test and task switching abilities tested by the Comprehensive Trail-Making Test. Thus, these cognitive abilities explain individual differences in noise-vocoded speech understanding and should be considered when aiming to predict hearing-aid outcome.

  6. The effects of meaningful irrelevant speech and road traffic noise on teachers' attention, episodic and semantic memory.

    Science.gov (United States)

    Enmarker, Ingela

    2004-11-01

    The aim of the present experiment was to examine the effects of meaningful irrelevant speech and road traffic noise on attention, episodic and semantic memory, and also to examine whether the noise effects were age-dependent. A total of 96 male and female teachers in the age range of 35-45 and 55-65 years were randomly assigned to a silent condition or one of the two noise conditions. Noise effects found in episodic memory were limited to a meaningful text, where, contrary to expectations, cued recall was equally impaired by the two types of noise. However, meaningful irrelevant speech also deteriorated recognition of the text, whereas road traffic noise caused no decrement. Retrieval from two word fluency tests in semantic memory showed strong effects of noise exposure, one affected by meaningful irrelevant speech and the other by road traffic noise. The results implied that both acoustic variation and semantic interference could be of importance for noise impairments. The expected age-dependent noise effects did not emerge.

  7. Gated audiovisual speech identification in silence vs. noise: effects on time and accuracy

    Science.gov (United States)

    Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

    2013-01-01

    This study investigated the degree to which audiovisual presentation (compared to auditory-only presentation) affected isolation points (IPs; the amount of time required for the correct identification of speech stimuli using a gating paradigm) in silence and noise conditions. The study expanded on the findings of Moradi et al. (under revision), using the same stimuli, but presented in an audiovisual instead of an auditory-only manner. The results showed that noise impeded the identification of consonants and words (i.e., delayed IPs and lowered accuracy), but not the identification of final words in sentences. In comparison with the previous study by Moradi et al., it can be concluded that the provision of visual cues expedited IPs and increased the accuracy of speech stimuli identification in both silence and noise. The implication of the results is discussed in terms of models for speech understanding. PMID:23801980

  8. Acceptable noise level (ANL) with Danish and non-semantic speech materials in adult hearing-aid users

    DEFF Research Database (Denmark)

    Olsen, Steen Østergaard; Lantz, Johannes; Nielsen, Lars Holme

    2012-01-01

    The acceptable noise level (ANL) test is used for quantification of the amount of background noise subjects accept when listening to speech. This study investigates Danish hearing-aid users' ANL performance using Danish and non-semantic speech signals, the repeatability of ANL, and the association...

  9. Seeing the talker's face supports executive processing of speech in steady state noise.

    Science.gov (United States)

    Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Rönnberg, Jerker; Rudner, Mary

    2013-01-01

    Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity (WMC). Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.

  10. ADSL Transceivers Applying DSM and Their Nonstationary Noise Robustness

    Directory of Open Access Journals (Sweden)

    Bostoen Tom

    2006-01-01

    Full Text Available Dynamic spectrum management (DSM) comprises a new set of techniques for multiuser power allocation and/or detection in digital subscriber line (DSL) networks. At the Alcatel Research and Innovation Labs, we have recently developed a DSM test bed, which allows the performance of DSM algorithms to be evaluated in practice. With this test bed, we have evaluated the performance of a DSM level-1 algorithm known as iterative water-filling in an ADSL scenario. This paper describes the results of, on the one hand, the performance gains achieved with iterative water-filling, and, on the other hand, the nonstationary noise robustness of DSM-enabled ADSL modems. It will be shown that DSM trades off nonstationary noise robustness for performance improvements. A new bit swap procedure is then introduced to increase the noise robustness when applying DSM.

  11. MMSE Estimator for Children’s Speech with Car and Weather Noise

    Science.gov (United States)

    Sayuthi, V.

    2018-04-01

    Most people use vehicles for various purposes, and will continue to do so in the future; a vehicle is no longer just a means of traveling, since passengers can also enjoy entertainment or work while riding. In this study, we examine the speech of a girl in a vehicle that is affected by noise from two sources: car noise (from the engine or the air conditioner) and weather noise, in this case rainy weather. A minimum mean square error (MMSE) estimator is used to recover the child's clean speech; in the simulation, the signals are modeled as random processes characterized by the autocorrelations of both the child's voice and the disturbing noise. The MMSE estimator can be regarded as a Wiener filter, since the clean sound is reconstructed from the noisy observation. We expect the results of this study to serve as a basis for the development of entertainment and communication technology for vehicle passengers, particularly using MMSE estimators.
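
    To make the Wiener-filter reading of the MMSE estimator concrete, here is a minimal per-frequency gain sketch (a simplification under stated assumptions, not the authors' estimator); the noise power spectrum would typically be estimated from speech-free frames, and the gain is applied frame by frame to an STFT:

    ```python
    import numpy as np

    def wiener_gain(noisy_psd, noise_psd, gain_floor=0.05):
        """Per-bin MMSE/Wiener gain H = S / (S + N), with the clean-speech
        PSD S estimated by spectral subtraction from the noisy PSD."""
        speech_psd = np.maximum(noisy_psd - noise_psd, 0.0)
        gain = speech_psd / (speech_psd + noise_psd + 1e-12)
        return np.maximum(gain, gain_floor)  # flooring limits musical noise

    # usage per STFT frame: enhanced_spectrum = wiener_gain(...) * noisy_spectrum
    ```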

  12. Speech-in-noise screening tests by internet, part 3: test sensitivity for uncontrolled parameters in domestic usage

    NARCIS (Netherlands)

    Leensen, Monique C. J.; Dreschler, Wouter A.

    2013-01-01

    The online speech-in-noise test 'Earcheck' is sensitive for noise-induced hearing loss (NIHL). This study investigates effects of uncontrollable parameters in domestic self-screening, such as presentation level and transducer type, on speech reception thresholds (SRTs) obtained with Earcheck.

  13. Effects of Age and Working Memory Capacity on Speech Recognition Performance in Noise Among Listeners With Normal Hearing.

    Science.gov (United States)

    Gordon-Salant, Sandra; Cole, Stacey Samuels

    2016-01-01

    This study aimed to determine if younger and older listeners with normal hearing who differ on working memory span perform differently on speech recognition tests in noise. Older adults typically exhibit poorer speech recognition scores in noise than younger adults, which is attributed primarily to poorer hearing sensitivity and more limited working memory capacity in older than younger adults. Previous studies typically tested older listeners with poorer hearing sensitivity and shorter working memory spans than younger listeners, making it difficult to discern the importance of working memory capacity on speech recognition. This investigation controlled for hearing sensitivity and compared speech recognition performance in noise by younger and older listeners who were subdivided into high and low working memory groups. Performance patterns were compared for different speech materials to assess whether or not the effect of working memory capacity varies with the demands of the specific speech test. The authors hypothesized that (1) normal-hearing listeners with low working memory span would exhibit poorer speech recognition performance in noise than those with high working memory span; (2) older listeners with normal hearing would show poorer speech recognition scores than younger listeners with normal hearing, when the two age groups were matched for working memory span; and (3) an interaction between age and working memory would be observed for speech materials that provide contextual cues. Twenty-eight older (61 to 75 years) and 25 younger (18 to 25 years) normal-hearing listeners were assigned to groups based on age and working memory status. Northwestern University Auditory Test No. 6 words and Institute of Electrical and Electronics Engineers sentences were presented in noise using an adaptive procedure to measure the signal-to-noise ratio corresponding to 50% correct performance. Cognitive ability was evaluated with two tests of working memory (Listening

  14. Speech-in-Noise Perception Deficit in Adults with Dyslexia: Effects of Background Type and Listening Configuration

    Science.gov (United States)

    Dole, Marjorie; Hoen, Michel; Meunier, Fanny

    2012-01-01

    Developmental dyslexia is associated with impaired speech-in-noise perception. The goal of the present research was to further characterize this deficit in dyslexic adults. In order to specify the mechanisms and processing strategies used by adults with dyslexia during speech-in-noise perception, we explored the influence of background type,…

  15. A user-operated test of suprathreshold acuity in noise for adult hearing screening: The SUN (Speech Understanding in Noise) test.

    Science.gov (United States)

    Paglialonga, Alessia; Tognola, Gabriella; Grandori, Ferdinando

    2014-09-01

    A novel, user-operated test of suprathreshold acuity in noise for use in adult hearing screening (AHS) was developed. The Speech Understanding in Noise test (SUN) is a speech-in-noise test that makes use of a list of vowel-consonant-vowel (VCV) stimuli in background noise presented in a three-alternative forced choice (3AFC) paradigm by means of a touch-sensitive screen. The test is automated, easy to use, and provides self-explanatory results (i.e., 'no hearing difficulties', or 'a hearing check would be advisable', or 'a hearing check is recommended'). The test was developed from its building blocks (VCVs and speech-shaped noise) through two main steps: (i) development of the test list through equalization of the intelligibility of test stimuli across the set and (ii) optimization of the test results through maximization of the test sensitivity and specificity. The test had 82.9% sensitivity and 85.9% specificity compared to conventional pure-tone screening, and 83.8% sensitivity and 83.9% specificity to identify individuals with disabling hearing impairment. Results obtained so far showed that the test could be easily performed by adults and older adults in less than one minute per ear and that its results were not influenced by ambient noise (up to 65 dBA), suggesting that the test might be a viable method for AHS in clinical as well as non-clinical settings. Copyright © 2014 Elsevier Ltd. All rights reserved.
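
    For reference, the sensitivity and specificity figures above follow from a standard 2x2 comparison of test outcomes against the reference screen; a minimal, purely illustrative helper:

    ```python
    def screening_metrics(tp, fn, fp, tn):
        """Sensitivity and specificity of a screening test against a
        reference standard (here, conventional pure-tone screening)."""
        sensitivity = tp / (tp + fn)   # impaired ears correctly flagged
        specificity = tn / (tn + fp)   # normal ears correctly passed
        return sensitivity, specificity
    ```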

  16. Musical Training during Early Childhood Enhances the Neural Encoding of Speech in Noise

    Science.gov (United States)

    Strait, Dana L.; Parbery-Clark, Alexandra; Hittner, Emily; Kraus, Nina

    2012-01-01

    For children, learning often occurs in the presence of background noise. As such, there is growing desire to improve a child's access to a target signal in noise. Given adult musicians' perceptual and neural speech-in-noise enhancements, we asked whether similar effects are present in musically-trained children. We assessed the perception and…

  17. Human phoneme recognition depending on speech-intrinsic variability.

    Science.gov (United States)

    Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger

    2010-11-01

    The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).

  18. Effect of Simultaneous Bilingualism on Speech Intelligibility across Different Masker Types, Modalities, and Signal-to-Noise Ratios in School-Age Children.

    Science.gov (United States)

    Reetzke, Rachel; Lam, Boji Pak-Wing; Xie, Zilong; Sheng, Li; Chandrasekaran, Bharath

    2016-01-01

    Recognizing speech in adverse listening conditions is a significant cognitive, perceptual, and linguistic challenge, especially for children. Prior studies have yielded mixed results on the impact of bilingualism on speech perception in noise. Methodological variations across studies make it difficult to converge on a conclusion regarding the effect of bilingualism on speech-in-noise performance. Moreover, there is a dearth of speech-in-noise evidence for bilingual children who learn two languages simultaneously. The aim of the present study was to examine the extent to which various adverse listening conditions modulate differences in speech-in-noise performance between monolingual and simultaneous bilingual children. To that end, sentence recognition was assessed in twenty-four school-aged children (12 monolinguals; 12 simultaneous bilinguals, age of English acquisition ≤ 3 yrs.). We implemented a comprehensive speech-in-noise battery to examine recognition of English sentences across different modalities (audio-only, audiovisual), masker types (steady-state pink noise, two-talker babble), and a range of signal-to-noise ratios (SNRs; 0 to -16 dB). Results revealed no difference in performance between monolingual and simultaneous bilingual children across each combination of modality, masker, and SNR. Our findings suggest that when English age of acquisition and socioeconomic status are similar between groups, monolingual and bilingual children exhibit comparable speech-in-noise performance across a range of conditions analogous to everyday listening environments.
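
    Fixed-SNR conditions such as the 0 to -16 dB range used here are typically constructed by scaling the masker relative to the target sentence; a minimal sketch of that standard procedure (illustrative, not necessarily the authors' exact code):

    ```python
    import numpy as np

    def mix_at_snr(speech, masker, snr_db):
        """Scale the masker so that speech + masker has the requested SNR in dB."""
        p_speech = np.mean(speech ** 2)              # average speech power
        p_masker = np.mean(masker ** 2)              # average masker power
        target_p = p_speech / (10 ** (snr_db / 10))  # masker power implied by the SNR
        return speech + masker * np.sqrt(target_p / p_masker)
    ```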

  19. Computationally Efficient and Noise Robust DOA and Pitch Estimation

    DEFF Research Database (Denmark)

    Karimian-Azari, Sam; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2016-01-01

    Many natural signals, such as voiced speech and some musical instruments, are approximately periodic over short intervals. These signals are often described in mathematics by the sum of sinusoids (harmonics) with frequencies that are proportional to the fundamental frequency, or pitch. In sensor...... a joint DOA and pitch estimator. In white Gaussian noise, we derive even more computationally efficient solutions which are designed using the narrowband power spectrum of the harmonics. Numerical results reveal the performance of the estimators in colored noise compared with the Cramér-Rao lower...
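
    The harmonic signal model described above lends itself to a compact illustration. The sketch below estimates pitch by harmonic summation over candidate fundamentals for a single channel; it is a simplified stand-in, not the joint DOA/pitch estimators derived in the paper, and all parameter choices are assumptions:

    ```python
    import numpy as np

    def harmonic_summation_pitch(frame, fs, f0_min=60.0, f0_max=400.0, n_harm=5):
        """Estimate pitch by summing spectral magnitude at harmonic multiples
        of each candidate fundamental frequency."""
        spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
        candidates = np.arange(f0_min, f0_max, 1.0)
        scores = []
        for f0 in candidates:
            # nearest FFT bin for each of the first n_harm harmonics
            bins = [np.argmin(np.abs(freqs - k * f0)) for k in range(1, n_harm + 1)]
            scores.append(spec[bins].sum())
        return candidates[int(np.argmax(scores))]
    ```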

  20. Seeing the talker’s face supports executive processing of speech in steady state noise

    Directory of Open Access Journals (Sweden)

    Sushmit eMishra

    2013-11-01

    Full Text Available Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity. Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.

  1. Seeing the talker’s face supports executive processing of speech in steady state noise

    Science.gov (United States)

    Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Rönnberg, Jerker; Rudner, Mary

    2013-01-01

    Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity (WMC). Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills. PMID:24324411

  2. Acceptable noise level (ANL) with Danish and non-semantic speech materials in adult hearing-aid users

    DEFF Research Database (Denmark)

    Olsen, Steen Østergaard; Lantz, Johannes; Nielsen, Lars Holme

    2012-01-01

    The acceptable noise level (ANL) test is used for quantification of the amount of background noise subjects accept when listening to speech. This study investigates Danish hearing-aid users' ANL performance using Danish and non-semantic speech signals, the repeatability of ANL, and the association...... between ANL and the outcome of the International Outcome Inventory for Hearing Aids (IOI-HA)....

  3. Examining explanations for fundamental frequency's contribution to speech intelligibility in noise

    Science.gov (United States)

    Schlauch, Robert S.; Miller, Sharon E.; Watson, Peter J.

    2005-09-01

    Laures and Weismer [JSLHR, 42, 1148 (1999)] reported that speech with natural variation in fundamental frequency (F0) is more intelligible in noise than speech with a flattened F0 contour. Cognitive-linguistic explanations have been offered to account for this drop in intelligibility in the flattened condition, but a lower-level mechanism related to auditory streaming may be responsible. Numerous psychoacoustic studies have demonstrated that modulating a tone enables a listener to segregate it from background sounds. To test these rival hypotheses, speech recognition in noise was measured for sentences with six different F0 contours: unmodified, flattened at the mean, natural but exaggerated, reversed, and frequency modulated (rates of 2.5 and 5.0 Hz). The 180 stimulus sentences were produced by five talkers (30 sentences per condition). Speech recognition results for fifteen listeners replicated earlier findings showing that flattening the F0 contour results in a roughly 10% reduction in recognition of key words compared with the natural condition. Although the exaggerated condition produced results comparable to those of the flattened condition, the other conditions with unnatural F0 contours all yielded significantly poorer performance than the flattened condition. These results support the cognitive-linguistic explanations for the reduction in performance.
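
    The flattened condition in studies of this kind is typically constructed by replacing the F0 contour with its mean before resynthesis (e.g., with PSOLA). A minimal sketch of that step, assuming a per-frame F0 track in Hz with zeros marking unvoiced frames:

    ```python
    import numpy as np

    def flatten_f0(f0):
        """Replace a sentence's F0 contour with its mean over voiced frames.

        f0 -- per-frame F0 in Hz; 0 marks unvoiced frames, which are left alone.
        """
        flat = np.asarray(f0, dtype=float).copy()
        voiced = flat > 0
        flat[voiced] = flat[voiced].mean()
        return flat
    ```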

  4. Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation.

    Science.gov (United States)

    Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti

    2015-01-01

    Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.

  5. Low-Arousal Speech Noise Improves Performance in N-Back Task: An ERP Study

    Science.gov (United States)

    Zhang, Dandan; Jin, Yi; Luo, Yuejia

    2013-01-01

    The relationship between noise and human performance is a crucial topic in ergonomic research. However, the brain dynamics of the emotional arousal effects of background noises are still unclear. The current study employed meaningless speech noises in the n-back working memory task to explore the changes of event-related potentials (ERPs) elicited by noises with a low vs. a high arousal level. We found that memory performance in the low arousal condition was improved compared with the silent and high arousal conditions; participants responded more quickly and had larger P2 and P3 amplitudes in the low arousal condition, while performance and ERP components showed no significant difference between the high arousal and silent conditions. These findings suggested that the emotional arousal dimension of background noises had a significant influence on human working memory performance, and that this effect was independent of the acoustic characteristics of noises (e.g., intensity) and the meaning of speech materials. The current findings improve our understanding of background noise effects on human performance and lay the groundwork for the investigation of patients with attention deficits. PMID:24204607

  6. Low-arousal speech noise improves performance in N-back task: an ERP study.

    Science.gov (United States)

    Han, Longzhu; Liu, Yunzhe; Zhang, Dandan; Jin, Yi; Luo, Yuejia

    2013-01-01

    The relationship between noise and human performance is a crucial topic in ergonomic research. However, the brain dynamics of the emotional arousal effects of background noises are still unclear. The current study employed meaningless speech noises in the n-back working memory task to explore the changes of event-related potentials (ERPs) elicited by noises with a low vs. a high arousal level. We found that memory performance in the low arousal condition was improved compared with the silent and high arousal conditions; participants responded more quickly and had larger P2 and P3 amplitudes in the low arousal condition, while performance and ERP components showed no significant difference between the high arousal and silent conditions. These findings suggested that the emotional arousal dimension of background noises had a significant influence on human working memory performance, and that this effect was independent of the acoustic characteristics of noises (e.g., intensity) and the meaning of speech materials. The current findings improve our understanding of background noise effects on human performance and lay the groundwork for the investigation of patients with attention deficits.

  7. Low-arousal speech noise improves performance in N-back task: an ERP study.

    Directory of Open Access Journals (Sweden)

    Longzhu Han

    Full Text Available The relationship between noise and human performance is a crucial topic in ergonomic research. However, the brain dynamics of the emotional arousal effects of background noises are still unclear. The current study employed meaningless speech noises in the n-back working memory task to explore the changes of event-related potentials (ERPs) elicited by noises with a low vs. a high arousal level. We found that memory performance in the low arousal condition was improved compared with the silent and high arousal conditions; participants responded more quickly and had larger P2 and P3 amplitudes in the low arousal condition, while performance and ERP components showed no significant difference between the high arousal and silent conditions. These findings suggested that the emotional arousal dimension of background noises had a significant influence on human working memory performance, and that this effect was independent of the acoustic characteristics of noises (e.g., intensity) and the meaning of speech materials. The current findings improve our understanding of background noise effects on human performance and lay the groundwork for the investigation of patients with attention deficits.

  8. Effects of noise and reverberation on speech perception and listening comprehension of children and adults in a classroom-like setting.

    Science.gov (United States)

    Klatte, Maria; Lachmann, Thomas; Meis, Markus

    2010-01-01

    The effects of classroom noise and background speech on speech perception, measured by word-to-picture matching, and listening comprehension, measured by execution of oral instructions, were assessed in first- and third-grade children and adults in a classroom-like setting. For speech perception, in addition to noise, reverberation time (RT) was varied by conducting the experiment in two virtual classrooms with mean RT = 0.47 s versus RT = 1.1 s. Children were more impaired than adults by background sounds in both speech perception and listening comprehension. Classroom noise evoked a reliable disruption in children's speech perception even under conditions of short reverberation. RT had no effect on speech perception in silence, but evoked a severe increase in the impairments due to background sounds in all age groups. For listening comprehension, impairments due to background sounds were found in the children, stronger for first- than for third-graders, whereas adults were unaffected. Compared to classroom noise, background speech had a smaller effect on speech perception, but a stronger effect on listening comprehension, remaining significant when speech perception was controlled. This indicates that background speech affects higher-order cognitive processes involved in children's comprehension. Children's ratings of the sound-induced disturbance were low overall and uncorrelated to the actual disruption, indicating that the children did not consciously realize the detrimental effects. The present results confirm earlier findings on the substantial impact of noise and reverberation on children's speech perception, and extend these to classroom-like environmental settings and listening demands closely resembling those faced by children at school.

  9. [Evaluation of the Freiburg monosyllabic speech test in background noise].

    Science.gov (United States)

    Löhler, J; Akcicek, B; Pilnik, M; Saager-Post, K; Dazert, S; Biedron, S; Oeken, J; Mürbe, D; Löbert, J; Laszig, R; Wesarg, T; Langer, C; Plontke, S; Rahne, T; Machate, U; Noppeney, R; Schultz, K; Plinkert, P; Hoth, S; Praetorius, M; Schlattmann, P; Meister, E F; Pau, H W; Ehrt, K; Hagen, R; Shehata-Dieler, W; Cebulla, M; Walther, L E; Ernst, A

    2013-07-01

    The Freiburg speech test has been the gold standard in speech audiometry in Germany for many years. Previously, however, this test had not been evaluated in assessing the effectiveness of a hearing aid in background noise. Furthermore, the validity of particular word lists used in the test has been questioned repeatedly in the past, due to a suspected higher variation within these lists as compared to the other word lists used. In this prospective study, two groups of subjects [normal-hearing control subjects and patients with sensorineural hearing loss (SNHL) that had been fitted with hearing aids] were examined. In the first group, 113 control subjects with normal age- and gender-related pure-tone thresholds were assessed by means of the Freiburg monosyllabic test under free-field conditions at 65 dB. The second group comprised 104 patients that had been fitted with hearing aids at least 3 months previously to treat their SNHL. Members of the SNHL group were assessed by means of the Freiburg monosyllabic test both with and without hearing aids, and in the presence or absence of background noise (CCITT noise; 65/60 dB signal-to-noise ratio, in accordance with the Comité Consultatif International Téléphonique et Télégraphique), under free-field conditions at 65 dB. The first (control) group exhibited no gender-related differences in the Freiburg test results. In a few instances, inter-individual variability of responses was observed, although the reasons for this remain to be clarified. Within the second (patient) group, the Freiburg test results under the four different measurement conditions differed significantly from each other (p<0.05). This group exhibited a high degree of inter-individual variability between responses. In light of this, no significant differences in outcome could be assigned to the different word lists employed in the Freiburg speech test. The Freiburg monosyllabic test is able to assess the extent of hearing loss, as well as the effectiveness

  10. Listening to speech in a background of other talkers: effects of talker number and noise vocoding.

    Science.gov (United States)

    Rosen, Stuart; Souza, Pamela; Ekelund, Caroline; Majeed, Arooj A

    2013-04-01

    Some of the most common interfering background sounds a listener experiences are the sounds of other talkers. In Experiment 1, recognition for natural Institute of Electrical and Electronics Engineers (IEEE) sentences was measured in normal-hearing adults at two fixed signal-to-noise ratios (SNRs) in 16 backgrounds with the same long-term spectrum: unprocessed speech babble (1, 2, 4, 8, and 16 talkers), noise-vocoded versions of the babbles (12 channels), noise modulated with the wide-band envelope of the speech babbles, and unmodulated noise. All talkers were adult males. For a given number of talkers, natural speech was always the most effective masker. The greatest changes in performance occurred as the number of talkers in the maskers increased from 1 to 2 or 4, with small changes thereafter. In Experiment 2, the same targets and maskers (1, 2, and 16 talkers) were used to measure speech reception thresholds (SRTs) adaptively. Periodicity in the target was also manipulated by noise-vocoding, which led to considerably higher SRTs. The greatest masking effect always occurred for the masker type most similar to the target, while the effects of the number of talkers were generally small. Implications are drawn with reference to glimpsing, informational vs energetic masking, overall SNR, and aspects of periodicity.

  11. Musician advantage for speech-on-speech perception

    NARCIS (Netherlands)

    Başkent, Deniz; Gaudrain, Etienne

    Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level

  12. Integrating speech technology to meet crew station design requirements

    Science.gov (United States)

    Simpson, Carol A.; Ruth, John C.; Moore, Carolyn A.

    The last two years have seen improvements in speech generation and speech recognition technology that make speech I/O for crew station controls and displays viable for operational systems. These improvements include increased robustness of algorithm performance in high levels of background noise, increased vocabulary size, improved performance in the connected speech mode, and less speaker dependence. This improved capability makes possible far more sophisticated user interface design than was possible with earlier technology. Engineering, linguistic, and human factors design issues are discussed in the context of current voice I/O technology performance.

  13. Neural indices of phonemic discrimination and sentence-level speech intelligibility in quiet and noise: A P3 study.

    Science.gov (United States)

    Koerner, Tess K; Zhang, Yang; Nelson, Peggy B; Wang, Boxiang; Zou, Hui

    2017-07-01

    This study examined how speech babble noise differentially affected the auditory P3 responses and the associated neural oscillatory activities for consonant and vowel discrimination in relation to segmental- and sentence-level speech perception in noise. The data were collected from 16 normal-hearing participants in a double-oddball paradigm that contained a consonant (/ba/ to /da/) and vowel (/ba/ to /bu/) change in quiet and noise (speech-babble background at a -3 dB signal-to-noise ratio) conditions. Time-frequency analysis was applied to obtain inter-trial phase coherence (ITPC) and event-related spectral perturbation (ERSP) measures in delta, theta, and alpha frequency bands for the P3 response. Behavioral measures included percent correct phoneme detection and reaction time as well as percent correct IEEE sentence recognition in quiet and in noise. Linear mixed-effects models were applied to determine possible brain-behavior correlates. A significant noise-induced reduction in P3 amplitude was found, accompanied by significantly longer P3 latency and decreases in ITPC across all frequency bands of interest. There was a differential effect of noise on consonant discrimination and vowel discrimination in both ERP and behavioral measures, such that noise impacted the detection of the consonant change more than the vowel change. The P3 amplitude and some of the ITPC and ERSP measures were significant predictors of speech perception at segmental- and sentence-levels across listening conditions and stimuli. These data demonstrate that the P3 response with its associated cortical oscillations represents a potential neurophysiological marker for speech perception in noise. Copyright © 2017 Elsevier B.V. All rights reserved.
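
    The inter-trial phase coherence (ITPC) measure used here has a compact definition: the magnitude of the mean unit-length phasor across trials, per time point (or time-frequency point). A minimal sketch, assuming a trials-by-time array of instantaneous phases from some time-frequency decomposition:

    ```python
    import numpy as np

    def itpc(phases):
        """Inter-trial phase coherence from a (trials x times) array of
        instantaneous phases in radians. Returns values in [0, 1]:
        1 = perfectly aligned phase across trials, ~0 = random phase."""
        return np.abs(np.mean(np.exp(1j * phases), axis=0))
    ```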

  14. Subspace-Based Noise Reduction for Speech Signals via Diagonal and Triangular Matrix Decompositions

    DEFF Research Database (Denmark)

    Hansen, Per Christian; Jensen, Søren Holdt

    2007-01-01

    We survey the definitions and use of rank-revealing matrix decompositions in single-channel noise reduction algorithms for speech signals. Our algorithms are based on the rank-reduction paradigm and, in particular, signal subspace techniques. The focus is on practical working algorithms, using both...... with working Matlab code and applications in speech processing....
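
    As a schematic of the signal-subspace paradigm the survey covers (a Python sketch under simplifying assumptions, not the authors' MATLAB implementations): embed the noisy frame in a Hankel matrix, truncate its SVD to the presumed signal rank, and average the anti-diagonals back into a time signal.

    ```python
    import numpy as np
    from scipy.linalg import hankel, svd

    def subspace_denoise(frame, order=20, rank=8):
        """Rank-reduction noise reduction on one signal frame: keep the
        dominant 'rank' singular components (the signal subspace) of a
        Hankel data matrix, then average anti-diagonals back to a signal."""
        H = hankel(frame[:order], frame[order - 1:])   # H[i, j] = frame[i + j]
        U, s, Vt = svd(H, full_matrices=False)
        H_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # truncated-SVD approximation
        out = np.zeros(len(frame))
        counts = np.zeros(len(frame))
        for i in range(H_low.shape[0]):                # average each anti-diagonal
            for j in range(H_low.shape[1]):
                out[i + j] += H_low[i, j]
                counts[i + j] += 1
        return out / counts
    ```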

  15. Music and Speech Perception in Children Using Sung Speech.

    Science.gov (United States)

    Nie, Yingjiu; Galvin, John J; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie

    2018-01-01

    This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet was significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.

  16. Robust Digital Speech Watermarking For Online Speaker Recognition

    Directory of Open Access Journals (Sweden)

    Mohammad Ali Nematollahi

    2015-01-01

    Full Text Available A robust and blind digital speech watermarking technique has been proposed for online speaker recognition systems, based on the Discrete Wavelet Packet Transform (DWPT) and multiplication to embed the watermark in the amplitudes of the wavelet subbands. In order to minimize the degradation effect of the watermark, these subbands are selected where less speaker-specific information was available (500 Hz–3500 Hz and 6000 Hz–7000 Hz). Experimental results on Texas Instruments Massachusetts Institute of Technology (TIMIT), Massachusetts Institute of Technology (MIT), and Mobile Biometry (MOBIO) show that the degradation for speaker verification and identification is 1.16% and 2.52%, respectively. Furthermore, the proposed watermark technique can provide enough robustness against different signal processing attacks.
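
    A minimal sketch of multiplicative embedding in a single wavelet-packet subband, using PyWavelets; the subband path, embedding strength, and bit tiling below are illustrative assumptions, not the paper's exact scheme:

    ```python
    import numpy as np
    import pywt

    def embed_watermark(signal, bits, alpha=0.01, wavelet='db4', level=3, node='aad'):
        """Embed watermark bits multiplicatively in one DWPT subband."""
        wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
        coeffs = wp[node].data
        marks = np.resize(np.asarray(bits) * 2 - 1, len(coeffs))  # {0,1} -> {-1,+1}, tiled
        wp[node] = coeffs * (1.0 + alpha * marks)                 # scale subband amplitudes
        return wp.reconstruct(update=True)[:len(signal)]
    ```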

  17. Music training improves the ability to understand speech-in-noise in older adults

    OpenAIRE

    Belleville, Sylvie; Zendel, Benjamin; West, Greg; Peretz, Isabelle

    2017-01-01

    It is well known that hearing abilities decline with age, and one of the most commonly reported hearing difficulties reported in older adults is a reduced ability to understand speech in noisy environments. Older musicians have an enhanced ability to understand speech in noise, and this has been associated with enhanced brain responses related to both speech processing and the deployment of attention, however the causal impact of music lessons in older adults is poorly understood. A sample of...

  18. Improving speech-in-noise recognition for children with hearing loss: potential effects of language abilities, binaural summation, and head shadow.

    Science.gov (United States)

    Nittrouer, Susan; Caldwell-Tarr, Amanda; Tarr, Eric; Lowenstein, Joanna H; Rice, Caitlin; Moberly, Aaron C

    2013-08-01

    This study examined speech recognition in noise for children with hearing loss, compared it to recognition for children with normal hearing, and examined mechanisms that might explain variance in children's abilities to recognize speech in noise. Word recognition was measured in two levels of noise, both when the speech and noise were co-located in front and when the noise came separately from one side. Four mechanisms were examined as factors possibly explaining variance: vocabulary knowledge, sensitivity to phonological structure, binaural summation, and head shadow. Participants were 113 eight-year-old children. Forty-eight had normal hearing (NH) and 65 had hearing loss: 18 with hearing aids (HAs), 19 with one cochlear implant (CI), and 28 with two CIs. Phonological sensitivity explained a significant amount of between-groups variance in speech-in-noise recognition. Little evidence of binaural summation was found. Head shadow was similar in magnitude for children with NH and with CIs, regardless of whether they wore one or two CIs. Children with HAs showed reduced head shadow effects. These outcomes suggest that in order to improve speech-in-noise recognition for children with hearing loss, intervention needs to be comprehensive, focusing on both language abilities and auditory mechanisms.

  19. Using auditory-visual speech to probe the basis of noise-impaired consonant-vowel perception in dyslexia and auditory neuropathy

    Science.gov (United States)

    Ramirez, Joshua; Mann, Virginia

    2005-08-01

    Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.

  20. Right-Ear Advantage for Speech-in-Noise Recognition in Patients with Nonlateralized Tinnitus and Normal Hearing Sensitivity.

    Science.gov (United States)

    Tai, Yihsin; Husain, Fatima T

    2018-04-01

    Despite having normal hearing sensitivity, patients with chronic tinnitus may experience more difficulty recognizing speech in adverse listening conditions as compared to controls. However, the association between the characteristics of tinnitus (severity and loudness) and speech recognition remains unclear. In this study, the Quick Speech-in-Noise test (QuickSIN) was conducted monaurally on 14 patients with bilateral tinnitus and 14 age- and hearing-matched adults to determine the relation between tinnitus characteristics and speech understanding. Further, Tinnitus Handicap Inventory (THI), tinnitus loudness magnitude estimation, and loudness matching were obtained to better characterize the perceptual and psychological aspects of tinnitus. The patients reported low THI scores, with most participants in the slight handicap category. Significant between-group differences in speech-in-noise performance were only found at the 5-dB signal-to-noise ratio (SNR) condition. The tinnitus group performed significantly worse in the left ear than in the right ear, even though bilateral tinnitus percept and symmetrical thresholds were reported in all patients. This between-ear difference is likely influenced by a right-ear advantage for speech sounds, as factors related to testing order and fatigue were ruled out. Additionally, significant correlations found between SNR loss in the left ear and tinnitus loudness matching suggest that perceptual factors related to tinnitus had an effect on speech-in-noise performance, pointing to a possible interaction between peripheral and cognitive factors in chronic tinnitus. Further studies, that take into account both hearing and cognitive abilities of patients, are needed to better parse out the effect of tinnitus in the absence of hearing impairment.

  1. On the Use of Evolutionary Algorithms to Improve the Robustness of Continuous Speech Recognition Systems in Adverse Conditions

    Directory of Open Access Journals (Sweden)

    Sid-Ahmed Selouani

    2003-07-01

    Limiting the decrease in performance due to acoustic environment changes remains a major challenge for continuous speech recognition (CSR) systems. We propose a novel approach which combines the Karhunen-Loève transform (KLT) in the mel-frequency domain with a genetic algorithm (GA) to enhance the data representing corrupted speech. The idea consists of projecting noisy speech parameters onto the space generated by the genetically optimized principal axes issued from the KLT. The enhanced parameters increase the recognition rate for highly interfering noise environments. The proposed hybrid technique, when included in the front-end of an HTK-based CSR system, outperforms the conventional recognition process in severe interfering car noise environments for a wide range of signal-to-noise ratios (SNRs) varying from 16 dB to −4 dB. We also show the effectiveness of the KLT-GA method in recognizing speech subject to telephone channel degradations.
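
    As a rough illustration of the projection step described in this record, the sketch below (Python; all names illustrative) projects noisy feature vectors onto a retained KLT/PCA subspace. The genetic optimization of the axes is replaced here by simple eigenvalue ranking, so this is a stand-in for the paper's GA stage, not the authors' implementation.

        import numpy as np

        def klt_enhance(noisy_feats, keep=12):
            """Project noisy feature vectors onto the leading KLT (PCA) axes.

            noisy_feats: (n_frames, n_dims) array of mel-domain features.
            keep: number of principal axes retained (a stand-in for the
                  genetically optimized axis selection in the paper).
            """
            mean = noisy_feats.mean(axis=0)
            centered = noisy_feats - mean
            # KLT basis = eigenvectors of the feature covariance matrix.
            cov = np.cov(centered, rowvar=False)
            eigvals, eigvecs = np.linalg.eigh(cov)
            # eigh returns ascending order; keep the 'keep' largest axes.
            basis = eigvecs[:, np.argsort(eigvals)[::-1][:keep]]
            # Project onto the subspace and reconstruct in the original space.
            return centered @ basis @ basis.T + mean

        # Example: 200 frames of 24-dimensional features (random stand-in).
        feats = np.random.randn(200, 24)
        print(klt_enhance(feats).shape)  # (200, 24)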

  2. Cingulo-opercular activity affects incidental memory encoding for speech in noise.

    Science.gov (United States)

    Vaden, Kenneth I; Teubner-Rhodes, Susan; Ahlstrom, Jayne B; Dubno, Judy R; Eckert, Mark A

    2017-08-15

    Correctly understood speech in difficult listening conditions is often difficult to remember. A long-standing hypothesis for this observation is that the engagement of cognitive resources to aid speech understanding can limit resources available for memory encoding. This hypothesis is consistent with evidence that speech presented in difficult conditions typically elicits greater activity throughout cingulo-opercular regions of frontal cortex that are proposed to optimize task performance through adaptive control of behavior and tonic attention. However, successful memory encoding of items for delayed recognition memory tasks is consistently associated with increased cingulo-opercular activity when perceptual difficulty is minimized. The current study used a delayed recognition memory task to test competing predictions that memory encoding for words is enhanced or limited by the engagement of cingulo-opercular activity during challenging listening conditions. An fMRI experiment was conducted with twenty healthy adult participants who performed a word identification in noise task that was immediately followed by a delayed recognition memory task. Consistent with previous findings, word identification trials in the poorer signal-to-noise ratio condition were associated with increased cingulo-opercular activity and poorer recognition memory scores on average. However, cingulo-opercular activity decreased for correctly identified words in noise that were not recognized in the delayed memory test. These results suggest that memory encoding in difficult listening conditions is poorer when elevated cingulo-opercular activity is not sustained. Although increased attention to speech when presented in difficult conditions may detract from more active forms of memory maintenance (e.g., sub-vocal rehearsal), we conclude that task performance monitoring and/or elevated tonic attention supports incidental memory encoding in challenging listening conditions.

  3. Working Memory Training and Speech in Noise Comprehension in Older Adults

    Directory of Open Access Journals (Sweden)

    Rachel V. Wayne

    2016-03-01

    Understanding speech in the presence of background sound can be challenging for older adults. Speech comprehension in noise appears to depend on working memory and executive-control processes (e.g., Heald & Nusbaum, 2014), and their augmentation through training may have rehabilitative potential for age-related hearing loss. We examined the efficacy of adaptive working-memory training (Cogmed; Klingberg, Forssberg & Westerberg, 2002) in 24 older adults, assessing generalization to other working-memory tasks (near-transfer) and to other cognitive domains (far-transfer) using a cognitive test battery, including the Reading Span test, sensitive to working memory (e.g., Daneman and Carpenter, 1980). We also assessed far transfer to speech-in-noise performance, including a closed-set sentence task (Kidd, Best & Mason, 2005). To examine the effect of cognitive training on benefit obtained from semantic context, we also assessed transfer to open-set sentences; half were semantically coherent (high-context) and half were semantically anomalous (low-context). Subjects completed 25 sessions (0.5-1 hour each; 5 sessions/week) of both adaptive working memory training and placebo training over 10 weeks in a crossover design. Subjects' scores on the adaptive working-memory training tasks improved as a result of training. However, training did not transfer to other working memory tasks, nor to tasks recruiting other cognitive domains. We did not observe any training-related improvement in speech-in-noise performance. Measures of working memory correlated with the intelligibility of low-context, but not high-context, sentences, suggesting that sentence context may reduce the load on working memory. The Reading Span test significantly correlated only with a test of visual episodic memory, suggesting that the Reading Span test is not a pure test of working memory, as is commonly assumed.

  4. Working Memory Training and Speech in Noise Comprehension in Older Adults.

    Science.gov (United States)

    Wayne, Rachel V; Hamilton, Cheryl; Jones Huyck, Julia; Johnsrude, Ingrid S

    2016-01-01

    Understanding speech in the presence of background sound can be challenging for older adults. Speech comprehension in noise appears to depend on working memory and executive-control processes (e.g., Heald and Nusbaum, 2014), and their augmentation through training may have rehabilitative potential for age-related hearing loss. We examined the efficacy of adaptive working-memory training (Cogmed; Klingberg et al., 2002) in 24 older adults, assessing generalization to other working-memory tasks (near-transfer) and to other cognitive domains (far-transfer) using a cognitive test battery, including the Reading Span test, sensitive to working memory (e.g., Daneman and Carpenter, 1980). We also assessed far transfer to speech-in-noise performance, including a closed-set sentence task (Kidd et al., 2008). To examine the effect of cognitive training on benefit obtained from semantic context, we also assessed transfer to open-set sentences; half were semantically coherent (high-context) and half were semantically anomalous (low-context). Subjects completed 25 sessions (0.5-1 h each; 5 sessions/week) of both adaptive working memory training and placebo training over 10 weeks in a crossover design. Subjects' scores on the adaptive working-memory training tasks improved as a result of training. However, training did not transfer to other working memory tasks, nor to tasks recruiting other cognitive domains. We did not observe any training-related improvement in speech-in-noise performance. Measures of working memory correlated with the intelligibility of low-context, but not high-context, sentences, suggesting that sentence context may reduce the load on working memory. The Reading Span test significantly correlated only with a test of visual episodic memory, suggesting that the Reading Span test is not a pure test of working memory, as is commonly assumed.

  5. Effects of noise and audiovisual cues on speech processing in adults with and without ADHD.

    Science.gov (United States)

    Michalek, Anne M P; Watson, Silvana M; Ash, Ivan; Ringleb, Stacie; Raymer, Anastasia

    2014-03-01

    This study examined the interplay among internal (e.g. attention, working memory abilities) and external (e.g. background noise, visual information) factors in individuals with and without ADHD. A 2 × 2 × 6 mixed design with correlational analyses was used to compare participant results on a standardized listening in noise sentence repetition task (QuickSIN; Killion et al., 2004), presented in an auditory and an audiovisual condition as the signal-to-noise ratio (SNR) varied from 25 to 0 dB, and to determine individual differences in working memory capacity and short-term recall. Participants were thirty-eight young adults without ADHD and twenty-five young adults with ADHD. Diagnosis, modality, and signal-to-noise ratio all affected the ability to process speech in noise. The interaction between the diagnosis of ADHD, the presence of visual cues, and the level of noise had an effect on a person's ability to process speech in noise. Conclusion: Young adults with ADHD benefited less from visual information during noise than young adults without ADHD, an effect influenced by working memory abilities.

  6. Is the Speech Transmission Index (STI) a robust measure of sound system speech intelligibility performance?

    Science.gov (United States)

    Mapp, Peter

    2002-11-01

    Although RaSTI is a good indicator of the speech intelligibility capability of auditoria and similar spaces, during the past 2-3 years it has been shown that RaSTI is not a robust predictor of sound system intelligibility performance. Instead, it is now recommended, within both national and international codes and standards, that full STI measurement and analysis be employed. However, new research is reported that indicates that STI is neither as flawless nor as robust as many believe. The paper highlights a number of potential error mechanisms. It is shown that the measurement technique and signal excitation stimulus can have a significant effect on the overall result and accuracy, particularly where DSP-based equipment is employed. It is also shown that in its current state of development, STI is not capable of appropriately accounting for a number of fundamental speech and system attributes, including typical sound system frequency response variations and anomalies. This is particularly shown to be the case when a system is operating under reverberant conditions. Comparisons between actual system measurements and corresponding word score data are reported, with errors of up to 50% observed; the implications for VA and PA system performance verification are discussed.
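
    Since this record discusses STI's limitations without reproducing the computation, a minimal sketch of the standard STI pipeline (apparent SNR from modulation transfer values, clipping, band averaging) may help orient readers; the band weights below are uniform placeholders, not the exact IEC 60268-16 values, which also include redundancy corrections.

        import numpy as np

        def sti_from_mtf(m, band_weights=None):
            """Basic STI computation from modulation transfer values.

            m: (n_octave_bands, n_mod_freqs) modulation transfer factors in [0, 1].
            band_weights: octave-band weights; uniform placeholders here.
            """
            m = np.clip(m, 1e-6, 1 - 1e-6)
            snr_app = 10 * np.log10(m / (1 - m))      # apparent SNR per cell
            snr_app = np.clip(snr_app, -15.0, 15.0)   # clip to +/-15 dB
            ti = (snr_app + 15.0) / 30.0              # transmission indices
            band_ti = ti.mean(axis=1)                 # average over mod. freqs
            if band_weights is None:
                band_weights = np.ones(len(band_ti)) / len(band_ti)
            return float(np.asarray(band_weights) @ band_ti)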

  7. Temporal acuity and speech recognition score in noise in patients with multiple sclerosis

    Directory of Open Access Journals (Sweden)

    Mehri Maleki

    2014-04-01

    Background and Aim: Multiple sclerosis (MS) is a central nervous system disease that can be associated with a variety of symptoms, such as hearing disorders. The main consequence of hearing loss is poor speech perception, and temporal acuity has an important role in speech perception. We evaluated speech perception in silence and in the presence of noise, as well as temporal acuity, in patients with multiple sclerosis. Methods: Eighteen adults with multiple sclerosis with a mean age of 37.28 years and 18 age- and sex-matched controls with a mean age of 38.00 years participated in this study. Temporal acuity and speech perception were evaluated by the random gap detection test (GDT) and word recognition score (WRS) at three different signal-to-noise ratios. Results: Statistical analysis of the test results revealed significant differences between the two groups (p < 0.05). Analysis of the gap detection test (at 4 sensation levels) and word recognition score in both groups showed significant differences (p < 0.001). Conclusion: According to this survey, the ability of patients with multiple sclerosis to process temporal features of stimuli was impaired. It seems that this impairment is an important factor in the decreased word recognition score and speech perception.

  8. Light field reconstruction robust to signal dependent noise

    Science.gov (United States)

    Ren, Kun; Bian, Liheng; Suo, Jinli; Dai, Qionghai

    2014-11-01

    Capturing four-dimensional light field data sequentially using a coded aperture camera is an effective approach but suffers from a low signal-to-noise ratio. Although multiplexing can help raise the acquisition quality, noise is still a big issue, especially for fast acquisition. To address this problem, this paper proposes a noise-robust light field reconstruction method. First, a scene-dependent noise model is studied and incorporated into the light field reconstruction framework. Then, we derive an optimization algorithm for the final reconstruction. We build a prototype by hacking an off-the-shelf camera for data capturing and prove the concept. The effectiveness of this method is validated with experiments on the real captured data.

  9. Modeling Speech Level as a Function of Background Noise Level and Talker-to-Listener Distance for Talkers Wearing Hearing Protection Devices

    DEFF Research Database (Denmark)

    Bouserhal, Rachel E.; Bockstael, Annelies; MacDonald, Ewen

    2017-01-01

    Purpose: Studying the variations in speech levels with changing background noise level and talker-to-listener distance for talkers wearing hearing protection devices (HPDs) can aid in understanding communication in background noise. Method: Speech was recorded using an intra-aural HPD from 12 … complements the existing model presented by Pelegrín-García, Smits, Brunskog, and Jeong (2011) and expands on it by taking into account the effects of occlusion and background noise level on changes in speech sound level. Conclusions: Three models of the relationship between vocal effort, background noise …

  10. Speech detection in noise and spatial unmasking in children with simultaneous versus sequential bilateral cochlear implants.

    Science.gov (United States)

    Chadha, Neil K; Papsin, Blake C; Jiwani, Salima; Gordon, Karen A

    2011-09-01

    To measure speech detection in noise performance for children with bilateral cochlear implants (BiCI), to compare performance in children with simultaneous implants versus those with sequential implants, and to compare performance to normal-hearing children. Prospective cohort study. Tertiary academic pediatric center. Children with early-onset bilateral deafness and 2-year BiCI experience, comprising a "sequential" group (>2 yr interimplantation delay, n = 12), a "simultaneous" group (no interimplantation delay, n = 10), and normal-hearing controls (n = 8). Thresholds to speech detection (at 0-degree azimuth) were measured with noise at 0-degree azimuth or ±90-degree azimuth. Outcome measures were spatial unmasking (SU), as the noise condition changed from 0-degree azimuth to ±90-degree azimuth, and the binaural summation advantage (BSA) of 2 over 1 CI. Speech detection in noise was significantly poorer than controls for both BiCI groups (p < 0.05). SU in the simultaneous group approached levels found in normal controls (7.2 ± 0.6 versus 8.6 ± 0.6 dB, p > 0.05) and was significantly better than in the sequential group (3.9 ± 0.4 dB, p < 0.05). BSA was similar in both groups but, in the sequential group, SU was significantly better when noise was moved to the second rather than the first implanted ear (4.8 ± 0.5 versus 3.0 ± 0.4 dB, p < 0.05), suggesting an advantage of listening through the sequential group's second rather than first CI. Children with simultaneously implanted BiCI demonstrated an advantage over children with sequential implants by using spatial cues to improve speech detection in noise.

  11. Neural Spike-Train Analyses of the Speech-Based Envelope Power Spectrum Model

    Science.gov (United States)

    Rallapalli, Varsha H.

    2016-01-01

    Diagnosing and treating hearing impairment is challenging because people with similar degrees of sensorineural hearing loss (SNHL) often have different speech-recognition abilities. The speech-based envelope power spectrum model (sEPSM) has demonstrated that the signal-to-noise ratio (SNRENV) from a modulation filter bank provides a robust speech-intelligibility measure across a wider range of degraded conditions than many long-standing models. In the sEPSM, noise (N) is assumed to: (a) reduce S + N envelope power by filling in dips within clean speech (S) and (b) introduce an envelope noise floor from intrinsic fluctuations in the noise itself. While the promise of SNRENV has been demonstrated for normal-hearing listeners, it has not been thoroughly extended to hearing-impaired listeners because of limited physiological knowledge of how SNHL affects speech-in-noise envelope coding relative to noise alone. Here, envelope coding to speech-in-noise stimuli was quantified from auditory-nerve model spike trains using shuffled correlograms, which were analyzed in the modulation-frequency domain to compute modulation-band estimates of neural SNRENV. Preliminary spike-train analyses show strong similarities to the sEPSM, demonstrating feasibility of neural SNRENV computations. Results suggest that individual differences can occur based on differential degrees of outer- and inner-hair-cell dysfunction in listeners currently diagnosed into the single audiological SNHL category. The predicted acoustic-SNR dependence in individual differences suggests that the SNR-dependent rate of susceptibility could be an important metric in diagnosing individual differences. Future measurements of the neural SNRENV in animal studies with various forms of SNHL will provide valuable insight for understanding individual differences in speech-in-noise intelligibility.
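
    A rough Python sketch of the envelope-domain SNR idea referenced in this record, loosely following the sEPSM definition; the filter order, modulation bands, and flooring below are illustrative choices, not the authors' exact neural analysis.

        import numpy as np
        from scipy.signal import butter, sosfilt, hilbert

        def band_env_power(x, fs, lo, hi):
            """Envelope power of x in one modulation band [lo, hi] Hz."""
            env = np.abs(hilbert(x))      # temporal envelope
            env = env - env.mean()        # remove DC before band filtering
            sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
            return np.mean(sosfilt(sos, env) ** 2)

        def snr_env(speech_noise, noise, fs,
                    bands=((1, 2), (2, 4), (4, 8), (8, 16))):
            """Per-band envelope SNR:
            (P_env(S+N) - P_env(N)) / P_env(N), floored at a small value."""
            out = []
            for lo, hi in bands:
                p_sn = band_env_power(speech_noise, fs, lo, hi)
                p_n = band_env_power(noise, fs, lo, hi)
                out.append(max(p_sn - p_n, 1e-6) / max(p_n, 1e-6))
            return out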

  12. Sparse coding of the modulation spectrum for noise-robust automatic speech recognition

    NARCIS (Netherlands)

    Ahmadi, S.; Ahadi, S.M.; Cranen, B.; Boves, L.W.J.

    2014-01-01

    The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equivalent of a sequence of short-time power spectra, mainly to simplify the computation of the

  13. Effects of Long-Term Speech-in-Noise Training in Air Traffic Controllers and High Frequency Suppression. A Control Group Study.

    Science.gov (United States)

    Pérez Zaballos, María Teresa; Ramos de Miguel, Ángel; Pérez Plasencia, Daniel; Zaballos González, María Luisa; Ramos Macías, Ángel

    2015-12-01

    To evaluate 1) if air traffic controllers (ATC) perform better than non-air traffic controllers in an open-set speech-in-noise test because of their experience with radio communications, and 2) if high-frequency information (>8000 Hz) substantially improves speech-in-noise perception across populations. The control group comprised 28 normal-hearing subjects, and the target group comprised 48 ATCs aged between 19 and 55 years who were native Spanish speakers. The hearing-in-noise abilities of the two groups were characterized under two signal conditions: 1) speech tokens and white noise sampled at 44.1 kHz (unfiltered condition) and 2) speech tokens plus white noise, each passed through a 4th order Butterworth filter with 70 and 8000 Hz low and high cutoffs (filtered condition). These tests were performed at signal-to-noise ratios of +5, 0, and -5 dB SNR. The ATCs outperformed the control group in all conditions. The differences were statistically significant in all cases, and the largest difference was observed under the most difficult conditions (-5 dB SNR). Overall, scores were higher when high-frequency components were not suppressed for both groups, although statistically significant differences were not observed for the control group at 0 dB SNR. The results indicate that ATCs are more capable of identifying speech in noise. This may be due to the effect of their training. On the other hand, performance seems to decrease when the high frequency components of speech are removed, regardless of training.
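
    A short sketch of how the "filtered" stimulus condition described above could be generated (Python with NumPy/SciPy); the mixing details and zero-phase filtering are assumptions, not the authors' exact procedure.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt

        def mix_at_snr(speech, snr_db):
            """Add white noise to a speech token at a target SNR (dB)."""
            noise = np.random.randn(len(speech))
            p_s, p_n = np.mean(speech ** 2), np.mean(noise ** 2)
            noise *= np.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
            return speech + noise

        def filtered_condition(signal, fs=44100, lo=70.0, hi=8000.0):
            """4th-order Butterworth band-pass, as in the filtered condition."""
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            return sosfiltfilt(sos, signal)

        speech = np.random.randn(44100)          # stand-in for a speech token
        stim = filtered_condition(mix_at_snr(speech, -5.0))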

  14. Speech perception performance of subjects with type I diabetes mellitus in noise

    Directory of Open Access Journals (Sweden)

    Bárbara Cristiane Sordi Silva

    Introduction: Diabetes mellitus (DM) is a chronic metabolic disorder of various origins that occurs when the pancreas fails to produce insulin in sufficient quantities or when the organism fails to respond to this hormone in an efficient manner. Objective: To evaluate speech recognition in subjects with type I diabetes mellitus (DMI) in quiet and in competitive noise. Methods: This was a descriptive, observational, and cross-sectional study. We included 40 participants of both genders aged 18-30 years, divided into a control group (CG) of 20 healthy subjects with no complaints or auditory changes, paired for age and gender with the study group, consisting of 20 subjects with a diagnosis of DMI. First, we applied basic audiological evaluations (pure tone audiometry, speech audiometry, and immittance audiometry) for all subjects; after these evaluations, we applied the Sentence Recognition Threshold in Quiet (SRTQ) and Sentence Recognition Threshold in Noise (SRTN) in free field, using the List of Sentences in Portuguese test. Results: All subjects showed normal bilateral pure tone thresholds, compatible speech audiometry, and an "A" tympanometry curve. Group comparison revealed a statistically significant difference for SRTQ (p = 0.0001), SRTN (p < 0.0001), and the signal-to-noise ratio (p < 0.0001). Conclusion: The performance of DMI subjects in SRTQ and SRTN was worse compared to the subjects without diabetes.

  15. Accuracy of Repetition of Digitized and Synthesized Speech for Young Children in Background Noise

    Science.gov (United States)

    Drager, Kathryn D. R.; Clark-Serpentine, Elizabeth A.; Johnson, Kate E.; Roeser, Jennifer L.

    2006-01-01

    Purpose: The present study investigated the intelligibility of digitized and synthesized speech output in background noise for children 3-5 years old. The purpose of the study was to determine whether there was a difference in the intelligibility (ability to repeat) of 3 types of speech output (digitized, DECTalk synthesized, and MacinTalk…

  16. Updating working memory in aircraft noise and speech causes different fMRI activations

    OpenAIRE

    Sætrevik, Bjørn; Sörqvist, Patrik

    2014-01-01

    The present study used fMRI/BOLD neuroimaging to investigate how visual-verbal working memory is updated when exposed to three different background-noise conditions: speech noise, aircraft noise and silence. The number-updating task that was used can distinguish between "substitution processes", which involve adding new items to the working memory representation and suppressing old items, and "exclusion processes", which involve rejecting new items and maintaining an intact memory set. The cu...

  17. Lexical-Access Ability and Cognitive Predictors of Speech Recognition in Noise in Adult Cochlear Implant Users

    OpenAIRE

    Kaandorp, Marre W.; Smits, Cas; Merkus, Paul; Festen, Joost M.; Goverts, S. Theo

    2017-01-01

    Not all of the variance in speech-recognition performance of cochlear implant (CI) users can be explained by biographic and auditory factors. In normal-hearing listeners, linguistic and cognitive factors determine most of speech-in-noise performance. The current study explored specifically the influence of visually measured lexical-access ability compared with other cognitive factors on speech recognition of 24 postlingually deafened CI users. Speech-recognition performance was measured with ...

  18. Brainstem auditory responses to resolved and unresolved harmonics of a synthetic vowel in quiet and noise.

    Science.gov (United States)

    Laroche, Marilyn; Dajani, Hilmi R; Prévost, François; Marcoux, André M

    2013-01-01

    This study investigated speech auditory brainstem responses (speech ABR) with variants of a synthetic vowel in quiet and in background noise. Its objectives were to study the noise robustness of the brainstem response at the fundamental frequency F0 and at the first formant F1, evaluate how the resolved/unresolved harmonics regions in speech contribute to the response at F0, and investigate the origin of the response at F0 to resolved and unresolved harmonics in speech. In total, 18 normal-hearing subjects (11 women, aged 18-33 years) participated in this study. Speech ABRs were recorded using variants of a 300 msec formant-synthesized /a/ vowel in quiet and in white noise. The first experiment employed three variants containing the first three formants F1 to F3, F1 only, and F2 and F3 only with relative formant levels following those reported in the literature. The second experiment employed three variants containing F1 only, F2 only, and F3 only, with the formants equalized to the same level and the signal-to-noise ratio (SNR) maintained at -5 dB. Overall response latency was estimated, and the amplitude and local SNR of the envelope following response at F0 and of the frequency following response at F1 were compared for the different stimulus variants in quiet and in noise. The response at F0 was more robust to noise than that at F1. There were no statistically significant differences in the response at F0 caused by the three stimulus variants in both experiments in quiet. However, the response at F0 with the variant dominated by resolved harmonics was more robust to noise than the response at F0 with the stimulus variants dominated by unresolved harmonics. The latencies of the responses in all cases were very similar in quiet, but the responses at F0 due to resolved and unresolved harmonics combined nonlinearly when both were present in the stimulus. Speech ABR has been suggested as a marker of central auditory processing. The results of this study support
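
    For readers unfamiliar with how the response at F0 (or F1) and its local SNR are typically quantified in speech-ABR work, a generic sketch follows; the FFT-bin neighborhood noise estimate is a common convention and not necessarily the authors' exact method.

        import numpy as np

        def response_snr_at(freq, avg_response, fs, half_width=5):
            """Spectral amplitude at `freq` and its local SNR in dB, with the
            noise floor estimated from the mean amplitude of neighboring
            FFT bins (illustrative convention)."""
            spec = np.abs(np.fft.rfft(avg_response))
            freqs = np.fft.rfftfreq(len(avg_response), 1 / fs)
            k = np.argmin(np.abs(freqs - freq))
            neighbors = np.r_[spec[k - half_width:k],
                              spec[k + 1:k + 1 + half_width]]
            snr_db = 20 * np.log10(spec[k] / neighbors.mean())
            return spec[k], snr_db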

  19. Speech understanding in background noise with the two-microphone adaptive beamformer BEAM in the Nucleus Freedom Cochlear Implant System.

    Science.gov (United States)

    Spriet, Ann; Van Deun, Lieselot; Eftaxiadis, Kyriaky; Laneau, Johan; Moonen, Marc; van Dijk, Bas; van Wieringen, Astrid; Wouters, Jan

    2007-02-01

    This paper evaluates the benefit of the two-microphone adaptive beamformer BEAM in the Nucleus Freedom cochlear implant (CI) system for speech understanding in background noise by CI users. A double-blind evaluation of the two-microphone adaptive beamformer BEAM and a hardware directional microphone was carried out with five adult Nucleus CI users. The test procedure consisted of a pre- and post-test in the lab and a 2-wk trial period at home. In the pre- and post-test, the speech reception threshold (SRT) with sentences and the percentage correct phoneme scores for CVC words were measured in quiet and background noise at different signal-to-noise ratios. Performance was assessed for two different noise configurations (with a single noise source and with three noise sources) and two different noise materials (stationary speech-weighted noise and multitalker babble). During the 2-wk trial period at home, the CI users evaluated the noise reduction performance in different listening conditions by means of the SSQ questionnaire. In addition to the perceptual evaluation, the noise reduction performance of the beamformer was measured physically as a function of the direction of the noise source. Significant improvements of both the SRT in noise (average improvement of 5-16 dB) and the percentage correct phoneme scores (average improvement of 10-41%) were observed with BEAM compared to the standard hardware directional microphone. In addition, the SSQ questionnaire and subjective evaluation in controlled and real-life scenarios suggested a possible preference for the beamformer in noisy environments. The evaluation demonstrates that the adaptive noise reduction algorithm BEAM in the Nucleus Freedom CI-system may significantly increase the speech perception by cochlear implantees in noisy listening conditions. This is the first monolateral (adaptive) noise reduction strategy actually implemented in a mainstream commercial CI.
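
    The BEAM algorithm itself is proprietary; as a generic illustration of the same family of two-microphone adaptive beamformers, a toy generalized-sidelobe-canceller sketch is given below (Python; all parameters illustrative, not Cochlear's implementation).

        import numpy as np

        def two_mic_gsc(front, rear, mu=0.1, taps=32, eps=1e-8):
            """Toy generalized sidelobe canceller: the mic sum passes the
            frontal target, the mic difference serves as a noise reference,
            and an NLMS filter subtracts whatever part of the sum is
            predictable from that reference."""
            fixed = 0.5 * (front + rear)   # fixed beam (target + noise)
            ref = front - rear             # blocking branch (noise-dominated)
            w = np.zeros(taps)
            out = np.zeros_like(fixed)
            for n in range(taps, len(fixed)):
                x = ref[n - taps:n][::-1]
                y = w @ x                  # noise estimate
                e = fixed[n] - y           # enhanced output sample
                w += mu * e * x / (x @ x + eps)   # NLMS update
                out[n] = e
            return out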

  20. Results from the Dutch speech-in-noise screening test by telephone

    NARCIS (Netherlands)

    Smits, C.H.M.; Houtgast, T.

    2005-01-01

    OBJECTIVE: The objective of the study was to implement a previously developed automatic speech-in-noise screening test by telephone (Smits, Kapteyn, & Houtgast, 2004), introduce it nationwide as a self-test, and analyze the results. DESIGN: The test was implemented on an interactive voice response

  1. Memory performance on the Auditory Inference Span Test is independent of background noise type for young adults with normal hearing at high speech intelligibility.

    Science.gov (United States)

    Rönnberg, Niklas; Rudner, Mary; Lunner, Thomas; Stenfelt, Stefan

    2014-01-01

    Listening in noise is often perceived to be effortful. This is partly because cognitive resources are engaged in separating the target signal from background noise, leaving fewer resources for storage and processing of the content of the message in working memory. The Auditory Inference Span Test (AIST) is designed to assess listening effort by measuring the ability to maintain and process heard information. The aim of this study was to use AIST to investigate the effect of background noise types and signal-to-noise ratio (SNR) on listening effort, as a function of working memory capacity (WMC) and updating ability (UA). The AIST was administered in three types of background noise: steady-state speech-shaped noise, amplitude modulated speech-shaped noise, and unintelligible speech. Three SNRs targeting 90% speech intelligibility or better were used in each of the three noise types, giving nine different conditions. The reading span test assessed WMC, while UA was assessed with the letter memory test. Twenty young adults with normal hearing participated in the study. Results showed that AIST performance was not influenced by noise type at the same intelligibility level, but became worse with worse SNR when background noise was speech-like. Performance on AIST also decreased with increasing memory load level. Correlations between AIST performance and the cognitive measurements suggested that WMC is of more importance for listening when SNRs are worse, while UA is of more importance for listening in easier SNRs. The results indicated that in young adults with normal hearing, the effort involved in listening in noise at high intelligibility levels is independent of the noise type. However, when noise is speech-like and intelligibility decreases, listening effort increases, probably due to extra demands on cognitive resources added by the informational masking created by the speech fragments and vocal sounds in the background noise.

  2. Memory performance on the Auditory Inference Span Test is independent of background noise type for young adults with normal hearing at high speech intelligibility

    Directory of Open Access Journals (Sweden)

    Niklas eRönnberg

    2014-12-01

    Listening in noise is often perceived to be effortful. This is partly because cognitive resources are engaged in separating the target signal from background noise, leaving fewer resources for storage and processing of the content of the message in working memory. The Auditory Inference Span Test (AIST) is designed to assess listening effort by measuring the ability to maintain and process heard information. The aim of this study was to use AIST to investigate the effect of background noise types and signal-to-noise ratio (SNR) on listening effort, as a function of working memory capacity (WMC) and updating ability (UA). The AIST was administered in three types of background noise: steady-state speech-shaped noise, amplitude modulated speech-shaped noise, and unintelligible speech. Three SNRs targeting 90% speech intelligibility or better were used in each of the three noise types, giving nine different conditions. The reading span test assessed WMC, while UA was assessed with the letter memory test. Twenty young adults with normal hearing participated in the study. Results showed that AIST performance was not influenced by noise type at the same intelligibility level, but became worse with worse SNR when background noise was speech-like. Performance on AIST also decreased with increasing memory load level. Correlations between AIST performance and the cognitive measurements suggested that WMC is of more importance for listening when SNRs are worse, while UA is of more importance for listening in easier SNRs. The results indicated that in young adults with normal hearing, the effort involved in listening in noise at high intelligibility levels is independent of the noise type. However, when noise is speech-like and intelligibility decreases, listening effort increases, probably due to extra demands on cognitive resources added by the informational masking created by the speech fragments and vocal sounds in the background noise.

  3. Auditory and Non-Auditory Contributions for Unaided Speech Recognition in Noise as a Function of Hearing Aid Use.

    Science.gov (United States)

    Gieseler, Anja; Tahden, Maike A S; Thiel, Christiane M; Wagener, Kirsten C; Meis, Markus; Colonius, Hans

    2017-01-01

    Differences in understanding speech in noise among hearing-impaired individuals cannot be explained entirely by hearing thresholds alone, suggesting the contribution of other factors beyond standard auditory ones as derived from the audiogram. This paper reports two analyses addressing individual differences in the explanation of unaided speech-in-noise performance among n = 438 elderly hearing-impaired listeners (mean = 71.1 ± 5.8 years). The main analysis was designed to identify clinically relevant auditory and non-auditory measures for speech-in-noise prediction using auditory (audiogram, categorical loudness scaling) and cognitive tests (verbal-intelligence test, screening test of dementia), as well as questionnaires assessing various self-reported measures (health status, socio-economic status, and subjective hearing problems). Using stepwise linear regression analysis, 62% of the variance in unaided speech-in-noise performance was explained, with the measures pure-tone average (PTA), age, and verbal intelligence emerging as the three most important predictors. In the complementary analysis, those individuals with the same hearing loss profile were separated into hearing aid users (HAU) and non-users (NU), and were then compared regarding potential differences in the test measures and in explaining unaided speech-in-noise recognition. The groupwise comparisons revealed significant differences in auditory measures and self-reported subjective hearing problems, while no differences in the cognitive domain were found. Furthermore, groupwise regression analyses revealed that verbal intelligence had a predictive value in both groups, whereas age and PTA only emerged as significant in the group of hearing aid non-users.

  4. A Robust Adaptive Unscented Kalman Filter for Nonlinear Estimation with Uncertain Noise Covariance.

    Science.gov (United States)

    Zheng, Binqi; Fu, Pengcheng; Li, Baoqing; Yuan, Xiaobing

    2018-03-07

    The unscented Kalman filter (UKF) may suffer from performance degradation and even divergence when there is a mismatch between the noise distributions assumed a priori by users and the actual ones in a real nonlinear system. To resolve this problem, this paper proposes a robust adaptive UKF (RAUKF) to improve the accuracy and robustness of state estimation with uncertain noise covariance. More specifically, at each timestep, a standard UKF is implemented first to obtain the state estimations using the newly acquired measurement data. Then an online fault-detection mechanism is adopted to judge whether it is necessary to update the current noise covariance. If necessary, an innovation-based method and a residual-based method are used to calculate the estimations of the current noise covariance of the process and measurement, respectively. By utilizing a weighting factor, the filter combines the last noise covariance matrices with these estimations to form the new noise covariance matrices. Finally, the state estimations are corrected according to the new noise covariance matrices and previous state estimations. Compared with the standard UKF and other adaptive UKF algorithms, RAUKF converges faster to the actual noise covariance and thus achieves a better performance in terms of robustness, accuracy, and computation for nonlinear estimation with uncertain noise covariance, which is demonstrated by the simulation results.
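
    A minimal sketch of the innovation-based measurement-noise update described in this record, restricted to a linear measurement model for brevity; the fault-detection step and the residual-based process-noise branch are omitted, and the blending factor is a placeholder.

        import numpy as np

        def update_noise_cov(R_old, innovations, H, P_pred, alpha=0.3):
            """Innovation-based estimate of the measurement-noise covariance,
            blended with the previous estimate by a weighting factor.

            innovations: recent innovation vectors, shape (k, m)
            H, P_pred:   measurement matrix and predicted state covariance
            """
            C = innovations.T @ innovations / len(innovations)  # sample cov.
            R_new = C - H @ P_pred @ H.T    # remove the state-driven part
            # A practical implementation would also enforce symmetry and
            # positive definiteness of the blended result.
            return (1 - alpha) * R_old + alpha * R_new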

  5. Speech understanding in noise with an eyeglass hearing aid: asymmetric fitting and the head shadow benefit of anterior microphones.

    Science.gov (United States)

    Mens, Lucas H M

    2011-01-01

    To test speech understanding in noise using array microphones integrated in an eyeglass device and to test if microphones placed anteriorly at the temple provide better directivity than above the pinna. Sentences were presented from the front and uncorrelated noise from 45, 135, 225 and 315°. Fifteen hearing impaired participants with a significant speech discrimination loss were included, as well as 5 normal hearing listeners. The device (Varibel) improved speech understanding in noise compared to most conventional directional devices with a directional benefit of 5.3 dB in the asymmetric fit mode, which was not significantly different from the bilateral fully directional mode (6.3 dB). Anterior microphones outperformed microphones at a conventional position above the pinna by 2.6 dB. By integrating microphones in an eyeglass frame, a long array can be used resulting in a higher directionality index and improved speech understanding in noise. An asymmetric fit did not significantly reduce performance and can be considered to increase acceptance and environmental awareness. Directional microphones at the temple seemed to profit more from the head shadow than above the pinna, better suppressing noise from behind the listener.

  6. Robust Cyclic MUSIC Algorithm for Finding Directions in Impulsive Noise Environment

    Directory of Open Access Journals (Sweden)

    Sen Li

    2017-01-01

    This paper addresses the issue of direction finding of a cyclostationary signal in impulsive noise environments modeled by an α-stable distribution. Since the α-stable distribution does not have finite second-order statistics, conventional cyclic correlation-based signal-selective direction finding algorithms do not work effectively. To resolve this problem, we define two robust cyclic correlation functions which are derived from the robust statistics properties of the correntropy and the nonlinear transformation, respectively. The MUSIC algorithm with the robust cyclic correlation matrix of the received array signals is then used to estimate the direction of the cyclostationary signal in the presence of impulsive noise. The computer simulation results demonstrate that the two proposed robust cyclic correlation-based algorithms outperform the conventional cyclic correlation and the fractional lower order cyclic correlation based methods.
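
    A sketch of the final MUSIC step applied to a robust (cyclic) correlation matrix; forming the robust matrix itself, via correntropy or a nonlinear transformation, is the paper's contribution and is not reproduced here. The uniform-linear-array geometry and scan grid are assumptions.

        import numpy as np

        def music_spectrum(R, n_sources, n_sensors, d=0.5,
                           grid=np.linspace(-90, 90, 361)):
            """Standard MUSIC pseudo-spectrum for a uniform linear array,
            applied to a (robust) covariance/cyclic-correlation matrix R."""
            eigvals, eigvecs = np.linalg.eigh(R)
            En = eigvecs[:, :n_sensors - n_sources]   # noise subspace
            p = np.zeros(len(grid))
            for i, theta in enumerate(grid):
                # Steering vector for element spacing d (in wavelengths).
                a = np.exp(-2j * np.pi * d * np.arange(n_sensors)
                           * np.sin(np.deg2rad(theta)))
                p[i] = 1.0 / np.real(a.conj() @ En @ En.conj().T @ a)
            return grid, p   # peaks of p indicate estimated directions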

  7. Belief Shift or Only Facilitation: How Semantic Expectancy Affects Processing of Speech Degraded by Background Noise.

    Science.gov (United States)

    Simeon, Katherine M; Bicknell, Klinton; Grieco-Calub, Tina M

    2018-01-01

    Individuals use semantic expectancy - applying conceptual and linguistic knowledge to speech input - to improve the accuracy and speed of language comprehension. This study tested how adults use semantic expectancy in quiet and in the presence of speech-shaped broadband noise at -7 and -12 dB signal-to-noise ratio. Twenty-four adults (22.1 ± 3.6 years, mean ± SD) were tested on a four-alternative-forced-choice task whereby they listened to sentences and were instructed to select an image matching the sentence-final word. The semantic expectancy of the sentences was unrelated to (neutral), congruent with, or conflicting with the acoustic target. Congruent expectancy improved accuracy and conflicting expectancy decreased accuracy relative to neutral, consistent with a theory where expectancy shifts beliefs toward likely words and away from unlikely words. Additionally, there were no significant interactions of expectancy and noise level when analyzed in log-odds, supporting the predictions of ideal observer models of speech perception.

  8. Robustness of quantum correlations against linear noise

    International Nuclear Information System (INIS)

    Guo, Zhihua; Cao, Huaixin; Qu, Shixian

    2016-01-01

    Relative robustness of quantum correlations (RRoQC) of a bipartite state is first introduced relative to a classically correlated state. Robustness of quantum correlations (RoQC) of a bipartite state is then defined as the minimum of RRoQC of the state relative to all classically correlated ones. It is proved that, as a function on quantum states, RoQC is nonnegative, lower semi-continuous and neither convex nor concave; in particular, it is zero if and only if the state is classically correlated. Thus, RoQC not only quantifies the endurance of quantum correlations of a state against linear noise, but also can be used to distinguish between quantum and classically correlated states. Furthermore, the effects of local quantum channels on the robustness are explored and characterized.
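
    The record does not reproduce the formal definitions; a hedged LaTeX sketch of a standard robustness-style form consistent with the description above (not quoted from the paper) is:

        % Hedged reconstruction, not the paper's verbatim definition.
        \[
          \mathrm{RRoQC}(\rho\,\|\,\sigma) \;=\;
          \min\Bigl\{\, s \ge 0 \;:\; \tfrac{1}{1+s}\bigl(\rho + s\,\sigma\bigr)
          \in \mathcal{CC} \Bigr\},
          \qquad
          \mathrm{RoQC}(\rho) \;=\;
          \min_{\sigma \in \mathcal{CC}} \mathrm{RRoQC}(\rho\,\|\,\sigma),
        \]
        % where $\mathcal{CC}$ denotes the set of classically correlated
        % states and $\sigma$ plays the role of the admixed noise.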

  9. Single-Sided Deafness: Impact of Cochlear Implantation on Speech Perception in Complex Noise and on Auditory Localization Accuracy.

    Science.gov (United States)

    Döge, Julia; Baumann, Uwe; Weissgerber, Tobias; Rader, Tobias

    2017-12-01

    To assess auditory localization accuracy and speech reception threshold (SRT) in complex noise conditions in adult patients with acquired single-sided deafness, after intervention with a cochlear implant (CI) in the deaf ear. Nonrandomized, open, prospective patient series. Tertiary referral university hospital. Eleven patients with late-onset single-sided deafness (SSD) and normal hearing in the unaffected ear, who received a CI. All patients were experienced CI users. Unilateral cochlear implantation. Speech perception was tested in a complex multitalker equivalent noise field consisting of multiple sound sources. Speech reception thresholds in noise were determined in aided (with CI) and unaided conditions. Localization accuracy was assessed in complete darkness. Acoustic stimuli were radiated by multiple loudspeakers distributed in the frontal horizontal plane between -60 and +60 degrees. In the aided condition, results show slightly improved speech reception scores compared with the unaided condition in most of the patients. For 8 of the 11 subjects, SRT was improved between 0.37 and 1.70 dB. Three of the 11 subjects showed deteriorations between 1.22 and 3.24 dB SRT. Median localization error decreased significantly by 12.9 degrees compared with the unaided condition. CI in single-sided deafness is an effective treatment to improve the auditory localization accuracy. Speech reception in complex noise conditions is improved to a lesser extent in 73% of the participating CI SSD patients. However, the absence of true binaural interaction effects (summation, squelch) impedes further improvements. The development of speech processing strategies that respect binaural interaction seems to be mandatory to advance speech perception in demanding listening situations in SSD patients.

  10. Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

    Directory of Open Access Journals (Sweden)

    Petar S. Aleksic

    2002-11-01

    We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs) supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. Principal component analysis (PCA) was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR) experiments. Both single-stream and multistream hidden Markov models (HMMs) were used to model the ASR system, integrate audio and visual information, and perform relatively large vocabulary (approximately 1000 words) speech recognition experiments. The experiments used clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER) by 20% to 23% relative to audio-only speech recognition WERs, at various SNRs (0-30 dB) with additive white Gaussian noise, and by 19% relative to the audio-only speech recognition WER under clean audio conditions.

  11. Arduino-based noise robust online heart-rate detection.

    Science.gov (United States)

    Das, Sangita; Pal, Saurabh; Mitra, Madhuchhanda

    2017-04-01

    This paper introduces a noise robust real time heart rate detection system from electrocardiogram (ECG) data. An online data acquisition system is developed to collect ECG signals from human subjects. Heart rate is detected using a window-based autocorrelation peak localisation technique. A low-cost Arduino UNO board is used to implement the complete automated process. The performance of the system is compared with a PC-based heart rate detection technique. The accuracy of the system is validated through simulated noisy ECG data with various levels of signal-to-noise ratio (SNR). The mean percentage error of the detected heart rate is found to be 0.72% for the noisy database with five different noise levels.
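
    A generic sketch of window-based autocorrelation heart-rate estimation as named in this record (plain Python/NumPy, not the authors' Arduino code; the BPM search range is an assumption):

        import numpy as np

        def heart_rate_autocorr(ecg, fs, min_bpm=40, max_bpm=200):
            """Estimate heart rate as the lag of the dominant autocorrelation
            peak within the physiological range; the analysis window must be
            longer than the longest plausible beat period."""
            x = ecg - np.mean(ecg)
            ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags >= 0
            lo = int(fs * 60 / max_bpm)     # shortest plausible period
            hi = int(fs * 60 / min_bpm)     # longest plausible period
            lag = lo + np.argmax(ac[lo:hi])
            return 60.0 * fs / lag          # beats per minute

        # Demo: a crude 72-bpm periodic peak train sampled at 250 Hz.
        fs = 250
        t = np.arange(0, 10, 1 / fs)
        ecg = np.sin(2 * np.pi * 0.6 * t) ** 20   # peaks every ~0.83 s
        print(round(heart_rate_autocorr(ecg, fs)))  # ~72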

  12. Low- and high-frequency cortical brain oscillations reflect dissociable mechanisms of concurrent speech segregation in noise.

    Science.gov (United States)

    Yellamsetty, Anusha; Bidelman, Gavin M

    2018-04-01

    Parsing simultaneous speech requires listeners to use pitch-guided segregation, which can be affected by the signal-to-noise ratio (SNR) in the auditory scene. The interaction of these two cues may occur at multiple levels within the cortex. The aims of the current study were to assess the correspondence between oscillatory brain rhythms and determine how listeners exploit pitch and SNR cues to successfully segregate concurrent speech. We recorded electrical brain activity while participants heard double-vowel stimuli whose fundamental frequencies (F0s) differed by zero or four semitones (STs) presented in either clean or noise-degraded (+5 dB SNR) conditions. We found that behavioral identification was more accurate for vowel mixtures with larger pitch separations, but the F0 benefit interacted with noise. Time-frequency analysis decomposed the EEG into different spectrotemporal frequency bands. Low-frequency (θ, β) responses were elevated when speech did not contain pitch cues (0ST > 4ST) or was noisy, suggesting a correlate of increased listening effort and/or memory demands. Contrastively, γ power increments were observed for changes in both pitch (0ST > 4ST) and SNR (clean > noise), suggesting that high-frequency bands carry information related to acoustic features and the quality of speech representations. Brain-behavior associations corroborated these effects; modulations in low-frequency rhythms predicted the speed of listeners' perceptual decisions, with higher bands predicting identification accuracy. The results are consistent with the notion that neural oscillations reflect both automatic (pre-perceptual) and controlled (post-perceptual) mechanisms of speech processing that are largely divisible into the high- and low-frequency bands of human brain rhythms.

  13. Predicting the effect of spectral subtraction on the speech recognition threshold based on the signal-to-noise ratio in the envelope domain

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2011-01-01

    rarely been evaluated perceptually in terms of speech intelligibility. This study analyzed the effects of the spectral subtraction strategy proposed by Berouti et al. [ICASSP 4 (1979), 208-211] on the speech recognition threshold (SRT) obtained with sentences presented in stationary speech-shaped noise. … The SRT was measured in five normal-hearing listeners in six conditions of spectral subtraction. The results showed an increase of the SRT after processing, i.e. a decreased speech intelligibility, in contrast to what is predicted by the Speech Transmission Index (STI). Here, another approach is proposed …, denoted the speech-based envelope power spectrum model (sEPSM), which predicts the intelligibility based on the signal-to-noise ratio in the envelope domain. In contrast to the STI, the sEPSM is sensitive to the increased amount of the noise envelope power as a consequence of the spectral subtraction …
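
    For concreteness, a minimal sketch of Berouti-style spectral over-subtraction follows; the over-subtraction factor alpha, spectral floor beta, and STFT settings are illustrative, and the per-bin noise PSD estimate is assumed to be obtained elsewhere (e.g., from a speech pause).

        import numpy as np
        from scipy.signal import stft, istft

        def berouti_subtract(noisy, fs, noise_psd, alpha=4.0, beta=0.01):
            """Spectral over-subtraction after Berouti et al. (1979):
            subtract alpha times the noise PSD and floor the result at
            beta times the noise PSD; noise_psd has one value per STFT bin."""
            f, t, X = stft(noisy, fs, nperseg=512)
            power = np.abs(X) ** 2
            clean_power = power - alpha * noise_psd[:, None]
            clean_power = np.maximum(clean_power, beta * noise_psd[:, None])
            Y = np.sqrt(clean_power) * np.exp(1j * np.angle(X))  # noisy phase
            return istft(Y, fs, nperseg=512)[1]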

  14. Evidence-Based Occupational Hearing Screening I: Modeling the Effects of Real-World Noise Environments on the Likelihood of Effective Speech Communication.

    Science.gov (United States)

    Soli, Sigfrid D; Giguère, Christian; Laroche, Chantal; Vaillancourt, Véronique; Dreschler, Wouter A; Rhebergen, Koenraad S; Harkins, Kevin; Ruckstuhl, Mark; Ramulu, Pradeep; Meyers, Lawrence S

    The objectives of this study were to (1) identify essential hearing-critical job tasks for public safety and law enforcement personnel; (2) determine the locations and real-world noise environments where these tasks are performed; (3) characterize each noise environment in terms of its impact on the likelihood of effective speech communication, considering the effects of different levels of vocal effort, communication distances, and repetition; and (4) use this characterization to define an objective normative reference for evaluating the ability of individuals to perform essential hearing-critical job tasks in noisy real-world environments. Data from five occupational hearing studies performed over a 17-year period for various public safety agencies were analyzed. In each study, job task analyses by job content experts identified essential hearing-critical tasks and the real-world noise environments where these tasks are performed. These environments were visited, and calibrated recordings of each noise environment were made. The extended speech intelligibility index (ESII) was calculated for each 4-sec interval in each recording. These data, together with the estimated ESII value required for effective speech communication by individuals with normal hearing, allowed the likelihood of effective speech communication in each noise environment for different levels of vocal effort and communication distances to be determined. These likelihoods provide an objective norm-referenced and standardized means of characterizing the predicted impact of real-world noise on the ability to perform essential hearing-critical tasks. A total of 16 noise environments for law enforcement personnel and eight noise environments for corrections personnel were analyzed. Effective speech communication was essential to hearing-critical tasks performed in these environments. Average noise levels ranged from approximately 70 to 87 dBA in law enforcement environments and 64 to 80 dBA in

  15. Acquirement and enhancement of remote speech signals

    Science.gov (United States)

    Lü, Tao; Guo, Jin; Zhang, He-yong; Yan, Chun-hui; Wang, Can-jin

    2017-07-01

    To address the challenges of non-cooperative and remote acoustic detection, an all-fiber laser Doppler vibrometer (LDV) is established. The all-fiber LDV system offers the advantages of smaller size, lightweight design and robust structure, hence it is a better fit for remote speech detection. In order to improve the performance and the efficiency of the LDV for long-range hearing, speech enhancement technology based on the optimally modified log-spectral amplitude (OM-LSA) algorithm is used. The experimental results show that comprehensible speech signals within a range of 150 m can be obtained by the proposed LDV. The signal-to-noise ratio (SNR) and mean opinion score (MOS) of the LDV speech signal can be increased by 100% and 27%, respectively, by using the speech enhancement technology. This all-fiber LDV, combined with the speech enhancement technology, can meet practical demands in engineering.
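
    OM-LSA itself involves a more elaborate gain rule and a speech-presence probability model; as a much-simplified stand-in, the sketch below applies a decision-directed Wiener-style gain (all parameters illustrative, and the leading portion of the recording is assumed speech-free for noise estimation).

        import numpy as np
        from scipy.signal import stft, istft

        def wiener_enhance(x, fs, noise_secs=0.25, alpha=0.98, gmin=0.1):
            """Decision-directed Wiener-style enhancement: a simplified
            stand-in for OM-LSA, not the algorithm used in the paper."""
            f, t, X = stft(x, fs, nperseg=512)
            n_frames = np.sum(t < noise_secs)
            noise_psd = np.mean(np.abs(X[:, :n_frames]) ** 2, axis=1)
            S_prev = np.zeros(X.shape[0])
            Y = np.zeros_like(X)
            for m in range(X.shape[1]):
                snr_post = np.abs(X[:, m]) ** 2 / noise_psd
                snr_prio = (alpha * S_prev / noise_psd
                            + (1 - alpha) * np.maximum(snr_post - 1, 0))
                g = np.maximum(snr_prio / (1 + snr_prio), gmin)  # floored gain
                Y[:, m] = g * X[:, m]
                S_prev = np.abs(Y[:, m]) ** 2
            return istft(Y, fs, nperseg=512)[1]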

  16. Speech interference and transmission on residential balconies with road traffic noise.

    Science.gov (United States)

    Naish, Daniel A; Tan, Andy C C; Nur Demirbilek, F

    2013-01-01

    Balcony acoustic treatments can mitigate the effects of community road traffic noise. To further investigate, a theoretical study into the effects of balcony acoustic treatment combinations on speech interference and transmission is conducted for various street geometries. Nine different balcony types are investigated using a combined specular and diffuse reflection computer model. Diffusion in the model is calculated using the radiosity technique. The balcony types include a standard balcony with or without a ceiling and with various combinations of parapet, ceiling absorption and ceiling shield. A total of 70 balcony and street geometrical configurations are analyzed with each balcony type, resulting in 630 scenarios. In each scenario the reverberation time, speech interference level (SIL) and speech transmission index (STI) are calculated. These indicators are compared to determine trends based on the effects of propagation path, inclusion of opposite buildings and difference with a reference position outside the balcony. The results demonstrate trends in SIL and STI with different balcony types. It is found that an acoustically treated balcony reduces speech interference. A parapet provides the largest improvement, followed by absorption on the ceiling. The largest reductions in speech interference arise when a combination of balcony acoustic treatments are applied.

  17. Prediction of speech masking release for fluctuating interferers based on the envelope power signal-to-noise ratio

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2012-01-01

    -hearing listeners in conditions with additive stationary noise, reverberation, and nonlinear processing with spectral subtraction. The latter condition represents a case in which the standardized speech intelligibility index and speech transmission index fail. However, the sEPSM is limited to conditions … for the stationary and non-stationary interferers, demonstrating further that the envelope SNR is crucial for speech comprehension …

  18. Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; de Jong, Franciska M.G.

    In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because

  19. The effect of different cochlear implant microphones on acoustic hearing individuals’ binaural benefits for speech perception in noise

    Science.gov (United States)

    Aronoff, Justin M.; Freed, Daniel J.; Fisher, Laurel M.; Pal, Ivan; Soli, Sigfrid D.

    2011-01-01

    directional microphone when the speech and masker were spatially separated, emphasizing the importance of measuring binaural benefits separately for each HRTF. Evaluation of binaural benefits indicated that binaural squelch and spatial release from masking were found for all HRTFs and binaural summation was found for all but one HRTF, although binaural summation was less robust than the other types of binaural benefits. Additionally, the results indicated that neither interaural time nor level cues dominated binaural benefits for the normal hearing participants. Conclusions This study provides a means to measure the degree to which cochlear implant microphones affect acoustic hearing with respect to speech perception in noise. It also provides measures that can be used to evaluate the independent contributions of interaural time and level cues. These measures provide tools that can aid researchers in understanding and improving binaural benefits in acoustic hearing individuals listening via cochlear implant microphones. PMID:21412155

  20. Speech perception at positive signal-to-noise ratios using adaptive adjustment of time compression.

    Science.gov (United States)

    Schlueter, Anne; Brand, Thomas; Lemke, Ulrike; Nitzschner, Stefan; Kollmeier, Birger; Holube, Inga

    2015-11-01

    Positive signal-to-noise ratios (SNRs) characterize the listening situations most relevant for hearing-impaired listeners in daily life and should therefore be considered when evaluating hearing aid algorithms. To this end, a speech-in-noise test was developed and evaluated in which the background noise is presented at fixed positive SNRs and the speech rate (i.e., the time compression of the speech material) is adaptively adjusted. In total, 29 younger and 12 older normal-hearing listeners, as well as 24 older hearing-impaired listeners, took part in repeated measurements. Younger normal-hearing and older hearing-impaired listeners completed one of two adaptive methods, which differed in adaptive procedure and step size. Analysis of the measurements with regard to list length and threshold estimation strategy resulted in a practical method that measures the time compression yielding 50% recognition. This method uses time-compression adjustment and step sizes according to Versfeld and Dreschler [(2002). J. Acoust. Soc. Am. 111, 401-408], with sentence scoring, lists of 30 sentences, and a maximum likelihood method for threshold estimation. Evaluation of the procedure showed that older participants obtained higher test-retest reliability than younger participants. Depending on the group of listeners, one or two lists are required for training prior to data collection.
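
    As an illustration of the adaptive logic, the sketch below implements a generic 1-up/1-down track on the time-compression factor, which converges on the 50%-recognition point. The starting value and step size are placeholders rather than the Versfeld and Dreschler settings, and the final maximum likelihood fit is replaced by a simple average.

      def run_adaptive_track(present_sentence, n_trials=30, start=1.0, step=0.1):
          """present_sentence(c) -> True if the sentence was repeated correctly
          at time-compression factor c (e.g., 2.0 = speech twice as fast)."""
          compression, track = start, []
          for _ in range(n_trials):
              correct = present_sentence(compression)
              track.append(compression)
              # 1-up/1-down: harder after a correct trial, easier after an error
              compression = max(1.0, compression + (step if correct else -step))
          # Crude threshold estimate from the last 10 trials (a maximum
          # likelihood fit over the whole track would be used in practice):
          return sum(track[-10:]) / 10.0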

  1. Relations Between the Intelligibility of Speech in Noise and Psychophysical Measures of Hearing Measured in Four Languages Using the Auditory Profile Test Battery

    Directory of Open Access Journals (Sweden)

    T. E. M. Van Esch

    2015-12-01

    Full Text Available The aim of the present study was to determine the relations between the intelligibility of speech in noise and measures of auditory resolution, loudness recruitment, and cognitive function. The analyses were based on data published earlier as part of the presentation of the Auditory Profile, a test battery implemented in four languages. Tests of the intelligibility of speech, auditory resolution, loudness recruitment, and lexical decision making were administered using headphones in five centers in Germany, the Netherlands, Sweden, and the United Kingdom. Correlations and stepwise linear regression models were calculated. In total, 72 hearing-impaired listeners aged 22 to 91 years with a broad range of hearing losses were included in the study. Several significant correlations were found with the intelligibility of speech in noise. Stepwise linear regression analyses showed that pure-tone average, age, spectral and temporal resolution, and loudness recruitment were significant predictors of the intelligibility of speech in fluctuating noise. Complex interrelationships between auditory factors and the intelligibility of speech in noise were revealed using the Auditory Profile data set in four languages. After taking into account the effects of pure-tone average and age, spectral and temporal resolution and loudness recruitment had added value in predicting the variation among listeners with respect to the intelligibility of speech in noise. The results of the lexical decision making test were not related to the intelligibility of speech in noise in the population studied.

  2. Auditory and Non-Auditory Contributions for Unaided Speech Recognition in Noise as a Function of Hearing Aid Use

    OpenAIRE

    Gieseler, Anja; Tahden, Maike A. S.; Thiel, Christiane M.; Wagener, Kirsten C.; Meis, Markus; Colonius, Hans

    2017-01-01

    Differences in understanding speech in noise among hearing-impaired individuals cannot be explained entirely by hearing thresholds alone, suggesting the contribution of other factors beyond standard auditory ones as derived from the audiogram. This paper reports two analyses addressing individual differences in the explanation of unaided speech-in-noise performance among n = 438 elderly hearing-impaired listeners (mean = 71.1 ± 5.8 years). The main analysis was designed to identify clinically...

  3. Impact of Noise and Working Memory on Speech Processing in Adults with and without ADHD

    Science.gov (United States)

    Michalek, Anne M. P.

    2012-01-01

    Auditory processing of speech is influenced by internal (i.e., attention, working memory) and external factors (i.e., background noise, visual information). This study examined the interplay among these factors in individuals with and without ADHD. All participants completed a listening in noise task, two working memory capacity tasks, and two…

  4. Use of a Deep Recurrent Neural Network to Reduce Wind Noise: Effects on Judged Speech Intelligibility and Sound Quality.

    Science.gov (United States)

    Keshavarzi, Mahmoud; Goehring, Tobias; Zakis, Justin; Turner, Richard E; Moore, Brian C J

    2018-01-01

    Despite great advances in hearing-aid technology, users still experience problems with noise in windy environments. The potential benefits of using a deep recurrent neural network (RNN) for reducing wind noise were assessed. The RNN was trained using recordings of the output of the two microphones of a behind-the-ear hearing aid in response to male and female speech at various azimuths in the presence of noise produced by wind from various azimuths with a velocity of 3 m/s, using the "clean" speech as a reference. A paired-comparison procedure was used to compare all possible combinations of three conditions for subjective intelligibility and for sound quality or comfort. The conditions were unprocessed noisy speech, noisy speech processed using the RNN, and noisy speech that was high-pass filtered (which also reduced wind noise). Eighteen native English-speaking participants were tested, nine with normal hearing and nine with mild-to-moderate hearing impairment. Frequency-dependent linear amplification was provided for the latter. Processing using the RNN was significantly preferred over no processing by both subject groups for both subjective intelligibility and sound quality, although the magnitude of the preferences was small. High-pass filtering (HPF) was not significantly preferred over no processing. Although RNN was significantly preferred over HPF only for sound quality for the hearing-impaired participants, for the results as a whole, there was a preference for RNN over HPF. Overall, the results suggest that reduction of wind noise using an RNN is possible and might have beneficial effects when used in hearing aids.
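
    The record describes the training setup but not the network itself; the sketch below shows a generic recurrent mask-based denoiser of the kind commonly used for such tasks (a GRU predicting per-bin gains on a noisy spectrogram). All sizes, features, and data are placeholders, not the study's configuration.

      import torch
      import torch.nn as nn

      class RNNDenoiser(nn.Module):
          def __init__(self, n_bins=257, hidden=128):
              super().__init__()
              self.rnn = nn.GRU(n_bins, hidden, batch_first=True)
              self.out = nn.Linear(hidden, n_bins)

          def forward(self, noisy_mag):            # (batch, frames, bins)
              h, _ = self.rnn(noisy_mag)
              mask = torch.sigmoid(self.out(h))    # per-bin gain in (0, 1)
              return mask * noisy_mag              # enhanced magnitude spectrum

      model = RNNDenoiser()
      noisy = torch.rand(1, 100, 257)              # stand-in noisy spectrogram
      clean = torch.rand(1, 100, 257)              # stand-in clean reference
      loss = nn.functional.mse_loss(model(noisy), clean)
      loss.backward()                              # gradients for one training step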

  5. Implicit Talker Training Improves Comprehension of Auditory Speech in Noise

    Directory of Open Access Journals (Sweden)

    Jens Kreitewolf

    2017-09-01

    Full Text Available Previous studies have shown that listeners are better able to understand speech when they are familiar with the talker’s voice. In most of these studies, talker familiarity was ensured by explicit voice training; that is, listeners learned to identify the familiar talkers. In the real world, however, the characteristics of familiar talkers are learned incidentally, through communication. The present study investigated whether speech comprehension benefits from implicit voice training; that is, through exposure to talkers’ voices without listeners explicitly trying to identify them. During four training sessions, listeners heard short sentences containing a single verb (e.g., “he writes”), spoken by one talker. The sentences were mixed with noise, and listeners identified the verb within each sentence while their speech-reception thresholds (SRT) were measured. In a final test session, listeners performed the same task, but this time they heard different sentences spoken by the familiar talker and three unfamiliar talkers. Familiar and unfamiliar talkers were counterbalanced across listeners. Half of the listeners performed a test session in which the four talkers were presented in separate blocks (blocked paradigm). For the other half, talkers varied randomly from trial to trial (interleaved paradigm). The results showed that listeners had lower SRT when the speech was produced by the familiar talker than the unfamiliar talkers. The type of talker presentation (blocked vs. interleaved) had no effect on this familiarity benefit. These findings suggest that listeners implicitly learn talker-specific information during a speech-comprehension task, and exploit this information to improve the comprehension of novel speech material from familiar talkers.

  6. Spoken language achieves robustness and evolvability by exploiting degeneracy and neutrality.

    Science.gov (United States)

    Winter, Bodo

    2014-10-01

    As with biological systems, spoken languages are strikingly robust against perturbations. This paper shows that languages achieve robustness in a way that is highly similar to many biological systems. For example, speech sounds are encoded via multiple acoustically diverse, temporally distributed and functionally redundant cues, characteristics that bear similarities to what biologists call "degeneracy". Speech is furthermore adequately characterized by neutrality, with many different tongue configurations leading to similar acoustic outputs, and different acoustic variants understood as the same by recipients. This highlights the presence of a large neutral network of acoustic neighbors for every speech sound. Such neutrality ensures that a steady backdrop of variation can be maintained without impeding communication, assuring that there is "fodder" for subsequent evolution. Thus, studying linguistic robustness is not only important for understanding how linguistic systems maintain their functioning upon the background of noise, but also for understanding the preconditions for language evolution. © 2014 WILEY Periodicals, Inc.

  7. Joint variable frame rate and length analysis for speech recognition under adverse conditions

    DEFF Research Database (Denmark)

    Tan, Zheng-Hua; Kraljevski, Ivan

    2014-01-01

    This paper presents a method that combines variable frame length and rate analysis for speech recognition in noisy environments, together with an investigation of the effect of different frame lengths on speech recognition performance. The method adopts frame selection using an a posteriori signal-to-noise ratio (SNR) weighted energy distance and increases the length of the selected frames according to the number of non-selected preceding frames. It assigns a higher frame rate and a normal frame length to a rapidly changing, high-SNR region of a speech signal, and a lower frame rate and an increased frame length to a steady or low-SNR region. The speech recognition results show that the proposed variable frame rate and length method outperforms fixed frame rate and length analysis, as well as standalone variable frame rate analysis, in terms of noise robustness.
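
    The record does not give the exact weighting, so the sketch below shows one plausible reading of frame selection by an SNR-weighted energy distance: frames with large, reliable energy changes are kept, while steady or low-SNR stretches are decimated. The weighting and threshold are illustrative assumptions.

      import numpy as np

      def select_frames(log_energy, post_snr_db, threshold=0.5):
          """log_energy, post_snr_db: per-frame arrays. Returns kept frame indices."""
          kept = [0]
          for t in range(1, len(log_energy)):
              # weight the energy change by the a posteriori SNR (floored at 0 dB)
              weight = max(post_snr_db[t], 0.0) / 30.0
              distance = weight * abs(log_energy[t] - log_energy[kept[-1]])
              if distance > threshold:   # rapidly changing, high-SNR region
                  kept.append(t)
          return np.asarray(kept)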

  8. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    Directory of Open Access Journals (Sweden)

    Heracleous Panikos

    2007-01-01

    Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise, and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved a 93.9% word accuracy for a 20 k dictation task for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone, with very promising results.

  9. Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel

    Science.gov (United States)

    Kleinschmidt, Dave F.; Jaeger, T. Florian

    2016-01-01

    Successful speech perception requires that listeners map the acoustic signal to linguistic categories. These mappings are not only probabilistic, but change depending on the situation. For example, one talker’s /p/ might be physically indistinguishable from another talker’s /b/ (cf. lack of invariance). We characterize the computational problem posed by such a subjectively non-stationary world and propose that the speech perception system overcomes this challenge by (1) recognizing previously encountered situations, (2) generalizing to other situations based on previous similar experience, and (3) adapting to novel situations. We formalize this proposal in the ideal adapter framework: (1) to (3) can be understood as inference under uncertainty about the appropriate generative model for the current talker, thereby facilitating robust speech perception despite the lack of invariance. We focus on two critical aspects of the ideal adapter. First, in situations that clearly deviate from previous experience, listeners need to adapt. We develop a distributional (belief-updating) learning model of incremental adaptation. The model provides a good fit against known and novel phonetic adaptation data, including perceptual recalibration and selective adaptation. Second, robust speech recognition requires listeners learn to represent the structured component of cross-situation variability in the speech signal. We discuss how these two aspects of the ideal adapter provide a unifying explanation for adaptation, talker-specificity, and generalization across talkers and groups of talkers (e.g., accents and dialects). The ideal adapter provides a guiding framework for future investigations into speech perception and adaptation, and more broadly language comprehension. PMID:25844873
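
    The belief-updating component can be illustrated with a generic conjugate (normal-normal) update of a listener's belief about a talker's category mean; the values and noise variance below are hypothetical, and the paper's actual model is richer (full generative models over situations).

      def update_belief(prior_mean, prior_var, obs, obs_noise_var):
          """One Bayesian update of the category-mean belief after observing obs."""
          k = prior_var / (prior_var + obs_noise_var)      # how much to trust the token
          post_mean = prior_mean + k * (obs - prior_mean)  # shift toward the evidence
          post_var = (1.0 - k) * prior_var                 # uncertainty shrinks
          return post_mean, post_var

      mean, var = 60.0, 100.0            # prior: /p/ VOT around 60 ms
      for vot in [45.0, 48.0, 44.0]:     # an atypical talker's tokens
          mean, var = update_belief(mean, var, vot, obs_noise_var=50.0)
      print(mean, var)                   # belief has shifted from 60 ms toward the tokens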

  10. Predicting speech intelligibility in adverse conditions: evaluation of the speech-based envelope power spectrum model

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2011-01-01

    The speech-based envelope power spectrum model (sEPSM) [Jørgensen and Dau (2011). J. Acoust. Soc. Am. 130 (3), 1475–1487] estimates the envelope signal-to-noise ratio (SNRenv) of distorted speech and accurately describes the speech recognition thresholds (SRT) for normal-hearing listeners. Here, the model is evaluated in adverse conditions by comparing predictions to measured data from [Kjems et al. (2009). J. Acoust. Soc. Am. 126 (3), 1415-1426], where speech is mixed with four different interferers, including speech-shaped noise, bottle noise, car noise, and cafe noise. The model accounts well for the differences in intelligibility observed for the different interferers. None of the standardized models successfully describe these data.

  11. Modeling Speech Level as a Function of Background Noise Level and Talker-to-Listener Distance for Talkers Wearing Hearing Protection Devices

    Science.gov (United States)

    Bouserhal, Rachel E.; Bockstael, Annelies; MacDonald, Ewen; Falk, Tiago H.; Voix, Jérémie

    2017-01-01

    Purpose: Studying the variations in speech levels with changing background noise level and talker-to-listener distance for talkers wearing hearing protection devices (HPDs) can aid in understanding communication in background noise. Method: Speech was recorded using an intra-aural HPD from 12 different talkers at 5 different distances in 3…

  12. Music training improves speech-in-noise perception: Longitudinal evidence from a community-based music program.

    Science.gov (United States)

    Slater, Jessica; Skoe, Erika; Strait, Dana L; O'Connell, Samantha; Thompson, Elaine; Kraus, Nina

    2015-09-15

    Music training may strengthen auditory skills that help children not only in musical performance but in everyday communication. Comparisons of musicians and non-musicians across the lifespan have provided some evidence for a "musician advantage" in understanding speech in noise, although reports have been mixed. Controlled longitudinal studies are essential to disentangle effects of training from pre-existing differences, and to determine how much music training is necessary to confer benefits. We followed a cohort of elementary school children for 2 years, assessing their ability to perceive speech in noise before and after musical training. After the initial assessment, participants were randomly assigned to one of two groups: one group began music training right away and completed 2 years of training, while the second group waited a year and then received 1 year of music training. Outcomes provide the first longitudinal evidence that speech-in-noise perception improves after 2 years of group music training. The children were enrolled in an established and successful community-based music program and followed the standard curriculum, therefore these findings provide an important link between laboratory-based research and real-world assessment of the impact of music training on everyday communication skills. Copyright © 2015 Elsevier B.V. All rights reserved.

  13. White noise speech illusion and psychosis expression: An experimental investigation of psychosis liability

    NARCIS (Netherlands)

    Pries, Lotta-Katrin; Guloksuz, Sinan; Menne-Lothmann, Claudia; Decoster, Jeroen; van Winkel, Ruud; Collip, Dina; Delespaul, Philippe; De Hert, Marc; Derom, Catherine; Thiery, Evert; Jacobs, Nele; Wichers, Marieke; Simons, Claudia J. P.; Rutten, Bart P. F.; van Os, Jim

    2017-01-01

    Background: An association between white noise speech illusion and psychotic symptoms has been reported in patients and their relatives. This supports the theory that bottom-up and top-down perceptual processes are involved in the mechanisms underlying perceptual abnormalities. However, findings in

  14. Evaluation of speech reception threshold in noise in young Cochlear™ Nucleus® system 6 implant recipients using two different digital remote microphone technologies and a speech enhancement sound processing algorithm.

    Science.gov (United States)

    Razza, Sergio; Zaccone, Monica; Meli, Annalisa; Cristofari, Eliana

    2017-12-01

    Children affected by hearing loss can experience difficulties in challenging and noisy environments even when deafness is corrected by cochlear implant (CI) devices. These patients have a selective attention deficit in multiple listening conditions. At present, the most effective ways to improve speech recognition in noise consist of providing CI processors with noise reduction algorithms and of providing patients with bilateral CIs. The aim of this study was to compare speech performance in noise, across increasing noise levels, in CI recipients using two kinds of wireless remote-microphone radio systems with digital radio frequency transmission: the Roger Inspiro accessory and the Cochlear Wireless Mini Microphone accessory. Eleven young users of the Cochlear Nucleus CP910 CI were studied. The signal-to-noise ratio at a speech reception threshold (SRT) value of 50% was measured in different conditions for each patient: with the CI only, with the Roger, or with the Mini Mic accessory. The effect of applying the SNR noise reduction algorithm in each of these conditions was also assessed. The tests were performed with the subject positioned in front of the main speaker at a distance of 2.5 m; another two speakers were positioned at 3.50 m. The main speaker issued disyllabic words at 65 dB, and a babble noise signal of variable intensity was delivered through the other speakers. The use of both wireless remote microphones improved the SRT results, with a larger gain for the Mini Mic system (SRT = -4.76) than for the Roger system (SRT = -3.01). The addition of the NR algorithm did not further improve the results to a statistically significant degree. There is significant improvement in speech recognition with both wireless digital remote microphone accessories, in particular with the Mini Mic system when used with the CP910 processor. The use of a remote microphone accessory surpasses the benefit of…

  15. Differences in Speech Recognition Between Children with Attention Deficits and Typically Developed Children Disappear When Exposed to 65 dB of Auditory Noise.

    Science.gov (United States)

    Söderlund, Göran B W; Jobs, Elisabeth Nilsson

    2016-01-01

    The most common neuropsychiatric condition in children is attention deficit hyperactivity disorder (ADHD), affecting ∼6-9% of the population. ADHD is distinguished by inattention and hyperactive, impulsive behaviors, as well as poor performance in various cognitive tasks, often leading to failures at school. Sensory and perceptual dysfunctions have also been noticed. Prior research has mainly focused on limitations in executive functioning, where differences are often explained by deficits in pre-frontal cortex activation. Less notice has been given to sensory perception and subcortical functioning in ADHD. Recent research has shown that children with an ADHD diagnosis have a deviant auditory brainstem response compared to healthy controls. The aim of the present study was to investigate whether the speech recognition threshold differs between attentive children and children with ADHD symptoms in two environmental sound conditions, with and without external noise. Previous research has shown that children with attention deficits can benefit from white noise exposure during cognitive tasks, and here we investigate whether this noise benefit is present during an auditory perceptual task. For this purpose we used a modified Hagerman's speech recognition test, in which children with and without attention deficits performed a binaural speech recognition task to assess the speech recognition threshold in no-noise and noise (65 dB) conditions. Results showed that the inattentive group displayed a higher speech recognition threshold than typically developed children and that the difference in speech recognition threshold disappeared when exposed to noise at supra-threshold level. From this we conclude that inattention can partly be explained by sensory perceptual limitations that can possibly be ameliorated through noise exposure.

  16. Differences in Speech Recognition Between Children with Attention Deficits and Typically Developed Children Disappear when Exposed to 65 dB of Auditory Noise

    Directory of Open Access Journals (Sweden)

    Göran B W Söderlund

    2016-01-01

    Full Text Available The most common neuropsychiatric condition in children is attention deficit hyperactivity disorder (ADHD), affecting approximately 6-9% of the population. ADHD is distinguished by inattention and hyperactive, impulsive behaviors, as well as poor performance in various cognitive tasks, often leading to failures at school. Sensory and perceptual dysfunctions have also been noticed. Prior research has mainly focused on limitations in executive functioning, where differences are often explained by deficits in pre-frontal cortex activation. Less notice has been given to sensory perception and subcortical functioning in ADHD. Recent research has shown that children with an ADHD diagnosis have a deviant auditory brainstem response compared to healthy controls. The aim of the present study was to investigate whether the speech recognition threshold differs between attentive children and children with ADHD symptoms in two environmental sound conditions, with and without external noise. Previous research has shown that children with attention deficits can benefit from white noise exposure during cognitive tasks, and here we investigate whether this noise benefit is present during an auditory perceptual task. For this purpose we used a modified Hagerman’s speech recognition test, in which children with and without attention deficits performed a binaural speech recognition task to assess the speech recognition threshold in no-noise and noise (65 dB) conditions. Results showed that the inattentive group displayed a higher speech recognition threshold than typically developed children (TDC) and that the difference in speech recognition threshold disappeared when exposed to noise at supra-threshold level. From this we conclude that inattention can partly be explained by sensory perceptual limitations that can possibly be ameliorated through noise exposure.

  17. Robust extended Kalman filter of discrete-time Markovian jump nonlinear system under uncertain noise

    International Nuclear Information System (INIS)

    Zhu, Jin; Park, Jun Hong; Lee, Kwan Soo; Spiryagin, Maksym

    2008-01-01

    This paper examines the problem of robust extended Kalman filter design for discrete-time Markovian jump nonlinear systems with noise uncertainty. Because of the existence of stochastic Markovian switching, the state and measurement equations of the underlying system are subject to uncertain noise whose covariance matrices are time-varying or un-measurable instead of stationary. First, based on the expression of the filtering performance deviation, the admissible uncertainty of the noise covariance matrix is given. Second, two forms of noise uncertainty are taken into account: non-structural and structural. It is proved by applying game theory that this filter design is a robust minimax filter. A numerical example shows the validity of the method.
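
    As a rough illustration of the problem setting (not the paper's game-theoretic minimax design), the sketch below shows a standard EKF recursion in which the nominal noise covariances are inflated by a factor bounding the admissible uncertainty; f and h are the nonlinear state and measurement maps, F and H their Jacobians, all supplied by the caller.

      import numpy as np

      def robust_ekf_step(x, P, z, f, F, h, H, Q_nom, R_nom, alpha=1.2):
          Q, R = alpha * Q_nom, alpha * R_nom          # worst-case covariance inflation
          x_pred = f(x)                                # nonlinear state prediction
          P_pred = F @ P @ F.T + Q
          y = z - h(x_pred)                            # innovation
          S = H @ P_pred @ H.T + R
          K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
          x_new = x_pred + K @ y
          P_new = (np.eye(len(x)) - K @ H) @ P_pred
          return x_new, P_new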

  18. [Improving speech comprehension using a new cochlear implant speech processor].

    Science.gov (United States)

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise. In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  19. Neuroscience-inspired computational systems for speech recognition under noisy conditions

    Science.gov (United States)

    Schafer, Phillip B.

    Humans routinely recognize speech in challenging acoustic environments with background music, engine sounds, competing talkers, and other acoustic noise. However, today's automatic speech recognition (ASR) systems perform poorly in such environments. In this dissertation, I present novel methods for ASR designed to approach human-level performance by emulating the brain's processing of sounds. I exploit recent advances in auditory neuroscience to compute neuron-based representations of speech, and design novel methods for decoding these representations to produce word transcriptions. I begin by considering speech representations modeled on the spectrotemporal receptive fields of auditory neurons. These representations can be tuned to optimize a variety of objective functions, which characterize the response properties of a neural population. I propose an objective function that explicitly optimizes the noise invariance of the neural responses, and find that it gives improved performance on an ASR task in noise compared to other objectives. The method as a whole, however, fails to significantly close the performance gap with humans. I next consider speech representations that make use of spiking model neurons. The neurons in this method are feature detectors that selectively respond to spectrotemporal patterns within short time windows in speech. I consider a number of methods for training the response properties of the neurons. In particular, I present a method using linear support vector machines (SVMs) and show that this method produces spikes that are robust to additive noise. I compute the spectrotemporal receptive fields of the neurons for comparison with previous physiological results. To decode the spike-based speech representations, I propose two methods designed to work on isolated word recordings. The first method uses a classical ASR technique based on the hidden Markov model. The second method is a novel template-based recognition scheme that takes
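
    As an illustration of the SVM-based stage, the sketch below trains a linear SVM as a single spectrotemporal feature detector and emits a "spike" whenever its decision function goes positive on a sliding window. All shapes, the training data, and the refractory rule are placeholders rather than the dissertation's actual design.

      import numpy as np
      from sklearn.svm import LinearSVC

      rng = np.random.default_rng(0)
      X_train = rng.normal(size=(200, 32 * 10))   # 200 windows of 32 bands x 10 frames
      y_train = rng.integers(0, 2, size=200)      # placeholder labels for the sketch
      detector = LinearSVC(C=1.0).fit(X_train, y_train)

      def spike_times(spectrogram, win=10):
          """Return frame indices at which this 'neuron' fires on a
          (bands x frames) spectrogram."""
          spikes = []
          for t in range(spectrogram.shape[1] - win):
              window = spectrogram[:, t:t + win].reshape(1, -1)
              score = float(detector.decision_function(window)[0])
              if score > 0 and (not spikes or t > spikes[-1] + win):  # crude refractory
                  spikes.append(t)
          return spikes

      print(spike_times(rng.normal(size=(32, 200))))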

  1. Subspace-Based Noise Reduction for Speech Signals via Diagonal and Triangular Matrix Decompositions

    DEFF Research Database (Denmark)

    Hansen, Per Christian; Jensen, Søren Holdt

    We survey the definitions and use of rank-revealing matrix decompositions in single-channel noise reduction algorithms for speech signals. Our algorithms are based on the rank-reduction paradigm and, in particular, signal subspace techniques. The focus is on practical working algorithms, using both diagonal (eigenvalue and singular value) decompositions and rank-revealing triangular decompositions (ULV, URV, VSV, ULLV and ULLIV). In addition we show how the subspace-based algorithms can be evaluated and compared by means of simple FIR filter interpretations. The algorithms are illustrated with working Matlab code and applications in speech processing.
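
    The rank-reduction idea can be sketched with the simplest diagonal decomposition: build a Hankel matrix from a noisy frame, keep the k dominant singular directions as the signal subspace, and average the anti-diagonals back into a time signal. Rank and sizes below are illustrative, not taken from the survey.

      import numpy as np

      def svd_denoise(frame, m=32, k=8):
          n = len(frame) - m + 1
          H = np.array([frame[i:i + m] for i in range(n)])   # n x m Hankel matrix
          U, s, Vt = np.linalg.svd(H, full_matrices=False)
          s[k:] = 0.0                                        # truncate noise subspace
          Hk = (U * s) @ Vt
          # Average along anti-diagonals to map back to a time signal:
          out = np.zeros(len(frame))
          counts = np.zeros(len(frame))
          for i in range(n):
              out[i:i + m] += Hk[i]
              counts[i:i + m] += 1
          return out / counts

      noisy = np.sin(0.3 * np.arange(256)) + 0.3 * np.random.randn(256)
      clean_est = svd_denoise(noisy)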

  2. Working Memory and Speech Recognition in Noise under Ecologically Relevant Listening Conditions: Effects of Visual Cues and Noise Type among Adults with Hearing Loss

    Science.gov (United States)

    Miller, Christi W.; Stewart, Erin K.; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A.; Tremblay, Kelly

    2017-01-01

    Purpose: This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Method: Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2…

  3. Medial olivocochlear function in children with poor speech-in-noise performance and language disorder.

    Science.gov (United States)

    Rocha-Muniz, Caroline Nunes; Mamede Carvallo, Renata Mota; Schochat, Eliane

    2017-05-01

    Contralateral masking of transient-evoked otoacoustic emissions is a phenomenon that suggests an inhibitory effect of the olivocochlear efferent auditory pathway. Many studies have been inconclusive in demonstrating a clear connection between this system and behavioral speech-in-noise listening skill. The purpose of this study was to investigate the activation of the medial olivocochlear (MOC) efferent system in children with poor speech-in-noise (PSIN) performance and children with language impairment and PSIN (SLI + PSIN). Transient evoked otoacoustic emissions (TEOAEs) with and without contralateral white noise were tested in 52 children (between 6 and 12 years). These children were arranged in three groups: typical development (TD) (n = 25), PSIN (n = 14) and SLI + PSIN (n = 13). The PSIN and SLI + PSIN groups presented reduced otoacoustic emission suppression in comparison with the TD group. Our finding suggests differences in MOC function between children with typical development and children with poor SIN performance and language problems. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Development and evaluation of the British English coordinate response measure speech-in-noise test as an occupational hearing assessment tool.

    Science.gov (United States)

    Semeraro, Hannah D; Rowan, Daniel; van Besouw, Rachel M; Allsopp, Adrian A

    2017-10-01

    The studies described in this article outline the design and development of a British English version of the coordinate response measure (CRM) speech-in-noise (SiN) test. Our interest in the CRM is as a SiN test with high face validity for occupational auditory fitness-for-duty (AFFD) assessment. Study 1 used the method of constant stimuli to measure and adjust the psychometric functions of each target word, producing a speech corpus with equal intelligibility. After ensuring all the target words had similar intelligibility, for Studies 2 and 3 the CRM was presented in an adaptive procedure in stationary speech-spectrum noise to measure speech reception thresholds and evaluate the test-retest reliability of the CRM SiN test. Studies 1 (n = 20) and 2 (n = 30) were completed by normal-hearing civilians. Study 3 (n = 22) was completed by hearing-impaired military personnel. The results display good test-retest reliability (95% confidence interval), including for listeners with hearing impairment. The British English CRM using stationary speech-spectrum noise is a "ready to use" SiN test, suitable for investigation as an AFFD assessment tool for military personnel.

  5. Lexical-Access Ability and Cognitive Predictors of Speech Recognition in Noise in Adult Cochlear Implant Users.

    Science.gov (United States)

    Kaandorp, Marre W; Smits, Cas; Merkus, Paul; Festen, Joost M; Goverts, S Theo

    2017-01-01

    Not all of the variance in speech-recognition performance of cochlear implant (CI) users can be explained by biographic and auditory factors. In normal-hearing listeners, linguistic and cognitive factors determine most of speech-in-noise performance. The current study explored specifically the influence of visually measured lexical-access ability compared with other cognitive factors on speech recognition of 24 postlingually deafened CI users. Speech-recognition performance was measured with monosyllables in quiet (consonant-vowel-consonant [CVC]), sentences-in-noise (SIN), and digit-triplets in noise (DIN). In addition to a composite variable of lexical-access ability (LA), measured with a lexical-decision test (LDT) and word-naming task, vocabulary size, working-memory capacity (Reading Span test [RSpan]), and a visual analogue of the SIN test (text reception threshold test) were measured. The DIN test was used to correct for auditory factors in SIN thresholds by taking the difference between SIN and DIN: SRTdiff. Correlation analyses revealed that duration of hearing loss (dHL) was related to SIN thresholds. Better working-memory capacity was related to SIN and SRTdiff scores. LDT reaction time was positively correlated with SRTdiff scores. No significant relationships were found for CVC or DIN scores with the predictor variables. Regression analyses showed that, together with dHL, RSpan explained 55% of the variance in SIN thresholds. When controlling for auditory performance, LA, LDT, and RSpan separately explained, together with dHL, respectively 37%, 36%, and 46% of the variance in SRTdiff outcome. The results suggest that poor verbal working-memory capacity and, to a lesser extent, poor lexical-access ability limit speech-recognition ability in listeners with a CI.

  6. Seeing the talker’s face supports executive processing of speech in steady state noise

    OpenAIRE

    Sushmit Mishra; Thomas Lunner; Stefan Stenfelt; Jerker Rönnberg; Mary Rudner

    2013-01-01

    Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT, Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-st...

  7. Speech Processing to Improve the Perception of Speech in Background Noise for Children With Auditory Processing Disorder and Typically Developing Peers.

    Science.gov (United States)

    Flanagan, Sheila; Zorilă, Tudor-Cătălin; Stylianou, Yannis; Moore, Brian C J

    2018-01-01

    Auditory processing disorder (APD) may be diagnosed when a child has listening difficulties but has normal audiometric thresholds. For adults with normal hearing and with mild-to-moderate hearing impairment, an algorithm called spectral shaping with dynamic range compression (SSDRC) has been shown to increase the intelligibility of speech when background noise is added after the processing. Here, we assessed the effect of such processing using 8 children with APD and 10 age-matched control children. The loudness of the processed and unprocessed sentences was matched using a loudness model. The task was to repeat back sentences produced by a female speaker when presented with either speech-shaped noise (SSN) or a male competing speaker (CS) at two signal-to-background ratios (SBRs). Speech identification was significantly better with SSDRC processing than without, for both groups. The benefit of SSDRC processing was greater for the SSN than for the CS background. For the SSN, scores were similar for the two groups at both SBRs. For the CS, the APD group performed significantly more poorly than the control group. The overall improvement produced by SSDRC processing could be useful for enhancing communication in a classroom where the teacher's voice is broadcast using a wireless system.

  8. Do We Perceive Others Better than Ourselves? A Perceptual Benefit for Noise-Vocoded Speech Produced by an Average Speaker.

    Directory of Open Access Journals (Sweden)

    William L Schuerman

    Full Text Available In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.

  9. Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting.

    Science.gov (United States)

    Wöllmer, Martin; Marchi, Erik; Squartini, Stefano; Schuller, Björn

    2011-09-01

    Highly spontaneous, conversational, and potentially emotional and noisy speech is known to be a challenge for today's automatic speech recognition (ASR) systems, which highlights the need for advanced algorithms that improve speech features and models. Histogram equalization is an efficient method to reduce the mismatch between clean and noisy conditions by normalizing all moments of the probability distribution of the feature vector components. In this article, we propose to combine histogram equalization and multi-condition training for robust keyword detection in noisy speech. To better cope with conversational speaking styles, we show how contextual information can be effectively exploited in a multi-stream ASR framework that dynamically models context-sensitive phoneme estimates generated by a long short-term memory neural network. The proposed techniques are evaluated on the SEMAINE database, a corpus containing emotionally colored conversations with a cognitive system for "Sensitive Artificial Listening".
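
    A minimal sketch of histogram equalization for one feature component follows: each value is mapped through its empirical CDF and then through the inverse CDF of a reference distribution. A standard Gaussian reference is assumed here for simplicity; in practice the reference would be estimated from clean training data.

      import numpy as np
      from scipy.stats import norm

      def histogram_equalize(x):
          """x: 1-D array of one feature component across an utterance."""
          ranks = np.argsort(np.argsort(x))            # 0 .. len(x)-1
          cdf = (ranks + 0.5) / len(x)                 # empirical CDF in (0, 1)
          return norm.ppf(cdf)                         # match the reference Gaussian

      noisy_c0 = np.random.gamma(2.0, size=500)        # skewed "noisy" feature
      equalized = histogram_equalize(noisy_c0)         # approximately N(0, 1) afterwards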

  10. Current trends in small vocabulary speech recognition for equipment control

    Science.gov (United States)

    Doukas, Nikolaos; Bardis, Nikolaos G.

    2017-09-01

    Speech recognition systems allow human - machine communication to acquire an intuitive nature that approaches the simplicity of inter - human communication. Small vocabulary speech recognition is a subset of the overall speech recognition problem, where only a small number of words need to be recognized. Speaker independent small vocabulary recognition can find significant applications in field equipment used by military personnel. Such equipment may typically be controlled by a small number of commands that need to be given quickly and accurately, under conditions where delicate manual operations are difficult to achieve. This type of application could hence significantly benefit by the use of robust voice operated control components, as they would facilitate the interaction with their users and render it much more reliable in times of crisis. This paper presents current challenges involved in attaining efficient and robust small vocabulary speech recognition. These challenges concern feature selection, classification techniques, speaker diversity and noise effects. A state machine approach is presented that facilitates the voice guidance of different equipment in a variety of situations.
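
    The sketch below shows one way a command state machine can make voice control robust: recognized keywords drive transitions, and anything else leaves the state unchanged, so isolated misrecognitions cannot derail the equipment. States and vocabulary are hypothetical, not taken from the paper.

      TRANSITIONS = {
          ("idle", "start"): "running",
          ("running", "stop"): "idle",
          ("running", "faster"): "running",
      }

      def step(state, recognized_word):
          # Unknown or misrecognized words leave the state unchanged,
          # which makes the control loop tolerant of recognition errors.
          return TRANSITIONS.get((state, recognized_word), state)

      state = "idle"
      for word in ["start", "banana", "faster"]:   # "banana" is a misrecognition
          state = step(state, word)
      print(state)                                 # -> "running"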

  11. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    Directory of Open Access Journals (Sweden)

    Hiroshi Saruwatari

    2007-01-01

    Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise, and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved a 93.9% word accuracy for a 20 k dictation task for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone, with very promising results.

  12. Noise-Robust Monitoring of Lombard Speech Using a Wireless Neck-surface Accelerometer and Microphone

    Science.gov (United States)

    2017-08-20

    [Only fragments of this record's abstract survive: "… simultaneously powered via USB and battery. The system contains a small receiver that is equipped with the same Bluetooth module as the transmitter (BC127) …", ending in a partial cited reference ("… G. R., 'Subglottal impedance-based inverse filtering of voiced sounds using neck surface acceleration,' IEEE Trans. Audio Speech Lang. Processing").]

  13. Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2011-01-01

    A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNRenv, at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech… The … process provides a key measure of speech intelligibility. © 2011 Acoustical Society of America.
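
    As a sketch of the quantity involved, the SNRenv metric is commonly formulated as below: per modulation filter i, the speech envelope power is estimated from the noisy mixture and the noise alone, and the band ratios are combined quadratically. This follows the general sEPSM formulation; normalization details and the ideal-observer mapping are omitted.

      \hat{P}_{\mathrm{env},S,i} = P_{\mathrm{env},S+N,i} - P_{\mathrm{env},N,i},
      \qquad
      \mathrm{SNR}_{\mathrm{env},i} = \frac{\hat{P}_{\mathrm{env},S,i}}{P_{\mathrm{env},N,i}},
      \qquad
      \mathrm{SNR}_{\mathrm{env}} = \Bigl( \sum_i \mathrm{SNR}_{\mathrm{env},i}^{2} \Bigr)^{1/2}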

  14. Stochastic algorithm for channel optimized vector quantization: application to robust narrow-band speech coding

    International Nuclear Information System (INIS)

    Bouzid, M.; Benkherouf, H.; Benzadi, K.

    2011-01-01

    In this paper, we propose a stochastic joint source-channel scheme developed for efficient and robust encoding of spectral speech LSF parameters. The encoding system, named LSF-SSCOVQ-RC, is an LSF encoding scheme based on a reduced-complexity stochastic split vector quantizer optimized for a noisy channel. For transmission over a noisy channel, we first show that our LSF-SSCOVQ-RC encoder outperforms the conventional LSF encoder designed with a split vector quantizer. After that, we applied the LSF-SSCOVQ-RC encoder (with weighted distance) to the robust encoding of the LSF parameters of the 2.4 kbit/s MELP speech coder operating over a noisy/noiseless channel. The simulation results show that the proposed LSF encoder, incorporated in the MELP coder, ensures better performance than the original MELP MSVQ of 25 bits/frame, especially when the transmission channel is highly disturbed. Indeed, we show that the LSF-SSCOVQ-RC yields a significant improvement in LSF encoding performance by ensuring reliable transmission over a noisy channel.
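
    The channel-optimized ingredient can be sketched as follows: the encoder picks the index minimizing the expected distortion over the channel transition probabilities rather than simply the nearest codeword. The codebook, channel model, and plain squared distance below are toy placeholders, not the paper's LSF-SSCOVQ-RC design (which also uses a weighted LSF distance).

      import numpy as np

      def covq_encode(x, codebook, trans):
          """x: vector; codebook: (N, d); trans[i, j] = P(receive j | send i)."""
          dists = ((codebook[None, :, :] - x[None, None, :]) ** 2).sum(-1)[0]  # d(x, c_j)
          expected = trans @ dists       # expected distortion if index i is sent
          return int(np.argmin(expected))

      codebook = np.random.randn(8, 2)
      eps = 0.05                         # toy symmetric index-confusion channel
      trans = np.full((8, 8), eps / 7)
      np.fill_diagonal(trans, 1 - eps)
      print(covq_encode(np.array([0.2, -0.1]), codebook, trans))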

  15. Dynamic relation between working memory capacity and speech recognition in noise during the first 6 months of hearing aid use.

    Science.gov (United States)

    Ng, Elaine H N; Classon, Elisabet; Larsby, Birgitta; Arlinger, Stig; Lunner, Thomas; Rudner, Mary; Rönnberg, Jerker

    2014-11-23

    The present study aimed to investigate the changing relationship between aided speech recognition and cognitive function during the first 6 months of hearing aid use. Twenty-seven first-time hearing aid users with symmetrical mild to moderate sensorineural hearing loss were recruited. Aided speech recognition thresholds in noise were obtained in the hearing aid fitting session as well as at 3 and 6 months postfitting. Cognitive abilities were assessed using a reading span test, which is a measure of working memory capacity, and a cognitive test battery. Results showed a significant correlation between reading span and speech reception threshold during the hearing aid fitting session. This relation was significantly weakened over the first 6 months of hearing aid use. Multiple regression analysis showed that reading span was the main predictor of speech recognition thresholds in noise when hearing aids were first fitted, but that the pure-tone average hearing threshold was the main predictor 6 months later. One way of explaining the results is that working memory capacity plays a more important role in speech recognition in noise initially rather than after 6 months of use. We propose that new hearing aid users engage working memory capacity to recognize unfamiliar processed speech signals because the phonological form of these signals cannot be automatically matched to phonological representations in long-term memory. As familiarization proceeds, the mismatch effect is alleviated, and the engagement of working memory capacity is reduced. © The Author(s) 2014.

  16. Relations between perceptual measures of temporal processing, auditory-evoked brainstem responses and speech intelligibility in noise

    DEFF Research Database (Denmark)

    Papakonstantinou, Alexandra; Strelcyk, Olaf; Dau, Torsten

    2011-01-01

    This study investigates behavioural and objective measures of temporal auditory processing and their relation to the ability to understand speech in noise. The experiments were carried out on a homogeneous group of seven hearing-impaired listeners with normal sensitivity at low frequencies (up to 1 kHz) and steeply sloping hearing losses above 1 kHz. For comparison, data were also collected for five normal-hearing listeners. Temporal processing was addressed at low frequencies by means of psychoacoustical frequency discrimination, binaural masked detection and amplitude modulation (AM) detection. In addition, auditory brainstem responses (ABRs) to clicks and broadband rising chirps were recorded. Furthermore, speech reception thresholds (SRTs) were determined for Danish sentences in speech-shaped noise. The main findings were: (1) SRTs were neither correlated with hearing sensitivity…

  17. A Novel Voice Sensor for the Detection of Speech Signals

    Directory of Open Access Journals (Sweden)

    Kun-Ching Wang

    2013-12-01

    Full Text Available In order to develop a novel voice sensor to detect human voices, the use of features that are more robust to noise is an important issue. A voice sensor is also called a voice activity detector (VAD). Because the formant structure inherently occurs only in the speech spectrogram (well known as a voiceprint), Wu et al. were the first to use band-spectral entropy (BSE) to describe the characteristics of voiceprints. However, the performance of VAD based on the BSE feature degrades in colored-noise (or voiceprint-like noise) environments. In order to solve this problem, we propose the two-dimensional part-band energy entropy (TD-PBEE) parameter, based on two variables, the part-band partition number over the frequency index and the long-term window size over the time index, to further improve the BSE-based VAD algorithm. The two variables can efficiently represent the characteristics of voiceprints in each critical frequency band and exploit long-term information from noisy speech spectrograms, respectively. The TD-PBEE parameter can be regarded as a PBEE parameter over time. First, the strength of voiceprints can be partly enhanced by using four entropies applied to four part-bands; the four part-band energy entropies describe the voiceprints in detail. Then, given the non-stationary nature of speech and various noises, long-term information processing is used to refine the PBEE, so that voice-like noise can be distinguished from noisy speech through the concept of PBEE with long-term information. Our experiments show that the proposed feature extraction with the TD-PBEE parameter is quite insensitive to background noise. The proposed TD-PBEE-based VAD algorithm is evaluated for four types of noise and five signal-to-noise ratio (SNR) levels. We find that the accuracy of the proposed TD-PBEE-based VAD algorithm, averaged over all noises and all SNR levels, is better than that of the other VAD algorithms considered.
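
    A minimal sketch of the part-band entropy idea, under simplifying assumptions (the long-term refinement and decision thresholds of TD-PBEE are omitted): the short-time spectrum is split into four part-bands and a spectral entropy is computed within each, which tends to be low where formant peaks dominate.

      import numpy as np

      def part_band_energy_entropies(frame, n_bands=4):
          """frame: 1-D array of time samples for one analysis window."""
          spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
          entropies = []
          for band in np.array_split(spec, n_bands):
              p = band / (band.sum() + 1e-12) + 1e-12   # in-band energy distribution
              entropies.append(-(p * np.log(p)).sum())
          return np.array(entropies)   # low entropy where formant peaks dominate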

  18. Reducing the Effects of Background Noise during Auditory Functional Magnetic Resonance Imaging of Speech Processing: Qualitative and Quantitative Comparisons between Two Image Acquisition Schemes and Noise Cancellation

    Science.gov (United States)

    Blackman, Graham A.; Hall, Deborah A.

    2011-01-01

    Purpose: The intense sound generated during functional magnetic resonance imaging (fMRI) complicates studies of speech and hearing. This experiment evaluated the benefits of using active noise cancellation (ANC), which attenuates the level of the scanner sound at the participant's ear by up to 35 dB around the peak at 600 Hz. Method: Speech and…

  19. Robustness of digitally modulated signal features against variation in HF noise model

    Directory of Open Access Journals (Sweden)

    Shoaib Mobien

    2011-01-01

    Full Text Available The high frequency (HF) band has both military and civilian uses. It can be used either as a primary or backup communication link. Automatic modulation classification (AMC) is of utmost importance in this band for the purpose of communications monitoring, e.g., signal intelligence and spectrum management. A widely used method for AMC is based on pattern recognition (PR). Such a method has two main steps: feature extraction and classification. The first step is generally performed in the presence of channel noise. Recent studies show that HF noise can be modeled by Gaussian or bi-kappa distributions, depending on the time of day. Therefore, it is anticipated that a change in noise model will have an impact on the feature extraction stage. In this article, we investigate the robustness of well-known digitally modulated signal features against variation in HF noise. Specifically, we consider temporal time-domain (TTD) features, higher order cumulants (HOC), and wavelet-based features. In addition, we propose new features extracted from the constellation diagram and evaluate their robustness against the change in noise model. This study targets 2PSK, 4PSK, 8PSK, 16QAM, 32QAM, and 64QAM modulations, as they are commonly used in HF communications.
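
    As an illustration of the HOC feature family, the sketch below computes two standard fourth-order cumulants (C40 and C42) of a complex baseband signal, normalized by the signal power for scale invariance; the toy BPSK signal is only for demonstration.

      import numpy as np

      def hoc_features(x):
          x = x - x.mean()
          c20 = (x ** 2).mean()
          c21 = (np.abs(x) ** 2).mean()
          c40 = (x ** 4).mean() - 3 * c20 ** 2
          c42 = (np.abs(x) ** 4).mean() - np.abs(c20) ** 2 - 2 * c21 ** 2
          # Noise-free, unit-power references: |C40| = 2 for BPSK, 0.68 for 16QAM
          return c40 / c21 ** 2, c42 / c21 ** 2

      bits = np.random.randint(0, 2, 4096) * 2 - 1          # toy BPSK symbols
      x = bits + 0.1 * (np.random.randn(4096) + 1j * np.random.randn(4096))
      print(hoc_features(x))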

  20. Environmental Noise, Genetic Diversity and the Evolution of Evolvability and Robustness in Model Gene Networks

    Science.gov (United States)

    Steiner, Christopher F.

    2012-01-01

    The ability of organisms to adapt and persist in the face of environmental change is accepted as a fundamental feature of natural systems. More contentious is whether the capacity of organisms to adapt (or “evolvability”) can itself evolve and the mechanisms underlying such responses. Using model gene networks, I provide evidence that evolvability emerges more readily when populations experience positively autocorrelated environmental noise (red noise) compared to populations in stable or randomly varying (white noise) environments. Evolvability was correlated with increasing genetic robustness to effects on network viability and decreasing robustness to effects on phenotypic expression; populations whose networks displayed greater viability robustness and lower phenotypic robustness produced more additive genetic variation and adapted more rapidly in novel environments. Patterns of selection for robustness varied antagonistically with epistatic effects of mutations on viability and phenotypic expression, suggesting that trade-offs between these properties may constrain their evolutionary responses. Evolution of evolvability and robustness was stronger in sexual populations compared to asexual populations indicating that enhanced genetic variation under fluctuating selection combined with recombination load is a primary driver of the emergence of evolvability. These results provide insight into the mechanisms potentially underlying rapid adaptation as well as the environmental conditions that drive the evolution of genetic interactions. PMID:23284934

  1. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Koji Iwano

    2007-03-01

    Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected-digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.
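
    The multistream HMM technique mentioned above scores each state with an exponent-weighted combination of the per-stream likelihoods. A minimal sketch of that combination step follows, assuming per-frame state log-likelihoods have already been computed for each stream; the weight lam is a tunable assumption, typically biased toward audio at high SNR and toward the lip features as noise increases.

        import numpy as np

        def multistream_logscore(logp_audio, logp_video, lam=0.7):
            """Combine per-frame state log-likelihoods of two streams.
            lam weights the audio stream; (1 - lam) weights the visual stream."""
            return lam * np.asarray(logp_audio) + (1.0 - lam) * np.asarray(logp_video)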

  2. Acceptable noise level

    DEFF Research Database (Denmark)

    Olsen, Steen Østergaard; Nielsen, Lars Holme; Lantz, Johannes

    2012-01-01

    The acceptable noise level (ANL) is used to quantify the amount of background noise that subjects can accept while listening to speech, and is suggested for prediction of individual hearing-aid use. The aim of this study was to assess the repeatability of the ANL measured in normal-hearing subjects...... using running Danish and non-semantic speech materials as stimuli and modulated speech-spectrum and multi-talker babble noises as competing stimuli....

  4. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    Science.gov (United States)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exist in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It enables a fast rate of data/text entry, has a small overall size, and is lightweight. In addition, this design frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (hidden Markov model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone-array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks in the face of constraints on computational resources.
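
    The first stage of this pipeline is multichannel beamforming. As a hedged illustration of the simplest variant (a delay-and-sum beamformer, not necessarily the method used in the spacesuit system), the sketch below aligns each microphone channel toward the talker by an integer sample delay and averages, so speech adds coherently while spatially incoherent noise is attenuated. All names are illustrative.

        import numpy as np

        def delay_and_sum(channels, delays):
            """channels: array (n_mics, n_samples); delays: per-mic steering delays in samples.
            Shifting each channel by its delay aligns the speech wavefront before averaging."""
            out = np.zeros(channels.shape[1])
            for ch, d in zip(channels, delays):
                out += np.roll(ch, -int(d))   # advance the channel (edge wrap-around ignored in this sketch)
            return out / len(channels)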

  5. A Comparative Analysis of Pitch Detection Methods Under the Influence of Different Noise Conditions.

    Science.gov (United States)

    Sukhostat, Lyudmila; Imamverdiyev, Yadigar

    2015-07-01

    Pitch is one of the most important components in various speech processing systems. The aim of this study was to evaluate different pitch detection methods under various noise conditions. Prospective study. For the evaluation of pitch detection algorithms, time-domain, frequency-domain, and hybrid methods were considered, using the Keele and CSTR speech databases. Each of them has its own advantages and disadvantages. Experiments have shown that the BaNa method achieves the highest pitch detection accuracy. The development of pitch detection methods that are robust to additive noise at different signal-to-noise ratios is an important field of research with many opportunities for enhancing modern methods. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
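
    As background for the time-domain family of methods compared in the study, a minimal autocorrelation pitch estimator is sketched below. This is the generic textbook approach, not the BaNa algorithm; the search range and the assumption of a single voiced frame are illustrative.

        import numpy as np

        def pitch_autocorr(frame, fs, fmin=60.0, fmax=400.0):
            """Estimate F0 of one voiced frame from the autocorrelation peak
            within the plausible pitch-period range [fs/fmax, fs/fmin]."""
            frame = frame - np.mean(frame)
            ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            lag_min, lag_max = int(fs / fmax), int(fs / fmin)   # frame must be longer than lag_max
            lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
            return fs / lag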

  6. Speech understanding in noise with an eyeglass hearing aid: asymmetric fitting and the head shadow benefit of anterior microphones.

    NARCIS (Netherlands)

    Mens, L.H.M.

    2011-01-01

    OBJECTIVE: To test speech understanding in noise using array microphones integrated in an eyeglass device and to test if microphones placed anteriorly at the temple provide better directivity than above the pinna. DESIGN: Sentences were presented from the front and uncorrelated noise from 45, 135,

  7. Robust estimation of the noise variance from background MR data

    NARCIS (Netherlands)

    Sijbers, J.; Den Dekker, A.J.; Poot, D.; Bos, R.; Verhoye, M.; Van Camp, N.; Van der Linden, A.

    2006-01-01

    In the literature, many methods are available for estimation of the variance of the noise in magnetic resonance (MR) images. A commonly used method, based on the maximum of the background mode of the histogram, is revisited and a new, robust, and easy to use method is presented based on maximum

  8. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red
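
    One of the oldest single-microphone noise-reduction methods in this literature is magnitude spectral subtraction. The sketch below is a bare-bones version under the assumption that the first few frames contain noise only; frame length, overlap, and the spectral floor are illustrative choices, and synthesis-window compensation is omitted for brevity.

        import numpy as np

        def spectral_subtraction(noisy, n_fft=512, hop=256, noise_frames=10, floor=0.02):
            """Subtract a noise-magnitude estimate (from the first frames, assumed
            speech-free) from every frame, keeping the noisy phase."""
            win = np.hanning(n_fft)
            starts = range(0, len(noisy) - n_fft, hop)
            spec = np.array([np.fft.rfft(noisy[s:s + n_fft] * win) for s in starts])
            noise_mag = np.abs(spec[:noise_frames]).mean(axis=0)
            mag = np.maximum(np.abs(spec) - noise_mag, floor * noise_mag)  # spectral floor
            frames = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=n_fft)
            out = np.zeros(len(spec) * hop + n_fft)   # overlap-add resynthesis
            for k, f in enumerate(frames):
                out[k * hop:k * hop + n_fft] += f
            return out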

  9. Robustness of SOC Estimation Algorithms for EV Lithium-Ion Batteries against Modeling Errors and Measurement Noise

    Directory of Open Access Journals (Sweden)

    Xue Li

    2015-01-01

    Full Text Available State of charge (SOC) is one of the most important parameters in a battery management system (BMS). There are numerous algorithms for SOC estimation, mostly of model-based observer/filter types such as Kalman filters, closed-loop observers, and robust observers. Modeling errors and measurement noises have a critical impact on the accuracy of SOC estimation in these algorithms. This paper is a comparative study of the robustness of SOC estimation algorithms against modeling errors and measurement noises. Using a typical battery platform for vehicle applications with sensor noise and battery aging characterization, three popular and representative SOC estimation methods (extended Kalman filter, PI-controlled observer, and H∞ observer) are compared on such robustness. The simulation and experimental results demonstrate the deterioration of SOC estimation accuracy under modeling errors resulting from aging and under larger measurement noise, and this deterioration is quantitatively characterized. The findings of this paper provide useful information on the following aspects: (1) how SOC estimation accuracy depends on modeling reliability and voltage measurement accuracy; (2) pros and cons of typical SOC estimators in their robustness and reliability; (3) guidelines for requirements on battery system identification and sensor selections.

  10. A speech reception in noise test for preschool children (the Galker-test)

    DEFF Research Database (Denmark)

    Lauritsen, Maj-Britt Glenn; Kreiner, Svend; Söderström, Margareta

    2015-01-01

    Purpose: This study evaluates initial validity and reliability of the “Galker test of speech reception in noise” developed for Danish preschool children suspected to have problems with hearing or understanding speech against strict psychometric standards and assesses acceptance by the children....... Methods: The Galker test is an audio-visual, computerised, word discrimination test in background noise, originally comprised of 50 word pairs. Three hundred and eighty eight children attending ordinary day care centres and aged 3–5 years were included. With multiple regression and the Rasch item response...... model it was examined whether the total score of the Galker test validly reflected item responses across subgroups defined by sex, age, bilingualism, tympanometry, audiometry and verbal comprehension. Results: A total of 370 children (95%) accepted testing and 339 (87%) completed all 50 items...

  11. Mapping Speech Spectra from Throat Microphone to Close-Speaking Microphone: A Neural Network Approach

    Directory of Open Access Journals (Sweden)

    B. Yegnanarayana

    2007-01-01

    Full Text Available Speech recorded from a throat microphone is robust to the surrounding noise, but sounds unnatural unlike the speech recorded from a close-speaking microphone. This paper addresses the issue of improving the perceptual quality of the throat microphone speech by mapping the speech spectra from the throat microphone to the close-speaking microphone. A neural network model is used to capture the speaker-dependent functional relationship between the feature vectors (cepstral coefficients of the two speech signals. A method is proposed to ensure the stability of the all-pole synthesis filter. Objective evaluations indicate the effectiveness of the proposed mapping scheme. The advantage of this method is that the model gives a smooth estimate of the spectra of the close-speaking microphone speech. No distortions are perceived in the reconstructed speech. This mapping technique is also used for bandwidth extension of telephone speech.

  12. Ear, Hearing and Speech

    DEFF Research Database (Denmark)

    Poulsen, Torben

    2000-01-01

    An introduction is given to the anatomy and function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal, and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001).

  13. Hearing speech in music.

    Science.gov (United States)

    Ekström, Seth-Reino; Borg, Erik

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01): low octave and fast tempo had the largest effect; high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  14. Introducing the White Noise task in childhood: associations between speech illusions and psychosis vulnerability.

    Science.gov (United States)

    Rimvall, M K; Clemmensen, L; Munkholm, A; Rask, C U; Larsen, J T; Skovgaard, A M; Simons, C J P; van Os, J; Jeppesen, P

    2016-10-01

    Auditory verbal hallucinations (AVH) are common during development and may arise due to dysregulation in top-down processing of sensory input. This study was designed to examine the frequency and correlates of speech illusions measured using the White Noise (WN) task in children from the general population. Associations between speech illusions and putative risk factors for psychotic disorder and negative affect were examined. A total of 1486 children aged 11-12 years of the Copenhagen Child Cohort 2000 were examined with the WN task. Psychotic experiences and negative affect were determined using the Kiddie-SADS-PL. Register data described family history of mental disorders. Exaggerated Theory of Mind functioning (hyper-ToM) was measured by the ToM Storybook Frederik. A total of 145 (10%) children experienced speech illusions (hearing speech in the absence of speech stimuli), of which 102 (70%) experienced illusions perceived by the child as positive or negative (affectively salient). Experiencing hallucinations during the last month was associated with affectively salient speech illusions in the WN task [general cognitive ability: adjusted odds ratio (aOR) 2.01, 95% confidence interval (CI) 1.03-3.93]. Negative affect, both last month and lifetime, was also associated with affectively salient speech illusions (aOR 2.01, 95% CI 1.05-3.83 and aOR 1.79, 95% CI 1.11-2.89, respectively). Speech illusions were not associated with delusions, hyper-ToM or family history of mental disorders. Speech illusions were elicited in typically developing children in a WN-test paradigm, and point to an affective pathway to AVH mediated by dysregulation in top-down processing of sensory input.

  15. Influence of hearing loss on children’s identification of spondee words in a speech-shaped noise or a two-talker masker

    Science.gov (United States)

    Leibold, Lori J.; Hillock-Dunn, Andrea; Duncan, Nicole; Roush, Patricia A.; Buss, Emily

    2013-01-01

    This study compared spondee identification performance in the presence of speech-shaped noise or two competing talkers across children with hearing loss and age-matched children with normal hearing. The results showed a greater masking effect for children with hearing loss compared to children with normal hearing for both masker conditions. However, the magnitude of this group difference was significantly larger for the two-talker masker than for the speech-shaped noise masker. These results support the hypothesis that hearing loss influences children’s perceptual processing abilities. PMID:23492919

  16. Laboratory evaluation of an optimised internet-based speech-in-noise test for occupational high-frequency hearing loss screening: Occupational Earcheck

    NARCIS (Netherlands)

    Sheikh Rashid, Marya; Leensen, Monique C. J.; de Laat, Jan A. P. M.; Dreschler, Wouter A.

    2017-01-01

    Objective: The "Occupational Earcheck'' (OEC) is a Dutch onlineself-screening speech-in-noise test developed for the detection of occupational high-frequency hearing loss (HFHL). This study evaluates an optimised version of the test and determines the most appropriate masking noise. Design: The

  17. Adaptive filtration of speech signals in the presence of correlated noise with random variation of probabilistic characteristics

    OpenAIRE

    M. O. Partala; S. Ya. Zhuk

    2007-01-01

    Based on a mixed Markov process in discrete time, optimal and quasi-optimal algorithms are designed for the adaptive filtering of speech signals in the presence of correlated noise with randomly varying probabilistic characteristics.

  18. Cognitive load during speech perception in noise: the influence of age, hearing loss, and cognition on the pupil response.

    Science.gov (United States)

    Zekveld, Adriana A; Kramer, Sophia E; Festen, Joost M

    2011-01-01

    The aim of the present study was to evaluate the influence of age, hearing loss, and cognitive ability on the cognitive processing load during listening to speech presented in noise. Cognitive load was assessed by means of pupillometry (i.e., examination of pupil dilation), supplemented with subjective ratings. Two groups of subjects participated: 38 middle-aged participants (mean age = 55 yrs) with normal hearing and 36 middle-aged participants (mean age = 61 yrs) with hearing loss. Using three Speech Reception Threshold (SRT) in stationary noise tests, we estimated the speech-to-noise ratios (SNRs) required for the correct repetition of 50%, 71%, or 84% of the sentences (SRT50%, SRT71%, and SRT84%, respectively). We examined the pupil response during listening: the peak amplitude, the peak latency, the mean dilation, and the pupil response duration. For each condition, participants rated the experienced listening effort and estimated their performance level. Participants also performed the Text Reception Threshold (TRT) test, a test of processing speed, and a word vocabulary test. Data were compared with previously published data from young participants with normal hearing. Hearing loss was related to relatively poor SRTs, and higher speech intelligibility was associated with lower effort and higher performance ratings. For listeners with normal hearing, increasing age was associated with poorer TRTs and slower processing speed but with larger word vocabulary. A multivariate repeated-measures analysis of variance indicated main effects of group and SNR and an interaction effect between these factors on the pupil response. The peak latency was relatively short and the mean dilation was relatively small at low intelligibility levels for the middle-aged groups, whereas the reverse was observed for high intelligibility levels. The decrease in the pupil response as a function of increasing SNR was relatively small for the listeners with hearing loss. Spearman

  19. Office noise: Can headphones and masking sound attenuate distraction by background speech?

    Science.gov (United States)

    Jahncke, Helena; Björkeholm, Patrik; Marsh, John E; Odelius, Johan; Sörqvist, Patrik

    2016-11-22

    Background speech is one of the most disturbing noise sources at shared workplaces in terms of both annoyance and performance-related disruption. Therefore, it is important to identify techniques that can efficiently protect performance against distraction. It is also important that the techniques are perceived as satisfactory and are subjectively evaluated as effective in their capacity to reduce distraction. The aim of the current study was to compare three methods of attenuating distraction from background speech: masking a background voice with nature sound through headphones, masking a background voice with other voices through headphones and merely wearing headphones (without masking) as a way to attenuate the background sound. Quiet was deployed as a baseline condition. Thirty students participated in an experiment employing a repeated measures design. Performance (serial short-term memory) was impaired by background speech (1 voice), but this impairment was attenuated when the speech was masked - and in particular when it was masked by nature sound. Furthermore, perceived workload was lowest in the quiet condition and significantly higher in all other sound conditions. Notably, the headphones tested as a sound-attenuating device (i.e. without masking) did not protect against the effects of background speech on performance and subjective work load. Nature sound was the only masking condition that worked as a protector of performance, at least in the context of the serial recall task. However, despite the attenuation of distraction by nature sound, perceived workload was still high - suggesting that it is difficult to find a masker that is both effective and perceived as satisfactory.

  20. Auto Regressive Moving Average (ARMA) Modeling Method for Gyro Random Noise Using a Robust Kalman Filter

    Science.gov (United States)

    Huang, Lei

    2015-01-01

    To solve the problem that conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using robust Kalman filtering is developed. The ARMA model parameters are employed as state variables. Unknown time-varying estimators of the observation noise are used to obtain the estimated mean and variance of the observation noise. Using robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of rapid convergence and high accuracy; thus, the required sample size is reduced. It can be applied to modeling applications for gyro random noise in which a fast and accurate ARMA modeling method is required. PMID:26437409
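
    To make "model parameters as state variables" concrete, the sketch below tracks the coefficients of a pure AR(p) model with a standard Kalman filter, treating the parameters as a slowly drifting (random-walk) state and the most recent outputs as the observation row. It is a simplified stand-in for the paper's robust ARMA formulation (no MA part and no adaptive noise statistics); all names and defaults are illustrative.

        import numpy as np

        def kf_ar_params(y, p=2, q_proc=1e-6, r_meas=1e-2):
            """Track AR(p) coefficients as the Kalman state."""
            y = np.asarray(y, dtype=float)
            theta = np.zeros(p)                # state: AR coefficients
            P = np.eye(p)                      # state covariance
            for n in range(p, len(y)):
                h = y[n - p:n][::-1]           # regressor [y_{n-1}, ..., y_{n-p}]
                P = P + q_proc * np.eye(p)     # predict: parameters drift slowly
                s = h @ P @ h + r_meas         # innovation variance
                k = P @ h / s                  # Kalman gain
                theta = theta + k * (y[n] - h @ theta)
                P = P - np.outer(k, h @ P)
            return theta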

  1. Neural Segregation of Concurrent Speech: Effects of Background Noise and Reverberation on Auditory Scene Analysis in the Ventral Cochlear Nucleus.

    Science.gov (United States)

    Sayles, Mark; Stasiak, Arkadiusz; Winter, Ian M

    2016-01-01

    Concurrent complex sounds (e.g., two voices speaking at once) are perceptually disentangled into separate "auditory objects". This neural processing often occurs in the presence of acoustic-signal distortions from noise and reverberation (e.g., in a busy restaurant). A difference in periodicity between sounds is a strong segregation cue under quiet, anechoic conditions. However, noise and reverberation exert differential effects on speech intelligibility under "cocktail-party" listening conditions. Previous neurophysiological studies have concentrated on understanding auditory scene analysis under ideal listening conditions. Here, we examine the effects of noise and reverberation on periodicity-based neural segregation of concurrent vowels /a/ and /i/, in the responses of single units in the guinea-pig ventral cochlear nucleus (VCN): the first processing station of the auditory brain stem. In line with human psychoacoustic data, we find reverberation significantly impairs segregation when vowels have an intonated pitch contour, but not when they are spoken on a monotone. In contrast, noise impairs segregation independent of intonation pattern. These results are informative for models of speech processing under ecologically valid listening conditions, where noise and reverberation abound.

  2. Effects of background noise on inter-trial phase coherence and auditory N1-P2 responses to speech stimuli.

    Science.gov (United States)

    Koerner, Tess K; Zhang, Yang

    2015-10-01

    This study investigated the effects of a speech-babble background noise on inter-trial phase coherence (ITPC, also referred to as phase locking value (PLV)) and auditory event-related responses (AERP) to speech sounds. Specifically, we analyzed EEG data from 11 normal-hearing subjects to examine whether ITPC can predict noise-induced variations in the obligatory N1-P2 complex response. N1-P2 amplitude and latency data were obtained for the /bu/ syllable in quiet and noise listening conditions. ITPC data in the delta, theta, and alpha frequency bands were calculated for the N1-P2 responses in the two passive listening conditions. Consistent with previous studies, background noise produced a significant amplitude reduction and latency increase in N1 and P2, which were accompanied by significant ITPC decreases in all three frequency bands. Correlation analyses further revealed that variations in ITPC were able to predict the amplitude and latency variations in N1-P2. The results suggest that trial-by-trial analysis of cortical neural synchrony is a valuable tool in understanding the modulatory effects of background noise on AERP measures. Copyright © 2015 Elsevier B.V. All rights reserved.
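
    For reference, ITPC at each time point is the magnitude of the average unit phase vector across trials: values near 1 mean the band-limited phase is reproducible from trial to trial, values near 0 mean it is random. A minimal sketch follows, assuming SciPy is available and that the epochs are already band-pass filtered into the delta, theta, or alpha band.

        import numpy as np
        from scipy.signal import hilbert

        def itpc(trials):
            """trials: array (n_trials, n_samples) of band-pass filtered epochs.
            Returns ITPC (phase-locking value) per sample: |mean_k exp(j*phi_k(t))|."""
            phase = np.angle(hilbert(trials, axis=1))   # instantaneous phase per trial
            return np.abs(np.mean(np.exp(1j * phase), axis=0))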

  3. Feedforward and Feedback Control in Apraxia of Speech: Effects of Noise Masking on Vowel Production

    Science.gov (United States)

    Maas, Edwin; Mailend, Marja-Liisa; Guenther, Frank H.

    2015-01-01

    Purpose: This study was designed to test two hypotheses about apraxia of speech (AOS) derived from the Directions Into Velocities of Articulators (DIVA) model (Guenther et al., 2006): the feedforward system deficit hypothesis and the feedback system deficit hypothesis. Method: The authors used noise masking to minimize auditory feedback during…

  4. Investigating the Role of Working Memory in Speech-in-noise Identification for Listeners with Normal Hearing.

    Science.gov (United States)

    Füllgrabe, Christian; Rosen, Stuart

    2016-01-01

    With the advent of cognitive hearing science, increased attention has been given to individual differences in cognitive functioning and their explanatory power in accounting for inter-listener variability in understanding speech in noise (SiN). The psychological construct that has received most interest is working memory (WM), representing the ability to simultaneously store and process information. Common lore and theoretical models assume that WM-based processes subtend speech processing in adverse perceptual conditions, such as those associated with hearing loss or background noise. Empirical evidence confirms the association between WM capacity (WMC) and SiN identification in older hearing-impaired listeners. To assess whether WMC also plays a role when listeners without hearing loss process speech in acoustically adverse conditions, we surveyed published and unpublished studies in which the Reading-Span test (a widely used measure of WMC) was administered in conjunction with a measure of SiN identification. The survey revealed little or no evidence for an association between WMC and SiN performance. We also analysed new data from 132 normal-hearing participants sampled from across the adult lifespan (18-91 years), for a relationship between Reading-Span scores and identification of matrix sentences in noise. Performance on both tasks declined with age, and correlated weakly even after controlling for the effects of age and audibility (r = 0.39, p ≤ 0.001, one-tailed). However, separate analyses for different age groups revealed that the correlation was only significant for middle-aged and older groups but not for the young (< 40 years) participants.

  5. Microscopic prediction of speech recognition for listeners with normal hearing in noise using an auditory model.

    Science.gov (United States)

    Jürgens, Tim; Brand, Thomas

    2009-11-01

    This study compares the phoneme recognition performance in speech-shaped noise of a microscopic model for speech recognition with the performance of normal-hearing listeners. "Microscopic" is defined in terms of this model in two ways. First, the speech recognition rate is predicted on a phoneme-by-phoneme basis. Second, microscopic modeling means that the signal waveforms to be recognized are processed by mimicking elementary parts of human auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703-1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing stage and a simple dynamic-time-warp speech recognizer. The model is evaluated by presenting nonsense speech in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed. The influence of different perceptual distance measures and of the model's a priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training of the recognizer and testing. The best model performance is yielded by distance measures that focus mainly on small perceptual distances and neglect outliers.
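
    The back end of such a model is a simple dynamic-time-warp recognizer. The sketch below shows the core DTW recursion over two feature sequences; the Euclidean local cost is one common choice of perceptual distance, and the function name and interface are illustrative rather than the authors' implementation.

        import numpy as np

        def dtw_distance(A, B):
            """Dynamic-time-warp distance between feature sequences A (m, d) and B (n, d)."""
            m, n = len(A), len(B)
            D = np.full((m + 1, n + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, m + 1):
                for j in range(1, n + 1):
                    cost = np.linalg.norm(A[i - 1] - B[j - 1])   # local distance
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[m, n]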

  6. MEMS microphone innovations towards high signal to noise ratios (Conference Presentation) (Plenary Presentation)

    Science.gov (United States)

    Dehé, Alfons

    2017-06-01

    After decades of research and more than ten years of successful production in very high volumes, silicon MEMS microphones are mature and unbeatable in form factor and robustness. Audio applications such as video, noise cancellation, and speech recognition are key differentiators in smartphones. Microphones with low self-noise enable those functions. Backplate-free microphones reach signal-to-noise ratios above 70 dB(A). This talk will describe the state-of-the-art MEMS technology of Infineon Technologies. An outlook on future technologies, such as the comb sensor microphone, will be given.

  7. Experimental investigation of the robustness against noise for different Bell-type inequalities in three-qubit Greenberger-Horne-Zeilinger states

    International Nuclear Information System (INIS)

    Lu Huaixin; Zhao Jiaqiang; Cao Lianzhen; Wang Xiaoqin

    2011-01-01

    There are different families of inequalities that can be used to characterize the entanglement of multiqubit entangled states by the violation of the quantum mechanics prediction versus the local realism prediction. In a noisy environment, the violation of the different inequalities deviates from that in a noise-free environment; that is, each inequality has a different robustness against noise. We investigate this proposition theoretically and experimentally with the Mermin inequality, the Bell inequality, and the Svetlichny inequality, using three-qubit GHZ states for different levels of noise. Our purpose is to determine which of the inequalities is more robust against noise and thus more suitable for characterizing the entanglement of states. Our results show that the Mermin inequality remains the most robust even under stronger noise and is thus the most suitable for characterizing the entanglement of three-qubit GHZ states in a noisy environment.

  9. Computer-based auditory phoneme discrimination training improves speech recognition in noise in experienced adult cochlear implant listeners.

    Science.gov (United States)

    Schumann, Annette; Serman, Maja; Gefeller, Olaf; Hoppe, Ulrich

    2015-03-01

    Specific computer-based auditory training may be a useful complement to the rehabilitation process for cochlear implant (CI) listeners seeking sufficient speech intelligibility. This study evaluated the effectiveness of a computerized phoneme-discrimination training programme. The study employed a pretest-posttest design; participants were randomly assigned to the training or control group. Over a period of three weeks, the training group was instructed to train in phoneme discrimination via computer, twice a week. Sentence recognition in different noise conditions (moderate to difficult) was tested pre- and post-training, and six months after the training was completed. The control group was tested and retested within one month. Twenty-seven adult CI listeners who had been using cochlear implants for more than two years participated in the programme: 15 adults in the training group and 12 in the control group. Besides significant improvements on the trained phoneme-identification task, a generalized training effect was noted via significantly improved sentence recognition in moderate noise. No significant changes were noted in the difficult noise conditions. Improved performance was maintained over an extended period. Phoneme-discrimination training improves experienced CI listeners' speech perception in noise. Additional research is needed to optimize auditory training for individual benefit.

  10. Speech Enhancement by MAP Spectral Amplitude Estimation Using a Super-Gaussian Speech Model

    Directory of Open Access Journals (Sweden)

    Lotter Thomas

    2005-01-01

    Full Text Available This contribution presents two spectral amplitude estimators for acoustical background noise suppression based on maximum a posteriori estimation and super-Gaussian statistical modelling of the speech DFT amplitudes. The probability density function of the speech spectral amplitude is modelled with a simple parametric function, which allows a high approximation accuracy for Laplace- or Gamma-distributed real and imaginary parts of the speech DFT coefficients. Also, the statistical model can be adapted to optimally fit the distribution of the speech spectral amplitudes for a specific noise reduction system. Based on the super-Gaussian statistical model, computationally efficient maximum a posteriori speech estimators are derived, which outperform the commonly applied Ephraim-Malah algorithm.
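
    The super-Gaussian MAP gain itself is given in the paper; as a simpler point of comparison, the sketch below implements the Gaussian-model Wiener gain driven by the decision-directed a-priori SNR estimate in which such amplitude estimators (including Ephraim-Malah) are usually embedded. This is a swapped-in baseline, not the authors' estimator; alpha and the SNR floor are conventional illustrative values.

        import numpy as np

        def decision_directed_gains(Y, noise_psd, alpha=0.98, xi_min=10 ** (-25 / 10)):
            """Y: complex STFT (n_frames, n_bins); noise_psd: noise power per bin.
            Returns per-bin Wiener gains from the decision-directed a-priori SNR."""
            gains = np.zeros(Y.shape)
            prev_clean_pow = noise_psd.copy()       # crude initialization
            for t in range(Y.shape[0]):
                gamma = np.abs(Y[t]) ** 2 / noise_psd                 # a-posteriori SNR
                xi = (alpha * prev_clean_pow / noise_psd
                      + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))  # a-priori SNR
                xi = np.maximum(xi, xi_min)
                gains[t] = xi / (1.0 + xi)                             # Wiener gain
                prev_clean_pow = (gains[t] * np.abs(Y[t])) ** 2
            return gains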

  11. Speech-in-Noise Tests and Supra-threshold Auditory Evoked Potentials as Metrics for Noise Damage and Clinical Trial Outcome Measures.

    Science.gov (United States)

    Le Prell, Colleen G; Brungart, Douglas S

    2016-09-01

    In humans, the accepted clinical standards for detecting hearing loss are the behavioral audiogram, based on the absolute detection threshold of pure-tones, and the threshold auditory brainstem response (ABR). The audiogram and the threshold ABR are reliable and sensitive measures of hearing thresholds in human listeners. However, recent results from noise-exposed animals demonstrate that noise exposure can cause substantial neurodegeneration in the peripheral auditory system without degrading pure-tone audiometric thresholds. It has been suggested that clinical measures of auditory performance conducted with stimuli presented above the detection threshold may be more sensitive than the behavioral audiogram in detecting early-stage noise-induced hearing loss in listeners with audiometric thresholds within normal limits. Supra-threshold speech-in-noise testing and supra-threshold ABR responses are reviewed here, given that they may be useful supplements to the behavioral audiogram for assessment of possible neurodegeneration in noise-exposed listeners. Supra-threshold tests may be useful for assessing the effects of noise on the human inner ear, and the effectiveness of interventions designed to prevent noise trauma. The current state of the science does not necessarily allow us to define a single set of best practice protocols. Nonetheless, we encourage investigators to incorporate these metrics into test batteries when feasible, with an effort to standardize procedures to the greatest extent possible as new reports emerge.

  12. Two Methods of Mechanical Noise Reduction of Recorded Speech During Phonation in an MRI Device

    Czech Academy of Sciences Publication Activity Database

    Přibil, J.; Horáček, Jaromír; Horák, Petr

    2011-01-01

    Vol. 11, No. 3 (2011), pp. 92-98, ISSN 1335-8871. R&D Projects: GA ČR GA102/09/0989. Institutional research plan: CEZ:AV0Z20760514; CEZ:AV0Z20670512. Keywords: speech processing * noise reduction * NMR imaging. Subject RIV: BI - Acoustics. Impact factor: 0.418, year: 2011

  13. On the use of the distortion-sensitivity approach in examining the role of linguistic abilities in speech understanding in noise.

    Science.gov (United States)

    Goverts, S Theo; Huysmans, Elke; Kramer, Sophia E; de Groot, Annette M B; Houtgast, Tammo

    2011-12-01

    Researchers have used the distortion-sensitivity approach in the psychoacoustical domain to investigate the role of auditory processing abilities in speech perception in noise (van Schijndel, Houtgast, & Festen, 2001; Goverts & Houtgast, 2010). In this study, the authors examined the potential applicability of the distortion-sensitivity approach for investigating the role of linguistic abilities in speech understanding in noise. The authors applied the distortion-sensitivity approach by measuring the processing of visually presented masked text in a condition with manipulated syntactic, lexical, and semantic cues and while using the Text Reception Threshold (George et al., 2007; Kramer, Zekveld, & Houtgast, 2009; Zekveld, George, Kramer, Goverts, & Houtgast, 2007) method. Two groups that differed in linguistic abilities were studied: 13 native and 10 non-native speakers of Dutch, all typically hearing university students. As expected, the non-native subjects showed substantially reduced performance. The results of the distortion-sensitivity approach yielded differentiated results on the use of specific linguistic cues in the 2 groups. The results show the potential value of the distortion-sensitivity approach in studying the role of linguistic abilities in speech understanding in noise of individuals with hearing impairment.

  14. Explaining Discrepancies Between the Digit Triplet Speech-in-Noise Test Score and Self-Reported Hearing Problems in Older Adults.

    Science.gov (United States)

    Pronk, Marieke; Deeg, Dorly J H; Kramer, Sophia E

    2018-04-17

    The purpose of this study is to determine which demographic, health-related, mood, personality, or social factors predict discrepancies between older adults' functional speech-in-noise test result and their self-reported hearing problems. Data of 1,061 respondents from the Longitudinal Aging Study Amsterdam were used (ages ranged from 57 to 95 years). Functional hearing problems were measured using a digit triplet speech-in-noise test. Five questions were used to assess self-reported hearing problems. Scores of both hearing measures were dichotomized. Two discrepancy outcomes were created: (a) being unaware: those with functional but without self-reported problems (reference is aware: those with functional and self-reported problems); (b) reporting false complaints: those without functional but with self-reported problems (reference is well: those without functional and self-reported hearing problems). Two multivariable prediction models (logistic regression) were built with 19 candidate predictors. The speech reception threshold in noise was kept (forced) as a predictor in both models. Persons with higher self-efficacy (to initiate behavior) and higher self-esteem had a higher odds to being unaware than persons with lower self-efficacy scores (odds ratio [OR] = 1.13 and 1.11, respectively). Women had a higher odds than men (OR = 1.47). Persons with more chronic diseases and persons with worse (i.e., higher) speech-in-noise reception thresholds in noise had a lower odds to being unaware (OR = 0.85 and 0.91, respectively) than persons with less diseases and better thresholds, respectively. A higher odds to reporting false complaints was predicted by more depressive symptoms (OR = 1.06), more chronic diseases (OR = 1.21), and a larger social network (OR = 1.02). Persons with higher self-efficacy (to complete behavior) had a lower odds (OR = 0.86), whereas persons with higher self-esteem had a higher odds to report false complaints (OR = 1.21). The explained variance

  15. Is There a Relationship between Speech Identification in Noise and Categorical Perception in Children with Dyslexia?

    Science.gov (United States)

    Calcus, Axelle; Lorenzi, Christian; Collet, Gregory; Colin, Cécile; Kolinsky, Régine

    2016-01-01

    Purpose: Children with dyslexia have been suggested to experience deficits in both categorical perception (CP) and speech identification in noise (SIN) perception. However, results regarding both abilities are inconsistent, and the relationship between them is still unclear. Therefore, this study aimed to investigate the relationship between CP…

  16. Long term memory for noise: evidence of robust encoding of very short temporal acoustic patterns.

    Directory of Open Access Journals (Sweden)

    Jayalakshmi Viswanathan

    2016-11-01

    Full Text Available Recent research has demonstrated that humans are able to implicitly encode and retain repeating patterns in meaningless auditory noise. Our study aimed at testing the robustness of long-term implicit recognition memory for these learned patterns. Participants performed a cyclic/non-cyclic discrimination task, during which they were presented with either 1-s cyclic noises (CNs; the two halves of the noise were identical) or 1-s plain random noises (Ns). Among CNs and Ns presented once, target CNs were implicitly presented multiple times within a block, and implicit recognition of these target CNs was tested 4 weeks later using a similar cyclic/non-cyclic discrimination task. Furthermore, robustness of implicit recognition memory was tested by presenting participants with looped (shifting the origin) and scrambled (chopping the sounds into 10- and 20-ms bits before shuffling) versions of the target CNs. We found that participants had robust implicit recognition memory for learned noise patterns after 4 weeks, right from the first presentation. Additionally, this memory was remarkably resistant to acoustic transformations, such as looping and scrambling of the sounds. Finally, implicit recognition of sounds was dependent on participants' discrimination performance during learning. Our findings suggest that meaningless temporal features as short as 10 ms can be implicitly stored in long-term auditory memory. Moreover, successful encoding and storage of such fine features may vary between participants, possibly depending on individual attention and auditory discrimination abilities.

  17. Speech understanding in noise with integrated in-ear and muff-style hearing protection systems

    Directory of Open Access Journals (Sweden)

    Sharon M Abel

    2011-01-01

    Full Text Available Integrated hearing protection systems are designed to enhance free field and radio communications during military operations while protecting against the damaging effects of high-level noise exposure. A study was conducted to compare the effect of increasing the radio volume on the intelligibility of speech over the radios of two candidate systems, in-ear and muff-style, in 85-dBA speech babble noise presented free field. Twenty normal-hearing, English-fluent subjects, half male and half female, were tested in same gender pairs. Alternating as talker and listener, their task was to discriminate consonant-vowel-consonant syllables that contrasted either the initial or final consonant. Percent correct consonant discrimination increased with increases in the radio volume. At the highest volume, subjects achieved 79% with the in-ear device but only 69% with the muff-style device, averaged across the gender of listener/talker pairs and consonant position. Although there was no main effect of gender, female listener/talkers showed a 10% advantage for the final consonant and male listener/talkers showed a 1% advantage for the initial consonant. These results indicate that normal hearing users can achieve reasonably high radio communication scores with integrated in-ear hearing protection in moderately high-level noise that provides both energetic and informational masking. The adequacy of the range of available radio volumes for users with hearing loss has yet to be determined.

  18. Speech enhancement in the Karhunen-Loeve expansion domain

    CERN Document Server

    Benesty, Jacob

    2011-01-01

    This book is devoted to the study of the problem of speech enhancement whose objective is the recovery of a signal of interest (i.e., speech) from noisy observations. Typically, the recovery process is accomplished by passing the noisy observations through a linear filter (or a linear transformation). Since both the desired speech and undesired noise are filtered at the same time, the most critical issue of speech enhancement resides in how to design a proper optimal filter that can fully take advantage of the difference between the speech and noise statistics to mitigate the noise effect as m

  19. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    Science.gov (United States)

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
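
    The shared core that the passage describes (an apparent per-band SNR, clipped, mapped to the 0-1 range, and summed with band-importance weights) can be sketched as follows. The band levels and weights are placeholders, not the standardized octave-band values of either STI or SII.

        import numpy as np

        def band_snr_index(speech_db, noise_db, band_weights):
            """Apparent per-band SNR, clipped to [-15, +15] dB, mapped to [0, 1],
            then combined with normalized band-importance weights."""
            snr = np.clip(np.asarray(speech_db) - np.asarray(noise_db), -15.0, 15.0)
            audibility = (snr + 15.0) / 30.0
            w = np.asarray(band_weights, dtype=float)
            return float((w / w.sum()) @ audibility)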

  20. Semantic and phonetic enhancements for speech-in-noise recognition by native and non-native listeners.

    Science.gov (United States)

    Bradlow, Ann R; Alexander, Jennifer A

    2007-04-01

    Previous research has shown that speech recognition differences between native and proficient non-native listeners emerge under suboptimal conditions. Current evidence has suggested that the key deficit that underlies this disproportionate effect of unfavorable listening conditions for non-native listeners is their less effective use of compensatory information at higher levels of processing to recover from information loss at the phoneme identification level. The present study investigated whether this non-native disadvantage could be overcome if enhancements at various levels of processing were presented in combination. Native and non-native listeners were presented with English sentences in which the final word varied in predictability and which were produced in either plain or clear speech. Results showed that, relative to the low-predictability-plain-speech baseline condition, non-native listener final word recognition improved only when both semantic and acoustic enhancements were available (high-predictability-clear-speech). In contrast, the native listeners benefited from each source of enhancement separately and in combination. These results suggest that native and non-native listeners apply similar strategies for speech-in-noise perception: the crucial difference is in the signal clarity required for contextual information to be effective, rather than in an inability of non-native listeners to take advantage of this contextual information per se.

  1. Robust cubature Kalman filter for GNSS/INS with missing observations and colored measurement noise.

    Science.gov (United States)

    Cui, Bingbo; Chen, Xiyuan; Tang, Xihua; Huang, Haoqian; Liu, Xiao

    2018-01-01

    In order to improve the accuracy of GNSS/INS working in a GNSS-denied environment, a robust cubature Kalman filter (RCKF) is developed by considering colored measurement noise and missing observations. First, an improved cubature Kalman filter (CKF) is derived by considering colored measurement noise, where the time-differencing approach is applied to yield new observations. Then, after analyzing the disadvantages of existing methods, the measurement augmentation used to process colored noise is translated into processing the uncertainties of the CKF, and a new sigma-point update framework is utilized to account for the bounded model uncertainties. By reusing the diffused sigma points and the approximation residual in the prediction stage of the CKF, the RCKF is developed and its error performance is analyzed theoretically. Results of numerical experiments and field tests reveal that the RCKF is more robust than the CKF and the extended Kalman filter (EKF); compared with the EKF, the heading error of a land vehicle is reduced by about 72.4%. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
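
    The time-differencing approach mentioned above whitens first-order Markov measurement noise by forming a new observation from consecutive measurements. A minimal sketch of that reformation step follows; psi is the assumed correlation coefficient of the measurement noise, and the accompanying changes to the filter's measurement matrix and noise covariance are omitted.

        import numpy as np

        def difference_measurements(z, psi):
            """If v[k] = psi * v[k-1] + white noise, the differenced observation
            z[k] - psi * z[k-1] carries only white measurement noise."""
            z = np.asarray(z, dtype=float)
            return z[1:] - psi * z[:-1]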

  2. A multi-frame particle tracking algorithm robust against input noise

    International Nuclear Information System (INIS)

    Li, Dongning; Zhang, Yuanhui; Sun, Yigang; Yan, Wei

    2008-01-01

    The performance of a particle tracking algorithm, which detects particle trajectories from discretely recorded particle positions, can be substantially hindered by input noise. In this paper, a particle tracking algorithm is developed that is robust against input noise. This algorithm employs the regression method, instead of the extrapolation method usually employed by existing algorithms, to predict future particle positions. If a trajectory cannot be linked to a particle at a frame, the algorithm can still proceed by trying to find a candidate at the next frame. The connectivity of tracked trajectories is inspected to remove false ones. The algorithm is validated with synthetic data. The result shows that the algorithm is superior to traditional algorithms in tracking long trajectories.
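
    The regression-based prediction that distinguishes this tracker from extrapolation-based ones can be sketched as below: a low-order polynomial is fitted to the last few observed positions of a trajectory and evaluated one frame ahead, which averages out position noise instead of amplifying it the way two-point extrapolation does. Names and defaults are illustrative.

        import numpy as np

        def predict_next_position(track, n_fit=4, degree=1):
            """Fit a low-order polynomial to the last n_fit positions of a trajectory
            and evaluate it one frame ahead."""
            pts = np.asarray(track[-n_fit:], dtype=float)   # shape (n_fit, n_dims)
            t = np.arange(len(pts))
            return np.array([np.polyval(np.polyfit(t, pts[:, d], degree), len(pts))
                             for d in range(pts.shape[1])])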

  3. The Galker test of speech reception in noise; associations with background variables, middle ear status, hearing, and language in Danish preschool children.

    Science.gov (United States)

    Lauritsen, Maj-Britt Glenn; Söderström, Margareta; Kreiner, Svend; Dørup, Jens; Lous, Jørgen

    2016-01-01

    We tested "the Galker test", a speech reception in noise test developed for primary care for Danish preschool children, to explore if the children's ability to hear and understand speech was associated with gender, age, middle ear status, and the level of background noise. The Galker test is a 35-item audio-visual, computerized word discrimination test in background noise. Included were 370 normally developed children attending day care center. The children were examined with the Galker test, tympanometry, audiometry, and the Reynell test of verbal comprehension. Parents and daycare teachers completed questionnaires on the children's ability to hear and understand speech. As most of the variables were not assessed using interval scales, non-parametric statistics (Goodman-Kruskal's gamma) were used for analyzing associations with the Galker test score. For comparisons, analysis of variance (ANOVA) was used. Interrelations were adjusted for using a non-parametric graphic model. In unadjusted analyses, the Galker test was associated with gender, age group, language development (Reynell revised scale), audiometry, and tympanometry. The Galker score was also associated with the parents' and day care teachers' reports on the children's vocabulary, sentence construction, and pronunciation. Type B tympanograms were associated with a mean hearing 5-6dB below that of than type A, C1, or C2. In the graphic analysis, Galker scores were closely and significantly related to Reynell test scores (Gamma (G)=0.35), the children's age group (G=0.33), and the day care teachers' assessment of the children's vocabulary (G=0.26). The Galker test of speech reception in noise appears promising as an easy and quick tool for evaluating preschool children's understanding of spoken words in noise, and it correlated well with the day care teachers' reports and less with the parents' reports. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  4. Evaluation of Speech Recognition of Cochlear Implant Recipients Using Adaptive, Digital Remote Microphone Technology and a Speech Enhancement Sound Processing Algorithm.

    Science.gov (United States)

    Wolfe, Jace; Morais, Mila; Schafer, Erin; Agrawal, Smita; Koch, Dawn

    2015-05-01

    Cochlear implant recipients often experience difficulty with understanding speech in the presence of noise. Cochlear implant manufacturers have developed sound processing algorithms designed to improve speech recognition in noise, and research has shown these technologies to be effective. Remote microphone technology utilizing adaptive, digital wireless radio transmission has also been shown to provide significant improvement in speech recognition in noise. There are no studies examining the potential improvement in speech recognition in noise when these two technologies are used simultaneously. The goal of this study was to evaluate the potential benefits and limitations associated with the simultaneous use of a sound processing algorithm designed to improve performance in noise (Advanced Bionics ClearVoice) and a remote microphone system that incorporates adaptive, digital wireless radio transmission (Phonak Roger). A two-by-two repeated-measures design was used to examine performance differences obtained without these technologies compared to the use of each technology separately as well as the simultaneous use of both technologies. Eleven Advanced Bionics (AB) cochlear implant recipients, ages 11 to 68 yr. AzBio sentence recognition was measured in quiet and in the presence of classroom noise ranging in level from 50 to 80 dBA in 5-dB steps. Performance was evaluated in four conditions: (1) no ClearVoice and no Roger, (2) ClearVoice enabled without the use of Roger, (3) ClearVoice disabled with Roger enabled, and (4) simultaneous use of ClearVoice and Roger. Speech recognition in quiet was better than speech recognition in noise for all conditions. Use of ClearVoice and Roger each provided significant improvement in speech recognition in noise. The best performance in noise was obtained with the simultaneous use of ClearVoice and Roger. ClearVoice and Roger technology each improves speech recognition in noise, particularly when used at the same time

  5. Analysis of a simplified normalized covariance measure based on binary weighting functions for predicting the intelligibility of noise-suppressed speech.

    Science.gov (United States)

    Chen, Fei; Loizou, Philipos C

    2010-12-01

    The normalized covariance measure (NCM) has been shown previously to predict reliably the intelligibility of noise-suppressed speech containing non-linear distortions. This study analyzes a simplified NCM measure that requires only a small number of bands (not necessarily contiguous) and uses simple binary (1 or 0) weighting functions. The rationale behind the use of a small number of bands is to account for the fact that the spectral information contained in contiguous or nearby bands is correlated and redundant. The modified NCM measure was evaluated with speech intelligibility scores obtained by normal-hearing listeners in 72 noisy conditions involving noise-suppressed speech corrupted by four different types of maskers (car, babble, train, and street interferences). High correlation (r = 0.8) was obtained with the modified NCM measure even when only one band was used. Further analysis revealed a masker-specific pattern of correlations when only one band was used, and bands with low correlation signified the corresponding envelopes that have been severely distorted by the noise-suppression algorithm and/or the masker. Correlation improved to r = 0.84 when only two disjoint bands (centered at 325 and 1874 Hz) were used. Even further improvements in correlation (r = 0.85) were obtained when three or four lower-frequency (<700 Hz) bands were selected.
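
    The band-limited logic sketched above is compact enough to illustrate in code. The following Python fragment is an illustrative reconstruction, not the authors' implementation: the fourth-order Butterworth filters, the [-15, 15] dB apparent-SNR mapping, and the band edges in the usage note are assumptions commonly paired with NCM/STI-style measures.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def band_envelope(x, fs, lo, hi):
    """Band-pass filter the signal and extract its Hilbert envelope."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return np.abs(hilbert(sosfilt(sos, x)))

def ncm_binary(clean, degraded, fs, bands, weights):
    """Simplified NCM: per-band normalized covariance of envelopes,
    mapped to a transmission index and combined with 0/1 band weights."""
    tis = []
    for lo, hi in bands:
        ec = band_envelope(clean, fs, lo, hi)
        ed = band_envelope(degraded, fs, lo, hi)
        r = np.clip(np.corrcoef(ec, ed)[0, 1], 1e-6, 0.999)
        snr = 10 * np.log10(r**2 / (1 - r**2))          # apparent SNR (dB)
        tis.append((np.clip(snr, -15, 15) + 15) / 30)   # map to [0, 1]
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w, tis) / w.sum())
```

    With two disjoint bands roughly matching those reported above, e.g. ncm_binary(clean, degraded, fs, bands=[(250, 400), (1500, 2200)], weights=[1, 1]), the measure reduces to the two-band configuration; these band edges are illustrative only.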

  6. Hearing aid processing of loud speech and noise signals: Consequences for loudness perception and listening comfort

    DEFF Research Database (Denmark)

    Schmidt, Erik

    2007-01-01

    Sound processing in hearing aids is determined by the fitting rule. The fitting rule describes how the hearing aid should amplify speech and sounds in the surroundings, such that they become audible again for the hearing-impaired person. The general goal is to place all sounds within the hearing aid users' audible range, such that speech intelligibility and listening comfort become as good as possible. Amplification strategies in hearing aids are in many cases based on empirical ... sounds, has found that both normal-hearing and hearing-impaired listeners prefer loud sounds to be closer to the most comfortable loudness level than suggested by common non-linear fitting rules. During this project, two listening experiments were carried out. In the first experiment, hearing aid users ...

  7. Hearing speech in music

    Directory of Open Access Journals (Sweden)

    Seth-Reino Ekström

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  8. Laboratory evaluation of an optimised internet-based speech-in-noise test for occupational high-frequency hearing loss screening: Occupational Earcheck.

    Science.gov (United States)

    Sheikh Rashid, Marya; Leensen, Monique C J; de Laat, Jan A P M; Dreschler, Wouter A

    2017-11-01

    The "Occupational Earcheck" (OEC) is a Dutch online self-screening speech-in-noise test developed for the detection of occupational high-frequency hearing loss (HFHL). This study evaluates an optimised version of the test and determines the most appropriate masking noise. The original OEC was improved by homogenisation of the speech material, and shortening the test. A laboratory-based cross-sectional study was performed in which the optimised OEC in five alternative masking noise conditions was evaluated. The study was conducted on 18 normal-hearing (NH) adults, and 15 middle-aged listeners with HFHL. The OEC in a low-pass (LP) filtered stationary background noise (test version LP 3: with a cut-off frequency of 1.6 kHz, and a noise floor of -12 dB) was the most accurate version tested. The test showed a reasonable sensitivity (93%), and specificity (94%) and test reliability (intra-class correlation coefficient: 0.84, mean within-subject standard deviation: 1.5 dB SNR, slope of psychometric function: 13.1%/dB SNR). The improved OEC, with homogenous word material in a LP filtered noise, appears to be suitable for the discrimination between younger NH listeners and older listeners with HFHL. The appropriateness of the OEC for screening purposes in an occupational setting will be studied further.

  9. Visual context enhanced. The joint contribution of iconic gestures and visible speech to degraded speech comprehension.

    NARCIS (Netherlands)

    Drijvers, L.; Özyürek, A.

    2017-01-01

    Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech

  10. Wiener variable step size and gradient spectral variance smoothing for double-talk-robust acoustic echo cancellation and acoustic feedback cancellation

    DEFF Research Database (Denmark)

    Gil-Cacho, Jose M.; Van Waterschoot, Toon; Moonen, Marc

    2014-01-01

    Double-talk (DT)-robust acoustic echo cancellation (AEC) and acoustic feedback cancellation (AFC) are needed in speech communication systems, e.g., in hands-free communication systems and hearing aids. In this paper, we derive a practical and computationally efficient algorithm based ... model and in colored non-stationary noise.

  11. Transform Domain Robust Variable Step Size Griffiths' Adaptive Algorithm for Noise Cancellation in ECG

    Science.gov (United States)

    Hegde, Veena; Deekshit, Ravishankar; Satyanarayana, P. S.

    2011-12-01

    The electrocardiogram (ECG) is widely used for diagnosis of heart diseases. Good quality ECG is utilized by physicians for interpretation and identification of physiological and pathological phenomena. However, in real situations, ECG recordings are often corrupted by artifacts or noise. Noise severely limits the utility of the recorded ECG and thus needs to be removed for better clinical evaluation. In the present paper a new noise cancellation technique is proposed for the removal of random noise, such as muscle artifact, from the ECG signal. A transform domain robust variable step size Griffiths' LMS algorithm (TVGLMS) is proposed for noise cancellation. For the TVGLMS, the robust variable step size is achieved by using the Griffiths' gradient, which uses the cross-correlation between the input and the desired signal contaminated with observation or random noise. The algorithm is discrete cosine transform (DCT) based and uses the symmetric property of the signal to represent it in the frequency domain with fewer frequency coefficients than the discrete Fourier transform (DFT). The algorithm is implemented as an adaptive line enhancer (ALE) filter which extracts the ECG signal in a noisy environment using LMS filter adaptation. The proposed algorithm is found to have better convergence error/misadjustment than the ordinary transform domain LMS (TLMS) algorithm, in the presence of both white and colored observation noise. The reduction in convergence error achieved by the new algorithm with desired signal decomposition is found to be lower than that obtained without decomposition. The experimental results indicate that the proposed method is better than a traditional LMS adaptive filter at retaining the geometrical characteristics of the ECG signal.
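
    As a rough illustration of the Griffiths' gradient idea described above (replacing the instantaneous LMS gradient term e[n]x[n] with a running estimate of the cross-correlation between the desired signal and the input), a plain time-domain sketch follows. It omits the paper's DCT stage and variable step size, and the filter order and constants are placeholders, not the published settings.

```python
import numpy as np

def griffiths_lms(x, d, order=16, mu=0.01, beta=0.95):
    """Time-domain Griffiths' LMS sketch: the weight update uses a smoothed
    estimate p of the cross-correlation E{d[n] x[n]} instead of the
    instantaneous product e[n] x[n] used by ordinary LMS."""
    w = np.zeros(order)
    p = np.zeros(order)
    y = np.zeros(len(x))
    for n in range(order, len(x)):
        xn = x[n - order:n][::-1]              # current input vector
        y[n] = w @ xn                          # filter output
        p = beta * p + (1 - beta) * d[n] * xn  # cross-correlation estimate
        w = w + mu * (p - xn * y[n])           # Griffiths' gradient update
    return y, w
```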

  12. Evidence of "hidden hearing loss" following noise exposures that produce robust TTS and ABR wave-I amplitude reductions.

    Science.gov (United States)

    Lobarinas, Edward; Spankovich, Christopher; Le Prell, Colleen G

    2017-06-01

    In animals, noise exposures that produce robust temporary threshold shifts (TTS) can produce immediate damage to afferent synapses and long-term degeneration of low spontaneous rate auditory nerve fibers. This synaptopathic damage has been shown to correlate with reduced auditory brainstem response (ABR) wave-I amplitudes at suprathreshold levels. The perceptual consequences of this "synaptopathy" remain unknown but have been suggested to include compromised hearing performance in competing background noise. Here, we used a modified startle inhibition paradigm to evaluate whether noise exposures that produce robust TTS and ABR wave-I reduction but not permanent threshold shift (PTS) reduced hearing-in-noise performance. Animals exposed to 109 dB SPL octave band noise showed TTS >30 dB 24-h post noise and modest but persistent ABR wave-I reduction 2 weeks post noise despite full recovery of ABR thresholds. Hearing-in-noise performance was negatively affected by the noise exposure. However, the effect was observed only at the poorest signal to noise ratio and was frequency specific. Although TTS >30 dB 24-h post noise was a predictor of functional deficits, there was no relationship between the degree of ABR wave-I reduction and degree of functional impairment. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Simulation for noise cancellation using LMS adaptive filter

    Science.gov (United States)

    Lee, Jia-Haw; Ooi, Lu-Ean; Ko, Ying-Hao; Teoh, Choe-Yung

    2017-06-01

    In this paper, the fundamental noise-cancellation algorithm, the least mean square (LMS) algorithm, is studied and implemented in an adaptive filter. A simulation of noise cancellation using the LMS adaptive filter algorithm is developed. A noise-corrupted speech signal and an engine noise signal are used as inputs to the LMS adaptive filter. The filtered signal is compared to the original noise-free speech signal in order to highlight the level of attenuation of the noise. The results show that the noise is successfully canceled by the developed adaptive filter. The difference between the noise-free speech signal and the filtered signal is calculated, and the outcome shows that the filtered signal approaches the noise-free speech signal as the adaptive filtering proceeds. The frequency range of the noise successfully canceled by the LMS adaptive filter algorithm is determined by performing a Fast Fourier Transform (FFT) on the signals. The LMS adaptive filter algorithm shows significant noise cancellation in the lower frequency range.
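
    The two-input cancellation structure simulated in this work is the textbook LMS arrangement; a minimal sketch is shown below, assuming the engine-noise reference is available on a separate channel as in the study. The filter order and step size are illustrative, not the paper's values.

```python
import numpy as np

def lms_noise_canceller(primary, reference, order=32, mu=0.005):
    """Classic two-input adaptive noise cancellation with LMS.
    primary   : speech corrupted by noise
    reference : noise correlated with the corrupting noise
    Returns the error signal, i.e. the enhanced speech estimate."""
    w = np.zeros(order)
    e = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]  # most recent samples first
        y = w @ x                         # noise estimate
        e[n] = primary[n] - y             # enhanced speech = error
        w += 2 * mu * e[n] * x            # LMS weight update
    return e
```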

  14. Communication system with adaptive noise suppression

    Science.gov (United States)

    Kozel, David (Inventor); Devault, James A. (Inventor); Birr, Richard B. (Inventor)

    2007-01-01

    A signal-to-noise ratio dependent adaptive spectral subtraction process eliminates noise from noise-corrupted speech signals. The process first pre-emphasizes the frequency components of the input sound signal which contain the consonant information in human speech. Next, a signal-to-noise ratio is determined and a spectral subtraction proportion adjusted appropriately. After spectral subtraction, low amplitude signals can be squelched. A single microphone is used to obtain both the noise-corrupted speech and the average noise estimate. This is done by determining if the frame of data being sampled is a voiced or unvoiced frame. During unvoiced frames an estimate of the noise is obtained. A running average of the noise is used to approximate the expected value of the noise. Spectral subtraction may be performed on a composite noise-corrupted signal, or upon individual sub-bands of the noise-corrupted signal. Pre-averaging of the input signal's magnitude spectrum over multiple time frames may be performed to reduce musical noise.
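
    A generic per-frame sketch of SNR-dependent spectral subtraction of the kind described above is given below; it is not the patented system. The over-subtraction schedule, spectral floor, and Hann window are assumptions chosen for illustration.

```python
import numpy as np

def spectral_subtract(frame, noise_mag, floor=0.02, max_alpha=4.0):
    """Single-frame magnitude spectral subtraction with an SNR-dependent
    over-subtraction factor. noise_mag is the running average noise
    magnitude spectrum (same length as the rfft of the frame)."""
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    mag, phase = np.abs(spec), np.angle(spec)
    snr_db = 10 * np.log10(np.sum(mag**2) / (np.sum(noise_mag**2) + 1e-12))
    # subtract more aggressively at low SNR, more gently at high SNR
    alpha = np.clip(max_alpha - snr_db * 0.15, 1.0, max_alpha)
    clean = np.maximum(mag - alpha * noise_mag, floor * mag)  # spectral floor
    return np.fft.irfft(clean * np.exp(1j * phase), n=len(frame))
```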

  15. Speech Intelligibility Advantages using an Acoustic Beamformer Display

    Science.gov (United States)

    Begault, Durand R.; Sunder, Kaushik; Godfroy, Martine; Otto, Peter

    2015-01-01

    A speech intelligibility test conforming to the Modified Rhyme Test of ANSI S3.2 "Method for Measuring the Intelligibility of Speech Over Communication Systems" was conducted using a prototype 12-channel acoustic beamformer system. The target speech material (signal) was identified against speech babble (noise), with calculated signal-noise ratios of 0, 5 and 10 dB. The signal was delivered at a fixed beam orientation of 135 deg (re 90 deg as the frontal direction of the array) and the noise at 135 deg (co-located) and 0 deg (separated). A significant improvement in intelligibility from 57% to 73% was found for spatial separation for the same signal-noise ratio (0 dB). Significant effects for improved intelligibility due to spatial separation were also found for higher signal-noise ratios (5 and 10 dB).

  16. A comparison between the first-fit settings of two multichannel digital signal-processing strategies: music quality ratings and speech-in-noise scores.

    Science.gov (United States)

    Higgins, Paul; Searchfield, Grant; Coad, Gavin

    2012-06-01

    The aim of this study was to determine which level-dependent hearing aid digital signal-processing strategy (DSP) participants preferred when listening to music and/or performing a speech-in-noise task. Two receiver-in-the-ear hearing aids were compared: one using 32-channel adaptive dynamic range optimization (ADRO) and the other wide dynamic range compression (WDRC) incorporating dual fast (4 channel) and slow (15 channel) processing. The manufacturers' first-fit settings based on participants' audiograms were used in both cases. Results were obtained from 18 participants on a quick speech-in-noise (QuickSIN; Killion, Niquette, Gudmundsen, Revit, & Banerjee, 2004) task and for 3 music listening conditions (classical, jazz, and rock). Participants preferred the quality of music and performed better at the QuickSIN task using the hearing aids with ADRO processing. A potential reason for the better performance of the ADRO hearing aids was less fluctuation in output with change in sound dynamics. ADRO processing has advantages for both music quality and speech recognition in noise over the multichannel WDRC processing that was used in the study. Further evaluations of which DSP aspects contribute to listener preference are required.

  17. Robust synchronization analysis in nonlinear stochastic cellular networks with time-varying delays, intracellular perturbations and intercellular noise.

    Science.gov (United States)

    Chen, Po-Wei; Chen, Bor-Sen

    2011-08-01

    A cellular network consisting of a large number of interacting cells is naturally complex. These cells have to be synchronized in order for their collective phenomena to emerge for biological purposes. However, the intra- and intercellular interactions are inherently stochastic, noisy, and delayed by biochemical processes. In this study, a robust synchronization scheme is proposed for a nonlinear stochastic time-delay coupled cellular network (TdCCN) in spite of time-varying process delays and intracellular parameter perturbations. Furthermore, the nonlinear stochastic noise filtering ability is also investigated for this synchronized TdCCN against stochastic intercellular and environmental disturbances. Since it is very difficult to solve a robust synchronization problem with the Hamilton-Jacobi inequality (HJI) matrix, a linear matrix inequality (LMI) is employed to solve this problem with the help of a global linearization method. Through this robust synchronization analysis, we can gain a more systematic insight into not only the robust synchronizability but also the noise filtering ability of a TdCCN under time-varying process delays, intracellular perturbations and intercellular disturbances. The measures of robustness and noise filtering ability of a synchronized TdCCN have potential application in the design of neuron transmitters, on-time mass production of biochemical molecules, and synthetic biology. Finally, a benchmark of robust synchronization design in Escherichia coli repressilators is given to confirm the effectiveness of the proposed methods. Copyright © 2011 Elsevier Inc. All rights reserved.

  18. Enhancement of speech signals - with a focus on voiced speech models

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie

    This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based on this model. The basic model used in this thesis is the harmonic model, which is a commonly used model for describing the voiced part of the speech signal. We show that it can be beneficial to extend the model to take inharmonicities or the non-stationarity of speech into account. Extending the model ...

  19. The effects of noise reduction technologies on the acceptance of background noise.

    Science.gov (United States)

    Lowery, Kristy Jones; Plyler, Patrick N

    2013-09-01

    Directional microphones (D-Mics) and digital noise reduction (DNR) algorithms are used in hearing aids to reduce the negative effects of background noise on performance. Directional microphones attenuate sounds arriving from anywhere other than the front of the listener, while DNR attenuates sounds with the physical characteristics of noise. Although both noise reduction technologies are currently available in hearing aids, it is unclear whether the use of these technologies in isolation or together affects acceptance of noise and/or preference for the end user in various types of background noise. The purpose of the research was to determine the effects of D-Mic, DNR, or the combination of D-Mic and DNR on acceptance of noise and preference when listening in various types of background noise. An experimental repeated-measures design was utilized. Thirty adult listeners with mild sloping to moderately severe sensorineural hearing loss participated (mean age 67 yr). Acceptable noise levels (ANLs) were obtained using no noise reduction technology, D-Mic only, DNR only, and the combination of the two technologies (Combo) for three different background noises (single-talker speech, speech-shaped noise, and multitalker babble) for each listener. In addition, preference rankings of the noise reduction technologies were obtained within each background noise (1 = best, 3 = worst). ANL values were significantly better for each noise reduction technology than baseline, and benefit increased significantly from DNR to D-Mic to Combo. Listeners with higher (worse) baseline ANLs received more benefit from noise reduction technologies than listeners with lower (better) baseline ANLs. Neither ANL values nor ANL benefit values were significantly affected by background noise type; however, ANL benefit with D-Mic and Combo was similar when speech-like noise was present, while ANL benefit was greatest for Combo when speech spectrum noise was

  20. The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility.

    Science.gov (United States)

    Bentsen, Thomas; May, Tobias; Kressner, Abigail A; Dau, Torsten

    2018-01-01

    Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one. Therefore, both components play significant roles and by combining them, speech intelligibility improvements were obtained in a six-talker condition at a low signal-to-noise ratio.
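
    The two learning objectives compared above differ only in how the time-frequency training targets are computed. A minimal sketch, assuming oracle access to separate speech and noise power spectrograms (standard practice when constructing such targets); the local criterion lc_db and the square-root form of the ratio mask are common conventions, not details taken from this paper:

```python
import numpy as np

def ideal_masks(speech_power, noise_power, lc_db=0.0):
    """Compute the ideal binary mask (IBM) and ideal ratio mask (IRM)
    from time-frequency power estimates of separately available speech
    and noise (arrays of shape [frames, channels])."""
    snr_db = 10 * np.log10(speech_power / (noise_power + 1e-12) + 1e-12)
    ibm = (snr_db > lc_db).astype(float)  # hard 0/1 labels per T-F unit
    irm = np.sqrt(speech_power / (speech_power + noise_power + 1e-12))
    return ibm, irm
```

    Switching the training target from ibm to irm is the single change credited above with the 13.9-percentage-point improvement.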

  1. A Noise Reduction Preprocessor for Mobile Voice Communication

    Directory of Open Access Journals (Sweden)

    Rainer Martin

    2004-07-01

    We describe a speech enhancement algorithm which leads to significant quality and intelligibility improvements when used as a preprocessor to a low bit rate speech coder. This algorithm was developed in conjunction with the mixed excitation linear prediction (MELP) coder which, by itself, is highly susceptible to environmental noise. The paper presents novel as well as known speech and noise estimation techniques and combines them into a highly effective speech enhancement system. The algorithm is based on short-time spectral amplitude estimation, soft-decision gain modification, tracking of the a priori probability of speech absence, and minimum statistics noise power estimation. Special emphasis is placed on enhancing the performance of the preprocessor in nonstationary noise environments.
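
    Of the listed components, minimum statistics noise power estimation is the most distinctive; a heavily simplified sketch follows. The full method uses optimal time-varying smoothing and bias compensation, so the fixed smoothing constant, window length, and bias factor below are placeholders.

```python
import numpy as np

def min_stats_noise_psd(noisy_power, alpha=0.85, win=96, bias=1.5):
    """Simplified minimum-statistics noise tracking: smooth the noisy
    power spectrogram over time, then take the minimum over a sliding
    window of past frames and apply a fixed bias compensation.
    noisy_power: array [frames, bins] of short-time power spectra."""
    smoothed = np.empty_like(noisy_power)
    smoothed[0] = noisy_power[0]
    for t in range(1, len(noisy_power)):
        smoothed[t] = alpha * smoothed[t - 1] + (1 - alpha) * noisy_power[t]
    noise = np.empty_like(noisy_power)
    for t in range(len(noisy_power)):
        lo = max(0, t - win + 1)
        noise[t] = bias * smoothed[lo:t + 1].min(axis=0)  # windowed minimum
    return noise
```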

  2. Concurrent Codes: A Holographic-Type Encoding Robust against Noise and Loss.

    Directory of Open Access Journals (Sweden)

    David M Benton

    Concurrent coding is an encoding scheme with 'holographic'-type properties that are shown here to be robust against a significant amount of noise and signal loss. This single encoding scheme is able to correct for random errors and burst errors simultaneously, but does not rely on cyclic codes. A simple and practical scheme has been tested that displays perfect decoding when the signal-to-noise ratio is of order -18 dB. The same scheme also displays perfect reconstruction when a contiguous block of 40% of the transmission is missing. In addition, this scheme is 50% more efficient in terms of transmitted power requirements than equivalent cyclic codes. A simple model is presented that describes the process of decoding and can determine the computational load that would be expected, as well as describing the critical levels of noise and missing data at which false messages begin to be generated.

  3. Individual differences in degraded speech perception

    Science.gov (United States)

    Carbonell, Kathy M.

    One of the lasting concerns in audiology is the unexplained individual differences in speech perception performance, even for individuals with similar audiograms. One proposal is that there are cognitive/perceptual individual differences underlying this vulnerability and that these differences are present in normal hearing (NH) individuals but do not reveal themselves in studies that use clear speech produced in quiet (because of a ceiling effect). However, previous studies have failed to uncover cognitive/perceptual variables that explain much of the variance in NH performance on more challenging degraded speech tasks. This lack of strong correlations may be due either to examining the wrong measures (e.g., working memory capacity) or to there being no reliable differences in degraded speech performance in NH listeners (i.e., variability in performance is due to measurement noise). The proposed project has three aims. The first is to establish whether there are reliable individual differences in degraded speech performance for NH listeners that are sustained both across degradation types (speech in noise, compressed speech, noise-vocoded speech) and across multiple testing sessions. The second aim is to establish whether there are reliable differences in NH listeners' ability to adapt their phonetic categories based on short-term statistics, both across tasks and across sessions. The final aim is to determine whether performance on degraded speech perception tasks is correlated with performance on phonetic adaptability tasks, thus establishing a possible explanatory variable for individual differences in speech perception for NH and hearing-impaired listeners.

  4. Long Term Memory for Noise: Evidence of Robust Encoding of Very Short Temporal Acoustic Patterns.

    Science.gov (United States)

    Viswanathan, Jayalakshmi; Rémy, Florence; Bacon-Macé, Nadège; Thorpe, Simon J

    2016-01-01

    Recent research has demonstrated that humans are able to implicitly encode and retain repeating patterns in meaningless auditory noise. Our study aimed at testing the robustness of long-term implicit recognition memory for these learned patterns. Participants performed a cyclic/non-cyclic discrimination task, during which they were presented with either 1-s cyclic noises (CNs) (the two halves of the noise were identical) or 1-s plain random noises (Ns). Among CNs and Ns presented once, target CNs were implicitly presented multiple times within a block, and implicit recognition of these target CNs was tested 4 weeks later using a similar cyclic/non-cyclic discrimination task. Furthermore, robustness of implicit recognition memory was tested by presenting participants with looped (shifting the origin) and scrambled (chopping sounds into 10- and 20-ms bits before shuffling) versions of the target CNs. We found that participants had robust implicit recognition memory for learned noise patterns after 4 weeks, right from the first presentation. Additionally, this memory was remarkably resistant to acoustic transformations, such as looping and scrambling of the sounds. Finally, implicit recognition of sounds was dependent on participants' discrimination performance during learning. Our findings suggest that meaningless temporal features as short as 10 ms can be implicitly stored in long-term auditory memory. Moreover, successful encoding and storage of such fine features may vary between participants, possibly depending on individual attention and auditory discrimination abilities. Significance Statement: Meaningless auditory patterns could be implicitly encoded and stored in long-term memory. Acoustic transformations of learned meaningless patterns could be implicitly recognized after 4 weeks. Implicit long-term memories can be formed for meaningless auditory features as short as 10 ms. Successful encoding and long-term implicit recognition of meaningless patterns may
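
    The cyclic-noise stimuli central to this paradigm are simple to construct; the sketch below assumes Gaussian white noise at a 44.1-kHz sampling rate (the actual stimulus parameters may have differed).

```python
import numpy as np

def make_cyclic_noise(fs=44100, dur=1.0, rng=None):
    """Generate a cyclic noise (CN): a noise burst whose second half
    is an exact repeat of the first half."""
    rng = rng or np.random.default_rng()
    half = rng.standard_normal(int(fs * dur / 2))
    return np.concatenate([half, half])

def make_plain_noise(fs=44100, dur=1.0, rng=None):
    """Generate a plain random noise (N) of the same duration."""
    rng = rng or np.random.default_rng()
    return rng.standard_normal(int(fs * dur))
```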

  5. Acoustical conditions for speech communication in active elementary school classrooms

    Science.gov (United States)

    Sato, Hiroshi; Bradley, John

    2005-04-01

    Detailed acoustical measurements were made in 34 active elementary school classrooms with typical rectangular room shape in schools near Ottawa, Canada. There was an average of 21 students in the classrooms. The measurements were made to obtain accurate indications of the acoustical quality of conditions for speech communication during actual teaching activities. Mean speech and noise levels were determined from the distribution of recorded sound levels, and the average speech-to-noise ratio was 11 dBA. Measured mid-frequency reverberation times (RT) during the same occupied conditions varied from 0.3 to 0.6 s, and were a little less than for the unoccupied rooms. RT values were not related to noise levels. Octave band speech and noise levels, useful-to-detrimental ratios, and Speech Transmission Index values were also determined. Key results included: (1) the average vocal effort of teachers corresponded to a level louder than Pearsons' raised voice level; (2) teachers increase their voice level to overcome ambient noise; (3) effective speech levels can be enhanced by up to 5 dB by early reflection energy; and (4) student activity is seen to be the dominant noise source, increasing average noise levels by up to 10 dBA during teaching activities. [Work supported by CLLRnet.]

  6. Visual Context Enhanced: The Joint Contribution of Iconic Gestures and Visible Speech to Degraded Speech Comprehension

    Science.gov (United States)

    Drijvers, Linda; Ozyurek, Asli

    2017-01-01

    Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Method:…

  7. A homology sound-based algorithm for speech signal interference

    Science.gov (United States)

    Jiang, Yi-jiao; Chen, Hou-jin; Li, Ju-peng; Zhang, Zhan-song

    2015-12-01

    With the aim of securing analog speech communication, a homology-sound-based algorithm for speech signal interference is proposed in this paper. We first split the speech signal into phonetic fragments by a short-term energy method and establish an interference noise cache library from the phonetic fragments. Then we implement the homology sound interference by mixing randomly selected interferential fragments with the original speech in real time. The computer simulation results indicate that the interference produced by this algorithm has the advantages of real-time operation, randomness, and high correlation with the original signal, compared with traditional noise interference methods such as white noise interference. After further studies, the proposed algorithm may be readily used in secure speech communication.

  8. Robust Transmission of Speech LSFs Using Hidden Markov Model-Based Multiple Description Index Assignments

    Directory of Open Access Journals (Sweden)

    Rondeau Paul

    2008-01-01

    Speech coding techniques capable of generating encoded representations which are robust against channel losses play an important role in enabling reliable voice communication over packet networks and mobile wireless systems. In this paper, we investigate the use of multiple description index assignments (MDIAs) for loss-tolerant transmission of line spectral frequency (LSF) coefficients, typically generated by state-of-the-art speech coders. We propose a simulated annealing-based approach for optimizing MDIAs for Markov-model-based decoders which exploit inter- and intraframe correlations in LSF coefficients to reconstruct the quantized LSFs from coded bit streams corrupted by channel losses. Experimental results are presented which compare the performance of a number of novel LSF transmission schemes. These results clearly demonstrate that Markov-model-based decoders, when used in conjunction with optimized MDIAs, can yield average spectral distortion much lower than that produced by methods such as interleaving/interpolation, commonly used to combat packet losses.

  9. Robust Transmission of Speech LSFs Using Hidden Markov Model-Based Multiple Description Index Assignments

    Directory of Open Access Journals (Sweden)

    Pradeepa Yahampath

    2008-03-01

    Speech coding techniques capable of generating encoded representations which are robust against channel losses play an important role in enabling reliable voice communication over packet networks and mobile wireless systems. In this paper, we investigate the use of multiple description index assignments (MDIAs) for loss-tolerant transmission of line spectral frequency (LSF) coefficients, typically generated by state-of-the-art speech coders. We propose a simulated annealing-based approach for optimizing MDIAs for Markov-model-based decoders which exploit inter- and intraframe correlations in LSF coefficients to reconstruct the quantized LSFs from coded bit streams corrupted by channel losses. Experimental results are presented which compare the performance of a number of novel LSF transmission schemes. These results clearly demonstrate that Markov-model-based decoders, when used in conjunction with optimized MDIAs, can yield average spectral distortion much lower than that produced by methods such as interleaving/interpolation, commonly used to combat packet losses.

  10. Assessment of the Speech Intelligibility Performance of Post Lingual Cochlear Implant Users at Different Signal-to-Noise Ratios Using the Turkish Matrix Test

    Directory of Open Access Journals (Sweden)

    Zahra Polat

    2016-10-01

    Background: Spoken word recognition and speech perception tests in quiet are used routinely to assess the benefit that children and adult cochlear implant users receive from their devices. Cochlear implant users generally demonstrate high-level performance on these test materials, as they are able to achieve high speech perception ability in quiet situations. Although these test materials provide valuable information regarding cochlear implant (CI) users' performance in optimal listening conditions, they do not give realistic information regarding performance in adverse listening conditions, which is the case in the everyday environment. Aims: The aim of this study was to assess the speech intelligibility performance of post-lingual CI users in the presence of noise at different signal-to-noise ratios with the Matrix Test developed for the Turkish language. Study Design: Cross-sectional study. Methods: Thirty post-lingual adult implant users, who had been using their implants for a minimum of one year, were evaluated with the Turkish Matrix Test. Subjects' speech intelligibility was measured using the adaptive and non-adaptive Matrix Test in quiet and noisy environments. Results: The results of the study show a correlation between pure tone average (PTA) values of the subjects and Matrix Test speech reception threshold (SRT) values in quiet. Hence, it is possible to assess PTA values of CI users using the Matrix Test also. However, no correlations were found between Matrix SRT values in quiet and Matrix SRT values in noise. Similarly, the correlation between PTA values and intelligibility scores in noise was also not significant. Therefore, it may not be possible to assess the intelligibility performance of CI users using test batteries performed in quiet conditions. Conclusion: The Matrix Test can be used to assess the benefit of CI users from their systems in everyday life, since it is possible to perform

  11. A robust and coherent network statistic for detecting gravitational waves from inspiralling compact binaries in non-Gaussian noise

    CERN Document Server

    Bose, S

    2002-01-01

    The robust statistic proposed by Creighton (Creighton J D E 1999 Phys. Rev. D 60 021101) and Allen et al (Allen et al 2001 Preprint gr-gc/010500) for signal detection in stationary non-Gaussian noise is briefly reviewed. We compute the robust statistic for generic weak gravitational-wave signals in the mixture-Gaussian noise model to an accuracy higher than in those analyses, and reinterpret its role. Specifically, we obtain the coherent statistic for detecting gravitational-wave signals from inspiralling compact binaries with an arbitrary network of earth-based interferometers. Finally, we show that the excess computational costs incurred owing to non-Gaussianity are negligible compared to the cost of detection in Gaussian noise.

  12. Robust Sequential Covariance Intersection Fusion Kalman Filtering over Multi-agent Sensor Networks with Measurement Delays and Uncertain Noise Variances

    Institute of Scientific and Technical Information of China (English)

    QI Wen-Juan; ZHANG Peng; DENG Zi-Li

    2014-01-01

    This paper deals with the problem of designing a robust sequential covariance intersection (SCI) fusion Kalman filter for a clustering multi-agent sensor network system with measurement delays and uncertain noise variances. The sensor network is partitioned into clusters by the nearest-neighbor rule. Using the minimax robust estimation principle, based on the worst-case conservative sensor network system with conservative upper bounds of the noise variances, and applying the unbiased linear minimum variance (ULMV) optimal estimation rule, we present a two-layer SCI fusion robust steady-state Kalman filter which can reduce communication and computation burdens, save energy, and guarantee that the actual filtering error variances have a less-conservative upper bound. A Lyapunov equation method for robustness analysis is proposed, by which the robustness of the local and fused Kalman filters is proved. The concept of robust accuracy is presented, and the robust accuracy relations of the local and fused robust Kalman filters are proved. It is proved that the robust accuracy of the global SCI fuser is higher than those of the local SCI fusers, and the robust accuracies of all SCI fusers are higher than that of each local robust Kalman filter. A simulation example for a tracking system verifies the robustness and the robust accuracy relations.

  13. Dialogue enabling speech-to-text user assistive agent system for hearing-impaired person.

    Science.gov (United States)

    Lee, Seongjae; Kang, Sunmee; Han, David K; Ko, Hanseok

    2016-06-01

    A novel approach for assisting bidirectional communication between people with normal hearing and the hearing-impaired is presented. While existing hearing-impaired assistive devices such as hearing aids and cochlear implants are vulnerable to extreme noise conditions or post-surgery side effects, the proposed concept is an alternative approach wherein spoken dialogue is achieved by means of a robust speech recognition technique that takes noisy environmental factors into consideration, without any attachment to the human body. The proposed system is a portable device with an acoustic beamformer for directional noise reduction, capable of performing speech-to-text transcription using a keyword spotting method. It is also equipped with a user interface optimized for hearing-impaired people, rendering device usage intuitive and natural in diverse domain contexts. The experimental results confirm that the proposed interface design is feasible for realizing an effective and efficient intelligent agent for the hearing-impaired.

  14. Speech privacy and annoyance considerations in the acoustic environment of passenger cars of high-speed trains.

    Science.gov (United States)

    Jeon, Jin Yong; Hong, Joo Young; Jang, Hyung Suk; Kim, Jae Hyeon

    2015-12-01

    It is necessary to consider not only annoyance of interior noises but also speech privacy to achieve acoustic comfort in a passenger car of a high-speed train because speech from other passengers can be annoying. This study aimed to explore an optimal acoustic environment to satisfy speech privacy and reduce annoyance in a passenger car. Two experiments were conducted using speech sources and compartment noise of a high speed train with varying speech-to-noise ratios (SNRA) and background noise levels (BNL). Speech intelligibility was tested in experiment I, and in experiment II, perceived speech privacy, annoyance, and acoustic comfort of combined sounds with speech and background noise were assessed. The results show that speech privacy and annoyance were significantly influenced by the SNRA. In particular, the acoustic comfort was evaluated as acceptable when the SNRA was less than -6 dB for both speech privacy and noise annoyance. In addition, annoyance increased significantly as the BNL exceeded 63 dBA, whereas the effect of the background-noise level on the speech privacy was not significant. These findings suggest that an optimal level of interior noise in a passenger car might exist between 59 and 63 dBA, taking normal speech levels into account.

  15. Musicians do not benefit from differences in fundamental frequency when listening to speech in competing speech backgrounds

    DEFF Research Database (Denmark)

    Madsen, Sara Miay Kim; Whiteford, Kelly L.; Oxenham, Andrew J.

    2017-01-01

    Recent studies disagree on whether musicians have an advantage over non-musicians in understanding speech in noise. However, it has been suggested that musicians may be able to use differences in fundamental frequency (F0) to better understand target speech in the presence of interfering talkers. Here we studied a relatively large (N=60) cohort of young adults, equally divided between non-musicians and highly trained musicians, to test whether the musicians were better able to understand speech either in noise or in a two-talker competing speech masker. The target speech and competing speech were presented with either their natural F0 contours or on a monotone F0, and the F0 difference between the target and masker was systematically varied. As expected, speech intelligibility improved with increasing F0 difference between the target and the two-talker masker for both natural and monotone ...

  16. Noise-invariant Neurons in the Avian Auditory Cortex: Hearing the Song in Noise

    Science.gov (United States)

    Moore, R. Channing; Lee, Tyler; Theunissen, Frédéric E.

    2013-01-01

    Given the extraordinary ability of humans and animals to recognize communication signals over a background of noise, describing noise invariant neural responses is critical not only to pinpoint the brain regions that are mediating our robust perceptions but also to understand the neural computations that are performing these tasks and the underlying circuitry. Although invariant neural responses, such as rotation-invariant face cells, are well described in the visual system, high-level auditory neurons that can represent the same behaviorally relevant signal in a range of listening conditions have yet to be discovered. Here we found neurons in a secondary area of the avian auditory cortex that exhibit noise-invariant responses in the sense that they responded with similar spike patterns to song stimuli presented in silence and over a background of naturalistic noise. By characterizing the neurons' tuning in terms of their responses to modulations in the temporal and spectral envelope of the sound, we then show that noise invariance is partly achieved by selectively responding to long sounds with sharp spectral structure. Finally, to demonstrate that such computations could explain noise invariance, we designed a biologically inspired noise-filtering algorithm that can be used to separate song or speech from noise. This novel noise-filtering method performs as well as other state-of-the-art de-noising algorithms and could be used in clinical or consumer oriented applications. Our biologically inspired model also shows how high-level noise-invariant responses could be created from neural responses typically found in primary auditory cortex. PMID:23505354

  17. Noise-invariant neurons in the avian auditory cortex: hearing the song in noise.

    Science.gov (United States)

    Moore, R Channing; Lee, Tyler; Theunissen, Frédéric E

    2013-01-01

    Given the extraordinary ability of humans and animals to recognize communication signals over a background of noise, describing noise invariant neural responses is critical not only to pinpoint the brain regions that are mediating our robust perceptions but also to understand the neural computations that are performing these tasks and the underlying circuitry. Although invariant neural responses, such as rotation-invariant face cells, are well described in the visual system, high-level auditory neurons that can represent the same behaviorally relevant signal in a range of listening conditions have yet to be discovered. Here we found neurons in a secondary area of the avian auditory cortex that exhibit noise-invariant responses in the sense that they responded with similar spike patterns to song stimuli presented in silence and over a background of naturalistic noise. By characterizing the neurons' tuning in terms of their responses to modulations in the temporal and spectral envelope of the sound, we then show that noise invariance is partly achieved by selectively responding to long sounds with sharp spectral structure. Finally, to demonstrate that such computations could explain noise invariance, we designed a biologically inspired noise-filtering algorithm that can be used to separate song or speech from noise. This novel noise-filtering method performs as well as other state-of-the-art de-noising algorithms and could be used in clinical or consumer oriented applications. Our biologically inspired model also shows how high-level noise-invariant responses could be created from neural responses typically found in primary auditory cortex.

  18. Noise-invariant neurons in the avian auditory cortex: hearing the song in noise.

    Directory of Open Access Journals (Sweden)

    R Channing Moore

    Given the extraordinary ability of humans and animals to recognize communication signals over a background of noise, describing noise invariant neural responses is critical not only to pinpoint the brain regions that are mediating our robust perceptions but also to understand the neural computations that are performing these tasks and the underlying circuitry. Although invariant neural responses, such as rotation-invariant face cells, are well described in the visual system, high-level auditory neurons that can represent the same behaviorally relevant signal in a range of listening conditions have yet to be discovered. Here we found neurons in a secondary area of the avian auditory cortex that exhibit noise-invariant responses in the sense that they responded with similar spike patterns to song stimuli presented in silence and over a background of naturalistic noise. By characterizing the neurons' tuning in terms of their responses to modulations in the temporal and spectral envelope of the sound, we then show that noise invariance is partly achieved by selectively responding to long sounds with sharp spectral structure. Finally, to demonstrate that such computations could explain noise invariance, we designed a biologically inspired noise-filtering algorithm that can be used to separate song or speech from noise. This novel noise-filtering method performs as well as other state-of-the-art de-noising algorithms and could be used in clinical or consumer oriented applications. Our biologically inspired model also shows how high-level noise-invariant responses could be created from neural responses typically found in primary auditory cortex.

  19. A Danish open-set speech corpus for competing-speech studies

    DEFF Research Database (Denmark)

    Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias

    2014-01-01

    Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed ...

  20. Noise and communication: A three-year update

    Directory of Open Access Journals (Sweden)

    Anthony J Brammer

    2012-01-01

    Noise is omnipresent and impacts us all in many aspects of daily living. Noise can interfere with communication not only in industrial workplaces, but also in other work settings (e.g. open-plan offices, construction, and mining) and within buildings (e.g. residences, arenas, and schools). The interference of noise with communication can have significant social consequences, especially for persons with hearing loss, and may compromise safety (e.g. failure to perceive auditory warning signals), influence worker productivity and learning in children, affect health (e.g. vocal pathology, noise-induced hearing loss), compromise speech privacy, and impact social participation by the elderly. For workers, attempts have been made to: 1) better define the auditory performance needed to function effectively and to directly measure these abilities when assessing Auditory Fitness for Duty, 2) design hearing protection devices that can improve speech understanding while offering adequate protection against loud noises, and 3) improve speech privacy in open-plan offices. As the elderly are particularly vulnerable to the effects of noise, an understanding of the interplay between auditory, cognitive, and social factors and its effect on speech communication and social participation is also critical. Classroom acoustics and speech intelligibility in children have also gained renewed interest because of the importance of effective speech comprehension in noise on learning. Finally, substantial work has been done in developing models aimed at better predicting speech intelligibility. Despite progress in various fields, the design of alarm signals continues to lag behind advancements in knowledge. This summary of the last three years' research highlights some of the most recent issues for the workplace, for older adults, and for children, as well as the effectiveness of warning sounds and models for predicting speech intelligibility. Suggestions for future work are also discussed.

  1. Noise and communication: a three-year update.

    Science.gov (United States)

    Brammer, Anthony J; Laroche, Chantal

    2012-01-01

    Noise is omnipresent and impacts us all in many aspects of daily living. Noise can interfere with communication not only in industrial workplaces, but also in other work settings (e.g. open-plan offices, construction, and mining) and within buildings (e.g. residences, arenas, and schools). The interference of noise with communication can have significant social consequences, especially for persons with hearing loss, and may compromise safety (e.g. failure to perceive auditory warning signals), influence worker productivity and learning in children, affect health (e.g. vocal pathology, noise-induced hearing loss), compromise speech privacy, and impact social participation by the elderly. For workers, attempts have been made to: 1) better define the auditory performance needed to function effectively and to directly measure these abilities when assessing Auditory Fitness for Duty, 2) design hearing protection devices that can improve speech understanding while offering adequate protection against loud noises, and 3) improve speech privacy in open-plan offices. As the elderly are particularly vulnerable to the effects of noise, an understanding of the interplay between auditory, cognitive, and social factors and its effect on speech communication and social participation is also critical. Classroom acoustics and speech intelligibility in children have also gained renewed interest because of the importance of effective speech comprehension in noise on learning. Finally, substantial work has been done in developing models aimed at better predicting speech intelligibility. Despite progress in various fields, the design of alarm signals continues to lag behind advancements in knowledge. This summary of the last three years' research highlights some of the most recent issues for the workplace, for older adults, and for children, as well as the effectiveness of warning sounds and models for predicting speech intelligibility. Suggestions for future work are also discussed.

  2. Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions

    Directory of Open Access Journals (Sweden)

    Reinhold Orglmeister

    2010-01-01

    When a number of speakers are simultaneously active, for example in meetings or noisy public places, the sources of interest need to be separated from interfering speakers and from each other in order to be robustly recognized. Independent component analysis (ICA) has proven a valuable tool for this purpose. However, ICA outputs can still contain strong residual components of the interfering speakers whenever noise or reverberation is high. In such cases, nonlinear postprocessing can be applied to the ICA outputs for the purpose of reducing remaining interferences. In order to improve robustness to the artefacts and loss of information caused by this process, recognition can be greatly enhanced by considering the processed speech feature vector as a random variable with time-varying uncertainty, rather than as deterministic. The aim of this paper is to show the potential to improve recognition of multiple overlapping speech signals through nonlinear postprocessing together with uncertainty-based decoding techniques.

  3. The use of Cochlear's SCAN and wireless microphones to improve speech understanding in noise with the Nucleus6® CP900 processor.

    Science.gov (United States)

    De Ceulaer, Geert; Pascoal, David; Vanpoucke, Filiep; Govaerts, Paul J

    2017-11-01

    The newest Nucleus CI processor, the CP900, has two new options to improve speech-in-noise perception: (1) use of an adaptive directional microphone (SCAN mode) and (2) wireless connection to MiniMic1 and MiniMic2 wireless remote microphones. An analysis was made of the absolute and relative benefits of these technologies in a real-world mimicking test situation. Speech perception was tested using an adaptive speech-in-noise test (sentences-in-babble noise). In session A, SRTs were measured in three conditions: (1) Clinical Map, (2) SCAN and (3) MiniMic1. Each was assessed for three distances between speakers and CI recipient: 1 m, 2 m and 3 m. In session B, the benefit of the use of MiniMic2 was compared to benefit of MiniMic1 at 3 m. A group of 13 adult CP900 recipients participated. SCAN and MiniMic1 improved performance compared to the standard microphone with a median improvement in SRT of 2.7-3.9 dB for SCAN at 1 m and 3 m, respectively, and 4.7-10.9 dB for the MiniMic1. MiniMic1 improvements were significant. MiniMic2 showed an improvement in SRT of 22.2 dB compared to 10.0 dB for MiniMic1 (3 m). Digital wireless transmission systems (i.e. MiniMic) offer a statistically and clinically significant improvement in speech perception in challenging, realistic listening conditions.

  4. Development of equally intelligible Telugu sentence-lists to test speech recognition in noise.

    Science.gov (United States)

    Tanniru, Kishore; Narne, Vijaya Kumar; Jain, Chandni; Konadath, Sreeraj; Singh, Niraj Kumar; Sreenivas, K J Ramadevi; K, Anusha

    2017-09-01

    To develop sentence lists in the Telugu language for the assessment of speech recognition threshold (SRT) in the presence of background noise through identification of the mean signal-to-noise ratio required to attain a 50% sentence recognition score (SRTn). This study was conducted in three phases. The first phase involved the selection and recording of Telugu sentences. In the second phase, 20 lists, each consisting of 10 sentences with equal intelligibility, were formulated using a numerical optimisation procedure. In the third phase, the SRTn of the developed lists was estimated using adaptive procedures on individuals with normal hearing. A total of 68 native Telugu speakers with normal hearing participated in the study. Of these, 18 (including the speakers) performed on various subjective measures in the first phase, 20 performed on sentence/word recognition in noise for the second phase, and 30 participated in the list-equivalency procedures in the third phase. In all, 15 lists of comparable difficulty were formulated as test material. The mean SRTn across these lists corresponded to -2.74 dB (SD = 0.21). The developed sentence lists provided a valid and reliable tool to measure SRTn in Telugu native speakers.
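
    To illustrate the kind of adaptive procedure used in the third phase, the sketch below runs a simple 1-up/1-down SNR staircase, which converges on the 50% point of the psychometric function; the simulated logistic listener and all constants are illustrative stand-ins, not the study's actual protocol.

      import numpy as np

      rng = np.random.default_rng(0)

      def listener_correct(snr_db, srt_true=-2.7, slope=0.8):
          """Simulated listener with a logistic psychometric function."""
          p = 1.0 / (1.0 + np.exp(-slope * (snr_db - srt_true)))
          return rng.random() < p

      def run_staircase(n_trials=30, snr_start=4.0, step_db=2.0):
          snr, track = snr_start, []
          for _ in range(n_trials):
              track.append(snr)
              # 1-up/1-down: harder after a correct response, easier after an
              # error, converging on the 50% point (SRTn) of the function.
              snr += -step_db if listener_correct(snr) else step_db
          return np.mean(track[10:])               # skip the initial approach phase

      print(f"estimated SRTn: {run_staircase():.1f} dB SNR")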

  5. On the use of the distortion-sensitivity approach in examining the role of linguistic abilities in speech understanding in noise

    NARCIS (Netherlands)

    Goverts, S.T.; Huysmans, E.; Kramer, S.E.; Groot, A.M.; Houtgast, T.

    2011-01-01

    Purpose: Researchers have used the distortion-sensitivity approach in the psychoacoustical domain to investigate the role of auditory processing abilities in speech perception in noise (van Schijndel, Houtgast, & Festen, 2001; Goverts & Houtgast, 2010). In this study, the authors examined the

  6. A blood pressure monitor with robust noise reduction system under linear cuff inflation and deflation.

    Science.gov (United States)

    Usuda, Takashi; Kobayashi, Naoki; Takeda, Sunao; Kotake, Yoshifumi

    2010-01-01

    We have developed a non-invasive blood pressure monitor which can measure blood pressure quickly and robustly. This monitor combines two measurement modes: linear inflation and linear deflation. In the inflation mode, we realized faster measurement with a rapid inflation rate. In the deflation mode, we realized robust noise reduction. When there is neither noise nor arrhythmia, the inflation mode incorporated in this monitor provides precise, quick and comfortable measurement. Once the inflation mode fails to calculate an appropriate blood pressure due to body movement or arrhythmia, the monitor switches automatically to the deflation mode and measures blood pressure using digital signal processing such as wavelet analysis, filter banks, and filtering combined with FFT and inverse FFT. The inflation mode succeeded in 2440 of 3099 measurements (79%) in an operating room and a rehabilitation room. The newly designed blood pressure monitor provides the fastest measurement for patients with normal circulation and robust measurement for patients with body movement or severe arrhythmia. This fast measurement method also provides comfort for patients.

  7. Reduction of Non-stationary Noise using a Non-negative Latent Variable Decomposition

    DEFF Research Database (Denmark)

    Schmidt, Mikkel Nørgaard; Larsen, Jan

    2008-01-01

    We present a method for suppression of non-stationary noise in single channel recordings of speech. The method is based on a non-negative latent variable decomposition model for the speech and noise signals, learned directly from a noisy mixture. In non-speech regions an overcomplete basis...... is learned for the noise that is then used to jointly estimate the speech and the noise from the mixture. We compare the method to the classical spectral subtraction approach, where the noise spectrum is estimated as the average over non-speech frames. The proposed method significantly outperforms...
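
    A minimal sketch in the spirit of this record (not the authors' exact model): a noise dictionary is learned from non-speech frames by NMF with multiplicative updates, then held fixed while speech atoms and all activations are estimated on the noisy mixture, and a Wiener-like gain reconstructs the speech. The spectrogram shapes, ranks and iteration counts are illustrative.

      import numpy as np

      def nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
          """Plain NMF by multiplicative updates (Euclidean cost)."""
          rng = np.random.default_rng(seed)
          W = rng.random((V.shape[0], rank)) + eps
          H = rng.random((rank, V.shape[1])) + eps
          for _ in range(n_iter):
              H *= (W.T @ V) / (W.T @ W @ H + eps)
              W *= (V @ H.T) / (W @ H @ H.T + eps)
          return W, H

      def nmf_denoise(V, V_noise, rank_n=16, rank_s=16, n_iter=200, eps=1e-12):
          """V: noisy magnitude spectrogram; V_noise: non-speech frames of V."""
          Wn, _ = nmf(V_noise, rank_n)             # noise dictionary, then frozen
          rng = np.random.default_rng(1)
          Ws = rng.random((V.shape[0], rank_s)) + eps
          H = rng.random((rank_s + rank_n, V.shape[1])) + eps
          for _ in range(n_iter):
              W = np.hstack([Ws, Wn])
              H *= (W.T @ V) / (W.T @ W @ H + eps)
              Ws *= (V @ H[:rank_s].T) / (W @ H @ H[:rank_s].T + eps)
          mask = (Ws @ H[:rank_s]) / (np.hstack([Ws, Wn]) @ H + eps)
          return mask * V                          # Wiener-like enhanced magnitudes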

  8. Comparison of PAM and CAP modulations robustness against mode partition noise in optical links

    Science.gov (United States)

    Stepniak, Grzegorz

    2017-08-01

    Mode partition noise (MPN) of the laser employed at the transmitter can significantly degrade the transmission performance. In the paper, we introduce a simulation model of MPN in vertical cavity surface emitting laser (VCSEL) and simulate transmission of pulse amplitude modulation (PAM) and carrierless amplitude phase (CAP) signals in multimode fiber (MMF) link. By turning off other effects, like relative intensity noise (RIN), we focus solely on the influence of MPN on transmission performance degradation. Robustness of modulation and equalization type against MPN is studied.

  9. Boundary layer noise subtraction in hydrodynamic tunnel using robust principal component analysis.

    Science.gov (United States)

    Amailland, Sylvain; Thomas, Jean-Hugh; Pézerat, Charles; Boucheron, Romuald

    2018-04-01

    The acoustic study of propellers in a hydrodynamic tunnel is of paramount importance during the design process, but can involve significant difficulties due to the boundary layer noise (BLN). Indeed, advanced denoising methods are needed to recover the acoustic signal in case of poor signal-to-noise ratio. The technique proposed in this paper is based on the decomposition of the wall-pressure cross-spectral matrix (CSM) by taking advantage of both the low-rank property of the acoustic CSM and the sparse property of the BLN CSM. Thus, the algorithm belongs to the class of robust principal component analysis (RPCA), which derives from the widely used principal component analysis. If the BLN is spatially decorrelated, the proposed RPCA algorithm can blindly recover the acoustical signals even for negative signal-to-noise ratio. Unfortunately, in a realistic case, acoustic signals recorded in a hydrodynamic tunnel show that the noise may be partially correlated. A prewhitening strategy is then considered in order to take into account the spatially coherent background noise. Numerical simulations and experimental results show an improvement in terms of BLN reduction in the large hydrodynamic tunnel. The effectiveness of the denoising method is also investigated in the context of acoustic source localization.
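
    The decomposition itself can be sketched with a generic principal component pursuit solver (inexact augmented Lagrangian with singular-value and entrywise soft-thresholding). This is textbook RPCA, not the authors' prewhitened variant, and it is written for real matrices; a complex CSM would need x/|x| shrinkage.

      import numpy as np

      def shrink(X, tau):                          # entrywise soft-threshold
          return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

      def rpca(M, n_iter=500, tol=1e-7):
          m, n = M.shape
          lam = 1.0 / np.sqrt(max(m, n))           # standard PCP weight
          mu = 0.25 * m * n / (np.abs(M).sum() + 1e-12)
          L = np.zeros_like(M)
          S = np.zeros_like(M)
          Y = np.zeros_like(M)
          for _ in range(n_iter):
              # Low-rank update by singular-value thresholding.
              U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
              L = (U * shrink(sig, 1.0 / mu)) @ Vt
              # Sparse update by entrywise soft-thresholding.
              S = shrink(M - L + Y / mu, lam / mu)
              R = M - L - S                        # residual, dual ascent
              Y += mu * R
              if np.linalg.norm(R) <= tol * np.linalg.norm(M):
                  break
          return L, S                              # low-rank part, sparse part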

  10. Auditory Modeling for Noisy Speech Recognition

    National Research Council Canada - National Science Library

    2000-01-01

    ... digital filtering for noise cancellation which interfaces to speech recognition software. It uses auditory features in speech recognition training, and provides applications to multilingual spoken language translation...

  11. Resolution and robustness to noise of the sensitivity-based method for microwave imaging with data acquired on cylindrical surfaces

    International Nuclear Information System (INIS)

    Zhang, Yifan; Tu, Sheng; Amineh, Reza K; Nikolova, Natalia K

    2012-01-01

    The spatial resolution limit of a Jacobian-based microwave imaging algorithm and its robustness to noise are evaluated. The focus here is on tomographic systems where the wideband data are acquired with a vertically scanned circular sensor array and at each scanning step a 2D image is reconstructed in the plane of the sensor array. The theoretical resolution is obtained as one-half of the maximum-frequency wavelength with far-zone data and about two-thirds of the array radius with near-zone data. Validation examples are given using analytical electromagnetic models. The algorithm is shown to be robust to noise when the response data are corrupted by Gaussian white noise. (paper)

  12. Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

    Science.gov (United States)

    Ye, Sherry

    2015-01-01

    NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.
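
    Of the listed components, the beamforming stage is the easiest to sketch in its generic delay-and-sum form; this is not WeVoice's implementation, and the array geometry, steering angle, and sampling rate below are illustrative. Fractional delays are applied in the frequency domain.

      import numpy as np

      def delay_and_sum(x, mic_pos, theta, fs=16000, c=343.0):
          """x: (n_mics, n_samples); mic_pos: (n_mics,) positions on a line [m];
          theta: look direction in radians from broadside."""
          n_mics, n = x.shape
          delays = mic_pos * np.sin(theta) / c          # seconds, per channel
          X = np.fft.rfft(x, axis=1)
          f = np.fft.rfftfreq(n, 1.0 / fs)
          # Advance each channel so the look-direction wavefront aligns, then sum.
          X *= np.exp(2j * np.pi * f[None, :] * delays[:, None])
          return np.fft.irfft(X, n=n, axis=1).mean(axis=0)

      # Usage: a 4-mic array with 5 cm spacing steered 30 degrees off broadside.
      rng = np.random.default_rng(0)
      x = rng.standard_normal((4, 16000))
      y = delay_and_sum(x, np.arange(4) * 0.05, np.deg2rad(30.0))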

  13. The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility

    DEFF Research Database (Denmark)

    Bentsen, Thomas; May, Tobias; Kressner, Abigail Anne

    2018-01-01

    Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements....... A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech......, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where...
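
    The two learning objectives contrasted in this record can be stated compactly. Given the clean-speech and noise magnitude spectrograms available at training time, the sketch below computes both masks; the local SNR criterion and the shapes are illustrative.

      import numpy as np

      def ideal_masks(S, N, lc_db=0.0):
          """S, N: magnitude spectrograms of clean speech and noise."""
          snr_db = 20.0 * np.log10((S + 1e-12) / (N + 1e-12))
          ibm = (snr_db > lc_db).astype(float)          # hard 0/1 decision per unit
          irm = np.sqrt(S**2 / (S**2 + N**2 + 1e-12))   # soft gain in [0, 1]
          return ibm, irm

      rng = np.random.default_rng(0)
      S, N = rng.random((257, 100)), rng.random((257, 100))
      ibm, irm = ideal_masks(S, N)
      enhanced = irm * (S + N)   # applying the oracle IRM to the mixture magnitudes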

  14. Noise Reduction with Microphone Arrays for Speaker Identification

    Energy Technology Data Exchange (ETDEWEB)

    Cohen, Z

    2011-12-22

    Reducing acoustic noise in audio recordings is an ongoing problem that plagues many applications. This noise is hard to reduce because of interfering sources and non-stationary behavior of the overall background noise. Many single channel noise reduction algorithms exist but are limited in that the more the noise is reduced, the more the signal of interest is distorted, due to the fact that the signal and noise overlap in frequency. Specifically, acoustic background noise causes problems in the area of speaker identification. Recording a speaker in the presence of acoustic noise ultimately limits the performance and confidence of speaker identification algorithms. In situations where it is impossible to control the environment where the speech sample is taken, noise reduction filtering algorithms need to be developed to clean the recorded speech of background noise. Because single channel noise reduction algorithms would distort the speech signal, the overall challenge of this project was to see if spatial information provided by microphone arrays could be exploited to aid in speaker identification. The goals are: (1) Test the feasibility of using microphone arrays to reduce background noise in speech recordings; (2) Characterize and compare different multichannel noise reduction algorithms; (3) Provide recommendations for using these multichannel algorithms; and (4) Ultimately answer the question - Can the use of microphone arrays aid in speaker identification?

  15. Application of adaptive digital signal processing to speech enhancement for the hearing impaired.

    Science.gov (United States)

    Chabries, D M; Christiansen, R W; Brey, R H; Robinette, M S; Harris, R W

    1987-01-01

    A major complaint of individuals with normal hearing and hearing impairments is a reduced ability to understand speech in a noisy environment. This paper describes the concept of adaptive noise cancelling for removing noise from corrupted speech signals. Application of adaptive digital signal processing has long been known and is described from a historical as well as technical perspective. The Widrow-Hoff LMS (least mean square) algorithm developed in 1959 forms the introduction to modern adaptive signal processing. This method uses a "primary" input which consists of the desired speech signal corrupted with noise and a second "reference" signal which is used to estimate the primary noise signal. By subtracting the adaptively filtered estimate of the noise, the desired speech signal is obtained. Recent developments in the field as they relate to noise cancellation are described. These developments include more computationally efficient algorithms as well as algorithms that exhibit improved learning performance. A second method for removing noise from speech, for use when no independent reference for the noise exists, is referred to as single channel noise suppression. Both adaptive and spectral subtraction techniques have been applied to this problem--often with the result of decreased speech intelligibility. Current techniques applied to this problem are described, including signal processing techniques that offer promise in the noise suppression application.
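
    A minimal sketch of the Widrow-Hoff LMS canceller described here, assuming a primary input d (speech plus noise) and a noise reference x; the step size, filter length, and toy signals are illustrative.

      import numpy as np

      def lms_cancel(d, x, n_taps=32, mu=0.01):
          """Adaptive noise cancelling: the filter estimates the noise
          component of d from the reference x; the error e is the enhanced
          speech (Widrow-Hoff LMS weight update)."""
          w = np.zeros(n_taps)
          e = np.zeros_like(d)
          for n in range(n_taps, len(d)):
              x_vec = x[n - n_taps:n][::-1]      # most recent reference samples
              y = w @ x_vec                      # noise estimate
              e[n] = d[n] - y                    # speech estimate = error signal
              w += 2.0 * mu * e[n] * x_vec       # LMS update
          return e

      # Toy usage: a tone corrupted by a filtered version of the reference noise.
      rng = np.random.default_rng(0)
      ref = rng.standard_normal(16000)
      speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
      noisy = speech + np.convolve(ref, [0.5, -0.3, 0.1], mode="same")
      clean_est = lms_cancel(noisy, ref)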

  16. Robust synchronization control scheme of a population of nonlinear stochastic synthetic genetic oscillators under intrinsic and extrinsic molecular noise via quorum sensing.

    Science.gov (United States)

    Chen, Bor-Sen; Hsu, Chih-Yuan

    2012-10-26

    Collective rhythms of gene regulatory networks have been a subject of considerable interest for biologists and theoreticians, in particular the synchronization of dynamic cells mediated by intercellular communication. Synchronization of a population of synthetic genetic oscillators is an important design in practical applications, because such a population distributed over different host cells needs to exploit molecular phenomena simultaneously in order to give rise to a biological phenomenon. However, this synchronization may be corrupted by intrinsic kinetic parameter fluctuations and extrinsic environmental molecular noise. Therefore, robust synchronization is an important design topic in nonlinear stochastic coupled synthetic genetic oscillators with intrinsic kinetic parameter fluctuations and extrinsic molecular noise. Initially, the condition for robust synchronization of synthetic genetic oscillators was derived based on the Hamilton-Jacobi inequality (HJI). We found that if the synchronization robustness can confer enough intrinsic robustness to tolerate intrinsic parameter fluctuation and extrinsic robustness to filter the environmental noise, then robust synchronization of coupled synthetic genetic oscillators is guaranteed. If the synchronization robustness of a population of nonlinear stochastic coupled synthetic genetic oscillators distributed over different host cells could not be maintained, then robust synchronization could be enhanced by external control input through quorum sensing molecules. In order to simplify the analysis and design of robust synchronization of nonlinear stochastic synthetic genetic oscillators, the fuzzy interpolation method was employed to interpolate several local linear stochastic coupled systems to approximate the nonlinear stochastic coupled system so that the HJI-based synchronization design problem could be replaced by a simple linear matrix inequality (LMI)-based design problem, which could be solved with the help of LMI
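
    The LMI reformulation can be illustrated in its simplest form, a Lyapunov feasibility problem, using the CVXPY package (availability assumed); the paper's synchronization LMIs are considerably more elaborate, and the dynamics matrix below is illustrative.

      import numpy as np
      import cvxpy as cp

      A = np.array([[-1.0,  2.0],
                    [ 0.0, -3.0]])                 # illustrative stable dynamics
      n = A.shape[0]

      # Find P > 0 with A'P + PA < 0, i.e. a quadratic Lyapunov certificate,
      # expressed as semidefinite (LMI) constraints.
      P = cp.Variable((n, n), symmetric=True)
      constraints = [P >> np.eye(n),
                     A.T @ P + P @ A << -1e-6 * np.eye(n)]
      cp.Problem(cp.Minimize(0), constraints).solve()
      print("LMI feasible:", P.value is not None)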

  17. Effects of directional microphone and adaptive multichannel noise reduction algorithm on cochlear implant performance.

    Science.gov (United States)

    Chung, King; Zeng, Fan-Gang; Acker, Kyle N

    2006-10-01

    Although cochlear implant (CI) users have enjoyed good speech recognition in quiet, they still have difficulties understanding speech in noise. We conducted three experiments to determine whether a directional microphone and an adaptive multichannel noise reduction algorithm could enhance CI performance in noise and whether Speech Transmission Index (STI) can be used to predict CI performance in various acoustic and signal processing conditions. In Experiment I, CI users listened to speech in noise processed by 4 hearing aid settings: omni-directional microphone, omni-directional microphone plus noise reduction, directional microphone, and directional microphone plus noise reduction. The directional microphone significantly improved speech recognition in noise. Both directional microphone and noise reduction algorithm improved overall preference. In Experiment II, normal hearing individuals listened to the recorded speech produced by 4- or 8-channel CI simulations. The 8-channel simulation yielded similar speech recognition results as in Experiment I, whereas the 4-channel simulation produced no significant difference among the 4 settings. In Experiment III, we examined the relationship between STIs and speech recognition. The results suggested that STI could predict actual and simulated CI speech intelligibility with acoustic degradation and the directional microphone, but not the noise reduction algorithm. Implications for intelligibility enhancement are discussed.

  18. Tracking of Nonstationary Noise Based on Data-Driven Recursive Noise Power Estimation

    NARCIS (Netherlands)

    Erkelens, J.S.; Heusdens, R.

    2008-01-01

    This paper considers estimation of the noise spectral variance from speech signals contaminated by highly nonstationary noise sources. The method can accurately track fast changes in noise power level (up to about 10 dB/s). In each time frame, for each frequency bin, the noise variance estimate is
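
    A minimal sketch of recursive noise-power tracking in this spirit (a generic speech-presence-weighted smoother, not the authors' data-driven estimator): per frequency bin, the noise estimate is updated strongly only where the posterior SNR suggests noise dominates. All constants are illustrative.

      import numpy as np

      def track_noise_psd(P_frames, alpha=0.85, snr_gate=3.0):
          """P_frames: (n_frames, n_bins) periodogram |Y|^2 of noisy speech."""
          noise = P_frames[0].copy()               # initialize from first frame
          est = np.empty_like(P_frames)
          for i, P in enumerate(P_frames):
              post_snr = P / (noise + 1e-12)
              # Soft speech-presence weight: update little where SNR is high.
              g = 1.0 / (1.0 + np.exp(np.minimum(post_snr - snr_gate, 50.0)))
              noise = (1.0 - g * (1.0 - alpha)) * noise + g * (1.0 - alpha) * P
              est[i] = noise
          return est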

  19. Robust signal selection for linear prediction analysis of voiced speech

    NARCIS (Netherlands)

    Ma, C.; Kamp, Y.; Willems, L.F.

    1993-01-01

    This paper investigates a weighted LPC analysis of voiced speech. In view of the speech production model, the weighting function is either chosen to be the short-time energy function of the preemphasized speech sample sequence with certain delays or is obtained by thresholding the short-time energy
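
    A sketch of energy-weighted LPC under these assumptions (a generic weighted least-squares formulation; the authors' exact weighting and delay choices are not reproduced, and the pre-emphasis and window constants are illustrative):

      import numpy as np

      def weighted_lpc(x, order=10, win=64):
          """Energy-weighted least-squares LPC of one voiced frame."""
          x = np.append(x[0], x[1:] - 0.97 * x[:-1])     # pre-emphasis
          energy = np.convolve(x ** 2, np.ones(win) / win, mode="same")
          w = energy / (energy.max() + 1e-12)            # weighting function
          N = len(x)
          # Rows hold the past `order` samples predicting x[n].
          rows = np.array([x[n - order:n][::-1] for n in range(order, N)])
          b, ww = x[order:N], w[order:N]
          # Weighted normal equations: minimize sum_n w[n] * e[n]^2.
          R = rows.T @ (rows * ww[:, None])
          r = rows.T @ (b * ww)
          return np.linalg.solve(R, r)                   # a_1 ... a_order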

  20. Background Noise Degrades Central Auditory Processing in Toddlers.

    Science.gov (United States)

    Niemitalo-Haapola, Elina; Haapala, Sini; Jansson-Verkasalo, Eira; Kujala, Teija

    2015-01-01

    Noise, as an unwanted sound, has become one of modern society's environmental conundrums, and many children are exposed to higher noise levels than previously assumed. However, the effects of background noise on central auditory processing of toddlers, who are still acquiring language skills, have so far not been determined. The authors evaluated the effects of background noise on toddlers' speech-sound processing by recording event-related brain potentials. The hypothesis was that background noise modulates neural speech-sound encoding and degrades speech-sound discrimination. Obligatory P1 and N2 responses for standard syllables and the mismatch negativity (MMN) response for five different syllable deviants presented in a linguistic multifeature paradigm were recorded in silent and background noise conditions. The participants were 18 typically developing 22- to 26-month-old monolingual children with healthy ears. The results showed that the P1 amplitude was smaller and the N2 amplitude larger in the noisy conditions compared with the silent conditions. In the noisy condition, the MMN was absent for the intensity and vowel changes and diminished for the consonant, frequency, and vowel duration changes embedded in speech syllables. Furthermore, the frontal MMN component was attenuated in the noisy condition. However, noise had no effect on P1, N2, or MMN latencies. The results from this study suggest multiple effects of background noise on the central auditory processing of toddlers. It modulates the early stages of sound encoding and dampens neural discrimination vital for accurate speech perception. These results imply that speech processing of toddlers, who may spend long periods of daytime in noisy conditions, is vulnerable to background noise. In noisy conditions, toddlers' neural representations of some speech sounds might be weakened. Thus, special attention should be paid to acoustic conditions and background noise levels in children's daily environments.

  1. The effect of viewing speech on auditory speech processing is different in the left and right hemispheres.

    Science.gov (United States)

    Davis, Chris; Kislyuk, Daniel; Kim, Jeesun; Sams, Mikko

    2008-11-25

    We used whole-head magnetoencephalography (MEG) to record changes in neuromagnetic N100m responses generated in the left and right auditory cortex as a function of the match between visual and auditory speech signals. Stimuli were auditory-only (AO) and auditory-visual (AV) presentations of /pi/, /ti/ and /vi/. Three types of intensity matched auditory stimuli were used: intact speech (Normal), frequency band filtered speech (Band) and speech-shaped white noise (Noise). The behavioural task was to detect the /vi/ syllables which comprised 12% of stimuli. N100m responses were measured to averaged /pi/ and /ti/ stimuli. Behavioural data showed that identification of the stimuli was faster and more accurate for Normal than for Band stimuli, and for Band than for Noise stimuli. Reaction times were faster for AV than AO stimuli. MEG data showed that in the left hemisphere, N100m to both AO and AV stimuli was largest for the Normal, smaller for Band and smallest for Noise stimuli. In the right hemisphere, Normal and Band AO stimuli elicited N100m responses of quite similar amplitudes, but N100m amplitude to Noise was about half of that. There was a reduction in N100m for the AV compared to the AO conditions. The size of this reduction for each stimulus type was same in the left hemisphere but graded in the right (being largest to the Normal, smaller to the Band and smallest to the Noise stimuli). The N100m decrease for the Normal stimuli was significantly larger in the right than in the left hemisphere. We suggest that the effect of processing visual speech seen in the right hemisphere likely reflects suppression of the auditory response based on AV cues for place of articulation.

  2. The Efficacy of Short-term Gated Audiovisual Speech Training for Improving Auditory Sentence Identification in Noise in Elderly Hearing Aid Users

    Science.gov (United States)

    Moradi, Shahram; Wahlin, Anna; Hällgren, Mathias; Rönnberg, Jerker; Lidestam, Björn

    2017-01-01

    This study aimed to examine the efficacy and maintenance of short-term (one-session) gated audiovisual speech training for improving auditory sentence identification in noise in experienced elderly hearing-aid users. Twenty-five hearing aid users (16 men and 9 women), with an average age of 70.8 years, were randomly divided into an experimental (audiovisual training, n = 14) and a control (auditory training, n = 11) group. Participants underwent gated speech identification tasks comprising Swedish consonants and words presented at 65 dB sound pressure level with a 0 dB signal-to-noise ratio (steady-state broadband noise), in audiovisual or auditory-only training conditions. The Hearing-in-Noise Test was employed to measure participants’ auditory sentence identification in noise before the training (pre-test), promptly after training (post-test), and 1 month after training (one-month follow-up). The results showed that audiovisual training improved auditory sentence identification in noise promptly after the training (post-test vs. pre-test scores); furthermore, this improvement was maintained 1 month after the training (one-month follow-up vs. pre-test scores). Such improvement was not observed in the control group, neither promptly after the training nor at the one-month follow-up. However, neither a significant between-groups difference nor an interaction between group and session was observed. Conclusion: Audiovisual training may be considered in aural rehabilitation of hearing aid users to improve listening capabilities in noisy conditions. However, the lack of a significant between-groups effect (audiovisual vs. auditory) or an interaction between group and session calls for further research. PMID:28348542

  3. Improved Kalman Filter-Based Speech Enhancement with Perceptual Post-Filtering

    Institute of Scientific and Technical Information of China (English)

    WEI Jianqiang; DU Limin; YAN Zhaoli; ZENG Hui

    2004-01-01

    In this paper, a Kalman filter-based speech enhancement algorithm with some improvements over previous work is presented. A new technique based on spectral subtraction is used for separating speech and noise characteristics from noisy speech and for the computation of speech and noise autoregressive (AR) parameters. In order to obtain a Kalman filter output with high audible quality, a perceptual post-filter is placed at the output of the Kalman filter to smooth the enhanced speech spectra. Extensive experiments indicate that this newly proposed method works well.
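
    A minimal sketch of the Kalman filtering core, assuming the AR coefficients and noise variances have already been estimated (the paper obtains them via spectral subtraction) and omitting the perceptual post-filter; the AR model order and all constants below are illustrative.

      import numpy as np

      def kalman_enhance(y, a, q, r):
          """y: noisy samples; a: AR coefficients of the speech model;
          q: driving-noise variance; r: observation-noise variance."""
          p = len(a)
          F = np.zeros((p, p)); F[0, :] = a; F[1:, :-1] = np.eye(p - 1)
          H = np.zeros((1, p)); H[0, 0] = 1.0
          Q = np.zeros((p, p)); Q[0, 0] = q
          z, P = np.zeros(p), np.eye(p)
          out = np.empty_like(y)
          for n, obs in enumerate(y):
              z = F @ z                            # predict state (past samples)
              P = F @ P @ F.T + Q
              S = H @ P @ H.T + r                  # innovation variance
              K = (P @ H.T) / S                    # Kalman gain, shape (p, 1)
              z = z + K[:, 0] * (obs - z[0])       # correct with the new sample
              P = (np.eye(p) - K @ H) @ P
              out[n] = z[0]                        # filtered speech sample
          return out

      # Toy usage with fixed AR(2) coefficients (illustrative values).
      rng = np.random.default_rng(0)
      clean = np.zeros(2000)
      for n in range(2, 2000):
          clean[n] = 1.3 * clean[n-1] - 0.6 * clean[n-2] + 0.1 * rng.standard_normal()
      noisy = clean + 0.3 * rng.standard_normal(2000)
      enhanced = kalman_enhance(noisy, a=[1.3, -0.6], q=0.01, r=0.09)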

  4. Relations Between Self-Reported Daily-Life Fatigue, Hearing Status, and Pupil Dilation During a Speech Perception in Noise Task

    DEFF Research Database (Denmark)

    Wang, Yang; Naylor, Graham; Kramer, Sophia E

    2017-01-01

    during the speech processing, and we used peak pupil dilation (PPD) as the main outcome measure of the pupillometry. No correlation was found between subjectively measured fatigue and hearing acuity, nor was a group difference found between the normally hearing and the hearing-impaired participants...... on the fatigue scores. A significant negative correlation was found between self-reported fatigue and PPD. A similar correlation was also found between Speech Intelligibility Index required for 50% correct and PPD. Multiple regression analysis showed that factors representing "hearing acuity" and "self-reported fatigue" had equal and independent associations with the PPD during the speech in noise test. Less fatigue and better hearing acuity were associated with a larger pupil dilation. To the best of our knowledge, this is the first study to investigate the relationship between a subjective measure of daily...

  5. Soundscape elaboration from anthrophonic adaptation of community noise

    Science.gov (United States)

    Teddy Badai Samodra, FX

    2018-03-01

    In urban environments, noise has become a critical issue affecting the indoor environment. A reliable approach is required for evaluating community noise as one anthrophonic factor in the urban environment. This research investigates the level of noise exposure from different community noise sources and explores how the disadvantages of noise can be turned to advantage for soundscape innovation. Integrated building element design as a protector for noise control and for speech intelligibility compliance is also carried out, using field experiments and MATLAB programming and modeling. Meanwhile, for simulation analysis and building acoustic optimization, sound reduction, speech intelligibility, and reverberation time are the main parameters for assessing a tropical building model as the case-study object. The results show that noise control should be integrated with the other critical issue in an urban environment, thermal control. With a reverberation time of 1.1 s for speech activities and noise reduction of more than 28.66 dBA at the critical frequency (20 Hz), a speech intelligibility index above the "fair" rating, 0.45, could be reached. Furthermore, the environmental psychology adaptation identified closing the openings as the best method under high-noise conditions and personal adjustment as the easiest and most adaptable strategy.

  6. Objective Prediction of Hearing Aid Benefit Across Listener Groups Using Machine Learning: Speech Recognition Performance With Binaural Noise-Reduction Algorithms.

    Science.gov (United States)

    Schädler, Marc R; Warzybok, Anna; Kollmeier, Birger

    2018-01-01

    The simulation framework for auditory discrimination experiments (FADE) was adopted and validated to predict the individual speech-in-noise recognition performance of listeners with normal and impaired hearing with and without a given hearing-aid algorithm. FADE uses a simple automatic speech recognizer (ASR) to estimate the lowest achievable speech reception thresholds (SRTs) from simulated speech recognition experiments in an objective way, independent from any empirical reference data. Empirical data from the literature were used to evaluate the model in terms of predicted SRTs and benefits in SRT with the German matrix sentence recognition test when using eight single- and multichannel binaural noise-reduction algorithms. To allow individual predictions of SRTs in binaural conditions, the model was extended with a simple better ear approach and individualized by taking audiograms into account. In a realistic binaural cafeteria condition, FADE explained about 90% of the variance of the empirical SRTs for a group of normal-hearing listeners and predicted the corresponding benefits with a root-mean-square prediction error of 0.6 dB. This highlights the potential of the approach for the objective assessment of benefits in SRT without prior knowledge about the empirical data. The predictions for the group of listeners with impaired hearing explained 75% of the empirical variance, while the individual predictions explained less than 25%. Possibly, additional individual factors should be considered for more accurate predictions with impaired hearing. A competing talker condition clearly showed one limitation of current ASR technology, as the empirical performance with SRTs lower than -20 dB could not be predicted.

  7. Optimizing acoustical conditions for speech intelligibility in classrooms

    Science.gov (United States)

    Yang, Wonyoung

    High speech intelligibility is imperative in classrooms where verbal communication is critical. However, the optimal acoustical conditions to achieve a high degree of speech intelligibility have previously been investigated with inconsistent results, and practical room-acoustical solutions to optimize the acoustical conditions for speech intelligibility have not been developed. This experimental study validated auralization for speech-intelligibility testing, investigated the optimal reverberation for speech intelligibility for both normal and hearing-impaired listeners using more realistic room-acoustical models, and proposed an optimal sound-control design for speech intelligibility based on the findings. The auralization technique was used to perform subjective speech-intelligibility tests. The validation study, comparing auralization results with those of real classroom speech-intelligibility tests, found that if the room to be auralized is not very absorptive or noisy, speech-intelligibility tests using auralization are valid. The speech-intelligibility tests were done in two different auralized sound fields---approximately diffuse and non-diffuse---using the Modified Rhyme Test and both normal and hearing-impaired listeners. A hybrid room-acoustical prediction program was used throughout the work, and it and a 1/8 scale-model classroom were used to evaluate the effects of ceiling barriers and reflectors. For both subject groups, in approximately diffuse sound fields, when the speech source was closer to the listener than the noise source, the optimal reverberation time was zero. When the noise source was closer to the listener than the speech source, the optimal reverberation time was 0.4 s (with another peak at 0.0 s) with relative output power levels of the speech and noise sources SNS = 5 dB, and 0.8 s with SNS = 0 dB. In non-diffuse sound fields, when the noise source was between the speaker and the listener, the optimal reverberation time was 0.6 s with

  8. Children's Perception of Conversational and Clear American-English Vowels in Noise

    Science.gov (United States)

    Leone, Dorothy; Levy, Erika S.

    2015-01-01

    Purpose: Much of a child's day is spent listening to speech in the presence of background noise. Although accurate vowel perception is important for listeners' accurate speech perception and comprehension, little is known about children's vowel perception in noise. "Clear speech" is a speech style frequently used by talkers in the…

  9. Speech recognition in individuals with sensorineural hearing loss

    Directory of Open Access Journals (Sweden)

    Adriana Neves de Andrade

    Full Text Available ABSTRACT INTRODUCTION: Hearing loss can negatively influence the communication performance of individuals, who should be evaluated with suitable material and in situations of listening close to those found in everyday life. OBJECTIVE: To analyze and compare the performance of patients with mild-to-moderate sensorineural hearing loss in speech recognition tests carried out in silence and with noise, according to the variables ear (right and left and type of stimulus presentation. METHODS: The study included 19 right-handed individuals with mild-to-moderate symmetrical bilateral sensorineural hearing loss, submitted to the speech recognition test with words in different modalities and speech test with white noise and pictures. RESULTS: There was no significant difference between right and left ears in any of the tests. The mean number of correct responses in the speech recognition test with pictures, live voice, and recorded monosyllables was 97.1%, 85.9%, and 76.1%, respectively, whereas after the introduction of noise, the performance decreased to 72.6% accuracy. CONCLUSIONS: The best performances in the Speech Recognition Percentage Index were obtained using monosyllabic stimuli, represented by pictures presented in silence, with no significant differences between the right and left ears. After the introduction of competitive noise, there was a decrease in individuals' performance.

  10. Neural Correlates of Early Sound Encoding and their Relationship to Speech-in-Noise Perception

    Directory of Open Access Journals (Sweden)

    Emily B. J. Coffey

    2017-08-01

    Full Text Available Speech-in-noise (SIN) perception is a complex cognitive skill that affects social, vocational, and educational activities. Poor SIN ability particularly affects young and elderly populations, yet varies considerably even among healthy young adults with normal hearing. Although SIN skills are known to be influenced by top-down processes that can selectively enhance lower-level sound representations, the complementary role of feed-forward mechanisms and their relationship to musical training is poorly understood. Using a paradigm that minimizes the main top-down factors that have been implicated in SIN performance such as working memory, we aimed to better understand how robust encoding of periodicity in the auditory system (as measured by the frequency-following response) contributes to SIN perception. Using magnetoencephalography, we found that the strength of encoding at the fundamental frequency in the brainstem, thalamus, and cortex is correlated with SIN accuracy. The amplitude of the slower cortical P2 wave was previously also shown to be related to SIN accuracy and FFR strength; we use MEG source localization to show that the P2 wave originates in a temporal region anterior to that of the cortical FFR. We also confirm that the observed enhancements were related to the extent and timing of musicianship. These results are consistent with the hypothesis that basic feed-forward sound encoding affects SIN perception by providing better information to later processing stages, and that modifying this process may be one mechanism through which musical training might enhance the auditory networks that subserve both musical and language functions.

  11. Auditory Masking Effects on Speech Fluency in Apraxia of Speech and Aphasia: Comparison to Altered Auditory Feedback

    Science.gov (United States)

    Jacks, Adam; Haley, Katarina L.

    2015-01-01

    Purpose: To study the effects of masked auditory feedback (MAF) on speech fluency in adults with aphasia and/or apraxia of speech (APH/AOS). We hypothesized that adults with AOS would increase speech fluency when speaking with noise. Altered auditory feedback (AAF; i.e., delayed/frequency-shifted feedback) was included as a control condition not…

  12. Segmentation cues in conversational speech: Robust semantics and fragile phonotactics

    Directory of Open Access Journals (Sweden)

    Laurence eWhite

    2012-10-01

    Full Text Available Multiple cues influence listeners’ segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker’s articulatory effort – hyperarticulation vs hypoarticulation (H&H) – may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners’ interpretation of segmentation cues is affected by speech style (spontaneous conversation vs read), using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylised landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues – semantic likelihood and cross-boundary diphone phonotactics – was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech. Independent of speech style, we found an interaction between cue valence (favourable/unfavourable) and cue type (phonotactics/semantics). Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behaviour. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically-based cues in the segmentation of natural

  13. Vocal Noise Cancellation From Respiratory Sounds

    National Research Council Canada - National Science Library

    Moussavi, Zahra

    2001-01-01

    Although background noise cancellation for speech or electrocardiographic recording is well established, when the background noise contains vocal noises and the main signal is a breath sound...

  14. Speech intelligibility for normal hearing and hearing-impaired listeners in simulated room acoustic conditions

    DEFF Research Database (Denmark)

    Arweiler, Iris; Dau, Torsten; Poulsen, Torben

    Speech intelligibility depends on many factors such as room acoustics, the acoustical properties and location of the signal and the interferers, and the ability of the (normal and impaired) auditory system to process monaural and binaural sounds. In the present study, the effect of reverberation...... on spatial release from masking was investigated in normal hearing and hearing impaired listeners using three types of interferers: speech shaped noise, an interfering female talker and speech-modulated noise. Speech reception thresholds (SRT) were obtained in three simulated environments: a listening room......, a classroom and a church. The data from the study provide constraints for existing models of speech intelligibility prediction (based on the speech intelligibility index, SII, or the speech transmission index, STI) which have shortcomings when reverberation and/or fluctuating noise affect speech...

  15. Audiovisual Asynchrony Detection in Human Speech

    Science.gov (United States)

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  16. Efficient CEPSTRAL Normalization for Robust Speech Recognition

    National Research Council Canada - National Science Library

    Liu, Fu-Hua; Stern, Richard M; Huang, Xuedong; Acero, Alejandro

    1993-01-01

    In this paper we describe and compare the performance of a series of cepstrum-based procedures that enable the CMU SPHINX-II speech recognition system to maintain a high level of recognition accuracy...
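
    The simplest member of this family of procedures, cepstral mean normalization, can be stated in a few lines; this is a generic sketch, not the paper's specific algorithms.

      import numpy as np

      def cmn(cepstra):
          """cepstra: (n_frames, n_coeffs), e.g. MFCCs of one utterance.
          Subtracting the per-utterance mean removes stationary convolutional
          effects such as channel or microphone coloration."""
          return cepstra - cepstra.mean(axis=0, keepdims=True)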

  17. Objective measures of listening effort: effects of background noise and noise reduction.

    Science.gov (United States)

    Sarampalis, Anastasios; Kalluri, Sridhar; Edwards, Brent; Hafter, Ervin

    2009-10-01

    This work is aimed at addressing a seeming contradiction related to the use of noise-reduction (NR) algorithms in hearing aids. The problem is that although some listeners claim a subjective improvement from NR, it has not been shown to improve speech intelligibility, often even making it worse. To address this, the hypothesis tested here is that the positive effects of NR might be to reduce cognitive effort directed toward speech reception, making it available for other tasks. Normal-hearing individuals participated in 2 dual-task experiments, in which 1 task was to report sentences or words in noise set to various signal-to-noise ratios. Secondary tasks involved either holding words in short-term memory or responding in a complex visual reaction-time task. At low values of signal-to-noise ratio, although NR had no positive effect on speech reception thresholds, it led to better performance on the word-memory task and quicker responses in visual reaction times. Results from both dual tasks support the hypothesis that NR reduces listening effort and frees up cognitive resources for other tasks. Future hearing aid research should incorporate objective measurements of cognitive benefits.

  18. Low Delay Noise Reduction and Dereverberation for Hearing Aids

    Directory of Open Access Journals (Sweden)

    Heinrich W. Löllmann

    2009-01-01

    Full Text Available A new system for single-channel speech enhancement is proposed which achieves a joint suppression of late reverberant speech and background noise with a low signal delay and low computational complexity. It is based on a generalized spectral subtraction rule which depends on the variances of the late reverberant speech and background noise. The calculation of the spectral variances of the late reverberant speech requires an estimate of the reverberation time (RT), which is accomplished by a maximum likelihood (ML) approach. The enhancement with this blind RT estimation achieves almost the same speech quality as using the actual RT. In comparison to commonly used post-filters in hearing aids, which perform only noise reduction, a significantly better objective and subjective speech quality is achieved. The proposed system performs time-domain filtering with coefficients adapted in the non-uniform (Bark-scaled) frequency domain. This makes it possible to achieve high speech quality with low signal delay, which is important for speech enhancement in hearing aids or related applications such as hands-free communication systems.
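
    The gain rule at the core of such a system can be sketched generically, with a single combined estimate of the late-reverberant-plus-noise variance standing in for the paper's RT-dependent estimator (the Bark-scaled time-domain filtering is omitted, and the gain floor is illustrative):

      import numpy as np

      def spectral_subtraction_gain(Y_power, interference_var, g_min=0.1):
          """Y_power: |Y(k,l)|^2 of the degraded speech; interference_var:
          estimated variance of late reverberant speech plus background noise.
          Returns a real gain to apply to the complex STFT: X_hat = gain * Y."""
          gain = np.sqrt(np.maximum(1.0 - interference_var / (Y_power + 1e-12),
                                    g_min ** 2))
          return gain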

  19. How young and old adults listen to and remember speech in noise.

    Science.gov (United States)

    Pichora-Fuller, M K; Schneider, B A; Daneman, M

    1995-01-01

    Two experiments using the materials of the Revised Speech Perception in Noise (SPIN-R) Test [Bilger et al., J. Speech Hear. Res. 27, 32-48 (1984)] were conducted to investigate age-related differences in the identification and the recall of sentence-final words heard in a babble background. In experiment 1, the level of the babble was varied to determine psychometric functions (percent correct word identification as a function of S/N ratio) for presbycusics, old adults with near-normal hearing, and young normal-hearing adults, when the sentence-final words were either predictable (high context) or unpredictable (low context). Differences between the psychometric functions for high- and low-context conditions were used to show that both groups of old listeners derived more benefit from supportive context than did young listeners. In experiment 2, a working memory task [Daneman and Carpenter, J. Verb. Learn. Verb. Behav. 19, 450-466 (1980)] was added to the SPIN task for young and old adults. Specifically, after listening to and identifying the sentence-final words for a block of n sentences, the subjects were asked to recall the last n words that they had identified. Old subjects recalled fewer of the items they had perceived than did young subjects in all S/N conditions, even though there was no difference in the recall ability of the two age groups when sentences were read. Furthermore, the number of items recalled by both age groups was reduced in adverse S/N conditions. The results were interpreted as supporting a processing model in which reallocable processing resources are used to support auditory processing when listening becomes difficult either because of noise, or because of age-related deterioration in the auditory system. Because of this reallocation, these resources are unavailable to more central cognitive processes such as the storage and retrieval functions of working memory, so that "upstream" processing of auditory information is adversely

  1. Variable Span Filters for Speech Enhancement

    DEFF Research Database (Denmark)

    Jensen, Jesper Rindom; Benesty, Jacob; Christensen, Mads Græsbøll

    2016-01-01

    In this work, we consider enhancement of multichannel speech recordings. Linear filtering and subspace approaches have been considered previously for solving the problem. The current linear filtering methods, although many variants exist, have limited control of noise reduction and speech...

  2. Speech Intelligibility in Noise Using Throat and Acoustic Microphones

    National Research Council Canada - National Science Library

    Acker-Mills, Barbara

    2004-01-01

    ... speech intelligibility. Speech intelligibility for signals generated by an acoustic microphone, a throat microphone, and the two microphones together was assessed using the Modified Rhyme Test (MRT...

  3. Design and realisation of an audiovisual speech activity detector

    NARCIS (Netherlands)

    Van Bree, K.C.

    2006-01-01

    For many speech telecommunication technologies a robust speech activity detector is important. An audio-only speech detector will give false positives when the interfering signal is speech or has speech characteristics. The video modality is suitable for solving this problem. In this report the approach

  4. Comparing spatial tuning curves, spectral ripple resolution, and speech perception in cochlear implant users.

    Science.gov (United States)

    Anderson, Elizabeth S; Nelson, David A; Kreft, Heather; Nelson, Peggy B; Oxenham, Andrew J

    2011-07-01

    Spectral ripple discrimination thresholds were measured in 15 cochlear-implant users with broadband (350-5600 Hz) and octave-band noise stimuli. The results were compared with spatial tuning curve (STC) bandwidths previously obtained from the same subjects. Spatial tuning curve bandwidths did not correlate significantly with broadband spectral ripple discrimination thresholds but did correlate significantly with ripple discrimination thresholds when the rippled noise was confined to an octave-wide passband, centered on the STC's probe electrode frequency allocation. Ripple discrimination thresholds were also measured for octave-band stimuli in four contiguous octaves, with center frequencies from 500 Hz to 4000 Hz. Substantial variations in thresholds with center frequency were found in individuals, but no general trends of increasing or decreasing resolution from apex to base were observed in the pooled data. Neither ripple nor STC measures correlated consistently with speech measures in noise and quiet in the sample of subjects in this study. Overall, the results suggest that spectral ripple discrimination measures provide a reasonable measure of spectral resolution that correlates well with more direct, but more time-consuming, measures of spectral resolution, but that such measures do not always provide a clear and robust predictor of performance in speech perception tasks. © 2011 Acoustical Society of America
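
    For illustration, a rippled-noise stimulus of the general kind used in such measurements can be synthesized as a dense set of log-spaced tones with a sinusoidal spectral envelope; the band edges, ripple density, depth, and tone count below are illustrative, not the study's exact parameters.

      import numpy as np

      def rippled_noise(fs=44100, dur=0.5, f_lo=350.0, f_hi=5600.0,
                        ripples_per_octave=1.0, depth_db=30.0, n_tones=400):
          t = np.arange(int(fs * dur)) / fs
          octaves = np.linspace(0.0, np.log2(f_hi / f_lo), n_tones)
          freqs = f_lo * 2.0 ** octaves                  # log-spaced components
          # Sinusoidal level ripple on the log-frequency axis.
          env_db = 0.5 * depth_db * np.sin(2 * np.pi * ripples_per_octave * octaves)
          amps = 10.0 ** (env_db / 20.0)
          phases = np.random.default_rng(0).uniform(0, 2 * np.pi, n_tones)
          x = (amps[:, None]
               * np.sin(2 * np.pi * freqs[:, None] * t + phases[:, None])).sum(axis=0)
          return x / np.abs(x).max()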

  5. Wind Noise Reduction using Non-negative Sparse Coding

    DEFF Research Database (Denmark)

    Schmidt, Mikkel N.; Larsen, Jan; Hsiao, Fu-Tien

    2007-01-01

    We introduce a new speaker independent method for reducing wind noise in single-channel recordings of noisy speech. The method is based on non-negative sparse coding and relies on a wind noise dictionary which is estimated from an isolated noise recording. We estimate the parameters of the model ...... and discuss their sensitivity. We then compare the algorithm with the classical spectral subtraction method and the Qualcomm-ICSI-OGI noise reduction method. We optimize the sound quality in terms of signal-to-noise ratio and provide results on a noisy speech recognition task....

  6. Speech recognition in individuals with sensorineural hearing loss.

    Science.gov (United States)

    de Andrade, Adriana Neves; Iorio, Maria Cecilia Martinelli; Gil, Daniela

    2016-01-01

    Hearing loss can negatively influence the communication performance of individuals, who should be evaluated with suitable material and in situations of listening close to those found in everyday life. To analyze and compare the performance of patients with mild-to-moderate sensorineural hearing loss in speech recognition tests carried out in silence and with noise, according to the variables ear (right and left) and type of stimulus presentation. The study included 19 right-handed individuals with mild-to-moderate symmetrical bilateral sensorineural hearing loss, submitted to the speech recognition test with words in different modalities and speech test with white noise and pictures. There was no significant difference between right and left ears in any of the tests. The mean number of correct responses in the speech recognition test with pictures, live voice, and recorded monosyllables was 97.1%, 85.9%, and 76.1%, respectively, whereas after the introduction of noise, the performance decreased to 72.6% accuracy. The best performances in the Speech Recognition Percentage Index were obtained using monosyllabic stimuli, represented by pictures presented in silence, with no significant differences between the right and left ears. After the introduction of competitive noise, there was a decrease in individuals' performance. Copyright © 2015 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.

  7. A simulation study of harmonics regeneration in noise reduction for electric and acoustic stimulation.

    Science.gov (United States)

    Hu, Yi

    2010-05-01

    Recent research results show that combined electric and acoustic stimulation (EAS) significantly improves speech recognition in noise, and it is generally established that access to the improved F0 representation of target speech, along with the glimpse cues, provides the EAS benefits. Under noisy listening conditions, noise signals degrade these important cues by introducing undesired temporal-frequency components and corrupting harmonics structure. In this study, the potential of combining noise reduction and harmonics regeneration techniques was investigated to further improve speech intelligibility in noise by providing improved beneficial cues for EAS. Three hypotheses were tested: (1) noise reduction methods can improve speech intelligibility in noise for EAS; (2) harmonics regeneration after noise reduction can further improve speech intelligibility in noise for EAS; and (3) harmonics sideband constraints in frequency domain (or equivalently, amplitude modulation in temporal domain), even deterministic ones, can provide additional benefits. Test results demonstrate that combining noise reduction and harmonics regeneration can significantly improve speech recognition in noise for EAS, and it is also beneficial to preserve the harmonics sidebands under adverse listening conditions. This finding warrants further work into the development of algorithms that regenerate harmonics and the related sidebands for EAS processing under noisy conditions.

  8. Speech Perception With Combined Electric-Acoustic Stimulation: A Simulation and Model Comparison.

    Science.gov (United States)

    Rader, Tobias; Adel, Youssef; Fastl, Hugo; Baumann, Uwe

    2015-01-01

    The aim of this study is to simulate speech perception with combined electric-acoustic stimulation (EAS), verify the advantage of combined stimulation in normal-hearing (NH) subjects, and then compare it with cochlear implant (CI) and EAS user results from the authors' previous study. Furthermore, an automatic speech recognition (ASR) system was built to examine the impact of low-frequency information and is proposed as an applied model to study different hypotheses of the combined-stimulation advantage. Signal-detection-theory (SDT) models were applied to assess predictions of subject performance without the need to assume any synergistic effects. Speech perception was tested using a closed-set matrix test (Oldenburg sentence test), and its speech material was processed to simulate CI and EAS hearing. A total of 43 NH subjects and a customized ASR system were tested. CI hearing was simulated by an aurally adequate signal spectrum analysis and representation, the part-tone-time-pattern, which was vocoded at 12 center frequencies according to the MED-EL DUET speech processor. Residual acoustic hearing was simulated by low-pass (LP)-filtered speech with cutoff frequencies 200 and 500 Hz for NH subjects and in the range from 100 to 500 Hz for the ASR system. Speech reception thresholds were determined in amplitude-modulated noise and in pseudocontinuous noise. Previously proposed SDT models were lastly applied to predict NH subject performance with EAS simulations. NH subjects tested with EAS simulations demonstrated the combined-stimulation advantage. Increasing the LP cutoff frequency from 200 to 500 Hz significantly improved speech reception thresholds in both noise conditions. In continuous noise, CI and EAS users showed generally better performance than NH subjects tested with simulations. In modulated noise, performance was comparable except for the EAS at cutoff frequency 500 Hz where NH subject performance was superior. The ASR system showed similar behavior
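
    A generic EAS simulation chain of this kind, low-pass filtered speech for the acoustic part plus a tone vocoder for the electric part, can be sketched as follows; the channel count, filter orders, and cutoffs are illustrative, and the study's part-tone-time-pattern analysis is not reproduced.

      import numpy as np
      from scipy.signal import butter, sosfilt, hilbert

      def vocode(x, fs, n_ch=12, f_lo=300.0, f_hi=7000.0):
          """Tone vocoder: log-spaced bands, envelope extraction, sine carriers."""
          edges = f_lo * (f_hi / f_lo) ** (np.arange(n_ch + 1) / n_ch)
          out = np.zeros_like(x)
          t = np.arange(len(x)) / fs
          for lo, hi in zip(edges[:-1], edges[1:]):
              sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
              env = np.abs(hilbert(sosfilt(sos, x)))          # channel envelope
              out += env * np.sin(2 * np.pi * np.sqrt(lo * hi) * t)  # carrier at CF
          return out

      def eas_simulation(x, fs, lp_cutoff=500.0):
          """Residual acoustic hearing (low-pass speech) plus simulated CI."""
          sos = butter(4, lp_cutoff, btype="lowpass", fs=fs, output="sos")
          return sosfilt(sos, x) + vocode(x, fs)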

  9. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling.

    Science.gov (United States)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low-dimensional, maximally discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of a greater number of individual neurons, with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain-machine interface studies.
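
    The iterative core of the algorithm can be miniaturized as follows (assuming scikit-learn): alternate an LDA projection computed from the current labels with GMM clustering in that subspace. The paper's outlier handling and automatic detection of the number of clusters are omitted here, and the cluster count is assumed known.

        # Miniature of iterative discriminative-subspace clustering: an LDA
        # projection from the current labels alternated with GMM clustering.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.mixture import GaussianMixture

        def sort_spikes(waveforms, n_clusters=3, n_iter=10):
            # Initialise labels from a GMM fitted in a generic PCA subspace.
            feats = PCA(n_components=5).fit_transform(waveforms)
            labels = GaussianMixture(n_components=n_clusters,
                                     random_state=0).fit_predict(feats)
            for _ in range(n_iter):
                lda = LinearDiscriminantAnalysis()       # discriminative subspace
                feats = lda.fit_transform(waveforms, labels)
                new = GaussianMixture(n_components=n_clusters,
                                      random_state=0).fit_predict(feats)
                if np.array_equal(new, labels):          # converged
                    break
                labels = new
            return labels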

  10. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling

    Science.gov (United States)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low-dimensional, maximally discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of a greater number of individual neurons, with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain-machine interface studies.

  11. Audiovisual Cues and Perceptual Learning of Spectrally Distorted Speech

    Science.gov (United States)

    Pilling, Michael; Thomas, Sharon

    2011-01-01

    Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…

  12. The relationship of speech intelligibility with hearing sensitivity, cognition, and perceived hearing difficulties varies for different speech perception tests

    Science.gov (United States)

    Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A.

    2015-01-01

    Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests. Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study. Forty-four listeners aged between 50 and 74 years with mild sensorineural hearing loss were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet), to medium (digit triplet perception in speech-shaped noise) to high (sentence perception in modulated noise); cognitive tests of attention, memory, and non-verbal intelligence quotient; and self-report questionnaires of general health-related and hearing-specific quality of life. Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that

  13. The relationship of speech intelligibility with hearing sensitivity, cognition, and perceived hearing difficulties varies for different speech perception tests

    Directory of Open Access Journals (Sweden)

    Antje Heinrich

    2015-06-01

    Full Text Available Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests. Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study. Forty-four listeners aged between 50 and 74 years with mild SNHL were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet), to medium (digit triplet perception in speech-shaped noise), to high (sentence perception in modulated noise); cognitive tests of attention, memory, and nonverbal IQ; and self-report questionnaires of general health-related and hearing-specific quality of life. Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that auditory environments pose on

  14. A Two-Sensor Noise Reduction System: Applications for Hands-Free Car Kit

    Directory of Open Access Journals (Sweden)

    Guérin Alexandre

    2003-01-01

    Full Text Available This paper presents a two-microphone speech enhancer designed to remove noise in hands-free car kits. The algorithm, based on the magnitude squared coherence, uses speech correlation and noise decorrelation to separate speech from noise. The remaining correlated noise is reduced using cross-spectral subtraction. Particular attention is focused on the estimation of the different spectral densities (noise and noisy-signal power spectral densities), which are critical for the quality of the algorithm. We also propose a continuous noise estimation, avoiding the need for a voice activity detector. Results on recorded signals are provided, showing the superiority of the two-sensor approach over single-microphone techniques.
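
    A minimal sketch of the coherence-based gain follows: speech is assumed correlated across the two microphones while diffuse car noise is not, so the magnitude squared coherence, estimated by recursive averaging of the spectral densities, can serve directly as a spectral gain. The STFT setup and smoothing constant are illustrative assumptions, and the cross-spectral subtraction stage is omitted.

        # MSC-based two-microphone enhancement sketch (cross-spectral
        # subtraction of the remaining correlated noise is not included).
        import numpy as np
        from scipy.signal import stft, istft

        def msc_enhance(x1, x2, fs, alpha=0.8, eps=1e-12):
            _, _, X1 = stft(x1, fs, nperseg=512)
            _, _, X2 = stft(x2, fs, nperseg=512)
            out = np.zeros_like(X1)
            P11 = P22 = P12 = eps
            for n in range(X1.shape[1]):
                a, b = X1[:, n], X2[:, n]
                # Recursive (exponential) averaging of the spectral densities.
                P11 = alpha * P11 + (1 - alpha) * np.abs(a) ** 2
                P22 = alpha * P22 + (1 - alpha) * np.abs(b) ** 2
                P12 = alpha * P12 + (1 - alpha) * a * np.conj(b)
                msc = np.abs(P12) ** 2 / (P11 * P22 + eps)   # coherence in [0, 1]
                out[:, n] = msc * 0.5 * (a + b)              # MSC-weighted average
            return istft(out, fs, nperseg=512)[1]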

  15. Evaluation of Noise in Hearing Instruments Caused by GSM and DECT Mobile Telephones

    DEFF Research Database (Denmark)

    Hansen, Mie Østergaard; Poulsen, Torben

    1996-01-01

    The annoyance of noise in hearing instruments caused by electromagnetic interference from Global System for Mobile Communications (GSM) and Digital European Cordless Telecommunications (DECT) mobile telephones has been subjectively evaluated by test subjects. The influence on speech recognition from...... the GSM and the DECT noises was also determined. The measurements involved seventeen hearing-impaired subjects. The annoyance was tested with GSM and DECT noise, each one mixed with continuous speech, a mall environment noise, or an office environment noise. Speech recognition was tested with the DANTALE...... word material mixed with GSM and DECT noise. The listening tests showed that if the noise level is acceptable, so too is speech recognition. The results agree well with an investigation carried out on normal-hearing subjects. If a hearing instrument user is able to use a telephone without annoyance...

  16. Musical noise reduction using an adaptive filter

    Science.gov (United States)

    Hanada, Takeshi; Murakami, Takahiro; Ishida, Yoshihisa; Hoya, Tetsuya

    2003-10-01

    This paper presents a method for reducing a particular kind of noise (musical noise). Musical noise is an artifact produced by spectral subtraction (SS), one of the most common methods for speech enhancement. It is a tin-like sound that is annoying to the human ear. The duration of musical noise is considerably shorter than that of speech, and its frequency components are random and isolated. In ordinary SS-based methods, the musical noise is removed by post-processing. However, the output of the ordinary post-processing is delayed, since the post-processing uses succeeding frames. To address this problem, we propose a novel method using an adaptive filter. In the proposed system, the observed noisy signal is used as the input to the adaptive filter, and the output of SS is used as the reference signal. We use the normalized LMS (least mean square) algorithm for the adaptive filter. Simulation results show that the proposed method improves the intelligibility of the enhanced speech in comparison with the conventional method.
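
    The configuration described above is compact in code: the noisy signal drives the adaptive filter while the SS output serves as the desired (reference) signal. The sketch below implements a plain normalized LMS update; the filter order and step size are illustrative assumptions.

        # Normalized LMS in the configuration described above: the noisy
        # signal is the filter input, the SS output is the reference.
        import numpy as np

        def nlms(noisy, ss_output, order=64, mu=0.5, eps=1e-8):
            w = np.zeros(order)
            y = np.zeros(len(noisy))
            for n in range(order, len(noisy)):
                u = noisy[n - order:n][::-1]        # most recent samples first
                y[n] = w @ u
                e = ss_output[n] - y[n]             # error against the SS reference
                w += mu * e * u / (u @ u + eps)     # normalised LMS update
            return y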

  17. Development and preliminary evaluation of a pediatric Spanish-English speech perception task.

    Science.gov (United States)

    Calandruccio, Lauren; Gomez, Bianca; Buss, Emily; Leibold, Lori J

    2014-06-01

    The purpose of this study was to develop a task to evaluate children's English and Spanish speech perception abilities in either noise or competing speech maskers. Eight bilingual Spanish-English and 8 age-matched monolingual English children (ages 4.9-16.4 years) were tested. A forced-choice, picture-pointing paradigm was selected for adaptively estimating masked speech reception thresholds. Speech stimuli were spoken by simultaneous bilingual Spanish-English talkers. The target stimuli were 30 disyllabic English and Spanish words, familiar to 5-year-olds and easily illustrated. Competing stimuli included either 2-talker English or 2-talker Spanish speech (corresponding to target language) and spectrally matched noise. For both groups of children, regardless of test language, performance was significantly worse for the 2-talker than for the noise masker condition. No difference in performance was found between bilingual and monolingual children. Bilingual children performed significantly better in English than in Spanish in competing speech. For all listening conditions, performance improved with increasing age. Results indicated that the stimuli and task were appropriate for speech recognition testing in both languages, providing a more conventional measure of speech-in-noise perception as well as a measure of complex listening. Further research is needed to determine performance for Spanish-dominant listeners and to evaluate the feasibility of implementation into routine clinical use.

  18. Fundamental Frequency and Direction-of-Arrival Estimation for Multichannel Speech Enhancement

    DEFF Research Database (Denmark)

    Karimian-Azari, Sam

    Audio systems receive the speech signals of interest usually in the presence of noise. The noise has profound impacts on the quality and intelligibility of the speech signals, and it is therefore clear that the noisy signals must be cleaned up before being played back, stored, or analyzed. We can...... estimate the speech signal of interest from the noisy signals using a priori knowledge about it. A human speech signal is broadband and consists of both voiced and unvoiced parts. The voiced part is quasi-periodic with a time-varying fundamental frequency (or pitch as it is commonly referred to). We...... their time differences which eventually may further reduce the effects of noise. This thesis introduces a number of principles and methods to estimate periodic signals in noisy environments with application to multichannel speech enhancement. We propose model-based signal enhancement concerning the model...

  19. Impact of Different Active-Speech-Ratios on PESQ’s Predictions in Case of Independent and Dependent Losses (in Presence of Receiver-Side Comfort-Noise)

    Directory of Open Access Journals (Sweden)

    P. Pocta

    2010-04-01

    Full Text Available This paper deals with the investigation of PESQ’s behavior under independent and dependent loss conditions from an Active-Speech-Ratio perspective, in the presence of receiver-side comfort noise. This reference signal characteristic is defined very broadly by ITU-T Recommendation P.862.3, which motivates a more in-depth investigation of its impact on speech quality prediction. We assess the variability of PESQ’s predictions with respect to Active-Speech-Ratios and loss conditions, as well as their accuracy, by comparing the predictions with subjective assessments. Our results show that an increase in the amount of speech in the reference signal (expressed by the Active-Speech-Ratio characteristic) may increase the reference signal’s sensitivity to packet loss. Interestingly, we found two additional effects in the investigated case: the use of higher Active-Speech-Ratios may lead to a negative shift in the MOS domain and to a decline in the accuracy of PESQ’s predictions. Prediction accuracy could be improved at higher packet loss rates.

  20. Speech Intelligibility and Hearing Protector Selection

    Science.gov (United States)

    2016-08-29

    HPDs not only affect the listener in speech communication in a noisy environment; they can also affect the speaker. Tufts and Frank (2003) found that...

  1. Robust spinal cord resting-state fMRI using independent component analysis-based nuisance regression noise reduction.

    Science.gov (United States)

    Hu, Yong; Jin, Richu; Li, Guangsheng; Luk, Keith Dk; Wu, Ed X

    2018-04-16

    Physiological noise reduction plays a critical role in spinal cord (SC) resting-state fMRI (rsfMRI). The aim was to reduce physiological noise and increase the robustness of SC rsfMRI by using an independent component analysis (ICA)-based nuisance regression (ICANR) method. Retrospective. Ten healthy subjects (female/male = 4/6, age = 27 ± 3 years, range 24-34 years). 3T/gradient-echo echo planar imaging (EPI). We used three alternative methods (no regression [Nil], conventional region of interest [ROI]-based noise reduction method without ICA [ROI-based], and correction of structured noise using spatial independent component analysis [CORSICA]) to compare with the performance of ICANR. Reduction of the influence of physiological noise on the SC and the reproducibility of rsfMRI analysis after noise reduction were examined. The correlation coefficient (CC) was calculated to assess the influence of physiological noise. Reproducibility was calculated by intraclass correlation (ICC). Results from different methods were compared by one-way analysis of variance (ANOVA) with post-hoc analysis. No significant difference in cerebrospinal fluid (CSF) pulsation influence or tissue motion influence was found (P = 0.223 in CSF, P = 0.2461 in tissue motion) between the ROI-based (CSF: 0.122 ± 0.020; tissue motion: 0.112 ± 0.015) and Nil (CSF: 0.134 ± 0.026; tissue motion: 0.124 ± 0.019) methods. CORSICA showed a significantly stronger influence of CSF pulsation and tissue motion (CSF: 0.166 ± 0.045, P = 0.048; tissue motion: 0.160 ± 0.032, P = 0.048) than Nil. ICANR showed a significantly weaker influence of CSF pulsation and tissue motion (CSF: 0.076 ± 0.007, P = 0.0003; tissue motion: 0.081 ± 0.014, P = 0.0182) than Nil. The ICC values for Nil, ROI-based, CORSICA, and ICANR were 0.669, 0.645, 0.561, and 0.766, respectively. ICANR more effectively reduced physiological noise from both tissue motion and CSF pulsation than the three alternative methods. ICANR increases the robustness of SC rsf
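
    A simplified sketch of the ICA-based nuisance regression idea follows, assuming scikit-learn's FastICA: decompose the voxel time series into components, flag components whose time courses track external physiological recordings, and regress the flagged components out of the data. The correlation-threshold selection rule is an illustrative stand-in for the actual ICANR criteria.

        # ICA-based nuisance regression sketch (selection rule is illustrative).
        import numpy as np
        from sklearn.decomposition import FastICA

        def ica_nuisance_regression(data, phys, n_comp=20, r_thresh=0.4):
            """data: (time, voxels) fMRI series; phys: (time, k) physiological traces."""
            sources = FastICA(n_components=n_comp, random_state=0).fit_transform(data)
            # Correlate each component time course with each physiological trace.
            r = np.corrcoef(np.hstack([sources, phys]).T)[:n_comp, n_comp:]
            noise_idx = np.where(np.max(np.abs(r), axis=1) > r_thresh)[0]
            if noise_idx.size == 0:
                return data                        # nothing flagged as nuisance
            X = sources[:, noise_idx]              # nuisance regressors
            beta = np.linalg.lstsq(X, data, rcond=None)[0]
            return data - X @ beta                 # regress flagged components out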

  2. Benefits to Speech Perception in Noise From the Binaural Integration of Electric and Acoustic Signals in Simulated Unilateral Deafness.

    Science.gov (United States)

    Ma, Ning; Morris, Saffron; Kitterick, Pádraig Thomas

    2016-01-01

    This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. Eight NH volunteers participated in the study and listened to sentences embedded in background noise via headphones. Stimuli presented to the left ear were unprocessed. Stimuli presented to the right ear (referred to as the CI-simulation ear) were processed using an eight-channel noise vocoder with one of the three processing strategies. An Ideal strategy simulated a frequency-to-place map across all channels that matched the delivery of spectral information between the ears. A Realistic strategy created a misalignment in the mapping of frequency to place in the CI-simulation ear where the size of the mismatch between the ears varied across channels. Finally, a Shifted strategy imposed a similar degree of misalignment in all channels, resulting in consistent mismatch between the ears across frequency. The ability to report key words in sentences was assessed under monaural and binaural listening conditions and at signal to noise ratios (SNRs) established by estimating speech-reception thresholds in each ear alone. The SNRs ensured that the monaural performance of the left ear never exceeded that of the CI-simulation ear. The advantages of binaural integration were calculated by comparing binaural performance with monaural performance using the CI-simulation ear alone. Thus, these advantages reflected the additional use of the experimentally constrained left ear and were not attributable to better-ear listening. Binaural performance was as accurate as, or more accurate than, monaural performance with the CI-simulation ear alone. When both ears supported a similar level of monaural

  3. A Noise Robust Statistical Texture Model

    DEFF Research Database (Denmark)

    Hilger, Klaus Baggesen; Stegmann, Mikkel Bille; Larsen, Rasmus

    2002-01-01

    This paper presents a novel approach to the problem of obtaining a low dimensional representation of texture (pixel intensity) variation present in a training set after alignment using a Generalised Procrustes analysis. We extend the conventional analysis of training textures in the Active Appearance Models segmentation framework. This is accomplished by augmenting the model with an estimate of the covariance of the noise present in the training data. This results in a more compact model maximising the signal-to-noise ratio, thus favouring subspaces rich on signal, but low on noise......
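
    The SNR-maximising subspace idea can be illustrated as a generalized eigenvalue problem: directions that maximize signal variance relative to the estimated noise variance are retained. The sketch below is a considerable simplification of the original model and assumes a positive-definite noise covariance estimate.

        # SNR-maximising subspace sketch via a generalized eigenproblem
        # (noise_cov is assumed positive definite).
        import numpy as np
        from scipy.linalg import eigh

        def snr_subspace(textures, noise_cov, n_keep=10):
            """textures: (n_samples, n_pixels); noise_cov: (n_pixels, n_pixels)."""
            sig_cov = np.cov(textures, rowvar=False)
            # Generalized eigenproblem: sig_cov v = lambda * noise_cov v.
            vals, vecs = eigh(sig_cov, noise_cov)
            order = np.argsort(vals)[::-1]          # largest signal-to-noise first
            basis = vecs[:, order[:n_keep]]
            return textures @ basis                 # low-dimensional representation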

  4. Cognition and speech-in-noise recognition: the role of proactive interference.

    Science.gov (United States)

    Ellis, Rachel J; Rönnberg, Jerker

    2014-01-01

    Complex working memory (WM) span tasks have been shown to predict speech-in-noise (SIN) recognition. Studies of complex WM span tasks suggest that, rather than indexing a single cognitive process, performance on such tasks may be governed by separate cognitive subprocesses embedded within WM. Previous research has suggested that one such subprocess indexed by WM tasks is proactive interference (PI), which refers to difficulties memorizing current information because of interference from previously stored long-term memory representations for similar information. The aim of the present study was to investigate phonological PI and to examine the relationship between PI (semantic and phonological) and SIN perception. A within-subjects experimental design was used. An opportunity sample of 24 young listeners with normal hearing was recruited. Measures of resistance to, and release from, semantic and phonological PI were calculated alongside the signal-to-noise ratio required to identify 50% of keywords correctly in a SIN recognition task. The data were analyzed using t-tests and correlations. Evidence of release from and resistance to semantic interference was observed. These measures correlated significantly with SIN recognition. Limited evidence of phonological PI was observed. The results show that capacity to resist semantic PI can be used to predict SIN recognition scores in young listeners with normal hearing. On the basis of these findings, future research will focus on investigating whether tests of PI can be used in the treatment and/or rehabilitation of hearing loss.

  5. A music perception disorder (congenital amusia) influences speech comprehension.

    Science.gov (United States)

    Liu, Fang; Jiang, Cunmei; Wang, Bei; Xu, Yi; Patel, Aniruddh D

    2015-01-01

    This study investigated the underlying link between speech and music by examining whether and to what extent congenital amusia, a musical disorder characterized by degraded pitch processing, would impact spoken sentence comprehension for speakers of Mandarin, a tone language. Sixteen Mandarin-speaking amusics and 16 matched controls were tested on the intelligibility of news-like Mandarin sentences with natural and flat fundamental frequency (F0) contours (created via speech resynthesis) under four signal-to-noise (SNR) conditions (no noise, +5, 0, and -5 dB SNR). While speech intelligibility in quiet and extremely noisy conditions (SNR = -5 dB) was not significantly compromised by flattened F0, both amusic and control groups achieved better performance with natural-F0 sentences than flat-F0 sentences under moderately noisy conditions (SNR = +5 and 0 dB). Relative to normal listeners, amusics demonstrated reduced speech intelligibility in both quiet and noise, regardless of whether the F0 contours of the sentences were natural or flattened. This deficit in speech intelligibility was not associated with impaired pitch perception in amusia. These findings provide evidence for impaired speech comprehension in congenital amusia, suggesting that the deficit of amusics extends beyond pitch processing and includes segmental processing.

  6. How does susceptibility to proactive interference relate to speech recognition in aided and unaided conditions?

    Science.gov (United States)

    Ellis, Rachel J; Rönnberg, Jerker

    2015-01-01

    Proactive interference (PI) is the capacity to resist interference to the acquisition of new memories from information stored in long-term memory. Previous research has shown that PI correlates significantly with the speech-in-noise recognition scores of younger adults with normal hearing. In this study, we report the results of an experiment designed to investigate the extent to which tests of visual PI relate to the speech-in-noise recognition scores of older adults with hearing loss, in aided and unaided conditions. The results suggest that measures of PI correlate significantly with speech-in-noise recognition only in the unaided condition. Furthermore, the relation between PI and speech-in-noise recognition differs from that observed in younger listeners without hearing loss. The findings suggest that the relation between PI tests and the speech-in-noise recognition scores of older adults with hearing loss reflects the capability of the test to index cognitive flexibility.

  7. How does susceptibility to proactive interference relate to speech recognition in aided and unaided conditions?

    Directory of Open Access Journals (Sweden)

    Rachel Jane Ellis

    2015-08-01

    Full Text Available Proactive interference (PI) is the capacity to resist interference to the acquisition of new memories from information stored in long-term memory. Previous research has shown that PI correlates significantly with the speech-in-noise recognition scores of younger adults with normal hearing. In this study, we report the results of an experiment designed to investigate the extent to which tests of visual PI relate to the speech-in-noise recognition scores of older adults with hearing loss, in aided and unaided conditions. The results suggest that measures of PI correlate significantly with speech-in-noise recognition only in the unaided condition. Furthermore, the relation between PI and speech-in-noise recognition differs from that observed in younger listeners without hearing loss. The findings suggest that the relation between PI tests and the speech-in-noise recognition scores of older adults with hearing loss reflects the capability of the test to index cognitive flexibility.

  8. The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system.

    Science.gov (United States)

    Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo

    2009-04-01

    The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained

  9. The effects of asymmetric directional microphone fittings on acceptance of background noise.

    Science.gov (United States)

    Kim, Jong S; Bryan, Melinda Freyaldenhoven

    2011-05-01

    The effects of asymmetric directional microphone fittings (i.e., an omnidirectional microphone on one ear and a directional microphone on the other) on speech understanding in noise and acceptance of background noise were investigated in 15 full-time hearing aid users. Subjects were fitted binaurally with four directional microphone conditions (i.e., binaural omnidirectional, right asymmetric directional, left asymmetric directional and binaural directional microphones) using Siemens Intuis Directional behind-the-ear hearing aids. Speech understanding in noise was assessed using the Hearing in Noise Test, and acceptance of background noise was assessed using the Acceptable Noise Level procedure. Speech was presented from 0° while noise was presented from 180° azimuth. The results revealed that speech understanding in noise improved when using asymmetric directional microphones compared to binaural omnidirectional microphone fittings and was not significantly hindered compared to binaural directional microphone fittings. The results also revealed that listeners accepted more background noise when fitted with asymmetric directional microphones as compared to binaural omnidirectional microphones. Lastly, the results revealed that the acceptance of noise was further increased for the binaural directional microphones when compared to the asymmetric directional microphones, maximizing listeners' willingness to accept background noise in the presence of noise. Clinical implications will be discussed.

  10. The area-of-interest problem in eyetracking research: A noise-robust solution for face and sparse stimuli.

    Science.gov (United States)

    Hessels, Roy S; Kemner, Chantal; van den Boomen, Carlijn; Hooge, Ignace T C

    2016-12-01

    A problem in eyetracking research is choosing areas of interest (AOIs): Researchers in the same field often use widely varying AOIs for similar stimuli, making cross-study comparisons difficult or even impossible. Subjective choices while choosing AOIs cause differences in AOI shape, size, and location. On the other hand, not many guidelines for constructing AOIs, or comparisons between AOI-production methods, are available. In the present study, we addressed this gap by comparing AOI-production methods in face stimuli, using data collected with infants and adults (with autism spectrum disorder [ASD] and matched controls). Specifically, we report that the attention-attracting and attention-maintaining capacities of AOIs differ between AOI-production methods, and that this matters for statistical comparisons in one of the three groups investigated (the ASD group). In addition, we investigated the relation between AOI size and an AOI's attention-attracting and attention-maintaining capacities, as well as the consequences for statistical analyses, and report that adopting large AOIs solves the problem of statistical differences between the AOI methods. Finally, we tested AOI-production methods for their robustness to noise, and report that large AOIs (using the Voronoi tessellation method or the limited-radius Voronoi tessellation method with large radii) are most robust to noise. We conclude that large AOIs are a noise-robust solution in face stimuli and, when implemented using the Voronoi method, are the most objective of the researcher-defined AOIs. Adopting Voronoi AOIs in face-scanning research should allow better between-group and cross-study comparisons.
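
    In code, Voronoi AOI assignment reduces to nearest-neighbour labelling of gaze samples, and the limited-radius variant adds a distance cap; the AOI centre coordinates below are hypothetical.

        # Voronoi / limited-radius Voronoi AOI assignment sketch.
        import numpy as np
        from scipy.spatial import cKDTree

        centres = np.array([[320., 200.], [320., 260.], [320., 320.]])  # hypothetical
        labels = ["eyes", "nose", "mouth"]

        def assign_aoi(gaze_xy, max_radius=None):
            """gaze_xy: (n, 2) gaze samples in pixels; returns one AOI label per sample."""
            dist, idx = cKDTree(centres).query(gaze_xy)   # nearest AOI centre
            out = [labels[i] for i in idx]
            if max_radius is not None:                    # limited-radius variant
                out = [l if d <= max_radius else "outside"
                       for l, d in zip(out, dist)]
            return out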

  11. Relations Between the Intelligibility of Speech in Noise and Psychophysical Measures of Hearing Measured in Four Languages Using the Auditory Profile Test Battery

    NARCIS (Netherlands)

    van Esch, T. E. M.; Dreschler, W. A.

    2015-01-01

    The aim of the present study was to determine the relations between the intelligibility of speech in noise and measures of auditory resolution, loudness recruitment, and cognitive function. The analyses were based on data published earlier as part of the presentation of the Auditory Profile, a test

  12. Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering

    Directory of Open Access Journals (Sweden)

    M. H. Savoji

    2014-09-01

    Full Text Available Gaussian Mixture Models (GMMs) of power spectral densities of speech and noise are used with explicit Bayesian estimation in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD) or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined systems of equations whose solutions lead to the first estimates of the speech and noise power spectra. The noise source is also identified and the input SNR estimated in this first step. These first estimates are then refined using approximate but explicit MMSE and MAP estimation formulations. The refined estimates are then used in a Wiener filter to reduce noise and enhance the noisy speech. The proposed schemes show good results. Nevertheless, it is shown that the MAP explicit solution, introduced here for the first time, reduces the computation time to less than one third, with a slightly greater improvement in SNR and PESQ score and less distortion, in comparison to the MMSE solution.
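
    Whatever estimator supplies the speech and noise power spectra, the final enhancement step is a standard Wiener gain per frequency bin. The sketch below takes the two spectral estimates as given, standing in for the GMM-based Bayesian estimates described above; the STFT parameters are illustrative.

        # Wiener filtering given speech and noise power spectral estimates.
        import numpy as np
        from scipy.signal import stft, istft

        def wiener_enhance(noisy, fs, speech_psd, noise_psd, nperseg=512):
            """speech_psd/noise_psd: per-frequency power estimates, shape (nperseg//2 + 1,)."""
            _, _, Y = stft(noisy, fs, nperseg=nperseg)
            gain = speech_psd / (speech_psd + noise_psd + 1e-12)  # Wiener gain
            return istft(gain[:, None] * Y, fs, nperseg=nperseg)[1]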

  13. Role of working memory and lexical knowledge in perceptual restoration of interrupted speech.

    Science.gov (United States)

    Nagaraj, Naveen K; Magimairaj, Beula M

    2017-12-01

    The role of working memory (WM) capacity and lexical knowledge in perceptual restoration (PR) of missing speech was investigated using the interrupted speech perception paradigm. Speech identification ability, which indexed PR, was measured using low-context sentences periodically interrupted at 1.5 Hz. PR was measured for silent gated, low-frequency speech noise filled, and low-frequency fine-structure and envelope filled interrupted conditions. WM capacity was measured using verbal and visuospatial span tasks. Lexical knowledge was assessed using both receptive vocabulary and meaning from context tests. Results showed that PR was better for speech noise filled condition than other conditions tested. Both receptive vocabulary and verbal WM capacity explained unique variance in PR for the speech noise filled condition, but were unrelated to performance in the silent gated condition. It was only receptive vocabulary that uniquely predicted PR for fine-structure and envelope filled conditions. These findings suggest that the contribution of lexical knowledge and verbal WM during PR depends crucially on the information content that replaced the silent intervals. When perceptual continuity was partially restored by filler speech noise, both lexical knowledge and verbal WM capacity facilitated PR. Importantly, for fine-structure and envelope filled interrupted conditions, lexical knowledge was crucial for PR.
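
    The interruption paradigm itself is straightforward to reproduce: gate the signal periodically at 1.5 Hz and either leave the gaps silent or fill them with a chosen filler signal. The 50% duty cycle and filler level below are illustrative assumptions.

        # Periodically interrupted speech: silent-gated or filler-filled gaps.
        import numpy as np

        def interrupt(speech, fs, rate_hz=1.5, filler=None):
            t = np.arange(len(speech)) / fs
            gate = (np.mod(t * rate_hz, 1.0) < 0.5).astype(float)  # on/off square wave
            out = speech * gate
            if filler is not None:                 # e.g. speech-shaped noise
                out += filler * (1.0 - gate)
            return out

        # Example: silent-gated vs noise-filled versions of the same sentence.
        # noisy = interrupt(sentence, fs, filler=0.1 * np.random.randn(len(sentence)))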

  14. Prosody perception in simulated cochlear implant listening in modulated and stationary noise

    DEFF Research Database (Denmark)

    Morris, David Jackson

    2012-01-01

    Cochlear implant (CI) listeners can do well when attending to speech in quiet, yet challenging listening situations are more problematic. Previous studies have shown that fluctuations in the noise do not yield better speech recognition scores for CI listeners, as they can for normal-hearing (NH...... derived from non-scripted Danish speech. The F0 temporal midpoint of the initial syllable was varied stepwise in semitones. Competing signals of modulated white noise and speech-shaped noise at 0 dB and 12 dB SNR were added to the tokens prior to 8-channel noise-excited vocoder processing. Stimuli were...

  15. Noise Reduction in the Time Domain using Joint Diagonalization

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Benesty, Jacob; Jensen, Jesper Rindom

    2014-01-01

    , an estimate of the desired signal is found by subtraction of the noise estimate from the observed signal. The filter can be designed to obtain a desired trade-off between noise reduction and signal distortion, depending on the number of eigenvectors included in the filter design. This is explored through...... simulations using a speech signal corrupted by car noise, and the results confirm that the output signal-to-noise ratio and speech distortion index both increase when more eigenvectors are included in the filter design....
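
    A simplified sketch of the joint-diagonalization filter follows: the noisy-signal and noise correlation matrices are jointly diagonalized via a generalized eigendecomposition, and each frame is reconstructed from a chosen number of the most speech-dominant eigenvectors; equivalently, the noise estimate carried by the remaining eigenvectors is subtracted. The rank parameter realizes the trade-off between noise reduction and signal distortion mentioned above.

        # Joint diagonalisation sketch: B^T R_y B = diag(lams), B^T R_v B = I.
        import numpy as np
        from scipy.linalg import eigh

        def jd_enhance(y_frames, R_y, R_v, rank):
            """y_frames: (n_frames, M) noisy frames; R_y, R_v: (M, M) correlation matrices."""
            lams, B = eigh(R_y, R_v)            # generalized eigendecomposition
            order = np.argsort(lams)[::-1]      # most speech-dominant first
            B = B[:, order]
            A = R_v @ B                         # A = B^{-T}, since B^T R_v B = I
            H = A[:, :rank] @ B[:, :rank].T     # keep `rank` eigen-directions
            # Dropping the remaining directions equals subtracting the noise
            # estimate they carry, because A B^T is the identity matrix.
            return (H @ y_frames.T).T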

  16. Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

    Science.gov (United States)

    Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

    2015-05-01

    The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected components of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationships, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception with congruent audiovisual stimuli, tighter coupling of the left anterior temporal gyrus-anterior insula component and of the right premotor-visual component was observed than in the auditory-only or visual-only speech cue conditions, respectively. Interestingly, visual speech is perceived under white noise through tight negative coupling among the left inferior frontal region, right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus and right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.

  17. Understanding speech when wearing communication headsets and hearing protectors with subband processing.

    Science.gov (United States)

    Brammer, Anthony J; Yu, Gongqiang; Bernstein, Eric R; Cherniack, Martin G; Peterson, Donald R; Tufts, Jennifer B

    2014-08-01

    An adaptive, delayless, subband feed-forward control structure is employed to improve the speech signal-to-noise ratio (SNR) in the communication channel of a circumaural headset/hearing protector (HPD) from 90 Hz to 11.3 kHz, and to provide active noise control (ANC) from 50 to 800 Hz to complement the passive attenuation of the HPD. The task involves optimizing the speech SNR for each communication channel subband, subject to limiting the maximum sound level at the ear, maintaining a speech SNR preferred by users, and reducing large inter-band gain differences to improve speech quality. The performance of a proof-of-concept device has been evaluated in a pseudo-diffuse sound field when worn by human subjects under conditions of environmental noise and speech that do not pose a risk to hearing, and by simulation for other conditions. For the environmental noises employed in this study, subband speech SNR control combined with subband ANC produced greater improvement in word scores than subband ANC alone, and improved the consistency of word scores across subjects. The simulation employed a subject-specific linear model, and predicted that word scores are maintained in excess of 90% for sound levels outside the HPD of up to ∼115 dBA.

  18. Robust random telegraph conductivity noise in single crystals of the ferromagnetic insulating manganite La0.86Ca0.14MnO3

    Science.gov (United States)

    Przybytek, J.; Fink-Finowicki, J.; Puźniak, R.; Shames, A.; Markovich, V.; Mogilyansky, D.; Jung, G.

    2017-03-01

    Robust random telegraph conductivity fluctuations have been observed in La0.86Ca0.14MnO3 manganite single crystals. At room temperature, the spectra of conductivity fluctuations are featureless and follow a 1/f shape across the entire experimental frequency and bias range. Upon lowering the temperature, clear Lorentzian bias-dependent excess noise appears on the 1/f background and eventually dominates the spectral behavior. In the time domain, fully developed Lorentzian noise appears as pronounced two-level random telegraph noise with a thermally activated switching rate, which depends on neither bias current nor applied magnetic field. The telegraph noise is very robust and persists over an exceptionally wide temperature range of more than 50 K. The amplitude of the telegraph noise decreases exponentially with increasing bias current in exactly the same manner as the sample resistance increases with the current, pointing to dynamic current redistribution between percolation paths dominated by phase-separated clusters with different conductivity as a possible origin of the two-level conductivity fluctuations.

  19. Metaheuristic applications to speech enhancement

    CERN Document Server

    Kunche, Prajna

    2016-01-01

    This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.

  20. Exploring the Relationship Between Working Memory, Compressor Speed, and Background Noise Characteristics

    DEFF Research Database (Denmark)

    Ohlenforst, Barbara; Souza, Pamela E.; MacDonald, Ewen

    2016-01-01

    Objectives: Previous work has shown that individuals with lower working memory demonstrate reduced intelligibility for speech processed with fast-acting compression amplification. This relationship has been noted in fluctuating noise, but the extent of noise modulation that must be present...... grouped by high or low working memory according to their performance on a reading span test. Speech intelligibility was measured for low-context sentences presented in background noise, where the noise varied in the extent of amplitude modulation. Simulated fast- or slow-acting compression amplification...... on the number of talkers in the background noise. The presented signal-to-noise ratios were not a significant factor in the measured intelligibility performance. Conclusion: In agreement with earlier research, high working memory allowed better speech intelligibility when fast compression was applied......

  1. Development of a Danish speech intelligibility test

    DEFF Research Database (Denmark)

    Nielsen, Jens Bo; Dau, Torsten

    2009-01-01

    Abstract A Danish speech intelligibility test for assessing the speech recognition threshold in noise (SRTN) has been developed. The test consists of 180 sentences distributed in 18 phonetically balanced lists. The sentences are based on an open word-set and represent everyday language. The sente....... The test was verified with 14 normal-hearing listeners; the overall SRTN lies at a signal-to-noise ratio of -3.15 dB with a standard deviation of 1.0 dB. The list-SRTNs deviate less than 0.5 dB from the overall mean....
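
    An SRTN like this is typically tracked adaptively. The sketch below shows a generic 1-up/1-down staircase converging on the 50% intelligibility point; the 2 dB step, starting SNR, and averaging rule are illustrative assumptions, not the published Danish procedure.

        # Generic 1-up/1-down adaptive SRT staircase sketch.
        def srt_track(present_sentence, n_sentences=20, start_snr=0.0, step=2.0):
            """present_sentence(snr_db) -> True if the sentence was repeated correctly."""
            snr, levels = start_snr, []
            for _ in range(n_sentences):
                correct = present_sentence(snr)
                levels.append(snr)
                snr += -step if correct else step  # harder after a hit, easier after a miss
            # The 1-up/1-down rule converges on the 50% point of the
            # psychometric function; average the late levels as the SRT.
            tail = levels[-10:]
            return sum(tail) / len(tail)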

  2. Intelligibility of synthetic speech in the presence of interfering speech

    NARCIS (Netherlands)

    Eggen, J.H.

    1989-01-01

    Standard articulation tests are not always sensitive enough to discriminate between speech samples which are of high intelligibility. One can increase the sensitivity of such tests by presenting the test materials in noise. In this way, small differences in intelligibility can be magnified into

  3. The effect of instantaneous input dynamic range setting on the speech perception of children with the nucleus 24 implant.

    Science.gov (United States)

    Davidson, Lisa S; Skinner, Margaret W; Holstad, Beth A; Fears, Beverly T; Richter, Marie K; Matusofsky, Margaret; Brenner, Christine; Holden, Timothy; Birath, Amy; Kettel, Jerrica L; Scollie, Susan

    2009-06-01

    The purpose of this study was to examine the effects of a wider instantaneous input dynamic range (IIDR) setting on speech perception and comfort in quiet and noise for children wearing the Nucleus 24 implant system and the Freedom speech processor. In addition, children's ability to understand soft and conversational-level speech in relation to aided sound-field thresholds was examined. Thirty children (age, 7 to 17 years) with the Nucleus 24 cochlear implant system and the Freedom speech processor with two different IIDR settings (30 versus 40 dB) were tested on the Consonant-Nucleus-Consonant (CNC) word test at 50 and 60 dB SPL, the Bamford-Kowal-Bench Speech in Noise Test, and a loudness rating task for four-talker speech noise. Aided thresholds for frequency-modulated tones, narrowband noise, and recorded Ling sounds were obtained with the two IIDRs and examined in relation to CNC scores at 50 dB SPL. Speech Intelligibility Indices were calculated using the long-term average speech spectrum of the CNC words at 50 dB SPL measured at each test site and aided thresholds. Group mean CNC scores at 50 dB SPL were significantly higher with the 40 dB IIDR than with the 30 dB IIDR, whereas scores on the Bamford-Kowal-Bench Speech in Noise Test were not significantly different for the two IIDRs. Significantly improved aided thresholds at 250 to 6000 Hz as well as higher Speech Intelligibility Indices afforded improved audibility for speech presented at soft levels (50 dB SPL). These results indicate that an increased IIDR provides improved word recognition for soft levels of speech without compromising the comfort of higher-level speech sounds or sentence recognition in noise.

  4. Nonlatching positive feedback enables robust bimodality by decoupling expression noise from the mean

    Energy Technology Data Exchange (ETDEWEB)

    Razooky, Brandon S. [Rockefeller Univ., New York, NY (United States). Lab. of Virology and Infectious Disease; Gladstone Institutes (Virology and Immunology), San Francisco, CA (United States); Univ. of California, San Francisco, CA (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Center for Nanophase Materials Science (CNMS); Univ. of Tennessee, Knoxville, TN (United States). Bredesen Center for Interdisciplinary; Cao, Youfang [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Hansen, Maike M. K. [Gladstone Institutes (Virology and Immunology), San Francisco, CA (United States); Perelson, Alan S. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Simpson, Michael L. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Center for Nanophase Materials Science (CNMS); Univ. of Tennessee, Knoxville, TN (United States). Bredesen Center for Interdisciplinary; Weinberger, Leor S. [Gladstone Institutes (Virology and Immunology), San Francisco, CA (United States); Univ. of California, San Francisco, CA (United States). Dept. of Biochemistry and Biophysics; Univ. of California, San Francisco, CA (United States). QB3: California Inst. of Quantitative Biosciences; Univ. of California, San Francisco, CA (United States). Dept. of Pharmaceutical Chemistry

    2017-10-18

    Fundamental to biological decision-making is the ability to generate bimodal expression patterns where two alternate expression states simultaneously exist. Here in this study, we use a combination of single-cell analysis and mathematical modeling to examine the sources of bimodality in the transcriptional program controlling HIV’s fate decision between active replication and viral latency. We find that the HIV Tat protein manipulates the intrinsic toggling of HIV’s promoter, the LTR, to generate bimodal ON-OFF expression, and that transcriptional positive feedback from Tat shifts and expands the regime of LTR bimodality. This result holds for both minimal synthetic viral circuits and full-length virus. Strikingly, computational analysis indicates that the Tat circuit’s non-cooperative ‘non-latching’ feedback architecture is optimized to slow the promoter’s toggling and generate bimodality by stochastic extinction of Tat. In contrast to the standard Poisson model, theory and experiment show that non-latching positive feedback substantially dampens the inverse noise-mean relationship to maintain stochastic bimodality despite increasing mean-expression levels. Given the rapid evolution of HIV, the presence of a circuit optimized to robustly generate bimodal expression appears consistent with the hypothesis that HIV’s decision between active replication and latency provides a viral fitness advantage. More broadly, the results suggest that positive-feedback circuits may have evolved not only for signal amplification but also for robustly generating bimodality by decoupling expression fluctuations (noise) from mean expression levels.
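
    The qualitative mechanism, slow promoter toggling plus non-cooperative positive feedback yielding a bimodal protein distribution, can be reproduced with a minimal Gillespie simulation; all rate constants below are illustrative choices, not fitted HIV parameters.

        # Minimal Gillespie sketch: a toggling promoter whose ON-rate grows
        # linearly (non-cooperatively) with protein level; endpoint protein
        # counts across many runs form a bimodal histogram.
        import numpy as np

        def simulate(t_end=2000.0, k_on=0.002, k_off=0.02, fb=0.0005,
                     k_prod=1.0, k_deg=0.01, seed=1):
            rng = np.random.default_rng(seed)
            t, state, protein = 0.0, 0, 0          # state: 0 = OFF, 1 = ON
            while t < t_end:
                rates = np.array([
                    (k_on + fb * protein) * (state == 0),  # promoter turns ON
                    k_off * (state == 1),                  # promoter turns OFF
                    k_prod * (state == 1),                 # protein production
                    k_deg * protein,                       # protein degradation
                ])
                total = rates.sum()
                if total == 0:                     # only possible when k_on == 0
                    break
                t += rng.exponential(1.0 / total)
                r = rng.choice(4, p=rates / total)
                if r == 0:   state = 1
                elif r == 1: state = 0
                elif r == 2: protein += 1
                else:        protein -= 1
            return protein

        # endpoints = [simulate(seed=s) for s in range(200)]  # bimodal histogram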

  5. Distributed Speech Enhancement in Wireless Acoustic Sensor Networks

    NARCIS (Netherlands)

    Zeng, Y.

    2015-01-01

    In digital speech communication applications like hands-free mobile telephony, hearing aids and human-to-computer communication systems, the recorded speech signals are typically corrupted by background noise. As a result, their quality and intelligibility can get severely degraded. Traditional

  6. Intelligibility for Binaural Speech with Discarded Low-SNR Speech Components.

    Science.gov (United States)

    Schoenmaker, Esther; van de Par, Steven

    2016-01-01

    Speech intelligibility in multitalker settings improves when the target speaker is spatially separated from the interfering speakers. A factor that may contribute to this improvement is the improved detectability of target-speech components due to binaural interaction in analogy to the Binaural Masking Level Difference (BMLD). This would allow listeners to hear target speech components within specific time-frequency intervals that have a negative SNR, similar to the improvement in the detectability of a tone in noise when these contain disparate interaural difference cues. To investigate whether these negative-SNR target-speech components indeed contribute to speech intelligibility, a stimulus manipulation was performed where all target components were removed when local SNRs were smaller than a certain criterion value. It can be expected that for sufficiently high criterion values target speech components will be removed that do contribute to speech intelligibility. For spatially separated speakers, assuming that a BMLD-like detection advantage contributes to intelligibility, degradation in intelligibility is expected already at criterion values below 0 dB SNR. However, for collocated speakers it is expected that higher criterion values can be applied without impairing speech intelligibility. Results show that degradation of intelligibility for separated speakers is only seen for criterion values of 0 dB and above, indicating a negligible contribution of a BMLD-like detection advantage in multitalker settings. These results show that the spatial benefit is related to a spatial separation of speech components at positive local SNRs rather than to a BMLD-like detection improvement for speech components at negative local SNRs.
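
    The stimulus manipulation is simple to express when target and masker are available separately: compute the local SNR per time-frequency bin and zero the target bins below the criterion before resynthesis. The STFT parameters below are illustrative assumptions.

        # Discard target time-frequency components below a local-SNR criterion.
        import numpy as np
        from scipy.signal import stft, istft

        def discard_low_snr(target, masker, fs, criterion_db=0.0, nperseg=512):
            _, _, T = stft(target, fs, nperseg=nperseg)
            _, _, M = stft(masker, fs, nperseg=nperseg)
            local_snr = 10 * np.log10((np.abs(T) ** 2 + 1e-12) /
                                      (np.abs(M) ** 2 + 1e-12))
            T[local_snr < criterion_db] = 0.0     # remove sub-criterion components
            return istft(T, fs, nperseg=nperseg)[1]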

  7. Non-native Listeners’ Recognition of High-Variability Speech Using PRESTO

    Science.gov (United States)

    Tamati, Terrin N.; Pisoni, David B.

    2015-01-01

    Background Natural variability in speech is a significant challenge to robust successful spoken word recognition. In everyday listening environments, listeners must quickly adapt and adjust to multiple sources of variability in both the signal and listening environments. High-variability speech may be particularly difficult to understand for non-native listeners, who have less experience with the second language (L2) phonological system and less detailed knowledge of sociolinguistic variation of the L2. Purpose The purpose of this study was to investigate the effects of high-variability sentences on non-native speech recognition and to explore the underlying sources of individual differences in speech recognition abilities of non-native listeners. Research Design Participants completed two sentence recognition tasks involving high-variability and low-variability sentences. They also completed a battery of behavioral tasks and self-report questionnaires designed to assess their indexical processing skills, vocabulary knowledge, and several core neurocognitive abilities. Study Sample Native speakers of Mandarin (n = 25) living in the United States recruited from the Indiana University community participated in the current study. A native comparison group consisted of scores obtained from native speakers of English (n = 21) in the Indiana University community taken from an earlier study. Data Collection and Analysis Speech recognition in high-variability listening conditions was assessed with a sentence recognition task using sentences from PRESTO (Perceptually Robust English Sentence Test Open-Set) mixed in 6-talker multitalker babble. Speech recognition in low-variability listening conditions was assessed using sentences from HINT (Hearing In Noise Test) mixed in 6-talker multitalker babble. Indexical processing skills were measured using a talker discrimination task, a gender discrimination task, and a forced-choice regional dialect categorization task. Vocabulary

  8. Advantages of binaural amplification to acceptable noise level of directional hearing aid users.

    Science.gov (United States)

    Kim, Ja-Hee; Lee, Jae Hee; Lee, Ho-Ki

    2014-06-01

    The goal of the present study was to examine whether Acceptable Noise Levels (ANLs) would be lower (indicating greater acceptance of noise) in the binaural than in the monaural listening condition, and whether the meaningfulness of background speech noise would affect ANLs for directional microphone hearing aid users. In addition, any relationships between the individual binaural benefits on ANLs and the individuals' demographic information were investigated. Fourteen hearing aid users (mean age, 64 years) participated in the experimental testing. For the ANL calculation, listeners' most comfortable listening levels and background noise levels were measured. Using Korean ANL material, ANLs of all participants were evaluated under monaural and binaural amplification in a counterbalanced order. The ANLs were also compared across five types of competing speech noise, consisting of 1- through 8-talker background speech maskers. Seven young normal-hearing listeners (mean age, 27 years) completed the same measurements as a pilot test. The results demonstrated that directional hearing aid users accepted more noise (lower ANLs) with binaural amplification than with monaural amplification, regardless of the type of competing speech. When the background speech noise became more meaningful, hearing-impaired listeners accepted less noise (higher ANLs), revealing that the ANL is dependent on the intelligibility of the competing speech. The individuals' binaural advantages in ANLs were significantly greater for listeners with longer experience of hearing aids, yet not related to their age or hearing thresholds. Binaural directional microphone processing allowed hearing aid users to accept a greater amount of background noise, which may in turn improve listeners' hearing aid success. Informational masking substantially influenced background noise acceptance. Given the significant association between ANLs and duration of hearing aid usage, ANL measurement can be useful for
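
    The ANL used in this record is conventionally the listener's most comfortable listening level (MCL) minus the highest acceptable background noise level (BNL), so lower values mean more noise is accepted; a minimal sketch of that arithmetic with illustrative values, not data from the study:

        def acceptable_noise_level(mcl_db, bnl_db):
            """ANL (dB) = most comfortable level minus maximum acceptable noise level.
            Lower ANLs indicate greater acceptance of background noise."""
            return mcl_db - bnl_db

        # Illustrative values (not data from the study)
        monaural_anl = acceptable_noise_level(mcl_db=72.0, bnl_db=60.0)   # 12 dB
        binaural_anl = acceptable_noise_level(mcl_db=70.0, bnl_db=62.0)   # 8 dB
        binaural_advantage = monaural_anl - binaural_anl                  # 4 dB lower ANL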

  9. Talker and background noise specificity in spoken word recognition memory

    Directory of Open Access Journals (Sweden)

    Angela Cooper

    2017-11-01

    Full Text Available Prior research has demonstrated that listeners are sensitive to changes in the indexical (talker-specific) characteristics of speech input, suggesting that these signal-intrinsic features are integrally encoded in memory for spoken words. Given that listeners frequently must contend with concurrent environmental noise, to what extent do they also encode signal-extrinsic details? Native English listeners’ explicit memory for spoken English monosyllabic and disyllabic words was assessed as a function of consistency versus variation in the talker’s voice (talker condition) and background noise (noise condition) using a delayed recognition memory paradigm. The speech and noise signals were spectrally separated, such that changes in a simultaneously presented non-speech signal (background noise) from exposure to test would not be accompanied by concomitant changes in the target speech signal. The results revealed that listeners can encode both signal-intrinsic talker and signal-extrinsic noise information into integrated cognitive representations, critically even when the two auditory streams are spectrally non-overlapping. However, the extent to which extra-linguistic episodic information is encoded alongside linguistic information appears to be modulated by syllabic characteristics, with specificity effects found only for monosyllabic items. These findings suggest that encoding and retrieval of episodic information during spoken word processing may be modulated by lexical characteristics.

  10. Speech Intelligibility Evaluation for Mobile Phones

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Cubick, Jens; Dau, Torsten

    2015-01-01

    In the development process of modern telecommunication systems, such as mobile phones, it is common practice to use computer models to objectively evaluate the transmission quality of the system, instead of time-consuming perceptual listening tests. Such models have typically focused on the quality...... of the transmitted speech, while little or no attention has been provided to speech intelligibility. The present study investigated to what extent three state-of-the-art speech intelligibility models could predict the intelligibility of noisy speech transmitted through mobile phones. Sentences from the Danish...... Dantale II speech material were mixed with three different kinds of background noise, transmitted through three different mobile phones, and recorded at the receiver via a local network simulator. The speech intelligibility of the transmitted sentences was assessed by six normal-hearing listeners...

  11. Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

    Directory of Open Access Journals (Sweden)

    M. Bashirpour

    2016-09-01

    Full Text Available Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its performance in emotion recognition using clean and noisy speech materials and compare it with the performances of the well-known MFCC, LPCC, RASTA-PLP, and also TEMFCC features. Speech samples are extracted from the Berlin emotional speech database (Emo DB) and the Persian emotional speech database (Persian ESD), which are corrupted with 4 different noise types under various SNR levels. The experiments are conducted in clean train/noisy test scenarios to simulate practical conditions with noise sources. Simulation results show that higher recognition rates are achieved for PNCC as compared with the conventional features under noisy conditions.
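
    PNCC's key departure from MFCC is replacing log compression with a power-law nonlinearity (an exponent of 1/15 is commonly cited) applied to a noise-suppressed band power matrix; a minimal sketch of that final stage, assuming a precomputed (frames x bands) power spectrogram and omitting PNCC's gammatone filterbank and medium-time bias subtraction:

        import numpy as np
        from scipy.fftpack import dct

        def cepstra_from_power_bands(band_power, power_exponent=1.0 / 15, n_ceps=13):
            """Apply PNCC-style power-law compression (instead of MFCC's log)
            to a (frames x bands) power matrix, then decorrelate with a DCT."""
            compressed = np.power(np.maximum(band_power, 1e-12), power_exponent)
            return dct(compressed, type=2, axis=1, norm='ortho')[:, :n_ceps]

        # Synthetic 100-frame, 40-band power spectrogram (stand-in data)
        rng = np.random.default_rng(1)
        band_power = rng.random((100, 40))
        pncc_like = cepstra_from_power_bands(band_power)                  # power-law
        mfcc_like = dct(np.log(band_power + 1e-12), type=2, axis=1,
                        norm='ortho')[:, :13]                             # log, for contrast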

  12. Speech enhancement via Mel-scale Wiener filtering with a frequency-wise voice activity detector

    International Nuclear Information System (INIS)

    Kim, Han Jun; Kim, Hwa Soo; Cho, Young Man

    2007-01-01

    This paper presents a speech enhancement system that enables comfortable communication inside an automobile. A couple of novel concepts are proposed in an effort to improve two major building blocks in existing speech enhancement systems: a voice activity detector (VAD) and a noise filtering algorithm. The proposed VAD classifies a given data frame as speech or noise at each frequency, enabling frequency-wise updates of the noise statistics and thereby improving the effectiveness of the noise filtering algorithm by providing more up-to-date noise statistics. The celebrated Wiener filter is adopted in this paper as the accompanying noise filtering algorithm, which results in significant noise suppression. Yet the musical noise present in most Wiener filter-based systems prompts the idea of applying the Wiener filter in the Mel scale, in which the human auditory system responds to external stimulation. It turns out that the Mel-scale Wiener filter creates some masking effects and thereby reduces musical noise significantly, leading to smooth transitions between data frames.
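
    A minimal sketch of a mel-band Wiener gain of the general kind described here, assuming band powers as input and a fixed noise estimate standing in for the statistics a frequency-wise VAD would supply; the filterbank size and gain floor are illustrative:

        import numpy as np

        def mel_wiener_gains(speech_band_power, noise_band_power, gain_floor=0.1):
            """Per-mel-band Wiener gain G = S / (S + N), floored to limit musical noise."""
            gains = speech_band_power / (speech_band_power + noise_band_power + 1e-12)
            return np.maximum(gains, gain_floor)

        # 40 mel bands; noise statistics would come from the frequency-wise VAD
        rng = np.random.default_rng(2)
        noisy_power = rng.random(40) + 0.5
        noise_power = np.full(40, 0.5)                             # VAD-updated noise estimate
        speech_power = np.maximum(noisy_power - noise_power, 0.0)  # spectral subtraction
        enhanced_power = mel_wiener_gains(speech_power, noise_power) * noisy_power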

  13. Speech discrimination difficulties in High-Functioning Autism Spectrum Disorder are likely independent of auditory hypersensitivity.

    Directory of Open Access Journals (Sweden)

    William Andrew Dunlop

    2016-08-01

    Full Text Available Autism Spectrum Disorder (ASD), characterised by impaired communication skills and repetitive behaviours, can also result in differences in sensory perception. Individuals with ASD often perform normally in simple auditory tasks but poorly compared to typically developed (TD) individuals on complex auditory tasks like discriminating speech from complex background noise. A common trait of individuals with ASD is hypersensitivity to auditory stimulation. No studies to our knowledge consider whether hypersensitivity to sounds is related to differences in speech-in-noise discrimination. We provide novel evidence that individuals with high-functioning ASD show poor performance compared to TD individuals in a speech-in-noise discrimination task with an attentionally demanding background noise, but not in a purely energetic noise. Further, we demonstrate in our small sample that speech hypersensitivity does not appear to predict performance in the speech-in-noise task. The findings support the argument that an attentional deficit, rather than a perceptual deficit, affects the ability of individuals with ASD to discriminate speech from background noise. Finally, we piloted a novel questionnaire that measures difficulty hearing in noisy environments, and sensitivity to non-verbal and verbal sounds. Psychometric analysis using 128 TD participants provided novel evidence for a difference in sensitivity to non-verbal and verbal sounds, and these findings were reinforced by participants with ASD who also completed the questionnaire. The study was limited by a small and high-functioning sample of participants with ASD. Future work could test larger sample sizes and include lower-functioning ASD participants.

  14. "1/f noise" in music: Music from 1/f noise

    Energy Technology Data Exchange (ETDEWEB)

    Voss, R.F.; Clarke, J.

    1978-01-01

    The spectral density of fluctuations in the audio power of many musical selections and of English speech varies approximately as 1/f (f is the frequency) down to a frequency of 5 × 10⁻⁴ Hz. This result implies that the audio-power fluctuations are correlated over all times in the same manner as "1/f noise" in electronic components. The frequency fluctuations of music also have a 1/f spectral density at frequencies down to the inverse of the length of the piece of music. The frequency fluctuations of English speech have a quite different behavior, with a single characteristic time of about 0.1 s, the average length of a syllable. The observations on music suggest that 1/f noise is a good choice for stochastic composition. Compositions in which the frequency and duration of each note were determined by 1/f noise sources sounded pleasing. Those generated by white-noise sources sounded too random, while those generated by 1/f² noise sounded too correlated.
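
    The stochastic-composition result invites a short sketch. Below is a Voss-McCartney-style approximation to 1/f noise (a running sum of random sources redrawn at octave-spaced intervals), mapped onto a pentatonic pitch set; the scale, source count, and mapping are illustrative assumptions, not the authors' procedure:

        import numpy as np

        def voss_1f(n_values, n_sources=8, seed=0):
            """Approximate 1/f noise: source k is redrawn every 2**k steps, so the
            running sum of all sources has an approximately 1/f power spectrum."""
            rng = np.random.default_rng(seed)
            sources = rng.random(n_sources)
            out = np.empty(n_values)
            for i in range(n_values):
                for k in range(n_sources):
                    if i % (2 ** k) == 0:
                        sources[k] = rng.random()
                out[i] = sources.sum()
            return out

        # Map the 1/f sequence onto a pentatonic pitch set (illustrative choice)
        scale_midi = np.array([60, 62, 64, 67, 69, 72, 74, 76])
        seq = voss_1f(32)
        span = seq.max() - seq.min() + 1e-12
        idx = np.round((seq - seq.min()) / span * (len(scale_midi) - 1)).astype(int)
        melody = scale_midi[idx]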

  15. Error analysis to improve the speech recognition accuracy on ...

    Indian Academy of Sciences (India)

    dictionary plays a key role in the speech recognition accuracy. .... Sophisticated microphone is used for the recording speech corpus in a noise free environment. .... values, word error rate (WER) and error-rate will be calculated as follows:.

  16. Contribution of envelope periodicity to release from speech-on-speech masking

    DEFF Research Database (Denmark)

    Christiansen, Claus; MacDonald, Ewen; Dau, Torsten

    2013-01-01

    Masking release (MR) is the improvement in speech intelligibility for a fluctuating interferer compared to stationary noise. Reduction in MR due to vocoder processing is usually linked to distortions in the temporal fine structure of the stimuli and a corresponding reduction in the fundamental fr...

  17. Robustness of holonomic quantum gates

    International Nuclear Information System (INIS)

    Solinas, P.; Zanardi, P.; Zanghi, N.

    2005-01-01

    Full text: If the driving field fluctuates during the quantum evolution, this produces errors in the applied operator. The holonomic (and geometrical) quantum gates are believed to be robust against some kinds of noise. Because of their geometrical dependence, holonomic operators can be robust against this kind of noise; in fact, if the fluctuations are fast enough they cancel out, leaving the final operator unchanged. I present numerical studies of holonomic quantum gates subject to this parametric noise; the fidelity between the noisy and ideal evolutions is calculated for different noise correlation times. The holonomic quantum gates seem robust not only for fast fluctuating fields but also for slow fluctuating fields. These results can be explained by the geometrical nature of the holonomic operator: for fast fluctuating fields the fluctuations cancel out, while for slow fluctuating fields the fluctuations do not perturb the loop in the parameter space. (author)

  18. A variational EM method for pole-zero modeling of speech with mixed block sparse and Gaussian excitation

    DEFF Research Database (Denmark)

    Shi, Liming; Nielsen, Jesper Kjær; Jensen, Jesper Rindom

    2017-01-01

    The modeling of speech can be used for speech synthesis and speech recognition. We present a speech analysis method based on pole-zero modeling of speech with mixed block sparse and Gaussian excitation. By using a pole-zero model, instead of the all-pole model, a better spectral fitting can...... be expected. Moreover, motivated by the block sparse glottal flow excitation during voiced speech and the white noise excitation for unvoiced speech, we model the excitation sequence as a combination of block sparse signals and white noise. A variational EM (VEM) method is proposed for estimating...... in reconstructing of the block sparse excitation....

  19. Significance of Joint Features Derived from the Modified Group Delay Function in Speech Processing

    Directory of Open Access Journals (Sweden)

    Murthy Hema A

    2007-01-01

    Full Text Available This paper investigates the significance of combining cepstral features derived from the modified group delay function and from the short-time spectral magnitude like the MFCC. The conventional group delay function fails to capture the resonant structure and the dynamic range of the speech spectrum primarily due to pitch periodicity effects. The group delay function is modified to suppress these spikes and to restore the dynamic range of the speech spectrum. Cepstral features are derived from the modified group delay function, which are called the modified group delay feature (MODGDF. The complementarity and robustness of the MODGDF when compared to the MFCC are also analyzed using spectral reconstruction techniques. Combination of several spectral magnitude-based features and the MODGDF using feature fusion and likelihood combination is described. These features are then used for three speech processing tasks, namely, syllable, speaker, and language recognition. Results indicate that combining MODGDF with MFCC at the feature level gives significant improvements for speech recognition tasks in noise. Combining the MODGDF and the spectral magnitude-based features gives a significant increase in recognition performance of 11% at best, while combining any two features derived from the spectral magnitude does not give any significant improvement.
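
    A hedged sketch of the modified group delay computation as it is commonly formulated (group delay from the spectra of x[n] and n·x[n], a cepstrally smoothed spectrum S in the denominator, and compression exponents alpha and gamma); the parameter values and the random frame here are illustrative assumptions:

        import numpy as np
        from scipy.fftpack import dct

        def modgdf(frame, n_fft=512, alpha=0.4, gamma=0.9, lifter=8, n_ceps=13):
            """Modified group delay feature for one windowed frame (a sketch).
            tau(w) = (X_R*Y_R + X_I*Y_I) / S(w)**(2*gamma), where Y is the spectrum
            of n*x[n] and S is a cepstrally smoothed magnitude spectrum; the result
            is compressed as sign(tau)*|tau|**alpha and decorrelated with a DCT."""
            n = np.arange(len(frame))
            X = np.fft.fft(frame, n_fft)
            Y = np.fft.fft(n * frame, n_fft)
            # Cepstrally smoothed magnitude spectrum (keep low quefrencies only)
            ceps = np.fft.ifft(np.log(np.abs(X) + 1e-12)).real
            ceps[lifter:n_fft - lifter] = 0.0
            S = np.exp(np.fft.fft(ceps).real)
            tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma) + 1e-12)
            tau_m = np.sign(tau) * np.abs(tau) ** alpha
            return dct(tau_m[: n_fft // 2 + 1], type=2, norm='ortho')[:n_ceps]

        frame = np.hamming(400) * np.random.default_rng(3).standard_normal(400)
        features = modgdf(frame)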

  20. On speech recognition during anaesthesia

    DEFF Research Database (Denmark)

    Alapetite, Alexandre

    2007-01-01

    This PhD thesis in human-computer interfaces (informatics) studies the case of the anaesthesia record used during medical operations and the possibility to supplement it with speech recognition facilities. Problems and limitations have been identified with the traditional paper-based anaesthesia...... and inaccuracies in the anaesthesia record. Supplementing the electronic anaesthesia record interface with speech input facilities is proposed as one possible solution to a part of the problem. The testing of the various hypotheses has involved the development of a prototype of an electronic anaesthesia record...... interface with speech input facilities in Danish. The evaluation of the new interface was carried out in a full-scale anaesthesia simulator. This has been complemented by laboratory experiments on several aspects of speech recognition for this type of use, e.g. the effects of noise on speech recognition...

  1. Left Superior Temporal Gyrus Is Coupled to Attended Speech in a Cocktail-Party Auditory Scene.

    Science.gov (United States)

    Vander Ghinst, Marc; Bourguignon, Mathieu; Op de Beeck, Marc; Wens, Vincent; Marty, Brice; Hassid, Sergio; Choufani, Georges; Jousmäki, Veikko; Hari, Riitta; Van Bogaert, Patrick; Goldman, Serge; De Tiège, Xavier

    2016-02-03

    Using a continuous listening task, we evaluated the coupling between the listener's cortical activity and the temporal envelopes of different sounds in a multitalker auditory scene using magnetoencephalography and corticovocal coherence analysis. Neuromagnetic signals were recorded from 20 right-handed healthy adult humans who listened to five different recorded stories (attended speech streams), one without any multitalker background (No noise) and four mixed with a "cocktail party" multitalker background noise at four signal-to-noise ratios (5, 0, -5, and -10 dB) to produce speech-in-noise mixtures, here referred to as Global scene. Coherence analysis revealed that the modulations of the attended speech stream, presented without multitalker background, were coupled at ∼0.5 Hz to the activity of both superior temporal gyri, whereas the modulations at 4-8 Hz were coupled to the activity of the right supratemporal auditory cortex. In cocktail party conditions, with the multitalker background noise, the coupling was at both frequencies stronger for the attended speech stream than for the unattended Multitalker background. The coupling strengths decreased as the Multitalker background increased. During the cocktail party conditions, the ∼0.5 Hz coupling became left-hemisphere dominant, compared with bilateral coupling without the multitalker background, whereas the 4-8 Hz coupling remained right-hemisphere lateralized in both conditions. The brain activity was not coupled to the multitalker background or to its individual talkers. The results highlight the key role of listener's left superior temporal gyri in extracting the slow ∼0.5 Hz modulations, likely reflecting the attended speech stream within a multitalker auditory scene. When people listen to one person in a "cocktail party," their auditory cortex mainly follows the attended speech stream rather than the entire auditory scene. However, how the brain extracts the attended speech stream from the whole

  2. Relating hearing loss and executive functions to hearing aid users’ preference for, and speech recognition with, different combinations of binaural noise reduction and microphone directionality

    Directory of Open Access Journals (Sweden)

    Tobias Neher

    2014-12-01

    Full Text Available Knowledge of how executive functions relate to preferred hearing aid (HA) processing is sparse and seemingly inconsistent with related knowledge for speech recognition outcomes. This study thus aimed to find out if (1) performance on a measure of reading span (RS) is related to preferred binaural noise reduction (NR) strength, (2) similar relations exist for two different, nonverbal measures of executive function, (3) pure-tone average hearing loss (PTA), signal-to-noise ratio (SNR), and microphone directionality (DIR) also influence preferred NR strength, and (4) preference and speech recognition outcomes are similar. Sixty elderly HA users took part. Six HA conditions consisting of omnidirectional or cardioid microphones followed by inactive, moderate, or strong binaural NR as well as linear amplification were tested. Outcome was assessed at fixed SNRs using headphone simulations of a frontal target talker in a busy cafeteria. Analyses showed positive effects of active NR and DIR on preference, and negative and positive effects of, respectively, strong NR and DIR on speech recognition. Also, while moderate NR was the most preferred NR setting overall, preference for strong NR increased with SNR. No relation between RS and preference was found. However, larger PTA was related to weaker preference for inactive NR and stronger preference for strong NR for both microphone modes. Equivalent (but weaker) relations between worse performance on one nonverbal measure of executive function and the HA conditions without DIR were found. For speech recognition, there were relations between HA condition, PTA, and RS, but their pattern differed from that for preference. Altogether, these results indicate that, while moderate NR works well in general, a notable proportion of HA users prefer stronger NR. Furthermore, PTA and executive functions can account for some of the variability in preference for, and speech recognition with, different binaural NR and DIR settings.

  3. Speech Transduction Based on Linguistic Content

    DEFF Research Database (Denmark)

    Juel Henrichsen, Peter; Christiansen, Thomas Ulrich

    Digital hearing aids use a variety of advanced digital signal processing methods in order to improve speech intelligibility. These methods are based on knowledge about the acoustics outside the ear as well as psychoacoustics. This paper investigates the recent observation that speech elements...... with a high degree of information can be robustly identified based on basic acoustic properties, i.e., function words have greater spectral tilt than content words for each of the 18 Danish talkers investigated. In this paper we examine these spectral tilt differences as a function of time based on a speech...... material six times the duration of previous investigations. Our results show that the correlation of spectral tilt with information content is relatively constant across time, even if averaged across talkers. This indicates that it is possible to devise a robust method for estimating information density...

  4. Auditory Verbal Working Memory as a Predictor of Speech Perception in Modulated Maskers in Listeners With Normal Hearing.

    Science.gov (United States)

    Millman, Rebecca E; Mattys, Sven L

    2017-05-24

    Background noise can interfere with our ability to understand speech. Working memory capacity (WMC) has been shown to contribute to the perception of speech in modulated noise maskers. WMC has been assessed with a variety of auditory and visual tests, often pertaining to different components of working memory. This study assessed the relationship between speech perception in modulated maskers and components of auditory verbal working memory (AVWM) over a range of signal-to-noise ratios. Speech perception in noise and AVWM were measured in 30 listeners (age range 31-67 years) with normal hearing. AVWM was estimated using forward digit recall, backward digit recall, and nonword repetition. After controlling for the effects of age and average pure-tone hearing threshold, speech perception in modulated maskers was related to individual differences in the phonological component of working memory (as assessed by nonword repetition) but only in the least favorable signal-to-noise ratio. The executive component of working memory (as assessed by backward digit) was not predictive of speech perception in any conditions. AVWM is predictive of the ability to benefit from temporal dips in modulated maskers: Listeners with greater phonological WMC are better able to correctly identify sentences in modulated noise backgrounds.

  5. [Evaluation of a transient noise reduction strategy on the loudness perception and sound quality].

    Science.gov (United States)

    Liu, Haihong; Zhang, Hua; Chen, Xueqing; Wu, Yanjun; Kong, Ying; Wang, Shuo; Li, Jing

    2010-10-01

    A current technology for detecting and controlling transient noise in hearing aids (AntiShock) was evaluated. The objectives were to evaluate the effect of AntiShock on loudness control, to determine whether it results in negative changes in the sound quality of speech, transient noise, and environmental noise, and to provide implications for hearing aid fitting. Twenty-four subjects with sensorineural hearing loss participated in the study. In a single-blinded paradigm, the subjects were asked to rate the loudness of transient noise and the distortion of speech, transient noise, and environmental noise with AntiShock in both on and off conditions. (1) The percentage of the transient noise rated as soft, comfortable, loud, and too loud was 3.0%, 72.7%, 22.9%, and 1.4%, respectively. There were significant differences in mean scores of loudness perception among listening conditions and between genders by a two-way ANOVA; the P values were 0.009 and 0.001, respectively. (2) The percentage of the speech rated as mildly distorted, understandable, clear, and very clear was 2.5%, 30.6%, 32.9%, and 34.0%, respectively. There were significant differences in mean scores of speech distortion under different listening conditions by a one-way ANOVA (P < 0.05). (4) The percentage of the environmental noise rated as mildly distorted, clear but soft, and clear and natural was 0.4%, 0.8%, and 98.8%, respectively. No significant differences in mean scores of the naturalness of environmental noise were found between listening conditions by an independent-samples t test (P > 0.05). AntiShock showed positive effects on the loudness control of transient noise. The quality of speech, transient noise, and environmental noise was not impacted by AntiShock.

  6. The Relationship between Binaural Benefit and Difference in Unilateral Speech Recognition Performance for Bilateral Cochlear Implant Users

    Science.gov (United States)

    Yoon, Yang-soo; Li, Yongxin; Kang, Hou-Yong; Fu, Qian-Jie

    2011-01-01

    Objective The full benefit of bilateral cochlear implants may depend on the unilateral performance with each device, the speech materials, processing ability of the user, and/or the listening environment. In this study, bilateral and unilateral speech performances were evaluated in terms of recognition of phonemes and sentences presented in quiet or in noise. Design Speech recognition was measured for unilateral left, unilateral right, and bilateral listening conditions; speech and noise were presented at 0° azimuth. The “binaural benefit” was defined as the difference between bilateral performance and unilateral performance with the better ear. Study Sample 9 adults with bilateral cochlear implants participated. Results On average, results showed a greater binaural benefit in noise than in quiet for all speech tests. More importantly, the binaural benefit was greater when unilateral performance was similar across ears. As the difference in unilateral performance between ears increased, the binaural advantage decreased; this functional relationship was observed across the different speech materials and noise levels even though there was substantial intra- and inter-subject variability. Conclusions The results indicate that subjects who show symmetry in speech recognition performance between implanted ears in general show a large binaural benefit. PMID:21696329
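
    The binaural benefit defined in this record is simple arithmetic; a tiny sketch with illustrative percent-correct scores, not data from the study:

        def binaural_benefit(bilateral_score, left_score, right_score):
            """Bilateral performance minus performance with the better ear alone."""
            return bilateral_score - max(left_score, right_score)

        # Illustrative percent-correct scores (not data from the study)
        print(binaural_benefit(bilateral_score=78.0, left_score=70.0, right_score=65.0))  # 8.0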

  7. Microphone directionality, pre-emphasis filter, and wind noise in cochlear implants.

    Science.gov (United States)

    Chung, King; McKibben, Nicholas

    2011-10-01

    Wind noise can be a nuisance or a debilitating masker for cochlear implant users in outdoor environments. Previous studies indicated that wind noise at the microphone/hearing aid output had high levels of low-frequency energy and the amount of noise generated is related to the microphone directionality. Currently, cochlear implants only offer either directional microphones or omnidirectional microphones for users at-large. As all cochlear implants utilize pre-emphasis filters to reduce low-frequency energy before the signal is encoded, effective wind noise reduction algorithms for hearing aids might not be applicable for cochlear implants. The purposes of this study were to investigate the effect of microphone directionality on speech recognition and perceived sound quality of cochlear implant users in wind noise and to derive effective wind noise reduction strategies for cochlear implants. A repeated-measure design was used to examine the effects of spectral and temporal masking created by wind noise recorded through directional and omnidirectional microphones and the effects of pre-emphasis filters on cochlear implant performance. A digital hearing aid was programmed to have linear amplification and relatively flat in-situ frequency responses for the directional and omnidirectional modes. The hearing aid output was then recorded from 0 to 360° at flow velocities of 4.5 and 13.5 m/sec in a quiet wind tunnel. Sixteen postlingually deafened adult cochlear implant listeners who reported to be able to communicate on the phone with friends and family without text messages participated in the study. Cochlear implant users listened to speech in wind noise recorded at locations that the directional and omnidirectional microphones yielded the lowest noise levels. Cochlear implant listeners repeated the sentences and rated the sound quality of the testing materials. Spectral and temporal characteristics of flow noise, as well as speech and/or noise characteristics before

  8. Fine-structure processing, frequency selectivity and speech perception in hearing-impaired listeners

    DEFF Research Database (Denmark)

    Strelcyk, Olaf; Dau, Torsten

    2008-01-01

    Hearing-impaired people often experience great difficulty with speech communication when background noise is present, even if reduced audibility has been compensated for. Other impairment factors must be involved. In order to minimize confounding effects, the subjects participating in this study...... consisted of groups with homogeneous, symmetric audiograms. The perceptual listening experiments assessed the intelligibility of full-spectrum as well as low-pass filtered speech in the presence of stationary and fluctuating interferers, the individual's frequency selectivity and the integrity of temporal...... modulation were obtained. In addition, these binaural and monaural thresholds were measured in a stationary background noise in order to assess the persistence of the fine-structure processing to interfering noise. Apart from elevated speech reception thresholds, the hearing impaired listeners showed poorer...

  9. Synthesis of multi-wavelength temporal phase-shifting algorithms optimized for high signal-to-noise ratio and high detuning robustness using the frequency transfer function.

    Science.gov (United States)

    Servin, Manuel; Padilla, Moises; Garnica, Guillermo

    2016-05-02

    Synthesis of single-wavelength temporal phase-shifting algorithms (PSAs) for interferometry is well known and firmly based on the frequency transfer function (FTF) paradigm. Here we extend the single-wavelength FTF theory to dual- and multi-wavelength PSA synthesis when several simultaneous laser colors are present. The FTF-based synthesis for dual-wavelength (DW) PSAs is optimized for high signal-to-noise ratio and a minimum number of temporal phase-shifted interferograms. The DW-PSA synthesis herein presented may be used for interferometric contouring of discontinuous industrial objects. DW-PSAs may also be useful for DW shop testing of deep free-form aspheres. As shown here, using the FTF-based synthesis one may easily find explicit DW-PSA formulae optimized for high signal-to-noise ratio and high detuning robustness. To date, no general synthesis and analysis for temporal DW-PSAs has been given; only ad hoc DW-PSA formulae have been reported. Consequently, no explicit formulae for their spectra, their signal-to-noise ratio, or their detuning and harmonic robustness have been given. Here, for the first time, a fully general procedure for designing DW-PSAs (or triple-wavelength PSAs) with desired spectrum, signal-to-noise ratio, and detuning robustness is given. We finally generalize DW-PSAs to a higher number of wavelengths in temporal PSAs.
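
    In the FTF framework invoked here, an N-step PSA with coefficients c_n has transfer function H(omega) = sum_n c_n exp(-i*omega*n), and detuning robustness can be read from |H| around the nominal phase step. A minimal sketch for the classical single-wavelength four-step PSA, used purely for illustration (the paper's DW-PSA designs are not reproduced here):

        import numpy as np

        def psa_ftf(coeffs, omega):
            """Frequency transfer function H(w) = sum_n c_n * exp(-1j*w*n) of a PSA."""
            n = np.arange(len(coeffs))
            return np.array([np.sum(coeffs * np.exp(-1j * w * n)) for w in omega])

        # Classical 4-step PSA with nominal phase step pi/2: c_n = exp(1j*n*pi/2)
        coeffs = np.exp(1j * np.arange(4) * np.pi / 2)
        omega = np.linspace(-np.pi, np.pi, 1001)
        H = psa_ftf(coeffs, omega)          # |H(w)| shows the detuning behavior

        # Quadrature conditions: H(0) = 0 and H(-w0) = 0, while H(+w0) != 0
        w0 = np.pi / 2
        for w in (0.0, -w0, +w0):
            print(f"|H({w:+.2f})| = {abs(psa_ftf(coeffs, [w])[0]):.3f}")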

  10. A positron emission tomography study of the neural basis of informational and energetic masking effects in speech perception

    Science.gov (United States)

    Scott, Sophie K.; Rosen, Stuart; Wickham, Lindsay; Wise, Richard J. S.

    2004-02-01

    Positron emission tomography (PET) was used to investigate the neural basis of the comprehension of speech in unmodulated noise ("energetic" masking, dominated by effects at the auditory periphery), and when presented with another speaker ("informational" masking, dominated by more central effects). Each type of signal was presented at four different signal-to-noise ratios (SNRs) (+3, 0, -3, -6 dB for the speech-in-speech, +6, +3, 0, -3 dB for the speech-in-noise), with listeners instructed to listen for meaning to the target speaker. Consistent with behavioral studies, there was SNR-dependent activation associated with the comprehension of speech in noise, with no SNR-dependent activity for the comprehension of speech-in-speech (at low or negative SNRs). There was, in addition, activation in bilateral superior temporal gyri which was associated with the informational masking condition. The extent to which this activation of classical "speech" areas of the temporal lobes might delineate the neural basis of the informational masking is considered, as is the relationship of these findings to the interfering effects of unattended speech and sound on more explicit working memory tasks. This study is a novel demonstration of candidate neural systems involved in the perception of speech in noisy environments, and of the processing of multiple speakers in the dorso-lateral temporal lobes.

  11. Speech activity detection for the automated speaker recognition system of critical use

    Directory of Open Access Journals (Sweden)

    M. M. Bykov

    2017-06-01

    Full Text Available In this article, the authors develop a speech activity detection method for an automated critical-use speaker recognition system, based on wavelet parameterization of the speech signal and classification of "speech"/"pause" intervals with a curvilinear neural network. The proposed wavelet parameterization method allows the optimal parameters of the wavelet transform to be chosen in accordance with a user-specified error in the representation of the speech signal. The method also allows the loss of information to be estimated as a function of the selected parameters of the continuous wavelet transform, which made it possible to reduce the number of scalable wavelet coefficients of the speech signal by an order of magnitude while keeping the distortion of its local spectrum within an allowable range. An algorithm for detecting speech activity with a curvilinear neural network classifier is also proposed; it yields high-quality segmentation of speech signals into "speech"/"pause" intervals and is robust to narrowband and technogenic noise in the speech signal, owing to the inherent properties of the curvilinear neural network.

  12. Improving the speech intelligibility in classrooms

    Science.gov (United States)

    Lam, Choi Ling Coriolanus

    of the reverberation time, the indoor ambient noise (or background noise level), the signal-to-noise ratio, and the speech transmission index, it aims to establish a guideline for improving the speech intelligibility in classrooms for any country and any environmental condition. The study showed that the acoustical conditions of most of the measured classrooms in Hong Kong are unsatisfactory. The selection of materials inside a classroom is important for improving speech intelligibility at the design stage, especially the acoustic ceiling, to shorten the reverberation time inside the classroom. The signal-to-noise ratio should be higher than 11 dB(A) for over 70% speech perception, for either tonal or non-tonal languages, without the use of an address system. The unexpected results prompt a call to revise the standard design and to devise acceptable standards for classrooms in Hong Kong. A method is also demonstrated for assessing classrooms in other cities with similar environmental conditions.

  13. Evaluation of Adaptive Noise Management Technologies for School-Age Children with Hearing Loss.

    Science.gov (United States)

    Wolfe, Jace; Duke, Mila; Schafer, Erin; Jones, Christine; Rakita, Lori

    2017-05-01

    Children with hearing loss experience significant difficulty understanding speech in noisy and reverberant situations. Adaptive noise management technologies, such as fully adaptive directional microphones and digital noise reduction, have the potential to improve communication in noise for children with hearing aids. However, there are no published studies evaluating the potential benefits children receive from the use of adaptive noise management technologies in simulated real-world environments as well as in daily situations. The objective of this study was to compare speech recognition, speech intelligibility ratings (SIRs), and sound preferences of children using hearing aids equipped with and without adaptive noise management technologies. A single-group, repeated measures design was used to evaluate performance differences obtained in four simulated environments. In each simulated environment, participants were tested in a basic listening program with minimal noise management features, a manual program designed for that scene, and the hearing instruments' adaptive operating system that steered hearing instrument parameterization based on the characteristics of the environment. Twelve children with mild to moderately severe sensorineural hearing loss. Speech recognition and SIRs were evaluated in three hearing aid programs with and without noise management technologies across two different test sessions and various listening environments. Also, the participants' perceptual hearing performance in daily real-world listening situations with two of the hearing aid programs was evaluated during a four- to six-week field trial that took place between the two laboratory sessions. On average, the use of adaptive noise management technology improved sentence recognition in noise for speech presented in front of the participant but resulted in a decrement in performance for signals arriving from behind when the participant was facing forward. However, the improvement

  14. White noise theory of robust nonlinear filtering with correlated state and observation noises

    NARCIS (Netherlands)

    Bagchi, Arunabha; Karandikar, Rajeeva

    1992-01-01

    In the direct white noise theory of nonlinear filtering, the state process is still modeled as a Markov process satisfying an Ito stochastic differential equation, while a finitely additive white noise is used to model the observation noise. In the present work, this asymmetry is removed by modeling

  15. White noise theory of robust nonlinear filtering with correlated state and observation noises

    NARCIS (Netherlands)

    Bagchi, Arunabha; Karandikar, Rajeeva

    1994-01-01

    In the existing `direct¿ white noise theory of nonlinear filtering, the state process is still modelled as a Markov process satisfying an Itô stochastic differential equation, while a `finitely additive¿ white noise is used to model the observation noise. We remove this asymmetry by modelling the

  16. Accuracy of cochlear implant recipients in speech reception in the presence of background music.

    Science.gov (United States)

    Gfeller, Kate; Turner, Christopher; Oleson, Jacob; Kliethermes, Stephanie; Driscoll, Virginia

    2012-12-01

    This study examined speech recognition abilities of cochlear implant (CI) recipients in the spectrally complex listening condition of 3 contrasting types of background music, and compared performance based upon listener groups: CI recipients using conventional long-electrode devices, Hybrid CI recipients (acoustic plus electric stimulation), and normal-hearing adults. We tested 154 long-electrode CI recipients using varied devices and strategies, 21 Hybrid CI recipients, and 49 normal-hearing adults on closed-set recognition of spondees presented in 3 contrasting forms of background music (piano solo, large symphony orchestra, vocal solo with small combo accompaniment) in an adaptive test. Signal-to-noise ratio thresholds for speech in music were examined in relation to measures of speech recognition in background noise and multitalker babble, pitch perception, and music experience. The signal-to-noise ratio thresholds for speech in music varied as a function of category of background music, group membership (long-electrode, Hybrid, normal-hearing), and age. The thresholds for speech in background music were significantly correlated with measures of pitch perception and thresholds for speech in background noise; auditory status was an important predictor. Evidence suggests that speech reception thresholds in background music change as a function of listener age (with more advanced age being detrimental), structural characteristics of different types of music, and hearing status (residual hearing). These findings have implications for everyday listening conditions such as communicating in social or commercial situations in which there is background music.

  17. On Optimal Linear Filtering of Speech for Near-End Listening Enhancement

    DEFF Research Database (Denmark)

    Taal, Cees H.; Jensen, Jesper; Leijon, Arne

    2013-01-01

    In this letter the focus is on linear filtering of speech before degradation due to additive background noise. The goal is to design the filter such that the speech intelligibility index (SII) is maximized when the speech is played back in a known noisy environment. Moreover, a power constraint i...

  18. Auditory Verbal Working Memory as a Predictor of Speech Perception in Modulated Maskers in Listeners With Normal Hearing

    OpenAIRE

    Millman, Rebecca E.; Mattys, Sven L.

    2017-01-01

    Purpose: Background noise can interfere with our ability to understand speech. Working memory capacity (WMC) has been shown to contribute to the perception of speech in modulated noise maskers. WMC has been assessed with a variety of auditory and visual tests, often pertaining to different components of working memory. This study assessed the relationship between speech perception in modulated maskers and components of auditory verbal working memory (AVWM) over a range of signal-to-noise rati...

  19. Gender and vocal production mode discrimination using the high frequencies for speech and singing

    Science.gov (United States)

    Monson, Brian B.; Lotto, Andrew J.; Story, Brad H.

    2014-01-01

    Humans routinely produce acoustical energy at frequencies above 6 kHz during vocalization, but this frequency range is often not represented in communication devices and speech perception research. Recent advancements toward high-definition (HD) voice and extended bandwidth hearing aids have increased the interest in the high frequencies. The potential perceptual information provided by high-frequency energy (HFE) is not well characterized. We found that humans can accomplish tasks of gender discrimination and vocal production mode discrimination (speech vs. singing) when presented with acoustic stimuli containing only HFE at both amplified and normal levels. Performance in these tasks was robust in the presence of low-frequency masking noise. No substantial learning effect was observed. Listeners also were able to identify the sung and spoken text (excerpts from “The Star-Spangled Banner”) with very few exposures. These results add to the increasing evidence that the high frequencies provide at least redundant information about the vocal signal, suggesting that its representation in communication devices (e.g., cell phones, hearing aids, and cochlear implants) and speech/voice synthesizers could improve these devices and benefit normal-hearing and hearing-impaired listeners. PMID:25400613

  20. Auditory-Perceptual and Acoustic Methods in Measuring Dysphonia Severity of Korean Speech.

    Science.gov (United States)

    Maryn, Youri; Kim, Hyung-Tae; Kim, Jaeock

    2016-09-01

    The purpose of this study was to explore the criterion-related concurrent validity of two standardized auditory-perceptual rating protocols and the Acoustic Voice Quality Index (AVQI) for measuring dysphonia severity in Korean speech. Sixty native Korean subjects with various voice disorders were asked to sustain the vowel [a:] and to read aloud the Korean text "Walk." A 3-second midvowel portion of the sustained vowel and two sentences (with 25 syllables) were edited, concatenated, and analyzed according to methods described elsewhere. From 56 participants, both continuous speech and sustained vowel recordings had sufficiently high signal-to-noise ratios (35.5 dB and 37 dB on average, respectively) and were therefore subjected to further dysphonia severity analysis with (1) "G" or Grade from the GRBAS protocol, (2) "OS" or Overall Severity from the Consensus Auditory-Perceptual Evaluation of Voice protocol, and (3) AVQI. First, high correlations were found between G and OS (rS = 0.955 for sustained vowels; rS = 0.965 for continuous speech). Second, the AVQI showed a strong correlation with G (rS = 0.911) as well as OS (rP = 0.924). These findings are in agreement with similar studies dealing with continuous speech in other languages. The present study highlights the criterion-related concurrent validity of these methods in Korean speech. Furthermore, it supports the cross-linguistic robustness of the AVQI as a valid and objective marker of overall dysphonia severity. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  1. The pupil response is sensitive to divided attention during speech processing.

    Science.gov (United States)

    Koelewijn, Thomas; Shinn-Cunningham, Barbara G; Zekveld, Adriana A; Kramer, Sophia E

    2014-06-01

    Dividing attention over two streams of speech strongly decreases performance compared to focusing on only one. How divided attention affects cognitive processing load as indexed with pupillometry during speech recognition has so far not been investigated. In 12 young adults the pupil response was recorded while they focused on either one or both of two sentences that were presented dichotically and masked by fluctuating noise across a range of signal-to-noise ratios. In line with previous studies, the performance decreases when processing two target sentences instead of one. Additionally, dividing attention to process two sentences caused larger pupil dilation and later peak pupil latency than processing only one. This suggests an effect of attention on cognitive processing load (pupil dilation) during speech processing in noise. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.

  2. The Effects of Audiovisual Stimulation on the Acceptance of Background Noise.

    Science.gov (United States)

    Plyler, Patrick N; Lang, Rowan; Monroe, Amy L; Gaudiano, Paul

    2015-05-01

    Previous examinations of noise acceptance have been conducted using an auditory stimulus only; however, evidence on the effect of visual speech supplementation of the auditory stimulus on acceptance of noise remains limited. The purpose of the present study was to determine the effect of audiovisual stimulation on the acceptance of noise in listeners with normal and impaired hearing. A repeated measures design was utilized. A total of 92 adult participants were recruited for this experiment. Of these participants, 54 were listeners with normal hearing and 38 were listeners with sensorineural hearing impairment. Most comfortable levels and acceptable noise levels (ANLs) were obtained using auditory and auditory-visual stimulation modes for the unaided listening condition for each participant, and for the aided listening condition for 35 of the participants with impaired hearing who owned hearing aids. Speech reading ability was assessed using the Utley test for each participant. The addition of visual input did not impact the most comfortable level values for listeners in either group; however, visual input improved unaided ANL values for listeners with normal hearing and aided ANL values in listeners with impaired hearing. The ANL benefit received from visual speech input was related to the auditory ANL in listeners in each group; however, it was not related to speech reading ability for either listener group in any experimental condition. Visual speech input can significantly impact measures of noise acceptance. The current ANL measure may not accurately reflect acceptance of noise in more realistic environments, where the signal of interest is both audible and visible to the listener. American Academy of Audiology.

  3. SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support

    Directory of Open Access Journals (Sweden)

    Giampiero Salvi

    2009-01-01

    Full Text Available This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling).

  4. Noise tolerant spatiotemporal chaos computing.

    Science.gov (United States)

    Kia, Behnam; Kia, Sarvenaz; Lindner, John F; Sinha, Sudeshna; Ditto, William L

    2014-12-01

    We introduce and design a noise tolerant chaos computing system based on a coupled map lattice (CML) and the noise reduction capabilities inherent in coupled dynamical systems. The resulting spatiotemporal chaos computing system is more robust to noise than a single map chaos computing system. In this CML based approach to computing, under the coupled dynamics, the local noise from different nodes of the lattice diffuses across the lattice, and it attenuates each other's effects, resulting in a system with less noise content and a more robust chaos computing architecture.
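
    A minimal sketch of a diffusively coupled logistic-map lattice with additive per-node noise, the kind of CML the record describes; the map, coupling strength, and noise level are illustrative assumptions, and the point is that coupling lets per-node noise average out across the lattice:

        import numpy as np

        def cml_step(x, eps, noise_std, rng):
            """One update of a diffusively coupled logistic-map lattice with
            additive local noise:
            x_i <- (1-eps)*f(x_i) + (eps/2)*(f(x_{i-1}) + f(x_{i+1}))."""
            f = 4.0 * x * (1.0 - x)                       # fully chaotic logistic map
            coupled = (1.0 - eps) * f + 0.5 * eps * (np.roll(f, 1) + np.roll(f, -1))
            noisy = coupled + rng.normal(0.0, noise_std, size=x.shape)
            return np.clip(noisy, 0.0, 1.0)               # keep states in [0, 1]

        rng = np.random.default_rng(4)
        x = rng.random(64)                                # 64-node lattice, periodic ends
        for _ in range(100):
            x = cml_step(x, eps=0.3, noise_std=0.01, rng=rng)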

  5. Effects of white noise on Callsign Acquisition Test and Modified Rhyme Test scores.

    Science.gov (United States)

    Blue-Terry, Misty; Letowski, Tomasz

    2011-02-01

    The Callsign Acquisition Test (CAT) is a speech intelligibility test developed by the US Army Research Laboratory. The test has been used to evaluate speech transmission through various communication systems but has not been yet sufficiently standardised and validated. The aim of this study was to compare CAT and Modified Rhyme Test (MRT) performance in the presence of white noise across a range of signal-to-noise ratios (SNRs). A group of 16 normal-hearing listeners participated in the study. The speech items were presented at 65 dB(A) in the background of white noise at SNRs of -18, -15, -12, -9 and -6 dB. The results showed a strong positive association (75.14%) between the two tests, but significant differences between the CAT and MRT absolute scores in the range of investigated SNRs. Based on the data, a function to predict CAT scores based on existing MRT scores and vice versa was formulated. STATEMENT OF RELEVANCE: This work compares performance data of a common speech intelligibility test (MRT) with a new test (CAT) in the presence of white noise. The results here can be used as a part of the standardisation procedures and provide insights to the predictive capabilities of the CAT to quantify speech intelligibility communication in high-noise military environments.
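
    The record mentions a fitted function for predicting CAT scores from MRT scores (and vice versa) without giving it, so the sketch below shows only a generic least-squares fit of such a mapping, with synthetic score pairs standing in for the published data:

        import numpy as np

        # Synthetic paired intelligibility scores (percent correct) across SNRs;
        # stand-ins only, not the published CAT/MRT data
        mrt = np.array([35.0, 48.0, 61.0, 72.0, 84.0])
        cat = np.array([42.0, 55.0, 66.0, 78.0, 88.0])

        # Fit CAT = a * MRT + b by ordinary least squares
        a, b = np.polyfit(mrt, cat, deg=1)

        def predict_cat(mrt_score):
            """Predict a CAT score from an MRT score using the fitted line."""
            return a * mrt_score + b

        print(predict_cat(65.0))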

  6. Virtual sensors for active noise control in acoustic-structural coupled enclosures using structural sensing: robust virtual sensor design.

    Science.gov (United States)

    Halim, Dunant; Cheng, Li; Su, Zhongqing

    2011-03-01

    The work was aimed to develop a robust virtual sensing design methodology for sensing and active control applications of vibro-acoustic systems. The proposed virtual sensor was designed to estimate a broadband acoustic interior sound pressure using structural sensors, with robustness against certain dynamic uncertainties occurring in an acoustic-structural coupled enclosure. A convex combination of Kalman sub-filters was used during the design, accommodating different sets of perturbed dynamic model of the vibro-acoustic enclosure. A minimax optimization problem was set up to determine an optimal convex combination of Kalman sub-filters, ensuring an optimal worst-case virtual sensing performance. The virtual sensing and active noise control performance was numerically investigated on a rectangular panel-cavity system. It was demonstrated that the proposed virtual sensor could accurately estimate the interior sound pressure, particularly the one dominated by cavity-controlled modes, by using a structural sensor. With such a virtual sensing technique, effective active noise control performance was also obtained even for the worst-case dynamics. © 2011 Acoustical Society of America
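
    A heavily simplified sketch of the minimax idea: given estimation errors produced by each Kalman sub-filter under each perturbed plant model, pick convex combination weights that minimize the worst-case mean squared error. The random "error" samples and the coarse simplex grid search are illustrative assumptions, not the paper's optimization:

        import itertools
        import numpy as np

        rng = np.random.default_rng(5)
        n_filters, n_models, n_samples = 3, 4, 500
        # errors[k, m, :] = estimation error of sub-filter k under perturbed model m
        errors = rng.standard_normal((n_filters, n_models, n_samples)) * \
            rng.uniform(0.5, 2.0, size=(n_filters, n_models, 1))

        def worst_case_mse(weights):
            """MSE of the weighted estimate, maximized over perturbed models."""
            combined = np.tensordot(weights, errors, axes=1)   # (n_models, n_samples)
            return np.mean(combined ** 2, axis=1).max()

        # Coarse grid search over the probability simplex of combination weights
        best_w, best_val = None, np.inf
        steps = np.linspace(0.0, 1.0, 21)
        for w1, w2 in itertools.product(steps, steps):
            if w1 + w2 <= 1.0:
                w = np.array([w1, w2, 1.0 - w1 - w2])
                val = worst_case_mse(w)
                if val < best_val:
                    best_w, best_val = w, val
        print(best_w, best_val)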

  7. Speech enhancement on smartphone voice recording

    International Nuclear Information System (INIS)

    Atmaja, Bagus Tris; Farid, Mifta Nur; Arifianto, Dhany

    2016-01-01

    Speech enhancement is a challenging task in audio signal processing: enhancing the quality of a targeted speech signal while suppressing other noises. Speech enhancement algorithms developed rapidly, from spectral subtraction and Wiener filtering through the spectral amplitude MMSE estimator to non-negative matrix factorization (NMF). The smartphone, as a revolutionary device, is now used in all aspects of life, including journalism, both personally and professionally. Although many smartphones have two microphones (main and rear), only the main microphone is widely used for voice recording, which makes single-channel methods such as NMF well suited for this kind of speech enhancement. This paper evaluates speech enhancement on smartphone voice recordings using the algorithms mentioned previously. We also extend the NMF algorithm to Kullback-Leibler NMF with supervised separation. The last algorithm shows improved results compared to the others in spectrogram and PESQ score evaluations.
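
    A compact sketch of supervised Kullback-Leibler NMF separation of the kind the record describes: bases are pretrained for speech and noise, activations are fitted on the noisy spectrogram, and a soft mask reconstructs the speech part. Shapes, ranks, and the random stand-in training data are illustrative assumptions:

        import numpy as np

        def kl_nmf(V, rank, n_iter=200, W_fixed=None, seed=0):
            """NMF with Kullback-Leibler divergence via multiplicative updates.
            If W_fixed is given (pretrained bases), only H is updated (supervised)."""
            rng = np.random.default_rng(seed)
            n_freq, n_frames = V.shape
            W = (rng.random((n_freq, rank)) + 1e-3) if W_fixed is None else W_fixed
            H = rng.random((rank, n_frames)) + 1e-3
            for _ in range(n_iter):
                WH = W @ H + 1e-12
                H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + 1e-12)
                if W_fixed is None:
                    WH = W @ H + 1e-12
                    W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + 1e-12)
            return W, H

        # Supervised separation: concatenate pretrained speech and noise bases,
        # fit activations on the noisy spectrogram, then mask the mixture.
        rng = np.random.default_rng(6)
        V_noisy = rng.random((257, 100)) + 1e-3            # stand-in magnitude spectrogram
        W_speech, _ = kl_nmf(rng.random((257, 80)) + 1e-3, rank=20, seed=1)
        W_noise, _ = kl_nmf(rng.random((257, 80)) + 1e-3, rank=20, seed=2)
        W_all = np.hstack([W_speech, W_noise])
        _, H = kl_nmf(V_noisy, rank=40, W_fixed=W_all)
        speech_part = W_all[:, :20] @ H[:20]
        mask = speech_part / (W_all @ H + 1e-12)           # soft mask for speech
        V_speech_est = mask * V_noisy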

  8. The impact of exploiting spectro-temporal context in computational speech segregation

    DEFF Research Database (Denmark)

    Bentsen, Thomas; Kressner, Abigail Anne; Dau, Torsten

    2018-01-01

    Computational speech segregation aims to automatically segregate speech from interfering noise, often by employing ideal binary mask estimation. Several studies have tried to exploit contextual information in speech to improve mask estimation accuracy by using two frequently-used strategies that (1...... for measured intelligibility. The findings may have implications for the design of speech segregation systems, and for the selection of a cost function that correlates with intelligibility....
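
    The ideal binary mask underlying such systems labels each time-frequency unit by comparing its local SNR against a local criterion (LC); a minimal sketch, with the 0 dB criterion and the synthetic spectrograms as illustrative assumptions:

        import numpy as np

        def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
            """IBM(t, f) = 1 where local SNR exceeds the local criterion, else 0."""
            eps = np.finfo(float).eps
            local_snr_db = 10 * np.log10((speech_power + eps) / (noise_power + eps))
            return (local_snr_db > lc_db).astype(float)

        rng = np.random.default_rng(7)
        speech_power = rng.random((64, 100))      # stand-in T-F power of clean speech
        noise_power = rng.random((64, 100))       # stand-in T-F power of the interferer
        mask = ideal_binary_mask(speech_power, noise_power)
        segregated = mask * (speech_power + noise_power)   # retain reliable units only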

  9. Robust Wavelet Estimation to Eliminate Simultaneously the Effects of Boundary Problems, Outliers, and Correlated Noise

    Directory of Open Access Journals (Sweden)

    Alsaidi M. Altaher

    2012-01-01

    Full Text Available Classical wavelet thresholding methods suffer from boundary problems caused by the application of the wavelet transforms to a finite signal. As a result, large bias at the edges and artificial wiggles occur when the classical boundary assumptions are not satisfied. Although polynomial wavelet regression and local polynomial wavelet regression effectively reduce the risk of this problem, the estimates from these two methods can be easily affected by the presence of correlated noise and outliers, giving inaccurate estimates. This paper introduces two robust methods in which the effects of boundary problems, outliers, and correlated noise are simultaneously taken into account. The proposed methods combine a thresholding estimator with either a local polynomial model or a polynomial model, using the generalized least squares method instead of the ordinary one. A preliminary step that removes the outlying observations through a statistical function is considered as well. The practical performance of the proposed methods has been evaluated through simulation experiments and real data examples. The results are strong evidence that the proposed methods are extremely effective in terms of correcting the boundary bias and eliminating the effects of outliers and correlated noise.
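
    A condensed sketch of the polynomial-plus-thresholding idea, with ordinary least squares standing in for the paper's generalized least squares step and a MAD clipping rule standing in for its outlier-removal function; pywt supplies the wavelet transforms:

        import numpy as np
        import pywt

        def robust_poly_wavelet(y, x, degree=3, wavelet='db4'):
            """Fit a polynomial trend, clip outlying residuals by a MAD rule,
            then soft-threshold the wavelet coefficients of the cleaned residual."""
            trend = np.polyval(np.polyfit(x, y, degree), x)   # OLS stand-in for GLS
            resid = y - trend
            mad = np.median(np.abs(resid - np.median(resid))) / 0.6745
            resid = np.clip(resid, -3 * mad, 3 * mad)         # crude outlier removal
            coeffs = pywt.wavedec(resid, wavelet)
            thr = mad * np.sqrt(2 * np.log(len(y)))           # universal threshold
            coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode='soft')
                                    for c in coeffs[1:]]
            return trend + pywt.waverec(coeffs, wavelet)[: len(y)]

        x = np.linspace(0, 1, 256)
        rng = np.random.default_rng(8)
        y = np.sin(4 * np.pi * x) + 2 * x + 0.2 * rng.standard_normal(256)
        y[40] += 3.0                                          # inject an outlier
        smooth = robust_poly_wavelet(y, x)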

  10. Speech Perception in Noise in Normally Hearing Children: Does Binaural Frequency Modulated Fitting Provide More Benefit than Monaural Frequency Modulated Fitting?

    Science.gov (United States)

    Mukari, Siti Zamratol-Mai Sarah; Umat, Cila; Razak, Ummu Athiyah Abdul

    2011-07-01

    The aim of the present study was to compare the benefit of monaural versus binaural ear-level frequency modulated (FM) fitting on speech perception in noise in children with normal hearing. Reception threshold for sentences (RTS) was measured in no-FM, monaural FM, and binaural FM conditions in 22 normally developing children with bilateral normal hearing, aged 8 to 9 years old. Data were gathered using the Pediatric Malay Hearing in Noise Test (P-MyHINT) with speech presented from the front and multi-talker babble presented from 90°, 180°, and 270° azimuths in a sound-treated booth. The results revealed that the use of either monaural or binaural ear-level FM receivers provided significantly better mean RTSs than the no-FM condition. However, binaural FM did not produce a significantly greater benefit in mean RTS than monaural fitting. The benefit of binaural over monaural FM varies across individuals; while binaural fitting provided better RTSs in about 50% of study subjects, there were those in whom binaural fitting resulted in either deterioration or no additional improvement compared to monaural FM fitting. The present study suggests that the use of monaural ear-level FM receivers in children with normal hearing might provide similar benefit to binaural use. Individual variation in the benefit of binaural over monaural FM suggests that the decision to employ monaural or binaural fitting should be individualized. It should be noted, however, that the current study recruited typically developing normal-hearing children. Future studies involving normal-hearing children at high risk of difficulty listening in noise are indicated, to see if similar findings are obtained.

  11. Influence of binary mask estimation errors on robust speaker identification

    DEFF Research Database (Denmark)

    May, Tobias

    2017-01-01

    Missing-data strategies have been developed to improve the noise-robustness of automatic speech recognition systems in adverse acoustic conditions. This is achieved by classifying time-frequency (T-F) units into reliable and unreliable components, as indicated by a so-called binary mask. Different...... approaches have been proposed to handle unreliable feature components, each with distinct advantages. The direct masking (DM) approach attenuates unreliable T-F units in the spectral domain, which allows the extraction of conventionally used mel-frequency cepstral coefficients (MFCCs). Instead of attenuating....... Since each of these approaches utilizes the knowledge about reliable and unreliable feature components in a different way, they will respond differently to estimation errors in the binary mask. The goal of this study was to identify the most effective strategy to exploit knowledge about reliable...
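
    The direct-masking step itself is simple once a binary mask is available; a sketch using librosa, where the mask estimation (the hard part) is assumed given and the 20 dB attenuation floor is an arbitrary illustrative choice:

        import numpy as np
        import librosa

        def direct_masking_mfcc(noisy, sr, mask, attenuation_db=20.0,
                                n_fft=512, hop=256):
            # Attenuate T-F units flagged unreliable by the (given) binary
            # mask, then extract conventional MFCCs from the masked STFT.
            spec = librosa.stft(noisy, n_fft=n_fft, hop_length=hop)
            gain = np.where(mask > 0, 1.0, 10 ** (-attenuation_db / 20.0))
            mag = np.abs(spec) * gain
            mel = librosa.feature.melspectrogram(S=mag ** 2, sr=sr, n_fft=n_fft)
            return librosa.feature.mfcc(S=librosa.power_to_db(mel), n_mfcc=13)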

  12. The effect of hearing aid noise reduction on listening effort in hearing-impaired adults.

    Science.gov (United States)

    Desjardins, Jamie L; Doherty, Karen A

    2014-01-01

    The purpose of the present study was to evaluate the effect of a noise-reduction (NR) algorithm on the listening effort hearing-impaired participants expend on a speech in noise task. Twelve hearing-impaired listeners fitted with behind-the-ear hearing aids with a fast-acting modulation-based NR algorithm participated in this study. A dual-task paradigm was used to measure listening effort with and without the NR enabled in the hearing aid. The primary task was a sentence-in-noise task presented at fixed overall speech performance levels of 76% (moderate listening condition) and 50% (difficult listening condition) correct performance, and the secondary task was a visual-tracking test. Participants also completed measures of working memory (Reading Span test), and processing speed (Digit Symbol Substitution Test) ability. Participants' speech recognition in noise scores did not significantly change with the NR algorithm activated in the hearing aid in either listening condition. The NR algorithm significantly decreased listening effort, but only in the more difficult listening condition. Last, there was a tendency for participants with faster processing speeds to expend less listening effort with the NR algorithm when listening to speech in background noise in the difficult listening condition. The NR algorithm reduced the listening effort adults with hearing loss must expend to understand speech in noise.

  13. Model based Binaural Enhancement of Voiced and Unvoiced Speech

    DEFF Research Database (Denmark)

    Kavalekalam, Mathew Shaji; Christensen, Mads Græsbøll; Boldt, Jesper B.

    2017-01-01

    This paper deals with the enhancement of speech in presence of non-stationary babble noise. A binaural speech enhancement framework is proposed which takes into account both the voiced and unvoiced speech production model. The usage of this model in enhancement requires the Short term predictor...... (STP) parameters and the pitch information to be estimated. This paper uses a codebook based approach for estimating the STP parameters and a parametric binaural method is proposed for estimating the pitch parameters. Improvements in objective score are shown when using the voiced/unvoiced speech model...

  14. Speech-specific tuning of neurons in human superior temporal gyrus.

    Science.gov (United States)

    Chan, Alexander M; Dykstra, Andrew R; Jayaram, Vinay; Leonard, Matthew K; Travis, Katherine E; Gygi, Brian; Baker, Janet M; Eskandar, Emad; Hochberg, Leigh R; Halgren, Eric; Cash, Sydney S

    2014-10-01

    How the brain extracts words from auditory signals is an unanswered question. We recorded approximately 150 single and multi-units from the left anterior superior temporal gyrus of a patient during multiple auditory experiments. Against low background activity, 45% of units robustly fired to particular spoken words with little or no response to pure tones, noise-vocoded speech, or environmental sounds. Many units were tuned to complex but specific sets of phonemes, which were influenced by local context but invariant to speaker, and suppressed during self-produced speech. The firing of several units to specific visual letters was correlated with their response to the corresponding auditory phonemes, providing the first direct neural evidence for phonological recoding during reading. Maximal decoding of individual phoneme and word identities was attained using firing rates from approximately 5 neurons within 200 ms after word onset. Thus, neurons in human superior temporal gyrus use sparse spatially organized population encoding of complex acoustic-phonetic features to help recognize auditory and visual words. © The Author 2013. Published by Oxford University Press. All rights reserved.

  15. Robust quantum secure direct communication and authentication protocol against decoherence noise based on six-qubit DF state

    International Nuclear Information System (INIS)

    Chang Yan; Zhang Shi-Bin; Yan Li-Li; Han Gui-Hua

    2015-01-01

    By using six-qubit decoherence-free (DF) states as quantum carriers and decoy states, a robust quantum secure direct communication and authentication (QSDCA) protocol against decoherence noise is proposed. Four six-qubit DF states are used in the process of secret transmission, however only the |0′〉 state is prepared. The other three six-qubit DF states can be obtained by permuting the outputs of the setup for |0′〉. By using the |0′〉 state as the decoy state, the detection rate and the qubit error rate reach 81.3%, and they will not change with the noise level. The stability and security are much higher than those of the ping–pong protocol both in an ideal scenario and a decoherence noise scenario. Even if the eavesdropper measures several qubits, exploiting the coherent relationship between these qubits, she can gain one bit of secret information with probability 0.042. (paper)

  16. Accuracy of Cochlear Implant Recipients on Speech Reception in Background Music

    Science.gov (United States)

    Gfeller, Kate; Turner, Christopher; Oleson, Jacob; Kliethermes, Stephanie; Driscoll, Virginia

    2012-01-01

    Objectives This study (a) examined speech recognition abilities of cochlear implant (CI) recipients in the spectrally complex listening condition of three contrasting types of background music, and (b) compared performance based upon listener groups: CI recipients using conventional long-electrode (LE) devices, Hybrid CI recipients (acoustic plus electric stimulation), and normal-hearing (NH) adults. Methods We tested 154 LE CI recipients using varied devices and strategies, 21 Hybrid CI recipients, and 49 NH adults on closed-set recognition of spondees presented in three contrasting forms of background music (piano solo, large symphony orchestra, vocal solo with small combo accompaniment) in an adaptive test. Outcomes Signal-to-noise thresholds for speech in music (SRTM) were examined in relation to measures of speech recognition in background noise and multi-talker babble, pitch perception, and music experience. Results SRTM thresholds varied as a function of category of background music, group membership (LE, Hybrid, NH), and age. Thresholds for speech in background music were significantly correlated with measures of pitch perception and speech in background noise thresholds; auditory status was an important predictor. Conclusions Evidence suggests that speech reception thresholds in background music change as a function of listener age (with more advanced age being detrimental), structural characteristics of different types of music, and hearing status (residual hearing). These findings have implications for everyday listening conditions such as communicating in social or commercial situations in which there is background music. PMID:23342550

  17. Kalman filter for speech enhancement in cocktail party scenarios using a codebook-based approach

    DEFF Research Database (Denmark)

    Kavalekalam, Mathew Shaji; Christensen, Mads Græsbøll; Gran, Fredrik

    2016-01-01

    Enhancement of speech in non-stationary background noise is a challenging task, and conventional single channel speech enhancement algorithms have not been able to improve the speech intelligibility in such scenarios. The work proposed in this paper investigates a single channel Kalman filter based...... trained codebook over a generic speech codebook in relation to the performance of the speech enhancement system....
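
    A minimal single-channel sketch of the underlying Kalman recursion, with the clean speech modelled as an AR(p) process observed in additive white noise. In the codebook approach the AR (STP) parameters and excitation variance would be re-estimated per frame by a codebook search; here they are fixed, hypothetical inputs:

        import numpy as np

        def kalman_enhance(y, ar_coeffs, var_w, var_v):
            # y: noisy samples; ar_coeffs: [a1..ap] with
            # s_k = sum_i a_i * s_{k-i} + w_k; var_w: excitation variance;
            # var_v: observation (noise) variance.
            p = len(ar_coeffs)
            A = np.zeros((p, p)); A[0, :] = ar_coeffs; A[1:, :-1] = np.eye(p - 1)
            H = np.zeros((1, p)); H[0, 0] = 1.0
            Q = np.zeros((p, p)); Q[0, 0] = var_w
            x = np.zeros((p, 1)); P = np.eye(p)
            out = np.empty(len(y))
            for k, yk in enumerate(y):
                # predict
                x = A @ x
                P = A @ P @ A.T + Q
                # update
                S = H @ P @ H.T + var_v
                K = P @ H.T / S
                x = x + K * (yk - (H @ x))
                P = (np.eye(p) - K @ H) @ P
                out[k] = x[0, 0]   # current clean-speech estimate
            return out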

  18. Impact of noise on self-rated job satisfaction and health in open-plan offices: a structural equation modelling approach.

    Science.gov (United States)

    Lee, Pyoung Jik; Lee, Byung Kwon; Jeon, Jin Yong; Zhang, Mei; Kang, Jian

    2016-01-01

    This study uses a structural equation model to examine the effects of noise on self-rated job satisfaction and health in open-plan offices. A total of 334 employees from six open-plan offices in China and Korea completed a questionnaire survey. The questionnaire included questions assessing noise disturbances and speech privacy, as well as job satisfaction and health. The results indicated that noise disturbance affected self-rated health. Contrary to popular expectation, the relationship between noise disturbance and job satisfaction was not significant. Rather, job satisfaction and satisfaction with the environment were negatively correlated with lack of speech privacy. Speech privacy was found to be affected by noise sensitivity, and longer noise exposure led to decreased job satisfaction. There was also evidence that speech privacy was a stronger predictor of satisfaction with environment and job satisfaction for participants with high noise sensitivity. In addition, fit models for employees from China and Korea showed slight differences. This study is motivated by strong evidence that noise is the key source of complaints in open-plan offices. Survey results indicate that self-rated job satisfaction of workers in open-plan offices was negatively affected by lack of speech privacy and duration of disturbing noise.

  19. Comparisons of Stuttering Frequency during and after Speech Initiation in Unaltered Feedback, Altered Auditory Feedback and Choral Speech Conditions

    Science.gov (United States)

    Saltuklaroglu, Tim; Kalinowski, Joseph; Robbins, Mary; Crawcour, Stephen; Bowers, Andrew

    2009-01-01

    Background: Stuttering is prone to strike during speech initiation more so than at any other point in an utterance. The use of auditory feedback (AAF) has been found to produce robust decreases in the stuttering frequency by creating an electronic rendition of choral speech (i.e., speaking in unison). However, AAF requires users to self-initiate…

  20. Active3 noise reduction

    International Nuclear Information System (INIS)

    Holzfuss, J.

    1996-01-01

    Noise reduction is a problem being encountered in a variety of applications, such as environmental noise cancellation, signal recovery and separation. Passive noise reduction is done with the help of absorbers. Active noise reduction includes the transmission of phase inverted signals for the cancellation. This paper is about a threefold active approach to noise reduction. It includes the separation of a combined source, which consists of both a noise and a signal part. With the help of interaction with the source by scanning it and recording its response, modeling as a nonlinear dynamical system is achieved. The analysis includes phase space analysis and global radial basis functions as tools for the prediction used in a subsequent cancellation procedure. Examples are given which include noise reduction of speech. copyright 1996 American Institute of Physics
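
    A toy version of the prediction-based cancellation idea, using delay embedding plus SciPy's RBFInterpolator as the global radial-basis-function model; the paper's system identification by scanning the source is far more involved, and everything below is illustrative:

        import numpy as np
        from scipy.interpolate import RBFInterpolator

        def rbf_noise_predictor(noise_probe, dim=4, lag=1):
            # Fit an RBF model in a delay-embedded phase space of a recorded
            # noise-only probe signal; returns a one-step-ahead predictor.
            N = len(noise_probe)
            idx = np.arange((dim - 1) * lag, N - 1)
            X = np.stack([noise_probe[idx - j * lag] for j in range(dim)], axis=1)
            y = noise_probe[idx + 1]
            return RBFInterpolator(X, y, smoothing=1e-3)

        def cancel(combined, predictor, dim=4, lag=1):
            # Predict the noise part from the recent past of the combined
            # signal and subtract (assumes the noise dominates the embedding).
            out = np.array(combined, dtype=float)
            for k in range((dim - 1) * lag + 1, len(out)):
                past = np.array([[out[k - 1 - j * lag] for j in range(dim)]])
                out[k] = out[k] - predictor(past)[0]
            return out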

  1. Very loud speech over simulated environmental noise tends to have a spectral peak in the F1 region

    Science.gov (United States)

    Ternstrom, Sten; Bohman, Mikael; Sodersten, Maria

    2003-04-01

    In some professions, workplace noise appears to be a hazard to the voice, if not to hearing. Several studies have shown that teachers and sports instructors, for example, are more prone to voice problems than average, prompting research on loud voice. Since on-location recordings are in many ways impractical, the running speech of 23 untrained speaker subjects (12 female, 11 male) was instead recorded in several types of loud noise that was presented over high-quality loudspeakers. Using adaptive cancellation techniques, the noise was then removed from the recordings, thus exposing the strained voices for analysis. The experiment produced a large body of data, only one aspect of which is reported here. In most subjects, the vowel spectrum as a function of voice SPL showed the expected behavior for low to moderate efforts, but developed a very pronounced peak in the F1 region at the highest efforts. This peak can be ascribed to the concerted action of several acoustic mechanisms, including source waveform asymmetry, F1 approximating one of the lower partials, and increased formant Q values due to a longer closed phase. [Work supported by the Swedish Council for Working Life and Social Research, Contract No. 2001-0341.

  2. Parameter masks for close talk speech segregation using deep neural networks

    Directory of Open Access Journals (Sweden)

    Jiang Yi

    2015-01-01

    Full Text Available A deep neural network (DNN) based close-talk speech segregation algorithm is introduced. One microphone placed near the speaker collects the target speech, as "close talk" indicates, and another microphone captures the environmental noise. The time and energy differences between the two microphone signals are used as segregation cues. A DNN estimator on each frequency channel is used to calculate the parameter masks. The parameter masks represent the target speech energy in each time-frequency (T-F) unit. Experimental results show good performance of the proposed system: the signal-to-noise ratio (SNR) improvement is 8.1 dB in the 0 dB noise environment.
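
    A small sketch of the per-channel estimator idea, with scikit-learn MLPs standing in for the DNNs; the two-microphone time/energy-difference features and the target energy ratios are assumed to be precomputed, and all names are hypothetical:

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        def train_channel_estimators(feats, ratios, n_channels):
            # feats: list over channels of (n_frames, n_feats) cue arrays;
            # ratios: list over channels of (n_frames,) target-speech energy
            # ratios in [0, 1] for each T-F unit.
            models = []
            for ch in range(n_channels):
                net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
                net.fit(feats[ch], ratios[ch])
                models.append(net)
            return models

        def estimate_mask(models, feats):
            # Returns a (n_channels, n_frames) soft "parameter mask".
            return np.stack([np.clip(m.predict(f), 0.0, 1.0)
                             for m, f in zip(models, feats)])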

  3. Prewhitening for Rank-Deficient Noise in Subspace Methods for Noise Reduction

    DEFF Research Database (Denmark)

    Hansen, Per Christian; Jensen, Søren Holdt

    2005-01-01

    A fundamental issue in connection with subspace methods for noise reduction is that the covariance matrix for the noise is required to have full rank, in order for the prewhitening step to be defined. However, there are important cases where this requirement is not fulfilled, e.g., when the noise...... has narrow-band characteristics, or in the case of tonal noise. We extend the concept of prewhitening to include the case when the noise covariance matrix is rank deficient, using a weighted pseudoinverse and the quotient SVD, and we show how to formulate a general rank-reduction algorithm that works...... also for rank deficient noise. We also demonstrate how to formulate this algorithm by means of a quotient ULV decomposition, which allows for faster computation and updating. Finally we apply our algorithm to a problem involving a speech signal contaminated by narrow-band noise....
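
    The core of the extension can be illustrated with an eigenvalue-based pseudoinverse square root of the rank-deficient noise covariance; the paper works on data matrices via the quotient SVD/ULV instead, so this is only the idea:

        import numpy as np

        def pinv_sqrt(Rn, tol=1e-10):
            # Rn = V diag(lam) V^T  ->  W = V_r diag(lam_r^-1/2) V_r^T,
            # keeping only eigenvalues above tol: a pseudoinverse square
            # root that whitens within the (deficient) noise subspace.
            lam, V = np.linalg.eigh(Rn)
            keep = lam > tol * lam.max()
            return (V[:, keep] * lam[keep] ** -0.5) @ V[:, keep].T

        # Usage sketch with rank-1 (tonal) noise:
        n = 32
        tone = np.sin(2 * np.pi * 0.1 * np.arange(n))
        Rn = np.outer(tone, tone)          # rank-deficient noise covariance
        W = pinv_sqrt(Rn)
        print(W.shape)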

  4. Modeling speech intelligibility in adverse conditions

    DEFF Research Database (Denmark)

    Dau, Torsten

    2012-01-01

    ) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting...... understanding speech when more than one person is talking, even when reduced audibility has been fully compensated for by a hearing aid. The reasons for these difficulties are not well understood. This presentation highlights recent concepts of the monaural and binaural signal processing strategies employed...... by the normal as well as impaired auditory system. Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII...
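
    The metric at the heart of the sEPSM can be written, per audio/modulation channel i, roughly as follows; the form is quoted from memory of Jørgensen and Dau (2011) and the exact normalization should be checked against the paper:

        % per-band envelope SNR from the envelope power of the noisy speech
        % (S+N) and of the noise alone (N), combined across bands (sketch)
        \mathrm{SNR}_{\mathrm{env},i}
          = \frac{P_{\mathrm{env},S+N,i} - P_{\mathrm{env},N,i}}
                 {P_{\mathrm{env},N,i}},
        \qquad
        \mathrm{SNR}_{\mathrm{env}}
          = \Bigl( \sum_i \mathrm{SNR}_{\mathrm{env},i}^{2} \Bigr)^{1/2}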

  5. Exploring the Relationship Between Working Memory, Compressor Speed, and Background Noise Characteristics.

    Science.gov (United States)

    Ohlenforst, Barbara; Souza, Pamela E; MacDonald, Ewen N

    2016-01-01

    Previous work has shown that individuals with lower working memory demonstrate reduced intelligibility for speech processed with fast-acting compression amplification. This relationship has been noted in fluctuating noise, but the extent of noise modulation that must be present to elicit such an effect is unknown. This study expanded on that work by exploring the effect of background-noise modulation in relation to compression speed and working memory ability, using a range of signal-to-noise ratios. Twenty-six older participants between the ages of 61 and 90 years were grouped by high or low working memory according to their performance on a reading span test. Speech intelligibility was measured for low-context sentences presented in background noise, where the noise varied in its extent of amplitude modulation. Simulated fast- or slow-acting compression amplification, combined with individual frequency-gain shaping to compensate for each individual's hearing loss, was applied. Better speech intelligibility scores were observed for participants with high working memory when fast compression was applied than when slow compression was applied. The low working memory group behaved in the opposite way and performed better under slow compression than under fast compression. There was also a significant effect of the extent of amplitude modulation in the background noise, such that the magnitude of the score difference (fast versus slow compression) depended on the number of talkers in the background noise. The signal-to-noise ratios presented had no significant effect on the measured intelligibility. In agreement with earlier research, high working memory allowed better speech intelligibility when fast compression was applied in modulated background noise; in the present experiment, that effect was present regardless of the extent of background noise modulation.

  6. Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; Ordelman, Roeland J.F.; de Jong, Franciska M.G.

    2007-01-01

    This paper reports on the setup and evaluation of robust speech recognition system parts, geared towards transcript generation for heterogeneous, real-life media collections. The system is deployed for generating speech transcripts for the NIST/TRECVID-2007 test collection, part of a Dutch real-life

  7. Quantum noise, quantum measurement, and squeezing

    International Nuclear Information System (INIS)

    Haus, Herman A

    2004-01-01

    This is the edited text of the Keynote Speech that Professor Haus had been invited to give at the Conference on Fluctuations and Noise in Photonics and Quantum Optics, held at Santa Fe, NM, on 1-4 June 2003. He introduces it as partly an overview, partly a retrospective, finishing with some remarks about the future, addressing the topics as he knew them best, from his own perspective. Sadly, Professor Haus died shortly before he was due to present this speech to conference delegates. (keynote speech)

  8. Development of a Voice Activity Controlled Noise Canceller

    Science.gov (United States)

    Abid Noor, Ali O.; Samad, Salina Abdul; Hussain, Aini

    2012-01-01

    In this paper, a variable threshold voice activity detector (VAD) is developed to control the operation of a two-sensor adaptive noise canceller (ANC). The VAD prohibits the reference input of the ANC from containing some strength of actual speech signal during adaptation periods. The novelty of this approach resides in using the residual output from the noise canceller to control the decisions made by the VAD. Thresholds of full-band energy and zero-crossing features are adjusted according to the residual output of the adaptive filter. Performance evaluation of the proposed approach is quoted in terms of signal to noise ratio improvements as well as mean square error (MSE) convergence of the ANC. The new approach showed an improved noise cancellation performance when tested under several types of environmental noise. Furthermore, the computational power of the adaptive process is reduced since the output of the adaptive filter is efficiently calculated only during non-speech periods. PMID:22778667
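
    A sketch of the overall structure, with an NLMS canceller whose adaptation is frozen while a simple energy/zero-crossing VAD flags speech; the fixed thresholds e_thr and z_thr are hypothetical stand-ins for the paper's residual-adjusted variable thresholds:

        import numpy as np

        def frame_vad(frame, e_thr, z_thr):
            # Energy + zero-crossing-rate VAD; True -> speech present.
            energy = np.mean(frame ** 2)
            zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
            return energy > e_thr and zcr < z_thr

        def vad_anc(primary, reference, order=32, mu=0.5, frame_len=160,
                    e_thr=1e-4, z_thr=0.3):
            # Two-sensor NLMS noise canceller that adapts only during
            # noise-only periods, so speech does not leak into the weights.
            w = np.zeros(order)
            out = np.zeros(len(primary))
            speech = False
            for k in range(order, len(primary)):
                x = reference[k - order:k][::-1]
                e = primary[k] - w @ x
                out[k] = e
                if k % frame_len == 0:
                    speech = frame_vad(primary[k - frame_len:k], e_thr, z_thr)
                if not speech:
                    w += mu * e * x / (x @ x + 1e-8)
            return out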

  10. Relationship between perceptual learning in speech and statistical learning in younger and older adults

    Directory of Open Access Journals (Sweden)

    Thordis Marisa Neger

    2014-09-01

    Full Text Available Within a few sentences, listeners learn to understand severely degraded speech such as noise-vocoded speech. However, individuals vary in the amount of such perceptual learning, and it is unclear what underlies these differences. The present study investigates whether perceptual learning in speech relates to statistical learning, as sensitivity to probabilistic information may aid identification of relevant cues in novel speech input. If statistical learning and perceptual learning (partly) draw on the same general mechanisms, then statistical learning in a non-auditory modality using non-linguistic sequences should predict adaptation to degraded speech. In the present study, 73 older adults (aged over 60 years) and 60 younger adults (aged between 18 and 30 years) performed a visual artificial grammar learning task and were presented with sixty meaningful noise-vocoded sentences in an auditory recall task. Within age groups, sentence recognition performance over exposure was analyzed as a function of statistical learning performance and other variables that may predict learning (i.e., hearing, vocabulary, attention switching control, working memory, and processing speed). Younger and older adults showed similar amounts of perceptual learning, but only younger adults showed significant statistical learning. In older adults, improvement in understanding noise-vocoded speech was constrained by age. In younger adults, the amount of adaptation was associated with lexical knowledge and with statistical learning ability. Thus, individual differences in general cognitive abilities explain listeners' variability in adapting to noise-vocoded speech. The results suggest that perceptual and statistical learning share mechanisms of implicit regularity detection, but that the ability to detect statistical regularities is impaired in older adults if visual sequences are presented quickly.

  11. Lateralized speech perception with small interaural time differences in normal-hearing and hearing-impaired listeners

    DEFF Research Database (Denmark)

    Locsei, Gusztav; Santurette, Sébastien; Dau, Torsten

    2017-01-01

    and two-talker babble in terms of SRTs, HI listeners could utilize ITDs to a similar degree as NH listeners to facilitate the binaural unmasking of speech. A slight difference was observed between the group means when target and maskers were separated from each other by large ITDs, but not when separated...... SRMs are elicited by small ITDs. Speech reception thresholds (SRTs) and SRM due to ITDs were measured over headphones for 10 young NH and 10 older HI listeners, who had normal or close-to-normal hearing below 1.5 kHz. Diotic target sentences were presented in diotic or dichotic speech-shaped noise...... or two-talker babble maskers. In the dichotic conditions, maskers were lateralized by delaying the masker waveforms in the left headphone channel. Multiple magnitudes of masker ITDs were tested in both noise conditions. Although deficits were observed in speech perception abilities in speechshaped noise...

  12. Prediction of speech intelligibility based on an auditory preprocessing model

    DEFF Research Database (Denmark)

    Christiansen, Claus Forup Corlin; Pedersen, Michael Syskind; Dau, Torsten

    2010-01-01

    in noise experiment was used for training and an ideal binary mask experiment was used for evaluation. All three models were able to capture the trends in the speech in noise training data well, but the proposed model provides a better prediction of the binary mask test data, particularly when the binary...... masks degenerate to a noise vocoder....

  13. Statistical Analysis of Spectral Properties and Prosodic Parameters of Emotional Speech

    Science.gov (United States)

    Přibil, J.; Přibilová, A.

    2009-01-01

    The paper addresses the reflection of microintonation and spectral properties in male and female acted emotional speech. The microintonation component of speech melody is analyzed with regard to its spectral and statistical parameters. According to psychological research on emotional speech, different emotions are accompanied by different spectral noise. We control its amount by spectral flatness, according to which high-frequency noise is mixed into voiced frames during cepstral speech synthesis. Our experiments are aimed at statistical analysis of cepstral coefficient values and ranges of spectral flatness in three emotions (joy, sadness, anger) and a neutral state for comparison. Calculated histograms of the spectral flatness distribution are compared visually and modelled by a Gamma probability distribution. Histograms of the cepstral coefficient distribution are evaluated and compared using skewness and kurtosis. The obtained statistics show good agreement between male and female voices for all emotional states, portrayed by several Czech and Slovak professional actors.
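
    The two ingredients of the analysis (spectral flatness and a Gamma fit to its distribution) are easy to reproduce in outline; a sketch with SciPy, using random frames as stand-ins for the voiced frames of one emotion:

        import numpy as np
        from scipy import stats

        def spectral_flatness(frame, eps=1e-12):
            # Geometric mean / arithmetic mean of the power spectrum:
            # 1 for white noise, -> 0 for a pure tone.
            p = np.abs(np.fft.rfft(frame)) ** 2 + eps
            return np.exp(np.mean(np.log(p))) / np.mean(p)

        frames = np.random.randn(200, 512)        # stand-in voiced frames
        sf = np.array([spectral_flatness(f) for f in frames])
        a, loc, scale = stats.gamma.fit(sf, floc=0.0)
        print("shape=%.2f scale=%.4f skew=%.2f kurtosis=%.2f"
              % (a, scale, stats.skew(sf), stats.kurtosis(sf)))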

  14. On the relationship between auditory cognition and speech intelligibility in cochlear implant users: An ERP study.

    Science.gov (United States)

    Finke, Mareike; Büchner, Andreas; Ruigendijk, Esther; Meyer, Martin; Sandmann, Pascale

    2016-07-01

    There is a high degree of variability in speech intelligibility outcomes across cochlear-implant (CI) users. To better understand how auditory cognition affects speech intelligibility with the CI, we performed an electroencephalography study in which we examined the relationship between central auditory processing, cognitive abilities, and speech intelligibility. Postlingually deafened CI users (N=13) and matched normal-hearing (NH) listeners (N=13) performed an oddball task with words presented in different background conditions (quiet, stationary noise, modulated noise). Participants had to categorize words as living (targets) or non-living entities (standards). We also assessed participants' working memory (WM) capacity and verbal abilities. For the oddball task, we found lower hit rates and prolonged response times in CI users when compared with NH listeners. Noise-related prolongation of the N1 amplitude was found for all participants. Further, we observed group-specific modulation effects of event-related potentials (ERPs) as a function of background noise. While NH listeners showed stronger noise-related modulation of the N1 latency, CI users revealed enhanced modulation effects of the N2/N4 latency. In general, higher-order processing (N2/N4, P3) was prolonged in CI users in all background conditions when compared with NH listeners. Longer N2/N4 latency in CI users suggests that these individuals have difficulties to map acoustic-phonetic features to lexical representations. These difficulties seem to be increased for speech-in-noise conditions when compared with speech in quiet background. Correlation analyses showed that shorter ERP latencies were related to enhanced speech intelligibility (N1, N2/N4), better lexical fluency (N1), and lower ratings of listening effort (N2/N4) in CI users. In sum, our findings suggest that CI users and NH listeners differ with regards to both the sensory and the higher-order processing of speech in quiet as well as in

  15. Digitally controlled active noise reduction with integrated speech communication

    NARCIS (Netherlands)

    Steeneken, H.J.M.; Verhave, J.A.

    2004-01-01

    Active noise reduction is a successful addition to passive ear-defenders for improvement of the sound attenuation at low frequencies. Design and assessment methods are discussed, focused on subjective and objective attenuation measurements, stability, and high noise level applications. Active noise

  16. A brief overview of speech enhancement with linear filtering

    DEFF Research Database (Denmark)

    Benesty, Jacob; Christensen, Mads Græsbøll; Jensen, Jesper Rindom

    2014-01-01

    In this paper, we provide an overview of some recently introduced principles and ideas for speech enhancement with linear filtering and explore how these are related and how they can be used in various applications. This is done in a general framework where the speech enhancement problem is stated......-to-noise ratio (SNR), and Wiener filters are derived from the conventional speech enhancement approach and the recently introduced orthogonal decomposition approach. For each of the filters, we derive their properties in terms of output SNR and speech distortion. We then demonstrate how the ideas can be applied...

  17. Restoring the missing features of the corrupted speech using linear interpolation methods

    Science.gov (United States)

    Rassem, Taha H.; Makbol, Nasrin M.; Hasan, Ali Muttaleb; Zaki, Siti Syazni Mohd; Girija, P. N.

    2017-10-01

    One of the main challenges in Automatic Speech Recognition (ASR) is noise: the performance of an ASR system degrades significantly if the speech is corrupted by noise. In a spectrogram representation of a speech signal, deleting low signal-to-noise ratio (SNR) elements leaves an incomplete spectrogram. The recognizer can then either modify the spectrogram to restore the missing elements, or restore them before performing recognition; either way, this can be done using different spectrogram reconstruction methods. In this paper, the geometrical spectrogram reconstruction methods suggested by earlier researchers are implemented as a toolbox. In these geometrical methods, linear interpolation along time or along frequency is used to predict the missing elements between adjacent observed elements in the spectrogram. Moreover, a new linear interpolation method using time and frequency together is presented. The CMU Sphinx III software is used in the experiments to test the performance of the linear interpolation reconstruction methods. The experiments are run under different conditions, such as different window lengths and different utterance lengths. The speech corpus consists of 20 male and 20 female speakers, each with two different utterances used in the experiments. As a result, 80% recognition accuracy is achieved at 25% SNR.
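
    The directional interpolation itself is a few lines of NumPy; the combined variant below, which averages the along-time and along-frequency reconstructions, is one plausible reading of the paper's time-and-frequency method, not its exact definition:

        import numpy as np

        def interpolate_missing(spec, reliable, axis=0):
            # Restore missing elements by linear interpolation between the
            # nearest observed neighbours along one axis (0: frequency,
            # 1: time, for a (bins, frames) spectrogram).
            # spec: 2-D magnitude spectrogram; reliable: boolean mask.
            out = spec.astype(float).copy()
            spec2 = np.moveaxis(out, axis, -1)
            mask2 = np.moveaxis(reliable, axis, -1)
            idx = np.arange(spec2.shape[-1])
            for row, m in zip(spec2, mask2):
                if m.any() and not m.all():
                    row[~m] = np.interp(idx[~m], idx[m], row[m])
            return np.moveaxis(spec2, -1, axis)

        def interpolate_time_freq(spec, reliable):
            # Hypothetical combined variant: average both reconstructions.
            t = interpolate_missing(spec, reliable, axis=1)
            f = interpolate_missing(spec, reliable, axis=0)
            return 0.5 * (t + f)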

  18. MIMO scheme performance and detection in epsilon noise

    OpenAIRE

    Stepanov, Sander

    2006-01-01

    A new approach for the analysis and decoding of MIMO signaling is developed for a common model of non-Gaussian noise consisting of background and impulsive noise, called epsilon-noise. It is shown that performance under non-Gaussian noise is significantly worse than under Gaussian noise. Simulation results support the theory. A statistically robust detection rule is suggested for this kind of noise; it achieves much better detector performance than a detector designed for Gaussian noise in an impulsive environment and...
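
    A toy illustration of why Gaussian-designed detectors suffer under epsilon-mixture noise and why a limiter helps; the clipping correlator below is the classical robustification for impulsive noise, not the specific rule of this paper, and all parameters are hypothetical:

        import numpy as np

        rng = np.random.default_rng(1)

        def eps_noise(n, eps=0.1, sigma=1.0, kappa=10.0):
            # Epsilon-mixture: Gaussian background with prob. 1 - eps,
            # impulsive (kappa-times larger sigma) with prob. eps.
            imp = rng.random(n) < eps
            return rng.normal(0.0, np.where(imp, kappa * sigma, sigma))

        s = np.ones(64) / 8.0          # unit-energy signature (hypothetical)

        def detect(x, clip=None):
            # Correlator detector; with `clip` set, a limiter is applied first.
            if clip is not None:
                x = np.clip(x, -clip, clip)
            return s @ x

        # Monte-Carlo spread of the statistic under H1 with and without clipping.
        trials = 2000
        plain = [detect(s + eps_noise(64)) for _ in range(trials)]
        robust = [detect(s + eps_noise(64), clip=3.0) for _ in range(trials)]
        print("std plain: %.2f  robust: %.2f" % (np.std(plain), np.std(robust)))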

  19. Computational speech segregation based on an auditory-inspired modulation analysis

    DEFF Research Database (Denmark)

    May, Tobias; Dau, Torsten

    2014-01-01

    A monaural speech segregation system is presented that estimates the ideal binary mask from noisy speech based on the supervised learning of amplitude modulation spectrogram (AMS) features. Instead of using linearly scaled modulation filters with constant absolute bandwidth, an auditory-inspired...... about speech activity present in neighboring time-frequency units. In order to evaluate the general