Back in October, Aaron Schmidt posted “HOWTO give a good presentation” to his blog walking paper. His second bullet point of “thoughts” on good presentations is: Please don’t fill your slides with words. Find some relevant and pretty pictures to support what you’re saying. You can use the pictures to remind yourself what you’re going [...]
HPDs not only affect the listener in speech communication in a noisy environment; they can also affect the speaker. Tufts and Frank (2003) found that... of hearing protection on speech intelligibility in noise. Sound and Vibration 20(10): 12-14. Berger, E. H. 1980. EARLog #4 – The
Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.
OBJECTIVES: The aim of this study was to evaluate the benefit that listeners obtain from visually presented output from an automatic speech recognition (ASR) system during listening to speech in noise. DESIGN: Auditory-alone and audiovisual speech reception thresholds (SRTs) were measured. The SRT
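SRTs of this kind are usually measured adaptively. As an illustration only (not the authors' procedure; the step size, trial count, and toy listener below are assumptions), a minimal 1-up/1-down staircase that converges near the 50%-correct speech reception threshold might look like:

```python
def adaptive_srt(respond, start_snr=0.0, step=2.0, n_trials=20):
    """1-up/1-down staircase: lower the SNR after a correct response,
    raise it after an error; the track converges near the 50%-correct
    speech reception threshold. `respond(snr)` returns True if the
    simulated listener reports the sentence correctly."""
    snr = start_snr
    track = []
    for _ in range(n_trials):
        correct = respond(snr)
        track.append(snr)
        snr += -step if correct else step
    return sum(track[-10:]) / 10.0  # average over the oscillating tail

# Toy listener that is always correct above -6 dB SNR:
print(adaptive_srt(lambda snr: snr > -6.0))  # -> -5.0
```

Real SRT procedures vary the step size and score whole sentences or keywords; this sketch only shows the adaptive-tracking idea.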
In psycholinguistic research the exact level of language selection in bilingual lexical access is still controversial and current models of bilingual speech production offer conflicting statements about the mechanisms and location of language selection. This paper aims to provide a corpus analysis of self-repair mechanisms in code-switching contexts of highly fluent bilingual speakers in order to gain further insights into bilingual speech production. The present paper follows the assumptions of the Selection by Proficiency model, which claims that language proficiency and lexical robustness determine the mechanism and level of language selection. In accordance with this hypothesis, highly fluent bilinguals select languages at a prelexical level, which should influence the occurrence of self-repairs in bilingual speech. A corpus of natural speech data of highly fluent and balanced bilingual French-English speakers of the Canadian French variety Franco-Manitoban serves as the basis for a detailed analysis of different self-repair mechanisms in code-switching environments. Although the speech data contain a large amount of code-switching, results reveal that only a few speech errors and self-repairs occur in direct code-switching environments. A detailed analysis of the respective starting point of code-switching and the different repair mechanisms supports the hypothesis that highly proficient bilinguals do not select languages at the lexical level.
Boileau, Don M.
Presents annotations of 21 documents in the ERIC system on the following subjects: (1) theory of freedom of speech; (2) theorists; (3) research on freedom of speech; (4) broadcasting and freedom of speech; and (5) international questions of freedom of speech. (PD)
Site selection has been going on since the earliest times. The process has evolved through the Industrial Revolution to the present period of exploding population and environmental awareness. Now the work must be done both with increasing sophistication and greater transparency. Modern techniques for site selection have been developed during the last two decades or so, utilizing a teachable body of knowledge and a growing literature. Many firms and individuals have contributed to this growing field. The driving force has been the need for such a process in siting and licensing of critical facilities such as nuclear power plants. A list of crucial, documented steps for identifying social impacts and acceptability is provided. A recent innovation is the self-selection method developed by government. The Superconducting Supercollider serves as an example of this approach. Geological or geologically dependent factors often dominate the process. The role of engineering and environmental geoscientists is to provide responsible leadership, consultation, and communication to the effort.
The study of speech timing, i.e. the duration and speed or tempo of speech events, has increased in importance over the past twenty years, in particular in connection with increased demands for accuracy, intelligibility and naturalness in speech technology, with applications in language teaching and testing, and with the study of speech timing patterns in language typology. However, the methods used in such studies are very diverse, and so far there is no accessible overview of these methods. Since the field is too broad for us to provide an exhaustive account, we have made two choices: first, to provide a framework of paradigmatic (classificatory), syntagmatic (compositional) and functional (discourse-oriented) dimensions for duration analysis; and second, to provide worked examples of a selection of methods associated primarily with these three dimensions. Some of the methods which are covered are established state-of-the-art approaches (e.g. the paradigmatic Classification and Regression Trees, CART, analysis), others are discussed in a critical light (e.g. so-called ‘rhythm metrics’). A set of syntagmatic approaches applies to the tokenisation and tree parsing of duration hierarchies, based on speech annotations, and a functional approach describes duration distributions with sociolinguistic variables. Several of the methods are supported by a new web-based software tool for analysing annotated speech data, the Time Group Analyser.
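One of the ‘rhythm metrics’ the abstract mentions critically, the normalised Pairwise Variability Index (nPVI) of Grabe and Low, is simple enough to sketch. The implementation below is illustrative and not taken from the paper:

```python
import numpy as np

def npvi(durations):
    """Normalised Pairwise Variability Index (Grabe & Low):
    100 * mean over successive pairs of |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)."""
    d = np.asarray(durations, dtype=float)
    pairwise = np.abs(np.diff(d)) / ((d[:-1] + d[1:]) / 2.0)
    return 100.0 * pairwise.mean()

print(npvi([0.1, 0.1, 0.1]))  # perfectly isochronous durations -> 0.0
```

Higher nPVI values indicate more alternation between long and short intervals (e.g. vowel durations), which is the property these metrics use to characterise rhythm classes.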
The intensive research of speech emotion recognition has introduced a huge collection of speech emotion features. Large feature sets complicate the speech emotion recognition task. Among various feature selection and transformation techniques for one-stage classification, multiple classifier systems have been proposed. The main idea of multiple classifiers is to arrange the emotion classification process in stages. Besides parallel and serial cases, the hierarchical arrangement of multi-stage classification is most widely used for speech emotion recognition. In this paper, we present a sequential-forward-feature-selection-based multi-stage classification scheme. The Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS) techniques were employed for every stage of the multi-stage classification scheme. Experimental testing of the proposed scheme was performed using the German and Lithuanian emotional speech datasets. Sequential-feature-selection-based multi-stage classification outperformed the single-stage scheme by 12–42% for different emotion sets. The multi-stage scheme has shown higher robustness to the growth of the emotion set. The decrease in recognition rate with the increase in emotion set for the multi-stage scheme was lower by 10–20% in comparison with the single-stage case. Differences in SFS and SFFS employment for feature selection were negligible.
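The SFS procedure itself is a simple greedy loop. A minimal sketch (the toy `score` function below is an assumption for illustration; in the paper, scoring would be a classifier's recognition rate on held-out emotional speech):

```python
import numpy as np

def sequential_forward_selection(score, n_features, k):
    """Greedy SFS: starting from an empty set, repeatedly add the single
    feature whose inclusion maximises `score(subset)`."""
    selected, remaining = [], list(range(n_features))
    while len(selected) < k:
        best_f, best_s = None, -np.inf
        for f in remaining:
            s = score(selected + [f])
            if s > best_s:
                best_f, best_s = f, s
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

# Toy score rewarding features 0 and 2, with a small size penalty:
score = lambda subset: sum(1.0 for f in subset if f in (0, 2)) - 0.01 * len(subset)
print(sequential_forward_selection(score, n_features=5, k=2))  # -> [0, 2]
```

The floating variant (SFFS) adds a backward step that can remove a previously selected feature when doing so improves the score; that refinement is omitted here.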
Dias, James W.; Cook, Theresa C.; Rosenblum, Lawrence D.
Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test-stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audio-visual phonetic information. We examined how selective adaptation to audio and visual adaptors shift perception of speech along an audiovisual test continuum. This test-continuum consisted of nine audio-/ba/-visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers “heard” the audio-visual stimulus as an integrated “va” percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audio-visual “va” percept weakened, resulting in a continuum ranging in audio-visual percepts from /va/ to /ba/. Perception of the test-stimuli was tested before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audio-visual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation. PMID:27041781
Astheimer, Lori B; Sanders, Lisa D
Recent event-related potential (ERP) evidence demonstrates that adults employ temporally selective attention to preferentially process the initial portions of words in continuous speech. Doing so is an effective listening strategy since word-initial segments are highly informative. Although the development of this process remains unexplored, directing attention to word onsets may be important for speech processing in young children, who would otherwise be overwhelmed by the rapidly changing acoustic signals that constitute speech. We examined the use of temporally selective attention in 3- to 5-year-old children listening to stories by comparing ERPs elicited by attention probes presented at four acoustically matched times relative to word onsets: concurrently with a word onset, 100 ms before, 100 ms after, and at random control times. By 80 ms, probes presented at and after word onsets elicited a larger negativity than probes presented before word onsets or at control times. The latency and distribution of this effect is similar to temporally and spatially selective attention effects measured in adults and, despite differences in polarity, spatially selective attention effects measured in children. These results indicate that, like adults, preschool-aged children modulate temporally selective attention to preferentially process the initial portions of words in continuous speech. Copyright © 2011 Elsevier Ltd. All rights reserved.
Sohal, Aman P S; Dasarathi, Madhuri; Lodh, Rajib; Cheetham, Tim; Devlin, Anita M
Hyperthyroidism is rare in pre-school children. Untreated, it can have a profound effect on normal growth and development, particularly in the first 2 years of life. Although neurological manifestations of dysthyroid states are well known, specific expressive speech and language disorder as a presentation of hyperthyroidism is rarely documented. Case reports of two children with hyperthyroidism presenting with speech and language delay. We report two pre-school children with hyperthyroidism, who presented with expressive speech and language delay, and demonstrated a significant improvement in their language skills following treatment with anti-thyroid medication. Hyperthyroidism must be considered in all children presenting with speech and language difficulties, particularly expressive speech delay. Prompt recognition and early treatment are likely to improve outcome.
Wang Xiaojia; Mao Qirong; Zhan Yongzhao
There are many emotion features. If all of these features are employed to recognize emotions, redundant features may exist, the recognition result is unsatisfactory, and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected by using the contribution analysis algorithm of the NN from the 95 extracted features. Cluster analysis is applied to analyze the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time of feature extraction.
Strelcyk, Olaf; Dau, Torsten
Hearing-impaired people often experience great difficulty with speech communication when background noise is present, even if reduced audibility has been compensated for. Other impairment factors must be involved. In order to minimize confounding effects, the subjects participating in this study … consisted of groups with homogeneous, symmetric audiograms. The perceptual listening experiments assessed the intelligibility of full-spectrum as well as low-pass filtered speech in the presence of stationary and fluctuating interferers, the individual's frequency selectivity and the integrity of temporal … modulation were obtained. In addition, these binaural and monaural thresholds were measured in a stationary background noise in order to assess the persistence of the fine-structure processing to interfering noise. Apart from elevated speech reception thresholds, the hearing-impaired listeners showed poorer …
Gao, Yayue; Wang, Qian; Ding, Yu; Wang, Changming; Li, Haifeng; Wu, Xihong; Qu, Tianshu; Li, Liang
Human listeners are able to selectively attend to target speech in a noisy environment with multiple-people talking. Using recordings of scalp electroencephalogram (EEG), this study investigated how selective attention facilitates the cortical representation of target speech under a simulated "cocktail-party" listening condition with speech-on-speech masking. The result shows that the cortical representation of target-speech signals under the multiple-people talking condition was specifically improved by selective attention relative to the non-selective-attention listening condition, and the beta-band activity was most strongly modulated by selective attention. Moreover, measured with the Granger Causality value, selective attention to the single target speech in the mixed-speech complex enhanced the following four causal connectivities for the beta-band oscillation: the ones (1) from site FT7 to the right motor area, (2) from the left frontal area to the right motor area, (3) from the central frontal area to the right motor area, and (4) from the central frontal area to the right frontal area. However, the selective-attention-induced change in beta-band causal connectivity from the central frontal area to the right motor area, but not other beta-band causal connectivities, was significantly correlated with the selective-attention-induced change in the cortical beta-band representation of target speech. These findings suggest that under the "cocktail-party" listening condition, the beta-band oscillation in EEGs to target speech is specifically facilitated by selective attention to the target speech that is embedded in the mixed-speech complex. The selective attention-induced unmasking of target speech may be associated with the improved beta-band functional connectivity from the central frontal area to the right motor area, suggesting a top-down attentional modulation of the speech-motor process.
Oberfeld, Daniel; Klöckner-Nowotny, Felicitas
Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance is individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, the performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a similar proportion of variance as the binaural sensitivity for the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise. DOI: http://dx.doi.org/10.7554/eLife.16747.001 PMID:27580272
The paper presents an automatic speaker recognition system, implemented in the Matlab environment, and demonstrates how to achieve and optimize various elements of the system. The main emphasis was put on feature selection of a speech signal using a genetic algorithm which takes into account synergy of features. The results of optimization of selected elements of a classifier have also been shown, including the number of Gaussian distributions used to model each of the voices. In addition, for creating voice models, a universal voice model has been used. Keywords: biometrics, automatic speaker recognition, genetic algorithms, feature selection
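A genetic algorithm for feature selection of the kind described can be sketched in a few lines. Everything below (population size, operators, the toy fitness rewarding a "synergistic" feature pair) is an illustrative assumption, not the paper's Matlab implementation:

```python
import random

def ga_select(fitness, n_features, pop_size=20, n_gen=30, seed=1):
    """Tiny genetic algorithm over binary feature masks: tournament
    selection, one-point crossover, and bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(n_gen):
        def pick():
            a, b = rng.sample(pop, 2)      # binary tournament
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.2:               # bit-flip mutation
                i = rng.randrange(n_features)
                child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Toy fitness rewarding the pair of features 0 and 3, penalising subset size:
fit = lambda mask: 2 * (mask[0] + mask[3]) - sum(mask)
best = ga_select(fit, n_features=6)
```

Because each candidate is a whole subset, a GA can reward feature combinations (synergy) that greedy per-feature selection would miss, which matches the motivation given in the abstract.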
Ma, C.; Kamp, Y.; Willems, L.F.
This paper investigates a weighted LPC analysis of voiced speech. In view of the speech production model, the weighting function is either chosen to be the short-time energy function of the preemphasized speech sample sequence with certain delays or is obtained by thresholding the short-time energy
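For reference, plain (unweighted) autocorrelation LPC via the Levinson-Durbin recursion is sketched below; the weighted analysis the paper investigates would additionally weight the error criterion by a short-time energy function of the preemphasized speech. This is an illustrative baseline, not the paper's method:

```python
import numpy as np

def lpc(signal, order):
    """LPC coefficients via the autocorrelation method and the
    Levinson-Durbin recursion. Returns a with a[0] = 1, so the
    prediction error is e[n] = sum_k a[k] * x[n - k]."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k      # updated prediction-error power
    return a

# A decaying exponential x[n] = 0.9**n is predicted by one coefficient:
a = lpc([0.9 ** n for n in range(200)], 1)  # a[1] close to -0.9
```

For voiced speech, `order` is typically chosen around the sampling rate in kHz plus 2-4, so that each pole pair can model one formant.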
Alan James Power
Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal ‘samples’ of information from the speech stream at different rates, phase-resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (‘phase locking’). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase-locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically-developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable ba, presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a talking head). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the ba stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a ba in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling
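Phase locking of the sort described is commonly quantified with the phase-locking value (PLV) between a stimulus rhythm and the recorded response. A minimal numpy sketch (illustrative only; a real EEG pipeline would band-pass filter into the delta or theta band first, and the signals below are synthetic):

```python
import numpy as np

def analytic(x):
    """Analytic signal via the FFT (same idea as scipy.signal.hilbert)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

def plv(x, y):
    """Phase-locking value |mean(exp(i * (phase_x - phase_y)))|, in [0, 1]."""
    dphi = np.angle(analytic(x)) - np.angle(analytic(y))
    return np.abs(np.mean(np.exp(1j * dphi)))

t = np.arange(0, 1, 1 / 500.0)            # 1 s at 500 Hz
stim = np.sin(2 * np.pi * 2 * t)          # 2 Hz (delta-rate) rhythm
resp = np.sin(2 * np.pi * 2 * t + 0.4)    # constant phase lag -> PLV near 1
other = np.sin(2 * np.pi * 3 * t)         # unrelated frequency -> PLV near 0
```

A PLV near 1 indicates a stable phase relationship across time (entrainment), regardless of the size of the lag; a value near 0 indicates no consistent phase alignment.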
Congdon, Eliza L; Novack, Miriam A; Brooks, Neon; Hemani-Lopez, Naureen; O'Keefe, Lucy; Goldin-Meadow, Susan
When teachers gesture during instruction, children retain and generalize what they are taught (Goldin-Meadow, 2014). But why does gesture have such a powerful effect on learning? Previous research shows that children learn most from a math lesson when teachers present one problem-solving strategy in speech while simultaneously presenting a different, but complementary, strategy in gesture (Singer & Goldin-Meadow, 2005). One possibility is that gesture is powerful in this context because it presents information simultaneously with speech. Alternatively, gesture may be effective simply because it involves the body, in which case the timing of information presented in speech and gesture may be less important for learning. Here we find evidence for the importance of simultaneity: third-grade children retain and generalize what they learn from a math lesson better when given instruction containing simultaneous speech and gesture than when given instruction containing sequential speech and gesture. Interpreting these results in the context of theories of multimodal learning, we find that gesture capitalizes on its synchrony with speech to promote learning that lasts and can be generalized.
U.S. Environmental Protection Agency — This dataset contains selected cases involving EPA's Regional Judicial Officers (RJOs) from 2005 to present. EPA's Regional Judicial Officers (RJOs) perform...
Invention deals with the content of a speech; arrangement involves placing the content in an order that is most strategic; style focuses on selecting linguistic devices, such as metaphor, to make the message more appealing; memory assists the speaker in delivering the message correctly; and delivery ideally enables great reception of the message.…
Wijngaarden, S.J. van; Rots, G.
Background: Aircrews are often exposed to high ambient sound levels, especially in military aviation. Since long-term exposure to such noise may cause hearing damage, selection of adequate hearing protective devices is crucial. Such devices also affect speech intelligibility. When speech
The development of esophageal speech was examined in a laryngectomee subject to observe the emergence of selected acoustic characteristics and their relation to listener intelligibility ratings. Over a two-and-a-half-month period, the data from five recording sessions were used for spectrographic and perceptual (listener) analysis. There was evidence to suggest a fairly reliable correlation between emerging acoustic characteristics and increasing perceptual ratings. Acoustic factors coincident with increased intelligibility ratings appeared related to two dimensions: firstly, the increasing pseudoglottic control over esophageal air release; secondly, the presence of a mechanism of pharyngeal compression. Increased pseudoglottic control was manifested in a reduction of tracheo-esophageal turbulence and a more efficient burping mode of vibration with clearer formant structure. Spectrographic evidence of a fundamental frequency did not emerge. These dimensions appeared to have potential diagnostic and therapeutic value, rendering an analysis of the patient's developing vocal performance more explicit for both clinician and patient.
Power, Alan J; Foxe, John J; Forde, Emma-Jane; Reilly, Richard B; Lalor, Edmund C
Distinguishing between speakers and focusing attention on one speaker in multi-speaker environments is extremely important in everyday life. Exactly how the brain accomplishes this feat and, in particular, the precise temporal dynamics of this attentional deployment are as yet unknown. A long history of behavioral research using dichotic listening paradigms has debated whether selective attention to speech operates at an early stage of processing based on the physical characteristics of the stimulus or at a later stage during semantic processing. With its poor temporal resolution, fMRI has contributed little to the debate, while EEG-ERP paradigms have been hampered by the need to average the EEG in response to discrete stimuli which are superimposed onto ongoing speech. This presents a number of problems, foremost among which is that early attention effects in the form of endogenously generated potentials can be so temporally broad as to mask later attention effects based on the higher-level processing of the speech stream. Here we overcome this issue by utilizing the AESPA (auditory evoked spread spectrum analysis) method, which allows us to extract temporally detailed responses to two concurrently presented speech streams in natural cocktail-party-like attentional conditions without the need for superimposed probes. We show attentional effects on exogenous stimulus processing in the 200-220 ms range in the left hemisphere. We discuss these effects within the context of research on auditory scene analysis and in terms of a flexible locus of attention that can be deployed at a particular processing stage depending on the task. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
Studies of the control of complex sequential movements have dissociated two aspects of movement planning: control over the sequential selection of movement plans, and control over the precise timing of movement execution. This distinction is particularly relevant in the production of speech: utterances contain sequentially ordered words and syllables, but articulatory movements are often executed in a non-sequential, overlapping manner with precisely coordinated relative timing. This study presents a hybrid dynamical model in which competitive activation controls selection of movement plans and coupled oscillatory systems govern coordination. The model departs from previous approaches by ascribing an important role to competitive selection of articulatory plans within a syllable. Numerical simulations show that the model reproduces a variety of speech production phenomena, such as effects of preparation and utterance composition on reaction time, and asymmetries in patterns of articulatory timing associated with onsets and codas. The model furthermore provides a unified understanding of a diverse group of phonetic and phonological phenomena which have not previously been related.
Partila, Pavol; Voznak, Miroslav; Tovarek, Jaromir
The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is their wide usability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. Classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system with respect to its accuracy and efficiency.
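As a toy illustration of the simplest of the compared classifiers (not the authors' system; the two-feature "prosodic" data below is invented), the k-nearest-neighbours stage can be sketched as:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Plain k-nearest-neighbours majority vote with Euclidean distance."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dist)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

# Invented 2-D "prosodic" features (mean F0 in Hz, energy),
# class 0 = neutral, class 1 = angry:
X = np.array([[120.0, 0.20], [125.0, 0.25], [118.0, 0.22],
              [210.0, 0.80], [220.0, 0.90], [205.0, 0.85]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([122.0, 0.21])))  # -> 0
```

In practice the features would be normalised first, since raw F0 (hundreds of Hz) would otherwise dominate the distance over small-valued spectral or energy features.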
Mcleod, Sharynne; Baker, Elise
A survey of 231 Australian speech-language pathologists (SLPs) was undertaken to describe practices regarding assessment, analysis, target selection, intervention, and service delivery for children with speech sound disorders (SSD). The participants typically worked in private practice, education, or community health settings, and 67.6% had a waiting list for services. For each child, most of the SLPs spent 10-40 min in pre-assessment activities, 30-60 min undertaking face-to-face assessments, and 30-60 min completing paperwork after assessments. During an assessment, SLPs typically conducted a parent interview, carried out single-word speech sampling, collected a connected speech sample, and used informal tests. They also determined children's stimulability and estimated intelligibility. With multilingual children, informal assessment procedures and English-only tests were commonly used, and SLPs relied on family members or interpreters to assist. Common analysis techniques included determination of phonological processes, substitutions-omissions-distortions-additions (SODA), and phonetic inventory. Participants placed high priority on selecting target sounds that were stimulable, early developing, and in error across all word positions, and 60.3% felt very confident or confident selecting an appropriate intervention approach. Eight intervention approaches were frequently used: auditory discrimination, minimal pairs, cued articulation, phonological awareness, traditional articulation therapy, auditory bombardment, the Nuffield Centre Dyspraxia Programme, and core vocabulary. Children typically received individual therapy with an SLP in a clinic setting. Parents often observed and participated in sessions, and SLPs typically included siblings and grandparents in intervention sessions. Parent training and home programs were more frequently used than group therapy. Two-thirds kept up to date by reading journal articles monthly or every 6 months. There were many similarities with
Schriefers, H.J.; Jescheniak, J.D.; Hantsch, A.
N.O. Schiller and A. Caramazza (2003) and A. Costa, D. Kovacic, E. Fedorenko, and A. Caramazza (2003) have argued that the processing of freestanding gender-marked morphemes (e.g., determiners) and bound gender-marked morphemes (e.g., adjective suffixes) during syntactic encoding in speech
Zion Golumbic, Elana M.; Poeppel, David; Schroeder, Charles E.
The human capacity for processing speech is remarkable, especially given that information in speech unfolds over multiple time scales concurrently. Similarly notable is our ability to filter out extraneous sounds and focus our attention on one conversation, epitomized by the ‘Cocktail Party’ effect. Yet, the neural mechanisms underlying on-line speech decoding and attentional stream selection are not well understood. We review findings from behavioral and neurophysiological investigations that underscore the importance of the temporal structure of speech for achieving these perceptual feats. We discuss the hypothesis that entrainment of ambient neuronal oscillations to speech’s temporal structure, across multiple time scales, serves to facilitate its decoding and underlies the selection of an attended speech stream over other competing input. In this regard, speech decoding and attentional stream selection are examples of ‘active sensing’, emphasizing an interaction between proactive and predictive top-down modulation of neuronal dynamics and bottom-up sensory input. PMID:22285024
Gao, Yayue; Wang, Qian; Ding, Yu; Wang, Changming; Li, Haifeng; Wu, Xihong; Qu, Tianshu; Li, Liang
Human listeners are able to selectively attend to target speech in a noisy environment with multiple-people talking. Using recordings of scalp electroencephalogram (EEG), this study investigated how selective attention facilitates the cortical representation of target speech under a simulated “cocktail-party” listening condition with speech-on-speech masking. The result shows that the cortical representation of target-speech signals under the multiple-people talking condition was specifically improved by selective attention relative to the non-selective-attention listening condition, and the beta-band activity was most strongly modulated by selective attention. Moreover, measured with the Granger Causality value, selective attention to the single target speech in the mixed-speech complex enhanced the following four causal connectivities for the beta-band oscillation: the ones (1) from site FT7 to the right motor area, (2) from the left frontal area to the right motor area, (3) from the central frontal area to the right motor area, and (4) from the central frontal area to the right frontal area. However, the selective-attention-induced change in beta-band causal connectivity from the central frontal area to the right motor area, but not other beta-band causal connectivities, was significantly correlated with the selective-attention-induced change in the cortical beta-band representation of target speech. These findings suggest that under the “cocktail-party” listening condition, the beta-band oscillation in EEGs to target speech is specifically facilitated by selective attention to the target speech that is embedded in the mixed-speech complex. The selective attention-induced unmasking of target speech may be associated with the improved beta-band functional connectivity from the central frontal area to the right motor area, suggesting a top-down attentional modulation of the speech-motor process. PMID:28239344
Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A
Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Howard, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95), whereas correlations between inner speech and the language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.
Jørgensen, Søren; Dau, Torsten
A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure to the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNRenv, at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise; this process provides a key measure of speech intelligibility. © 2011 Acoustical Society of America.
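The SNRenv metric at the heart of this model can be illustrated with a toy, single-band sketch. The published model operates on the outputs of a full modulation filterbank; the function names and the single-band simplification below are illustrative assumptions, not the authors' implementation:

```python
import math

def envelope_power(env):
    """Normalized AC power of a temporal envelope: variance / squared mean."""
    mean = sum(env) / len(env)
    var = sum((x - mean) ** 2 for x in env) / len(env)
    return var / mean ** 2

def snr_env_db(speech_env, noise_env):
    """Speech-to-noise envelope power ratio (SNRenv) in dB for one band."""
    return 10 * math.log10(envelope_power(speech_env) / envelope_power(noise_env))

# A strongly modulated speech envelope against a nearly flat noise envelope
# yields a large positive SNRenv, predicting good intelligibility.
speech = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
noise = [2.0, 2.2, 2.0, 2.2, 2.0, 2.2]
print(round(snr_env_db(speech, noise), 1))  # 20.4
```

In the full model, per-band SNRenv values are combined across modulation and audio channels before being mapped to intelligibility via the ideal-observer stage.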
Nelson, W. T; Bolia, Robert S; Ericson, Mark A; McKinley, Richard L
.... Factorial combinations of three variables, including the number of localized speech signals, the location of the speech signals along the horizontal plane, and the sex of the talker were employed...
Yao, Bo; Belin, Pascal; Scheepers, Christoph
In human communication, direct speech (e.g., Mary said: "I'm hungry") is perceived to be more vivid than indirect speech (e.g., Mary said [that] she was hungry). However, for silent reading, the representational consequences of this distinction are still unclear. Although many of us share the intuition of an "inner voice," particularly during silent reading of direct speech statements in text, there has been little direct empirical confirmation of this experience so far. Combining fMRI with eye tracking in human volunteers, we show that silent reading of direct versus indirect speech engenders differential brain activation in voice-selective areas of the auditory cortex. This suggests that readers are indeed more likely to engage in perceptual simulations (or spontaneous imagery) of the reported speaker's voice when reading direct speech as opposed to meaning-equivalent indirect speech statements as part of a more vivid representation of the former. Our results may be interpreted in line with embodied cognition and form a starting point for more sophisticated interdisciplinary research on the nature of auditory mental simulation during reading.
(1) To evaluate the recognition of words, phonemes and lexical tones in audiovisual (AV) and auditory-only (AO) modes in Mandarin-speaking adults with cochlear implants (CIs); (2) to understand the effect of presentation levels on AV speech perception; (3) to learn the effect of hearing experience on AV speech perception. Thirteen deaf adults (age = 29.1±13.5 years; 8 male, 5 female) who had used CIs for >6 months and 10 normal-hearing (NH) adults participated in this study. Seven of them were prelingually deaf, and 6 postlingually deaf. The Mandarin Monosyllabic Word Recognition Test was used to assess recognition of words, phonemes and lexical tones in AV and AO conditions at 3 presentation levels: speech detection threshold (SDT), speech recognition threshold (SRT) and 10 dB SL (re: SRT). The prelingual group had better phoneme recognition in the AV mode than in the AO mode at SDT and SRT (both p = 0.016), and so did the NH group at SDT (p = 0.004). Mode difference was not noted in the postlingual group. None of the groups had significantly different tone recognition in the 2 modes. The prelingual and postlingual groups had significantly better phoneme and tone recognition than the NH one at SDT in the AO mode (p = 0.016 and p = 0.002 for phonemes; p = 0.001 and p<0.001 for tones) but were outperformed by the NH group at 10 dB SL (re: SRT) in both modes (both p<0.001 for phonemes; p<0.001 and p = 0.002 for tones). The recognition scores had a significant correlation with group with age and sex controlled (p<0.001). Visual input may help prelingually deaf implantees to recognize phonemes but may not augment Mandarin tone recognition. The effect of presentation level seems minimal on CI users' AV perception. This indicates special considerations in developing audiological assessment protocols and rehabilitation strategies for implantees who speak tonal languages.
Golumbic, Elana Zion; Cogan, Gregory B.; Schroeder, Charles E.; Poeppel, David
Our ability to selectively attend to one auditory signal amidst competing input streams, epitomized by the ‘Cocktail Party’ problem, continues to stimulate research from various approaches. How this demanding perceptual feat is achieved from a neural systems perspective remains unclear and controversial. It is well established that neural responses to attended stimuli are enhanced compared to responses to ignored ones, but responses to ignored stimuli are nonetheless highly significant, leading to interference in performance. We investigated whether congruent visual input of an attended speaker enhances cortical selectivity in auditory cortex, leading to diminished representation of ignored stimuli. We recorded magnetoencephalographic (MEG) signals from human participants as they attended to segments of natural continuous speech. Using two complementary methods of quantifying the neural response to speech, we found that viewing a speaker’s face enhances the capacity of auditory cortex to track the temporal speech envelope of that speaker. This mechanism was most effective in a ‘Cocktail Party’ setting, promoting preferential tracking of the attended speaker, whereas without visual input no significant attentional modulation was observed. These neurophysiological results underscore the importance of visual input in resolving perceptual ambiguity in a noisy environment. Since visual cues in speech precede the associated auditory signals, they likely serve a predictive role in facilitating auditory processing of speech, perhaps by directing attentional resources to appropriate points in time when to-be-attended acoustic input is expected to arrive. PMID:23345218
Hux, Karen; Knollman-Porter, Kelly; Brown, Jessica; Wallace, Sarah E
Using text-to-speech technology to provide simultaneous written and auditory content presentation may help compensate for chronic reading challenges if people with aphasia can understand synthetic speech output; however, inherent auditory comprehension challenges experienced by people with aphasia may make understanding synthetic speech difficult. This study's purpose was to compare the preferences and auditory comprehension accuracy of people with aphasia when listening to sentences generated with digitized natural speech, Alex synthetic speech (i.e., Macintosh platform), or David synthetic speech (i.e., Windows platform). The methodology required each of 20 participants with aphasia to select one of four images corresponding in meaning to each of 60 sentences comprising three stimulus sets. Results revealed significantly better accuracy given digitized natural speech than either synthetic speech option; however, individual participant performance analyses revealed three patterns: (a) comparable accuracy regardless of speech condition for 30% of participants, (b) comparable accuracy between digitized natural speech and one, but not both, synthetic speech option for 45% of participants, and (c) greater accuracy with digitized natural speech than with either synthetic speech option for remaining participants. Ranking and Likert-scale rating data revealed a preference for digitized natural speech and David synthetic speech over Alex synthetic speech. Results suggest many individuals with aphasia can comprehend synthetic speech options available on popular operating systems. Further examination of synthetic speech use to support reading comprehension through text-to-speech technology is thus warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
Carlile, Simon; Corkhill, Caitlin
To hear out a conversation against other talkers, listeners must overcome energetic and informational masking. Although largely attributed to top-down processes, informational masking has also been demonstrated using unintelligible speech and amplitude-modulated maskers, suggesting a role for bottom-up processes. We examined the role of speech-like amplitude modulations in informational masking using a spatial masking release paradigm. Separating a target talker from two masker talkers produced a 20 dB improvement in speech reception threshold, 40% of which was attributed to a release from informational masking. When the across-frequency temporal modulations in the masker talkers are decorrelated, the speech is unintelligible, although the within-frequency modulation characteristics remain identical. Used as a masker as above, this signal produced informational masking that accounted for 37% of the spatial unmasking. Such an unintelligible and highly differentiable masker is unlikely to involve top-down processes. These data provide strong evidence of bottom-up masking involving speech-like, within-frequency modulations, and show that this presumably low-level process can be modulated by selective spatial attention.
Li, Xiaoqing; Lu, Yong; Zhao, Haiyan
The present study used EEG to investigate how and when top-down prediction interacts with bottom-up acoustic signals in temporally selective attention during speech comprehension. Mandarin Chinese spoken sentences were used as stimuli. We systematically manipulated the predictability and de/accentuation of the critical words in the sentence context. Meanwhile, a linguistic attention probe 'ba' was presented concurrently with the critical words or not. The results showed that, first, words with a linguistic attention probe elicited a larger N1 than those without a probe. The latency of this N1 effect was shortened for accented or lowly predictable words, indicating more attentional resources allocated to these words. Importantly, prediction and accentuation showed a complementary interplay on the latency of this N1 effect, demonstrating that when the words had already attracted attention due to low predictability or due to the presence of pitch accent, the other factor did not modulate attention allocation anymore. Second, relative to the lowly predictable words, the highly predictable words elicited a reduced N400 and enhanced gamma-band power increases, especially under the accented conditions; moreover, under the accented conditions, shorter N1 peak-latency was found to correlate with larger gamma-band power enhancement, which indicates that a close relationship might exist between early selective attention and later semantic integration. Finally, the interaction between top-down selective attention (driven by prediction) and bottom-up selective attention (driven by accentuation) occurred before lexical-semantic processing, namely before the N400 effect evoked by predictability, which was discussed with regard to the language comprehension models. Copyright © 2014 Elsevier Ltd. All rights reserved.
Nguyen, AiVi; Dragga, Anthony
Free speech is fast becoming a hot-button issue at colleges across the country, with campus protests often mirroring those of the public-at-large on issues such as racism or tackling institution-specific matters such as college governance. On the surface, the issue of campus free speech may seem like a purely legal concern, yet in reality,…
Katongo, Emily Mwamba; Ndhlovu, Daniel
This study sought to establish the role of music in speech intelligibility of learners with Post Lingual Hearing Impairment (PLHI) and strategies teachers used to enhance speech intelligibility in learners with PLHI in selected special units for the deaf in Lusaka district. The study used a descriptive research design. Qualitative and quantitative…
Forte, Antonio Elia; Etard, Octave; Reichenbach, Tobias
Humans excel at selectively listening to a target speaker in background noise such as competing voices. While the encoding of speech in the auditory cortex is modulated by selective attention, it remains debated whether such modulation occurs already in subcortical auditory structures. Investigating the contribution of the human brainstem to attention has, in particular, been hindered by the tiny amplitude of the brainstem response. Its measurement normally requires a large number of repetitions of the same short sound stimuli, which may lead to a loss of attention and to neural adaptation. Here we develop a mathematical method to measure the auditory brainstem response to running speech, an acoustic stimulus that does not repeat and that has a high ecological validity. We employ this method to assess the brainstem's activity when a subject listens to one of two competing speakers, and show that the brainstem response is consistently modulated by attention.
Krieger-Redwood, Katya; Gaskell, M Gareth; Lindsay, Shane; Jefferies, Elizabeth
Several accounts of speech perception propose that the areas involved in producing language are also involved in perceiving it. In line with this view, neuroimaging studies show activation of premotor cortex (PMC) during phoneme judgment tasks; however, there is debate about whether speech perception necessarily involves motor processes, across all task contexts, or whether the contribution of PMC is restricted to tasks requiring explicit phoneme awareness. Some aspects of speech processing, such as mapping sounds onto meaning, may proceed without the involvement of motor speech areas if PMC specifically contributes to the manipulation and categorical perception of phonemes. We applied TMS to three sites (PMC, posterior superior temporal gyrus, and occipital pole) and, for the first time within the TMS literature, directly contrasted two speech perception tasks that required explicit phoneme decisions and mapping of speech sounds onto semantic categories, respectively. TMS to PMC disrupted explicit phonological judgments but not access to meaning for the same speech stimuli. TMS to two further sites confirmed that this pattern was site specific and did not reflect a generic difference in the susceptibility of our experimental tasks to TMS: stimulation of pSTG, a site involved in auditory processing, disrupted performance in both language tasks, whereas stimulation of occipital pole had no effect on performance in either task. These findings demonstrate that, although PMC is important for explicit phonological judgments, crucially, PMC is not necessary for mapping speech onto meanings.
Ирина Михайловна Некипелова
The article investigates the operation of a uniform search algorithm in a speaker's selection of language units during speech production. The process is connected with the phenomenon of speech optimization, which shortens the time needed to formulate what one wants to say and achieves maximum precision in expressing one's thoughts. The uniform search algorithm works at both the conscious and subconscious levels, favoring the automatization of speech production and perception. The realization of a person's cognitive potential in the process of communication sets in motion a complicated mechanism of self-organization and self-regulation of language. In turn, this results in the optimization of the language system, serving not only the speaker's self-actualization but also the realization of communication in society. A problem-oriented search method is used to investigate the optimization mechanisms characteristic of speech production and language stabilization. DOI: http://dx.doi.org/10.12731/2218-7405-2013-4-50
Presents Mina Shaughnessy's thoughts on why English professors dislike the teaching of writing, what is needed in writing research, the disadvantages of being a writing teacher at an open admissions school, what open admissions policies have revealed about education in general and basic writing instruction in particular, and writing evaluation…
Beatty, Michael J.
Examines the choice-making processes of students engaged in the selection of speech introduction strategies. Finds that the frequency of students making decision-making errors was a positive function of public speaking apprehension. (MS)
Maria Luisa A. Valdez
Gender inequality and the resulting discrimination against women are deeply rooted in history, culture and tradition. They are said to be detrimental to the mental health of women and to persist as a debilitating stigma that lowers their dignity and sense of self-worth. This qualitative research was therefore conducted to underscore the issue of gender equality and women's empowerment as core topics in selected speeches of Senator Miriam Defensor Santiago. Findings of the analysis showed that the senator manifested and discussed the gender gap in the Philippines forthrightly in her speeches, in terms of educational attainment, health and survival, economic participation and opportunity, and political empowerment, all touched on with the signature wit, eloquence, astuteness and passion she was widely known for. Gender equality and women's empowerment were likewise gleaned in the selected speeches, all delivered with the motive of persuading her audience to espouse the same advocacy, which she achieved through her unique and distinct style of utilizing the persuasive ability of literature. Finally, the implications of the author's advocacy on gender equality and empowerment delegate a monumental task to the Filipino youth: their thinking will be directly influenced by her advocacy, promoting within them a sense of urgency to embrace and espouse the same advocacies so that they can contribute to nation building.
Boldt, Jesper B.; Bertelsen, Andreas Thelander; Gran, Fredrik
Recently, the ideal binary mask has been introduced in the modulation domain by extending the ideal channel selection method to modulation channel selection. This new method shows substantial improvement in speech intelligibility, but less than its predecessor despite the higher complexity. Here, we extend the previous findings and provide a more direct comparison of binary masking in the modulation domain with binary masking in the time-frequency domain. Subjective and objective evaluations are performed and provide additional insight into modulation-domain processing.
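The ideal channel selection idea underlying both masking variants can be sketched, under simplifying assumptions, as a binary decision per unit: keep a unit when its local speech-to-noise ratio exceeds a criterion. The function and parameter names below are illustrative, not from the paper:

```python
import math

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Keep (1) a unit if its local SNR in dB exceeds the local criterion
    lc_db, otherwise discard it (0). Inputs are per-unit power values; in the
    time-frequency variant the units are T-F cells, in the modulation-domain
    variant they are modulation channels."""
    mask = []
    for s, n in zip(speech_power, noise_power):
        local_snr_db = 10 * math.log10(s / n)
        mask.append(1 if local_snr_db > lc_db else 0)
    return mask

# Units where speech dominates are retained; noise-dominated units are zeroed.
print(ideal_binary_mask([10.0, 0.1, 5.0], [1.0, 1.0, 8.0]))  # [1, 0, 0]
```

The "ideal" in the name refers to the oracle access to the separate speech and noise powers; a real enhancement system must estimate the mask from the mixture.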
Valerya A. Trofimova
This article analyzes verbal strategies in political discourse. The authors consider the linguistic realization of the self-realization strategy, drawing on the pre-election speeches of Hillary Clinton. The study identifies the most typical means of linguistic expression for this strategy and describes how it operates. A notable peculiarity of the presidential candidate's speeches is her ability to combine the expressive potential of syntactic, lexical and grammatical means.
The automatic analysis of speech to detect affective states may improve the way users interact with electronic devices. However, analysis at the acoustic level alone may not be enough to determine the emotion of a user in a realistic scenario. In this paper we analyzed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic and linguistic levels to extract two sets of features. The acoustic set was reduced by a greedy procedure that selects the most relevant features to optimize the learning stage. We compared two versions of this greedy selection algorithm, performing the search for relevant features forwards and backwards. We experimented with three classification approaches: Naïve Bayes, a support vector machine and a logistic model tree, and two fusion schemes: decision-level fusion, merging the hard decisions of the acoustic and linguistic classifiers by means of a decision tree; and feature-level fusion, concatenating both sets of features before the learning stage. Despite the low performance achieved with the linguistic data alone, a dramatic improvement was achieved after combining it with the acoustic information, surpassing the results achieved by the acoustic modality on its own. The results achieved by the classifiers using the parameters merged at the feature level outperformed the classification results of the decision-level fusion scheme, despite the simplicity of the latter. Moreover, the extremely reduced set of acoustic features obtained by the greedy forward selection algorithm improved on the results provided by the full set.
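The greedy forward search described above can be sketched generically: at each step, add the single feature that most improves a scoring function (e.g., cross-validated classifier accuracy) and stop when no candidate helps. The toy objective and names below are hypothetical, not the paper's actual selection criterion:

```python
def greedy_forward_select(features, score):
    """Greedy forward search: repeatedly add the single feature that most
    improves the score; stop when no remaining candidate improves it."""
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining:
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best:
            break  # no candidate improves the current subset
        selected.append(candidate)
        best = candidate_score
        remaining.remove(candidate)
    return selected

# Toy objective: per-feature relevance minus a quadratic cost for set size,
# so adding weak features eventually stops paying off.
relevance = {"pitch": 3.0, "energy": 2.0, "mfcc1": 0.1}
score = lambda subset: sum(relevance[f] for f in subset) - 0.4 * len(subset) ** 2
print(greedy_forward_select(relevance, score))  # ['pitch', 'energy']
```

The backward variant starts from the full set and greedily removes the feature whose removal most improves (or least hurts) the score, which is why the two searches can return different subsets.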
Strijkers, Kristof; Costa, Albert
Speech requires time. How much time often depends on the amount of labor the brain has to perform in order to retrieve the linguistic information related to the ideas we want to express. Although most psycholinguistic research in the field of language production has focused on the net result of time required to utter words in various experimental conditions, over the last years more and more researchers pursued the objective to flesh out the time course of particular stages implicated in language production. Here we critically review these studies, with particular interest for the time course of lexical selection. First, we evaluate the data underlying the estimates of an influential temporal meta-analysis on language production (Indefrey and Levelt, 2004). We conclude that those data alone are not sufficient to provide a reliable time frame of lexical selection. Next, we discuss recent neurophysiological evidence which we argue to offer more explicit insights into the time course of lexical selection. Based on this evidence we suggest that, despite the absence of a clear time frame of how long lexical selection takes, there is sufficient direct evidence to conclude that the brain initiates lexical access within 200 ms after stimulus presentation, hereby confirming Indefrey and Levelt’s estimate. In a final section, we briefly review the proposed mechanisms which could lead to this rapid onset of lexical access, namely automatic spreading activation versus specific concept selection, and discuss novel data which support the notion of spreading activation, but indicate that the speed with which this principle takes effect is driven by a top-down signal in function of the intention to engage in a speech act. PMID:22144973
As allied health professions change over time to keep up with and reflect a rapidly changing society, it is quite possible that the people attracted to the profession may also change. If this is the case, then knowing this could be critical for future workforce marketing, training and planning. The aim was to investigate whether the personality of students entering a speech-language pathology (SLP) program had changed over time and whether there were generational differences in personality. The study used the Big Five personality inventory to consider whether there were differences in the personality in speech-language pathology (SLP) students enrolled in the same regional university in Australia in 2005 and 2016. The results showed there were significant differences between the two groups on the Agreeableness and Extroversion scales. The students who were more Conscientious were also more Confident in their ability to perform as an SLP. Generational differences across the two cohorts were also considered. SLP is a dynamic profession that is reflected through an evolving scope of practice, increasing utilization of technology and specialization. As careers evolve it is logical that the people attracted to those careers may also shift; as demonstrated here via changes in the personality of SLP students. Understanding the personality of current SLP students and future Generation Z students may assist universities to identify specific skills and experiences students need to be successful in the workforce. © 2017 Royal College of Speech and Language Therapists.
Lodico, Dana M.; Torres, Rendell R.; Shimizu, Yasushi; Hunter, Claudia
This study investigates the effects of interference speech and the built acoustical environment on human performance, and the possibility of designing spaces to architecturally meet the acoustical goals of office and classroom environments. The effects of room size, geometry, and acoustical parameters on human performance are studied through human subject testing. Three experiments are used to investigate the effects of distracting background speech on short-term memory for verbally presented prose under constrained laboratory conditions. Short-term memory performance is rated within four different acoustical spaces and five background noise levels, as well as a quiet condition. The presentation will cover research methods, results, and possibilities for furthering this research. [Work supported by the Program in Architectural Acoustics, School of Architecture, Rensselaer Polytechnic Institute.]
Ungvari, G S; White, E; Pang, A H
Over the past decade there has been an upsurge of interest in the prevalence, nosological position, treatment response and pathophysiology of catatonia. However, the psychopathology of catatonia has received only scant attention. Once the hallmark of catatonia, speech disorders--particularly logorrhoea, verbigeration and echolalia--seem to have been neglected in the modern literature. The aims of the present paper are to outline the conceptual history of catatonic speech disorders and to follow their development in contemporary clinical research. The English-language psychiatric literature of the last 60 years on logorrhoea, verbigeration and echolalia was searched through Medline and cross-referencing. Kahlbaum, Wernicke, Jaspers, Kraepelin, Bleuler, Kleist and Leonhard's oft-cited classical texts supplemented the search. In contrast to classical psychopathological sources, very few recent papers were found on catatonic speech disorders. Current clinical research has failed to incorporate the observations of traditional descriptive psychopathology. Modern catatonia research operates with simplified versions of psychopathological terms devised and refined by generations of classical writers.
Puschmann, Sebastian; Steinkamp, Simon; Gillich, Imke; Mirkovic, Bojana; Debener, Stefan; Thiel, Christiane M
Listening selectively to one out of several competing speakers in a "cocktail party" situation is a highly demanding task. It relies on a widespread cortical network, including auditory sensory, but also frontal and parietal brain regions involved in controlling auditory attention. Previous work has shown that, during selective listening, ongoing neural activity in auditory sensory areas is dominated by the attended speech stream, whereas competing input is suppressed. The relationship between these attentional modulations in the sensory tracking of the attended speech stream and frontoparietal activity during selective listening is, however, not understood. We studied this question in young, healthy human participants (both sexes) using concurrent EEG-fMRI and a sustained selective listening task, in which one out of two competing speech streams had to be attended selectively. An EEG-based speech envelope reconstruction method was applied to assess the strength of the cortical tracking of the to-be-attended and the to-be-ignored stream during selective listening. Our results show that individual speech envelope reconstruction accuracies obtained for the to-be-attended speech stream were positively correlated with the amplitude of sustained BOLD responses in the right temporoparietal junction, a core region of the ventral attention network. This brain region further showed task-related functional connectivity to secondary auditory cortex and regions of the frontoparietal attention network, including the intraparietal sulcus and the inferior frontal gyrus. This suggests that the right temporoparietal junction is involved in controlling attention during selective listening, allowing for a better cortical tracking of the attended speech stream. SIGNIFICANCE STATEMENT Listening selectively to one out of several simultaneously talking speakers in a "cocktail party" situation is a highly demanding task. It activates a widespread network of auditory sensory and
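The EEG-based speech envelope reconstruction used in such studies is commonly implemented as a backward model: a ridge-regularized linear decoder maps time-lagged EEG channels onto the attended speech envelope, and the Pearson correlation between reconstructed and actual envelopes serves as the tracking accuracy. The sketch below illustrates that general approach; all function names, lag ranges, and data shapes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def build_lagged_matrix(eeg, lags):
    """Stack time-lagged copies of each EEG channel (eeg: n_samples x n_channels)."""
    n, c = eeg.shape
    X = np.zeros((n, c * len(lags)))
    for j, lag in enumerate(lags):
        shifted = np.roll(eeg, lag, axis=0)
        if lag > 0:
            shifted[:lag] = 0      # zero wrapped-around samples
        elif lag < 0:
            shifted[lag:] = 0
        X[:, j * c:(j + 1) * c] = shifted
    return X

def train_decoder(eeg, envelope, lags, alpha=1.0):
    """Ridge regression mapping lagged EEG onto the speech envelope."""
    X = build_lagged_matrix(eeg, lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ envelope)

def reconstruction_accuracy(eeg, envelope, w, lags):
    """Pearson correlation between reconstructed and actual envelope."""
    pred = build_lagged_matrix(eeg, lags) @ w
    return np.corrcoef(pred, envelope)[0, 1]
```

Training the decoder on the to-be-attended envelope and comparing the accuracies obtained for the attended versus the ignored stream then quantifies the attentional modulation per listener.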
This CD is a multimedia presentation of the programme for the safety upgrading of the Bohunice V1 NPP. This chapter consists of an introductory commentary and four introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; (4) Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering
Oberem, Josefa; Koch, Iring; Fels, Janina
Using a binaural-listening paradigm, age-related differences in the ability to intentionally switch auditory selective attention between two speakers, defined by their spatial location, were examined. Forty normal-hearing participants (20 young, mean age 24.8 years; 20 older, mean age 67.8 years) were tested. The spatial reproduction of stimuli was provided by headphones using head-related transfer functions of an artificial head. Spoken number words of two speakers were presented simultaneously to participants from two out of eight locations on the horizontal plane. Guided by a visual cue indicating the spatial location of the target speaker, the participants were asked to categorize the target's number word as smaller vs. greater than five while ignoring the distractor's speech. Results showed significantly higher reaction times and error rates for older participants. The relative influence of the spatial switch of the target speaker (switch or repetition of the speaker's direction in space) was identical across age groups. Congruency effects (stimuli spoken by target and distractor may evoke the same answer or different answers) were increased for older participants and depended on the target's position. Results suggest that the ability to intentionally switch auditory attention to a new cued location was unimpaired, whereas it was generally harder for older participants to suppress processing of the distractor's speech. Copyright © 2017 Elsevier B.V. All rights reserved.
Scheperle, Rachel A; Abbas, Paul J
The ability to perceive speech is related to the listener's ability to differentiate among frequencies (i.e., spectral resolution). Cochlear implant (CI) users exhibit variable speech-perception and spectral-resolution abilities, which can be attributed in part to the extent of electrode interactions at the periphery (i.e., spatial selectivity). However, electrophysiological measures of peripheral spatial selectivity have not been found to correlate with speech perception. The purpose of this study was to evaluate auditory processing at the periphery and cortex using both simple and spectrally complex stimuli to better understand the stages of neural processing underlying speech perception. The hypotheses were that (1) by more completely characterizing peripheral excitation patterns than in previous studies, significant correlations with measures of spectral selectivity and speech perception would be observed, (2) adding information about processing at a level central to the auditory nerve would account for additional variability in speech perception, and (3) responses elicited with spectrally complex stimuli would be more strongly correlated with speech perception than responses elicited with spectrally simple stimuli. Eleven adult CI users participated. Three experimental processor programs (MAPs) were created to vary the likelihood of electrode interactions within each participant. For each MAP, a subset of 7 of 22 intracochlear electrodes was activated: adjacent (MAP 1), every other (MAP 2), or every third (MAP 3). Peripheral spatial selectivity was assessed using the electrically evoked compound action potential (ECAP) to obtain channel-interaction functions for all activated electrodes (13 functions total). Central processing was assessed by eliciting the auditory change complex with both spatial (electrode pairs) and spectral (rippled noise) stimulus changes. Speech-perception measures included vowel discrimination and the Bamford-Kowal-Bench Speech
Aliphas, Avner; Colburn, H. Steven; Ghitza, Oded
JNDs of interaural time delay (ITD) of selected frequency bands in the presence of other frequency bands have been reported for noiseband stimuli [Zurek (1985); Trahiotis and Bernstein (1990)]. Similar measurements will be reported for speech and music signals. When stimuli are synthesized with bandpass/band-stop operations, performance with complex stimuli is similar to that with noisebands (JNDs in tens or hundreds of microseconds); however, the resulting waveforms, when viewed through a model of the auditory periphery, show distortions (irregularities in phase and level) at the boundaries of the target band of frequencies. An alternate synthesis method based upon group-delay filtering operations does not show these distortions and is being used for the current measurements. Preliminary measurements indicate that when music stimuli are created using the new techniques, JNDs of ITDs are increased significantly compared to previous studies, with values on the order of milliseconds.
Schulz, Geralyn M; Hosey, Lara A; Bradberry, Trent J; Stager, Sheila V; Lee, Li-Ching; Pawha, Rajesh; Lyons, Kelly E; Metman, Leo Verhagen; Braun, Allen R
Deep brain stimulation (DBS) of the subthalamic nucleus improves the motor symptoms of Parkinson's disease, but may produce a worsening of speech and language performance at rates and amplitudes typically selected in clinical practice. The possibility that these dissociated effects might be modulated by selective stimulation of left and right STN has never been systematically investigated. To address this issue, we analyzed motor, speech and language functions of 12 patients implanted with bilateral stimulators configured for optimal motor responses. Behavioral responses were quantified under four stimulator conditions: bilateral DBS, right-only DBS, left-only DBS and no DBS. Under bilateral and left-only DBS conditions, our results exhibited a significant improvement in motor symptoms but worsening of speech and language. These findings contribute to the growing body of literature demonstrating that bilateral STN DBS compromises speech and language function and suggests that these negative effects may be principally due to left-sided stimulation. These findings may have practical clinical consequences, suggesting that clinicians might optimize motor, speech and language functions by carefully adjusting left- and right-sided stimulation parameters.
Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali
In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to gauge the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves speech emotion recognition performance compared to previous works published in the literature.
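Of the spectral features listed above, MFCCs are the most widely used: a frame's power spectrum is passed through a mel-spaced triangular filterbank, and the log filterbank energies are decorrelated with a DCT. A rough self-contained sketch of that computation follows; the filter count and number of coefficients are common defaults, not values taken from the study.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(frame, sr, n_filters=26, n_coeffs=13):
    """MFCCs of one windowed frame of speech."""
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    log_e = np.log(mel_filterbank(n_filters, n_fft, sr) @ spectrum + 1e-10)
    # DCT-II to decorrelate the log filterbank energies
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return basis @ log_e
```

In practice these per-frame vectors are stacked over time and fed, alongside the other feature sets, into the clustering and wrapper-based selection stages.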
Munoz-Reja, C.; Fuentes, L.; Garcia de la Infanta, J. M.; Munoz Sicilia, A.
One of the main aspects of nuclear fuel is the selection of materials for its components. The operating conditions of the fuel elements impose a major challenge on materials: high temperature, a corrosive aqueous environment, high mechanical demands, long periods of time under these extreme conditions and, as the differentiating factor, the effect of irradiation. The materials are selected to fulfil these severe requirements and also so that their behavior under working conditions can be controlled and predicted. Their development, in terms of composition and processing, is based on continuous follow-up of operating behavior. Many of these materials are specific to the nuclear industry, such as uranium dioxide and the zirconium alloys. This article presents the selection and development of nuclear fuel materials as a function of service requirements. It also includes a view of the new nuclear fuel materials being proposed after the Fukushima accident. (Author)
Cho, Soojin; Yu, Jyaehyoung; Chun, Hyungi; Seo, Hyekyung; Han, Woojae
Deficits of the aging auditory system negatively affect older listeners in terms of speech communication, resulting in limitations to their social lives. To improve their perceptual skills, the goal of this study was to investigate the effects of time alteration, selective word stress, and varying sentence lengths on the speech perception of older listeners. Seventeen older people with normal hearing were tested under seven conditions of time-altered sentences (i.e., ±60%, ±40%, ±20%, 0%), two conditions of selective word stress (i.e., no stress and stress), and three different sentence lengths (i.e., short, medium, and long) at the most comfortable level for each individual in quiet circumstances. As time compression increased, sentence perception scores decreased significantly. Compared to a natural (no-stress) condition, selectively stressed words significantly improved the perceptual scores of these older listeners. Long sentences yielded the worst scores under all time-altered conditions. Interestingly, there was a noticeable positive effect of selective word stress at 20% time compression. This pattern of results suggests that a combination of time compression and selective word stress is more effective for understanding speech in older listeners than using the time-expanded condition only.
Josiah, Ubong E.; Oghenerho, Gift
This paper investigates the speech of Martin Luther King Jr. titled "I Have a Dream", delivered in 1963 at the Lincoln Memorial. This speech is selected for use because it involves a speaker and an audience who belong to a particular speech community. The speech is about the failed promises by the Americans whose dream advocate…
deJong, BM; Willemsen, ATM; Paans, AMJ
A story told by his mother was presented on tape to a trauma patient in persistent vegetative state (PVS). During auditory presentation, measurements of regional cerebral blood flow (rCBF) were performed by means of positron emission tomography (PET). Changes in rCBF related to this stimulus
Pinelli, Thomas E. (Editor)
MODSIM World 2010 was held in Hampton, Virginia, October 13-15, 2010. The theme of the 2010 conference & expo was "21st Century Decision-Making: The Art of Modeling& Simulation". The conference program consisted of seven technical tracks - Defense, Engineering and Science, Health & Medicine, Homeland Security & First Responders, The Human Dimension, K-20 STEM Education, and Serious Games & Virtual Worlds. Selected papers and presentations from MODSIM World 2010 Conference & Expo are contained in this NASA Conference Publication (CP). Section 8.0 of this CP contains papers from MODSIM World 2009 Conference & Expo that were unavailable at the time of publication of NASA/CP-2010-216205 Selected Papers Presented at MODSIM World 2009 Conference and Expo, March 2010.
Lagerberg, Tove B.; Johnels, Jakob Åsberg; Hartelius, Lena; Persson, Christina
Background: The assessment of intelligibility is an essential part of establishing the severity of a speech disorder. The intelligibility of a speaker is affected by a number of different variables relating, "inter alia," to the speech material, the listener and the listener task. Aims: To explore the impact of the number of…
Carlile, Simon; Corkhill, Caitlin
To hear out a conversation against other talkers, listeners overcome energetic and informational masking. Largely attributed to top-down processes, informational masking has also been demonstrated using unintelligible speech and amplitude-modulated maskers, suggesting bottom-up processes. We examined the role of speech-like amplitude modulations in informational masking using a spatial masking release paradigm. Separating a target talker from two masker talkers produced a 20 dB improvement in speech...
Stephanie K Ries
Our results suggest that the posterior inferior LTC is involved in word selection as semantic concepts become available. Posterior medial and left PFC regions may be involved in trial-by-trial top-down control over LTC to help overcome interference caused by semantically-related alternatives in word selection. The single-case result supports this hypothesis and suggests that the posterior medial PFC plays a causal role in resolving this interference in word selection. Lastly, the sensitivity to semantic interference of the post-vocal onset posterior LTC activity suggests the semantic interference effect does not only reflect word selection difficulty but is also present at post-selection stages such as verbal response monitoring. In sum, this study reveals a dynamic network of interacting brain regions that support word selection in language production.
McCormick, Michael; Seta, John J
We tested whether framing a message as a gain or loss would alter its effectiveness by using a dichotic listening procedure to selectively present a health related message to the left or right hemisphere. A significant goal framing effect (losses > gains) was found when right, but not left, hemisphere processing was initially enhanced. The results support the position that the contextual processing style of the right hemisphere is especially sensitive to the associative implications of the frame. We discussed the implications of these findings for goal framing research, and the valence hypothesis. We also discussed how these findings converge with prior valence framing research and how they can be of potential use to health care providers.
Dana L Strait
Even in the quietest of rooms, our senses are perpetually inundated by a barrage of sounds, requiring the auditory system to adapt to a variety of listening conditions in order to extract signals of interest (e.g., one speaker's voice amidst others). Brain networks that promote selective attention are thought to sharpen the neural encoding of a target signal, suppressing competing sounds and enhancing perceptual performance. Here, we ask: does musical training benefit cortical mechanisms that underlie selective attention to speech? To answer this question, we assessed the impact of selective auditory attention on cortical auditory-evoked response variability in musicians and nonmusicians. Outcomes indicate strengthened brain networks for selective auditory attention in musicians, in that musicians but not nonmusicians demonstrate decreased prefrontal response variability with auditory attention. Results are interpreted in the context of previous work from our laboratory documenting perceptual and subcortical advantages in musicians for the hearing and neural encoding of speech in background noise. Musicians' neural proficiency for selectively engaging and sustaining auditory attention to language indicates a potential benefit of music for auditory training. Given the importance of auditory attention for the development of language-related skills, musical training may aid in the prevention, habilitation and remediation of children with a wide range of attention-based language and learning impairments.
Smith, Nicholas A.; Gibilisco, Colleen R.; Meisinger, Rachel E.; Hankey, Maren
Two experiments used eye tracking to examine how infant and adult observers distribute their eye gaze on videos of a mother producing infant- and adult-directed speech. Both groups showed greater attention to the eyes than to the nose and mouth, as well as an asymmetrical focus on the talker's right eye for infant-directed speech stimuli. Observers continued to look more at the talker's apparent right eye when the video stimuli were mirror flipped, suggesting that the asymmetry reflects a perceptual processing bias rather than a stimulus artifact, which may be related to cerebral lateralization of emotion processing.
Kim, Jaebok; Park, Jeong-Sik
This paper proposes an efficient speech emotion recognition (SER) approach that utilizes personal voice data accumulated on personal devices. A representative weakness of conventional SER systems is the user-dependent performance induced by the speaker independent (SI) acoustic model framework. But,
Wright, David L.; Robin, Don A.; Rhee, Jooyhun; Vaculin, Amber; Jacks, Adam; Guenther, Frank H.; Fox, Peter T.
Purpose: The authors examined the involvement of 2 speech motor programming processes identified by S. T. Klapp (1995, 2003) during the articulation of utterances differing in syllable and sequence complexity. According to S. T. Klapp, 1 process, INT, resolves the demands of the programmed unit, whereas a second process, SEQ, oversees the serial…
Nozari, Nazbanou; Dell, Gary S.
This article describes an initial study of the effect of focused attention on phonological speech errors. In 3 experiments, participants recited 4-word tongue twisters and focused attention on 1 (or none) of the words. The attended word was singled out differently in each experiment; participants were under instructions to avoid errors on the…
This article deals with the advantages of the case-study method and its potential for building motivation to study English among students of non-linguistic specialities, psychology students in particular. Training future psychologists in foreign-language communication should involve cases published in foreign periodicals, along with numerous exercises and communicative tasks that follow the requirements of the case technology used during the learning process. The studies make it possible to single out the main criteria for selecting cases for the successful formation of foreign speech by students of the psychology faculty.
The purpose of supplier selection is not limited to obtaining supply at low cost and at the right time. Supplier selection is a strategic decision made to fulfil a company's goals over a long period of time at low risk. To accomplish this objective, companies are moving from reactive buying to proactive buying, giving more priority to the co-creation of wealth with suppliers. Considering this issue, an attempt has been made in this paper to give a systematic review of the supplier selection and evaluation process from 2005...
Hodges, Charles B.; Clark, Kenneth
Web-based presentation tools are sometimes referred to as "next generation presentation tools" (EDUCAUSE, 2010). At the most basic level, these tools are simply online versions of traditional presentation software, such as Microsoft's PowerPoint or Apple's Keynote, but some services offer features like web-based collaboration, online presentation…
Pinelli, Thomas E. (Compiler); Bullock, Leanna S. (Compiler)
Selected papers from MODSIM World 2011 Conference & Expo are contained in this NASA Conference Publication (CP). MODSIM World 2011 was held in Virginia Beach, Virginia, October 11-14, 2011. The theme of the 2011 conference & expo was "Overcoming Critical Global Challenges with Modeling & Simulation". The conference program consisted of five technical tracks - Defense, Homeland Security & First Responders; Education; Health & Medicine; The Human Dimension; and Serious Games & Virtual Worlds.
Farshid Tayari Ashtiani
The present study was an attempt to investigate the impact of English verbal songs on connected speech aspects of adult English learners' speech production. Forty participants were selected based on their performance on a piloted and validated version of the NELSON test given to 60 intermediate English learners in a language institute in Tehran. They were then equally distributed into control and experimental groups and received a validated pretest of reading aloud and speaking in English. Afterward, the treatment was performed in 18 sessions by singing songs preselected on criteria such as popularity, familiarity, and amount and speed of speech delivery. In the end, the posttests of reading aloud and speaking in English were administered. The results revealed that the treatment had statistically positive effects on the connected speech aspects of English learners' speech production at the .05 level of significance. Meanwhile, the results showed that there was no significant difference between the experimental group's mean scores on the posttests of reading aloud and speaking. It was thus concluded that providing EFL learners with English verbal songs could positively affect connected speech aspects of both modes of speech production, reading aloud and speaking. The findings of this study have pedagogical implications for language teachers, who should be more aware and knowledgeable of the benefits of verbal songs for promoting the speech production of language learners in terms of naturalness and fluency. Keywords: English Verbal Songs, Connected Speech, Speech Production, Reading Aloud, Speaking
Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; and consonants and syllable boundaries; and vowel information in postvocalic frictions.
Speech topics include: Leadership in Space; Space Exploration: Real and Acceptable Reasons; Why Explore Space?; Space Exploration: Filling up the Canvas; Continuing the Voyage: The Spirit of Endeavour; Incorporating Space into Our Economic Sphere of Influence; The Role of Space Exploration in the Global Economy; Partnership in Space Activities; International Space Cooperation; National Strategy and the Civil Space Program; What the Hubble Space Telescope Teaches Us about Ourselves; The Rocket Team; NASA's Direction; Science and NASA; Science Priorities and Program Management; NASA and the Commercial Space Industry; NASA and the Business of Space; American Competitiveness: NASA's Role & Everyone's Responsibility; Space Exploration: A Frontier for American Collaboration; The Next Generation of Engineers; System Engineering and the "Two Cultures" of Engineering; Generalship of Engineering; NASA and Engineering Integrity; The Constellation Architecture; Then and Now: Fifty Years in Space; The Reality of Tomorrow; and Human Space Exploration: The Next 50 Years.
Anne Birgitta Nilsen
The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the "Eurabia" conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory "the Crusade" in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance. The aim of the article is to contribute to a more thorough understanding of the nature of hate speech by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience. The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the
Peelle, Jonathan E; Sommers, Mitchell S
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration
Small agglomerative microphone array systems have been proposed for use with speech communication and recognition systems. Blind source separation methods based on frequency-domain independent component analysis have shown significant separation performance, and the microphone arrays are small enough to be portable. However, the level of computational complexity involved is very high because the conventional signal collection and processing method uses 60 microphones. In this paper, we propose a band selection method based on magnitude squared coherence. Frequency bands are selected based on the spatial and geometric characteristics of the microphone array device, which is strongly related to its dodecahedral shape, and the selected bands are nonuniformly spaced. The estimated reduction in computational complexity is 90%, with a 68% reduction in the number of frequency bands. Separation performance achieved during our experimental evaluation was 7.45 dB (signal-to-noise ratio) and 2.30 dB (cepstral distortion). These results show improved performance compared to the use of uniformly spaced frequency bands.
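The magnitude-squared-coherence criterion the abstract describes can be sketched generically: estimate averaged cross- and auto-spectra Welch-style, then keep only the bands where coherence is highest. This numpy sketch is illustrative only; the segment length, window, signals, and selection threshold are invented, not taken from the paper's 60-microphone system.

```python
import numpy as np

def msc(x, y, n_fft=256, n_seg=32):
    """Magnitude squared coherence between two signals, estimated by
    averaging cross- and auto-spectra over n_seg windowed segments."""
    X = np.array([np.fft.rfft(x[i * n_fft:(i + 1) * n_fft] * np.hanning(n_fft))
                  for i in range(n_seg)])
    Y = np.array([np.fft.rfft(y[i * n_fft:(i + 1) * n_fft] * np.hanning(n_fft))
                  for i in range(n_seg)])
    Sxy = (X * Y.conj()).mean(axis=0)          # averaged cross-spectrum
    Sxx = (np.abs(X) ** 2).mean(axis=0)        # averaged auto-spectra
    Syy = (np.abs(Y) ** 2).mean(axis=0)
    return np.abs(Sxy) ** 2 / (Sxx * Syy + 1e-12)

# Two sensors observing a common tone at fs/4 (bin 64) plus independent noise
rng = np.random.default_rng(0)
t = np.arange(256 * 32)
shared = np.sin(2 * np.pi * 0.25 * t)
x = shared + 0.1 * rng.standard_normal(t.size)
y = shared + 0.1 * rng.standard_normal(t.size)
c = msc(x, y)
# Keep only the bands with the highest coherence (here the top 25%)
selected = np.argsort(c)[-len(c) // 4:]
```

Bands carrying a common source show coherence near 1, while noise-only bands stay near the 1/n_seg bias floor, so sorting by coherence yields a nonuniform band selection of the kind the paper exploits.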
Carey, Daniel; Mercure, Evelyne; Pizzioli, Fabrizio; Aydelott, Jennifer
The effects of ear of presentation and competing speech on N400s to spoken words in context were examined in a dichotic sentence priming paradigm. Auditory sentence contexts with a strong or weak semantic bias were presented in isolation to the right or left ear, or with a competing signal presented in the other ear at an SNR of -12 dB. Target words were congruent or incongruent with the sentence meaning. Competing speech attenuated N400s to both congruent and incongruent targets, suggesting that the demand imposed by a competing signal disrupts the engagement of semantic comprehension processes. Bias strength affected N400 amplitudes differentially depending upon ear of presentation: weak contexts presented to the left ear/right hemisphere (le/RH) produced a more negative N400 response to targets than strong contexts, whereas no significant effect of bias strength was observed for sentences presented to the right ear/left hemisphere (re/LH). The results are consistent with a model of semantic processing in which the RH relies on integrative processing strategies in the interpretation of sentence-level meaning. Copyright © 2014 Elsevier Ltd. All rights reserved.
Finkelstein, Y; Nachmani, A; Ophir, D
To present illustrative cases showing various tonsillar influences on speech and to present a clinical method for patient evaluation establishing concepts of management and a rational therapeutic approach. The cases were selected from a group of approximately 1000 patients referred to the clinic because of suspected palatal diseases. Complete velopharyngeal assessment was made, including otolaryngologic, speech, and hearing examinations, polysomnography, nasendoscopy, multiview videofluoroscopy, and cephalometry. New observations further elucidate the intimate relation between the tonsils and the velopharyngeal valve. The potential influence of the tonsils on the velopharyngeal valve mechanism, in hindering or assisting speech, is described. In selected cases, the decision to perform tonsillectomy depends on its potential effect on speech. The combination of nasendoscopic and multiview videofluoroscopic studies of the mechanical properties of the tonsils during speech is required for patients who present with velopharyngeal insufficiency in whom tonsillar hypertrophy is found. These studies are also required in patients with palatal anomalies who are candidates for tonsillectomy.
KidsHealth / For Teens / Speech Problems: ... a person's ability to speak clearly. Some common speech and language disorders: stuttering is a problem that ...
Barberena, Luciana da Silva; Brasil, Brunah de Castro; Melo, Roberta Michelon; Mezzomo, Carolina Lisbôa; Mota, Helena Bolli; Keske-Soares, Márcia
To present recent studies that used ultrasound in the fields of Speech Language Pathology and Audiology, evidencing possible applications of this technique in different subareas. A bibliographic search was carried out in the PubMed database, using the keywords "ultrasonic," "speech," "phonetics," "Speech, Language and Hearing Sciences," "voice," "deglutition," and "myofunctional therapy," comprising some areas of Speech Language Pathology and Audiology Sciences. The keywords "ultrasound," "ultrasonography," "swallow," "orofacial myofunctional therapy," and "orofacial myology" were also used in the search. Studies in humans from the past 5 years were selected. In the preselection, duplicated studies, articles not fully available, and those that did not present a direct relation between ultrasound and Speech Language Pathology and Audiology Sciences were discarded. The data were analyzed descriptively and classified into subareas of Speech Language Pathology and Audiology Sciences. The following items were considered: purposes, participants, procedures, and results. We selected 12 articles for the ultrasound versus speech/phonetics subarea, 5 for ultrasound versus voice, 1 for ultrasound versus muscles of mastication, and 10 for ultrasound versus swallowing. Studies relating "ultrasound" and "Speech Language Pathology and Audiology Sciences" in the past 5 years were not found. Different studies on the use of ultrasound in Speech Language Pathology and Audiology Sciences were found. Each of them, according to its purpose, confirms new possibilities for the use of this instrument in the several subareas, aiming at a more accurate diagnosis and new evaluative and therapeutic possibilities.
Objective: To present the methodology for speech assessment in the Scandcleft project and discuss issues from a pilot study. Design: Description of methodology and blinded test for speech assessment. Speech samples and instructions for data collection and analysis for comparisons of speech outcomes across the five included languages were developed and tested. Participants and Materials: Randomly selected video recordings of 10 five-year-old children from each language (n = 50) were included in the project. Speech material consisted of test consonants in single words, connected speech, and syllable chains ... sum and the overall rating of VPC was 78%. Conclusions: Pooling data of speakers of different languages in the same trial and comparing speech outcome across trials seems possible if the assessment of speech concerns consonants and is confined to speech units that are phonetically similar across languages. Agreed...
Speech recognition is about what is being said, irrespective of who is saying it. Speech recognition is a growing field, and major progress is taking place in the technology of automatic speech recognition (ASR). Still, there are many barriers in this field in terms of recognition rate, background noise, speaker variability, speaking rate, accent, etc. The speech recognition rate mainly depends on the selection of features and feature extraction methods. This paper outlines feature extraction techniques for speaker-dependent speech recognition of isolated words. A brief survey of different feature extraction techniques, such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding Coefficients (LPCC), Perceptual Linear Prediction (PLP), and Relative Spectral Perceptual Linear Prediction (RASTA-PLP) analysis, is presented and evaluated. Speech recognition has various applications, from daily use to commercial use. We have made a speaker-dependent system, and this system can be useful in many areas, such as controlling a patient's vehicle using simple commands.
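As a rough sketch of the most common of these front ends, MFCC extraction chains framing, windowing, a power spectrum, a mel filterbank, a log, and a DCT. The parameter values below are generic textbook defaults, not values from the paper:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Toy MFCC extraction: framing, Hamming window, power spectrum,
    mel filterbank, log, DCT. Parameter values are illustrative defaults."""
    # Frame the signal and apply a Hamming window
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank between 0 and sr/2
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies; keep the first n_ceps
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_energy @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone
feats = mfcc(sig)
print(feats.shape)  # one 13-dimensional vector per 10 ms hop
```

Each row of `feats` is the cepstral feature vector for one frame; a speaker-dependent recognizer of the kind the paper describes would compare such vectors against stored word templates.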
This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals, and provides algorithms that make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The specific components of the system and the procedures used to model and recognize speech are described, along with problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on system performance. Different options for the choice of speech signal segments, and their consequences for the ASR process, are presented, with special attention to the use of lexical, syntactic, and semantic information for improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS, discusses experiments on the effect of noise on its performance, and describes methods of constructing HMMs designed to operate in a noisy environment. Finally, a language for human-robot communication is described, defined as a complex multilevel network built from an HMM model of speech sounds geared towards Polish inflections, with mandatory lexical and syntactic rules added to the system's communications vocabulary.
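The recognition step in an HMM system reduces to finding the most likely state sequence for an observation sequence. A minimal log-space Viterbi decoder might look like the following; the two-state model at the bottom is a toy whose probabilities are invented for illustration, far smaller than the speech-sound models the article describes:

```python
import numpy as np

def viterbi(obs, log_A, log_B, log_pi):
    """Viterbi decoding: most likely state path for a discrete observation
    sequence. log_A[i, j]: log P(state j | state i); log_B[j, o]: log
    P(observation o | state j); log_pi[j]: log initial probability."""
    T, N = len(obs), len(log_pi)
    delta = np.zeros((T, N))                  # best log-score ending in each state
    psi = np.zeros((T, N), dtype=int)         # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A    # (from-state, to-state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    # Backtrack from the best final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy two-state model with two observation symbols
A = np.log(np.array([[0.7, 0.3], [0.3, 0.7]]))
B = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
pi = np.log(np.array([0.6, 0.4]))
print(viterbi([0, 0, 1, 1], A, B, pi))  # → [0, 0, 1, 1]
```

Training (Baum-Welch) and the lexical/syntactic layers the article mentions sit on top of this core decoding operation.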
Jerry D. Gibson
Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
Huijbregts, M.A.H.; de Jong, Franciska M.G.
In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because
This section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations.
Maria Luisa A. Valdez
A nation's progress has been measured in terms of its Gross Domestic Product (GDP) throughout modern history. Suffice it to say that the higher a country's GDP, the more progressive the country is considered to be. An internationally used measure of a country's economic activity, GDP has undergone much thought as to its statistical and conceptual bases, but it mainly measures a country's market production. Clearly, there is a need for a coherent complement to a nation's GDP. Every nation can benefit from a fresh and transformational approach to defining and measuring its progress, and this can be done by considering the country's Gross National Happiness (GNH). It is a holistic and sustainable developmental approach targeted at achieving a healthy balance between material and non-material values while giving utmost priority to human happiness and well-being. This study is an analysis of Bhutan's Prime Minister His Excellency Tshering Tobgay's Gross National Happiness philosophy, highlighting key insights from the selections. Analysis revealed that His Excellency exemplified the core philosophy of Gross National Happiness in true adherence to and embodiment of the pillars which constitute the said philosophical concept, namely good governance, socio-economic development, cultural preservation, and environmental sustainability. Likewise, he achieved the efficiency of connecting with his audience and effectively sending his message across by utilizing rhetorical devices such as humor, ethos, logos, and pathos. This paper likewise uncovered and discussed important insights which foster values essential to a nation's well-being and to the appreciation of literature as manifested in his discourses, which in themselves can be considered ample proofs that a nation's well-being and the appreciation of literature can be secured by advocating the holistic approaches within the philosophy of Gross National Happiness.
Language therapy has shifted from a medical focus to a preventive focus. However, difficulties are evident in carrying out this latter task, because more space is devoted to the correction of language disorders. Because speech disorders are the most frequently appearing dysfunction, the preventive work developed to avoid their appearance acquires special importance. Speech education from an early age makes it easier to prevent the appearance of speech disorders in children. The present work aims to offer different activities for the prevention of speech disorders.
Hasse Jørgensen, Stina
About Speech Matters, the Greek curator Katarina Gregos's exhibition at the Danish Pavilion at the 2011 Venice Biennale.
Archila-Meléndez, Mario E.; Valente, Giancarlo; Correia, Joao M.; Rouhl, Rob P. W.; van Kranen-Mastenbroek, Vivianne H.; Jansma, Bernadette M.
Sensorimotor integration, the translation between acoustic signals and motoric programs, may constitute a crucial mechanism for speech. During speech perception, the acoustic-motoric translations include the recruitment of cortical areas for the representation of speech articulatory features, such
Consumer Guide: Speech-to-Speech Relay Service. Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...
This paper presents a method of speech recognition using pattern recognition techniques. Learning consists in determining the unique characteristics of a word (its cepstral coefficients) by eliminating those characteristics that differ from one word to another. For learning and recognition, the system builds a dictionary of words by determining the characteristics of each word to be used in recognition. Determining the characteristics of an audio signal consists of the following steps: noise removal, sampling, applying a Hamming window, switching to the frequency domain through the Fourier transform, calculating the magnitude spectrum, filtering the data, and determining the cepstral coefficients.
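The per-frame chain described above (window, Fourier transform, magnitude spectrum, cepstral coefficients) can be sketched as a real-cepstrum computation. This is a generic illustration, not the paper's implementation; the frame length and coefficient count are arbitrary choices:

```python
import numpy as np

def real_cepstrum(frame, n_ceps=13):
    """Real cepstrum of one frame: Hamming window -> FFT -> log magnitude
    -> inverse FFT. The first n_ceps coefficients summarize the spectral
    envelope and can serve as per-frame word features."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-10)   # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_mag)
    return cepstrum[:n_ceps]

frame = np.sin(2 * np.pi * 200 * np.arange(512) / 8000)  # 200 Hz tone at 8 kHz
coeffs = real_cepstrum(frame)
print(coeffs.shape)  # (13,)
```

A dictionary-based recognizer like the one described would store such coefficient vectors per word during learning and match incoming frames against them during recognition.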
Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars
Speech is both beautiful and informative. In this work, a conceptual study of speech, through investigation of the Tower of Babel, the archetypal phonemes, and the reasons for the use of language, is undertaken in order to create an artistic work investigating the nature of speech. The artwork is presented at the Re:New festival in May 2008.
Health Info » Voice, Speech, and Language » Apraxia of Speech. Apraxia of speech (AOS), also known as acquired ...
The presented materials consist of presentations from an international workshop held in Warsaw from 4 to 5 October 2007. The main subject of the meeting was progress in manufacturing, as well as development of the research program, for a neutron detector that is planned to be placed at the GANIL laboratory and will be used in nuclear spectroscopy research.
Lee, Jimin; Hustad, Katherine C.; Weismer, Gary
Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…
Falk, Simone; Lanzilotti, Cosima; Schön, Daniele
Musical rhythm positively impacts on subsequent speech processing. However, the neural mechanisms underlying this phenomenon are so far unclear. We investigated whether carryover effects from a preceding musical cue to a speech stimulus result from a continuation of neural phase entrainment to periodicities that are present in both music and speech. Participants listened and memorized French metrical sentences that contained (quasi-)periodic recurrences of accents and syllables. Speech stimuli were preceded by a rhythmically regular or irregular musical cue. Our results show that the presence of a regular cue modulates neural response as estimated by EEG power spectral density, intertrial coherence, and source analyses at critical frequencies during speech processing compared with the irregular condition. Importantly, intertrial coherences for regular cues were indicative of the participants' success in memorizing the subsequent speech stimuli. These findings underscore the highly adaptive nature of neural phase entrainment across fundamentally different auditory stimuli. They also support current models of neural phase entrainment as a tool of predictive timing and attentional selection across cognitive domains.
DIMED 86: Discurso dos Media e Ensino a Distancia = Discours des Media et Enseignement a Distance = Media Speech and Distance Teaching. Papers Presented at a Seminar (21st, Algarve, Portugal, March 10-15, 1986).
Coelho, Maria Eduarda Leal
Presentations at this seminar on distance education focused on the different types of speech in multimedia presentations which contribute to the elaboration (simulated or substituted) of a situation involving different relationships. In addition to opening and closing remarks by Marcel de Greve and a final report by the Scientific Committee of…
U.S. Department of Health & Human Services — 2011 to present. BRFSS SMART MMSA Prevalence combined land line and cell phone data. The Selected Metropolitan Area Risk Trends (SMART) project uses the Behavioral...
Objective: The Internet provides the general public with information about speech pathology services, including client groups and service delivery models, as well as the professionals providing the services. Although this information assists the general public and other professionals to both access and understand speech pathology services, it also potentially provides information about speech pathology as a prospective career, including the types of people who are speech pathologists (i.e. their demographics). The aim of the present study was to collect baseline data on how the speech pathology profession is presented via images on the Internet. Methods: A pilot prospective observational study using content analysis methodology was conducted to analyse publicly available Internet images related to the speech pathology profession. The terms 'Speech Pathology' and 'speech pathologist' were used to represent both the profession and the professional, resulting in the identification of 200 images. These images were considered across a range of areas, including who was in the image (e.g. professional, client, significant other), the technology used, and the types of intervention. Results: The majority of images showed both a client and a professional (i.e. speech pathologist). While the professional was predominantly presented as female, the gender of the client was more evenly distributed. The clients were more likely to be preschool or school aged; however, male speech pathologists were presented as providing therapy to selected age groups (i.e. school aged and younger adults). Images were predominantly of individual therapy, and the few group images presented were all paediatric. Conclusion: Current images of speech pathology continue to portray narrow professional demographics and client groups (e.g. paediatrics). Promoting images of wider scope to fully represent the depth and breadth of speech pathology professional practice may assist in attracting a more diverse
The PARIS meeting was held in Cracow, Poland, from 14 to 15 May 2007. The main subject discussed during this meeting was the status of the international project dedicated to gamma spectroscopy research. The scientific research program includes investigations of the giant dipole resonance, probes of hot nuclei produced in heavy-ion reactions, Jacobi shape transitions, isospin mixing, and nuclear multifragmentation. This programme requires R and D developments, such as new scintillation materials (lanthanum chlorides and bromides) as well as new photodetection sensors (avalanche photodiodes); these were also subjects of discussion. Additionally, results of computer simulations of scintillation detector properties by means of the GEANT-4 code are presented.
In the present edition of Significação – Scientific Journal for Audiovisual Culture, and in those to follow, something new is brought: thematic dossiers organized by invited scholars. The appointed subject for the very first of them was radio, and the invited scholar was Eduardo Vicente, professor at the Graduate Course in Audiovisual and at the Postgraduate Program in Audiovisual Media and Processes of the School of Communication and Arts of the University of São Paulo (ECA-USP). Entitled Radio Beyond Borders, the dossier gathers six articles, with the intention of reuniting works on the perspectives of usage of such media as well as on the new possibilities of aesthetic experimentation being built up for it, especially considering the new digital technologies and technological convergences. It also intends to present works with original theoretical approaches and original reflections able to reset the way we look at what is today already a centennial medium. Having broadened the meaning of "beyond borders," four foreign authors were invited to join the dossier. This is the first time they are being published in this country, and so, in all cases, the articles were either written in or translated into Portuguese. The dossier begins with "Radio is dead… Long live the sound," the transcription of a thought-provoking lecture given by Armand Balsebre (Autonomous University of Barcelona), one of the most influential authors in the world in the field of radio studies. It addresses the challenges such media must face so that it can become "a new sound media, in the context of a new soundscape or sound-sphere, for the new listeners." Andrew Dubber (Birmingham City University), regarding the challenges posed by the digital era, argues for a theoretical approach in radio studies that considers a media ecology. The author understands the form and discourse of radio as a negotiation of affordances and
This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. Its major goal is to explain the basic concepts of optimization methods and their use in heuristic optimization for speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why developing new enhancement algorithms that improve the quality and intelligibility of degraded speech has been a challenging problem for researchers, and they present powerful optimization methods for speech enhancement that can help solve noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, learn how speech enhancement algorithms are implemented by utilizing optimization methods, and be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey of the topic.
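As one example of the kind of algorithm whose parameters such optimization methods can tune, classic spectral subtraction exposes an over-subtraction factor that a metaheuristic could search over. This sketch is illustrative and not taken from the book; the frame length, window, and alpha value are arbitrary choices:

```python
import numpy as np

def spectral_subtraction(noisy, noise_mag, alpha=2.0):
    """Classic spectral subtraction on one frame: subtract a scaled noise
    magnitude estimate, floor at zero, keep the noisy phase. alpha is the
    over-subtraction factor a metaheuristic could tune against a quality
    or intelligibility objective."""
    n = len(noisy)
    spec = np.fft.rfft(noisy * np.hanning(n))
    mag = np.abs(spec) - alpha * noise_mag
    mag = np.maximum(mag, 0.0)                       # half-wave rectification
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n)

# Toy frame: a 50 Hz tone at 8 kHz sampling, buried in white noise
rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 50 * np.arange(256) / 8000)
noise = 0.3 * rng.standard_normal(256)
noise_mag = np.abs(np.fft.rfft(noise * np.hanning(256)))  # assumed noise estimate
enhanced = spectral_subtraction(clean + noise, noise_mag)
```

In an optimization framing, parameters such as alpha (and the spectral floor) form the search space, and a heuristic optimizer evaluates candidate settings against a distortion or intelligibility measure.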
Valdez, Carlos A.; Vu, Alexander K.
Provided herein are methods for selectively detecting an alkyne-presenting molecule in a sample, together with related detection reagents, compositions, methods, and systems. The methods include contacting a detection reagent with the sample for a time and under conditions that allow binding of the detection reagent to the one or more alkyne-presenting molecules possibly present in the matrix. The detection reagent includes an organic label moiety presenting an azide group. The binding of the azide group to the alkyne-presenting molecules results in emission of a signal from the organic label moiety.
Ravishankar, C., Hughes Network Systems, Germantown, MD
Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk, and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk, and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely on the basis of a binary decision. Hence the end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible, and secure services that can carry a multitude of signal types (such as voice, data, and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term, often used interchangeably with speech coding, is voice coding. This term is more generic in the sense that the
Won, Jong Ho; Shim, Hyun Joon; Lorenzi, Christian; Rubinstein, Jay T
Won et al. (J Acoust Soc Am 132:1113-1119, 2012) reported that cochlear implant (CI) speech processors generate amplitude-modulation (AM) cues recovered from broadband speech frequency modulation (FM) and that CI users can use these cues for speech identification in quiet. The present study was designed to extend this finding for a wide range of listening conditions, where the original speech cues were severely degraded by manipulating either the acoustic signals or the speech processor. The manipulation of the acoustic signals included the presentation of background noise, simulation of reverberation, and amplitude compression. The manipulation of the speech processor included changing the input dynamic range and the number of channels. For each of these conditions, multiple levels of speech degradation were tested. Speech identification was measured for CI users and compared for stimuli having both AM and FM information (intact condition) or FM information only (FM condition). Each manipulation degraded speech identification performance for both intact and FM conditions. Performance for the intact and FM conditions became similar for stimuli having the most severe degradations. Identification performance generally overlapped for the intact and FM conditions. Moreover, identification performance for the FM condition was better than chance performance even at the maximum level of distortion. Finally, significant correlations were found between speech identification scores for the intact and FM conditions. Altogether, these results suggest that despite poor frequency selectivity, CI users can make efficient use of AM cues recovered from speech FM in difficult listening situations.
Yao, Bo; Belin, Pascal; Scheepers, Christoph
In human communication, direct speech (e.g., Mary said, "I'm hungry") is perceived as more vivid than indirect speech (e.g., Mary said that she was hungry). This vividness distinction has previously been found to underlie silent reading of quotations: Using functional magnetic resonance imaging (fMRI), we found that direct speech elicited higher brain activity in the temporal voice areas (TVA) of the auditory cortex than indirect speech, consistent with an "inner voice" experience in reading direct speech. Here we show that listening to monotonously spoken direct versus indirect speech quotations also engenders differential TVA activity. This suggests that individuals engage in top-down simulations or imagery of enriched supra-segmental acoustic representations while listening to monotonous direct speech. The findings shed new light on the acoustic nature of the "inner voice" in understanding direct speech. Copyright © 2012 Elsevier Inc. All rights reserved.
Noda, K.; Ohno, H.; Sugimoto, M.; Kato, Y.; Matsuo, H.; Watanabe, K.; Kikuchi, T.; Sawai, T.; Usui, T.; Oyama, Y.; Kondo, T.
The present status of technical studies of a high energy neutron irradiation facility, ESNIT (energy selective neutron irradiation test facility), is summarized. Technological surveys and feasibility studies of ESNIT have continued since 1988. The results of technical studies of the accelerator, the target, and the experimental systems in the ESNIT program were reviewed by an International Advisory Committee in February 1993. Recommendations for future R and D on the ESNIT program are also summarized in this paper.
Booz, Jaime A.
Although many studies have examined acoustic and sociolinguistic differences between male and female speech, the relationship between talker speaking style and perceived gender has not yet been explored. The present study attempts to determine whether clear speech, a style adopted by talkers who perceive some barrier to effective communication, shifts perceptions of femininity for male and female talkers. Much of our understanding of gender perception in voice and speech is based on sustained vowels or single words, eliminating temporal, prosodic, and articulatory cues available in more naturalistic, connected speech. Thus, clear and conversational sentence stimuli, selected from the 41 talkers of the Ferguson Clear Speech Database (Ferguson, 2004), were presented to 17 normal-hearing listeners, aged 18 to 30. They rated the talkers' gender using a visual analog scale with "masculine" and "feminine" endpoints. This response method was chosen to account for within-category shifts of gender perception by allowing nonbinary responses. Mixed-effects regression analysis of listener responses revealed a small but significant effect of speaking style, and this effect was larger for male talkers than female talkers. Because of the high degree of talker variability observed for talker gender, acoustic analyses of these sentences were undertaken to determine the relationship between acoustic changes in clear and conversational speech and perceived femininity. Results of these analyses showed that mean fundamental frequency (fo) and fo standard deviation were significantly correlated with perceived gender for both male and female talkers, and vowel space was significantly correlated only for male talkers. Speaking rate and breathiness measures (CPPS) were not significantly related for either group. Outcomes of this study indicate that adopting a clear speaking style is correlated with increases in perceived femininity. Although the increase was small, some changes associated
Aggarwal, Ashish; Sharma, Dinesh Dutt; Kumar, Ramesh; Sharma, Ravi C
Mutism, defined as an inability or unwillingness to speak resulting in an absence or marked paucity of verbal output, is a common clinical symptom in both psychiatric and neurology outpatient departments. It rarely presents as an isolated disability and often occurs in association with other disturbances in behavior, thought processes, affect, or level of consciousness. It is often a focus of clinical attention, both for the physician and for the relatives. Mutism occurs in a number of conditions, both functional and organic, and a proper diagnosis is important for management. We present three cases in which mutism was the presenting symptom, and discuss the differential diagnosis and management issues related to these cases. The authors also selectively review the literature on mutism, including psychiatric, neurologic, toxic-metabolic, and drug-induced causes.
G6PD deficiency is a common hemolytic genetic disorder, particularly in areas endemic for malaria. Individuals are generally asymptomatic, and hemolytic anemia occurs when certain anti-malarial drugs or other oxidizing chemicals are administered. It has been proposed that G6PD deficiency provides protection against malaria. The maintenance of G6PD-deficient alleles at polymorphic proportions is complicated by the X-linked nature of G6PD deficiency. A comprehensive review of the literature on the hypothesis of malarial protection and the nature of the selection is presented. Most epidemiological, in vitro and in vivo studies report selection for G6PD deficiency. Analysis of the G6PD gene also reveals that G6PD-deficient alleles show some signatures of selection. However, the question of how this polymorphism is maintained remains unresolved because the selection/fitness coefficients for the different genotypes in the two sexes have not been established. The prevalence of G6PD deficiency in Indian caste and tribal populations and the different variants reported are also reviewed.
McClain, Matthew; Romanowski, Brian
Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, in which speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. To be usable in all such applications, NLSS detection must be performed without language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.
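The abstract above hinges on classifying audio segments as LSS or NLSS with an HMM. A minimal two-state Viterbi decoder sketches that idea; the transition probabilities, the toy energy-based emission model, and the frame values below are invented for illustration, not taken from the paper.

```python
import math

# Two-state Viterbi decoder labeling each frame as language speech (LSS)
# or non-language speech (NLSS). All scores here are hypothetical.
STATES = ("LSS", "NLSS")
LOG_TRANS = {  # "sticky" transitions favor staying in the same state
    ("LSS", "LSS"): math.log(0.9), ("LSS", "NLSS"): math.log(0.1),
    ("NLSS", "LSS"): math.log(0.1), ("NLSS", "NLSS"): math.log(0.9),
}

def log_emission(state, frame_energy):
    # Toy emission model: LSS frames tend to have high energy, NLSS low.
    mean = 0.8 if state == "LSS" else 0.2
    return -10.0 * (frame_energy - mean) ** 2  # unnormalized Gaussian log-score

def viterbi(frames):
    # delta[s] = best log-score of any state path ending in state s
    delta = {s: math.log(0.5) + log_emission(s, frames[0]) for s in STATES}
    back = []
    for x in frames[1:]:
        new_delta, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: delta[p] + LOG_TRANS[(p, s)])
            new_delta[s] = delta[prev] + LOG_TRANS[(prev, s)] + log_emission(s, x)
            ptr[s] = prev
        back.append(ptr)
        delta = new_delta
    path = [max(STATES, key=delta.get)]  # best final state, then backtrack
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

frames = [0.9, 0.85, 0.8, 0.15, 0.1, 0.75]  # e.g. speech, a click, speech
print(viterbi(frames))  # → ['LSS', 'LSS', 'LSS', 'NLSS', 'NLSS', 'LSS']
```

The sticky transitions smooth over single ambiguous frames, which is why the brief low-energy stretch is labeled NLSS as a block rather than frame by frame.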
Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne
Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and
Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta
Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…
Tan, Zheng-Hua; Lindberg, Børge
The enthusiasm for deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context, covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within
Tavares, Paulo Ott
This paper discusses and analyzes the approach to speech acts in an EFL textbook series used in Brazilian public schools. To that end, the concepts of pragmatics and pragmatic competence, as well as their implications for foreign language (FL) teaching, are discussed. Then a brief review of Speech Act Theory is presented. After describing the approach to FL teaching proposed by the PCNs and the selection of textbooks through the PNLD, we analyze one series, selected for the 2014-2016 triennium. The conclusion is that speech acts are not approached in depth, but this is in accordance with the goals of the series.
Partila, Pavol; Tovarek, Jaromir; Voznak, Miroslav
This paper presents a method for detecting speech under stress using self-organizing maps. Most people exposed to stressful situations cannot respond adequately to stimuli. The army, police, and fire services make up the largest part of the workforce operating in environments typified by an increased number of stressful situations. Personnel in action are directed by a control center, and control commands should be adapted to the psychological state of the person in the field. It is known that psychological changes in the human body are also reflected physiologically, and consequently stress affects speech. A system for recognizing stress in speech is therefore needed in the security forces. One possible classifier, popular for its flexibility, is the self-organizing map (SOM), a type of artificial neural network. Flexibility here means that the classifier is independent of the character of the input data, a feature well suited to speech processing. Human stress can be seen as a kind of emotional state. Mel-frequency cepstral coefficients, LPC coefficients, and prosodic features were selected as input data because of their sensitivity to emotional changes. The parameters were calculated on speech recordings divided into two classes: stressed-state recordings and normal-state recordings. The contribution of the experiment is a method using a SOM classifier for stressed-speech detection. The results demonstrate the advantage of this method, namely its flexibility with respect to the input data.
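A tiny self-organizing map illustrates the classifier described above. The 2-D "features" and cluster centers are hypothetical stand-ins for the MFCC/LPC/prosody vectors the paper uses; the map size, learning-rate schedule, and neighborhood function are likewise assumptions.

```python
import math
import random

random.seed(0)  # reproducible toy run

GRID = 4  # 1-D map with 4 units
DIM = 2   # each "frame" is a hypothetical 2-D feature vector

def best_unit(weights, x):
    # Best-matching unit: the unit whose weight vector is closest to x.
    return min(range(GRID),
               key=lambda i: sum((weights[i][d] - x[d]) ** 2 for d in range(DIM)))

def train_som(samples, epochs=50):
    weights = [[random.random() for _ in range(DIM)] for _ in range(GRID)]
    for epoch in range(epochs):
        lr = 0.5 * (1 - epoch / epochs)                    # decaying learning rate
        radius = max(0.5, (GRID / 2) * (1 - epoch / epochs))  # shrinking neighborhood
        for x in samples:
            bmu = best_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighborhood pulls units near the BMU toward the sample.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(DIM):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Two invented clusters: normal speech (low feature values) vs stressed (high).
normal = [(0.2 + random.uniform(-.05, .05), 0.3 + random.uniform(-.05, .05))
          for _ in range(20)]
stressed = [(0.8 + random.uniform(-.05, .05), 0.7 + random.uniform(-.05, .05))
            for _ in range(20)]
som = train_som(normal + stressed)
# After training, the two classes map to different units of the grid.
print(best_unit(som, (0.2, 0.3)), best_unit(som, (0.8, 0.7)))
```

In a real stress detector, a unit would then be labeled by the majority class of the training frames it wins, and unseen frames classified by their best-matching unit.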
Sandor, A.; Moses, H. R.
Currently on the International Space Station (ISS) and other space vehicles, Caution & Warning (C&W) alerts are represented with various auditory tones that correspond to the type of event. This system relies on the crew's ability to remember what each tone represents in a high-stress, high-workload environment when responding to the alert. Furthermore, crews receive this training a year or more in advance of the mission, which makes remembering the semantic meaning of the alerts more difficult. The current system works for missions conducted close to Earth, where ground operators can assist as needed. On long-duration missions, however, crews will need to handle off-nominal events autonomously. There is evidence that speech alarms may be easier and faster to recognize, especially during an off-nominal event. The Information Presentation Directed Research Project (FY07-FY09), funded by the Human Research Program, included several studies investigating C&W alerts. The studies evaluated tone alerts currently in use with NASA flight deck displays along with candidate speech alerts. A follow-on study used four types of speech alerts to investigate how quickly various types of auditory alerts, with and without a speech component either at the beginning or at the end of the tone, can be identified. Even though crews were familiar with the tone alerts from training or direct mission experience, alerts starting with a speech component were identified faster than alerts starting with a tone. The current study replicated the results of the previous study in a more rigorous experimental design to determine whether the candidate speech alarms are ready for transition to operations or whether more research is needed. Four types of alarms (caution, warning, fire, and depressurization) were presented to participants in both tone and speech formats in laboratory settings and later in the Human Exploration Research Analog (HERA). In the laboratory study, the alerts were presented by software and participants were
Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time-varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain, where it can guide the selection of appropriate actions. To simplify this process, it has been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input-space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both the area of the mouth opening and the voice envelope are temporally modulated in the 2-7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech, which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver.
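The first statistical feature above, the correlation between mouth-opening area and acoustic envelope, is a plain Pearson correlation over paired time series. A sketch with invented, roughly co-modulated series (the real study uses tracked video and audio, not these numbers):

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-frame mouth-opening areas and acoustic envelope values.
mouth_area = [0.1, 0.4, 0.8, 0.5, 0.2, 0.6, 0.9, 0.3]
envelope   = [0.2, 0.5, 0.7, 0.6, 0.1, 0.5, 0.8, 0.4]
print(round(pearson(mouth_area, envelope), 2))  # → 0.93
```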
Wentworth, Mami Tonoe
Uncertainty quantification plays an important role when making predictive estimates of model responses. In this context, uncertainty quantification means quantifying and reducing uncertainties: the objective is to quantify uncertainties in parameters, model, and measurements, and to propagate the uncertainties through the model, so that one can make a predictive estimate with quantified uncertainties. Two aspects of uncertainty quantification that must be performed prior to propagating uncertainties are model calibration and parameter selection. There are several efficient techniques for these processes; however, the accuracy of these methods is often not verified. This is the motivation for our work, and in this dissertation we present and illustrate verification frameworks for model calibration and parameter selection in the context of biological and physical models. First, HIV models, developed and improved by [2, 3, 8], describe the viral infection dynamics of HIV disease. These are also used to make predictive estimates of viral loads and T-cell counts and to construct an optimal control for drug therapy. Estimating input parameters is an essential step prior to uncertainty quantification. However, not all the parameters are identifiable, meaning that they cannot be uniquely determined from the observations. These unidentifiable parameters can be partially removed by performing parameter selection, a process that determines the parameters that have minimal impact on the model response. We provide verification techniques for Bayesian model calibration and parameter selection for an HIV model. As an example of a physical model, we employ a heat model with experimental measurements presented in . A steady-state heat model represents prototypical behavior for the heat conduction and diffusion processes involved in a thermal-hydraulic model, which is part of a nuclear reactor model. We employ this simple heat model to illustrate verification
Benesty, Jacob; Chen, Jingdong
We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red
Paiè, Petra; Bassi, Andrea; Bragheri, Francesca; Osellame, Roberto
Selective plane illumination microscopy (SPIM) is an optical sectioning technique that allows imaging of biological samples at high spatio-temporal resolution. Standard SPIM devices require dedicated set-ups, complex sample preparation and accurate system alignment, thus limiting the automation of the technique, its accessibility and throughput. We present a millimeter-scaled optofluidic device that incorporates selective plane illumination and fully automatic sample delivery and scanning. To this end an integrated cylindrical lens and a three-dimensional fluidic network were fabricated by femtosecond laser micromachining into a single glass chip. This device can upgrade any standard fluorescence microscope to a SPIM system. We used SPIM on a CHIP to automatically scan biological samples under a conventional microscope, without the need of any motorized stage: tissue spheroids expressing fluorescent proteins were flowed in the microchannel at constant speed and their sections were acquired while passing through the light sheet. We demonstrate high-throughput imaging of the entire sample volume (with a rate of 30 samples/min), segmentation and quantification in thick (100-300 μm diameter) cellular spheroids. This optofluidic device gives access to SPIM analyses to non-expert end-users, opening the way to automatic and fast screening of a high number of samples at subcellular resolution.
Shin, Young Hoon; Seo, Jiwon
People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users, or optical systems that are susceptible to environmental interference, a contactless and robust solution is required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wideband (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. To extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm in our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals, so that speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.
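The speech activity detection step described above can be sketched as simple energy gating with gap bridging; the threshold, the gap limit, and the frame energies below are invented, and the paper's actual detection algorithm for radar signals is not reproduced here.

```python
# Frames whose motion-signal energy exceeds a threshold are grouped into
# segments, and short sub-threshold gaps are bridged so a word is not split.
def detect_segments(energies, threshold=0.5, max_gap=2):
    segments, start, gap = [], None, 0
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i          # open a new segment
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap:      # gap too long: close the segment
                segments.append((start, i - gap))
                start, gap = None, 0
    if start is not None:          # signal ended inside a segment
        segments.append((start, len(energies) - 1 - gap))
    return segments

# Invented frame energies: one utterance with a brief dip, then a second burst.
energies = [0.1, 0.7, 0.9, 0.2, 0.8, 0.6, 0.1, 0.1, 0.1, 0.9, 0.8]
print(detect_segments(energies))  # → [(1, 5), (9, 10)]
```

Recognition would then run only on the returned (start, end) frame ranges.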
Fox, Jesse; Vendemia, Megan A
Through social media and camera phones, users enact selective self-presentation as they choose, edit, and post photographs of themselves (such as selfies) to social networking sites for an imagined audience. Photos typically focus on users' physical appearance, which may compound existing sociocultural pressures about body image. We identified users of social networking sites among a nationally representative U.S. sample (N = 1,686) and examined women's and men's photo-related behavior, including posting photos, editing photos, and feelings after engaging in upward and downward social comparison with others' photos on social networking sites. We identified some sex differences: women edited photos more frequently and felt worse after upward social comparison than men. Body image and body comparison tendency mediated these effects.
Selected French libraries (the Bibliotheque nationale de France, the Bibliotheque publique d’information, the multimedia library of the Cité des sciences et de l’industrie) as well as Paris public libraries are presented in the article. France does not have a union catalogue at the national level; therefore, libraries use different platforms for shared cataloguing and compile multiple union catalogues. According to their needs, French libraries join consortia for the acquisition of electronic resources, which can be delimited geographically or thematically, or formed by institutions of the same status. The author believes that the Slovenian library network works well considering its much smaller budget for culture and higher education in comparison with France. To improve its performance, more funds would have to be allocated and a higher reputation of the library profession achieved, comparable to the situation in France. Digitization of resources is the area where Slovenian librarianship lags furthest behind the French.
Cherry, Rochelle Silberzweig
Fifty-three children (ages 5-9) were individually tested on their ability to select pictures of monosyllabic words presented diotically via headphones. Tasks were presented in quiet and under three noise (distractor) conditions: white noise, speech backwards, and speech forward. Age and type of distractor significantly influenced test scores.…
Tan, Zheng-Hua; Kraljevski, Ivan
This paper presents a method that combines variable frame length and rate analysis for speech recognition in noisy environments, together with an investigation of the effect of different frame lengths on speech recognition performance. The method adopts frame selection using an a posteriori signal-to-noise ratio (SNR) weighted energy distance and increases the length of the selected frames according to the number of non-selected preceding frames. It assigns a higher frame rate and a normal frame length to rapidly changing, high-SNR regions of a speech signal, and a lower frame rate and an increased frame length to steady or low-SNR regions. The speech recognition results show that the proposed variable frame rate and length method outperforms fixed frame rate and length analysis, as well as standalone variable frame rate analysis, in terms of noise robustness.
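A sketch of the frame-selection idea, under assumptions: a frame is kept when its SNR-weighted energy distance from the last kept frame exceeds a threshold, and a kept frame's analysis length grows with the number of frames skipped before it. The threshold, frame lengths, and input values are invented, not the paper's settings.

```python
def select_frames(energies, snrs, base_len=25, step=10, threshold=0.3):
    # Returns (frame_index, frame_length_ms) pairs for the kept frames.
    kept = []
    last_e, skipped = None, 0
    for i, (e, snr) in enumerate(zip(energies, snrs)):
        weight = max(snr, 0.0)  # a posteriori SNR weighting of the distance
        dist = weight * abs(e - last_e) if last_e is not None else float("inf")
        if dist >= threshold:
            # A longer analysis frame follows a steady or low-SNR stretch.
            kept.append((i, base_len + step * skipped))
            last_e, skipped = e, 0
        else:
            skipped += 1
    return kept

# Invented inputs: a steady low-SNR stretch, then two rapid high-SNR changes.
energies = [0.5, 0.52, 0.51, 0.9, 0.95, 0.4]
snrs     = [1.0, 0.1,  0.1,  2.0, 2.0,  1.5]
print(select_frames(energies, snrs))  # → [(0, 25), (3, 45), (5, 35)]
```

The steady region contributes fewer, longer frames (a lower effective frame rate), while the rapidly changing region is sampled densely at the normal length.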
Speech intelligibility (SI) is important in different fields of research, engineering, and diagnostics for quantifying very different phenomena: the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, the benefit of using hearing aids, or combinations of these.
Yeung, H Henny; Werker, Janet F
Speech is robustly audiovisual from early in infancy. Here we show that audiovisual speech perception in 4.5-month-old infants is influenced by sensorimotor information related to the lip movements they make while chewing or sucking. Experiment 1 consisted of a classic audiovisual matching procedure, in which two simultaneously displayed talking faces (visual [i] and [u]) were presented with a synchronous vowel sound (audio /i/ or /u/). Infants' looking patterns were selectively biased away from the audiovisual matching face when the infants were producing lip movements similar to those needed to produce the heard vowel. Infants' looking patterns returned to those of a baseline condition (no lip movements, looking longer at the audiovisual matching face) when they were producing lip movements that did not match the heard vowel. Experiment 2 confirmed that these sensorimotor effects interacted with the heard vowel, as looking patterns differed when infants produced these same lip movements while seeing and hearing a talking face producing an unrelated vowel (audio /a/). These findings suggest that the development of speech perception and speech production may be mutually informative.
Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam
Speech-language pathologists (SLPs) are trained to correct the articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcomes while patients speak. To assist SLPs in this task, we present the multimodal speech capture system (MSCS), which records and displays the kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. The collected speech modalities, tongue motion, lip gestures, and voice, are visualized not only in real time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components and demonstrate its basic visualization capabilities with a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern-matching algorithms applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods, which are mostly subjective and may vary from one SLP to another.
Hanratty, Jane; Deegan, Catherine; Walsh, Mary; Kirkpatrick, Barry
Diagnosis and monitoring of Parkinson's disease has a number of challenges as there is no definitive biomarker despite the broad range of symptoms. Research is ongoing to produce objective measures that can either diagnose Parkinson's or act as an objective decision support tool. Recent research on speech based measures have demonstrated promising results. This study aims to investigate the characteristics of the glottal source signal in Parkinsonian speech. An experiment is conducted in which a selection of glottal parameters are tested for their ability to discriminate between healthy and Parkinsonian speech. Results for each glottal parameter are presented for a database of 50 healthy speakers and a database of 16 speakers with Parkinsonian speech symptoms. Receiver operating characteristic (ROC) curves were employed to analyse the results and the area under the ROC curve (AUC) values were used to quantify the performance of each glottal parameter. The results indicate that glottal parameters can be used to discriminate between healthy and Parkinsonian speech, although results varied for each parameter tested. For the task of separating healthy and Parkinsonian speech, 2 out of the 7 glottal parameters tested produced AUC values of over 0.9.
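The study's performance measure, area under the ROC curve (AUC), equals the Mann-Whitney probability that a randomly chosen Parkinsonian sample scores above a randomly chosen healthy one. A sketch with invented glottal-parameter values (the study's actual parameter values are not reproduced here):

```python
# AUC via pairwise comparison: the fraction of (parkinsonian, healthy) pairs
# in which the Parkinsonian sample has the higher parameter value; ties count
# as half. An AUC near 1.0 means the parameter separates the groups well.
def auc(healthy, parkinsonian):
    wins = 0.0
    for p in parkinsonian:
        for h in healthy:
            if p > h:
                wins += 1.0
            elif p == h:
                wins += 0.5
    return wins / (len(healthy) * len(parkinsonian))

# Hypothetical glottal-parameter values for the two groups.
healthy = [0.1, 0.2, 0.25, 0.3]
parkinsonian = [0.28, 0.4, 0.5]
print(auc(healthy, parkinsonian))  # → 0.9166666666666666
```

This pairwise formulation is equivalent to integrating the ROC curve, and it makes clear why AUC is insensitive to any monotonic rescaling of the parameter.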
Greene, Beth G; Logan, John S; Pisoni, David B
We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.
Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities, Report and Order.
Locke, J L; Kutz, K J
Thirty kindergarteners, 15 who substituted /w/ for /r/ and 15 with correct articulation, received two perception tests and a memory test that included /w/ and /r/ in minimally contrastive syllables. Although both groups had nearly perfect perception of the experimenter's productions of /w/ and /r/, misarticulating subjects perceived their own tape-recorded w/r productions as /w/. In the memory task, these same misarticulating subjects committed significantly more /w/-/r/ confusions in unspoken recall. The discussion considers why people subvocally rehearse; a developmental period in which children do not rehearse; ways subvocalization may aid recall, including motor and acoustic encoding; an echoic store that provides additional recall support if subjects rehearse vocally; and the perception of self- and other-produced phonemes by misarticulating children, including its relevance to a motor theory of perception. Evidence is presented that speech for memory can be sufficiently impaired to cause memory disorder. Conceptions that restrict speech disorder to an impairment of communication are challenged.
Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang
Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to healthy listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed an altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its functions of suppressing masking-speech signals.
McNeel Gordon Jantzen
Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Kraus & Chandrasekaran, 2010; Parbery-Clark, Skoe, & Kraus, 2009; Zendel & Alain, 2008; Musacchia, Sams, Skoe, & Kraus, 2007). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech, such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right-hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast, we predicted that in non-musicians processing of speech sounds would be localized to traditional left-hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated the aural location for a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and the organization of acoustic features were reflected by activity in source generators of the P50, including greater activation of the right middle temporal gyrus (MTG) and superior temporal gyrus (STG) in musicians. The findings demonstrate recruitment of the right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right-hemisphere homologues of established speech-processing regions of the brain.
Rothney S. Tshaka
This article attempts to bring one of the greatest speeches of Malcolm X back to life in the current South Africa – the year 2015. It is a year of growing frustration and extreme dissatisfaction with basic living conditions amongst the greater part of black people in the country. Recounting the influences that Malcolm X had on Black Liberation Theology in South Africa, the article proposes that Black Liberation Theology in South Africa moves away from being an inward-looking critical theology to one that identifies with the basic concerns of the most vulnerable in society. It criticises both the political and the economic hegemonies that are currently perceived to perpetuate much of apartheid’s grave social ills in democratic South Africa. It calls attention to party politics that floods society with propaganda but in reality seems to have little real interest in the social well-being of the masses. In the article, the question as to what Malcolm X would have said about the current South African socio-economic context is asked. It is clear that both structural apartheid residues as well as the pure selfish interests of the current political rulers gang up against the chances of black people ever experiencing social justice in the near future.
Klein, Evelyn R.; Armstrong, Sharon Lee; Shipon-Blum, Elisa
Children with selective mutism (SM) display a failure to speak in select situations despite speaking when comfortable. The purpose of this study was to obtain valid assessments of receptive and expressive language in 33 children (ages 5 to 12) with SM. Because some children with SM will speak to parents but not a professional, another purpose was…
Current USDA linear selection indexes such as Lifetime Net Merit (NM$) estimate lifetime profit given a combination of 13 traits. In these indexes, every animal gets credit for 2.78 lactations of the traits expressed per lactation, independent of its productive life (PL). Selection among animals wit...
M. M. Bykov
Full Text Available In the article, the authors developed a method for detecting speech activity for an automated system for recognizing critical use of speech, with wavelet parameterization of the speech signal and classification into "speech"/"pause" intervals using a curvilinear neural network. The wavelet-parameterization method proposed by the authors allows choosing the optimal parameters of the wavelet transformation in accordance with a user-specified error of representation of the speech signal. The method also allows estimating the loss of information depending on the selected parameters of the continuous wavelet transformation (NPP), which made it possible to reduce the number of scalable coefficients of the LVP of the speech signal by an order of magnitude with an allowable degree of distortion of the local spectrum of the LVP. An algorithm for detecting speech activity with a curvilinear neural network classifier is also proposed; it shows high-quality segmentation of speech signals into "speech"/"pause" intervals and is robust to narrowband and technogenic noise in the speech signal owing to the inherent properties of the curvilinear neural network.
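The detection step described above can be illustrated with a much simplified stand-in: instead of wavelet parameterization and a neural-network classifier, the sketch below labels fixed-length frames as "speech" or "pause" by short-time energy alone. The frame length, threshold ratio, and toy signal are assumptions for the example, not values from the article.

```python
import numpy as np

def frame_energy_vad(signal, rate, frame_ms=20, threshold_ratio=0.1):
    """Label each frame 'speech' or 'pause' by short-time energy.

    A deliberately simplified stand-in for the wavelet/neural-network
    detector described above; threshold_ratio is an assumed tuning knob.
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)          # mean power per frame
    threshold = threshold_ratio * energy.max()
    return ["speech" if e > threshold else "pause" for e in energy]

# Toy example: 100 ms of silence followed by 100 ms of a 440 Hz tone.
rate = 8000
t = np.arange(rate // 10) / rate
sig = np.concatenate([np.zeros(rate // 10), np.sin(2 * np.pi * 440 * t)])
labels = frame_energy_vad(sig, rate)
```

Real detectors replace the energy feature with richer parameterizations (here, wavelet coefficients) and the fixed threshold with a trained classifier, but the frame-label output has the same shape.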
Singh, Sukhbir S; Belland, Liane; Leyland, Nicholas; von Riedemann, Sarah; Murji, Ally
Uterine fibroids are common in women of reproductive age and can have a significant impact on quality of life and fertility. Although a number of international obstetrics/gynecology societies have issued evidence-based clinical practice guidelines for the management of symptomatic uterine fibroids, many of these guidelines do not yet reflect the most recent clinical evidence and approved indication for one of the key medical management options: the selective progesterone receptor modulator class. This article aims to share the clinical experience gained with selective progesterone receptor modulators in Europe and Canada by reviewing the historical development of selective progesterone receptor modulators, current best practices for selective progesterone receptor modulator use based on available data, and potential future uses for selective progesterone receptor modulators in uterine fibroids and other gynecologic conditions. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Morgan, Lewis B.
This article focuses on speech mannerisms often employed by clients in a helping relationship. Eight mannerisms are presented and discussed, as well as possible interpretations. Suggestions are given to help counselors respond to them. (Author)
Wu, Chung-Hsien; Su, Hung-Yu; Liu, Chao-Hong
This study presents an efficient approach to personalized mispronunciation detection of Taiwanese-accented English. The main goal of this study was to detect frequently occurring mispronunciation patterns of Taiwanese-accented English instead of scoring English pronunciations directly. The proposed approach quickly identifies personalized…
Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris
Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response to SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed that damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443
Davidow, Jason H; Grossman, Heather L; Edge, Robin L
Voluntary stuttering techniques involve persons who stutter purposefully interjecting disfluencies into their speech. Little research has been conducted on the impact of these techniques on the speech pattern of persons who stutter. The present study examined whether changes in the frequency of voluntary stuttering accompanied changes in stuttering frequency, articulation rate, speech naturalness, and speech effort. In total, 12 persons who stutter aged 16-34 years participated. Participants read four 300-syllable passages during a control condition, and three voluntary stuttering conditions that involved attempting to produce purposeful, tension-free repetitions of initial sounds or syllables of a word for two or more repetitions (i.e., bouncing). The three voluntary stuttering conditions included bouncing on 5%, 10%, and 15% of syllables read. Friedman tests and follow-up Wilcoxon signed ranks tests were conducted for the statistical analyses. Stuttering frequency, articulation rate, and speech naturalness were significantly different between the voluntary stuttering conditions. Speech effort did not differ between the voluntary stuttering conditions. Stuttering frequency was significantly lower during the three voluntary stuttering conditions compared to the control condition, and speech effort was significantly lower during two of the three voluntary stuttering conditions compared to the control condition. Due to changes in articulation rate across the voluntary stuttering conditions, it is difficult to conclude, as has been suggested previously, that voluntary stuttering is the reason for stuttering reductions found when using voluntary stuttering techniques. Additionally, future investigations should examine different types of voluntary stuttering over an extended period of time to determine their impact on stuttering frequency, speech rate, speech naturalness, and speech effort.
Larm, Petra; Hongisto, Valtteri
During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
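The shared core of these quantities — a per-band apparent speech-to-noise ratio, clipped to a fixed range, mapped to 0–1, and summed with band-importance weights — can be sketched as follows. The five-band layout, the weights, and the ±15 dB clipping range are illustrative assumptions; the standardized STI/SII procedures use their own band definitions, weights, and corrections.

```python
import numpy as np

# Illustrative band-importance weights (sum to 1.0); these are made up
# for the example, not the standardized STI/SII values.
band_weights = np.array([0.1, 0.2, 0.3, 0.25, 0.15])

def snr_index(speech_db, noise_db, weights, snr_range=15.0):
    """Clip each band's SNR to [-snr_range, +snr_range], map it to
    [0, 1], and sum with band-importance weights: the common core of
    STI/SII-style measures, sketched here in simplified form."""
    snr = np.asarray(speech_db, float) - np.asarray(noise_db, float)
    snr = np.clip(snr, -snr_range, snr_range)
    audibility = (snr + snr_range) / (2 * snr_range)   # 0..1 per band
    return float(np.dot(weights, audibility))

speech = [65, 68, 62, 60, 55]   # hypothetical band levels, dB
noise = [55, 60, 58, 59, 54]
index = snr_index(speech, noise, band_weights)
```

An index near 1 corresponds to speech well above the noise in the heavily weighted bands; near 0, to speech buried in noise. The full standards add corrections (e.g., masking, reverberation via the modulation transfer function) that this sketch omits.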
Ekström, Seth-Reino; Borg, Erik
The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.
... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination; Goldman-Fristoe Test of ...
Saija, Jefta D; Akyürek, Elkan G; Andringa, Tjeerd C; Başkent, Deniz
Cognitive skills, such as processing speed, memory functioning, and the ability to divide attention, are known to diminish with aging. The present study shows that, despite these changes, older adults can successfully compensate for degradations in speech perception. Critically, the older participants of this study were not pre-selected for high performance on cognitive tasks, but only screened for normal hearing. We measured the compensation for speech degradation using phonemic restoration, where intelligibility of degraded speech is enhanced using top-down repair mechanisms. Linguistic knowledge, Gestalt principles of perception, and expectations based on situational and linguistic context are used to effectively fill in the inaudible masked speech portions. A positive compensation effect was previously observed only with young normal hearing people, but not with older hearing-impaired populations, leaving the question whether the lack of compensation was due to aging or due to age-related hearing problems. Older participants in the present study showed poorer intelligibility of degraded speech than the younger group, as expected from previous reports of aging effects. However, in conditions that induce top-down restoration, a robust compensation was observed. Speech perception by the older group was enhanced, and the enhancement effect was similar to that observed with the younger group. This effect was even stronger with slowed-down speech, which gives more time for cognitive processing. Based on previous research, the likely explanations for these observations are that older adults can overcome age-related cognitive deterioration by relying on linguistic skills and vocabulary that they have accumulated over their lifetime. Alternatively, or simultaneously, they may use different cerebral activation patterns or exert more mental effort. This positive finding on top-down restoration skills by the older individuals suggests that new cognitive training methods
This dissertation attempts to find the common traits of great speeches. It does so by closely examining the language of some of the most well-known speeches in the world. These speeches are presented in the book Speeches that Changed the World (2006) by Simon Sebag Montefiore. The dissertation specifically looks at four variables: the beginnings and endings of the speeches, the use of passive voice, the use of personal pronouns and the difficulty of the language. These four variables are based on...
Błeszyński, Jacek Jarosław
Speech of people with autism is recognised as one of the basic diagnostic, therapeutic and theoretical problems. One of the most common symptoms of autism in children is echolalia, described here as being of different types and severity. This paper presents the results of studies into different levels of echolalia, both in normally developing children and in children diagnosed with autism, discusses the differences between simple echolalia and echolalic speech - which can be considered to b...
The VDE system developed had the capability of recognizing up to 248 separate words in syntactic structures. The two systems described are isolated... Table-of-contents fragments: ...AND SPEAKER RECOGNITION, by M.J. Hunt; ASSESSMENT OF SPEECH SYSTEMS, by R.K. Moore; A SURVEY OF CURRENT EQUIPMENT AND RESEARCH, by J.S. Bridle; ...TECHNOLOGY IN NAVY TRAINING SYSTEMS, by R. Breaux, M. Blind and R. Lynchard; GENERAL REVIEW OF MILITARY APPLICATIONS OF VOICE PROCESSING, by Dr. Bruno
Bruderer, Alison G; Danielson, D Kyle; Kandhadai, Padmapriya; Werker, Janet F
The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception-production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants' speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants' tongues. With a looking-time procedure, we found that temporarily restraining infants' articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral-motor movements influence speech sound discrimination. Moreover, an experimentally induced "impairment" in articulator movement can compromise speech perception performance, raising the question of whether long-term oral-motor impairments may impact perceptual development.
Polanowska, Katarzyna Ewa; Pietrzyk-Krawczyk, Iwona
Apraxia of speech (AOS) is a motor speech disorder, most typically caused by stroke, which in its "pure" form (without other speech-language deficits) is very rare in clinical practice. Because some observable characteristics of AOS overlap with more common verbal communication neurologic syndromes (i.e. aphasia, dysarthria) distinguishing them may be difficult. The present study describes AOS in a 49-year-old right-handed male after left-hemispheric stroke. Analysis of his articulatory and prosodic abnormalities in the context of intact communicative abilities as well as description of symptoms dynamics over time provides valuable information for clinical diagnosis of this specific disorder and prognosis for its recovery. This in turn is the basis for the selection of appropriate rehabilitative interventions. Copyright © 2016 Polish Neurological Society. Published by Elsevier Urban & Partner Sp. z o.o. All rights reserved.
Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias
Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed...
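The selection-and-distribution step — keep the sentences closest in measured intelligibility and deal them into equally difficult lists — might be sketched like this. The median-distance criterion and round-robin assignment are assumptions for illustration, not the authors' exact procedure.

```python
import random

def build_test_lists(scores, n_select=200, n_lists=10):
    """Pick the n_select sentences whose intelligibility scores lie
    closest to the median, then deal them round-robin (in score order)
    into n_lists so each list has a similar score distribution."""
    items = sorted(scores.items(), key=lambda kv: kv[1])
    median = items[len(items) // 2][1]
    # Keep the sentences nearest the median intelligibility score.
    chosen = sorted(items, key=lambda kv: abs(kv[1] - median))[:n_select]
    chosen.sort(key=lambda kv: kv[1])            # re-order by score
    lists = [[] for _ in range(n_lists)]
    for i, (sentence, _) in enumerate(chosen):
        lists[i % n_lists].append(sentence)
    return lists

# Hypothetical intelligibility scores for 268 sentences of one talker.
random.seed(0)
scores = {f"sentence-{i}": random.gauss(70, 10) for i in range(268)}
lists = build_test_lists(scores)
```

Dealing in score order guarantees each list samples the whole retained score range, which is one simple way to approximate the homogeneity the corpus design calls for.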
U.S. Department of Health & Human Services — 2011 to present. BRFSS SMART MMSA age-adjusted prevalence combined land line and cell phone data. The Selected Metropolitan Area Risk Trends (SMART) project uses the...
Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher-level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
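The isochronous retiming manipulation can be illustrated by computing, for a sequence of measured anchor points, the strictly periodic target times a warping algorithm would move them to. Keeping the first anchor fixed is an assumption of this sketch, as are the example onset values.

```python
import numpy as np

def isochronous_targets(anchor_times, rate_hz):
    """Map observed anchor points (e.g. syllable onsets, in seconds) to
    a strictly periodic grid at rate_hz, keeping the first anchor fixed.
    Returns the target times a retiming algorithm would warp toward."""
    anchors = np.asarray(anchor_times, dtype=float)
    period = 1.0 / rate_hz
    return anchors[0] + period * np.arange(len(anchors))

# Quasi-periodic syllable onsets warped to a 4 Hz grid.
onsets = [0.10, 0.32, 0.61, 0.85, 1.14]
targets = isochronous_targets(onsets, rate_hz=4.0)
```

A time-warping resynthesis (not shown) would then stretch or compress each inter-anchor interval so the anchors land on these targets; the matched anisochronous control shifts targets by the same magnitudes but in non-periodic directions.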
Rose, Richard C.; Barnwell, Thomas P., III
The performance of a broad class of analysis-by-synthesis linear predictive speech coders is quantified experimentally. The class of coders includes a number of well-known techniques as well as a very large number of speech coders which have not been named or studied. A general formulation for deriving the parametric representation used in all of the coders in the class is presented. A new coder, named the self-excited vocoder, is discussed because of its good performance with low complexity, and because of the insight this coder gives to analysis-by-synthesis coders in general. The results of a study comparing the performances of different members of this class are presented. The study takes the form of a series of formal subjective and objective speech quality tests performed on selected coders. The results of this study lead to some interesting and important observations concerning the controlling parameters for analysis-by-synthesis speech coders.
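The parametric representation shared by this coder class starts from a linear-prediction analysis of each speech frame. Below is a minimal sketch of the autocorrelation method with the Levinson-Durbin recursion; the test signal and model order are chosen for illustration, and a real coder would add windowing, quantization, and the excitation search.

```python
import numpy as np

def lpc_autocorr(signal, order):
    """Linear-prediction coefficients via the autocorrelation method and
    the Levinson-Durbin recursion — the parametric front end shared by
    the analysis-by-synthesis coder family (sketch only).
    Returns (a, err): a[0] = 1 and the prediction-error power err."""
    x = np.asarray(signal, dtype=float)
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] += k * a[1:i][::-1]
        a = a_new
        err *= (1.0 - k * k)
    return a, err

# A decaying sinusoid is modelled almost exactly by a 2nd-order predictor
# with a1 = -2*rho*cos(w) and a2 = rho**2.
n = np.arange(400)
sig = (0.99 ** n) * np.sin(0.3 * np.pi * n)
coeffs, residual = lpc_autocorr(sig, order=2)
```

In an analysis-by-synthesis coder these coefficients define the synthesis filter 1/A(z); the coders in the class then differ mainly in how the excitation signal driving that filter is chosen.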
Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.
This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...
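The three-component cascade named above can be sketched as a simple function composition; the stub components below are placeholders standing in for real ASR, MT, and TTS engines, not actual APIs.

```python
from typing import Callable

def make_s2s_pipeline(recognize: Callable[[bytes], str],
                      translate: Callable[[str], str],
                      synthesize: Callable[[str], bytes]) -> Callable[[bytes], bytes]:
    """Compose the three components of a speech-to-speech translation
    system — speech recognition, machine translation, speech synthesis —
    into a single audio-in/audio-out function."""
    def pipeline(audio_in: bytes) -> bytes:
        text = recognize(audio_in)       # ASR: audio -> source-language text
        translated = translate(text)     # MT: source text -> target text
        return synthesize(translated)    # TTS: target text -> audio
    return pipeline

# Stub components for demonstration only.
s2s = make_s2s_pipeline(
    recognize=lambda audio: audio.decode("utf-8"),
    translate=lambda text: {"hello": "bonjour"}.get(text, text),
    synthesize=lambda text: text.encode("utf-8"),
)
out = s2s(b"hello")
```

The paper's point is that integration work has focused on the first interface (ASR/MT) while the second (MT/TTS) has been neglected; in this composition that is the `translated` hand-off, where, e.g., word lattices or prosodic cues could be passed instead of bare text.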
Kwok, Sheldon J. J.; Kuznetsov, Ivan A.; Kim, Moonseok; Choi, Myunghwan; Scarcelli, Giuliano; Yun, Seok-Hyun
Two-photon polymerization and crosslinking are commonly used methods for microfabrication of three-dimensional structures with applications spanning from photonic microdevices, drug delivery systems, to cellular scaffolds. However, the use of two-photon processes for precise, internal modification of biological tissues has not yet been reported. One of the major challenges has been a lack of appropriate tools to monitor and characterize crosslinked regions nondestructively. Here, we demonstrate spatially selective two-photon collagen crosslinking (2P-CXL) in intact tissue for the first time. Using riboflavin photosensitizer and femtosecond laser irradiation, we crosslinked a small volume of tissue within animal corneas. Collagen fiber orientations and photobleaching were characterized by second harmonic generation and two-photon fluorescence imaging, respectively. Using confocal Brillouin microscopy, we measured local changes in longitudinal mechanical moduli and visualized the cross-linked pattern without perturbing surrounding non-irradiated regions. 2P-CXL-induced tissue stiffening was comparable to that achieved with conventional one-photon CXL. Our results demonstrate the ability to selectively stiffen biological tissue in situ at high spatial resolution, with broad implications in ophthalmology, laser surgery, and tissue engineering.
Speech and Language Delay. What is a speech and language delay? A speech and language delay ...
Howard, Lee M.; Burton, Robert P.
Several presentation techniques have been created for visualization of data with more than three variables. Packages have been written, each of which implements a subset of these techniques. However, these packages generally fail to provide all the features needed by the user during the visualization process. Further, packages generally limit support for presentation techniques to a few techniques. A new package called Petrichor accommodates all necessary and useful features together in one system. Any presentation technique may be added easily through an extensible plugin system. Features are supported by a user interface that allows easy interaction with data. Annotations allow users to mark up visualizations and share information with others. By providing a hyperdimensional graphics package that easily accommodates presentation techniques and includes a complete set of features, including those that are rarely or never supported elsewhere, the user is provided with a tool that facilitates improved interaction with multivariate data to extract and disseminate information.
Nieuwenstein, Mark R; Potter, Mary C
People often fail to recall the second of two visual targets presented within 500 ms in rapid serial visual presentation (RSVP). This effect is called the attentional blink. One explanation of the attentional blink is that processes involved in encoding the first target into memory are slow and capacity limited. Here, however, we show that the attentional blink should be ascribed to attentional selection, not consolidation of the first target. Rapid sequences of six letters were presented, and observers had to report either all the letters (whole-report condition) or a subset of the letters (partial-report condition). Selection in partial report was based on color (e.g., report the two red letters) or identity (i.e., report all letters from a particular letter onward). In both cases, recall of letters presented shortly after the first selected letter was impaired, whereas recall of the corresponding letters was relatively accurate with whole report.
Jung, Youngsin; Duffy, Joseph R; Josephs, Keith A
Primary progressive aphasia is a neurodegenerative syndrome characterized by progressive language dysfunction. The majority of primary progressive aphasia cases can be classified into three subtypes: nonfluent/agrammatic, semantic, and logopenic variants. Each variant presents with unique clinical features, and is associated with distinctive underlying pathology and neuroimaging findings. Unlike primary progressive aphasia, apraxia of speech is a disorder that involves inaccurate production of sounds secondary to impaired planning or programming of speech movements. Primary progressive apraxia of speech is a neurodegenerative form of apraxia of speech, and it should be distinguished from primary progressive aphasia given its discrete clinicopathological presentation. Recently, there have been substantial advances in our understanding of these speech and language disorders. The clinical, neuroimaging, and histopathological features of primary progressive aphasia and apraxia of speech are reviewed in this article. The distinctions among these disorders for accurate diagnosis are increasingly important from a prognostic and therapeutic standpoint.
Deveugele, Myriam; Silverman, Jonathan
Although peer-review for journal submission, grant applications and conference submissions has been called 'a cornerstone of science', and even 'the gold standard for evaluating scientific merit', publications on this topic remain scarce. Research that has investigated peer-review reveals several issues and criticisms concerning bias, poor quality review, unreliability and inefficiency. The most important weakness of the peer review process is the inconsistency between reviewers, leading to inadequate inter-rater reliability. To report the reliability of ratings for a large international conference and to suggest possible solutions to overcome the problem. In 2016, during the International Conference on Communication in Healthcare, organized by EACH: International Association for Communication in Healthcare, a calibration exercise was proposed and feedback was reported back to the participants of the exercise. Most abstracts, as well as most peer-reviewers, receive and give scores around the median. Contrary to the general assumption that there are high and low scorers, in this group only 3 peer-reviewers could be identified with a high mean score, while 7 had a low mean score. Only 2 reviewers gave only high ratings (4 and 5). Of the eight abstracts included in this exercise, only one abstract received a high mean score and one a low mean score. Nevertheless, both these abstracts received both low and high scores; all other abstracts received all possible scores. Peer-review of submissions for conferences is, in accordance with the literature, unreliable. New and creative methods will be needed to give the participants of a conference what they really deserve: a more reliable selection of the best abstracts. More raters per abstract improves the inter-rater reliability; training of reviewers could be helpful; providing feedback to reviewers can lead to less inter-rater disagreement; fostering negative peer-review (rejecting the inappropriate submissions) rather than a
Juel Henrichsen, Peter
Most of the Danish municipalities are ready to begin to adopt automatic speech recognition, but at the same time remain nervous following a long series of bad business cases in the recent past. Complaints are voiced over costly licences and low service levels, typical effects of a de facto monopoly on the supply side. The present article reports on a new public action strategy which has taken shape in the course of 2013-14. While Denmark is a small language area, our public sector is well organised and has considerable purchasing power. Across this past year, Danish local authorities have organised around the speech technology challenge: they have formulated a number of joint questions and new requirements to be met by suppliers and have deliberately worked towards formulating tendering material which will allow fair competition. Public researchers have contributed to this work, including the author
Full Text Available Theoretically framed within Vygotskyan sociocultural theory (SCT) of mind, the present study investigated the resurfacing of private speech markers by Iranian elementary female EFL learners in teacher-learner interactions. To this end, an elementary EFL class including 12 female learners and a same-sex teacher were selected as the participants of the study. As for the data, six 30-minute reading comprehension tasks at intervals of two weeks were videotaped, while each participant was provided with a sensitive MP3 player to keep track of very low private speech markers. Instances of externalized private speech markers were coded and reports were generated for the patterns of private speech markers regarding their form and content. While a high number of instances of literal translation, metalanguage, and switching to L1 mid-utterance were reported, the number of private markers such as self-directed questions, reading aloud, reviewing, and self-explanations in L2 was comparatively lower, which could be due to the low L2 proficiency of the learners. The findings of the study, besides highlighting the importance of paying more attention to private speech as a mediating tool in the cognitive regulation of learners doing tasks in L2, suggest that the teacher's type of classroom practice is effective in the production of private speech. Pedagogically speaking, the results suggest that instead of seeing L1 private speech markers as detrimental to L2 learning, they should be seen as signs of cognitive regulation when learners face challenging tasks.
Carbonell, Kathy M.
One of the lasting concerns in audiology is the unexplained individual differences in speech perception performance even for individuals with similar audiograms. One proposal is that there are cognitive/perceptual individual differences underlying this vulnerability and that these differences are present in normal hearing (NH) individuals but do not reveal themselves in studies that use clear speech produced in quiet (because of a ceiling effect). However, previous studies have failed to uncover cognitive/perceptual variables that explain much of the variance in NH performance on more challenging degraded speech tasks. This lack of strong correlations may be due either to examining the wrong measures (e.g., working memory capacity) or to there being no reliable differences in degraded speech performance in NH listeners (i.e., variability in performance is due to measurement noise). The proposed project has three aims. The first is to establish whether there are reliable individual differences in degraded speech performance for NH listeners that are sustained both across degradation types (speech in noise, compressed speech, noise-vocoded speech) and across multiple testing sessions. The second aim is to establish whether there are reliable differences in NH listeners' ability to adapt their phonetic categories based on short-term statistics, both across tasks and across sessions; and the third is to determine whether performance on degraded speech perception tasks is correlated with performance on phonetic adaptability tasks, thus establishing a possible explanatory variable for individual differences in speech perception for NH and hearing impaired listeners.
Full Text Available This paper presents a new Czech-language two-channel (stereo) speech database recorded in a car environment. The created database was designed for experiments with speech enhancement for communication purposes and for the study and design of robust speech recognition systems. Tools for automated phoneme labelling based on Baum-Welch re-estimation were realised. A noise analysis of the car background environment was done.
Pollak, P.; Vopicka, J.; Hanzl, V.; Sovka, Pavel
Paparrizos, Spyridon; Matzarakis, Andreas
The determination of heat requirements in the first developing phases of plants is expressed as Growing Degree Days (GDD). The current study focuses on three selected study areas in Greece that are characterised by different climatic conditions due to their location, and aims to assess the future variation and spatial distribution of GDD and how these can affect the main cultivations in the study areas. Future temperature data were obtained from the ENSEMBLES project and analysed. The analysis was performed for the future periods 2021-2050 and 2071-2100 under the A1B and B1 scenarios. Spatial distribution was derived using a combination of dynamical and statistical downscaling techniques in ArcGIS 10.2.1. The results indicated that for all future periods and scenarios the GDD are expected to increase. Furthermore, the increase in the Sperchios River basin will be the highest, followed by the Ardas and the Geropotamos River basins. Moreover, the cultivation period will shift from April-October to April-September, which will have social, economic and environmental benefits. Additionally, the spatial distribution indicated that in the upcoming years the existing cultivations can find favourable conditions and can expand into mountainous areas as well. On the other hand, due to the rough topography of the study areas, wide expansion of the existing cultivations to higher altitudes is not feasible. Nevertheless, new, more profitable cultivations can be introduced which can find propitious conditions in terms of GDD.
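The GDD metric used in the abstract above is conventionally computed as the accumulated daily mean temperature above a crop-specific base temperature. A minimal sketch follows; the function names and the 10 °C base temperature are illustrative defaults, not values taken from the study:

```python
def daily_gdd(t_max, t_min, t_base=10.0):
    # Growing Degree Days for one day: daily mean temperature above a
    # crop-specific base temperature; days below the base contribute zero.
    mean_t = (t_max + t_min) / 2.0
    return max(0.0, mean_t - t_base)

def season_gdd(daily_pairs, t_base=10.0):
    # Accumulate GDD over a season given (t_max, t_min) pairs in °C.
    return sum(daily_gdd(t_max, t_min, t_base) for t_max, t_min in daily_pairs)

# A warm day contributes, a cold day does not:
print(season_gdd([(25.0, 15.0), (8.0, 2.0)]))  # → 10.0
```

Heat-requirement thresholds for specific crops would then be compared against this accumulated value over the projected temperature series.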
Flottmeyer, L; Fries, A
Since the late 60s, reality-oriented books for children and young people have increasingly turned to subject matters and issues involving social/societal criticism, among them the theme of "being disabled". In the discussion on the degree to which media, and books in particular, affect children's attitudes and socialization, it has been underlined that media take effect in the development of specific attitudinal patterns and behavioural dispositions in those cases where the recipient has not already formed a "completed" opinion of the topic at hand. This in particular is true of children of primary school age, and above all relates to their view of the disabled person. Six selected children's books were reviewed critically, based on a catalogue of criteria permitting coverage of as wide a spectrum as possible of "physical disability" and allied subjects. Summarizing, it is noted that the books reviewed do give children the opportunity, partly in an excellent manner, of gaining insights into the situation of disabled persons. The potential for didactical treatment in primary classrooms is pointed out.
Benisty, Henri; Lupu, Anatole
The evolving field of optics for information and communication is currently seeking directions to expand the data rates in all concerned devices, fiber-based or on chips. We describe here two possibilities where the new concept of PT-symmetry in optics [1,2] can be exploited to help high data rate operation, considering either transverse or longitudinal aspects of modal selection, and assuming that data are carried using precise modes. The first aspect is transverse multimode transport. In this case, a fiber or a waveguide carries a few modes, say 4 to 16, and at nodes, they have to undergo a demux/mux operation to add or drop a subset of them, as far as possible without affecting the others. We shall consider to this end the operation as described in ref. : if a PT-symmetric "potential", which essentially consists of a transverse gain-loss profile with antisymmetry, is applied to a waveguide, it has a very different impact on the different modes and mode families in the waveguide. One can in particular find situations where only two modes of the passive waveguide to be analyzed may enter a gain regime, and not the others. From this scheme and others, we will discuss the road left towards an actual device, either in dielectrics or in case plasmonics is envisioned, i.e. with rather constant losses but the possible advantage of miniaturization. The second aspect is longitudinal mode selection. The special transport properties of PT-symmetric Bragg gratings are now well established. In order to be used within a data management system, attention has to be paid to the rejection rate of Bragg gratings, and to the flatness of their response in the targeted window. To this end, a slow modulation of both real and imaginary parts of the periodic pattern of the basically PT-symmetric waveguide can help, in the general spirit of "apodization", but now with more parameters. We will detail some aspects of the designs introduced in , notably
Full Text Available The REVEALS model is a tool for recalculating pollen data into vegetation abundances on a regional scale. We explored the general effect of selected parameters by performing simulations and ascertained the best model setting for the Czech Republic using the shallowest samples from 120 fossil sites and data on actual regional vegetation (60 km radius). Vegetation proportions of 17 taxa were obtained by combining the CORINE Land Cover map with forest inventories, agricultural statistics and habitat mapping data. Our simulation shows that changing the site radius for all taxa substantially affects REVEALS estimates of taxa with heavy or light pollen grains. Decreasing the site radius has a similar effect as increasing the wind speed parameter. However, adjusting the site radius to 1 m for local taxa only (even taxa with light pollen) yields lower, more correct estimates despite their high pollen signal. Increasing the background radius does not affect the estimates significantly. Our comparison of estimates with actual vegetation in seven regions shows that the most accurate relative pollen productivity estimates (PPEs) come from Central Europe and Southern Sweden. The initial simulation and pollen data yielded unrealistic estimates for Abies under the default setting of the wind speed parameter (3 m/s). We therefore propose the setting of 4 m/s, which corresponds to the spring average in most regions of the Czech Republic studied. Ad hoc adjustment of PPEs with this setting improves the match 3-4-fold. We consider these values (apart from four exceptions) to be appropriate, because they are within the ranges of standard errors, so they remain related to the original PPEs. Setting a 1 m radius for local taxa (Alnus, Salix, Poaceae) significantly improves the match between estimates and actual vegetation. However, further adjustments to PPEs exceed the ranges of original values, so their relevance is uncertain.
Mehta, G; Cutler, A
Although spontaneous speech occurs more frequently in most listeners' experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ considerably, however, which suggests that laboratory results may not generalise to the recognition of spontaneous speech. In the present study listeners were presented with both spontaneous and read speech materials, and their response time to detect word-initial target phonemes was measured. Responses were, overall, equally fast in each speech mode. However, analysis of effects previously reported in phoneme detection studies revealed significant differences between speech modes. In read speech but not in spontaneous speech, later targets were detected more rapidly than targets preceded by short words. In contrast, in spontaneous speech but not in read speech, targets were detected more rapidly in accented than in unaccented words and in strong than in weak syllables. An explanation for this pattern is offered in terms of characteristic prosodic differences between spontaneous and read speech. The results support claims from previous work that listeners pay great attention to prosodic information in the process of recognising speech.
Jørgensen, Søren; Cubick, Jens; Dau, Torsten
In the development process of modern telecommunication systems, such as mobile phones, it is common practice to use computer models to objectively evaluate the transmission quality of the system, instead of time-consuming perceptual listening tests. Such models have typically focused on the quality of the transmitted speech, while little or no attention has been paid to speech intelligibility. The present study investigated to what extent three state-of-the-art speech intelligibility models could predict the intelligibility of noisy speech transmitted through mobile phones. Sentences from the Danish Dantale II speech material were mixed with three different kinds of background noise, transmitted through three different mobile phones, and recorded at the receiver via a local network simulator. The speech intelligibility of the transmitted sentences was assessed by six normal-hearing listeners...
Faundez-Zanuy, Marcos; Esposito, Antonietta; Cordasco, Gennaro; Drugman, Thomas; Solé-Casals, Jordi; Morabito, Francesco
This book presents recent advances in nonlinear speech processing beyond nonlinear techniques, showing how heuristic and psychological models of human interaction can be exploited to succeed in the implementation of socially believable VUIs and applications for human health and psychological support. The book takes into account the multifunctional role of speech and what is “outside of the box” (see Björn Schuller’s foreword). To this aim, the book is organized in 6 sections, each collecting a small number of short chapters reporting advances “inside” and “outside” themes related to nonlinear speech research. The themes emphasize theoretical and practical issues for modelling socially believable speech interfaces, ranging from efforts to capture the nature of sound changes in linguistic contexts and the timing nature of speech; labors to identify and detect speech features that help in the diagnosis of psychological and neuronal disease; and attempts to improve the effectiveness and performa...
Macdonald, Ewen N; Raufer, Stefan
The Lombard effect refers to the phenomenon whereby talkers automatically increase their level of speech in a noisy environment. While many studies have characterized how the Lombard effect influences different measures of speech production (e.g., F0, spectral tilt, etc.), few have investigated the consequences of temporally fluctuating noise. In the present study, 20 talkers produced speech in a variety of noise conditions, including both steady-state and amplitude-modulated white noise. While listening to noise over headphones, talkers produced randomly generated five-word sentences. Similar ... of noisy environments and will alter their speech accordingly...
Full Text Available Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.
Monkey vocal tracts are capable of producing monkey speech, not the full range of articulate human speech. The evolution of human speech entailed changes in both anatomy and brains. Fitch, de Boer, Mathur, and Ghazanfar in Science Advances claim that "monkey vocal tracts are speech-ready," and conclude that "…the evolution of human speech capabilities required neural change rather than modifications of vocal anatomy." Neither premise is consistent with the data presented and the conclusions reached by de Boer and Fitch themselves in their own published papers on the role of anatomy in the evolution of human speech, or with the body of independent studies published since the 1950s.
The heterogeneous pathology of many autoimmune diseases warrants the continual discovery and development of new drugs. Drawing on selected oral presentations and selected poster displays, this article highlights some new developments in the pharmacological validation of molecular targets implicated in inflammatory autoimmune disease and may be of direct importance to scientists working in this field. This report describes the current state of the pharmacology of selected drugs and targets which may have utility in modulating immune function and autoimmune inflammatory disease. Many new molecules are progressing through clinical development for the treatment of rheumatological diseases. The value of the basic nonclinical and clinical research presented is to further pharmacological knowledge of the molecule, better understand the benefit-risk associated with clinical development and to assist in supporting the potential position of a new drug in the current treatment paradigm.
Full Text Available This paper presents the current state of knowledge concerning the examination of the impact of increased temperatures on changes in the geomechanical properties of rocks. Based on historical data, the shape of the stress–strain characteristics that illustrate the destruction of rock samples under load in uniaxial compression in a testing machine is discussed. Results from studies on changes in the basic strength and elasticity parameters of rocks, such as the compressive strength and Young’s modulus, were compared. On their basis, it was found that temperature has a significant effect on the change of geomechanical properties of rocks. The nature of these changes also depends on factors other than temperature, among them the mineral composition of the rock, its porosity and its density. The research analysis showed that the changes produced by heating a rock at various temperatures and then uniaxially loading it in a testing machine differ between rock types. Most of the important processes that cause changes in the values of the strength parameters of the examined rocks occurred in the temperature range of 400 to 600 °C.
Johnson, Benjamin K.; Ranzini, Giulia
Sharing mass media content through social network sites has become a prevalent practice that provides individuals with social utility and cultural capital. This behavior is examined here by testing how different self-presentational motivations may produce selective patterns of sharing media content
Full Text Available During the last years, researchers and therapists in speech therapy have been more and more concerned with the elaboration and use of computer programs in speech disorders therapy. The main objective of this study was to evaluate the therapeutic effectiveness of computer-based programs for the Romanian language in speech therapy. In the study, we present experimental research assessing the effectiveness of computer programs in the therapy of the speech disorders dyslalia, dyslexia and dysgraphia. Methodologically, the use of the computer in the therapeutic phases was carried out with the help of computer-based programs (Logomon, Dislex-Test etc.) that we elaborated and experimented with during several years of therapeutic activity. The sample used in our experiments was composed of 120 subjects; two groups of 60 children with speech disorders were selected, one per speech disorder: 30 for the experimental ('computer-based') group and 30 for the control ('classical method') group. The study hypotheses verified whether the results obtained by the subjects within the experimental group improved significantly after using the computer-based program, compared to the subjects within the control group, who did not use this program but received classical therapy. The hypotheses were confirmed for the speech disorders included in this research; the conclusions of the study confirm the advantages of using computer-based programs within speech therapy in correcting these disorders, as well as the positive influence these programs have on the development of children’s personality.
Maas, Edwin; Mailend, Marja-Liisa
Purpose: The purpose of this article is to present an argument for the use of online reaction time (RT) methods to the study of apraxia of speech (AOS) and to review the existing small literature in this area and the contributions it has made to our fundamental understanding of speech planning (deficits) in AOS. Method: Following a brief…
Basharirad, Babak; Moradhaseli, Mohammadreza
Recently, research attention to emotional speech signals has grown in human-machine interfaces due to the availability of high computation capability. Many systems have been proposed in the literature to identify the emotional state through speech. Selection of suitable feature sets, design of proper classification methods, and preparation of appropriate datasets are the main key issues of speech emotion recognition systems. This paper critically analyzes the currently available approaches to speech emotion recognition based on three evaluation parameters (feature set, classification of features, and accuracy). In addition, this paper also evaluates the performance and limitations of available methods. Furthermore, it highlights current promising directions for the improvement of speech emotion recognition systems.
Balasubramanian, Venu; Max, Ludo
The present study reports on the first case of crossed apraxia of speech (CAS) in a 69-year-old right-handed female (SE). The possibility of occurrence of apraxia of speech (AOS) following right hemisphere lesion is discussed in the context of known occurrences of ideomotor apraxias and acquired neurogenic stuttering in several cases with right…
Shriberg, Lawrence D.; Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.
Purpose: Previous articles in this supplement described rationale for and development of the pause marker (PM), a diagnostic marker of childhood apraxia of speech (CAS), and studies supporting its validity and reliability. The present article assesses the theoretical coherence of the PM with speech processing deficits in CAS. Method: PM and other…
This paper presents a study of pulmonic ingressive speech, a severely understudied phenomenon within varieties of English. While ingressive speech has been reported for several parts of the British Isles, New England, and eastern Canada, thus far Newfoundland appears to be the only locality where researchers have managed to provide substantial…
In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model. A detailed study is presented on unifying the common objective functions for discriminative learning in speech recognition, namely maximum mutual information (MMI), minimum classification error, and minimum phone/word error. The unification is presented, with rigorous mathematical analysis, in a common rational-functio
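As a sketch of the first of those objective functions, the maximum mutual information (MMI) criterion for training utterances $X_r$ with reference transcriptions $w_r$, acoustic models $M_w$ with parameters $\lambda$, and language model probabilities $P(w)$ is commonly written as follows (this is the standard textbook notation, not necessarily the book's own):

```latex
F_{\mathrm{MMI}}(\lambda)
  = \sum_{r} \log
    \frac{p_{\lambda}(X_r \mid M_{w_r})\, P(w_r)}
         {\sum_{w} p_{\lambda}(X_r \mid M_{w})\, P(w)}
```

The denominator sums over competing hypotheses, which is what makes the criterion discriminative; the rational-function unification the book describes treats criteria of this numerator-over-denominator form within one optimization framework.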
Doukas, Nikolaos; Bardis, Nikolaos G.
Speech recognition systems allow human-machine communication to acquire an intuitive nature that approaches the simplicity of inter-human communication. Small vocabulary speech recognition is a subset of the overall speech recognition problem, where only a small number of words need to be recognized. Speaker independent small vocabulary recognition can find significant applications in field equipment used by military personnel. Such equipment may typically be controlled by a small number of commands that need to be given quickly and accurately, under conditions where delicate manual operations are difficult to achieve. This type of application could hence significantly benefit from the use of robust voice operated control components, as they would facilitate the interaction with their users and render it much more reliable in times of crisis. This paper presents current challenges involved in attaining efficient and robust small vocabulary speech recognition. These challenges concern feature selection, classification techniques, speaker diversity and noise effects. A state machine approach is presented that facilitates the voice guidance of different equipment in a variety of situations.
... to being completely unable to speak or understand speech. Causes include: hearing disorders and deafness; voice problems ... or those caused by cleft lip or palate; speech problems like stuttering; developmental disabilities; learning disorders; autism ...
speech processing area are faced. He presents speech communication as an interactive process, in which the listener actively reconstructs the message ... speech produced by these systems. Finally, perhaps the greatest recent impetus in advancing digital ... in the area of speech and speaker recognitio...
Full Text Available The neural basis of speech perception has been debated for over a century. While it is generally agreed that the superior temporal lobes are critical for the perceptual analysis of speech, a major current topic is whether the motor system contributes to speech perception, with several conflicting findings attested. In a dorsal-ventral speech stream framework (Hickok & Poeppel 2007), this debate is essentially about the roles of the dorsal versus ventral speech processing streams. A major roadblock in characterizing the neuroanatomy of speech perception is task-specific effects. For example, much of the evidence for dorsal stream involvement comes from syllable discrimination type tasks, which have been found to behaviorally doubly dissociate from auditory comprehension tasks (Baker et al. 1981). Discrimination task deficits could be a result of difficulty perceiving the sounds themselves, which is the typical assumption, or a result of failures in the temporary maintenance of the sensory traces, or in the comparison and/or decision process. Similar complications arise in perceiving sentences: the extent of inferior frontal (i.e. dorsal stream) activation during listening to sentences increases as a function of increased task demands (Love et al. 2006). Another complication is the stimulus: much evidence for dorsal stream involvement uses speech samples lacking semantic context (CVs, non-words). The present study addresses these issues in a large-scale lesion-symptom mapping study. 158 patients with focal cerebral lesions from the Multi-site Aphasia Research Consortium underwent a structural MRI or CT scan, as well as an extensive psycholinguistic battery. Voxel-based lesion symptom mapping was used to compare the neuroanatomy involved in the following speech perception tasks with varying phonological, semantic, and task loads: (i) two discrimination tasks of syllables (non-words and words, respectively), (ii) two auditory comprehension tasks
Kim, Yoonkyung; Baek, Young Min
This study investigates the relationship between selective self-presentation and online life satisfaction, and how this relationship is influenced by respondents' perceptions of "self" (operationalized by "self-esteem") and "others" (operationalized by "social trust"). Relying on survey data from 712 Korean online users, two important findings were detected in our study. First, the positive relationship between selective self-presentation and online life satisfaction becomes more prominent among people with low self-esteem compared to those with high self-esteem, and second, this positive relationship is enhanced among people with high levels of social trust compared to those with low trust levels. Theoretical and practical implications of our findings as well as potential limitations are discussed.
Barberena,Luciana da Silva; Brasil,Brunah de Castro; Melo,Roberta Michelon; Mezzomo,Carolina Lisbôa; Mota,Helena Bolli; Keske-Soares,Márcia
PURPOSE: To present recent studies that used ultrasound in the fields of Speech Language Pathology and Audiology, evidencing possible applications of this technique in different subareas. RESEARCH STRATEGY: A bibliographic search was carried out in the PubMed database, using the keywords "ultrasonic," "speech," "phonetics," "Speech, Language and Hearing Sciences," "voice," "deglutition," and "myofunctional therapy," comprising some areas of Speech Language Pathology and...
Full Text Available One of the major problems concerning the evolution of human language is to understand how sounds became associated with meaningful gestures. It has been proposed that the circuit controlling gestures and speech evolved from a circuit involved in the control of arm and mouth movements related to ingestion. This circuit contributed to the evolution of spoken language, moving from a system of communication based on arm gestures. The discovery of mirror neurons has provided strong support for the gestural theory of speech origin, because they offer a natural substrate for the embodiment of language and create a direct link between sender and receiver of a message. Behavioural studies indicate that manual gestures are linked to the mouth movements used for syllable emission. Grasping with the hand selectively affected movement of inner or outer parts of the mouth according to syllable pronunciation, and hand postures, in addition to hand actions, influenced the control of mouth grasp and vocalization. Gestures and words are also related to each other. It was found that when producing communicative gestures (emblems), the intention to interact directly with a conspecific was transferred from gestures to words, inducing modifications in voice parameters. Transfer effects of the meaning of representational gestures were found on both vocalizations and meaningful words. In conclusion, the results of our studies suggest the existence of a system relating gesture to vocalization which was the precursor of a more general system reciprocally relating gesture to word.
Peach, Richard K
The features of apraxia of speech (AOS) are presented with regard to both traditional and contemporary descriptions of the disorder. Models of speech processing, including the neurological bases for apraxia of speech, are discussed. Recent findings concerning subcortical contributions to apraxia of speech and the role of the insula are presented. The key features to differentially diagnose AOS from related speech syndromes are identified. Treatment implications derived from motor accounts of AOS are presented along with a summary of current approaches designed to treat the various subcomponents of the disorder. Finally, guidelines are provided for treating the AOS patient with coexisting aphasia.
Johnson, Benjamin K.; Ranzini, Giulia
Sharing mass media content through social network sites has become a prevalent practice that provides individuals with social utility and cultural capital. This behavior is examined here by testing how different self-presentational motivations may produce selective patterns of sharing media content in social networks. An other-ideal motive was expected to drive sharing of popular media, an own-ideal motive was expected to drive sharing of prestigious media, and an actual-self motive was expec...
Phifer, Gregg, Ed.
The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…
Arnett, Ronald C.
To examine the theoretical status of ethics scholarship and to explore the historical and present directions of ethics in human communication research, this paper reviews more than 100 articles drawn from the speech communication literature. Following a brief introduction that sets forth the criteria for article selection, the paper discusses…
Giddan, Jane J.; And Others
Presents the symptoms of selective mutism and historical background for treatment. It provides a case study which illustrates successful multidisciplinary treatment outcomes for a child who was selectively mute. Issues relevant to speech-language pathologists working with elementary school children are discussed and treatment guidelines provided.…
Bartos, Anthony L; Cipr, Tomas; Nelson, Douglas J; Schwarz, Petr; Banowetz, John; Jerabek, Ladislav
A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
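The SNR-lattice strategy described above can be sketched in a few lines: pre-train one model per SNR level, then classify with the model whose training SNR is closest to the estimated SNR of the test data. The model names and the SNR grid below are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of SNR-matched model selection: pick the pre-trained
# speaker-ID model whose training SNR is closest to the test utterance's
# estimated SNR. Names and the SNR grid are illustrative placeholders.

def select_snr_model(model_lattice, estimated_snr_db):
    """Return the training-SNR key closest to the estimated test SNR."""
    return min(model_lattice, key=lambda snr: abs(snr - estimated_snr_db))

# A lattice of models keyed by training SNR (dB); values stand in for models.
model_lattice = {-10: "sid_model_m10dB", 0: "sid_model_0dB",
                 10: "sid_model_10dB", 20: "sid_model_20dB"}

best = select_snr_model(model_lattice, estimated_snr_db=7.3)
```

This mirrors the finding that SID performance is best when the SNRs of testing and training data are close or identical.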
Zhao, Jianhua; Zeng, Haishan; Kalia, Sunil; Lui, Harvey
Background: Raman spectroscopy is a non-invasive optical technique which can measure molecular vibrational modes within tissue. A large-scale clinical study (n = 518) has demonstrated that real-time Raman spectroscopy could distinguish malignant from benign skin lesions with good diagnostic accuracy; this was validated by a follow-up independent study (n = 127). Objective: Most of the previous diagnostic algorithms have typically been based on analyzing the full band of the Raman spectra, either in the fingerprint or high wavenumber regions. Our objective in this presentation is to explore wavenumber selection based analysis in Raman spectroscopy for skin cancer diagnosis. Methods: A wavenumber selection algorithm was implemented using variably-sized wavenumber windows, which were determined by the correlation coefficient between wavenumbers. Wavenumber windows were chosen based on accumulated frequency from leave-one-out cross-validated stepwise regression or the least absolute shrinkage and selection operator (LASSO). The diagnostic algorithms were then generated from the selected wavenumber windows using multivariate statistical analyses, including principal component and general discriminant analysis (PC-GDA) and partial least squares (PLS). A total cohort of 645 confirmed lesions from 573 patients encompassing skin cancers, precancers and benign skin lesions were included. Lesion measurements were divided into a training cohort (n = 518) and a testing cohort (n = 127) according to the measurement time. Results: The area under the receiver operating characteristic curve (ROC) improved from 0.861-0.891 to 0.891-0.911 and the diagnostic specificity for sensitivity levels of 0.99-0.90 increased respectively from 0.17-0.65 to 0.20-0.75 by selecting specific wavenumber windows for analysis. Conclusion: Wavenumber selection based analysis in Raman spectroscopy improves skin cancer diagnostic specificity at high sensitivity levels.
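The correlation-driven windowing step might look like the following pure-Python sketch. The grouping rule and the 0.9 threshold are illustrative assumptions; the study then ran cross-validated stepwise regression or LASSO on top of such windows.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy) if sx and sy else 0.0

def windows_by_correlation(spectra, threshold=0.9):
    """Group adjacent wavenumber channels into variably sized windows:
    a window grows while neighbouring channels stay highly correlated
    across the measured spectra (rows = samples, columns = wavenumbers)."""
    n_wn = len(spectra[0])
    windows, start = [], 0
    for k in range(1, n_wn):
        prev_col = [row[k - 1] for row in spectra]
        cur_col = [row[k] for row in spectra]
        if pearson(prev_col, cur_col) < threshold:
            windows.append((start, k - 1))
            start = k
    windows.append((start, n_wn - 1))
    return windows

# Toy data: channels 0 and 1 are perfectly correlated, channel 2 is not.
spectra = [[1, 2, 10], [2, 4, 1], [3, 6, 5], [4, 8, 2]]
windows = windows_by_correlation(spectra)
```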
Meurer William J
Full Text Available Abstract Background Clinical documentation systems, such as templates, have been associated with process utilization. The T-System emergency department (ED) templates are widely used, but analyses of the templates' association with care processes are lacking. This system is also unique because of the many different template options available, and thus the selection of the template may also be important. We aimed to describe the selection of templates in ED dizziness presentations and to investigate the association between items on templates and process utilization. Methods Dizziness visits were captured from a population-based study of EDs that use documentation templates. Two relevant process outcomes were assessed: head computerized tomography (CT) scan and nystagmus examination. Multivariable logistic regression was used to estimate the probability of each outcome for patients who did or did not receive a relevant-item template. Propensity scores were also used to adjust for selection effects. Results The final cohort was 1,485 visits. Thirty-one different templates were used. Use of a template with a head CT item was associated with an increase in the adjusted probability of head CT utilization from 12.2% (95% CI, 8.9%-16.6%) to 29.3% (95% CI, 26.0%-32.9%). The adjusted probability of documentation of a nystagmus assessment increased from 12.0% (95% CI, 8.8%-16.2%) when a nystagmus-item template was not used to 95.0% (95% CI, 92.8%-96.6%) when a nystagmus-item template was used. The associations remained significant after propensity score adjustments. Conclusions Providers use many different templates in dizziness presentations. Important differences exist in the various templates, and the template that is used likely impacts process utilization, even though selection may be arbitrary. The optimal design and selection of templates may offer a feasible and effective opportunity to improve care delivery.
Hitch, Graham J.; And Others
Reports on experiments to determine effects of overt speech on children's use of inner speech in short-term memory. Word length and phonemic similarity had greater effects on older children and when pictures were labeled at presentation. Suggests that speaking or listening to speech activates an internal articulatory loop. (Author/GH)
Gauvin, Hanna; De Baene, W.; Brass, Marcel; Hartsuiker, Robert
To minimize the number of errors in speech, and thereby facilitate communication, speech is monitored before articulation. It is, however, unclear at which level during speech production monitoring takes place, and what mechanisms are used to detect and correct errors. The present study investigated
Weismer, Gary; Yunusova, Yana; Bunton, Kate
The purpose of this paper is to review and evaluate measures of speech production that could be used to document effects of Deep Brain Stimulation (DBS) on speech performance, especially in persons with Parkinson disease (PD). A small set of evaluative criteria for these measures is presented first, followed by consideration of several speech physiology and speech acoustic measures that have been studied frequently and reported on in the literature on normal speech production, and speech production affected by neuromotor disorders (dysarthria). Each measure is reviewed and evaluated against the evaluative criteria. Embedded within this review and evaluation is a presentation of new data relating speech motions to speech intelligibility measures in speakers with PD, amyotrophic lateral sclerosis (ALS), and control speakers (CS). These data are used to support the conclusion that at the present time the slope of second formant transitions (F2 slope), an acoustic measure, is well suited to make inferences to speech motion and to predict speech intelligibility. The use of other measures should not be ruled out, however, and we encourage further development of evaluative criteria for speech measures designed to probe the effects of DBS or any treatment with potential effects on speech production and communication skills. PMID:24932066
Fontan, Lionel; Tardieu, Julien; Gaillard, Pascal; Woisard, Virginie; Ruiz, Robert
Purpose: The authors investigated the relationship between the intelligibility and comprehension of speech presented in babble noise. Method: Forty participants listened to French imperative sentences (commands for moving objects) in a multitalker babble background for which intensity was experimentally controlled. Participants were instructed to…
Standard articulation tests are not always sensitive enough to discriminate between speech samples which are of high intelligibility. One can increase the sensitivity of such tests by presenting the test materials in noise. In this way, small differences in intelligibility can be magnified into
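Presenting test material "in noise" at a controlled SNR amounts to scaling the noise against the speech before mixing. A minimal sketch, assuming an RMS-based definition of SNR (a generic procedure, not any specific test's code):

```python
import math
import random

def rms(x):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_snr(speech, noise, target_snr_db):
    """Scale the noise so the speech-to-noise ratio equals target_snr_db,
    then add the two streams sample by sample."""
    gain = rms(speech) / (rms(noise) * 10 ** (target_snr_db / 20.0))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled

# Illustrative usage: a 440 Hz tone in uniform noise at +5 dB SNR.
random.seed(1)
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
noise = [random.uniform(-1, 1) for _ in range(800)]
mixture, scaled_noise = mix_at_snr(speech, noise, 5.0)
achieved = 20 * math.log10(rms(speech) / rms(scaled_noise))
```

Lowering `target_snr_db` makes the task harder, which is how small intelligibility differences between high-quality speech samples can be magnified.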
Bryant, Gregory A; Barrett, H Clark
In all languages studied to date, distinct prosodic contours characterize different intention categories of infant-directed (ID) speech. This vocal behavior likely exists universally as a species-typical trait, but little research has examined whether listeners can accurately recognize intentions in ID speech using only vocal cues, without access to semantic information. We recorded native-English-speaking mothers producing four intention categories of utterances (prohibition, approval, comfort, and attention) as both ID and adult-directed (AD) speech, and we then presented the utterances to Shuar adults (South American hunter-horticulturalists). Shuar subjects were able to reliably distinguish ID from AD speech and were able to reliably recognize the intention categories in both types of speech, although performance was significantly better with ID speech. This is the first demonstration that adult listeners in an indigenous, nonindustrialized, and nonliterate culture can accurately infer intentions from both ID speech and AD speech in a language they do not speak.
Spiel, G; Brunner, E; Allmayer, B; Pletz, A
Speech disabilities (articulation deficits) and language disorders--expressive (vocabulary) receptive (language comprehension) are not uncommon in children. An overview of these along with a global description of the impairment of communication as well as clinical characteristics of language developmental disorders are presented in this article. The diagnostic tables, which are applied in the European and Anglo-American speech areas, ICD-10 and DSM-IV, have been explained and compared. Because of their strengths and weaknesses an alternative classification of language and speech developmental disorders is proposed, which allows a differentiation between expressive and receptive language capabilities with regard to the semantic and the morphological/syntax domains. Prevalence and comorbidity rates, psychosocial influences, biological factors and the biological social interaction have been discussed. The necessity of the use of standardized examinations is emphasised. General logopaedic treatment paradigms, specific therapy concepts and an overview of prognosis have been described.
Chevillet, Mark A; Jiang, Xiong; Rauschecker, Josef P; Riesenhuber, Maximilian
Debates about motor theories of speech perception have recently been reignited by a burst of reports implicating premotor cortex (PMC) in speech perception. Often, however, these debates conflate perceptual and decision processes. Evidence that PMC activity correlates with task difficulty and subject performance suggests that PMC might be recruited, in certain cases, to facilitate category judgments about speech sounds (rather than speech perception, which involves decoding of sounds). However, it remains unclear whether PMC does, indeed, exhibit neural selectivity that is relevant for speech decisions. Further, it is unknown whether PMC activity in such cases reflects input via the dorsal or ventral auditory pathway, and whether PMC processing of speech is automatic or task-dependent. In a novel modified categorization paradigm, we presented human subjects with paired speech sounds from a phonetic continuum but diverted their attention from phoneme category using a challenging dichotic listening task. Using fMRI rapid adaptation to probe neural selectivity, we observed acoustic-phonetic selectivity in left anterior and left posterior auditory cortical regions. Conversely, we observed phoneme-category selectivity in left PMC that correlated with explicit phoneme-categorization performance measured after scanning, suggesting that PMC recruitment can account for performance on phoneme-categorization tasks. Structural equation modeling revealed connectivity from posterior, but not anterior, auditory cortex to PMC, suggesting a dorsal route for auditory input to PMC. Our results provide evidence for an account of speech processing in which the dorsal stream mediates automatic sensorimotor integration of speech and may be recruited to support speech decision tasks.
Deroche, Mickael L D; Culling, John F; Chatterjee, Monita
Harmonic complexes that generate highly modulated temporal envelopes on the basilar membrane (BM) mask a tone less effectively than complexes that generate relatively flat temporal envelopes, because the non-linear active gain of the BM selectively amplifies a low-level tone in the dips of a modulated masker envelope. The present study examines a similar effect in speech recognition. Speech reception thresholds (SRTs) were measured for a voice masked by harmonic complexes with partials in sine phase (SP) or in random phase (RP). The masker's fundamental frequency (F0) was 50, 100 or 200 Hz. SRTs were considerably lower for SP than for RP maskers at 50-Hz F0, but the two converged at 100-Hz F0, while at 200-Hz F0, SRTs were a little higher for SP than RP maskers. The results were similar whether the target voice was male or female and whether the masker's spectral profile was flat or speech-shaped. Although listening in the masker dips has been shown to play a large role for artificial stimuli such as Schroeder-phase complexes at high levels, it contributes weakly to speech recognition in the presence of harmonic maskers with different crest factors at more moderate sound levels (65 dB SPL). Copyright © 2013 Elsevier B.V. All rights reserved.
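The crest-factor difference between sine-phase and random-phase complexes that underlies this effect is easy to reproduce. A sketch with assumed sampling parameters (8 kHz rate, 40 harmonics of a 50-Hz F0); the stimuli here are toy versions, not the study's calibrated maskers:

```python
import math
import random

def harmonic_complex(f0, phases, fs=8000, dur=0.2):
    """Sum of unit-amplitude harmonics of f0 with the given starting phases."""
    n = int(fs * dur)
    return [sum(math.sin(2 * math.pi * f0 * (k + 1) * t / fs + ph)
                for k, ph in enumerate(phases)) for t in range(n)]

def crest_factor(x):
    """Peak-to-RMS ratio: high for peaky, strongly modulated envelopes."""
    peak = max(abs(v) for v in x)
    return peak / math.sqrt(sum(v * v for v in x) / len(x))

random.seed(0)
n_harm = 40  # harmonics of a 50-Hz F0, up to 2 kHz
sp = harmonic_complex(50, [0.0] * n_harm)  # sine phase: pulse-train-like
rp = harmonic_complex(50, [random.uniform(0, 2 * math.pi)
                           for _ in range(n_harm)])  # random phase: flatter
```

The sine-phase complex concentrates its energy into brief peaks with deep envelope dips, which is what allows a low-level target to be glimpsed between the pulses.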
Tan, Zheng-Hua; Lindberg, Børge
Frame based speech processing inherently assumes a stationary behavior of speech signals in a short period of time. Over a long time, the characteristics of the signals can change significantly and frames are not equally important, underscoring the need for frame selection. In this paper, we present a low-complexity and effective frame selection approach based on a posteriori signal-to-noise ratio (SNR) weighted energy distance: the use of an energy distance, instead of e.g. a standard cepstral distance, makes the approach computationally efficient and enables fine granularity search, and the use of a posteriori SNR weighting emphasizes the reliable regions in noisy speech signals. It is experimentally found that the approach is able to assign a higher frame rate to fast changing events such as consonants, a lower frame rate to steady regions like vowels and no frames to silence, even …
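A toy version of SNR-weighted energy-distance frame selection follows; the log weighting and the threshold are illustrative assumptions, not the paper's exact definitions:

```python
import math

def select_frames(frame_energies, noise_floor, threshold=0.1):
    """Keep frame i when the energy distance to the last kept frame,
    weighted by the frame's a posteriori SNR, exceeds a threshold.
    Frames near the noise floor get ~zero weight and are dropped."""
    kept, last = [], None
    for i, e in enumerate(frame_energies):
        weight = max(0.0, math.log10(e / noise_floor))  # ~0 in silence
        dist = abs(e - last) if last is not None else e
        if weight * dist > threshold:
            kept.append(i)
            last = e
    return kept

# Silence, then a fast-changing consonant burst, then a steady vowel:
energies = [1.0, 1.0, 50.0, 5.0, 60.0, 30.0, 30.0, 30.0]
picked = select_frames(energies, noise_floor=1.0)
```

With these toy numbers the silent frames are skipped, every frame of the rapidly changing burst is retained, and the steady vowel contributes only its first frame, matching the variable-frame-rate behaviour the abstract describes.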
Full Text Available Textbooks play an important role in English Language Teaching (ELT), particularly in the English as a Foreign Language (EFL) context, where they provide the primary linguistic input. The present research was an attempt to comparatively evaluate the Touchstone series in terms of compliment and complaint speech acts. Four Touchstone textbooks (Book 1, Book 2, Book 3, and Book 4) were selected and content analysis was done using Olshtain and Weinbach’s (1993) complaint strategies and Wolfson and Manes’ (1980) classification of compliments. The frequencies and percentages of compliment and complaint speech acts were obtained. Data analysis showed that, first, the total frequency of the complaint speech act was higher in Touchstone, Book 4 than in the other three textbooks; second, the frequency of complaint and compliment speech acts in the Writing section was quite low, but the Conversation section had a high frequency of the compliment speech act in the Touchstone series; third, the expression of annoyance or disapproval complaint strategy was frequently used in the Touchstone series; fourth, the compliment strategy of ‘noun phrase + looks/is (intensifier) adjective’ was very frequent in the Touchstone series; finally, there was a significant difference between the frequencies of the two speech acts, in general, in the four Touchstone textbooks. Considering the strengths and weaknesses of the Touchstone series, implications for teachers, material developers, and textbook writers are provided.
Eskelund, Kasper; Andersen, Tobias
Speech perception is audiovisual, as evidenced by the McGurk effect, in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate … of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task, and this could increase … visual detection task. In our first experiment, observers presented with congruent and incongruent audiovisual sine-wave speech stimuli only showed a McGurk effect when informed of the speech nature of the stimulus. Performance on the secondary visual task was very good, thus supporting the finding …
Schalling, Ellika; Hartelius, Lena
Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.
Full Text Available Huntington’s disease (HD) has been described as a genetic condition caused by a mutation in the CAG (cytosine-adenine-guanine) nucleotide sequence. Depending on the stage of the disease, people may have difficulties with speech, language and swallowing. The purpose of this paper is to describe these difficulties in detail, as well as to provide an account of the speech and language therapy approach to this condition. Regarding speech, characteristics typical of hyperkinetic dysarthria can be found due to the underlying choreic movements. The speech of people with HD tends to show shorter sentences, with much simpler syntactic structures, and difficulties in tasks that require complex cognitive processing. Moreover, swallowing may be affected by dysphagia that progresses as the disease develops. A timely, comprehensive and effective speech-language intervention is essential to improve the quality of life of these people and contribute to their communicative welfare.
… understanding speech when more than one person is talking, even when reduced audibility has been fully compensated for by a hearing aid. The reasons for these difficulties are not well understood. This presentation highlights recent concepts of the monaural and binaural signal processing strategies employed by the normal as well as impaired auditory system. Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting …
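The SNRenv idea can be caricatured in a few lines. This toy version skips the modulation filterbank and auditory preprocessing that the real sEPSM uses, and simply compares the envelope power of the speech-plus-noise mixture against that of the noise alone:

```python
def env_power(x):
    """AC power of a crude envelope (signal magnitude with its mean
    removed); a stand-in for envelope power in one modulation band."""
    env = [abs(v) for v in x]
    m = sum(env) / len(env)
    return sum((e - m) ** 2 for e in env) / len(env)

def snr_env(mixture, noise_alone, eps=1e-12):
    """Toy single-band SNRenv: excess envelope power of speech+noise over
    noise alone, relative to the noise envelope power."""
    p_n = env_power(noise_alone)
    p_mix = env_power(mixture)
    return max(p_mix - p_n, eps) / max(p_n, eps)

# A strongly amplitude-modulated mixture vs. an unmodulated noise:
noise = [0.1 * (-1) ** i for i in range(100)]
mixture = [(1.0 if (i // 10) % 2 else 0.2) * (-1) ** i for i in range(100)]
```

Speech superimposes slow amplitude modulations on the noise envelope; the more of that modulation power survives the noise, the higher SNRenv and the better the predicted intelligibility.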
Late auditory evoked potentials to speech stimuli presented with different transducers in hearing children
Raquel Sampaio Agostinho-Pesse
Full Text Available PURPOSE: to analyze, in a comparative manner, the influence of the transducer on the recordings of the P1, N1 and P2 components elicited by a speech stimulus, as to latency and amplitude, in hearing children. METHOD: 30 hearing children aged 4 to 12 years, of both genders. The long-latency auditory evoked potentials were recorded through two transducers, insert earphones and a loudspeaker, elicited by the speech stimulus /da/ presented with an interstimulus interval of 526 ms, an intensity of 70 dB HL and a presentation rate of 1.9 stimuli per second. The P1, N1 and P2 components, when present, were analyzed for latency and amplitude. RESULTS: a strong level of agreement was found between the researcher and the judge. There was no statistically significant difference when comparing the latency and amplitude values of the P1, N1 and P2 components with respect to gender and ear, nor for the latency of the components when the transducer types were compared. However, there was a statistically significant difference for the amplitude of the P1 and N1 components, with greater amplitude for the loudspeaker. CONCLUSION: the latency values of the P1, N1 and P2 components and the amplitude of P2 obtained with insert earphones can be used as normative references regardless of the transducer used to record the long-latency auditory evoked potentials.
Gopi, E S
Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.
Full Text Available Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants’ attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4-13 months of age were exposed to happy-sounding infant-directed speech versus hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children’s song spoken versus sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children’s song versus a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing) was the principal contributor to infant attention, regardless of age.
Kayasith, Prakasith; Theeramunkong, Thanaruk
It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess the speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a certain word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate of speech made by an individual dysarthric speaker before actual exhaustive implementation of an automatic speech recognition system for the speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations were done by comparing its predicted recognition rates with ones predicted by the standard methods, called the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
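A toy illustration of combining consistency and distinction into a single clarity score follows. The feature vectors, the similarity measure, and the product combination rule are all assumptions for illustration; the published Ψ is defined over real speech features.

```python
def mean_similarity(a, b):
    """Toy similarity between two equal-length feature vectors in [0, 1]."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def clarity_index(tokens_by_word):
    """Toy clarity score: average within-word similarity (consistency)
    times average between-word dissimilarity (distinction)."""
    words = list(tokens_by_word)
    intra, inter = [], []
    for w in words:
        toks = tokens_by_word[w]
        for i in range(len(toks)):
            for j in range(i + 1, len(toks)):
                intra.append(mean_similarity(toks[i], toks[j]))
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            for a in tokens_by_word[words[i]]:
                for b in tokens_by_word[words[j]]:
                    inter.append(1.0 - mean_similarity(a, b))
    return (sum(intra) / len(intra)) * (sum(inter) / len(inter))

# Repetitions of two words by a consistent vs. an inconsistent speaker:
clear = {"ba": [[0.0, 0.0], [0.0, 0.1]], "da": [[1.0, 1.0], [1.0, 0.9]]}
muddled = {"ba": [[0.0, 0.0], [1.0, 1.0]], "da": [[0.0, 1.0], [1.0, 0.0]]}
```

A speaker whose repetitions of the same word cluster tightly while different words stay far apart scores high, which is exactly the property the abstract links to higher recognition rates.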
Cooper, Angela; Brouwer, Susanne; Bradlow, Ann R
Speech processing can often take place in adverse listening conditions that involve the mixing of speech and background noise. In this study, we investigated processing dependencies between background noise and indexical speech features, using a speeded classification paradigm (Garner, 1974; Exp. 1), and whether background noise is encoded and represented in memory for spoken words in a continuous recognition memory paradigm (Exp. 2). Whether or not the noise spectrally overlapped with the speech signal was also manipulated. The results of Experiment 1 indicated that background noise and indexical features of speech (gender, talker identity) cannot be completely segregated during processing, even when the two auditory streams are spectrally nonoverlapping. Perceptual interference was asymmetric, whereby irrelevant indexical feature variation in the speech signal slowed noise classification to a greater extent than irrelevant noise variation slowed speech classification. This asymmetry may stem from the fact that speech features have greater functional relevance to listeners, and are thus more difficult to selectively ignore than background noise. Experiment 2 revealed that a recognition cost for words embedded in different types of background noise on the first and second occurrences only emerged when the noise and the speech signal were spectrally overlapping. Together, these data suggest integral processing of speech and background noise, modulated by the level of processing and the spectral separation of the speech and noise.
This work combines interdisciplinary knowledge and experience from research fields of psychology, linguistics, audio-processing, machine learning, and computer science. The work systematically explores a novel research topic devoted to automated modeling of personality expression from speech. For this aim, it introduces a novel personality assessment questionnaire and presents the results of extensive labeling sessions to annotate the speech data with personality assessments. It provides estimates of the Big 5 personality traits, i.e. openness, conscientiousness, extroversion, agreeableness, and neuroticism. Based on a database built on the questionnaire, the book presents models to tell apart different personality types or classes from speech automatically.
Togneri, Roberto; Narasimha, Madihally
This book describes the basic principles underlying the generation, coding, transmission and enhancement of speech and audio signals, including advanced statistical and machine learning techniques for speech and speaker recognition, with an overview of the key innovations in these areas. Key research undertaken in speech coding, speech enhancement, speech recognition, emotion recognition and speaker diarization is also presented, along with recent advances and new paradigms in these areas. · Offers readers a single-source reference on the significant applications of speech and audio processing to speech coding, speech enhancement and speech/speaker recognition; · Enables readers involved in algorithm development and implementation issues for speech coding to understand the historical development and future challenges in speech coding research; · Discusses speech coding methods yielding bit-streams that are multi-rate and scalable for Voice-over-IP (VoIP) networks; …
Full Text Available Background: Since making communication with others is the most important function of speech, any type of speech disorder will undoubtedly affect the ability to communicate with others. The objective of the study was to investigate the reasons behind the [high] prevalence rate of stammering, producing disorders and aglossia. Materials and Methods: This descriptive-analytical study was conducted on 118 male and female students who were studying in a primary school in Zahedan and had been referred to the Speech Therapy Centers of Zahedan University of Medical Sciences over a period of seven months. Speech therapist examinations, diagnostic tools common in speech therapy, the Spielberg Children Trait and the patients' case records were used to find the reasons behind the [high] prevalence rate of speech disorders. Results: Among the factors affecting speech disorders, psychological causes had the highest correlation with the disorders. After psychological causes, family history and the age of the subjects were the other factors that may bring about speech disorders (P<0.05). Bilingualism and birth order had a negative relationship with speech disorders. Likewise, another result of this study shows that only psychological causes, social causes, hereditary causes and the age of subjects can predict speech disorders (P<0.05). Conclusion: The present study shows that speech disorders have a strong and close relationship with psychological causes in the first place, and with family history and the age of individuals at the next steps.
Troejbom, Mats; Grolander, Sara
This report is a background report for the biosphere analysis of the SR-Site Safety Assessment. This work aims to describe the future development of the chemical conditions at Forsmark, based on the present chemical conditions at landscape level taking landscape development and climate cases into consideration. The results presented contribute to the overall understanding of the present and future chemistry in the Forsmark area, and specifically, to the understanding of the behaviour of some selected radionuclides in the surface system. The future development of the chemistry at the site is qualitatively discussed with focus on the interglacial within the next 10,000 years. The effects on the chemical environment of future climate cases as Global Warming and cold permafrost climates are also briefly discussed. The work is presented in two independent parts describing background radionuclide activities in the Forsmark area and the distribution and behaviour of a large number of stable elements in the landscape. In a concluding section, implications of the future chemical environment of a selection of radionuclides important in the Safety Assessment are discussed based on the knowledge of stable elements. The broad range of elements studied show that there are general and expected patterns for the distribution and behaviour in the landscape of different groups of elements. Mass balances reveal major sources and sinks, pool estimations show where elements are accumulated in the landscape and estimations of time-scales give indications of the potential future development. This general knowledge is transferred to radionuclides not measured in order to estimate their behaviour and distribution in the landscape. It could be concluded that the future development of the chemical environment in the Forsmark area might affect element specific parameters used in the radionuclide model in different directions depending on element. The alternative climate cases, Global Warming
Troejbom, Mats (Mats Troejbom Konsult AB (Sweden)); Grolander, Sara (Facilia AB (Sweden))
This report is a background report for the biosphere analysis of the SR-Site Safety Assessment. This work aims to describe the future development of the chemical conditions at Forsmark, based on the present chemical conditions at landscape level, taking landscape development and climate cases into consideration. The results presented contribute to the overall understanding of the present and future chemistry in the Forsmark area and, specifically, to the understanding of the behaviour of some selected radionuclides in the surface system. The future development of the chemistry at the site is qualitatively discussed with a focus on the interglacial within the next 10,000 years. The effects on the chemical environment of future climate cases such as Global Warming and cold permafrost climates are also briefly discussed. The work is presented in two independent parts describing background radionuclide activities in the Forsmark area and the distribution and behaviour of a large number of stable elements in the landscape. In a concluding section, implications of the future chemical environment for a selection of radionuclides important in the Safety Assessment are discussed based on the knowledge of stable elements. The broad range of elements studied shows that there are general and expected patterns for the distribution and behaviour in the landscape of different groups of elements. Mass balances reveal major sources and sinks, pool estimations show where elements accumulate in the landscape, and estimations of time-scales give indications of the potential future development. This general knowledge is transferred to radionuclides not measured in order to estimate their behaviour and distribution in the landscape. It could be concluded that the future development of the chemical environment in the Forsmark area might affect element-specific parameters used in the radionuclide model in different directions depending on the element. The alternative climate cases, Global Warming…
Sayyahi, Fateme; Soleymani, Zahra; Akbari, Mohammad; Bijankhan, Mahmood; Dolatshahi, Behrooz
The present study examined the relationship between gap detection threshold and speech error consistency in children with speech sound disorder. The participants were children five to six years of age who were categorized into three groups of typical speech, consistent speech disorder (CSD) and inconsistent speech disorder (ISD). The phonetic gap detection threshold test was used for this study, a valid test comprising six syllables with inter-stimulus intervals between 20 and 300 ms. The participants were asked to listen to the recorded stimuli three times and indicate whether they heard one or two sounds. There was no significant difference between the typical and CSD groups (p=0.55), but there were significant differences in performance between the ISD and CSD groups and the ISD and typical groups (p=0.00). The ISD group discriminated between speech sounds at a higher threshold. Children with inconsistent speech errors could not distinguish speech sounds during time-limited phonetic discrimination. It is suggested that inconsistency in speech is a representation of inconsistency in auditory perception, caused by a high gap detection threshold.
Theys, Catherine; van Wieringen, Astrid; De Nil, Luc F.
This study presents survey data on 58 Dutch-speaking patients with neurogenic stuttering following various neurological injuries. Stroke was the most prevalent cause of stuttering in our patients, followed by traumatic brain injury, neurodegenerative diseases, and other causes. Speech and non-speech characteristics were analyzed separately for…
Meyer, Julien; Dentel, Laure; Meunier, Fanny
In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances from 11 to 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB). Our results showed that in such conditions, the identity of vowels is mostly preserved, with the striking peculiarity of the absence of confusion in vowels. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending on position within the word (onset vs. coda). Finally, our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for future studies.
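The spherical-attenuation setup described in this abstract can be checked numerically: with the speech level fixed at 65.3 dB(A) at 1 m and a distance-independent noise floor, pure inverse-square spreading reproduces the reported SNR range. The noise floor below is inferred from the reported SNR at 11 m; it is an assumption, not a value stated in the abstract.

```python
import math

def speech_level_db(distance_m, ref_level_db=65.3, ref_distance_m=1.0):
    """Speech level after spherical (inverse-square) spreading: -6 dB per doubling of distance."""
    return ref_level_db - 20.0 * math.log10(distance_m / ref_distance_m)

def snr_db(distance_m, noise_level_db):
    """SNR against a distance-independent background noise level."""
    return speech_level_db(distance_m) - noise_level_db

# Noise floor inferred from the reported SNR of -8.8 dB at 11 m (an assumption):
noise = speech_level_db(11.0) + 8.8   # roughly 53 dB(A)

print(round(snr_db(11.0, noise), 1))  # -8.8 by construction
print(round(snr_db(33.0, noise), 1))  # close to the reported -18.4 dB
```

The 9.6 dB drop between 11 m and 33 m matches 20·log10(33/11) ≈ 9.5 dB, which is why spherical spreading alone accounts for the SNR range in the study.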
Mahananda, Baiju; Raju, C. M. S.; Patil, Ramalinga Reddy; Jha, Narayana; Varakhedi, Shrinivasa; Kishore, Prahallad
This paper describes the work done in building a prototype text-to-speech system for Sanskrit. A basic prototype text-to-speech system is built using a simplified Sanskrit phone set and employing a unit selection technique, in which prerecorded sub-word units are concatenated to synthesize a sentence. We also discuss the issues involved in building a full-fledged text-to-speech system for Sanskrit.
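The concatenative step of unit selection can be sketched as follows. The inventory, unit durations, and crossfade here are illustrative assumptions: real systems store many candidate units per phone and select among them by target and join costs, while this sketch keeps one unit per phone to show only the concatenation.

```python
import numpy as np

SR = 16000  # sample rate, Hz (assumed)

def fake_unit(f0, dur=0.1):
    """Stand-in for a prerecorded sub-word unit: a short sine burst."""
    t = np.arange(int(SR * dur)) / SR
    return 0.3 * np.sin(2 * np.pi * f0 * t)

# Hypothetical inventory mapping phones to waveforms.
inventory = {"k": fake_unit(200), "a": fake_unit(120), "m": fake_unit(150)}

def synthesize(phones):
    """Concatenate the selected units with a short crossfade to smooth the joins."""
    fade = int(0.005 * SR)  # 5 ms overlap
    out = inventory[phones[0]].copy()
    for p in phones[1:]:
        unit = inventory[p].copy()
        ramp = np.linspace(0.0, 1.0, fade)
        # Blend the tail of the running output with the head of the next unit.
        out[-fade:] = out[-fade:] * (1 - ramp) + unit[:fade] * ramp
        out = np.concatenate([out, unit[fade:]])
    return out

wave = synthesize(["k", "a", "m", "a"])
```

A full system would add text normalization, grapheme-to-phone conversion, and prosody prediction in front of this step.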
"Ultra Low Bit-Rate Speech Coding" focuses on the specialized topic of speech coding at very low bit-rates of 1 Kbits/sec and less, particularly at the lower ends of this range, down to 100 bps. The authors set forth the fundamental results and trends that form the basis for such ultra low bit-rates to be viable and provide a comprehensive overview of various techniques and systems in literature to date, with particular attention to their work in the paradigm of unit-selection based segment quantization. The book is for research students, academic faculty and researchers, and industry practitioners in the areas of speech processing and speech coding.
Sandor, Aniko; Moses, Haifa
Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.
Nijholt, Antinus; Cappellini, V.; Hemsley, J.
We discuss having virtual presenters in virtual environments that present information to visitors of these environments. Some current research is surveyed, and we look in particular at our research in the context of a virtual meeting room where a virtual presenter uses speech, gestures, pointing
An introduction is given to the anatomy and function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal, and speech intelligibility. The lecture note is written for the course Fundamentals of Acoustics and Noise Control (51001).
It is becoming increasingly apparent that all forms of communication, including voice, will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. The book outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networks. Offering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the…
The present study surveyed post-editor trainees' views and attitudes before and after the introduction of speech technology as a front end to a computer-aided translation workbench. The aim of the survey was (i) to identify attitudes and perceptions among post-editor trainees before performing a post-editing task using automatic speech recognition (ASR); and (ii) to assess the degree to which post-editors' attitudes and expectations of the use of speech technology changed after actually using it. The survey was based on two questionnaires: the first one administered before the participants…
Wen, Guihua; Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang
Human emotions can now be recognized from speech signals using machine learning methods; however, these methods are challenged by lower recognition accuracies in real applications due to their lack of rich representation ability. Deep belief networks (DBN) can automatically discover multiple levels of representations in speech signals. To make full use of this advantage, this paper presents an ensemble of random deep belief networks (RDBN) method for speech emotion recognition. It first extracts the low-level features of the input speech signal and then uses them to construct many random subspaces. Each random subspace is then provided to a DBN to yield higher-level features, which serve as the input of a classifier to output an emotion label. All outputted emotion labels are then fused through majority voting to decide the final emotion label for the input speech signal. Experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition.
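The random-subspace-plus-majority-voting scheme described in this abstract can be sketched on synthetic data. This is a toy illustration: the base learner is a nearest-centroid classifier standing in for the per-subspace DBN, and the feature dimensions, ensemble size, and data are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class CentroidClassifier:
    """Stand-in base learner (the paper trains a DBN per subspace)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None], axis=2)
        return self.classes_[d.argmin(axis=1)]

def fit_random_subspace_ensemble(X, y, n_members=15, subspace_dim=8):
    """Each ensemble member sees a random subset of the low-level feature dimensions."""
    members = []
    for _ in range(n_members):
        idx = rng.choice(X.shape[1], size=subspace_dim, replace=False)
        members.append((idx, CentroidClassifier().fit(X[:, idx], y)))
    return members

def predict_majority(members, X):
    """Fuse member outputs by majority voting over predicted emotion labels."""
    votes = np.stack([clf.predict(X[:, idx]) for idx, clf in members])
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy features standing in for extracted speech features (not a real emotion corpus).
X = rng.normal(size=(120, 20))
y = rng.integers(0, 3, size=120)
X[y == 1] += 2.0  # make one "emotion" class separable
members = fit_random_subspace_ensemble(X, y)
pred = predict_majority(members, X)
```

Because every member sees a different random projection of the features, the vote aggregates diverse views of the signal, which is the motivation the abstract gives for the ensemble.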
Inner speech—also known as covert speech or verbal thinking—has been implicated in theories of cognitive development, speech monitoring, executive function, and psychopathology. Despite a growing body of knowledge on its phenomenology, development, and function, approaches to the scientific study of inner speech have remained diffuse and largely unintegrated. This review examines prominent theoretical approaches to inner speech and methodological challenges in its study, before reviewing current evidence on inner speech in children and adults from both typical and atypical populations. We conclude by considering prospects for an integrated cognitive science of inner speech, and present a multicomponent model of the phenomenon informed by developmental, cognitive, and psycholinguistic considerations. Despite its variability among individuals and across the life span, inner speech appears to perform significant functions in human cognition, which in some cases reflect its developmental origins and its sharing of resources with other cognitive processes. PMID:26011789
The main aim of this paper is to investigate speech pauses and gestures as means to engage the audience and present the humorous message in an effective way. The data consist of two speeches by US President Barack Obama at the 2011 and 2016 Annual White House Correspondents' Association Dinner… produced significantly more hand gestures in 2016 than in 2011. An analysis of the hand gestures produced by Barack Obama in two political speeches held at the United Nations in 2011 and 2016 confirms that the president produced significantly fewer communicative co-speech hand gestures during his speeches… and they emphasise the speech segment which they follow or precede. We also found a highly significant correlation between Obama's speech pauses and audience response. Obama produces numerous head movements, facial expressions and hand gestures, and their functions are related to both discourse content and structure…
Carla Ciceri Cesa
Purpose: to investigate the qualification of speech-language and hearing therapists and their clinical performance with Augmentative and Alternative Communication. Methods: a descriptive, cross-sectional, individual and contemporary study. Data were collected through a questionnaire filled in by twenty-four speech therapists, selected by a convenience sample. Content analysis was chosen for data study. Results: regarding access to information media, all speech therapists in the sample took the initiative to compensate for the absence of training in Augmentative and Alternative Communication by different means. Regarding the dual focus on intervention, all speech therapists were favorable to this practice; however, based on their experience, they reported resistance from the family, school and other therapists. The results showed two different approaches to the introduction, implementation and use of Augmentative and Alternative Communication: one predominantly formed by strategies contemplating the pragmatic use of language through the contextualization of activities meaningful to the user; the other using the Picture Exchange Communication System. Conclusion: the speech-language and hearing therapists in the present study included different interlocutors in the intervention, guided by implicit or explicit linguistic principles, by theoretical frameworks specific to the area of Augmentative and Alternative Communication, by global neuromotor elements and, finally, by principles of functionality and general wellness.
Patri, Jean-François; Diard, Julien; Perrier, Pascal
The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.
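The tension this abstract describes between a single optimal strategy and observed token-to-token variability can be illustrated with a toy redundant system: two control variables produce one output, so many control settings reach the same target. The cost terms and variances below are illustrative assumptions, not the GEPPETO model or the paper's biomechanical vocal tract.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy redundant "motor system": many (u1, u2) pairs reach the same target,
# mirroring the excess degrees of freedom of the vocal tract.
target = 1.0
output = lambda u: u[0] + u[1]

# Deterministic optimal control: minimise effort ||u||^2 subject to reaching
# the target. Closed form: u1 = u2 = target/2, identical on every trial.
u_opt = np.array([target / 2, target / 2])

# Bayesian reformulation: posterior over u proportional to an effort prior
# times an accuracy likelihood. Sampling it yields token-to-token variability
# around the optimal solution instead of a single repeated strategy.
def sample_posterior(n, sigma_acc=0.05, sigma_eff=1.0):
    samples = []
    while len(samples) < n:
        u = rng.normal(0.0, sigma_eff, size=2)                        # effort prior
        accept = np.exp(-(output(u) - target) ** 2 / (2 * sigma_acc ** 2))
        if rng.uniform() < accept:                                    # accuracy likelihood
            samples.append(u)
    return np.array(samples)

tokens = sample_posterior(200)
# Every sampled token lands near the acoustic target, yet the individual
# control strategies differ from token to token.
```

The sampled tokens all satisfy the task while varying in their control settings, which is the "principled account of variability" the abstract attributes to the Bayesian formulation.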
Kühn, Simone; Brass, Marcel; Gallinat, Jürgen
The so-called embodiment of communication has attracted considerable interest. Recently, a growing number of studies have proposed a link between Broca's area's involvement in action processing and its involvement in speech. The present quantitative meta-analysis set out to test whether neuroimaging studies on imitation and overt speech show overlap within the inferior frontal gyrus. By means of activation likelihood estimation (ALE), we investigated concurrence of brain regions activated by object-free hand imitation studies as well as overt speech studies including simple syllable and more complex word production. We found direct overlap between imitation and speech in bilateral pars opercularis (BA 44) within Broca's area. Subtraction analyses revealed no unique localization for either speech or imitation. To verify the potential of ALE subtraction analysis to detect unique involvement within Broca's area, we contrasted the results of a meta-analysis on motor inhibition and imitation and found separable regions involved for imitation. This is the first meta-analysis to compare the neural correlates of imitation and overt speech. The results are in line with the proposed evolutionary roots of speech in imitation.
Cataldo, Dana Michelle; Migliano, Andrea Bamberg; Vinicius, Lucio
The 'technological hypothesis' proposes that gestural language evolved in early hominins to enable the cultural transmission of stone tool-making skills, with speech appearing later in response to the complex lithic industries of more recent hominins. However, no flintknapping study has assessed the efficiency of speech alone (unassisted by gesture) as a tool-making transmission aid. Here we show that subjects instructed by speech alone underperform in stone tool-making experiments in comparison to subjects instructed through either gesture alone or 'full language' (gesture plus speech), and also report lower satisfaction with their received instruction. The results provide evidence that gesture was likely to be selected over speech as a teaching aid in the earliest hominin tool-makers; that speech could not have replaced gesturing as a tool-making teaching aid in later hominins, possibly explaining the functional retention of gesturing in the full language of modern humans; and that speech may have evolved for reasons unrelated to tool-making. We conclude that speech is unlikely to have evolved as tool-making teaching aid superior to gesture, as claimed by the technological hypothesis, and therefore alternative views should be considered. For example, gestural language may have evolved to enable tool-making in earlier hominins, while speech may have later emerged as a response to increased trade and more complex inter- and intra-group interactions in Middle Pleistocene ancestors of Neanderthals and Homo sapiens; or gesture and speech may have evolved in parallel rather than in sequence.
Madsen, Sara Miay Kim; Whiteford, Kelly L.; Oxenham, Andrew J.
Recent studies disagree on whether musicians have an advantage over non-musicians in understanding speech in noise. However, it has been suggested that musicians may be able to use differences in fundamental frequency (F0) to better understand target speech in the presence of interfering talkers. Here we studied a relatively large (N=60) cohort of young adults, equally divided between non-musicians and highly trained musicians, to test whether the musicians were better able to understand speech either in noise or in a two-talker competing speech masker. The target speech and competing speech were presented with either their natural F0 contours or on a monotone F0, and the F0 difference between the target and masker was systematically varied. As expected, speech intelligibility improved with increasing F0 difference between the target and the two-talker masker for both natural and monotone…
Jacks, Adam; Mathes, Katey A.; Marquardt, Thomas P.
Purpose: To investigate the hypothesis that vowel production is more variable in adults with acquired apraxia of speech (AOS) relative to healthy individuals with unimpaired speech. Vowel formant frequency measures were selected as the specific target of focus. Method: Seven adults with AOS and aphasia produced 15 repetitions of 6 American English…
O'Connell, Daniel C.; Kowal, Sabine; Sabin, Edward J.; Lamia, John F.; Dannevik, Margaret
Our purpose in the following was to investigate the start-up rhetoric employed by U.S. President Barack Obama in his speeches. The initial 5 min from eight of his speeches from May to September of 2009 were selected for their variety of setting, audience, theme, and purpose. It was generally hypothesized that Barack Obama, widely recognized for…
Day, Kimberly L.; Smith, Cynthia L.; Neal, Amy; Dunsmore, Julie C.
Research Findings: In addition to being a regulatory strategy, children's private speech may enhance or interfere with their effortful control used to regulate emotion. The goal of the current study was to investigate whether children's private speech during a selective attention task moderated the relations of their effortful control to their…
Meijers, A.W.M.; Tsohatzidis, S.L.
From its early development in the 1960s, speech act theory always had an individualistic orientation. It focused exclusively on speech acts performed by individual agents. Paradigmatic examples are ‘I promise that p’, ‘I order that p’, and ‘I declare that p’. There is a single speaker and a single
Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…
Kane, Peter E., Ed.
The 11 articles in this collection deal with theoretical and practical freedom of speech issues. The topics covered are (1) the United States Supreme Court and communication theory; (2) truth, knowledge, and a democratic respect for diversity; (3) denial of freedom of speech in Jock Yablonski's campaign for the presidency of the United Mine…
Shearer, William M.
Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…
Kane, Peter E., Ed.
This issue of "Free Speech" contains the following articles: "Daniel Schorr Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tom Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds…
This contribution presents two spectral amplitude estimators for acoustical background noise suppression based on maximum a posteriori estimation and super-Gaussian statistical modelling of the speech DFT amplitudes. The probability density function of the speech spectral amplitude is modelled with a simple parametric function, which allows a high approximation accuracy for Laplace- or Gamma-distributed real and imaginary parts of the speech DFT coefficients. Also, the statistical model can be adapted to optimally fit the distribution of the speech spectral amplitudes for a specific noise reduction system. Based on the super-Gaussian statistical model, computationally efficient maximum a posteriori speech estimators are derived, which outperform the commonly applied Ephraim-Malah algorithm.
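The analysis-modify-synthesis pipeline into which such a spectral amplitude estimator plugs can be sketched on a single frame. The gain rule below is a plain Wiener-style rule, not the paper's super-Gaussian MAP estimator; the signal, noise level, and frame size are arbitrary assumptions, and the noise PSD is assumed known rather than estimated.

```python
import numpy as np

rng = np.random.default_rng(2)

def suppress(frame, noise_psd, floor=0.05):
    """One analysis-modify-synthesis pass: transform, apply a spectral gain, invert."""
    spec = np.fft.rfft(frame)
    snr_post = (np.abs(spec) ** 2) / noise_psd       # a posteriori SNR per bin
    snr_prio = np.maximum(snr_post - 1.0, 0.0)       # crude a priori SNR estimate
    gain = np.maximum(snr_prio / (snr_prio + 1.0), floor)  # Wiener-type gain, floored
    return np.fft.irfft(gain * spec, n=len(frame))

n = 512
t = np.arange(n) / 8000.0
clean = np.sin(2 * np.pi * 440.0 * t)                # stand-in for a voiced speech frame
noisy = clean + 0.5 * rng.normal(size=n)

# Expected noise power per rfft bin for white noise with std 0.5 (assumed known
# here; real systems estimate it during speech pauses).
noise_psd = np.full(n // 2 + 1, n * 0.25)

enhanced = suppress(noisy, noise_psd)
err_before = np.mean((noisy - clean) ** 2)
err_after = np.mean((enhanced - clean) ** 2)
```

A MAP estimator of the kind the abstract describes would replace the Wiener gain with one derived from the super-Gaussian prior on the speech spectral amplitudes; the surrounding transform, gain application, and resynthesis stay the same.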
Bailly, G.; Theune, Mariet; Meijs, Koen; Campbell, N.; Hamza, W.; Heylen, Dirk K.J.; Ordelman, Roeland J.F.; Hoge, H.; Jianhua, T.
Work on expressive speech synthesis has long focused on the expression of basic emotions. In recent years, however, interest in other expressive styles has been increasing. The research presented in this paper aims at the generation of a storytelling speaking style, which is suitable for storytelling applications and, more generally, for applications aimed at children. Based on an analysis of human storytellers' speech, we designed and implemented a set of prosodic rules for converting "neutr...
Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato
We investigated whether audiovisual synchrony perception for speech could change after observation of the audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is re-calibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, participants performed an audiovisual simultaneity judgment task (i.e., a direct measurement of the audiovisual synchrony perception) in terms of the speech signal after observation of the speech stimuli which had a constant audiovisual lag. The results showed that the “simultaneous” responses (i.e., proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, we adopted the McGurk identification task (i.e., an indirect measurement of the audiovisual synchrony perception) to exclude the possibility that this modulation of synchrony perception was solely attributable to the response strategy using stimuli identical to those of Experiment 1. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, it was shown that audiovisual synchrony perception for speech could be modulated following exposure to constant lag both in direct and indirect measurement. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.
Vogel, Adam P; Folker, Joanne; Poole, Matthew L
Hereditary ataxia syndromes can result in significant speech impairment, a symptom thought to be responsive to treatment. The type of speech impairment most commonly reported in hereditary ataxias is dysarthria. Dysarthria is a collective term referring to a group of movement disorders affecting the muscular control of speech. Dysarthria affects the ability of individuals to communicate and to participate in society. This in turn reduces quality of life. Given the harmful impact of speech disorder on a person's functioning, treatment of speech impairment in these conditions is important and evidence-based interventions are needed. To assess the effects of interventions for speech disorder in adults and children with Friedreich ataxia and other hereditary ataxias. On 14 October 2013, we searched the Cochrane Neuromuscular Disease Group Specialized Register, CENTRAL, MEDLINE, EMBASE, CINAHL Plus, PsycINFO, Education Resources Information Center (ERIC), Linguistics and Language Behavior Abstracts (LLBA), Dissertation Abstracts and trials registries. We checked all references in the identified trials to identify any additional published data. We considered for inclusion randomised controlled trials (RCTs) or quasi-RCTs that compared treatments for hereditary ataxias with no treatment, placebo or another treatment or combination of treatments, where investigators measured speech production. Two review authors independently selected trials for inclusion, extracted data and assessed the risk of bias of included studies using the standard methodological procedures expected by The Cochrane Collaboration. The review authors collected information on adverse effects from included studies. We did not conduct a meta-analysis as no two studies utilised the same assessment procedures within the same treatment. Fourteen clinical trials, involving 721 participants, met the criteria for inclusion in the review. Thirteen studies compared a pharmaceutical treatment with placebo (or a
Howe, Heather; Barnett, David
This consultation description reports parent and teacher problem solving for a preschool child with no typical speech directed to teachers or peers, and, by parent report, normal speech at home. This child's initial pattern of speech was similar to selective mutism, a low-incidence disorder often first detected during the preschool years, but…
Bolaji Fapohunda,1 Nosakhare Orobaton1,2; 1International Division, John Snow Inc, Rosslyn, VA, USA; 2Targeted States High Impact Project (TSHIP), Bauchi, Nigeria. Abstract: This paper examines the effects of demographic, socioeconomic, and women's autonomy factors on the utilization of delivery assistance in Sokoto State, Nigeria. Data were obtained from the Nigeria 2008 Demographic and Health Survey (DHS). Bivariate analysis and logistic regression procedures were conducted. The study revealed that delivery with no one present and with unskilled attendance accounted for roughly 95% of all births in Sokoto State. Mothers with existing high risk factors, including higher parity, were more likely to select unsafe/unskilled delivery practices than younger, lower-parity mothers. Evidenced by the high prevalence of delivery with traditional birth attendants, this study demonstrates that expectant mothers are willing to obtain care from a provider, and their odds of using accessible, affordable, skilled delivery are high, should such an option be presented. This conclusion is supported by the high correlation between a mother's socioeconomic status and the likelihood of using skilled attendance. To improve access to, and increase the affordability of, skilled health attendants, we recommend two solutions: (1) the use of cash subsidies to augment women's incomes in order to reduce finance-related barriers to the use of formal health services, thus increasing demand; and (2) a structural improvement that will increase women's economic security by improving their access to higher education, income, and urban ideation. Keywords: Sokoto State, delivery attendance, maternal mortality rate, maternal health, reproductive health, demographic and health surveys, poverty
Başkent, Deniz; Gaudrain, Etienne
Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level
Novelli-Olmstead, Tina; Ling, Daniel
Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…
Menegueti, Katia Ignacio; Mangilli, Laura Davison; Alonso, Nivaldo; Andrade, Claudia Regina Furquim de
To characterize the profile and speech characteristics of patients undergoing primary palatoplasty in a Brazilian university hospital, considering the time of intervention (early, before two years of age; late, after two years of age). Participants were 97 patients of both genders with cleft palate and/or cleft lip and palate, assigned to the Speech-Language Pathology Department, who had been submitted to primary palatoplasty and presented no prior history of speech-language therapy. Patients were divided into two groups: early intervention group (EIG), 43 patients undergoing primary palatoplasty before 2 years of age, and late intervention group (LIG), 54 patients undergoing primary palatoplasty after 2 years of age. All patients underwent speech-language pathology assessment. The following parameters were assessed: resonance classification, presence of nasal turbulence, presence of weak intraoral air pressure, presence of audible nasal air emission, speech understandability, and compensatory articulation disorder (CAD). At a statistical significance level of 5% (p ≤ 0.05), no significant difference was observed between the groups in the following parameters: resonance classification (p = 0.067); degree of hypernasality (p = 0.113); presence of nasal turbulence (p = 0.179); presence of weak intraoral air pressure (p = 0.152); presence of nasal air emission (p = 0.369); and speech understandability (p = 0.113). The groups differed with respect to the presence of compensatory articulation disorders (p = 0.020), with the LIG presenting a higher occurrence of altered phonemes. It was possible to assess the general profile and speech characteristics of the study participants. Patients submitted to early primary palatoplasty presented a better speech profile.
Schafer, Phillip B; Jin, Dezhe Z
Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
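The template-based decoding step can be illustrated with a minimal sketch. The function names, the use of plain Python sequences to stand in for spike sequences, and the normalization by the longer sequence length are illustrative assumptions, not the authors' implementation:

```python
def lcs_length(a, b):
    # Dynamic-programming table for the length of the longest common
    # subsequence of two sequences (the similarity measure named above).
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]

def recognize(spikes, templates):
    # Score each word template by LCS length, normalized by the longer
    # sequence, and return the best-matching word label.
    def score(t):
        return lcs_length(spikes, t) / max(len(spikes), len(t))
    return max(templates, key=lambda word: score(templates[word]))
```

A call such as `recognize(observed, {"one": tpl1, "two": tpl2})` would pick the template whose spike ordering best matches the observation, which is the intuition behind the scheme's noise robustness: insertions and deletions of spikes reduce the score only gradually.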
Heimbauer, Lisa A; Beran, Michael J; Owren, Michael J
A long-standing debate concerns whether humans are specialized for speech perception, which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words, asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuographic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users. Experiment 2 tested "impossibly unspeechlike" sine-wave (SW) synthesis, which reduces speech to just three moving tones. Although receiving only intermittent and noncontingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate but improved in experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human.
We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation for our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data are used to compute various time-domain and frequency-domain features, for speech and music signals separately, and to estimate the optimal speech/music thresholds based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the classification phase, an initial classification is performed for each segment of the audio signal using a three-stage, sieve-like approach that applies both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm on a database of more than 12 hours of speech and more than 22 hours of music showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can easily be adapted to different audio types and is suitable for real-time operation.
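The decision-smoothing step described in the abstract can be sketched as follows. The window length, the majority threshold, and the function name are illustrative assumptions rather than the paper's actual parameters:

```python
def smooth_decisions(raw, window=5):
    # Average each segment's binary speech/music decision with past segment
    # decisions to suppress erroneous rapid alternations (1 = speech, 0 = music).
    smoothed = []
    for i, d in enumerate(raw):
        past = raw[max(0, i - window + 1): i + 1]
        smoothed.append(1 if sum(past) / len(past) >= 0.5 else 0)
    return smoothed
```

With a three-segment window, an isolated misclassification inside a long speech run is voted away by its neighbors, which is the behavior the abstract attributes to the smoothing stage.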
Magalhães, Ana Tereza de Matos; Goffi-Gomez, Maria Valéria Schmidt; Hoshino, Ana Cristina; Tsuji, Robinson Koji; Bento, Ricardo Ferreira; Brito, Rubens
New technology in the Freedom® speech processor for cochlear implants was developed to improve how incoming acoustic sound is processed; this applies not only to new users but also to previous generations of cochlear implants. The aim was to identify the contribution of this technology for Nucleus 22® users on speech perception tests in silence and in noise, and on audiometric thresholds. A cross-sectional cohort study was undertaken, and seventeen patients were selected. The last map based on the Spectra® was revised and optimized before starting the tests. Troubleshooting was used to identify malfunctions. To identify the contribution of the Freedom® technology for the Nucleus 22®, auditory thresholds and speech perception tests were performed in free field in sound-proof booths. Recorded monosyllables and sentences in silence and in noise (SNR = 0 dB) were presented at 60 dB SPL. The nonparametric Wilcoxon test for paired data was used to compare groups. The Freedom® applied to the Nucleus 22® showed a statistically significant difference in all speech perception tests and audiometric thresholds. The Freedom® technology improved the speech perception performance and audiometric thresholds of patients with the Nucleus 22®.
Kovacevic, Branko; Veinović, Mladen; Marković, Milan
This book focuses on speech signal phenomena, presenting a robustification of the usual speech generation models with regard to the presumed types of excitation signals, which is equivalent to the introduction of a class of nonlinear models and the corresponding criterion functions for parameter estimation. Compared to the general class of nonlinear models, such as various neural networks, these models possess good properties of controlled complexity, the option of working in “online” mode, as well as a low information volume for efficient speech encoding and transmission. Providing comprehensive insights, the book is based on the authors’ research, which has already been published, supplemented by additional texts discussing general considerations of speech modeling, linear predictive analysis and robust parameter estimation.
This study describes the methodology used for designing a database of speech under real stress. Based on the limitations of existing stress databases, we used a communication task via a computer game to collect speech data. To validate the presence of stress, known psychophysiological indicators such as heart rate and electrodermal activity, as well as subjective self-assessment, were used. This paper presents the data from the first 5 speakers (3 men, 2 women) who participated in initial tests of the proposed design. In 4 out of 5 speakers, increases in the fundamental frequency and intensity of speech were registered. Similarly, in 4 out of 5 speakers, heart rate was significantly increased during the task compared with a reference measurement taken before the task. These first results show that the proposed design might be appropriate for building a speech-under-stress database. However, there are still considerations that need to be addressed.
Beare, Paul; Torgerson, Colleen; Creviston, Cindy
"Selective mutism" is the term used to describe a disorder in which a person speaks only in restricted stimulus situations. Examination of single-subject research concerning selective mutism reveals the most popular and successful interventions to instate speech involve a combination of behavior modification procedures. The present research…
Toshio Hirai; Seiichi Tenpaku; Kiyohiro Shikano
The definition of "phoneme boundary timing" in a speech corpus affects the quality of concatenative speech synthesis systems. For example, if the selected speech unit is not appropriately matched to the speech unit of the required phoneme environment, the quality may be degraded. In this paper, a dynamic segment boundary definition is proposed. In this definition, the concatenation point is chosen from the start or end timing of the spectral transition, depending on the phoneme environment at the ...
Bentsen, Thomas; Kressner, Abigail Anne; Dau, Torsten
Computational speech segregation aims to automatically segregate speech from interfering noise, often by employing ideal binary mask estimation. Several studies have tried to exploit contextual information in speech to improve mask estimation accuracy by using two frequently used strategies that (1) … for measured intelligibility. The findings may have implications for the design of speech segregation systems, and for the selection of a cost function that correlates with intelligibility …
Harley, Trevor A.
Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…
Gubiani, Marileda Barichello; Pagliarin, Karina Carlesso; Keske-Soares, Marcia
This study systematically reviews the literature on the main tools used to evaluate childhood apraxia of speech (CAS). The search strategy includes Scopus, PubMed, and Embase databases. Empirical studies that used tools for assessing CAS were selected. Articles were selected by two independent researchers. The search retrieved 695 articles, out of which 12 were included in the study. Five tools were identified: Verbal Motor Production Assessment for Children, Dynamic Evaluation of Motor Speech Skill, The Orofacial Praxis Test, Kaufman Speech Praxis Test for Children, and Madison Speech Assessment Protocol. There are few instruments available for CAS assessment and most of them are intended to assess praxis and/or orofacial movements, sequences of orofacial movements, articulation of syllables and phonemes, spontaneous speech, and prosody. There are some tests for assessment and diagnosis of CAS. However, few studies on this topic have been conducted at the national level, as well as protocols to assess and assist in an accurate diagnosis.
Paparrizos, Spyridon; Maris, Fotios; Matzarakis, Andreas
The assessment of future precipitation variations prevailing in an area is essential for the research regarding climate and climate change. The current paper focuses on 3 selected areas in Greece that present different climatic characteristics due to their location and aims to assess and compare the future variation of annual and seasonal precipitation. Future precipitation data from the ENSEMBLES anthropogenic climate-change (ACC) global simulations and the Climate version of the Local Model (CLM) were obtained and analyzed. The climate simulations were performed for the future periods 2021-2050 and 2071-2100 under the A1B and B1 scenarios. Mann-Kendall test was applied to investigate possible trends. Spatial distribution of precipitation was performed using a combination of dynamic and statistical downscaling techniques and Kriging method within ArcGIS 10.2.1. The results indicated that for both scenarios, reference periods and study areas, precipitation is expected to be critically decreased. Additionally, Mann-Kendall test application showed a strong downward trend for every study area. Furthermore, the decrease in precipitation for the Ardas River basin characterized by the continental climate will be tempered, while in the Sperchios River basin it will be smoother due to the influence of some minor climatic variations in the basins' springs in the highlands where milder conditions occur. Precipitation decrease in the Geropotamos River basin which is characterized by Mediterranean climate will be more vigorous. B1 scenario appeared more optimistic for the Ardas and Sperchios River basins, while in the Geropotamos River basin, both applied scenarios brought similar results, in terms of future precipitation response.
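The trend detection applied above rests on the Mann-Kendall S statistic. A minimal sketch of that statistic (omitting the variance and significance steps of the full test, and with an illustrative function name) is:

```python
def mann_kendall_s(x):
    # Mann-Kendall S statistic: the sum of the signs of all pairwise
    # differences x[j] - x[i] with j > i. S < 0 indicates a downward
    # trend (as reported for precipitation above); S > 0 an upward one.
    s = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            s += (x[j] > x[i]) - (x[j] < x[i])
    return s
```

For a strictly decreasing series every pair contributes -1, so S reaches its minimum of -n(n-1)/2, the "strong downward trend" case the abstract describes; the full test then compares S against its variance under the no-trend null hypothesis.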
The learning-based speech recovery approach using statistical spectral conversion has been applied to certain kinds of distorted speech, such as alaryngeal speech and body-conducted (bone-conducted) speech. This approach attempts to recover clean (undistorted) speech from noisy (distorted) speech by converting the statistical models of noisy speech into those of clean speech, without prior knowledge of the characteristics and distributions of the noise source. At present, this approach has attracted few researchers to general noisy speech enhancement because of some major problems: the difficulty of noise adaptation and the lack of noise-robust synthesizable features in different noisy environments. In this paper, we adapted state-of-the-art methods of voice conversion and of speaker adaptation in speech recognition to the proposed speech recovery approach, applied in different kinds of noisy environments, especially adverse environments with joint compensation of additive and convolutive noises. We proposed using decorrelated wavelet packet coefficients as a low-dimensional, robust synthesizable feature under noisy environments. We also proposed a noise adaptation for speech recovery with an eigennoise, analogous to the eigenvoice in voice conversion. The experimental results showed that the proposed approach substantially outperformed traditional non-learning-based approaches.
Actual spoken language of man developed only approximately 200,000 to 100,000 years ago. As a result of natural selection, man has developed hearing, which is most sensitive in the frequency regions of 200 to 4000 Hz, corresponding to those of spoken sounds. Functional hearing has been one of the prerequisites for the development of speech, although according to current opinion the language itself may have evolved by mimicking gestures with the so-called mirror neurons. Due to hearing, gesticulation was no longer necessary, and the hands became available for other purposes.
Mario T Carreon
This paper discusses the Speech and Phoneme Recognition as an Educational Aid for the Deaf and Hearing Impaired (SPREAD) application and the ongoing research on its deployment as a tool for motivating deaf and hearing-impaired students to learn and appreciate speech. The application uses the Sphinx-4 voice recognition system to analyze the vocalization of the student and provide prompt feedback on their pronunciation. The packaging of the application as an interactive game aims to provide additional, visual motivation for deaf and hearing-impaired students to learn and appreciate speech.
Binderup, Lars Grassme
… as opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members of other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal … egalitarian reasons for free speech (reasons from overall welfare, from autonomy, and from respect for the equality of citizens), it is argued that these reasons outweigh the proposed reasons for curbing culturally offensive speech. Currently controversial cases such as that of the Danish Cartoon Controversy …
Hemmer, Joseph J., Jr.
A study identified and evaluated the approach of small colleges in dealing with hate speech and/or verbal harassment incidents. A questionnaire was sent to the Dean of Students at 200 randomly-selected small (500-2000 students), private, liberal arts colleges and universities. Responses were received from 132 institutions, for a response rate of…
FRISINA, D. ROBERT
This report describes the design of a new speech and hearing center and its integration into the overall architectural scheme of the campus. The circular shape was selected to complement the surrounding structures and compensate for differences in site, while providing the acoustical advantages of non-parallel walls and facilitating traffic flow.…
Logan, J S; Greene, B G; Pisoni, D B
This paper reports the results of an investigation that employed the modified rhyme test (MRT) to measure the segmental intelligibility of synthetic speech generated automatically by rule. Synthetic speech produced by ten text-to-speech systems was studied and compared to natural speech. A variation of the standard MRT was also used to study the effects of response set size on perceptual confusions. Results indicated that the segmental intelligibility scores formed a continuum. Several systems displayed very high levels of performance that were close to or equal to scores obtained with natural speech; other systems displayed substantially worse performance compared to natural speech. The overall performance of the best system, DECtalk--Paul, was equivalent to the data obtained with natural speech for consonants in syllable-initial position. The findings from this study are discussed in terms of the use of a set of standardized procedures for measuring intelligibility of synthetic speech under controlled laboratory conditions. Recent work investigating the perception of synthetic speech under more severe conditions in which greater demands are made on the listener's processing resources is also considered. The wide range of intelligibility scores obtained in the present study demonstrates important differences in perception and suggests that not all synthetic speech is perceptually equivalent to the listener.
Conture, Edward G.; Walden, Tedra A.; Lambert, Warren E.
Purpose: This study investigated the relation among speech-language dissociations, attentional distractibility, and childhood stuttering. Method: Participants were 82 preschool-age children who stutter (CWS) and 120 who do not stutter (CWNS). Correlation-based statistics (Bates, Appelbaum, Salcedo, Saygin, & Pizzamiglio, 2003) identified dissociations across 5 norm-based speech-language subtests. The Behavioral Style Questionnaire Distractibility subscale measured attentional distractibility. Analyses addressed (a) between-groups differences in the number of children exhibiting speech-language dissociations; (b) between-groups distractibility differences; (c) the relation between distractibility and speech-language dissociations; and (d) whether interactions between distractibility and dissociations predicted the frequency of total, stuttered, and nonstuttered disfluencies. Results: More preschool-age CWS exhibited speech-language dissociations compared with CWNS, and more boys exhibited dissociations compared with girls. In addition, male CWS were less distractible than female CWS and female CWNS. For CWS, but not CWNS, less distractibility (i.e., greater attention) was associated with more speech-language dissociations. Last, interactions between distractibility and dissociations did not predict speech disfluencies in CWS or CWNS. Conclusions: The present findings suggest that for preschool-age CWS, attentional processes are associated with speech-language dissociations. Future investigations are warranted to better understand the directionality of effect of this association (e.g., inefficient attentional processes → speech-language dissociations vs. inefficient attentional processes ← speech-language dissociations).
Teixeira, João Paulo; Fernandes, Anildo
Text-to-speech synthesis is the main subject of this work. The constitution of a generic text-to-speech conversion system is presented, the functions of its various modules are explained, and the development techniques using the formant model are described. The development of a didactic formant synthesiser in the Matlab environment is also described. This didactic synthesiser is intended to support a didactic understanding of the formant model of speech production.
This study investigated the link between speech-in-speech perception capacities and four executive function components: response suppression, inhibitory control, switching, and working memory. We constructed a cross-modal semantic priming paradigm using a written target word and a spoken prime word, implemented in one of two concurrent auditory sentences (a cocktail-party situation). The prime and target were semantically related or unrelated. Participants had to perform a lexical decision task on visual target words while simultaneously listening to only one of the two spoken sentences. The attention of the participant was manipulated: the prime was either in the sentence attended to by the participant or in the ignored one. In addition, we evaluated the executive function abilities of participants (switching cost, inhibitory-control cost, and response-suppression cost) and their working memory span. Correlation analyses were performed between the executive and priming measurements. Our results showed a significant interaction between attention and semantic priming: we observed a significant priming effect in the attended but not in the ignored condition. Only priming effects obtained in the ignored condition were significantly correlated with some of the executive measurements. However, no correlation between priming effects and working memory capacity was found. Overall, these results confirm, first, the role of attention in the semantic priming effect and, second, the implication of executive functions in speech-in-noise understanding capacities.
Cera, Maysa Luchesi; Ortiz, Karin Zazo; Bertolucci, Paulo Henrique Ferreira; Minett, Thaís Soares Cianciarullo
Alzheimer's disease (AD) affects not only memory but also other cognitive functions, such as orientation, language, praxis, attention, visual perception, and executive function. Most studies on oral communication in AD focus on aphasia; however, speech and orofacial apraxias are also present in these patients. The aim of this study was to investigate the presence of speech and orofacial apraxias in patients with AD, with the hypothesis that apraxia severity is strongly correlated with disease severity. Ninety participants in different stages of AD (mild, moderate, and severe) underwent the following assessments: Clinical Dementia Rating, Mini-Mental State Examination, Lawton Instrumental Activities of Daily Living, a specific speech and orofacial praxis assessment, and the oral agility subtest of the Boston Diagnostic Aphasia Examination. The mean age was 80.2 ± 7.2 years and 73% were women. Patients with AD had significantly lower scores than normal controls for speech praxis (mean difference = -2.9, 95% confidence interval (CI) = -3.3 to -2.4) and orofacial praxis (mean difference = -4.9, 95% CI = -5.4 to -4.3). Dementia severity was significantly associated with orofacial apraxia severity (moderate AD: β = -19.63, p = 0.011; severe AD: β = -51.68, p < 0.001) and with speech apraxia severity (moderate AD: β = 7.07, p = 0.001; severe AD: β = 8.16, p < 0.001). Speech and orofacial apraxias were evident in patients with AD and became more pronounced with disease progression.
Cognitive performance may be significantly affected by fatigue. In the case of critical personnel, such as pilots, monitoring human fatigue is essential to ensure safety and success of a given operation. One of the modalities that may be used for this purpose is speech, which is sensitive to respiratory changes and increased muscle tension of vocal cords, induced by fatigue. Age, gender, vocal tract length, physical and emotional state may significantly alter speech intensity, duration, rhythm, and spectral characteristics. In addition to changes in speech rhythm, fatigue may also affect the quality of speech, such as articulation. In a noisy environment, detecting fatigue-related changes in speech signals, particularly subtle changes at the onset of fatigue, may be difficult. Therefore, in a performance-monitoring system, speech parameters which are significantly affected by fatigue need to be identified and extracted from input signals. For this purpose, a series of experiments was performed under slowly varying cognitive load conditions and at different times of the day. The results of the data analysis are presented here.
Keser, Zafer; Francisco, Gerard E
Almost 7 million adult Americans have had a stroke. There is a growing need for more effective treatment options as add-ons to conventional therapies. This article summarizes the published literature for pharmacologic agents used for the enhancement of motor and speech recovery after stroke. Amphetamine, levodopa, selective serotonin reuptake inhibitors, and piracetam were the most commonly used drugs. Pharmacologic augmentation of stroke motor and speech recovery seems promising but systematic, adequately powered, randomized, and double-blind clinical trials are needed. At this point, the use of these pharmacologic agents is not supported by class I evidence.
Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter
… to business speeches. Consistent with the public opinion, our findings are indicative of Steve Jobs being a more charismatic speaker than Mark Zuckerberg. Beyond previous studies, our data suggest that rhythm and emphatic accentuation are also involved in conveying charisma. Furthermore, the differences … between Steve Jobs and Mark Zuckerberg and between the investor- and customer-related sections of their speeches support the modern understanding of charisma as a gradual, multiparametric, and context-sensitive concept.
Vích, Robert; Vondra, Martin
Vol. 4775, - (2007), s. 129-137 ISSN 0302-9743. [COST Action 2102 International Workshop. Vietri sul Mare, 29.03.2007-31.03.2007] R&D Projects: GA AV ČR(CZ) 1ET301710509 Institutional research plan: CEZ:AV0Z20670512 Keywords : speech * speech processing * cepstral analysis Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering Impact factor: 0.302, year: 2005
Ben-David, Boaz M.; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H. H. M.
Purpose: Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. Method: We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5…
The paper presents some considerations and results from completed research showing the importance of speech-therapy prevention in the development of speech. The research was carried out in Negotino and identifies the most frequent speech deficiencies among children of preschool age.
… In this study, a speech recognition system is presented, specifically an isolated word recognizer which uses speech collected from the external auditory canals of the subjects via an in-ear microphone …
Badalamenti, A F
This paper presents evidence that six of the seven parts of speech occur in written text as Poisson processes, simple or recurring. The six major parts are nouns, verbs, adjectives, adverbs, prepositions, and conjunctions, with the interjection occurring too infrequently to support a model. The data consist of more than the first 5000 words of works by four major authors coded to label the parts of speech, as well as periods (sentence terminators). Sentence length is measured via the period and found to be normally distributed with no stochastic model identified for its occurrence. The models for all six speech parts but the noun significantly distinguish some pairs of authors and likewise for the joint use of all words types. Any one author is significantly distinguished from any other by at least one word type and sentence length very significantly distinguishes each from all others. The variety of word type use, measured by Shannon entropy, builds to about 90% of its maximum possible value. The rate constants for nouns are close to the fractions of maximum entropy achieved. This finding together with the stochastic models and the relations among them suggest that the noun may be a primitive organizer of written text.
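The entropy measure referred to above (variety of word-type use as a fraction of its maximum possible value) can be sketched as follows; the function name and the use of simple tag strings are illustrative assumptions:

```python
import math
from collections import Counter

def pos_entropy_fraction(tags):
    # Shannon entropy of the part-of-speech distribution, expressed as a
    # fraction of the maximum possible entropy log2(k) for the k observed
    # categories (1.0 = perfectly uniform use of word types).
    counts = Counter(tags)
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    k = len(counts)
    return h / math.log2(k) if k > 1 else 0.0
```

On this scale, the paper's observation that variety of word-type use "builds to about 90% of its maximum" corresponds to a running value of this fraction approaching 0.9 as more text is consumed.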
Martin Ofelia POPESCU
Full Text Available The article presents a concise speech correction intervention program for dyslalia, combined with the development of intrapersonal, interpersonal and social integration capacities of children with speech disorders. The program's main objectives are: increasing the potential for individual social integration by correcting speech disorders in conjunction with intra- and interpersonal capacities, and increasing the potential of children and community groups for social integration by optimizing the socio-relational context of children with speech disorders. The program included 60 children/students with dyslalic speech disorders (monomorphic and polymorphic dyslalia) from 11 educational institutions - 6 kindergartens and 5 schools/secondary schools - affiliated with the inter-school logopedic centre (CLI) of Targu Jiu city and areas of Gorj district. The program was implemented under the assumption that therapeutic-formative intervention to correct speech disorders and facilitate social integration would, in combination with the correction of pronunciation disorders, optimize the social integration of children with speech disorders. The results confirm the hypothesis and provide evidence of the intervention program's efficiency.
Hwee Ling Lee
Full Text Available This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech, or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and, to a marginally significant degree, to natural speech.
Van Ackeren, Markus Johannes; Barbero, Francesca M; Mattioni, Stefania; Bottini, Roberto
The occipital cortex of early blind individuals (EB) activates during speech processing, challenging the notion of a hard-wired neurobiology of language. But, at what stage of speech processing do occipital regions participate in EB? Here we demonstrate that parieto-occipital regions in EB enhance their synchronization to acoustic fluctuations in human speech in the theta-range (corresponding to syllabic rate), irrespective of speech intelligibility. Crucially, enhanced synchronization to the intelligibility of speech was selectively observed in primary visual cortex in EB, suggesting that this region is at the interface between speech perception and comprehension. Moreover, EB showed overall enhanced functional connectivity between temporal and occipital cortices that are sensitive to speech intelligibility and altered directionality when compared to the sighted group. These findings suggest that the occipital cortex of the blind adopts an architecture that allows the tracking of speech material, and therefore does not fully abstract from the reorganized sensory inputs it receives. PMID:29338838
Jørgensen, Søren; Dau, Torsten
The speech-based envelope power spectrum model (sEPSM) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated ... to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating ...
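The envelope-domain SNR idea behind the sEPSM can be sketched in a few lines. This is a deliberate simplification: the full model analyzes envelopes through a modulation filterbank per audio channel, whereas here a single broadband envelope (rectification plus smoothing) is used, and all names and parameters are our assumptions.

```python
import math

def envelope(signal, win=32):
    """Crude temporal envelope: full-wave rectification + moving average."""
    rect = [abs(s) for s in signal]
    out = []
    for i in range(len(rect)):
        seg = rect[max(0, i - win // 2): i + win // 2 + 1]
        out.append(sum(seg) / len(seg))
    return out

def snr_env_db(clean, noise, win=32):
    """Envelope-domain SNR in dB: fluctuation (AC) power of the speech
    envelope over that of the noise envelope."""
    def ac_power(env):
        mean = sum(env) / len(env)
        return sum((e - mean) ** 2 for e in env) / len(env)
    p_s = ac_power(envelope(clean, win))
    p_n = ac_power(envelope(noise, win))
    return 10 * math.log10(p_s / max(p_n, 1e-12))

# Slowly modulated tone (speech-like fluctuation) vs. steady tone (noise-like)
clean = [(1 + math.sin(2 * math.pi * i / 200)) * math.sin(2 * math.pi * i / 8)
         for i in range(2000)]
noise = [0.5 * math.sin(2 * math.pi * i / 8) for i in range(2000)]
print(round(snr_env_db(clean, noise), 1))  # strongly positive: only clean fluctuates
```

The intuition matches the abstract: speech carries strong envelope fluctuations, steady noise does not, so SNRenv stays informative even after nonlinear processing.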
Rao, K Sreenivasa
Predicting Prosody from Text for Text-to-Speech Synthesis covers the specific aspects of prosody, mainly focusing on how to predict the prosodic information from linguistic text, and then how to exploit the predicted prosodic knowledge for various speech applications. Author K. Sreenivasa Rao discusses proposed methods along with state-of-the-art techniques for the acquisition and incorporation of prosodic knowledge for developing speech systems. Positional, contextual and phonological features are proposed for representing the linguistic and production constraints of the sound units present in the text. This book is intended for graduate students and researchers working in the area of speech processing.
Gao, Yayue; Cao, Shuyang; Qu, Tianshu; Wu, Xihong; Li, Haifeng; Zhang, Jinsheng; Li, Liang
In noisy, multipeople talking environments such as a cocktail party, listeners can use various perceptual and/or cognitive cues to improve recognition of target speech against masking, particularly informational masking. Previous studies have shown that temporally prepresented voice cues (voice primes) improve recognition of target speech against speech masking but not noise masking. This study investigated whether static face image primes that have become target-voice associated (i.e., facial images linked through associative learning with voices reciting the target speech) can be used by listeners to unmask speech. The results showed that in 32 normal-hearing younger adults, temporally prepresenting a voice-priming sentence with the same voice reciting the target sentence significantly improved the recognition of target speech that was masked by irrelevant two-talker speech. When a person's face photograph image became associated with the voice reciting the target speech by learning, temporally prepresenting the target-voice-associated face image significantly improved recognition of target speech against speech masking, particularly for the last two keywords in the target sentence. Moreover, speech-recognition performance under the voice-priming condition was significantly correlated to that under the face-priming condition. The results suggest that learned facial information on talker identity plays an important role in identifying the target-talker's voice and facilitating selective attention to the target-speech stream against the masking-speech stream. © 2014 The Institute of Psychology, Chinese Academy of Sciences and Wiley Publishing Asia Pty Ltd.
Jungheim, M; Miller, S; Kühn, D; Ptok, M
In order to acquire language, children require speech input. The prosody of the speech input plays an important role. In most cultures adults modify their code when communicating with children. Compared to normal speech this code differs especially with regard to prosody. For this review a selective literature search in PubMed and Scopus was performed. Prosodic characteristics are a key feature of spoken language. By analysing prosodic features, children gain knowledge about underlying grammatical structures. Child-directed speech (CDS) is modified in a way that meaningful sequences are highlighted acoustically so that important information can be extracted from the continuous speech flow more easily. CDS is said to enhance the representation of linguistic signs. Taking into consideration what has previously been described in the literature regarding the perception of suprasegmentals, CDS seems to be able to support language acquisition due to the correspondence of prosodic and syntactic units. However, no findings have been reported, stating that the linguistically reduced CDS could hinder first language acquisition.
Mowlaee, Pejman; Christensen, Mads Græsbøll; Jensen, Søren Holdt
In this paper we present a new approach for binary and soft masks used in single-channel speech separation. We present a novel approach called the sinusoidal mask (binary mask and Wiener filter) in a sinusoidal space. Theoretical analysis is presented for the proposed method, and we show ... that the proposed method is able to minimize the target speech distortion while suppressing the crosstalk to a predetermined threshold. It is observed that compared to the STFT-based masks, the proposed sinusoidal masks improve the separation performance in terms of objective measures (SSNR and PESQ) and are mostly ...
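The contrast between a binary mask and a Wiener-filter (soft) mask can be sketched per component. The gains below are the generic textbook forms, not the paper's sinusoidal-space formulation, and the power values are invented for illustration.

```python
from math import log10

def wiener_gain(s_pow, n_pow):
    """Soft mask: Wiener gain S/(S+N) from target and masker powers."""
    return s_pow / (s_pow + n_pow)

def binary_mask(s_pow, n_pow, lc_db=0.0):
    """Binary mask: pass the component only if its local SNR (dB)
    exceeds the local criterion lc_db."""
    return 1.0 if 10 * log10(s_pow / max(n_pow, 1e-12)) > lc_db else 0.0

# Hypothetical per-component powers of target and masker in a mixture
target = [4.0, 0.2, 9.0, 0.1]
masker = [1.0, 2.0, 1.0, 3.0]
soft = [wiener_gain(s, n) for s, n in zip(target, masker)]
hard = [binary_mask(s, n) for s, n in zip(target, masker)]
print(hard)  # components dominated by the masker are zeroed
```

The soft mask trades complete crosstalk suppression for lower target distortion, which is exactly the tension the abstract's method aims to control with a predetermined threshold.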
Marschik, Peter B; Vollmann, Ralf; Bartl-Pokorny, Katrin D; Green, Vanessa A; van der Meer, Larah; Wolin, Thomas; Einspieler, Christa
We assessed various aspects of speech-language and communicative functions of an individual with the preserved speech variant of Rett syndrome (RTT) to describe her developmental profile over a period of 11 years. For this study, we incorporated the following data resources and methods to assess speech-language and communicative functions during pre-, peri- and post-regressional development: retrospective video analyses, medical history data, parental checklists and diaries, standardized tests on vocabulary and grammar, spontaneous speech samples and picture stories to elicit narrative competences. Despite achieving speech-language milestones, atypical behaviours were present at all times. We observed a unique developmental speech-language trajectory (including the RTT typical regression) affecting all linguistic and socio-communicative sub-domains in the receptive as well as the expressive modality. Future research should take into consideration a potentially considerable discordance between formal and functional language use by interpreting communicative acts on a more cautionary note.
Nie, Yingjiu; Galvin, John J; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie
This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet was significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.
Babikian, Sarkis; Emerson, Lyndal; Wynn, Gary H
A 22-year-old active duty E1 Nepalese male who recently emigrated from Nepal suddenly exhibited strange behaviors and mutism during Advanced Individual Training. After receiving care from a hospital near his unit, he was transferred to Walter Reed Army Medical Center Inpatient Psychiatry for further evaluation and treatment. Although he was admitted with a diagnosis of psychosis not otherwise specified (NOS), after consideration of cultural factors and by ruling out concurrent thought disorder, a diagnosis of selective mutism was made. To our knowledge this is the first reported case of selective mutism in a soldier. This case serves as a reminder of the need for cultural awareness during psychological evaluation, diagnosis, and treatment of patients.
McCarthy, Rosaleen A; Warrington, Elizabeth K
We summarize the main findings and conclusions of Warrington's (1975) paper, The Selective Impairment of Semantic memory, a neuropsychological paper that described three cases with degenerative neurological conditions [Warrington, E. K. (1975). The selective impairment of semantic memory. The Quarterly Journal of Experimental Psychology, 27, 635-657]. We consider the developments that have followed from its publication and give a selective overview of the field in 2014. The initial impact of the paper was on neuropsychological investigations of semantic loss followed some 14 years later by the identification of Semantic Dementia (the condition shown by the original cases) as a distinctive form of degenerative disease with unique clinical and pathological characteristics. We discuss the distinction between disorders of semantic storage and refractory semantic access, the evidence for category- and modality-specific impairments of semantics, and the light that has been shed on the structure and organization of semantic memory. Finally we consider the relationship between semantic memory and the skills of reading and writing, phonological processing, and autobiographical memory.
Full Text Available Recently, much attention has been given to stochastic demand due to uncertainty in the real world. In the literature, decision-making models for supplier selection do not often consider inventory management as part of the purchasing problem. On the other hand, the environmental sustainability of a supply chain depends on the purchasing strategy of the supply chain members, and supplier selection plays an important role in the green chain. In this paper, a multi-objective nonlinear integer programming model for selecting a set of suppliers under stochastic demand is proposed. The purchasing cost (comprising total cost, holding and stock-out costs, rejected units, and units delivered early) and total greenhouse gas emissions are minimized, while the total score obtained from the supplier assessment process is maximized. It is assumed that the purchaser procures the different products from a predetermined number of suppliers under stochastic demand with a uniform probability distribution, and that the product price depends on the order quantity for each product line. The multi-objective model is converted into a single objective function using the Lp-metric method and then solved with genetic algorithm and simulated annealing meta-heuristics.
Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan
A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It offers a fast rate of data/text entry, a small overall size, and light weight. In addition, this design frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when facing constraints in computational resources.
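The beamforming/multi-channel noise-reduction step above can be illustrated with the simplest member of that family, a delay-and-sum beamformer. The signals and sample delays below are invented for the example; a real spacesuit system would use a more sophisticated adaptive beamformer.

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer: advance each microphone channel by its
    steering delay (in samples) and average. Signals from the look
    direction add coherently; uncorrelated noise is averaged down."""
    n_ch = len(channels)
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(channels[c][i + delays[c]] for c in range(n_ch)) / n_ch
            for i in range(length)]

# Two-microphone example: the source reaches mic 1 two samples later
source = [1.0, 2.0, 3.0, 4.0]
channels = [source, [0.0, 0.0] + source]
aligned = delay_and_sum(channels, [0, 2])
print(aligned)  # the look-direction source is recovered
```

With N microphones and uncorrelated noise, averaging the aligned channels improves the SNR by roughly a factor of N, which is the spatial-domain advantage the abstract refers to.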
Christiner, Markus; Reiterer, Susanne M
In previous research on speech imitation, musicality, and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer's sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and auditory memory with singing fitting better into the category of "speech" on the productive level and "music" on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. (1) Motor flexibility and the ability to sing improve language and musical function. (2) Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. (3) The ability to sing improves the memory span of the auditory working memory.
Lewis, James R
Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application
Jiang, Yi-jiao; Chen, Hou-jin; Li, Ju-peng; Zhang, Zhan-song
Aiming at secure analog speech communication, a homology sound-based algorithm for speech signal interference is proposed in this paper. We first split speech signal into phonetic fragments by a short-term energy method and establish an interference noise cache library with the phonetic fragments. Then we implement the homology sound interference by mixing the randomly selected interferential fragments and the original speech in real time. The computer simulation results indicated that the interference produced by this algorithm has advantages of real time, randomness, and high correlation with the original signal, comparing with the traditional noise interference methods such as white noise interference. After further studies, the proposed algorithm may be readily used in secure speech communication.
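The short-term-energy segmentation step described above can be sketched as follows; the frame length and threshold rule are our assumptions, not the authors' exact parameters.

```python
def frame_energies(signal, frame_len=160):
    """Short-term energy of non-overlapping frames."""
    return [sum(s * s for s in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def split_fragments(signal, frame_len=160, thresh=None):
    """Return (start_frame, end_frame) pairs where frame energy exceeds a
    threshold (default: half the mean frame energy) - a crude stand-in
    for splitting speech into phonetic fragments for the noise cache."""
    e = frame_energies(signal, frame_len)
    if thresh is None:
        thresh = 0.5 * sum(e) / len(e)
    fragments, start = [], None
    for i, en in enumerate(e):
        if en > thresh and start is None:
            start = i
        elif en <= thresh and start is not None:
            fragments.append((start, i))
            start = None
    if start is not None:
        fragments.append((start, len(e)))
    return fragments

# Silence - speech - silence toy signal: one fragment spanning frames 2-3
sig = [0.0] * 320 + [1.0] * 320 + [0.0] * 320
print(split_fragments(sig))  # [(2, 4)]
```

The fragments found this way would populate the interference cache; randomly selected entries are then mixed back into the live speech.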
Shi, Liming; Nielsen, Jesper Kjær; Jensen, Jesper Rindom
The modeling of speech can be used for speech synthesis and speech recognition. We present a speech analysis method based on pole-zero modeling of speech with mixed block sparse and Gaussian excitation. By using a pole-zero model, instead of the all-pole model, a better spectral fitting can ... be expected. Moreover, motivated by the block sparse glottal flow excitation during voiced speech and the white noise excitation for unvoiced speech, we model the excitation sequence as a combination of block sparse signals and white noise. A variational EM (VEM) method is proposed for estimating ... in reconstructing the block sparse excitation ...
Full Text Available Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...
Full Text Available Objective: Gestures of the hands and arms have long been observed to accompany speech in spontaneous conversation. However, the way in which these two modes of expression are related in production is not yet fully understood. The present study therefore investigates the spontaneous gestures that accompany speech in adults who stutter in comparison to fluent controls. Materials & Methods: In this cross-sectional and comparative study, ten adults who stutter were selected randomly from speech and language pathology clinics and compared with ten healthy persons as a control group, matched with the stutterers for sex, age and education. A cartoon story-retelling task was used to elicit spontaneous gestures accompanying speech. Participants were asked to watch the animation carefully and then retell the storyline in as much detail as possible to a listener sitting across from them, and their narration was video recorded simultaneously. The recorded utterances and gestures were then analyzed. Statistical methods such as the Kolmogorov-Smirnov and independent t-tests were used for data analysis. Results: The results indicated that stutterers, compared to controls, on average use fewer iconic gestures in their narration (P=0.005). Stutterers also use fewer iconic gestures per utterance and per word (P=0.019). Furthermore, examination of gesture production during moments of dysfluency revealed that more than 70% of the gestures produced with stuttering were frozen or abandoned at the moment of dysfluency. Conclusion: It seems that gesture and speech have such an intricate and deep association that they show similar frequency and timing patterns and move completely in parallel with each other, such that a deficit in speech results in a deficiency in hand gesture.
Full Text Available Presidential election speeches, as one significant part of western political life, deserve people's attention. This paper focuses on the use of interpersonal meaning in political speeches. The nine texts selected from the Internet are analyzed from the perspectives of mood, modality, personal pronoun and tense system based on the theory of Halliday's Systemic Functional Grammar. It aims to study how interpersonal meaning is realized through language by making a contrastive analysis of the speeches given by Hillary and Trump. After a detailed analysis, the paper comes to the following conclusions: (1) As for mood, Trump and Hillary mainly employ the declarative to deliver messages and make statements; the imperative is used to motivate the audiences and narrow the gap between the candidates and the audiences, and the interrogative is used to make the audiences concentrate on the content of the speeches. (2) With respect to the modality system, the median modal operator holds the dominant position in both Trump's and Hillary's speeches to make the speeches less aggressive. In this aspect, Trump does better than Hillary. (3) In regard to personal pronouns, the plural form of the first personal pronoun is mainly employed by the two candidates to close the relationship with audiences. (4) As regards the tense system, the simple present tense is mostly used to establish intimacy between the audiences and the candidates. Then two influential factors are discussed: one is their personal background and the other is their language level. This paper is helpful for people to deeply understand the two candidates' language differences.
The purpose of this study is to examine speech intelligibility of children with primary speech and sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...
Full Text Available A system for bandwidth extension of telephone speech, aided by data embedding, is presented. The proposed system uses the transmitted analog narrowband speech signal as a carrier of the side information needed to carry out the bandwidth extension. The upper band of the wideband speech is reconstructed at the receiving end from two components: a synthetic wideband excitation signal, generated from the narrowband telephone speech, and a wideband spectral envelope, parametrically represented and transmitted as embedded data in the telephone speech. We propose a novel data embedding scheme, in which the scalar Costa scheme is combined with an auditory masking model allowing high rate transparent embedding, while maintaining a low bit error rate. The signal is transformed to the frequency domain via the discrete Hartley transform (DHT) and is partitioned into subbands. Data is embedded in an adaptively chosen subset of subbands by modifying the DHT coefficients. In our simulations, high quality wideband speech was obtained from speech transmitted over a telephone line (characterized by spectral magnitude distortion, dispersion, and noise), in which side information data is transparently embedded at the rate of 600 information bits/second and with a bit error rate of approximately 3·10⁻⁴. In a listening test, the reconstructed wideband speech was preferred (to different degrees) over conventional telephone speech in 92.5% of the test utterances.
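A minimal discrete Hartley transform, the transform the embedding scheme above operates in, can be sketched directly from its definition. The DHT is its own inverse up to a 1/N factor, which is what makes modifying coefficients and resynthesizing straightforward (this is an O(N²) sketch; a real system would use a fast algorithm).

```python
from math import cos, sin, pi

def dht(x):
    """Discrete Hartley transform: H[k] = sum_n x[n] * cas(2*pi*k*n/N),
    with cas(t) = cos(t) + sin(t)."""
    n_len = len(x)
    return [sum(x[n] * (cos(2 * pi * k * n / n_len) + sin(2 * pi * k * n / n_len))
                for n in range(n_len))
            for k in range(n_len)]

def idht(h):
    """Inverse DHT: the forward transform scaled by 1/N (the DHT is involutory)."""
    return [v / len(h) for v in dht(h)]

# Round trip: transform, (a real system would modify subband coefficients
# here to embed data), then invert
x = [1.0, 2.0, 3.0, 4.0]
roundtrip = idht(dht(x))
```

Unlike the DFT, the DHT maps real signals to real coefficients, so subband modification stays in the real domain.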
Full Text Available Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] was degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But it was not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.
Zaka Al Farisi
Full Text Available Abstract: Iltifat (a shifting speech act) is a distinctive and unique style of Arabic. It is prone to errors when translated into Indonesian, so the translation of the iltifat speech act into another language is an important issue. The objective of the study is to identify the translation procedures/techniques and ideology required in dealing with the iltifat speech act. This research is directed at translation as a cognitive product of a translator. The data used in the present study were a corpus of Koranic verses that contain iltifat speech acts, along with their translations. Data analysis used a descriptive-evaluative method with a content analysis model. The data sources consisted of the Koran and its translation. A purposive sampling technique was employed, with the sample being the iltifat speech acts contained in the Koran. The results showed that more than 60% of iltifat speech acts were translated using the literal procedure. The significant number of literally translated verses indicates that the Ministry of Religious Affairs tended to use the literal method of translation; in other words, the Koran translation made by the Ministry of Religious Affairs tended to be oriented to the source language in dealing with the iltifat speech act. The prevalence of the literal procedure also shows a tendency toward foreignization ideology. Transitional pronouns contained in the iltifat speech act can be clearly translated when thick translations are used in the form of descriptions in parentheses. In this case, explanation can be a choice in translating the iltifat speech act.
Gustavo Fernandes Rodrigues
Full Text Available In this paper we present an insight into the use of spectral masking techniques in the time-frequency domain as a preprocessing step for speech signal recognition. Speech recognition systems have their performance negatively affected in noisy environments or in the presence of other speech signals. The limits of these masking techniques for different levels of the signal-to-noise ratio are discussed. We show the robustness of the spectral masking techniques against four types of noise: white, pink, brown, and human speech noise (bubble noise). The main contribution of this work is to analyze the performance limits of recognition systems using spectral masking. We obtain an increase of 18% in the speech hit rate when the speech signals were corrupted by other speech signals or bubble noise, at signal-to-noise ratios of approximately 1, 10, and 20 dB. On the other hand, applying the ideal binary masks to mixtures corrupted by white, pink, and brown noise results in an average increase of 9% in the speech hit rate at the same signal-to-noise ratios. The experimental results suggest that the spectral masking techniques are more suitable for bubble noise, which is produced by human speech, than for white, pink, and brown noise.
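The ideal binary mask used in studies like this one can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes oracle access to the clean-speech and noise spectrograms (which is what makes the mask "ideal"), and the function and variable names are hypothetical.

```python
import numpy as np

def ideal_binary_mask(clean_stft, noise_stft, lc_db=0.0):
    """Ideal binary mask: keep time-frequency cells where the local SNR
    exceeds the local criterion lc_db (in dB), zero out the rest."""
    eps = 1e-12  # guard against log(0) in silent cells
    local_snr_db = 10.0 * np.log10(
        (np.abs(clean_stft) ** 2 + eps) / (np.abs(noise_stft) ** 2 + eps)
    )
    return (local_snr_db > lc_db).astype(float)

# Toy 2x3 magnitude spectrograms: speech dominates only in some cells.
clean = np.array([[2.0, 0.1, 1.0], [0.2, 3.0, 0.1]])
noise = np.array([[0.5, 1.0, 1.0], [1.0, 0.5, 2.0]])
mask = ideal_binary_mask(clean, noise)
masked_mixture = mask * (clean + noise)  # retain only speech-dominated cells
```

In practice the mask would be computed on STFT frames of the mixture's components and applied before feature extraction for the recognizer; the 0 dB local criterion is a common default, not a value taken from the paper.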
Ramos-Estebanez, Ciro; Gokhale, Sankalp; Goddeau, Richard; Kumar, Sandeep
A 36-year-old healthy man presented with sudden onset speech difficulty. Thorough clinical examination revealed interesting deficits suggestive of apraxia of speech. He was found to have an infarct in his frontal region explaining the deficits. We have undertaken clinical evaluation and differential diagnoses of this condition. Copyright © 2012 Elsevier Ltd. All rights reserved.
Ogar, Jennifer; Willock, Sharon; Baldo, Juliana; Wilkins, David; Ludy, Carl; Dronkers, Nina
In a previous study (Dronkers, 1996), stroke patients identified as having apraxia of speech (AOS), an articulatory disorder, were found to have damage to the left superior precentral gyrus of the insula (SPGI). The present study sought (1) to characterize the performance of patients with AOS on a classic motor speech evaluation, and (2) to…
Moriarty, Brigid C.; Gillon, Gail T.
Aims: To investigate the effectiveness of an integrated phonological awareness intervention to improve the speech production, phonological awareness and printed word decoding skills for three children with childhood apraxia of speech (CAS) aged 7;3, 6;3 and 6;10. The three children presented with severely delayed phonological awareness skills…
Mekonnen, Abebayehu Messele
This article presents a case study of speech production in a 14-year-old Amharic-speaking boy. The boy had developed secondary macroglossia, related to a disturbance of growth hormones, following a history of normal speech development. Perceptual analysis combined with acoustic analysis and static palatography is used to investigate the specific…
Petridis, Stavros; Pantic, Maja
Deep bottleneck features (DBNFs) have been used successfully in the past for acoustic speech recognition from audio. However, research on extracting DBNFs for visual speech recognition is very limited. In this work, we present an approach to extract deep bottleneck visual features based on deep
In this article, the author presents the history of human-to-computer interaction based upon the design of sophisticated computerized speech recognition algorithms. Advancements such as the arrival of cloud-based computing and software like Google's Web Speech API allow anyone with an Internet connection and a Chrome browser to take advantage of…
Darweesh, Abbas Deygan; Mehdi, Wafaa Sahib
The present paper investigates the performance of Iraqi students in the speech act of correction and how it is realized between interlocutors of unequal status. It attempts to achieve the following aims: (1) setting out the felicity conditions for the speech act of correction in terms of Searle's conditions; (2) identifying the semantic formulas that realize the…
Petridis, Stavros; Asghar, Ali; Pantic, Maja
In this study, a system that discriminates laughter from speech by modelling the relationship between audio and visual features is presented. The underlying assumption is that this relationship is different between speech and laughter. Neural networks are trained which learn the audio-to-visual and
Kember, Heather; Connaghan, Kathryn; Patel, Rupal
Although tongue twisters have been widely used to study speech production in healthy speakers, few studies have employed this methodology with individuals with speech impairment. The present study compared tongue twister errors produced by adults with dysarthria and age-matched healthy controls. Eight speakers (four female, four male; mean age =…
In this thesis, research on large vocabulary continuous speech recognition for unknown audio conditions is presented. For automatic speech recognition systems based on statistical methods, it is important that the conditions of the audio used for training the statistical models match the conditions
Williams, Justin H. G.; Massaro, Dominic W.; Peel, Natalie J.; Bosseler, Alexis; Suddendorf, Thomas
Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional "mirror neuron" systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a "virtual" head (Baldi), delivered speech stimuli for…
Lewis, Barbara A.; Avrich, Allison A.; Freebairn, Lisa A.; Taylor, H. Gerry; Iyengar, Sudha K.; Stein, Catherine M.
Purpose: The present study examined associations of 5 endophenotypes (i.e., measurable skills that are closely associated with speech sound disorders and are useful in detecting genetic influences on speech sound production), oral motor skills, phonological memory, phonological awareness, vocabulary, and speeded naming, with 3 clinical criteria…
Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders provides a survey of methods designed to aid clinicians in the diagnosis and monitoring of speech disorders such as dysarthria and dyspraxia, with an emphasis on the signal processing techniques, statistical validity of the results presented in the literature, and the appropriateness of methods that do not require specialized equipment, rigorously controlled recording procedures or highly skilled personnel to interpret results. Such techniques offer the promise of a simple and cost-effective, yet objective, assessment of a range of medical conditions, which would be of great value to clinicians. The ideal scenario would begin with the collection of examples of the clients’ speech, either over the phone or using portable recording devices operated by non-specialist nursing staff. The recordings could then be analyzed initially to aid diagnosis of conditions, and subsequently to monitor the clients’ progress and res...
Full Text Available Objective: Stuttering is a fairly common speech disorder; however, its etiology is poorly understood and is likely to be heterogeneous. The aim of this research is to investigate the effect of phonological encoding complexity on speech fluency in 6- to 9-year-old stuttering children in comparison with non-stutterers in Tehran. Materials & Methods: This cross-sectional, descriptive-analytic research was done on 18 children with severe to profound stuttering and 18 non-stuttering children. The stuttering subjects were selected by convenience sampling, and normal subjects were matched to the stuttering subjects by gender, age, and geographic region. A non-word test comprising 87 non-words was used to investigate phonological encoding and the effects of phonological complexity on speech fluency. Stimuli were presented in random order with approximately 5 seconds between items, using a computer via an external Toshiba SOMIC SM-818 headphone, and subjects were asked to repeat them. Results: The results indicated that speech fluency decreased significantly (P<0.05) with increasing phonological complexity, compared to controls. Conclusion: The findings of the present research seem to suggest that stuttering children may have deficits in phonological encoding, and that the deficit increases with phonological encoding complexity. Based on the covert repair hypothesis, phonological difficulty may cause covert self-repair and lead to different patterns of stuttering.
Andersen, Tobias; Tiippana, K.; Laarni, J.
Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive, but … from each of the faces and from the voice on the auditory speech percept. We found that directing visual spatial attention towards a face increased the influence of that face on auditory perception. However, the influence of the voice on auditory perception did not change, suggesting that audiovisual integration did not change. Visual spatial attention was also able to select between the faces when lip reading. This suggests that visual spatial attention acts at the level of visual speech perception prior to audiovisual integration and that the effect propagates through audiovisual integration.
Cocks, Naomi; Byrne, Suzanne; Pritchard, Madeleine; Morgan, Gary; Dipper, Lucy
Information from speech and gesture is often integrated to comprehend a message. This integration process requires the appropriate allocation of cognitive resources to both the gesture and speech modalities. People with aphasia are likely to find integration of gesture and speech difficult. This is due to a reduction in cognitive resources, a difficulty with resource allocation or a combination of the two. Despite it being likely that people who have aphasia will have difficulty with integration, empirical evidence describing this difficulty is limited. Such a difficulty was found in a single case study by Cocks et al. in 2009, and is replicated here with a greater number of participants. To determine whether individuals with aphasia have difficulties understanding messages in which they have to integrate speech and gesture. Thirty-one participants with aphasia (PWA) and 30 control participants watched videos of an actor communicating a message in three different conditions: verbal only, gesture only, and verbal and gesture message combined. The message related to an action in which the name of the action (e.g., 'eat') was provided verbally and the manner of the action (e.g., hands in a position as though eating a burger) was provided gesturally. Participants then selected a picture that 'best matched' the message conveyed from a choice of four pictures which represented a gesture match only (G match), a verbal match only (V match), an integrated verbal-gesture match (Target) and an unrelated foil (UR). To determine the gain that participants obtained from integrating gesture and speech, a measure of multimodal gain (MMG) was calculated. The PWA were less able to integrate gesture and speech than the control participants and had significantly lower MMG scores. When the PWA had difficulty integrating, they more frequently selected the verbal match. The findings suggest that people with aphasia can have difficulty integrating speech and gesture in order to obtain
When talker and listener are face-to-face, visual speech cues become an important part of the communication environment, and yet, these cues are seldom considered when designing hearing aids. Models of auditory-visual speech recognition highlight the importance of complementary versus redundant speech information for predicting auditory-visual recognition performance. Thus, for hearing aids to work optimally when visual speech cues are present, it is important to know whether the cues provided by amplification and the cues provided by speechreading complement each other. In this talk, data will be reviewed that show nonmonotonicity between auditory-alone speech recognition and auditory-visual speech recognition, suggesting that efforts designed solely to improve auditory-alone recognition may not always result in improved auditory-visual recognition. Data will also be presented showing that one of the most important speech cues for enhancing auditory-visual speech recognition performance, voicing, is often the cue that benefits least from amplification.
N. N. Zavadenko
Full Text Available The article describes the main clinical forms and causes of speech delay in children. It presents modern data on the role of neurobiological factors in the pathogenesis of speech delay, including early organic damage to the central nervous system due to pregnancy and childbirth pathology, as well as genetic mechanisms. For early and accurate diagnosis of speech disorders in children, normal patterns of speech development must be considered. The article presents indicators of pre-speech and speech development in children and describes a screening method for detecting speech delay. The main areas of complex correction are speech therapy, psycho-pedagogical and psychotherapeutic assistance, as well as pharmaceutical treatment. The capabilities of drug therapy for dysphasia (alalia) are shown.
Laganaro, Marina; Croisier, Michèle; Bagou, Odile; Assal, Frédéric
We present a 3-year follow-up study of a patient with progressive apraxia of speech (PAoS), aimed at investigating whether the theoretical organization of phonetic encoding is reflected in the progressive disruption of speech. As decreased speech rate was the most striking pattern of disruption during the first 2 years, durational analyses were carried out longitudinally on syllables excised from spontaneous, repetition and reading speech samples. The crucial result of the present study is the demonstration of an effect of syllable frequency on duration: the progressive disruption of articulation rate did not affect all syllables in the same way, but followed a gradient that was a function of the frequency of use of syllable-sized motor programs. The combination of data from this case of PAoS with previous psycholinguistic and neurolinguistic data points to a frequency organization of syllable-sized speech-motor plans. In this study we also illustrate how studying PAoS can be exploited in theoretical and clinical investigations of phonetic encoding, as it represents a unique opportunity to investigate speech while it progressively disrupts. Copyright © 2011 Elsevier Srl. All rights reserved.
Shriberg, Lawrence D.; Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.
Purpose: Three previous articles provided rationale, methods, and several forms of validity support for a diagnostic marker of childhood apraxia of speech (CAS), termed the pause marker (PM). Goals of the present article were to assess the validity and stability of the PM Index (PMI) to scale CAS severity. Method: PM scores and speech, prosody,…
Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo
The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained
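The SRT tests cited above (Plomp & Mimpen, 1979) use an adaptive up-down track that converges on the signal-to-noise ratio yielding 50% sentence intelligibility. A minimal sketch of such a 1-up/1-down staircase follows; the function name, the fixed 2 dB step size, and the averaging rule are simplifying assumptions, not the published procedure.

```python
def srt_staircase(responses, start_snr_db=0.0, step_db=2.0):
    """Simple 1-up/1-down adaptive track converging on ~50% correct.

    `responses` is a list of booleans (True = sentence repeated correctly).
    Returns the presented SNRs and an SRT estimate (mean presented level,
    discarding the first warm-up trial)."""
    snr = start_snr_db
    levels = []
    for correct in responses:
        levels.append(snr)
        # Correct response -> harder (lower SNR); incorrect -> easier.
        snr += -step_db if correct else step_db
    tail = levels[1:] if len(levels) > 1 else levels
    return levels, sum(tail) / len(tail)

# Hypothetical response sequence for one listener:
levels, srt = srt_staircase([True, True, False, True, False, False])
```

The audiovisual benefit in the study is then simply the difference between the SRT measured auditory-alone and the SRT measured with subtitles present.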
Proper speech functioning in human beings depends on precise coordination and timing balance in a series of complex neuromuscular movements and actions: starting from the prime energy source of air expelled from the respiratory system; delivering this air to trigger the vocal cords; swiftly shaping this phonatory episode into a comprehensible sound through resonance; and finally coordinating all head and neck structures to elicit final speech in ...
The paper contains a transcript of a speech by the chairman of the UKAEA, to mark the publication of the 1985/6 annual report. The topics discussed in the speech include: the Chernobyl accident and its effect on public attitudes to nuclear power, management and disposal of radioactive waste, the operation of UKAEA as a trading fund, and the UKAEA development programmes. The development programmes include work on the following: fast reactor technology, thermal reactors, reactor safety, health and safety aspects of water cooled reactors, the Joint European Torus, and under-lying research. (U.K.)
Many children with Down syndrome have difficulty with speech intelligibility. The present study used a parent survey to learn more about a specific factor that affects speech intelligibility, i.e. childhood verbal apraxia. One of the factors that affects speech intelligibility for children with Down syndrome is difficulty with voluntarily…
Haley, Katarina L.; Jacks, Adam; de Riesthal, Michael; Abou-Khalil, Rima; Roth, Heidi L.
Purpose: We explored the reliability and validity of 2 quantitative approaches to document presence and severity of speech properties associated with apraxia of speech (AOS). Method: A motor speech evaluation was administered to 39 individuals with aphasia. Audio-recordings of the evaluation were presented to 3 experienced clinicians to determine…
Shi, Liming; Jensen, Jesper Rindom; Christensen, Mads Græsbøll
In this paper, we present a speech analysis method based on sparse pole-zero modeling of speech. Instead of using the all-pole model to approximate the speech production filter, a pole-zero model is used for the combined effect of the vocal tract, radiation at the lips, and the glottal pulse shape...
Terband, H.R.; Maassen, B.A.M.; Guenther, F.H.; Brumberg, J.
Background/Purpose: Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between
Mani, Nivedita; Pätzold, Wiebke
One of the first challenges facing the young language learner is the task of segmenting words from a natural language speech stream, without prior knowledge of how these words sound. Studies with younger children find that children find it easier to segment words from fluent speech when the words are presented in infant-directed speech, i.e., the…
Relano-Iborra, Helia; May, Tobias; Zaar, Johannes
A powerful tool to investigate speech perception is the use of speech intelligibility prediction models. Recently, a model was presented, termed the correlation-based speech-based envelope power spectrum model (sEPSMcorr), based on the auditory processing of the multi-resolution speech-based Envel...
Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.
A companion paper describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). The SDCS uses perceptual and acoustic data reduction methods to obtain information on a speaker's speech, prosody, and voice. The present paper provides reliability estimates for…
Naturally produced English clear speech has been shown to be more intelligible than English conversational speech. However, little is known about the extent of the clear speech effects in the production of nonnative English, and perception of foreign-accented English by younger and older listeners. The present study examined whether Cantonese speakers would employ the same strategies as those used by native English speakers in producing clear speech in their second language. Also, the clear s...
Ahmad R. Abu-El-Quran
Full Text Available We introduce a multiengine speech processing system that can detect the location and the type of audio signal in variable noisy environments. The system detects the location of the audio source using a microphone array; it examines the audio first, determines whether it is speech or nonspeech, and then estimates the signal-to-noise ratio (SNR) using a Discrete-Valued SNR Estimator. Using this SNR value, instead of trying to adapt the speech signal to the speech processing system, we adapt the speech processing system to the surrounding environment of the captured speech signal. In this paper, we introduce the Discrete-Valued SNR Estimator and a multiengine classifier, using Multiengine Selection or Multiengine Weighted Fusion, and we use SI as an example of the speech processing. The Discrete-Valued SNR Estimator achieves an accuracy of 98.4% in characterizing the environment's SNR. Compared to a conventional single-engine SI system, the improvement in accuracy was as high as 9.0% and 10.0% for Multiengine Selection and Multiengine Weighted Fusion, respectively.
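The multiengine-selection idea described above can be illustrated with a toy sketch: estimate the environment's SNR, then route the signal to the engine trained under the closest noise condition. The function, the engine names, and the training-SNR grid below are hypothetical illustrations, not details taken from the paper.

```python
def select_engine(estimated_snr_db, engines):
    """Multiengine Selection (sketch): choose the recognizer whose
    training SNR is closest to the estimated environment SNR.

    `engines` maps a training SNR in dB -> an engine identifier."""
    best_training_snr = min(engines, key=lambda snr: abs(snr - estimated_snr_db))
    return engines[best_training_snr]

# Hypothetical bank of engines, each trained at one discrete SNR level.
engines = {0: "engine_0dB", 10: "engine_10dB", 20: "engine_20dB"}
choice = select_engine(12.5, engines)  # 12.5 dB is closest to the 10 dB engine
```

Multiengine Weighted Fusion would instead combine the scores of all engines, weighting each by its proximity to the estimated SNR rather than picking a single winner.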
Johnson, Tracey Jean
This study was an examination of participants' preference for classical music excerpts presented in differentiated types of music video formats. Participants (N = 83) were volunteer students enrolled in intact music appreciation classes at a suburban community college located in a Midwestern city. Participants listened to and viewed music video…
Willadsen, Elisabeth; Henningsson, Gunilla
This chapter deals with cross-linguistic perspectives that need to be taken into account when comparing speech assessment and speech outcome obtained from cleft palate speakers of different languages. Firstly, an overview of consonants and vowels vulnerable to the cleft condition is presented. Then, consequences for assessment of cleft palate speech by native versus non-native speakers of a language are discussed, as well as the use of phonemic versus phonetic transcription in cross-linguistic studies. Specific recommendations for the construction of speech samples in cross-linguistic studies are given. Finally, the influence of different languages on some aspects of language acquisition in young children with cleft palate is presented and discussed. Until recently, not much has been written about cross-linguistic perspectives when dealing with cleft palate speech. Most literature about assessment…
Hausen, Maija; Torppa, Ritva; Salmela, Viljami R; Vainio, Martti; Särkämö, Teppo
Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).
Full Text Available Lying is a universal activity and the detection of lying a universal concern. Presently, there is great interest in determining objective measures of deception. The examination of speech, in particular, holds promise in this regard; yet, most of what we know about the relationship between speech and lying is based on the assessment of English-speaking participants. Few studies have examined indicators of deception in languages other than English. The world’s languages differ in significant ways, and cross-linguistic studies of deceptive communications are a research imperative. Here we review some of these differences amongst the world’s languages, and provide an overview of a number of recent studies demonstrating that cross-linguistic research is a worthwhile endeavour. In addition, we report the results of an empirical investigation of pitch, response latency, and speech rate as cues to deception in Italian speech. True and false opinions were elicited in an audio-taped interview. A within subjects analysis revealed no significant difference between the average pitch of the two conditions; however, speech rate was significantly slower, while response latency was longer, during deception compared with truth-telling. We explore the implications of these findings and propose directions for future research, with the aim of expanding the cross-linguistic branch of research on markers of deception.
Boesch, Miriam Chacon
The purpose of this comparative efficacy study was to investigate the Picture Exchange Communication System (PECS) and a speech-generating device (SGD) in developing requesting skills, social-communicative behavior, and speech for three elementary-age children with severe autism and little to no functional speech. Requesting was selected as the…
Tavakoli, Vincent Mohammad; Jensen, Jesper Rindom; Christensen, Mads Græsbøll
Speech enhancement is vital for improved listening practices. Ad hoc microphone arrays are promising assets for this purpose. Most well-established enhancement techniques with conventional arrays can be adapted to ad hoc scenarios. Despite recent efforts to introduce various ad hoc speech enhancement apparatus, a common framework for the integration of conventional methods into this new scheme is still missing. This paper establishes such an abstraction based on inter- and intra-sub-array speech coherencies. Along with measures for signal quality at the input of sub-arrays, a measure of coherency is proposed both for sub-array selection in local enhancement approaches and for selecting a proper global reference when more than one sub-array is used. Proposed methods within this framework are evaluated with regard to quantitative and qualitative measures, including array gains, the speech…
Moulin-Frier, Clément; Arbib, Michael A
The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
Arweiler, Iris; Dau, Torsten; Poulsen, Torben
Speech intelligibility depends on many factors such as room acoustics, the acoustical properties and location of the signal and the interferers, and the ability of the (normal and impaired) auditory system to process monaural and binaural sounds. In the present study, the effect of reverberation on spatial release from masking was investigated in normal-hearing and hearing-impaired listeners using three types of interferers: speech-shaped noise, an interfering female talker and speech-modulated noise. Speech reception thresholds (SRT) were obtained in three simulated environments: a listening room, a classroom and a church. The data from the study provide constraints for existing models of speech intelligibility prediction (based on the speech intelligibility index, SII, or the speech transmission index, STI) which have shortcomings when reverberation and/or fluctuating noise affect speech…
Galina M. Shipitsina
Full Text Available The article deals with the semantic, pragmatic and structural features of words, phrases and dialogue moves in contemporary Russian popular speech. These features are characterized by originality and unconventional use. The language material is the result of the authors' direct observation of spontaneous verbal communication between people of different social and age groups. The words and remarks were analyzed in relation to the communication system of the national Russian language and the cultural background of popular speech. The study found that spoken discourse offers additional ways to increase the expressiveness of an utterance. It is important to note that spontaneous speech reveals lacunae in the nominative system and vocabulary of the language. It is shown that prefixation is also an effective and regular way of presenting the same action. The most typical forms, ways and means of updating language resources through the linguistic creativity of native speakers were identified.
The welcoming speech underlines the fact that any validation process, starting with calculation methods and ending with studies on the long-term behaviour of a repository system, can only be effected through laboratory, field and natural-analogue studies. Natural analogues (NA) are used to secure the biosphere and to verify whether this safety really exists. (HP) [de
Kane, Peter E., Ed.
The seven articles in this collection deal with theoretical and practical freedom of speech issues. Topics covered are: the United States Supreme Court, motion picture censorship, and the color line; judicial decision making; the established scientific community's suppression of the ideas of Immanuel Velikovsky; the problems of avant-garde jazz,…
Full Text Available Learning a foreign language requires students to acquire both grammatical knowledge and socio-pragmatic rules of a language. Pragmatic competence, as one of the most difficult aspects of language, presents several challenges to L2 learners in the process of learning a foreign language. To overcome this problem, EFL teachers should find the most effective way of teaching pragmatic knowledge to their students. Accordingly, the present study investigated the effect of explicit teaching of the apology speech act, as an aspect of pragmatic competence, on Iranian EFL learners' appropriate use of the mentioned speech act. In so doing, a total of 73 EFL students at intermediate and advanced levels participated in a pretest-posttest design with experimental and control groups. Data were collected using a Discourse Completion Test (DCT). The selection of apologetic situations in the DCT was based on two variables: social status and social distance. The results revealed that explicit instruction was a facilitative tool that helped students use the proper apology strategies in different situations. Moreover, it was found that L2 proficiency had a significant influence on the overall appropriateness of speech act production. Keywords: Explicit instruction; Apology speech act; Pragmatic competence; Iranian EFL learners
Lee, Pyoung Jik; Jeon, Jin Yong
In the present study, the effects of interference from combined noises on speech transmission were investigated in a simulated open public space. Sound fields for dominant noises were predicted using a typical urban square model surrounded by buildings. Road traffic noise and two types of construction noise, corresponding to stationary and impulsive noises, were then selected as background noises. Listening tests were performed on a group of adults, and the quality of speech transmission was evaluated using listening difficulty as well as intelligibility scores. During the listening tests, two factors that affect speech transmission performance were considered: (1) the temporal characteristics of the construction noise (stationary or impulsive) and (2) the levels of the construction and road traffic noises. The results indicated that word intelligibility scores and listening difficulty ratings were affected by the temporal characteristics of the construction noise due to fluctuations in the background noise level. It was also observed that listening difficulty ratings, which showed larger variation than word intelligibility scores, could not adequately describe speech transmission in noisy open public spaces. © 2011 Acoustical Society of America
Lam, Choi Ling Coriolanus
of the reverberation time, the indoor ambient noise (or background noise level), the signal-to-noise ratio, and the speech transmission index, it aims to establish a guideline for improving speech intelligibility in classrooms for any country and any environmental conditions. The study showed that the acoustical conditions of most of the measured classrooms in Hong Kong are unsatisfactory. The selection of materials inside a classroom is important for improving speech intelligibility at the design stage, especially the acoustic ceiling, which shortens the reverberation time inside the classroom. The signal-to-noise ratio should be higher than 11 dB(A) for over 70% speech perception, for either tonal or non-tonal languages, without the use of a public address system. The unexpected results call for a revision of the standard design and for acceptable acoustic standards for classrooms in Hong Kong. A method is also demonstrated for assessing classrooms in other cities with similar environmental conditions.
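The 11 dB(A) signal-to-noise criterion quoted in this record reduces to a simple level comparison, since A-weighted levels in dB subtract directly. A minimal sketch (function names and the linear-power helper are illustrative, not from the study):

```python
import math

def snr_db(p_signal, p_noise):
    """SNR in dB from linear signal and noise powers."""
    return 10.0 * math.log10(p_signal / p_noise)

def meets_speech_criterion(l_speech_dba, l_noise_dba, margin=11.0):
    """Check the >11 dB(A) classroom SNR guideline quoted in the abstract:
    speech level minus background level must exceed the margin."""
    return (l_speech_dba - l_noise_dba) > margin
```

For example, a 65 dB(A) talker against a 50 dB(A) background meets the guideline, while a 60 dB(A) talker against 52 dB(A) does not.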
Christiner, Markus; Reiterer, Susanne M.
In previous research on speech imitation, musicality, and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer's sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and auditory memory with singing fitting better into the category of “speech” on the productive level and “music” on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. (1) Motor flexibility and the ability to sing improve language and musical function. (2) Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. (3) The ability to sing improves the memory span of the auditory working memory. PMID:24319438
Full Text Available The Nobel Peace Prize has long been considered the premier peace prize in the world. According to Geir Lundestad, Secretary of the Nobel Committee, of the 300 some peace prizes awarded worldwide, “none is in any way as well known and as highly respected as the Nobel Peace Prize” (Lundestad, 2001). Nobel peace speech is a unique and significant international site of public discourse committed to articulating the universal grammar of peace. Spanning over 100 years of sociopolitical history on the world stage, Nobel Peace Laureates richly represent an important cross-section of domestic and international issues increasingly germane to many publics. Communication scholars’ interest in this rhetorical genre has increased in the past decade. Yet, the norm has been to analyze a single speech artifact from a prestigious or controversial winner rather than examine the collection of speeches for generic commonalities of import. In this essay, we analyze the discourse of Nobel peace speech inductively and argue that the organizing principle of the Nobel peace speech genre is the repetitive form of normative liberal principles and values that function as rhetorical topoi. These topoi include freedom and justice and appeal to the inviolable, inborn right of human beings to exercise certain political and civil liberties and the expectation of equality of protection from totalitarian and tyrannical abuses. The significance of this essay to contemporary communication theory is to expand our theoretical understanding of rhetoric’s role in the maintenance and development of an international and cross-cultural vocabulary for the grammar of peace.
Full Text Available Adults who stutter (AWS) have demonstrated atypical coordination of motor and sensory regions during speech production. Yet little is known of the speech-motor network in AWS in the brief time window preceding audible speech onset. The purpose of the current study was to characterize neural oscillations in the speech-motor network during preparation for and execution of overt speech production in AWS using magnetoencephalography (MEG). Twelve AWS and twelve age-matched controls were presented with 220 words, each word embedded in a carrier phrase. Controls were presented with the same word list as their matched AWS participant. Neural oscillatory activity was localized using minimum-variance beamforming during two time periods of interest: speech preparation (prior to speech onset) and speech execution (following speech onset). Compared to controls, AWS showed stronger beta (15-25 Hz) suppression in the speech preparation stage, followed by stronger beta synchronization in the bilateral mouth motor cortex. AWS also recruited the right mouth motor cortex significantly earlier in the speech preparation stage compared to controls. Exaggerated motor preparation is discussed in the context of reduced coordination in the speech-motor network of AWS. It is further proposed that exaggerated beta synchronization may reflect a more strongly inhibited motor system that requires a stronger beta suppression to disengage prior to speech initiation. These novel findings highlight critical differences in the speech-motor network of AWS that occur prior to speech onset and emphasize the need to investigate further the speech-motor assembly in the stuttering population.
Bailly, G.; Theune, Mariet; Meijs, Koen; Campbell, N.; Hamza, W.; Heylen, Dirk K.J.; Ordelman, Roeland J.F.; Hoge, H.; Jianhua, T.
Work on expressive speech synthesis has long focused on the expression of basic emotions. In recent years, however, interest in other expressive styles has been increasing. The research presented in this paper aims at the generation of a storytelling speaking style, which is suitable for
Maas, Edwin; Robin, Donald A.; Wright, David L.; Ballard, Kirrie J.
Apraxia of Speech (AOS) is an impairment of motor programming. However, the exact nature of this deficit remains unclear. The present study examined motor programming in AOS in the context of a recent two-stage model [Klapp, S. T. (1995). Motor response programming during simple and choice reaction time: The role of practice. "Journal of…
Speech errors are generally said to accommodate to their new phonological context. This accommodation has been validated by several transcription studies. The transcription methodology is not the best choice for detecting errors at this level, however, as this type of error can be difficult to perceive. This paper presents an acoustic analysis of…
Petridis, Stavros; Pantic, Maja
Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audiovisual approach to distinguishing laughter from speech, and we show that integrating information from audio and video leads to improved reliability of the audiovisual approach in…
Baykaner, K.; Hummersone, H.; Mason, R.
A listening test was conducted to investigate the acceptability of audio-on-audio interference for radio programs featuring speech as the target. Twenty-one subjects, including naïve and expert listeners, were presented with 200 randomly assigned pairs of stimuli and asked to report, for each trial...
Wong, Patrick C. M.; Uppunda, Ajith K.; Parrish, Todd B.; Dhar, Sumitrajit
Purpose: The present study examines the brain basis of listening to spoken words in noise, which is a ubiquitous characteristic of communication, with the focus on the dorsal auditory pathway. Method: English-speaking young adults identified single words in 3 listening conditions while their hemodynamic response was measured using fMRI: speech in…
Hayre, Harb S.
Speech correlates of alcohol/drug impairment and their neurological basis are presented, with suggestions for further research on impairment from poly-drug/medicine/inhalant/chew use/abuse, and on the prediagnosis of many neuro- and endocrine-related disorders. Nerve cells all over the body detect chemical entry by smoking, injection, drinking, chewing, or skin absorption, and transmit neurosignals to their corresponding cerebral subsystems, which in turn affect the speech centers (Broca's and Wernicke's areas) and the motor cortex. For instance, gustatory cells in the mouth, cranial and spinal nerve cells in the skin, and cilia/olfactory neurons in the nose are the intake-sensing nerve cells. Alcohol depression and brain-cell damage were detected from telephone speech using IMPAIRLYZER-TM, and the results of these studies were presented at the 1996 ASA meeting in Indianapolis and the 2001 German Acoustical Society (DEGA) conference in Hamburg, Germany, respectively. Speech-based chemical impairment measure results were presented at the 2001 meeting of the ASA in Chicago. New data on neurotolerance-based chemical impairment for alcohol, drugs, and medicine shall be presented, and shown not to fully support the NIDA-SAMHSA drug and alcohol thresholds used in the drug-testing domain.
Henrichsen, Peter Juel; Christiansen, Thomas Ulrich
In many branches of spoken language analysis including ASR, the set of smallest meaningful units of speech is taken to coincide with the set of phones or phonemes. However, fishing for phones is difficult, error-prone, and computationally expensive. We present an experiment, based on machine...
Jørgensen, Søren; Ewert, Stephan D.; Dau, Torsten
The speech-based envelope power spectrum model (sEPSM) presented by Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] estimates the envelope power signal-to-noise ratio (SNRenv) after modulation-frequency selective processing. Changes in this metric were shown to account well … to conditions with stationary interferers, due to the long-term integration of the envelope power, and cannot account for the increased intelligibility typically obtained with fluctuating maskers. Here, a multi-resolution version of the sEPSM is presented where the SNRenv is estimated in temporal segments with a modulation-filter dependent duration. The multi-resolution sEPSM is demonstrated to account for intelligibility obtained in conditions with stationary and fluctuating interferers, and noisy speech distorted by reverberation or spectral subtraction. The results support the hypothesis that the SNRenv …
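The SNRenv idea underlying the sEPSM can be sketched in a few lines: compare the normalised AC power of the envelope of the noisy speech with that of the noise alone. This is a deliberately simplified illustration, not the published model: the real sEPSM uses a Hilbert envelope and a modulation filterbank, whereas here a moving-average envelope and a single broadband comparison stand in (all function names are invented for the sketch):

```python
import math

def envelope(x, win=32):
    """Crude amplitude envelope: trailing moving average of |x|.
    (A stand-in for a Hilbert envelope; `win` is an arbitrary choice.)"""
    out = []
    for i in range(len(x)):
        start = max(0, i - win + 1)
        seg = [abs(v) for v in x[start:i + 1]]
        out.append(sum(seg) / len(seg))
    return out

def env_power(env):
    """Normalised AC power of an envelope: variance over squared mean."""
    m = sum(env) / len(env)
    return sum((v - m) ** 2 for v in env) / len(env) / (m * m)

def snr_env_db(noisy, noise):
    """Envelope-power SNR in dB: excess envelope power of the noisy
    speech over that of the noise alone (floored to keep the log defined)."""
    p_mix = env_power(envelope(noisy))
    p_noise = env_power(envelope(noise))
    ratio = max((p_mix - p_noise) / p_noise, 1e-6)
    return 10.0 * math.log10(ratio)
```

With a strongly amplitude-modulated "speech" signal and a steady tone as "noise", the metric is large and positive, while comparing the noise against itself bottoms out at the floor; the multi-resolution extension described in the abstract would compute this per temporal segment instead of over the whole signal.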
According to the IAEA Code of Practice on the subject and also to numerous national standards, effective quality assurance (QA) for safety in nuclear power plants depends upon the application of a number of fundamental principles. One of these principles is that QA for systems, components and structures should be commensurate with the individual importance to safety of each item. Evidently, money spent on excessive QA may be partly or wholly wasted, while too little QA will provide insufficient confidence that an item will perform satisfactorily in service. To deal successfully with the requirement of 'importance to safety', a detailed methodology must be established, by means of which QA can be prescribed rationally and consistently. Set in the context of the Canadian nuclear power and nuclear standards programmes, two related methodologies which account for importance to safety as well as for some other specific factors have been developed and are in use. These related methodologies are applied to the manufacture and installation of safety-related items, and are based on the implementation of the fixed-step, graded standards of the Canadian Standards Association, CSA Z299. Information is presented on the main features of the methodologies and on Canadian nuclear power plant QA practice in general. (author)
Yunusova, Yana; Wang, Jun; Zinman, Lorne; Pattee, Gary L.; Berry, James D.; Perry, Bridget; Green, Jordan R.
Purpose: To determine the mechanisms of speech intelligibility impairment due to neurologic impairments, intelligibility decline was modeled as a function of co-occurring changes in the articulatory, resonatory, phonatory, and respiratory subsystems. Method: Sixty-six individuals diagnosed with amyotrophic lateral sclerosis (ALS) were studied longitudinally. The disease-related changes in articulatory, resonatory, phonatory, and respiratory subsystems were quantified using multiple instrumental measures, which were subjected to a principal component analysis and mixed effects models to derive a set of speech subsystem predictors. A stepwise approach was used to select the best set of subsystem predictors to model the overall decline in intelligibility. Results: Intelligibility was modeled as a function of five predictors that corresponded to velocities of lip and jaw movements (articulatory), number of syllable repetitions in the alternating motion rate task (articulatory), nasal airflow (resonatory), maximum fundamental frequency (phonatory), and speech pauses (respiratory). The model accounted for 95.6% of the variance in intelligibility, among which the articulatory predictors showed the most substantial independent contribution (57.7%). Conclusion: Articulatory impairments characterized by reduced velocities of lip and jaw movements and resonatory impairments characterized by increased nasal airflow served as the subsystem predictors of the longitudinal decline of speech intelligibility in ALS. Declines in maximum performance tasks such as the alternating motion rate preceded declines in intelligibility, thus serving as early predictors of bulbar dysfunction. Following the rapid decline in speech intelligibility, a precipitous decline in maximum performance tasks subsequently occurred. PMID:27148967
Hollien, Harry; Dejong, Gea; Martin, Camilo A.; Schwartz, Reva; Liljegren, Kristen
The effects of ingesting ethanol have been shown to be somewhat variable in humans. To date, there appear to be but few universals. Yet, the question often arises: is it possible to determine if a person is intoxicated by observing them in some manner? A closely related question is: can speech be used for this purpose and, if so, can the degree of intoxication be determined? One of the many issues associated with these questions involves the relationships between a person's paralinguistic characteristics and the presence and level of inebriation. To this end, young, healthy speakers of both sexes were carefully selected and sorted into roughly equal groups of light, moderate, and heavy drinkers. They were asked to produce four types of utterances during a learning phase, when sober and at four strictly controlled levels of intoxication (three ascending and one descending). The primary motor speech measures employed were speaking fundamental frequency, speech intensity, speaking rate and nonfluencies. Several statistically significant changes were found for increasing intoxication; the primary ones included rises in F0, in task duration and for nonfluencies. Minor gender differences were found but they lacked statistical significance. So did the small differences among the drinking category subgroups and the subject groupings related to levels of perceived intoxication. Finally, although it may be concluded that certain changes in speech suprasegmentals will occur as a function of increasing intoxication, these patterns cannot be viewed as universal since a few subjects (about 20%) exhibited no (or negative) changes.
Full Text Available This experimental phonetic research deals with the prosody of directive speech in Javanese. The research procedures were: (1) speech production, (2) acoustic analysis, and (3) perception test. The data investigated are three directive utterances, in the form of statements, commands, and questions. The data were obtained by recording dialogues that present polite as well as impolite speech. Three acoustic experiments were conducted for statements, commands, and questions in directive speech: (1) modifications of duration, (2) modifications of contour, and (3) modifications of fundamental frequency. The results of the subsequent perception tests of 90 stimuli with 24 subjects were analysed statistically with ANOVA (Analysis of Variance). Based on this statistical analysis, the prosodic characteristics of polite and impolite speech were identified.
Chakroff, Aleksandr; Thomas, Kyle A; Haque, Omar S; Young, Liane
People often use indirect speech, for example, when trying to bribe a police officer by asking whether there might be "a way to take care of things without all the paperwork." Recent game theoretic accounts suggest that a speaker uses indirect speech to reduce public accountability for socially risky behaviors. The present studies examine a secondary function of indirect speech use: increasing the perceived moral permissibility of an action. Participants report that indirect speech is associated with reduced accountability for unethical behavior, as well as increased moral permissibility and increased likelihood of unethical behavior. Importantly, moral permissibility was a stronger mediator of the effect of indirect speech on likelihood of action, for judgments of one's own versus others' unethical action. In sum, the motorist who bribes the police officer with winks and nudges may not only avoid public punishment but also maintain the sense that his actions are morally permissible. Copyright © 2014 Cognitive Science Society, Inc.
Schill, Melissa T.; And Others
Assesses protocol for conducting a functional analysis of maintaining variables for children with selective mutism. A parent was trained in and later applied various behavior strategies designed to increase speech in an eight-year-old girl with selective mutism. Parent and child ratings of treatment were positive. Presents implications for future…
Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta
Human utterances demonstrate temporal patterning, also referred to as rhythm. While simple oromotor behaviors (e.g., chewing) feature a salient periodical structure, conversational speech displays a time-varying quasi-rhythmic pattern. Quantification of periodicity in speech is challenging. Unimodal spectral approaches have highlighted rhythmic aspects of speech. However, speech is a complex multimodal phenomenon that arises from the interplay of articulatory, respiratory, and vocal systems. The present study addressed the question of whether a multimodal spectral approach, in the form of coherence analysis between electromyographic (EMG) and acoustic signals, would allow one to characterize rhythm in natural speech more efficiently than a unimodal analysis. The main experimental task consisted of speech production at three speaking rates; a simple oromotor task served as control. The EMG-acoustic coherence emerged as a sensitive means of tracking speech rhythm, whereas spectral analysis of either EMG or acoustic amplitude envelope alone was less informative. Coherence metrics seem to distinguish and highlight rhythmic structure in natural speech.
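The EMG-acoustic coherence analysis described in this record can be illustrated with a minimal Welch-style magnitude-squared coherence estimate: average cross- and auto-spectra over segments, then normalise. This is a toy stand-in under stated assumptions (single-bin DFT, non-overlapping rectangular segments; the study's actual EMG and acoustic preprocessing is not reproduced here):

```python
import cmath
import math

def dft_bin(x, k):
    """Single DFT bin k of a segment x."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))

def msc(x, y, k, seg_len):
    """Magnitude-squared coherence of x and y at DFT bin k:
    |averaged cross-spectrum|^2 over the product of averaged auto-spectra,
    accumulated across non-overlapping segments (Welch-style)."""
    sxx = syy = 0.0
    sxy = 0j
    for s in range(0, len(x) - seg_len + 1, seg_len):
        fx = dft_bin(x[s:s + seg_len], k)
        fy = dft_bin(y[s:s + seg_len], k)
        sxx += abs(fx) ** 2
        syy += abs(fy) ** 2
        sxy += fx * fy.conjugate()
    return abs(sxy) ** 2 / (sxx * syy)
```

Two signals sharing a common rhythmic component (say, a 4-cycles-per-segment modulation present in both an EMG-like and an acoustic-envelope-like trace) show coherence near 1 at that bin, while components whose relative phase drifts across segments yield low coherence, which is exactly why coherence highlights a shared speech rhythm better than a unimodal spectrum.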
Claudia Regina Furquim de Andrade
Full Text Available CONTEXT: The speech rate is one of the parameters considered when investigating speech fluency and is an important variable in the assessment of individuals with communication complaints. OBJECTIVE: To correlate the stuttering severity index with one of the indices used for assessing fluency/speech rate. DESIGN: Cross-sectional study. SETTING: Fluency and Fluency Disorders Investigation Laboratory, Faculdade de Medicina da Universidade de São Paulo. PARTICIPANTS: Seventy adults with stuttering diagnosis. MAIN MEASUREMENTS: A speech sample from each participant containing at least 200 fluent syllables was videotaped and analyzed according to a stuttering severity index test and speech rate parameters. RESULTS: The results obtained in this study indicate that the stuttering severity and the speech rate present significant variation, i.e., the more severe the stuttering is, the lower the speech rate in words and syllables per minute. DISCUSSION AND CONCLUSION: The results suggest that speech rate is an important indicator of fluency levels and should be incorporated in the assessment and treatment of stuttering. This study represents a first attempt to identify the possible subtypes of developmental stuttering. DEFINITION: Objective tests that quantify diseases are important in their diagnosis, treatment and prognosis.
Mitterer, Holger; Mattys, Sven L
Two experiments investigated the conditions under which cognitive load exerts an effect on the acuity of speech perception. These experiments extend earlier research by using a different speech perception task (four-interval oddity task) and by implementing cognitive load through a task often thought to be modular, namely, face processing. In the cognitive-load conditions, participants were required to remember two faces presented before the speech stimuli. In Experiment 1, performance in the speech-perception task under cognitive load was not impaired in comparison to a no-load baseline condition. In Experiment 2, we modified the load condition minimally such that it required encoding of the two faces simultaneously with the speech stimuli. As a reference condition, we also used a visual search task that in earlier experiments had led to poorer speech perception. Both concurrent tasks led to decrements in the speech task. The results suggest that speech perception is affected even by loads thought to be processed modularly, and that, critically, encoding in working memory might be the locus of interference.
Athaudage, Chandranath R. N.; Bradley, Alan B.; Lech, Margaret
A dynamic programming-based optimization strategy for a temporal decomposition (TD) model of speech and its application to low-rate speech coding in storage and broadcasting is presented. In previous work with the spectral stability-based event localizing (SBEL) TD algorithm, the event localization was performed based on a spectral stability criterion. Although this approach gave reasonably good results, there was no assurance on the optimality of the event locations. In the present work, we have optimized the event localizing task using a dynamic programming-based optimization strategy. Simulation results show that an improved TD model accuracy can be achieved. A methodology of incorporating the optimized TD algorithm within the standard MELP speech coder for the efficient compression of speech spectral information is also presented. The performance evaluation results revealed that the proposed speech coding scheme achieves 50%-60% compression of speech spectral information with negligible degradation in the decoded speech quality.
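As a rough analogue of the dynamic-programming optimization of event locations described above, here is a generic optimal-segmentation DP over a 1-D feature sequence. It is illustrative only: the paper's cost is defined on spectral parameters of the temporal decomposition model, whereas this sketch minimises within-segment squared error, and all names are invented:

```python
def seg_cost(x, i, j):
    """Within-segment squared error of x[i:j] around its mean."""
    seg = x[i:j]
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)

def dp_segment(x, k):
    """Split sequence x into k contiguous segments minimising total
    within-segment squared error, via dynamic programming.
    Returns (cost, boundaries), boundaries being segment start indices."""
    n = len(x)
    INF = float("inf")
    cost = [[INF] * (k + 1) for _ in range(n + 1)]
    back = [[0] * (k + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for j in range(1, n + 1):
        for s in range(1, min(k, j) + 1):
            for i in range(s - 1, j):
                c = cost[i][s - 1] + seg_cost(x, i, j)
                if c < cost[j][s]:
                    cost[j][s] = c
                    back[j][s] = i
    # Trace back the optimal segment boundaries
    bounds, j = [], n
    for s in range(k, 0, -1):
        j = back[j][s]
        bounds.append(j)
    return cost[n][k], sorted(bounds)
```

Unlike a greedy, stability-criterion placement of events, the DP guarantees globally optimal boundaries for the chosen cost, which mirrors the paper's motivation for replacing the SBEL heuristic.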
Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Rönnberg, Jerker; Rudner, Mary
Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity (WMC). Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.
Kushnerenko, Elena; Tomalski, Przemyslaw; Ballieux, Haiko; Ribeiro, Helena; Potton, Anita; Axelsson, Emma L; Murphy, Elizabeth; Moore, Derek G
Research on audiovisual speech integration has reported high levels of individual variability, especially among young infants. In the present study we tested the hypothesis that this variability results from individual differences in the maturation of audiovisual speech processing during infancy. A developmental shift in selective attention to audiovisual speech has been demonstrated between 6 and 9 months with an increase in the time spent looking to articulating mouths as compared to eyes (Lewkowicz & Hansen-Tift. (2012) Proc. Natl Acad. Sci. USA, 109, 1431-1436; Tomalski et al. (2012) Eur. J. Dev. Psychol., 1-14). In the present study we tested whether these changes in behavioural maturational level are associated with differences in brain responses to audiovisual speech across this age range. We measured high-density event-related potentials (ERPs) in response to videos of audiovisually matching and mismatched syllables /ba/ and /ga/, and subsequently examined visual scanning of the same stimuli with eye-tracking. There were no clear age-specific changes in ERPs, but the amplitude of audiovisual mismatch response (AVMMR) to the combination of visual /ba/ and auditory /ga/ was strongly negatively associated with looking time to the mouth in the same condition. These results have significant implications for our understanding of individual differences in neural signatures of audiovisual speech processing in infants, suggesting that they are not strictly related to chronological age but instead associated with the maturation of looking behaviour, and develop at individual rates in the second half of the first year of life. © 2013 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Molinaro, Nicola; Lizarazu, Mikel; Lallier, Marie; Bourguignon, Mathieu; Carreiras, Manuel
Developmental dyslexia is a reading disorder often characterized by reduced awareness of speech units. Whether the neural source of this phonological disorder in dyslexic readers results from the malfunctioning of the primary auditory system or damaged feedback communication between higher-order phonological regions (i.e., left inferior frontal regions) and the auditory cortex is still under dispute. Here we recorded magnetoencephalographic (MEG) signals from 20 dyslexic readers and 20 age-matched controls while they were listening to ∼10-s-long spoken sentences. Compared to controls, dyslexic readers had (1) an impaired neural entrainment to speech in the delta band (0.5-1 Hz); (2) a reduced delta synchronization in both the right auditory cortex and the left inferior frontal gyrus; and (3) an impaired feedforward functional coupling between neural oscillations in the right auditory cortex and the left inferior frontal regions. This shows that during speech listening, individuals with developmental dyslexia present reduced neural synchrony to low-frequency speech oscillations in primary auditory regions that hinders higher-order speech processing steps. The present findings, thus, strengthen proposals assuming that improper low-frequency acoustic entrainment affects speech sampling. This low speech-brain synchronization has the strong potential to cause severe consequences for both phonological and reading skills. Interestingly, the reduced speech-brain synchronization in dyslexic readers compared to normal readers (and its higher-order consequences across the speech processing network) appears preserved through the development from childhood to adulthood. Thus, the evaluation of speech-brain synchronization could possibly serve as a diagnostic tool for early detection of children at risk of dyslexia. Hum Brain Mapp 37:2767-2783, 2016. © 2016 Wiley Periodicals, Inc.
VAN HATTUM, ROLLAND J.; AND OTHERS
Designed to strengthen the skills, competencies, and knowledge of speech correction teachers, this summary of a special study institute contains a series of presentations. Speakers discuss aspects of cleft palate including speech, speech anatomy, surgical and dental management, diagnosis, and speech therapy. Speakers represent medical and…
van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.
Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…
Gallardo, L.F.; Möller, S.; Beerends, J.
The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility
Faizel Amri, Umar; Nur Wahidah Nik Hashim, Nik; Hazrin Hany Mohamad Hanif, Noor
In the department of engineering, students are required to fulfil at least 80 percent of class attendance. The conventional method requires a student to sign his/her initials on the attendance sheet. However, this method is prone to cheating, with one student signing for an absent classmate. We develop our hypothesis according to a verse in the Holy Qur’an (95:4), “We have created men in the best of mould”. Based on the verse, we believe each psychological characteristic of a human being is unique, and thus their speech characteristics should be unique. In this paper we present the development of a speech biometric-based attendance system. The system requires the user’s voice to be installed in the system as training data, which is saved for registration of the user. A subsequent recording of the user’s voice serves as the test data to be verified against the stored training data. The system uses PSD (Power Spectral Density) and Transition Parameter as the methods for feature extraction of the voices. Euclidean and Mahalanobis distances are used to verify the user’s voice. For this research, ten subjects (five female, five male) were chosen to test the performance of the system. The system performance, in terms of recognition rate, was found to be 60% correct identification of individuals.
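The distance-based verification step can be sketched as follows. This is a minimal illustration of Mahalanobis-distance speaker verification under assumed Gaussian enrollment statistics, not the paper's system; the feature vectors, regularization term, and threshold are all hypothetical.

```python
import numpy as np

def enroll(train_vecs):
    """Build a speaker model (mean vector, covariance matrix) from
    enrollment feature vectors, e.g. PSD-derived features. A small
    diagonal term keeps the covariance invertible."""
    X = np.asarray(train_vecs, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mean, cov

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of a test vector from the enrolled model."""
    d = np.asarray(x, dtype=float) - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def verify(test_vec, mean, cov, threshold=3.0):
    """Accept the claimed identity when the test vector lies within
    'threshold' Mahalanobis units of the enrolled model."""
    return mahalanobis(test_vec, mean, cov) < threshold
```

Euclidean distance is the special case where the covariance is the identity matrix; the Mahalanobis form additionally discounts directions along which the speaker's own features vary widely.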
Gross, Joachim; Hoogenboom, Nienke; Thut, Gregor; Schyns, Philippe; Panzeri, Stefano; Belin, Pascal; Garrod, Simon
Cortical oscillations are likely candidates for segmentation and coding of continuous speech. Here, we monitored continuous speech processing with magnetoencephalography (MEG) to unravel the principles of speech segmentation and coding. We demonstrate that speech entrains the phase of low-frequency (delta, theta) and the amplitude of high-frequency (gamma) oscillations in the auditory cortex. Phase entrainment is stronger in the right and amplitude entrainment is stronger in the left auditory cortex. Furthermore, edges in the speech envelope phase reset auditory cortex oscillations thereby enhancing their entrainment to speech. This mechanism adapts to the changing physical features of the speech envelope and enables efficient, stimulus-specific speech sampling. Finally, we show that within the auditory cortex, coupling between delta, theta, and gamma oscillations increases following speech edges. Importantly, all couplings (i.e., brain-speech and also within the cortex) attenuate for backward-presented speech, suggesting top-down control. We conclude that segmentation and coding of speech relies on a nested hierarchy of entrained cortical oscillations. PMID:24391472
Hengst, Julie A; Frame, Simone R; Neuman-Stritzel, Tiffany; Gannaway, Rachel
Reported speech, wherein one quotes or paraphrases the speech of another, has been studied extensively as a set of linguistic and discourse practices. Researchers agree that reported speech is pervasive, found across languages, and used in diverse contexts. However, to date, there have been no studies of the use of reported speech among individuals with aphasia. Grounded in an interactional sociolinguistic perspective, the study presented here documents and analyzes the use of reported speech by 7 adults with mild to moderately severe aphasia and their routine communication partners. Each of the 7 pairs was videotaped in 4 everyday activities at home or around the community, yielding over 27 hr of conversational interaction for analysis. A coding scheme was developed that identified 5 types of explicitly marked reported speech: direct, indirect, projected, indexed, and undecided. Analysis of the data documented reported speech as a common discourse practice used successfully by the individuals with aphasia and their communication partners. All participants produced reported speech at least once, and across all observations the target pairs produced 400 reported speech episodes (RSEs), 149 by individuals with aphasia and 251 by their communication partners. For all participants, direct and indirect forms were the most prevalent (70% of RSEs). Situated discourse analysis of specific episodes of reported speech used by 3 of the pairs provides detailed portraits of the diverse interactional, referential, social, and discourse functions of reported speech and explores ways that the pairs used reported speech to successfully frame talk despite their ongoing management of aphasia.
Full Text Available This study explored how political elites can contribute to power enactment through using language. It started with a theoretical overview of Critical Discourse Analysis (CDA), and then presented a corpus consisting of speeches of eight political elites, namely, Malcolm X, Noam Chomsky, Martin Luther King, Josef Stalin, Vladimir Lenin, Winston Churchill, J.F. Kennedy and Adolph Hitler. This study analyzed the speeches in terms of figures of speech, and interpreted them from the point of view of CDA using the framework introduced by Fairclough (1989), a three-dimensional approach to the study of discourse (Description, Interpretation, Explanation), and van Dijk (2004), the theory of critical context analysis. Speech figures are classified in this study into six main categories: Comparison, Grammar, Meaning, Parenthesis, Repetition and Rhetoric. The results of the analyses reveal that while there are differences in the type and degree of speech figures employed by the selected political elites, one striking pattern is common to all speeches: the frequent use of figures of Grammar, Repetition and Rhetoric
Guddattu, Vasudeva; Krishna, Y.
The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…
Speech-Language Therapy (KidsHealth / For Parents) ... most kids with speech and/or language disorders. Speech Disorders, Language Disorders, and Feeding Disorders ... A speech ...
Zhao, Wanying; Riggs, Kevin; Schindler, Igor; Holle, Henning
Language and action naturally occur together in the form of cospeech gestures, and there is now convincing evidence that listeners display a strong tendency to integrate semantic information from both domains during comprehension. A contentious question, however, has been which brain areas are causally involved in this integration process. In previous neuroimaging studies, left inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG) have emerged as candidate areas; however, it is currently not clear whether these areas are causally or merely epiphenomenally involved in gesture-speech integration. In the present series of experiments, we directly tested for a potential critical role of IFG and pMTG by observing the effect of disrupting activity in these areas using transcranial magnetic stimulation in a mixed gender sample of healthy human volunteers. The outcome measure was performance on a Stroop-like gesture task (Kelly et al., 2010a), which provides a behavioral index of gesture-speech integration. Our results provide clear evidence that disrupting activity in IFG and pMTG selectively impairs gesture-speech integration, suggesting that both areas are causally involved in the process. These findings are consistent with the idea that these areas play a joint role in gesture-speech integration, with IFG regulating strategic semantic access via top-down signals acting upon temporal storage areas. SIGNIFICANCE STATEMENT Previous neuroimaging studies suggest an involvement of inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech integration, but findings have been mixed and due to methodological constraints did not allow inferences of causality. By adopting a virtual lesion approach involving transcranial magnetic stimulation, the present study provides clear evidence that both areas are causally involved in combining semantic information arising from gesture and speech. These findings support the view that, rather than being
WEI Jianqiang; DU Limin; YAN Zhaoli; ZENG Hui
In this paper, a Kalman filter-based speech enhancement algorithm with some improvements over previous work is presented. A new technique based on spectral subtraction is used for separating speech and noise characteristics from noisy speech and for computing speech and noise autoregressive (AR) parameters. In order to obtain Kalman filter output with high audible quality, a perceptual post-filter is placed at the output of the Kalman filter to smooth the enhanced speech spectra. Extensive experiments indicate that this newly proposed method works well.
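The spectral-subtraction step used to separate speech and noise characteristics can be sketched as below. This is the textbook magnitude-domain version operating on STFT frames, not the paper's exact technique; the flooring constant is an assumed tuning parameter that limits "musical noise" artifacts.

```python
import numpy as np

def spectral_subtract(noisy_spec, noise_mag, floor=0.05):
    """Subtract an estimated noise magnitude spectrum from a noisy STFT
    frame, keeping the noisy phase. Magnitudes that would go negative
    are floored to a fraction of the noisy magnitude."""
    mag = np.abs(noisy_spec)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    # Recombine the cleaned magnitude with the original phase.
    return clean_mag * np.exp(1j * np.angle(noisy_spec))
```

In a Kalman-filter pipeline of the kind the abstract describes, the cleaned frames would then feed AR-parameter estimation for the speech and noise models; that step is omitted here.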
Full Text Available In this paper, a new subspace speech enhancement method using low-rank and sparse decomposition is presented. In the proposed method, we first structure the corrupted data as a Toeplitz matrix and estimate its effective rank for the underlying human speech signal. Then the low-rank and sparse decomposition is performed with the guidance of the speech rank value to remove the noise. Extensive experiments have been carried out in white Gaussian noise conditions, and the experimental results show that the proposed method performs better than conventional speech enhancement methods, yielding less residual noise and lower speech distortion.
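The two structural ingredients of such subspace methods, arranging a signal frame as a Toeplitz-style matrix and projecting it onto a low-rank speech subspace, can be sketched as follows. This uses a plain SVD truncation as a stand-in for the paper's guided low-rank and sparse decomposition; the row count and rank value are assumed inputs.

```python
import numpy as np

def toeplitz_frame(x, n_rows):
    """Stack overlapping shifts of a 1-D signal frame into a
    Toeplitz-structured matrix, as in subspace enhancement methods."""
    n_cols = len(x) - n_rows + 1
    return np.array([x[i:i + n_cols] for i in range(n_rows)])

def low_rank_project(M, rank):
    """Retain only the top-'rank' singular components (the estimated
    speech subspace); the discarded components are treated as noise."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s[rank:] = 0.0
    return (U * s) @ Vt
```

The "effective rank" the abstract mentions would be estimated from the singular-value spectrum (e.g. where it flattens into the noise floor) before calling `low_rank_project`.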
Igor V. Nefedov
Full Text Available National radio, like television, is called upon to bring to the masses not only relevant information but also a high culture of language. Oral public speech has always faced serious demands regarding correctness and uniformity of pronunciation. Today, however, analysis of broadcast language practice often reveals a discrepancy between the linguistic resources actually used and existing literary norms. From late December 2016 to early April 2017, the author listened to the majority of programs on the radio station Komsomolskaya Pravda (KP) and analyzed them for linguistic correctness. While generally acknowledging the good speech qualifications of this station's staff, as well as of their «guests» (political scientists, lawyers, historians, etc.), one cannot help but note a significant number of errors in their speech. The material presented in the article supports the conclusion that broadcasting is currently losing ground in the field of speech culture. Neglect of the rules of the Russian language on the radio «Komsomolskaya Pravda» negatively affects the image of the Russian language that is formed in the minds of listeners. The language of radio should strive to become a standard of purity and high culture for the population, since it has enormous power of mass impact and supports the unity of the cultural and linguistic space.
Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A
The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise. In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg
Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole
Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.
Brodtmann, Amy; Pemberton, Hugh; Darby, David; Vogel, Adam P
Apraxia of speech (AOS) can be the presenting symptom of neurodegenerative disease. The position of primary progressive AOS in the nosology of the dementias is still controversial. Despite seeing many specialists, patients are often misdiagnosed, in part due to a lack of quantitative measures of speech dysfunction. We present a single case report of a patient presenting with AOS, including acoustic analysis, language assessment, and brain imaging. A 52-year-old woman presenting with AOS had remained undiagnosed for 6 years despite seeing 8 specialists. Results of her MRI scans, genetic testing, and computerized speech analysis are provided. AOS is an underdiagnosed clinical syndrome causing great distress to patients and families. Using acoustic analysis of speech may lead to improved diagnostic accuracy. AOS is a complex entity with an expanding phenotype, and quantitative clinical measures will be critical for detection and to assess progression.
Petridis, Stavros; Li, Zuwei; Pantic, Maja
Traditional visual speech recognition systems consist of two stages, feature extraction and classification. Recently, several deep learning approaches have been presented which automatically extract features from the mouth images and aim to replace the feature extraction stage. However, research on
Telage, Kal; Fucci, Donald
Presented is a rationale for using vibrotactile stimulation (vibration applied to oral structures) as a diagnostic instrument to assess oral-sensory deficits which may contribute to defective speech patterns. (Author/LS)
Zhang, Wei; Zhang, Xueying; Sun, Ying
Selecting an appropriate recognition method is crucial in speech emotion recognition applications. However, the current methods do not consider the relationship between emotions. Thus, in this study, a speech emotion recognition system based on the fuzzy cognitive map (FCM) approach is constructed. Moreover, a new FCM learning algorithm for speech emotion recognition is proposed. This algorithm includes the use of the pleasure-arousal-dominance emotion scale to calculate the weights between e...
Sheffert, Sonya M; Olson, Elizabeth
In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.
Strand, Edythe A; Duffy, Joseph R; Clark, Heather M; Josephs, Keith
The purpose of this report is to describe an initial version of the Apraxia of Speech Rating Scale (ASRS), a scale designed to quantify the presence or absence, relative frequency, and severity of characteristics frequently associated with apraxia of speech (AOS). In this paper we report intra-judge and inter-judge reliability, as well as indices of validity, for the ASRS, which was completed for 133 adult participants with a neurodegenerative speech or language disorder, 56 of whom had AOS. The overall inter-judge ICC among three clinicians was 0.94 for the total ASRS score and 0.91 for the number of AOS characteristics identified as present. Intra-judge ICC measures were high, ranging from 0.91 to 0.98. Validity was demonstrated on the basis of strong correlations with independent clinical diagnosis, as well as strong correlations of ASRS scores with independent clinical judgments of AOS severity. Results suggest that the ASRS is a potentially useful tool for documenting the presence and severity of characteristics of AOS. At this point in its development it has good potential for broader clinical use and for better subject description in AOS research. Learning outcomes: (1) the reader will be able to explain characteristics of apraxia of speech; (2) the reader will be able to demonstrate use of a rating scale to document the presence and severity of speech characteristics; (3) the reader will be able to explain the reliability and validity of the ASRS. Copyright © 2014 Elsevier Inc. All rights reserved.
Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px, frame rates (30, 20, 10, 7, 5 frames per second (fps, speech velocities (three different speakers, webcameras (Logitech Pro9000, C600 and C500 and image/sound delays (0-500 ms. All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps, higher camera resolution (>640 × 480 px and shorter picture/sound delay (<100 ms were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009 in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11 showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032. CONCLUSION: Webcameras have the potential to improve telecommunication of hearing-impaired individuals.
Clausen, Marit Carolin; Fox-Boyer, Anette
... in selecting the right intervention approach to resolve the SSD. Different quantitative and qualitative measurements are currently used to subgroup children with SSD. A quantitative method of classifying children is by the accuracy of their productions: the severity of children’s SSD is classified by calculating the percentage of correctly produced consonants (percentage consonants correct, PCC-A) (Shriberg et al., 1997). Alternatively, a qualitative approach seeks to ascertain which types of phonological processes, developmental or idiosyncratic, are present in children’s speech. Clinical decision-making about the need for intervention should not be based on the quantitative approach only; a qualitative classification approach is needed for a distinct subgrouping of children with SSD, whereas PCC-A can be used as additional information about the severity of the SSD. Keywords: speech ...
Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading. © 2015 American Society of Law, Medicine & Ethics, Inc.
Chang Woon Nam
This study compares incentive effects of various tax depreciation methods which are currently employed in selected OECD countries. Their generosity is determined on the basis of Samuelson's true economic depreciation. For this purpose, the present value model is applied. The central issue is that the so-called historical cost accounting method, which is adopted in practice when calculating the corporate tax base, causes fictitious profits in inflationary phases that should also be taxed. Th...
Children with cochlear implants show deficits in identifying emotional intent of utterances without facial or body language cues. A known limitation to cochlear implants is the inability to accurately portray the fundamental frequency contour of speech which carries the majority of information needed to identify emotional intent. Without reliable access to the fundamental frequency, other methods of identifying vocal emotion, if identifiable, could be used to guide therapies for training children with cochlear implants to better identify vocal emotion. The current study analyzed recordings of adults speaking neutral sentences with a set array of emotions in a child-directed and adult-directed manner. The goal was to identify acoustic cues that contribute to emotion identification that may be enhanced in child-directed speech, but are also present in adult-directed speech. Results of this study showed that there were significant differences in the variation of the fundamental frequency, the variation of intensity, and the rate of speech among emotions and between intended audiences.
Juste, Fabiola Staróbole; Andrade, Claudia Regina Furquim de
To characterize the speech fluency profile of patients with Parkinson's disease. Study participants were 40 individuals of both genders aged 40 to 80 years, divided into two groups: Research Group (RG; 20 individuals with a diagnosis of Parkinson's disease) and Control Group (CG; 20 individuals with no communication or neurological disorders). For all participants, three speech samples involving different tasks were collected: monologue, individual reading, and automatic speech. The RG presented a significantly larger number of speech disruptions, both stuttering-like and typical dysfluencies, and a higher percentage of speech discontinuity in the monologue and individual reading tasks compared with the CG. Both groups presented a reduced number of speech disruptions (stuttering-like and typical dysfluencies) in the automatic speech task, in which the groups performed similarly. Regarding speech rate, individuals in the RG produced fewer words and syllables per minute than those in the CG in all speech tasks. Participants in the RG thus presented altered parameters of speech fluency compared with the CG; however, this change in fluency cannot be considered a stuttering disorder.
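The rate and discontinuity measures reported above reduce to simple ratios over a timed speech sample. A hedged sketch, using assumed operational definitions (the study's exact formulas may differ, e.g. in how disruptions are counted):

```python
def speech_rate_and_discontinuity(n_words, n_syllables, n_disruptions, duration_s):
    """Fluency measures of the kind reported in the study above (a sketch;
    operational definitions are assumptions, not the authors' exact ones):
    words per minute, syllables per minute, and percentage of speech
    discontinuity, here taken as disruptions per 100 words."""
    minutes = duration_s / 60.0
    return {
        "words_per_min": n_words / minutes,
        "syllables_per_min": n_syllables / minutes,
        "discontinuity_pct": 100.0 * n_disruptions / n_words,
    }

# A 60-second monologue with 100 words, 150 syllables, 8 disruptions:
metrics = speech_rate_and_discontinuity(100, 150, 8, 60.0)
```

Comparing such metrics across tasks (monologue vs. reading vs. automatic speech) is what allows the group differences described above to surface.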
… speech-to-speech translation; language identification. … interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers.
This paper presents an analysis of the temporal alignment between head movements and associated speech segments in the NOMCO corpus of first-encounter dialogues. Our results show that head movements tend to start slightly before the onset of the corresponding speech sequence and to end slightly after, but also that there are delays in both directions in the range of −/+ 1 s. Various factors that may influence delay duration are investigated. Correlations are found between delay length and the duration of the speech sequences associated with the head movements. Effects due to the different…
Binder, Laurence M; Spector, Jack; Youngjohn, James R
Three cases are presented of peculiar speech and language abnormalities that were evaluated in the context of personal injury lawsuits or workers' compensation claims of brain dysfunction after mild traumatic brain injuries. Neuropsychological measures of effort and motivation showed evidence of suboptimal motivation or outright malingering. The speech and language abnormalities of these cases probably were not consistent with neurogenic features of dysfluent speech, including stuttering or aphasia. We propose that severe dysfluency or language abnormalities persisting after a single, uncomplicated, mild traumatic brain injury are unusual and should elicit suspicion of a psychogenic origin.
Alan John Watson
In this extract from his new book Churchill's Legacy: Two Speeches to Save the World (Watson, 2016), Lord Watson of Richmond draws on his own experience of post-war British politics, as a television presenter and media commentator and then as a Liberal Peer and Chairman of the English-Speaking Union, to analyse the significance of Churchill's Zurich speech of 19 September 1946. He argues that, building on Churchill's earlier speech at Fulton, Missouri, it helped change the perceptions of the West and alter its response to the emerging Cold War and the future of Europe.
The most common way to communicate with those around us is speech. Suffering from a speech disorder can have negative social effects, from leaving individuals with low confidence and morale to problems with social interaction and the ability to live independently as adults. The speech therapy intervention is a complex process with particular objectives: discovery and identification of the speech disorder, and directing the therapy toward correction, recovery, compensation, adaptation and social integration of patients. Computer-based speech therapy systems are a real help for therapists, creating a special learning environment. Romanian is a phonetic language with special linguistic particularities. This paper aims to present a few computer-based speech therapy systems developed for the treatment of various speech disorders specific to the Romanian language.
Kressner, Abigail Anne; May, Tobias; Malik Thaarup Høegh, Rasmus
A recent study suggested that the most important factor for obtaining high speech intelligibility in noise with cochlear implant recipients is to preserve the low-frequency amplitude modulations of speech across time and frequency by, for example, minimizing the amount of noise in the speech gaps. In contrast, other studies have argued that the transients provide the most information. Thus, the present study investigates the relative impact of these two factors in the framework of noise reduction by systematically correcting noise-estimation errors within speech segments, speech gaps, and the transitions between them. Speech intelligibility in noise was measured using a cochlear implant simulation tested on normal-hearing listeners. The results suggest that minimizing noise in the speech gaps can substantially improve intelligibility, especially in modulated noise. However, significantly larger…
Ossewaarde, Roelant; Jonkers, Roel; Jalvingh, Fedor; Bastiaanse, Yvonne
Measurement of speech parameters in casual speech of dementia patients. Roelant Adriaan Ossewaarde (1,2), Roel Jonkers (1), Fedor Jalvingh (1,3), Roelien Bastiaanse (1). (1) CLCG, University of Groningen (NL); (2) HU University of Applied Sciences Utrecht (NL); (3) St. Marienhospital - Vechta, Geriatric Clinic, Vechta
Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas
Assistive speech-enabled systems are proposed to help both French- and English-speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech, making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. Improvements in the Perceptual Evaluation of Speech Quality (PESQ) value of 5% and more than 20% are achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
Wightman, Frederic L.; Kistler, Doris J.
Using a closed-set speech recognition paradigm thought to be heavily influenced by informational masking, auditory selective attention was measured in 38 children (ages 4-16 years) and 8 adults (ages 20-30 years). The task required attention to a monaural target speech message that was presented with a time-synchronized distracter message in the same ear. In some conditions a second distracter message or a speech-shaped noise was presented to the other ear. Compared to adults, children required higher target/distracter ratios to reach comparable performance levels, reflecting more informational masking in these listeners. Informational masking in most conditions was confirmed by the fact that a large proportion of the errors made by the listeners were contained in the distracter message(s). There was a monotonic age effect, such that even the children in the oldest age group (13.6-16 years) demonstrated poorer performance than adults. For both children and adults, presentation of an additional distracter in the contralateral ear significantly reduced performance, even when the distracter messages were produced by a talker of different sex than the target talker. The results are consistent with earlier reports from pure-tone masking studies that informational masking effects are much larger in children than in adults.
Rosenblum, Lawrence D.
Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal s...
Schafer, Phillip B.
Humans routinely recognize speech in challenging acoustic environments with background music, engine sounds, competing talkers, and other acoustic noise. However, today's automatic speech recognition (ASR) systems perform poorly in such environments. In this dissertation, I present novel methods for ASR designed to approach human-level performance by emulating the brain's processing of sounds. I exploit recent advances in auditory neuroscience to compute neuron-based representations of speech, and design novel methods for decoding these representations to produce word transcriptions. I begin by considering speech representations modeled on the spectrotemporal receptive fields of auditory neurons. These representations can be tuned to optimize a variety of objective functions, which characterize the response properties of a neural population. I propose an objective function that explicitly optimizes the noise invariance of the neural responses, and find that it gives improved performance on an ASR task in noise compared to other objectives. The method as a whole, however, fails to significantly close the performance gap with humans. I next consider speech representations that make use of spiking model neurons. The neurons in this method are feature detectors that selectively respond to spectrotemporal patterns within short time windows in speech. I consider a number of methods for training the response properties of the neurons. In particular, I present a method using linear support vector machines (SVMs) and show that this method produces spikes that are robust to additive noise. I compute the spectrotemporal receptive fields of the neurons for comparison with previous physiological results. To decode the spike-based speech representations, I propose two methods designed to work on isolated word recordings. The first method uses a classical ASR technique based on the hidden Markov model. The second method is a novel template-based recognition scheme that takes
Speech perception is known to rely on both auditory and visual information. However, sound-specific somatosensory input has been shown also to influence speech perceptual processing (Ito et al., 2009). In the present study we addressed further the relationship between somatosensory information and speech perceptual processing by testing the hypothesis that the temporal relationship between orofacial movement and sound processing contributes to somatosensory-auditory interaction in speech perception. We examined the changes in event-related potentials in response to multisensory synchronous (simultaneous) and asynchronous (90 ms lag and lead) somatosensory and auditory stimulation compared to individual unisensory auditory and somatosensory stimulation alone. We used a robotic device to apply facial skin somatosensory deformations that were similar in timing and duration to those experienced in speech production. Following synchronous multisensory stimulation, the amplitude of the event-related potential was reliably different from the two unisensory potentials. More importantly, the magnitude of the event-related potential difference varied as a function of the relative timing of the somatosensory-auditory stimulation. Event-related activity change due to stimulus timing was seen between 160 and 220 ms following somatosensory onset, mostly around the parietal area. The results demonstrate a dynamic modulation of somatosensory-auditory convergence and suggest that the contribution of somatosensory information to speech processing is dependent on the specific temporal order of sensory inputs in speech production.
Human emotions can now be recognized from speech signals using machine learning methods; however, these methods are challenged by lower recognition accuracies in real applications due to a lack of rich representation ability. Deep belief networks (DBN) can automatically discover multiple levels of representations in speech signals. To make full use of this advantage, this paper presents an ensemble of random deep belief networks (RDBN) for speech emotion recognition. It first extracts the low-level features of the input speech signal and then uses them to construct many random subspaces. Each random subspace is then provided to a DBN to yield higher-level features, which are fed to a classifier to output an emotion label. All outputted emotion labels are then fused through majority voting to decide the final emotion label for the input speech signal. Experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition.
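The random-subspace-plus-majority-voting scheme described above can be sketched independently of the DBN itself. In the illustrative sketch below, a trivial nearest-centroid classifier stands in for each DBN member; all function names and parameters are assumptions for illustration, not the paper's implementation:

```python
import random
from collections import Counter

def train_random_subspace_ensemble(X, y, n_members=5, subspace_size=2, seed=0):
    """Random-subspace ensemble in the spirit of the RDBN method above:
    each member is trained on a random subset of feature dimensions.
    A nearest-centroid classifier stands in for the DBN member."""
    rng = random.Random(seed)
    members = []
    n_features = len(X[0])
    for _ in range(n_members):
        dims = rng.sample(range(n_features), subspace_size)
        sums, counts = {}, {}
        for xs, label in zip(X, y):
            proj = [xs[d] for d in dims]
            if label not in sums:
                sums[label] = [0.0] * subspace_size
                counts[label] = 0
            sums[label] = [s + v for s, v in zip(sums[label], proj)]
            counts[label] += 1
        centroids = {l: [s / counts[l] for s in sums[l]] for l in sums}
        members.append((dims, centroids))
    return members

def predict(members, x):
    """Each member votes on its own subspace; the majority label wins."""
    votes = []
    for dims, centroids in members:
        proj = [x[d] for d in dims]
        label = min(centroids,
                    key=lambda l: sum((p - c) ** 2
                                      for p, c in zip(proj, centroids[l])))
        votes.append(label)
    return Counter(votes).most_common(1)[0][0]
```

Swapping the stand-in classifier for a DBN (and the raw features for learned higher-level features) recovers the structure of the method the abstract describes.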
We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors which are attached behind the talker's ear and can capture not only normal (audible) speech but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise, and they might be used in special systems (speech recognition, speech transformation, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved a 93.9% word accuracy for a 20k dictation task for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone, with very promising results.
Schalling, Ellika; Johansson, Kerstin; Hartelius, Lena
Changes in communicative functions are common in Parkinson's disease (PD), but there are only limited data provided by individuals with PD on how these changes are perceived, what their consequences are, and what type of intervention is provided. To present self-reported information about speech and communication, the impact on communicative participation, and the amount and type of speech-language pathology services received by people with PD. Respondents with PD recruited via the Swedish Parkinson's Disease Society filled out a questionnaire accessed via a Web link or provided in a paper version. Of 188 respondents, 92.5% reported at least one symptom related to communication; the most common symptoms were weak voice, word-finding difficulties, imprecise articulation, and getting off topic in conversation. The speech and communication problems resulted in restricted communicative participation for between a quarter and a third of the respondents, and their speech caused embarrassment sometimes or more often to more than half. Forty-five percent of the respondents had received speech-language pathology services. Most respondents reported both speech and language symptoms, and many experienced restricted communicative participation. Access to speech-language pathology services is still inadequate. Services should also address cognitive/linguistic aspects to meet the needs of people with PD. © 2018 S. Karger AG, Basel.
Rusz, Jan; Benova, Barbora; Ruzickova, Hana; Novotny, Michal; Tykalova, Tereza; Hlavnicka, Jan; Uher, Tomas; Vaneckova, Manuela; Andelova, Michaela; Novotna, Klara; Kadrnozkova, Lucie; Horakova, Dana
Motor speech disorders in multiple sclerosis (MS) are poorly understood, and their quantitative, objective acoustic characterization remains limited. Additionally, little data is available regarding relationships between the severity of speech disorders and neurological involvement in MS, as well as the contribution of the pyramidal and cerebellar functional systems to speech phenotypes. Speech data were acquired from 141 MS patients with Expanded Disability Status Scale (EDSS) scores ranging from 1 to 6.5 and 70 matched healthy controls. Objective acoustic speech assessment including subtests on phonation, oral diadochokinesis, articulation and prosody was performed. The prevalence of dysarthria in our MS cohort was 56%, while the severity was generally mild and primarily consisted of a combination of spastic and ataxic components. A prosodic-articulatory disorder presenting with monopitch, articulatory decay, excess loudness variations and slow rate was the most salient. Speech disorders reflected subclinical motor impairment, with 78% accuracy in discriminating a subgroup of asymptomatic MS (EDSS …); a correlation was found between oral diadochokinesis and the 9-Hole Peg Test (r = −0.65, p …), and oral diadochokinesis and excess loudness variations significantly separated pure pyramidal and mixed pyramidal-cerebellar MS subgroups. Automated speech analyses may provide valuable biomarkers of disease progression in MS, as dysarthria represents a common and early manifestation that reflects disease disability and underlying pyramidal-cerebellar pathophysiology. Copyright © 2017 Elsevier B.V. All rights reserved.
We describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. Speech lip movements are captured using a 3D stereo face capture system and split up into phonetic units. A dynamic parameterisation of this data is constructed which maintains the relationship between lip shapes and velocities; within this parameterisation a model of how lips move is built and is used in the animation of visual speech movements from speech audio input. The mapping from audio parameters to lip movements is disambiguated by selecting only the most similar stored phonetic units to the target utterance during synthesis. By combining properties of model-based synthesis (e.g., HMMs, neural nets) with unit selection we improve the quality of our speech synthesis.
... digital filtering for noise cancellation which interfaces to speech recognition software. It uses auditory features in speech recognition training, and provides applications to multilingual spoken language translation...
Teaching Speech Acts
In this paper I argue that pragmatic ability must become part of what we teach in the classroom if we are to realize the goals of communicative competence for our students. I review the research on pragmatics, especially those articles that point to the effectiveness of teaching pragmatics in an explicit manner, and those that posit methods for teaching. I also note two areas of scholarship that address classroom needs: the use of authentic data and appropriate assessment tools. The essay concludes with a summary of my own experience teaching speech acts in an advanced-level Portuguese class.
Kraljevski, Ivan; Tan, Zheng-Hua
This paper addresses the issue of data compression in distributed speech recognition on the basis of a variable frame rate and length analysis method. The method first conducts frame selection by using an a posteriori signal-to-noise ratio weighted energy distance to find the right time resolution … length for steady regions. The method is applied to scalable source coding in distributed speech recognition, where the target bitrate is met by adjusting the frame rate. Speech recognition results show that the proposed approach outperforms other compression methods in terms of recognition accuracy for noisy speech while achieving higher compression rates.
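The frame-selection idea described above can be illustrated with a toy sketch: accumulate an SNR-weighted distance between the log energies of consecutive frames and keep a frame whenever the accumulated distance crosses a threshold, so fast-changing regions get more frames than steady ones. The weighting, the distance, and `threshold` are assumptions in the spirit of the method, not the paper's exact formulas:

```python
import math

def select_frames(frames, snrs, threshold=1.0):
    """Variable-frame-rate selection sketch: emit a frame index whenever
    the accumulated SNR-weighted log-energy distance since the last
    emitted frame crosses `threshold`. All formulas here are illustrative
    assumptions, not the published method's exact definitions."""
    log_e = [math.log(sum(s * s for s in f) + 1e-12) for f in frames]
    selected = [0]                       # always keep the first frame
    acc = 0.0
    for i in range(1, len(frames)):
        w = snrs[i] / (1.0 + snrs[i])    # emphasise high-SNR frames
        acc += w * abs(log_e[i] - log_e[i - 1])
        if acc >= threshold:
            selected.append(i)
            acc = 0.0
    return selected
```

With a steady signal the accumulator never crosses the threshold and only the first frame survives, which is exactly the compression behaviour the abstract describes for steady regions.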
Listeners must accomplish two complementary perceptual feats in extracting a message from speech. They must discriminate linguistically-relevant acoustic variability and generalize across irrelevant variability. Said another way, they must categorize speech. Since the mapping of acoustic variability is language-specific, these categories must be learned from experience. Thus, understanding how, in general, the auditory system acquires and represents categories can inform us about the toolbox of mechanisms available to speech perception. This perspective invites consideration of findings from cognitive neuroscience literatures outside of the speech domain as a means of constraining models of speech perception. Although neurobiological models of speech perception have mainly focused on cerebral cortex, research outside the speech domain is consistent with the possibility of significant subcortical contributions in category learning. Here, we review the functional role of one such structure, the basal ganglia. We examine research from animal electrophysiology, human neuroimaging, and behavior to consider characteristics of basal ganglia processing that may be advantageous for speech category learning. We also present emerging evidence for a direct role for basal ganglia in learning auditory categories in a complex, naturalistic task intended to model the incidental manner in which speech categories are acquired. To conclude, we highlight new research questions that arise in incorporating the broader neuroscience research literature in modeling speech perception, and suggest how understanding contributions of the basal ganglia can inform attempts to optimize training protocols for learning non-native speech categories in adulthood.
The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable-initial stop consonants, and whether this would be influenced by exposure to speech sounds. There were four exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with "d" or "t" and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, whether a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds, and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal.
Asten, Pamela; Akre, Harriet; Persson, Christina
Treacher Collins syndrome (TCS, OMIM 154500) is a rare congenital disorder of craniofacial development. Characteristic hypoplastic malformations of the ears, zygomatic arch, mandible and pharynx have been described in detail. However, reports on the impact of these malformations on speech are few. Exploring speech features and investigating whether speech function is related to phenotypic severity are essential for optimizing follow-up and treatment. Articulation, nasal resonance, voice and intelligibility were examined in 19 individuals (5-74 years, median 34 years) divided into three groups comprising children 5-10 years (n = 4), adolescents 11-18 years (n = 4) and adults 29 years and older (n = 11). A speech composite score (0-6) was calculated to reflect the variability of speech deviations. TCS severity scores of phenotypic expression and total scores of the Nordic Orofacial Test-Screening (NOT-S) measuring orofacial dysfunction were used in analyses of correlation with speech characteristics (speech composite scores). Children and adolescents presented with significantly higher speech composite scores (median 4, range 1-6) than adults (median 1, range 0-5). Nearly all children and adolescents (6/8) displayed speech deviations of articulation, nasal resonance and voice, while only three adults were identified with multiple speech aberrations. The variability of speech dysfunction in TCS was exhibited by individual combinations of speech deviations in 13/19 participants. The speech composite scores correlated with TCS severity scores and NOT-S total scores. Speech composite scores higher than 4 were associated with cleft palate. The percentage of intelligible words in connected speech was significantly lower in children and adolescents (median 77%, range 31-99) than in adults (98%, range 93-100). Intelligibility of speech among the children was markedly inconsistent and clearly affected understandability. Multiple speech deviations were identified in…
This article presents some technical and pedagogical features of an interactive platform used for language therapy. The Timlogoro project demonstrates that technology is an effective tool in learning and, in particular, a viable solution for improving speech disorders present at different ages. A digital platform for different categories of users with speech impairments (children and adults) has good support in pedagogical principles. In speech therapy, the computer was originally used to assess deficiencies; nowadays it has become a useful tool in language rehabilitation. A few Romanian speech therapists create digital applications that will be used in therapy for recovery. This work was supported by a grant of the Romanian National Authority for Scientific Research, UEFISCDI.
Parkinson, Michael G.; Dobkins, David H.
Using a computerized content analysis, the authors demonstrate changes in speech behaviors of prison inmates. They conclude that two to four hours of public speaking training can have only limited effect on students who live in a culture in which "prison speech" is the expected and rewarded form of behavior. (PD)
Pedersen, Michael Syskind; Wang, DeLiang; Larsen, Jan
… a method for underdetermined blind source separation of convolutive mixtures. The proposed framework is applicable to separation of instantaneous as well as convolutive speech mixtures. It is possible to iteratively extract each speech signal from the mixture by combining blind source separation…
Some of the history of the gradual infusion of the modulation spectrum concept into automatic speech recognition (ASR) comes next, pointing to the relationship of modulation spectrum processing to well-accepted ASR techniques such as dynamic speech features or RelAtive SpecTrAl (RASTA) filtering. Next, the frequency …
Casper, Maureen A.; Raphael, Lawrence J.; Harris, Katherine S.; Geibel, Jennifer M.
Persons with cerebellar ataxia exhibit changes in physical coordination and speech and voice production. Previously, these alterations of speech and voice production were described primarily via perceptual coordinates. In this study, the spatial-temporal properties of syllable production were examined in 12 speakers, six of whom were healthy…
Terband, Hayo; Maassen, Ben; Guenther, Frank H.; Brumberg, Jonathan
Purpose: Childhood apraxia of speech (CAS) has been associated with a wide variety of diagnostic descriptions and has been shown to involve different symptoms during successive stages of development. In the present study, the authors attempted to associate the symptoms of CAS in a particular developmental stage with particular…
Maryam Tafaroji Yeganeh
Background: Speech errors are a branch of psycholinguistic science. A speech error, or slip of the tongue, is a natural process that happens to everyone. This research matters because of the sensitivity and importance of nursing, where speech errors may interfere with the treatment of patients; unfortunately, no research had yet been done in this field. The study examined the factors (personality, stress, fatigue and insomnia) that cause speech errors among nurses of Ilam province. Materials and Methods: The sample of this descriptive-correlational study consists of 50 nurses working in Mustafa Khomeini Hospital of Ilam province, selected randomly. Data were collected using the Minnesota Multiphasic Personality Inventory, the NEO Five-Factor Inventory and the Expanded Nursing Stress Scale, and were analyzed using SPSS version 20 with descriptive, inferential and multivariate linear regression or two-variable statistical methods (significance level: p ≤ 0.05). Results: 30 (60%) of the nurses participating in the study were female and 19 (38%) were male. All three factors (personality type, stress and fatigue) had significant effects on nurses' speech errors. Conclusion: Personality type, stress and fatigue all significantly affect nurses' speech errors.
Lamers, S.M.A.; Truong, Khiet Phuong; Steunenberg, B.; de Jong, Franciska M.G.; Westerhof, Gerben Johan
The present study aims to investigate the application of prosodic speech features in a psychological intervention based on life review. Several studies have shown that speech features can be used as indicators of depression severity, but these studies are mainly based on controlled speech recordings…
Le, Viet-Bac; Besacier, Laurent; Seng, Sopheap; Bigi, Brigitte; Do, Thi-Ngoc-Diep
This paper presents our recent activities for automatic speech recognition for Vietnamese. First, our text data collection and processing methods and tools are described. For language modeling, we investigate word, sub-word and also hybrid word/sub-word models. For acoustic modeling, when only limited speech data are available for Vietnamese, we propose some crosslingual acoustic modeling techniques. Furthermore, since the use of sub-word units can reduce the high out-...
This PhD thesis in human-computer interfaces (informatics) studies the case of the anaesthesia record used during medical operations and the possibility to supplement it with speech recognition facilities. Problems and limitations have been identified with the traditional paper-based anaesthesia … and inaccuracies in the anaesthesia record. Supplementing the electronic anaesthesia record interface with speech input facilities is proposed as one possible solution to a part of the problem. The testing of the various hypotheses has involved the development of a prototype of an electronic anaesthesia record … interface with speech input facilities in Danish. The evaluation of the new interface was carried out in a full-scale anaesthesia simulator. This has been complemented by laboratory experiments on several aspects of speech recognition for this type of use, e.g. the effects of noise on speech recognition …
The present paper presents the discussion of scholars concerning speech impact, in particular the peculiarities of its realization and its strategies and techniques. Drawing on the viewpoints of many prominent linguists, the paper suggests that manipulative argumentation be viewed as a most pervasive speech strategy with a certain set of techniques to be found in modern American political discourse. The prevalence of their occurrence allows us to regard them as ...
Event-related potential (ERP) evidence demonstrates that preschool-aged children selectively attend to informative moments such as word onsets during speech perception. Although this observation indicates a role for attention in language processing, it is unclear whether this type of attention is part of basic speech perception mechanisms, higher-level language skills, or general cognitive abilities. The current study examined these possibilities by measuring ERPs from 5-year-old children listening to a narrative containing attention probes presented before, during, and after word onsets as well as at random control times. Children also completed behavioral tests assessing verbal and nonverbal skills. Probes presented after word onsets elicited a more negative ERP response beginning around 100 ms after probe onset than control probes, indicating increased attention to word-initial segments. Crucially, the magnitude of this difference was correlated with performance on verbal tasks, but showed no relationship to nonverbal measures. More specifically, ERP attention effects were most strongly correlated with performance on a complex metalinguistic task involving grammaticality judgments. These results demonstrate that effective allocation of attention during speech perception supports higher-level, controlled language processing in children by allowing them to focus on relevant information at individual word and complex sentence levels.
Keshtiari, Niloofar; Kuhlmann, Michael; Eslami, Moharram; Klann-Delius, Gisela
Research on emotional speech often requires valid stimuli for assessing perceived emotion through prosody and lexical content. To date, no comprehensive emotional speech database for Persian is officially available. The present article reports the process of designing, compiling, and evaluating a comprehensive emotional speech database for colloquial Persian. The database contains a set of 90 validated novel Persian sentences classified in five basic emotional categories (anger, disgust, fear, happiness, and sadness), as well as a neutral category. These sentences were validated in two experiments by a group of 1,126 native Persian speakers. The sentences were articulated by two native Persian speakers (one male, one female) in three conditions: (1) congruent (emotional lexical content articulated in a congruent emotional voice), (2) incongruent (neutral sentences articulated in an emotional voice), and (3) baseline (all emotional and neutral sentences articulated in neutral voice). The speech materials comprise about 470 sentences. The validity of the database was evaluated by a group of 34 native speakers in a perception test. Utterances recognized better than five times chance performance (71.4 %) were regarded as valid portrayals of the target emotions. Acoustic analysis of the valid emotional utterances revealed differences in pitch, intensity, and duration, attributes that may help listeners to correctly classify the intended emotion. The database is designed to be used as a reliable material source (for both text and speech) in future cross-cultural or cross-linguistic studies of emotional speech, and it is available for academic research purposes free of charge. To access the database, please contact the first author.
One of the most common complaints of people with impaired hearing concerns their difficulty with understanding speech. Particularly in the presence of background noise, hearing-impaired people often encounter great difficulties with speech communication. In most cases, the problem persists even if reduced audibility has been compensated for by hearing aids. It has been hypothesized that part of the difficulty arises from changes in the perception of sounds that are well above hearing threshold, such as reduced frequency selectivity and deficits in the processing of temporal fine structure (TFS) at the output of the inner-ear (cochlear) filters. The purpose of this work was to investigate these aspects in detail. One chapter studies relations between frequency selectivity, TFS processing, and speech reception in listeners with normal and impaired hearing, using behavioral listening experiments. While …
Titus Felix FURTUNA
In an isolated-word speech recognition system, recognition requires comparison between the input signal of the word and the various words of the dictionary. The problem can be solved efficiently by a dynamic comparison algorithm whose goal is to put the temporal scales of the two words in optimal correspondence. An algorithm of this type is Dynamic Time Warping. This paper presents two implementation alternatives of the algorithm for recognition of isolated words.
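The dynamic comparison the abstract describes is the classic DTW dynamic program. A minimal sketch follows; it is not the paper's implementation, and the scalar features, absolute-difference local distance, and toy dictionary are illustrative assumptions:

```python
import numpy as np

def dtw_distance(x, y):
    """Classic Dynamic Time Warping distance between two 1-D feature
    sequences: the minimum accumulated local distance over all
    monotonic alignments of their time axes (O(len(x)*len(y)) DP)."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])              # local distance
            cost[i, j] = d + min(cost[i - 1, j],      # stretch y
                                 cost[i, j - 1],      # stretch x
                                 cost[i - 1, j - 1])  # step both
    return cost[n, m]

# Toy isolated-word recognition: the dictionary word with the smallest
# DTW distance to the input sequence wins.
dictionary = {"one": [1.0, 2.0, 3.0, 2.0], "two": [3.0, 1.0, 1.0, 3.0]}
query = [1.0, 2.0, 2.9, 3.1, 2.0]                     # time-stretched "one"
best = min(dictionary, key=lambda w: dtw_distance(query, dictionary[w]))
print(best)  # one
```

In a real recognizer the sequences would be frame-level acoustic feature vectors (e.g. cepstra) rather than scalars, with a vector norm as the local distance.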
The proposal was tested that (P1 X P2) F1 leads to P1 irradiation bone marrow chimeras expressed predominantly P1-restricted T cells because donor-derived stem cells were exposed to recipient-derived antigen-presenting cells in the thymus. Because P1 recipient-derived antigen-presenting cells are replaced only slowly after 6-8 wk by (P1 X P2) donor-derived antigen-presenting cells in the thymus, and because replenished pools of mature T cells may by then prevent substantial numbers of P2-restricted T cells from being generated, a large portion of thymus cells and mature T cells were eliminated using the following treatments of 12-20-wk-old (P1 X P2) F1 leads to P1 irradiation bone marrow chimeras: (a) cortisone plus antilymphocyte serum, (b) Cytoxan, (c) three doses of sublethal irradiation (300 rad) 2 d apart, and (d) lethal irradiation (850 rad) and reconstitution with T cell-depleted (P1 X P2) F1 stem cells. 12-20 wk after this second treatment, (P1 X P2) leads to P1 chimeras were infected with vaccinia virus. Virus-specific cytotoxic T cell reactivity was expressed by chimeric T cells of (P1 X P2) F1 origin and was restricted predominantly to P1. Virus-specific cytotoxic T cells, therefore, do not seem to be selected to a measurable extent by the immigrating donor-derived antigen-presenting cells in the thymus; their selection apparently depends on the recipient-derived radioresistant thymus cells.
Objective: Many tinnitus patients complain about difficulties regarding speech comprehension. In spite of the high clinical relevance, little is known about underlying mechanisms and predisposing factors. Here, we performed an exploratory investigation in a large sample of tinnitus patients to (1) estimate the prevalence of speech comprehension difficulties among tinnitus patients, (2) compare subjective reports of speech comprehension difficulties with objective measurements in a standardized speech comprehension test, and (3) explore underlying mechanisms by analyzing the relationship between speech comprehension difficulties and peripheral hearing function (pure tone audiogram), as well as with co-morbid hyperacusis as a central auditory processing disorder. Subjects and Methods: Speech comprehension was assessed in 361 tinnitus patients presenting between 07/2012 and 08/2014 at the Interdisciplinary Tinnitus Clinic at the University of Regensburg. The assessment included standard audiological assessment (pure tone audiometry, tinnitus pitch and loudness matching), the Goettingen sentence test (in quiet) for speech audiometric evaluation, two questions about hyperacusis, and two questions about speech comprehension in quiet and noisy environments (How would you rate your ability to understand speech?; How would you rate your ability to follow a conversation when multiple people are speaking simultaneously?). Results: Subjectively reported speech comprehension deficits are frequent among tinnitus patients, especially in noisy environments (cocktail party situation). 74.2% of all investigated patients showed disturbed speech comprehension (indicated by values above 21.5 dB SPL in the Goettingen sentence test). Subjective speech comprehension complaints (both in general and in noisy environments) were correlated with hearing level and with audiologically-assessed speech comprehension ability. In contrast, co-morbid hyperacusis was only correlated
Eskelund, Kasper; Dau, Torsten
Speech perception integrates signals from ear and eye. This is witnessed by a wide range of audiovisual integration effects, such as ventriloquism and the McGurk illusion. Some behavioral evidence suggests that audiovisual integration of specific aspects is special for speech perception. However, our knowledge of such bimodal integration would be strengthened if the phenomena could be investigated by objective, neurally based methods. One key question of the present work is whether perceptual processing of audiovisual speech can be gauged with a specific signature of neurophysiological activity … on the auditory speech percept? In two experiments, which both combine behavioral and neurophysiological measures, an uncovering of the relation between perception of faces and of audiovisual integration is attempted. Behavioral findings suggest a strong effect of face perception, whereas the MMN results are less …
This paper presents a new method to detect speech/nonspeech components of a given noisy signal. Employing the combination of binary Walsh basis functions and an analysis-synthesis scheme, the original noisy speech signal is modified first. From the modified signals, the speech components are distinguished from the nonspeech components by using a simple decision scheme. The minimal number of Walsh basis functions to be applied is determined using singular value decomposition (SVD). The main advantages of the proposed method are low computational complexity, few parameters to be adjusted, and simple implementation. It is observed that the use of Walsh basis functions makes the proposed algorithm efficiently applicable in real-world situations where processing time is crucial. Simulation results indicate that the proposed algorithm achieves high speech and nonspeech detection rates while maintaining a low error rate for different noisy conditions.
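The analysis step can be illustrated with a Hadamard matrix, whose ±1 rows are Walsh-type basis functions. This is only a generic sketch of projecting frames onto a binary basis and thresholding the resulting energy; the frame length, number of basis functions, and the energy decision rule are assumptions, not the paper's algorithm or its SVD-based choice of basis count:

```python
import numpy as np
from scipy.linalg import hadamard

def walsh_frame_energies(signal, frame_len=64, n_basis=8):
    """Per-frame energy after projecting each frame onto the first
    n_basis rows of a Hadamard matrix (a binary +/-1 Walsh-type basis)."""
    H = hadamard(frame_len) / np.sqrt(frame_len)      # orthonormal rows
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    coeffs = frames @ H[:n_basis].T                   # "analysis" step
    return np.sum(coeffs ** 2, axis=1)

def speech_mask(signal, threshold, frame_len=64, n_basis=8):
    """Toy decision scheme: a frame counts as 'speech' when its
    Walsh-domain energy exceeds the threshold."""
    return walsh_frame_energies(signal, frame_len, n_basis) > threshold

# Silence followed by an active segment: only the active frames pass.
sig = np.concatenate([np.zeros(128), np.ones(128)])
print(speech_mask(sig, threshold=1.0))  # frames 0-1 False, frames 2-3 True
```

Because the basis entries are ±1, the projection needs only additions and sign flips, which is consistent with the low-complexity claim in the abstract.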
While it is well known that knowledge facilitates higher cognitive functions, such as visual and auditory word recognition, little is known about the influence of knowledge on detection, particularly in the auditory modality. Our study tested the influence of phonological and lexical knowledge on auditory detection. Words, pseudowords, and complex non-phonological sounds, energetically matched as closely as possible, were presented at a range of presentation levels from subthreshold to clearly audible. The participants performed a detection task (Experiments 1 and 2) that was followed by a two-alternative forced-choice recognition task in Experiment 2. The results of this second task in Experiment 2 suggest a correct recognition of words in the absence of detection with a subjective threshold approach. In the detection task of both experiments, phonological stimuli (words and pseudowords) were better detected than non-phonological stimuli (complex sounds) presented close to the auditory threshold. This finding suggests an advantage of speech for signal detection. An additional advantage of words over pseudowords was observed in Experiment 2, suggesting that lexical knowledge could also improve auditory detection when listeners had to recognize the stimulus in a subsequent task. Two simulations of detection performance performed on the sound signals confirmed that the advantage of speech over non-speech processing could not be attributed to energetic differences in the stimuli.
Pinker, Steven; Nowak, Martin A.; Lee, James J.
When people speak, they often insinuate their intent indirectly rather than stating it as a bald proposition. Examples include sexual come-ons, veiled threats, polite requests, and concealed bribes. We propose a three-part theory of indirect speech, based on the idea that human communication involves a mixture of cooperation and conflict. First, indirect requests allow for plausible deniability, in which a cooperative listener can accept the request, but an uncooperative one cannot react adversarially to it. This intuition is supported by a game-theoretic model that predicts the costs and benefits to a speaker of direct and indirect requests. Second, language has two functions: to convey information and to negotiate the type of relationship holding between speaker and hearer (in particular, dominance, communality, or reciprocity). The emotional costs of a mismatch in the assumed relationship type can create a need for plausible deniability and, thereby, select for indirectness even when there are no tangible costs. Third, people perceive language as a digital medium, which allows a sentence to generate common knowledge, to propagate a message with high fidelity, and to serve as a reference point in coordination games. This feature makes an indirect request qualitatively different from a direct one even when the speaker and listener can infer each other's intentions with high confidence. PMID:18199841
Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Herve
To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. baez) coupled to non-intact (excised onsets) auditory speech (signified…
John H. Grose
The purpose of this study was to determine the effects of age on the spectro-temporal integration of speech. The hypothesis was that the integration of speech fragments distributed over frequency, time, and ear of presentation is reduced in older listeners, even for those with good audiometric hearing. Younger, middle-aged, and older listeners (10 per group) with good audiometric hearing participated. They were each tested under seven conditions that encompassed combinations of spectral, temporal, and binaural integration. Sentences were filtered into two bands centered at 500 Hz and 2500 Hz, with criterion bandwidth tailored for each participant. In some conditions, the speech bands were individually square-wave interrupted at a rate of 10 Hz. Configurations of uninterrupted, synchronously interrupted, and asynchronously interrupted frequency bands were constructed that constituted speech fragments distributed across frequency, time, and ear of presentation. The over-arching finding was that, for most configurations, performance was not differentially affected by listener age. Although speech intelligibility varied across condition, there was no evidence of performance deficits in older listeners in any condition. This study indicates that age, per se, does not necessarily undermine the ability to integrate fragments of speech dispersed across frequency and time.
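The stimulus manipulation described, band-pass filtering followed by 10-Hz square-wave interruption, can be sketched as below. The filter order, one-octave bandwidth, and gating phase are illustrative assumptions, not the study's exact parameters (which were tailored per participant):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, square

def band_and_interrupt(x, fs, center, rate=10.0, octaves=1.0):
    """Band-pass x around `center` Hz, then gate the result with a
    square wave at `rate` Hz (on/off interruption, 50% duty cycle)."""
    lo = center / 2 ** (octaves / 2)                  # lower band edge
    hi = center * 2 ** (octaves / 2)                  # upper band edge
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, x)                        # zero-phase filtering
    t = np.arange(len(x)) / fs
    gate = (square(2 * np.pi * rate * t) + 1) / 2     # 1 during "on" halves
    return band * gate

# Example: a 500-Hz band carved from noise and interrupted at 10 Hz.
rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)
y = band_and_interrupt(noise, fs=8000, center=500.0)
```

Asynchronous interruption of two bands, as in the study, would simply shift the gate phase of one band by half a period.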
Naidu, D.H.R.; Srinivasan, S.
Several speech enhancement approaches utilize trained models of clean speech data, such as codebooks, Gaussian mixtures, and hidden Markov models. These models are typically trained on neutral clean speech data, without any emotion. However, in practical scenarios, emotional speech is a common
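A model-based enhancer of the kind described can be caricatured with a tiny spectral codebook. This is a hedged, generic sketch: the spectral-subtraction pre-estimate, Euclidean codebook search, and Wiener-style gain are illustrative assumptions, not any specific paper's method:

```python
import numpy as np

def codebook_enhance(noisy_psd, clean_codebook, noise_psd):
    """Pick the stored clean-speech PSD shape closest to a crude
    noise-subtracted estimate, then apply a Wiener-style gain built
    from that codebook entry."""
    est = np.maximum(noisy_psd - noise_psd, 0.0)      # rough clean estimate
    i = int(np.argmin(np.sum((clean_codebook - est) ** 2, axis=1)))
    prior = clean_codebook[i]                         # selected clean model
    gain = prior / (prior + noise_psd)                # Wiener-style gain
    return gain * noisy_psd, i

# Two-entry codebook; the noisy PSD is entry 0 plus unit-level noise.
codebook = np.array([[4.0, 4.0, 0.0, 0.0], [0.0, 0.0, 4.0, 4.0]])
noise = np.ones(4)
enhanced, idx = codebook_enhance(codebook[0] + noise, codebook, noise)
print(idx)  # 0
```

The abstract's point is precisely that such a codebook trained on neutral speech may mismatch emotional speech, so the nearest entry (and hence the gain) can be wrong for angry or sad utterances.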
Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc
Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…
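Automated speech-rate estimation is often approximated by counting energy-envelope peaks as syllable nuclei. The following is only a generic heuristic sketch under that assumption, not the new algorithm the study evaluates; the frame size, minimum gap, and height threshold are made up:

```python
import numpy as np
from scipy.signal import find_peaks

def speech_rate(signal, fs, frame_ms=10, min_gap_ms=150):
    """Estimate speech rate as energy peaks (rough syllable nuclei)
    per second of signal."""
    frame = int(fs * frame_ms / 1000)
    n = len(signal) // frame
    # short-time energy envelope, one value per frame
    env = np.array([np.sum(signal[i * frame:(i + 1) * frame] ** 2)
                    for i in range(n)])
    dist = max(1, int(min_gap_ms / frame_ms))         # min frames between nuclei
    peaks, _ = find_peaks(env, height=0.1 * env.max(), distance=dist)
    return len(peaks) / (len(signal) / fs)            # nuclei per second

# Four energy bursts in two seconds -> 2.0 "syllables" per second.
fs = 1000
sig = np.zeros(2000)
for start in (200, 700, 1200, 1700):
    sig[start:start + 100] = 1.0
print(speech_rate(sig, fs))  # 2.0
```

For dysarthric speech, as the abstract notes, such simple heuristics tend to be unreliable, which is what motivates training a dedicated algorithm against speech-language pathologists' counts.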
Shannon, Robert V
Music and speech share many acoustic cues, but not all are equally important. For example, harmonic pitch is essential for music but not for speech. When birds communicate, is their song more like speech or music? A new study contrasting pitch and spectral patterns shows that birds perceive their song more like humans perceive speech. Copyright © 2016 Elsevier Ltd. All rights reserved.