Larson, Martha; Ordelman, Roeland J.F.; de Jong, Franciska M.G.; Köhler, Joachim; Kraaij, Wessel
After two successful years at SIGIR in 2007 and 2008, the third workshop on Searching Spontaneous Conversational Speech (SSCS 2009) was held in conjunction with ACM Multimedia 2009. The goal of the SSCS series is to serve as a forum that brings together the disciplines that collaborate on spoken
Köhler, J.; Larson, M.; de Jong, Franciska M.G.; Ordelman, Roeland J.F.; Kraaij, W.
The second workshop on Searching Spontaneous Conversational Speech (SSCS 2008) was held in Singapore on July 24, 2008 in conjunction with the 31st Annual International ACM SIGIR Conference. The goal of the workshop was to bring the speech community and the information retrieval community together.
de Jong, Franciska M.G.; Oard, Douglas; Ordelman, Roeland J.F.; Raaijmakers, Stephan
The Proceedings contain the contributions to the workshop on Searching Spontaneous Conversational Speech organized in conjunction with the 30th ACM SIGIR, Amsterdam 2007. The papers reflect some of the emerging focus areas and cross-cutting research topics, together addressing evaluation metrics,
Multiple cues influence listeners' segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker's articulatory effort (hyperarticulation vs. hypoarticulation, H&H) may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners' interpretation of segmentation cues is affected by speech style (spontaneous conversation vs. read speech), using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylised landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues (semantic likelihood and cross-boundary diphone phonotactics) was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech. Independent of speech style, we found an interaction between cue valence (favourable/unfavourable) and cue type (phonotactics/semantics). Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behaviour. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically-based cues in the segmentation of natural
Objectives: Spontaneous language sample analysis is an important part of the language assessment protocol. Language samples give us useful information about how children use language in the natural situations of daily life. The purpose of this study was to compare conversation, free play, and narrative speech with respect to mean length of utterance (MLU), type-token ratio (TTR), and the number of utterances. Methods: By cluster sampling, a total of 30 five-year-old boys from Semnan with normal speech and language development were selected from the active kindergartens in Semnan city. Conversation, free play, and narrative speech were the three language sample elicitation methods applied to obtain 15 minutes of each child's spontaneous language. Means for MLU, TTR, and the number of utterances were analyzed by dependent ANOVA. Results: The results showed no significant difference in the number of elicited utterances among the three language sampling methods. Narrative speech elicited longer MLU than free play and conversation, and, compared to free play and narrative speech, conversation elicited higher TTR. Discussion: The results suggest that in the clinical assessment of Persian-speaking children, it is better to use narrative speech to elicit longer MLU and conversation to elicit higher TTR.
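As a rough illustration of the measures compared above, MLU (here counted in words rather than morphemes, for simplicity) and TTR can be computed from a transcribed sample along these lines; the tokenizer and the sample utterances are invented for this sketch:

```python
import re

def analyze_sample(utterances):
    """Compute utterance count, MLU (in words), and TTR for a language sample.

    Note: clinical MLU is usually counted in morphemes; whole words are
    used here as a simplified stand-in.
    """
    all_tokens = []
    total_words = 0
    for utt in utterances:
        tokens = re.findall(r"[a-zA-Z']+", utt.lower())
        total_words += len(tokens)
        all_tokens.extend(tokens)
    n_utts = len(utterances)
    mlu = total_words / n_utts if n_utts else 0.0
    ttr = len(set(all_tokens)) / len(all_tokens) if all_tokens else 0.0
    return {"utterances": n_utts, "MLU": mlu, "TTR": ttr}

# Invented three-utterance sample.
sample = [
    "the dog ran home",
    "he was very fast",
    "then the dog ate",
]
print(analyze_sample(sample))
```

On this toy sample the function returns an MLU of 4.0 words and a TTR of 10/12, since ten distinct word types occur among twelve tokens.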
Mehta, G.; Cutler, A.
Although spontaneous speech occurs more frequently in most listeners' experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ considerably, however, which suggests that laboratory results may not generalise to the recognition of spontaneous speech. In the present study listeners were presented with both spontaneous and read speech materials, and their response time to detect word-initial target phonemes was measured. Responses were, overall, equally fast in each speech mode. However, analysis of effects previously reported in phoneme detection studies revealed significant differences between speech modes. In read speech but not in spontaneous speech, later targets were detected more rapidly than targets preceded by short words. In contrast, in spontaneous speech but not in read speech, targets were detected more rapidly in accented than in unaccented words and in strong than in weak syllables. An explanation for this pattern is offered in terms of characteristic prosodic differences between spontaneous and read speech. The results support claims from previous work that listeners pay great attention to prosodic information in the process of recognising speech.
Galina M. Shipitsina
The article deals with the semantic, pragmatic and structural features of the motivation of words, phrases and dialogue remarks in contemporary Russian popular speech. These features are characterized by originality and unconventional use. The language material is the result of the authors' direct observation of spontaneous verbal communication between people of different social and age groups. The words and remarks were analyzed in relation to the communication system of the national Russian language and the cultural background of popular speech. The study found that spoken discourse offers additional ways to increase the expressiveness of an utterance. It is important to note that spontaneous speech reveals lacunae in the nominative resources and vocabulary system of the language. It is also shown that prefixation is an effective and regular way of presenting the same action. The most typical forms, ways and means by which native speakers update language resources through linguistic creativity were identified.
Behrooz Mahmoodi Bakhtiari
Background and Aim: Recently, researchers have increasingly turned to studying the relation between stuttering and syntactic complexity. This study investigates the effect of syntactic complexity on the amount of speech dysfluency in stuttering Persian-speaking children and adults in conversational speech. The results can pave the way to a better understanding of stuttering in children and adults, and to finding more appropriate treatments. Methods: In this cross-sectional study, the participants were 15 stuttering adult Persian speakers, older than 15 years, and 15 stuttering child Persian speakers of 4-6 years of age. First, a 30-minute sample of the spontaneous speech of each participant was collected. Then the utterances of each person were examined for the amount of dysfluency and syntactic complexity. The data were analyzed using paired-samples t-tests. Results: In both groups, stuttering children and adults, there was a significant difference between the amount of dysfluency in simple and complex sentences (p<0.05). Conclusion: The results of this study showed that an increase in syntactic complexity in conversational speech increased the amount of dysfluency in stuttering children and adults. Moreover, as syntactic complexity increased, dysfluency increased more in stuttering children than in stuttering adults.
Vista, C. B.; Satriawan, C. H.; Lestari, D. P.; Widyantoro, D. H.
The performance of an automatic speech recognition system is affected by differences in speech style between the data the model is originally trained upon and incoming speech to be recognized. In this paper, the usage of GMM-HMM acoustic models for specific speech styles is investigated. We develop two systems for the experiments; the first employs a speech style classifier to predict the speech style of incoming speech, either spontaneous or dictated, then decodes this speech using an acoustic model specifically trained for that speech style. The second system uses both acoustic models to recognise incoming speech and decides upon a final result by calculating a confidence score of decoding. Results show that training specific acoustic models for spontaneous and dictated speech styles confers a slight recognition advantage as compared to a baseline model trained on a mixture of spontaneous and dictated training data. In addition, the speech style classifier approach of the first system produced slightly more accurate results than the confidence scoring employed in the second system.
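The second system's decode-with-both-models strategy can be sketched as below; the `decode` method and the mock decoder class are illustrative assumptions, not the authors' actual implementation:

```python
class MockModel:
    """Stand-in for a style-specific GMM-HMM decoder (illustrative only)."""
    def __init__(self, transcript, confidence):
        self.transcript = transcript
        self.confidence = confidence

    def decode(self, audio):
        # A real decoder would produce a hypothesis and a decoding
        # confidence score for the given audio; here both are canned.
        return self.transcript, self.confidence


def recognise_with_style_models(audio, spontaneous_model, dictated_model):
    """Decode with both style-specific acoustic models and keep the
    result whose decoding confidence score is higher."""
    spon_text, spon_conf = spontaneous_model.decode(audio)
    dict_text, dict_conf = dictated_model.decode(audio)
    if spon_conf >= dict_conf:
        return spon_text, "spontaneous", spon_conf
    return dict_text, "dictated", dict_conf


result = recognise_with_style_models(
    None, MockModel("so um i think", 0.82), MockModel("so I think", 0.74))
print(result)
```

The first system would instead run a dedicated speech-style classifier up front and invoke only the matching decoder, trading one extra classification step for a single decoding pass.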
Bryant, Gregory A.
Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…
Raaijmakers, S.; Truong, K.P.
We developed acoustic and lexical classifiers, based on a boosting algorithm, to assess the separability on arousal and valence dimensions in spontaneous emotional speech. The spontaneous emotional speech data was acquired by inviting subjects to play a first-person shooter video game. Our acoustic
Charlop, M H; Milstein, J P
We assessed the effects of video modeling on acquisition and generalization of conversational skills among autistic children. Three autistic boys observed videotaped conversations consisting of two people discussing specific toys. When criterion for learning was met, generalization of conversational skills was assessed with untrained topics of conversation; new stimuli (toys); unfamiliar persons, siblings, and autistic peers; and other settings. The results indicated that the children learned through video modeling, generalized their conversational skills, and maintained conversational speech over a 15-month period. Video modeling shows much promise as a rapid and effective procedure for teaching complex verbal skills such as conversational speech.
Hussmann, Katja; Grande, Marion; Meffert, Elisabeth; Christoph, Swetlana; Piefke, Martina; Willmes, Klaus; Huber, Walter
Although generally accepted as an important part of aphasia assessment, detailed analysis of spontaneous speech is rarely carried out in clinical practice mostly due to time limitations. The Aachener Sprachanalyse (ASPA; Aachen Speech Analysis) is a computer-assisted method for the quantitative analysis of German spontaneous speech that allows for…
Objective: Recently, researchers have increasingly turned to studying the relation between stuttering and utterance length. This study investigates the effect of utterance length on the amount of speech dysfluency in stuttering Persian-speaking children and adults in conversational speech. The results can pave the way to a better understanding of stuttering in children and adults, as well as to finding more appropriate treatments. Materials & Methods: In this descriptive-analytic study, the participants were 15 stuttering adult Persian speakers, older than 15 years, and 15 stuttering child Persian speakers in the age range of 4-6 years. First, a 30-minute sample of each participant's spontaneous speech was collected, and the utterances of each person were examined for the amount of dysfluency and utterance length. The data were entered into SPSS and analyzed using paired t-tests. Results: In both groups, stuttering children and adults, the amount of dysfluency increased significantly with utterance length. Conclusion: The results of this study showed that with increasing utterance length in spontaneous speech, stuttering children and adults produced a greater amount of dysfluency. Dysfluency increased to a similar degree in stuttering children and adults as utterance length increased.
Lindberg, Søren Østergaard; Hansen, Sidsel; Nielsen, Tonny
Background: We studied all patients admitted to hospital with first-onset atrial fibrillation (AF) to determine the probability of spontaneous conversion to sinus rhythm and to identify factors predictive of such a conversion. Methods and Results: We retrospectively reviewed charts of 438...
Booz, Jaime A.
Although many studies have examined acoustic and sociolinguistic differences between male and female speech, the relationship between talker speaking style and perceived gender has not yet been explored. The present study attempts to determine whether clear speech, a style adopted by talkers who perceive some barrier to effective communication, shifts perceptions of femininity for male and female talkers. Much of our understanding of gender perception in voice and speech is based on sustained vowels or single words, eliminating temporal, prosodic, and articulatory cues available in more naturalistic, connected speech. Thus, clear and conversational sentence stimuli, selected from the 41 talkers of the Ferguson Clear Speech Database (Ferguson, 2004) were presented to 17 normal-hearing listeners, aged 18 to 30. They rated the talkers' gender using a visual analog scale with "masculine" and "feminine" endpoints. This response method was chosen to account for within-category shifts of gender perception by allowing nonbinary responses. Mixed-effects regression analysis of listener responses revealed a small but significant effect of speaking style, and this effect was larger for male talkers than female talkers. Because of the high degree of talker variability observed for talker gender, acoustic analyses of these sentences were undertaken to determine the relationship between acoustic changes in clear and conversational speech and perceived femininity. Results of these analyses showed that mean fundamental frequency (fo) and fo standard deviation were significantly correlated to perceived gender for both male and female talkers, and vowel space was significantly correlated only for male talkers. Speaking rate and breathiness measures (CPPS) were not significantly related for either group. Outcomes of this study indicate that adopting a clear speaking style is correlated with increases in perceived femininity. Although the increase was small, some changes associated
We propose an objective method to assess speech quality in the conversational context by taking into account the talking and listening speech qualities and the impact of delay. This approach is applied to the results of four subjective tests on the effects of echo, delay, packet loss, and noise. The dataset is divided into training and validation sets. For the training set, a multiple linear regression is applied to determine a relationship between conversational, talking, and listening speech qualities and the delay value. The multiple linear regression leads to an accurate estimation of the conversational scores, with high correlation and low error between subjective and estimated scores on both the training and validation sets. In addition, a validation is performed on the data of a subjective test found in the literature, which confirms the reliability of the regression. The relationship is then applied at an objective level by replacing talking and listening subjective scores with talking and listening objective scores provided by existing objective models, fed with speech signals recorded during the subjective tests. The conversational model achieves high performance, as revealed by comparison with the test results and with the existing standard methodology, the "E-model," presented in ITU-T (International Telecommunication Union) Recommendation G.107.
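A minimal sketch of the training step described above: conversational quality is fitted as a linear function of talking quality, listening quality, and delay. All data values, and therefore the fitted coefficients, are invented for the sketch, not the paper's subjective scores:

```python
import numpy as np

# Hypothetical training data: talking quality, listening quality (MOS-like
# scores), and one-way delay in ms, with the conversational score to predict.
X_raw = np.array([
    [4.2, 4.0, 100.0],
    [3.8, 3.5, 300.0],
    [3.0, 3.2, 600.0],
    [2.5, 2.8, 900.0],
    [4.5, 4.4,  50.0],
])
y = np.array([4.1, 3.5, 2.9, 2.4, 4.4])

# Multiple linear regression: add an intercept column, solve least squares.
X = np.column_stack([np.ones(len(X_raw)), X_raw])
coef, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)

def predict_conversational(talking, listening, delay_ms):
    """Estimate conversational quality from the fitted relationship."""
    return float(coef @ np.array([1.0, talking, listening, delay_ms]))

print("coefficients:", coef.round(4))
print("prediction:", round(predict_conversational(4.0, 3.8, 200.0), 2))
```

In the paper's pipeline the talking and listening inputs would then be supplied by objective quality models rather than subjective scores, keeping the same fitted relationship.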
Soto, Gloria; Clarke, Michael T
This study was conducted to evaluate the effects of a conversation-based intervention on the expressive vocabulary and grammatical skills of children with severe motor speech disorders and expressive language delay who use augmentative and alternative communication. Eight children aged from 8 to 13 years participated in the study. After a baseline period, a conversation-based intervention was provided for each participant, in which they were supported to learn and use linguistic structures essential for the formation of clauses and the grammaticalization of their utterances, such as pronouns, verbs, and bound morphemes, in the context of personally meaningful and scaffolded conversations with trained clinicians. The conversations were videotaped, transcribed, and analyzed using the Systematic Analysis of Language Transcripts (SALT; Miller & Chapman, 1991). Results indicate that participants showed improvements in their use of spontaneous clauses, and a greater use of pronouns, verbs, and bound morphemes. These improvements were sustained and generalized to conversations with familiar partners. The results demonstrate the positive effects of the conversation-based intervention for improving the expressive vocabulary and grammatical skills of children with severe motor speech disorders and expressive language delay who use augmentative and alternative communication. Clinical and theoretical implications of conversation-based interventions are discussed and future research needs are identified.
Rosa S Gisladottir
The ability to recognize speech acts (verbal actions) in conversation is critical for everyday interaction. However, utterances are often underspecified for the speech act they perform, requiring listeners to rely on the context to recognize the action. The goal of this study was to investigate the time-course of auditory speech act recognition in action-underspecified utterances and explore how sequential context (the prior action) impacts this process. We hypothesized that speech acts are recognized early in the utterance to allow for quick transitions between turns in conversation. Event-related potentials (ERPs) were recorded while participants listened to spoken dialogues and performed an action categorization task. The dialogues contained target utterances, each of which could deliver three distinct speech acts depending on the prior turn. The targets were identical across conditions but differed in the type of speech act performed and how it fit into the larger action sequence. The ERP results show an early effect of action type, reflected by frontal positivities as early as 200 ms after target utterance onset. This indicates that speech act recognition begins early in the turn, when the utterance has only been partially processed. Providing further support for early speech act recognition, actions in highly constraining contexts did not elicit an ERP effect at the utterance-final word. We take this to show that listeners can recognize the action before the final word through predictions at the speech act level. However, additional processing based on the complete utterance is required for more complex actions, as reflected by a posterior negativity at the final word when the speech act occurs in a less constraining context and initiates a new action sequence. These findings demonstrate that sentence comprehension in conversational contexts crucially involves recognition of verbal action, which begins as soon as it can.
Furlanis, Giovanni; Ridolfi, Mariana; Polverino, Paola; Menichelli, Alina; Caruso, Paola; Naccarato, Marcello; Sartori, Arianna; Torelli, Lucio; Pesavento, Valentina; Manganotti, Paolo
Aphasia is one of the most devastating stroke-related consequences for social interaction and daily activities. Aphasia recovery in acute stroke depends on the degree of reperfusion after thrombolysis or thrombectomy. As aphasia assessment tests are often time-consuming for patients with acute stroke, physicians have been developing rapid and simple tests. The aim of our study is to evaluate the improvement of language functions in the earliest stage in patients treated with thrombolysis and in nontreated patients using our rapid screening test. Our study is a single-center prospective observational study conducted at the Stroke Unit of the University Medical Hospital of Trieste (January-December 2016). Patients treated with thrombolysis and nontreated patients underwent 3 aphasia assessments through our rapid screening test (at baseline, 24 hours, and 72 hours). The screening test assesses spontaneous speech, oral comprehension of words, reading aloud and comprehension of written words, oral comprehension of sentences, naming, repetition of words and a sentence, and writing words. The study included 40 patients: 18 patients treated with thrombolysis and 22 nontreated patients. Both groups improved over time. Among all language parameters, spontaneous speech was statistically significant between 24 and 72 hours (P value = .012), and between baseline and 72 hours (P value = .017). Our study demonstrates that patients treated with thrombolysis experience greater improvement in language than the nontreated patients. The difference between the 2 groups is increasingly evident over time. Moreover, spontaneous speech is the parameter marked by the greatest improvement.
Larson, Martha; Ordelman, Roeland; Metze, Florian; Kraaij, Wessel; de Jong, Franciska
The spoken word is a valuable source of semantic information. Techniques that exploit the spoken word by making use of speech recognition or spoken audio analysis hold clear potential for improving multimedia search. Nonetheless, speech technology remains underexploited by systems that provide
Bangerter, Adrian; Mayor, Eric; Pekarek Doehler, Simona
Shift handovers in nursing units involve formal transmission of information and informal conversation about non-routine events. Informal conversation often involves telling stories. Direct reported speech (DRS) was studied in handover storytelling in two nursing care units. The study goal is to contribute to a better understanding of conversation…
The possibility of observing spontaneous parametric down-conversion in doped nonlinear crystals at low temperatures, which would be useful for combining heralded single-photon sources and quantum memories, is studied theoretically. The ordinary refractive index of a lithium niobate crystal doped with magnesium oxide (LiNbO3:MgO) is measured at liquid-nitrogen and liquid-helium temperatures. On the basis of the experimental data, the coefficients of the Sellmeier equation are determined for temperatures from 5 to 300 K. In addition, a poling period of the nonlinear crystal has been calculated for observing type-0 spontaneous parametric down-conversion (ooo synchronism) at liquid-helium temperature under pumping at the wavelength λp = 532 nm, with emission of the signal field at the wavelength λs = 794 nm, which corresponds to the resonant absorption line of Tm3+ dopant ions.
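The quoted wavelengths fix the idler wavelength through energy conservation, and a quasi-phase-matching poling period then follows from the collinear phase mismatch. In the sketch below the refractive indices are placeholders, not the measured cryogenic LiNbO3:MgO values:

```python
# Energy conservation in SPDC fixes the idler wavelength:
#   1/lambda_p = 1/lambda_s + 1/lambda_i
lam_p = 532.0   # pump wavelength, nm
lam_s = 794.0   # signal wavelength, nm
lam_i = 1.0 / (1.0 / lam_p - 1.0 / lam_s)
print(f"idler wavelength: {lam_i:.1f} nm")  # ≈ 1612 nm

def poling_period(n_p, n_s, n_i):
    """First-order quasi-phase-matching period for collinear type-0 SPDC:
    Lambda = 2*pi/dk with dk = 2*pi*(n_p/lam_p - n_s/lam_s - n_i/lam_i).
    Wavelengths are in nm, so the result is in nm."""
    dk_over_2pi = n_p / lam_p - n_s / lam_s - n_i / lam_i
    return 1.0 / dk_over_2pi

# Placeholder ordinary indices chosen only to give a plausible magnitude:
print(f"poling period: {poling_period(2.23, 2.16, 2.13) / 1000:.1f} um")
```

With these placeholder indices the period comes out in the few-micrometre range typical of periodically poled lithium niobate; the paper's actual value depends on the measured low-temperature Sellmeier coefficients.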
Truong, Khiet Phuong; Trouvain, Jürgen
Existing laughter annotations provided with several publicly available conversational speech corpora (both multiparty and dyadic conversations) were investigated and compared. We discuss the possibilities and limitations of these rather coarse and shallow laughter annotations. There are definition
We investigate the effect of the down-conversion angle between the signal and idler beams in spontaneous parametric down-conversion on the bandwidth of the modal spectrum (Schmidt number) of the down-converted quantum state. For this purpose, we...
Many phonological processes can be affected by segmental context spanning word boundaries, which often leads to variable outcomes. This paper tests the idea that some of this variability can be explained by reference to production planning. We examine coronal stop deletion (CSD), a variable process conditioned by the preceding and upcoming phonological context, in a corpus of spontaneous British English speech, as a means of investigating a number of variables associated with planning: prosodic boundary strength, word frequency, conditional probability of the following word, and speech rate. From the perspective of production planning, (1) prosodic boundaries should affect deletion rate independently of the following context; (2) given the locality of production planning, the effect of the following context should decrease at stronger prosodic boundaries; and (3) other factors affecting planning scope should modulate the effect of upcoming phonological material above and beyond the modulating effect of prosodic boundaries. We build a statistical model of CSD realization, using pause length as a quantitative proxy for boundary strength, and find support for these predictions. These findings are compatible with the hypothesis that the locality of production planning constrains variability in speech production, and they have practical implications for work on CSD and other variable processes.
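The general shape of such a model can be illustrated with a toy logistic regression: deletion as a function of a following-consonant flag and pause length (the boundary-strength proxy). All data and coefficients below are synthetic, and the paper's actual model uses a richer predictor set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic tokens: pause length after the target word (seconds, a proxy
# for prosodic boundary strength) and whether the next word begins with
# a consonant. The "true" coefficients are invented for the sketch.
n = 2000
pause = rng.exponential(0.3, n)
consonant = rng.integers(0, 2, n).astype(float)

true_logit = -0.5 + 1.5 * consonant - 1.0 * pause
deleted = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Fit a logistic regression with the same terms by gradient descent.
X = np.column_stack([np.ones(n), consonant, pause])
w = np.zeros(3)
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.5 * X.T @ (p - deleted) / n

print("fitted coefficients (intercept, consonant, pause):", w.round(2))
```

The planning account predicts, in addition, an interaction: the following-context effect should shrink as the pause grows, which would be tested by adding a `pause * consonant` term (and speaker/word random effects) to the model.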
Voncken, Marisol J; Bögels, Susan M
Cognitive models emphasize that patients with social anxiety disorder (SAD) are mainly characterized by biased perception of their social performance. In addition, there is a growing body of evidence showing that SAD patients suffer from actual deficits in social interaction. To unravel what characterizes SAD patients the most, underestimation of social performance (defined as the discrepancy between self-perceived and observer-perceived social performance), or actual (observer-perceived) social performance, 48 patients with SAD and 27 normal control participants were observed during a speech and conversation. Consistent with the cognitive model of SAD, patients with SAD underestimated their social performance relative to control participants during the two interactions, but primarily during the speech. Actual social performance deficits were clearly apparent in the conversation but not in the speech. In conclusion, interactions that pull for more interpersonal skills, like a conversation, elicit more actual social performance deficits whereas, situations with a performance character, like a speech, bring about more cognitive distortions in patients with SAD.
Dodd, M C; Nikolopoulos, T P; Totten, C; Cope, Y; O'Donoghue, G M
To assess performance of Nucleus 22 mini system pediatric users converted from the Spectra 22 body-worn to the ESPrit 22 ear-level speech processor using aided thresholds and speech discrimination measures before and after the conversion. Spectra 22 body-worn speech processor users were chosen using preselection criteria (stable map, ability to report on the quality of the signal, no device problems). The subjects underwent tuning, map conversion, fitting of the ESPrit 22, and aided soundfield threshold and speech discrimination testing. The first 100 consecutive conversions are analyzed in this study. Fifty children (50%) were female, and 50 (50%) were male. The average age at implantation was 4.6 years (median 4.3 years, range 1.7 to 11 years). The average age of fitting the ear-level speech processor was 11.1 years (median 11 years, range 6.2 to 18.2 years). Tertiary referral pediatric cochlear implant center in the United Kingdom. Of the 100 fittings attempted, all Spectra 22 maps could be converted for use in the ESPrit 22. Of these 100 fittings, 44 were straightforward with no adjustment to map parameters being required, and 56 needed rate reductions and other map adjustments to achieve the conversion. The difference of the mean thresholds before and after the conversion did not exceed 2 dB across the frequencies studied (0.5-4 kHz). In 95% of the cases, the differences were less than 9 dB(A). With regard to speech discrimination testing, the mean threshold before the conversion was 53.4 dB and after the conversion 52.7 dB. Of the 100 conversions, only five children stopped using the ESPrit 22 despite fitting being achieved. Conversion from the Spectra 22 body-worn to the ESPrit 22 ear-level speech processor was found to be feasible in all 100 cases studied. Only a minority (5%) of children chose not to use the ear-level speech processor, suggesting that children and parents were satisfied with the conversion.
Preisig, Basil C; Eggenberger, Noëmi; Zito, Giuseppe; Vanbellingen, Tim; Schumacher, Rahel; Hopfner, Simone; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Müri, René M
Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, they prompt as nonverbal cues the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present and absent), gaze direction (to the speaker or to the listener), and region of interest (ROI), including hands, face, and body. Both aphasic patients and healthy controls mainly fixated the speaker's face. We found a significant co-speech gesture × ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, there was a significant gaze direction × ROI × group interaction revealing that aphasic patients showed reduced cumulative fixation duration on the speaker's face compared to healthy controls. Co-speech gestures guide the observer's attention towards the speaker, the source of semantic input. It is discussed whether an underlying semantic processing deficit or a deficit to integrate audio-visual information may cause aphasic patients to explore less the speaker's face.
Abuom, Tom O.; Bastiaanse, Roelien
Most studies on spontaneous speech of individuals with agrammatism have focused almost exclusively on monolingual individuals. There is hardly any previous research on bilinguals, especially of structurally different languages; and none on characterization of agrammatism in Swahili. The current
Dassa, Ayelet; Amir, Dorit
Language deficits in people with Alzheimer's disease (AD) manifest, among other things, in a gradual deterioration of spontaneous speech. People with AD tend to speak less as the disease progresses and their speech becomes confused. However, the ability to sing old tunes sometimes remains intact throughout the disease. The purpose of this study was to explore the role of singing familiar songs in encouraging conversation among people with middle to late stage AD. Six participants attended group music therapy sessions over a one-month period. Using content analysis, we qualitatively examined transcriptions of verbal and sung content during 8 group sessions for the purpose of understanding the relationship between specific songs and conversations that occurred during and following group singing. Content analysis revealed that songs from the participants' past elicited memories, especially songs related to their social and national identity. Analyses also indicated that conversation related to the singing was extensive and the act of group singing encouraged spontaneous responses. After singing, group members expressed positive feelings, a sense of accomplishment, and belonging. Carefully selecting music from the participants' past can encourage conversation. Considering the failure of spontaneous speech in people with middle to late stage AD, it is important to emphasize that group members' responses to each other occurred spontaneously without the researcher's encouragement. © the American Music Therapy Association 2014. All rights reserved. For permissions, please e-mail: email@example.com.
Su Myat Mon
Full Text Available Abstract Speech is the easiest way to communicate with each other. Speech processing is widely used in many applications like security devices, household appliances, cellular phones, ATM machines, and computers. The human computer interface has been developed to allow those who are suffering from some kind of disability to communicate or interact conveniently. Speech-to-Text Conversion (STT) systems have a lot of benefits for deaf or mute people and find applications in our daily lives. In the same way, the aim of this system is to convert input speech signals into text output for deaf or mute students in educational settings. This paper presents an approach to extract features using Mel Frequency Cepstral Coefficients (MFCC) from the speech signals of isolated spoken words. A Hidden Markov Model (HMM) method is then applied to train and test the audio files to obtain the recognized spoken word. The speech database is created using MATLAB. The original speech signals are then preprocessed, and these speech samples are converted into feature vectors, which are used as the observation sequences of the Hidden Markov Model (HMM) recognizer. The feature vectors are analyzed in the HMM depending on the number of states.
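The recognition step this abstract describes, training one HMM per word and picking the model that scores an utterance's feature sequence highest, can be sketched as follows. This is a minimal illustration with a discrete-observation toy HMM (the real system uses continuous MFCC observations in MATLAB); all function names and model values here are hypothetical.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    via the scaled forward algorithm.
    obs: sequence of symbol indices; pi: initial probabilities (N,);
    A: state transition matrix (N, N); B: emission matrix (N, M)."""
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then weight by emission
        scale = alpha.sum()
        log_lik += np.log(scale)        # accumulate in log domain for stability
        alpha = alpha / scale
    return log_lik

def recognize(obs, word_models):
    """Return the word whose HMM assigns the sequence the highest likelihood
    (the test phase described in the abstract)."""
    return max(word_models,
               key=lambda w: forward_log_likelihood(obs, *word_models[w]))
```

In use, `word_models` maps each vocabulary word to its trained `(pi, A, B)` triple; quantized feature vectors are fed in as symbol indices.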
Full Text Available We present a simple way of simulating spontaneous parametric down-conversion (SPDC) by modulating a classical laser beam with two spatial light modulators (SLMs) through a back-projection setup. This system has the advantage of having very high...
Tavano, Alessandro; Pesarin, Anna; Murino, Vittorio; Cristani, Marco
Individuals with Asperger syndrome/High Functioning Autism fail to spontaneously attribute mental states to the self and others, a life-long phenotypic characteristic known as mindblindness. We hypothesized that mindblindness would affect the dynamics of conversational interaction. Using generative models, in particular Gaussian mixture models and observed influence models, conversations were coded as interacting Markov processes, operating on novel speech/silence patterns, termed Steady Conversational Periods (SCPs). SCPs assume that whenever an agent's process changes state (e.g., from silence to speech), it causes a general transition of the entire conversational process, forcing inter-actant synchronization. SCPs fed into observed influence models, which captured the conversational dynamics of children and adolescents with Asperger syndrome/High Functioning Autism, and age-matched typically developing participants. Analyzing the parameters of the models by means of discriminative classifiers, the dialogs of patients were successfully distinguished from those of control participants. We conclude that meaning-free speech/silence sequences, reflecting inter-actant synchronization, at least partially encode typical and atypical conversational dynamics. This suggests a direct influence of theory of mind abilities onto basic speech initiative behavior.
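The coding scheme described above, speech/silence tracks segmented into Steady Conversational Periods and modelled as interacting Markov processes, can be sketched in a simplified form: the two participants' binary tracks are merged into one joint process, a new segment starts whenever either participant changes state, and a first-order transition matrix is estimated over the resulting segments. This is a hedged illustration of the idea only, not the authors' Gaussian mixture/observed influence model pipeline; all names are hypothetical.

```python
import numpy as np
from itertools import groupby

def joint_states(speaker_a, speaker_b):
    """Code two binary speech(1)/silence(0) tracks as one joint process with
    four states; collapse runs so that, as with SCPs, a new segment starts
    whenever either participant changes state."""
    joint = [2 * a + b for a, b in zip(speaker_a, speaker_b)]
    return [s for s, _ in groupby(joint)]   # one symbol per steady period

def transition_matrix(states, n=4):
    """Maximum-likelihood first-order transition matrix over joint states."""
    counts = np.full((n, n), 1e-9)          # tiny floor avoids empty rows
    for s, t in zip(states, states[1:]):
        counts[s, t] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)
```

Parameters of such matrices, estimated per dyad, are the kind of meaning-free synchronization features that a discriminative classifier could then separate into patient and control groups.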
Full Text Available Left-hemisphere stroke patients suffering from language and speech disorders are often able to sing entire pieces of text fluently. This finding has inspired a number of melody-based rehabilitation programs – most notable among them a treatment known as Melodic Intonation Therapy – as well as two fundamental research questions. When the experimental design focuses on one point in time (cross section), one may determine whether or not singing has an immediate effect on syllable production in patients with language and speech disorders. When the design focuses on changes over several points in time (longitudinal section), one may gain insight as to whether or not singing has a long-term effect on language and speech recovery. The current work addresses both of these questions with two separate experiments that investigate the interplay of melody, rhythm and lyric type in 32 patients with non-fluent aphasia and apraxia of speech (Stahl et al., 2011; Stahl et al., 2013). Taken together, the experiments deliver three main results. First, singing and rhythmic pacing proved to be equally effective in facilitating immediate syllable production and long-term language and speech recovery. Controlling for various influences such as prosody, syllable duration and phonetic complexity, the data did not reveal any advantage of singing over rhythmic speech. This result was independent of lesion size and lesion location in the patients. Second, patients with extensive left-sided basal ganglia lesions produced more correct syllables when their speech was paced by rhythmic drumbeats. This observation is consistent with the idea that regular auditory cues may partially compensate for corticostriatal damage and thereby improve speech-motor planning (Grahn & Watson, 2013). Third, conversational speech formulas and well-known song lyrics yielded higher rates of correct syllable production than novel word sequences – whether patients were singing or speaking.
Van der Geld, Pieter; Oosterveld, Paul; Kuijpers-Jagtman, Anne Marie
The aims of this study were to analyse lip line heights and age effects in an adult male population during spontaneous smiling, speech, and tooth display in the natural rest position and to determine whether lip line height follows a consistent pattern during these different functions. The sample consisted of 122 randomly selected male participants from three age cohorts (20-25 years, 35-40 years, and 50-55 years). Lip line heights were measured with a digital videographic method for smile analysis, which had previously been tested and found reliable. Statistical analysis of the data was carried out using correlation analysis, analysis of variance, and Tukey's post hoc tests. Maxillary lip line heights during spontaneous smiling were generally higher in the premolar area than at the anterior teeth. The aesthetic zone in 75 per cent of the participants included all maxillary teeth up to the first molar. Coherence in lip line heights during spontaneous smiling, speech, and tooth display in the natural rest position was confirmed by significant correlations. In older subjects, maxillary lip line heights decreased significantly in all situations. Lip line heights during spontaneous smiling were reduced by approximately 2 mm. In older participants, the mandibular lip line heights also changed significantly and teeth were displayed less during spontaneous smiling. Mandibular tooth display in the rest position increased significantly. Upper lip length increased significantly by almost 4 mm in older subjects, whereas upper lip elevation did not change significantly. The significant increasing lip coverage of the maxillary teeth indicates that the effects of age should be included in orthodontic treatment planning.
Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris
A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production
Garay-Palmett, Karina; U'Ren, Alfred B.; Rangel-Rojo, Raul
We study the process of copolarized spontaneous four-wave mixing in single-mode optical fibers, with an emphasis on an analysis of the conversion efficiency. We consider both the monochromatic-pump and pulsed-pump regimes, as well as both the degenerate-pump and nondegenerate-pump configurations. We present analytical expressions for the conversion efficiency, which are given in terms of double integrals. In the case of pulsed pumps we take these expressions to closed analytical form with the help of certain approximations. We present results of numerical simulations, and compare them to values obtained from our analytical expressions, for the conversion efficiency as a function of several key experimental parameters.
Rao, B.S.; Murthy, M.S.S.
Spontaneous and radiation induced gene conversion to arginine independence was studied in a heteroallelic diploid strain of yeast Saccharomyces cerevisiae BZ 34. When stationary phase cells were incubated in phosphate buffer (pH 7) at 30 °C under aerated conditions for 48 hours, the conversion frequency increased by a factor of about 1000 times the background. This was found to be so even when the cells were incubated in saline (0.85%) or distilled water. Various conditions influencing this enhancement have been investigated. Conversion frequency enhancement was not significant under anoxic conditions and was absent at low temperatures and in log phase cells. Caffeine could inhibit this enhancement when present in the suspension medium. These results can be explained on the basis of the induction of meiosis in cells held in buffer. Microscopic examination confirmed this view. Under conditions not favourable for the onset of meiosis there is no significant enhancement in conversion frequency. In stationary phase cells exposed to a series of gamma doses, the conversion frequency increases with dose. Post irradiation incubation in buffer further increases the conversion frequency. However, the increase, expressed as the ratio of the conversion frequency on buffer holding to that on immediate plating, decreased with increasing dose. This decrease in enhancement with increasing dose may be due to the dose dependent inhibition of meiosis. (author)
S. Hamidreza Kasaei
Full Text Available In this paper, we propose the design and initial implementation of a robust system which can automatically translate voice into text and text into sign language animations. Sign language translation systems could significantly improve the lives of deaf people, especially in communication and the exchange of information, by employing machines to translate conversations from one language to another. Therefore, considering these points, it seems necessary to study speech recognition. Usually, voice recognition algorithms address three major challenges. The first is extracting features from speech; the second arises when only a limited sound gallery is available for recognition; and the final challenge is to move from speaker-dependent to speaker-independent voice recognition. Extracting features from speech is an important stage in our method. Different procedures are available for extracting features from speech. One of the most common, used in speech recognition systems, is Mel-Frequency Cepstral Coefficients (MFCCs). The algorithm starts with preprocessing and signal conditioning. Next, feature extraction using cepstral coefficients is performed. The result of this process is then sent to the segmentation part. Finally, the recognition part recognizes the words and converts the recognized words into facial animation. The project is still in progress and some new interesting methods are described in the current report.
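The MFCC front end outlined above (preprocessing, framing, mel filterbank, cepstral coefficients) can be sketched in plain numpy. This is a minimal, hedged illustration of the standard pipeline, not the system's actual implementation; all constants (frame size, hop, filter counts) are typical defaults chosen for the example.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC front end: pre-emphasis, framed Hamming-windowed FFT,
    triangular mel filterbank energies, log compression, then DCT-II."""
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    frames = np.lib.stride_tricks.sliding_window_view(sig, n_fft)[::hop]
    frames = frames * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular mel filterbank spanning 0 Hz to Nyquist
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    log_energies = np.log(power @ fbank.T + 1e-10)
    # DCT-II of the log filterbank energies gives the cepstral coefficients
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * k + 1)) / (2 * n_mels))
    return log_energies @ dct.T          # shape: (n_frames, n_ceps)
```

Each row of the returned matrix is one frame's feature vector, ready to be fed to a segmenter or recognizer.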
Ana Margarida Belém Nunes
Full Text Available The present article is a symbiosis of two previous studies made by the author on European Portuguese emotional speech. It is known that nonverbal vocal expressions, such as laughter, vocalizations and, for instance, screams are an important source of emotional cues in social contexts (Lima et al., 2013). In social contexts we also get information about others' emotional states from facial and corporal expressions, touch and voice cues (Lima et al., 2013; Cowie et al., 2003). Nevertheless, most of the existing research on emotion is based on simulated emotions that are induced in the laboratory and/or produced by professional actors. This study in particular proposes to explore how much, and in which voice-related parameters, spontaneous and acted speech diverge. On the other hand, this study will help to obtain data on emotional speech and to describe the expression of emotions, by voice alone, for the first time for European Portuguese. Analyses are mainly focused on parameters that are generally accepted as more directly related to voice quality, such as F0, jitter, shimmer and HNR (Lima et al., 2013; Tiovanen et al., 2006; Drioli et al., 2003). Given the scarcity of studies on voice quality in European Portuguese, it is important to highlight that this work presents original corpora specifically created for the presented research: a small corpus of spontaneous emotional speech, with the Feeltrace system providing the necessary annotation and interpretation of emotions, and a second corpus of acted emotions produced by a professional actor. It is particularly important to highlight the finding that European Portuguese presents some specificities in the values obtained for neutral expression, sadness and joy that do not occur in other languages.
Penin, A. N.; Reutova, T. A.; Sergienko, A. V.
An experiment on one-photon state localization in space using a correlation technique in Spontaneous Parametric Down Conversion (SPDC) process is discussed. Results of measurements demonstrate an idea of the Einstein-Podolsky-Rosen (EPR) paradox for coordinate and momentum variables of photon states. Results of the experiment can be explained with the help of an advanced wave technique. The experiment is based on the idea that two-photon states of optical electromagnetic fields arising in the nonlinear process of the spontaneous parametric down conversion (spontaneous parametric light scattering) can be explained by quantum mechanical theory with the help of a single wave function.
Trouvain, Jürgen; Truong, Khiet Phuong
A crucial feature of spoken interaction is joint activity at various linguistic and phonetic levels that requires fine-tuned coordination. This study gives a brief overview on how laughing in conversational speech can be phonetically analysed as partner-specific adaptation and joint vocal action.
Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris
Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response to SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed that damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443
Nguyen Van Han
Full Text Available Discourse analysis, as Murcia and Olshtain (2000) assume, is a vast study of language in use that extends beyond sentence level, and it involves a more cognitive and social perspective on language use and communication exchanges. Covering a wide range of phenomena concerning language in relation to society, culture and thought, discourse analysis contains various approaches: speech act theory, pragmatics, conversation analysis, variation analysis, and critical discourse analysis. Each approach works in its own domain of discourse. In one dimension, it shares the same assumptions or general problems in discourse analysis with the other approaches: for instance, the explanation of how we organize language into units beyond sentence boundaries, or how language is used to convey information about the world, ourselves and human relationships (Schiffrin 1994: viii). In other dimensions, each approach holds its distinctive characteristics contributing to the vastness of discourse analysis. This paper will mainly discuss two approaches to discourse analysis, conversation analysis and speech act theory, and will attempt to point out some similarities as well as contrasting features between the two approaches, followed by a short reflection on the strengths and weaknesses of each approach. The organizational and discourse features in the exchanges among three teachers at the College of Finance and Customs in Vietnam will be analysed in terms of conversation analysis and speech act theory.
Avenhaus, M.; Chekhova, M. V.; Krivitsky, Leonid
We study the spectral properties of spontaneous parametric down-conversion (SPDC) in a periodically poled waveguided structure of potassium-titanyl-phosphate (KTP) crystal pumped by ultrashort pulses. Our theoretical analysis reveals a strongly entangled and asymmetric structure of the two...
Conroy, Paul; Sage, Karen; Ralph, Matt Lambon
Background: Naming accuracy for nouns and verbs in aphasia can vary across different elicitation contexts, for example, simple picture naming, composite picture description, narratives, and conversation. For some people with aphasia, naming may be more accurate to simple pictures as opposed to naming in spontaneous, connected speech; for others,…
Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike
Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in "healthy" communication direct speech constructions contribute to the liveliness, and indirectly to the comprehensibility, of speech.…
Full Text Available Objective: Gestures of the hands and arms have long been observed to accompany speech in spontaneous conversation. However, the way in which these two modes of expression are related in production is not yet fully understood. The present study therefore aims to investigate the spontaneous gestures that accompany speech in adults who stutter in comparison to fluent controls. Materials & Methods: In this cross-sectional and comparative study, ten adults who stutter were selected randomly from speech and language pathology clinics and compared with ten healthy persons as a control group, matched with the stutterers according to sex, age and education. A cartoon story-retelling task was used to elicit spontaneous gestures accompanying speech. Participants were asked to watch the animation carefully and then retell the storyline in as much detail as possible to a listener sitting across from them, and their narration was video recorded simultaneously. The recorded utterances and gestures were then analyzed. Statistical methods such as the Kolmogorov-Smirnov test and the independent t-test were used for data analysis. Results: The results indicated that stutterers, compared to controls, on average use fewer iconic gestures in their narration (P=0.005). They also use fewer iconic gestures per utterance and per word (P=0.019). Furthermore, examination of gesture production during moments of dysfluency revealed that more than 70% of the gestures produced with stuttering were frozen or abandoned at the moment of dysfluency. Conclusion: It seems gesture and speech have such an intricate and deep association that they show similar frequency and timing patterns and move completely parallel to each other, in such a way that a deficit in speech results in a deficiency in hand gesture.
Full Text Available Nowadays, although automatic speech recognition has become quite proficient in recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies remains problematic, such as spontaneous conversational and lecture speech. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. Most recent studies have shown that FPs are widely believed to increase the error rates for state-of-the-art speech transcription, primarily because most FPs are not well annotated or provided in training data transcriptions and because of the similarities in acoustic characteristics between FPs and some common non-content words. To enhance the speech transcription system, we propose a new automatic refinement approach to detect FPs in British English lecture speech transcription. This approach combines the pronunciation probabilities for each word in the dictionary and acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, in which only imperfect training transcriptions are available. Successful results are achieved for both the development and evaluation datasets. Acoustic models trained on different styles of speech genres have been investigated with respect to FP refinement. To further validate the effectiveness of the proposed approach, speech transcription performance has also been examined using systems built on training data transcriptions with and without FP refinement.
Full Text Available The work presented here focuses on how the structuring of spontaneous conversation is related to prosody. How do prosodic parameters, including pauses, fit into the logical demarcation, establishment and organization of parts of speech? To what extent are they relevant at the discourse, intersubjective and interactional levels? Description of these data allowed us to understand different ways of structuring spoken Moroccan Arabic, from a prosodic point of view but also in thematic and interactional terms, without ever losing sight of the fact that conversational speech is the locus of enunciative stakes in establishing meaning. In a real conversation situation, silence is an opportunity for a transfer of the initiative to speak to one or other of the interlocutors. On the other hand, a speaker who wishes to keep talking should avoid the silent pause and use the filled pause in preference. We hypothesize that one role of the silent pause is precisely to manage this aspect of the intersubjective and interactional space, and to indicate whether a segment aims to allow the speaker to keep talking or to yield the floor to the interlocutor.
Hengst, Julie A; Frame, Simone R; Neuman-Stritzel, Tiffany; Gannaway, Rachel
Reported speech, wherein one quotes or paraphrases the speech of another, has been studied extensively as a set of linguistic and discourse practices. Researchers agree that reported speech is pervasive, found across languages, and used in diverse contexts. However, to date, there have been no studies of the use of reported speech among individuals with aphasia. Grounded in an interactional sociolinguistic perspective, the study presented here documents and analyzes the use of reported speech by 7 adults with mild to moderately severe aphasia and their routine communication partners. Each of the 7 pairs was videotaped in 4 everyday activities at home or around the community, yielding over 27 hr of conversational interaction for analysis. A coding scheme was developed that identified 5 types of explicitly marked reported speech: direct, indirect, projected, indexed, and undecided. Analysis of the data documented reported speech as a common discourse practice used successfully by the individuals with aphasia and their communication partners. All participants produced reported speech at least once, and across all observations the target pairs produced 400 reported speech episodes (RSEs), 149 by individuals with aphasia and 251 by their communication partners. For all participants, direct and indirect forms were the most prevalent (70% of RSEs). Situated discourse analysis of specific episodes of reported speech used by 3 of the pairs provides detailed portraits of the diverse interactional, referential, social, and discourse functions of reported speech and explores ways that the pairs used reported speech to successfully frame talk despite their ongoing management of aphasia.
Fu, Szu-Wei; Li, Pei-Chun; Lai, Ying-Hui; Yang, Cheng-Chien; Hsieh, Li-Chun; Tsao, Yu
Objective: This paper focuses on machine learning based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because of the removal of parts of the articulator, a patient's speech may be distorted and difficult to understand. To overcome this problem, VC methods can be applied to convert the distorted speech such that it is clear and more intelligible. To design an effective VC method, two key points must be considered: 1) the amount of training data may be limited (because speaking for a long time is usually difficult for postoperative patients); 2) rapid conversion is desirable (for better communication). Methods: We propose a novel joint dictionary learning based non-negative matrix factorization (JD-NMF) algorithm. Compared to conventional VC techniques, JD-NMF can perform VC efficiently and effectively with only a small amount of training data. Results: The experimental results demonstrate that the proposed JD-NMF method not only achieves notably higher short-time objective intelligibility (STOI) scores (a standardized objective intelligibility evaluation metric) than those obtained using the original unconverted speech but is also significantly more efficient and effective than a conventional exemplar-based NMF VC method. Conclusion: The proposed JD-NMF method may outperform the state-of-the-art exemplar-based NMF VC method in terms of STOI scores under the desired scenario. Significance: We confirmed the advantages of the proposed joint training criterion for the NMF-based VC. Moreover, we verified that the proposed JD-NMF can effectively improve the speech intelligibility scores of oral surgery patients.
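The conversion idea behind NMF-based VC can be sketched as follows: paired source and target dictionaries share activations, so activations estimated against the source dictionary are reapplied to the target dictionary. This is a hedged, generic illustration of that principle with standard multiplicative updates, not the paper's JD-NMF training criterion; all names and toy matrices are hypothetical.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-9):
    """Estimate nonnegative activations H so that V ≈ W @ H, with the
    dictionary W held fixed, via multiplicative updates (Euclidean loss)."""
    H = np.full((W.shape[1], V.shape[1]), 0.5)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def convert(V_src, W_src, W_tgt):
    """Paired-dictionary conversion: activations computed against the source
    dictionary are reused with the aligned target dictionary."""
    H = nmf_activations(V_src, W_src)
    return W_tgt @ H
```

Because the two dictionaries are column-aligned (each source atom has a paired target atom), a source spectrogram expressed as a combination of source atoms maps to the same combination of target atoms.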
Wallesch, C W; Brunner, R J; Seemüller, E
Repetitive phenomena in spontaneous speech were investigated in 30 patients with chronic infarctions of the left hemisphere which included Broca's and/or Wernicke's area and/or the basal ganglia. Perseverations, stereotypies, and echolalias occurred with all types of brain lesions; automatisms and recurring utterances occurred only in those patients whose infarctions involved Wernicke's area and the basal ganglia. These patients also showed more echolalic responses. The results are discussed in view of the role of the basal ganglia as motor program generators.
Observable disruptions in spontaneous speech are among the most prominent characteristics of aphasia. The potential of language production analyses in discourse contexts to reveal subtle language deficits has been progressively exploited, becoming essential for diagnosing language disorders (Vermeulen et al., 1989; Goodglass et al., 2000; Prins and Bastiaanse, 2004; Jaecks et al., 2012). Based on previous studies, short and/or fragmentary utterances, and consequently a shorter MLU, are expected in the speech of individuals with aphasia, together with a large proportion of incomplete sentences and a limited use of embeddings. Fewer verbs with a lower diversity (lower type/token ratio) and fewer internal arguments are also predicted, as well as a low proportion of inflected verbs (Bastiaanse and Jonkers, 1998). However, this profile comes mainly from the study of individuals with prototypical aphasia types, mainly Broca’s aphasia, raising the question of how accurately spontaneous speech can pinpoint deficits in individuals with less clear diagnoses. To address this question, we present the results of a spontaneous speech analysis of 25 Spanish-speaking subjects: 10 individuals with aphasia (IWAs), 7 male and 3 female (mean age: 64.2), in neurally stable condition (> 1 year post-onset) who had suffered a single CVA in the left hemisphere (Rosell, 2005), and 15 non-brain-damaged matched speakers (NBDs). In the aphasia group, 7 of the participants were diagnosed as non-fluent (1 motor aphasia, 4 transcortical motor aphasia or motor aphasia with signs of transcorticality, 2 mixed aphasia with motor predominance), and 3 of them as fluent (mixed aphasia with anomic predominance). The protocol for data collection included semi-standardized interviews, in which participants were asked 3 questions evoking past, present, and future events (last job, holidays, and hobbies). 300 words per participant were analyzed. The MLU over the total 300 words revealed a decreased…
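The measures this abstract relies on (MLU and type/token ratio) are straightforward to compute; a minimal sketch over an invented toy transcript:

```python
def mlu(utterances):
    """Mean length of utterance: average word count over a list of utterances."""
    counts = [len(u.split()) for u in utterances]
    return sum(counts) / len(counts)

def type_token_ratio(tokens):
    """Lexical diversity: number of distinct forms divided by total tokens."""
    return len(set(tokens)) / len(tokens)

# Toy interview fragment (invented), one utterance per string.
sample = ["the man is walking", "he fell", "then the dog barked loudly"]
print(mlu(sample))                                 # words per utterance
print(type_token_ratio(" ".join(sample).split()))  # diversity over all tokens
```

In clinical use the tokenization and utterance segmentation follow transcription conventions (e.g. CHAT/SALT), which this sketch glosses over.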
Kwon, Osung; Ra, Young-Sik; Kim, Yoon-Ho
Coherence properties of the photon pair generated via spontaneous parametric down-conversion pumped by a multi-mode cw diode laser are studied with a Mach-Zehnder interferometer. Each photon of the pair enters a different input port of the interferometer and the biphoton coherence properties are studied with a two-photon detector placed at one output port. When the photon pair simultaneously enters the interferometer, periodic recurrence of the biphoton de Broglie wave packet is observed, closely resembling the coherence properties of the pump diode laser. With non-zero delays between the photons at the input ports, biphoton interference exhibits the same periodic recurrence but the wave packet shapes are shown to be dependent on both the input delay as well as the interferometer delay. These properties could be useful for building engineered entangled photon sources based on diode laser-pumped spontaneous parametric down-conversion.
Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit.
Arnold, Denis; Tomaschek, Fabian; Sering, Konstantin; Lopez, Florence; Baayen, R Harald
Sound units play a pivotal role in cognitive models of auditory comprehension. The general consensus is that during perception listeners break down speech into auditory words and subsequently phones. Indeed, cognitive speech recognition is typically taken to be computationally intractable without phones. Here we present a computational model trained on 20 hours of conversational speech that recognizes word meanings within the range of human performance (model 25%, native speakers 20-44%), without making use of phone or word form representations. Our model also successfully generates predictions about the speed and accuracy of human auditory comprehension. At the heart of the model is a 'wide' yet sparse two-layer artificial neural network with some hundred thousand input units representing summaries of changes in acoustic frequency bands, and proxies for lexical meanings as output units. We believe that our model holds promise for resolving longstanding theoretical problems surrounding the notion of the phone in linguistic theory.
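The "wide two-layer network with error-driven learning" idea can be illustrated with a Rescorla-Wagner (delta-rule) sketch. Everything below (the cue/meaning sizes, the event generator) is invented for illustration and is vastly smaller than the model described in the abstract; it only shows the cue-to-outcome learning mechanism.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cues, n_meanings, lr = 50, 4, 0.1

# One weight per (cue, meaning) pair -- a wide, shallow linear network.
W = np.zeros((n_cues, n_meanings))

def make_event(meaning):
    """Toy event: each meaning co-occurs with its own block of acoustic cues."""
    cues = np.zeros(n_cues)
    cues[meaning * 10:(meaning + 1) * 10] = rng.random(10) > 0.3
    outcome = np.zeros(n_meanings)
    outcome[meaning] = 1.0
    return cues, outcome

# Error-driven learning: weights move by the prediction error (o - c @ W).
for _ in range(2000):
    m = int(rng.integers(n_meanings))
    c, o = make_event(m)
    W += lr * np.outer(c, o - c @ W)  # Rescorla-Wagner / delta-rule update

# Recognition: the meaning with the highest summed activation wins.
c, _ = make_event(2)
predicted = int(np.argmax(c @ W))
```

The key property mirrored here is that recognition is discrimination between meanings from cue activations directly, with no intermediate phone layer.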
Petersen, Sidsel Rübner; Alkeskjold, Thomas Tanggaard; Olausson, Christina Bjarnal Thulin
Frequency conversion through spontaneous degenerate four-wave mixing (FWM) is investigated in large mode area hybrid photonic crystal fibers. Different FWM processes are observed: phase matching between fiber modes of orthogonal polarization, intermodal phase matching across bandgaps, and intramodal…
Heinen, Esther; Birkholz, Peter; Willmes, Klaus; Neuschaefer-Rube, Christiane
To explore possible effects of tongue piercing on perceived speech quality. Using a quasi-experimental design, we analyzed the effect of tongue piercing on speech in a perception experiment. Samples of spontaneous speech and read speech were recorded from 20 long-term pierced and 20 non-pierced individuals (10 males, 10 females each). The individuals with a tongue piercing were recorded with the piercing attached and removed. The audio samples were blindly rated by 26 female and 20 male laypersons and by 5 female speech-language pathologists with regard to perceived speech quality along 5 dimensions: speech clarity, speech rate, prosody, rhythm and fluency. We found no statistically significant differences in any of the speech quality dimensions between the pierced and non-pierced individuals, for either read or spontaneous speech. In addition, neither length nor position of the piercing had a significant effect on speech quality. The removal of tongue piercings had no effect on speech performance either. Rating differences between laypersons and speech-language pathologists did not depend on the presence of a tongue piercing. People are able to adapt their articulation perfectly to long-term tongue piercings such that their speech quality is not perceptually affected.
Ferguson, Sarah Hargus; Morgan, Shae D.
Purpose: The purpose of this study is to examine talker differences for subjectively rated speech clarity in clear versus conversational speech, to determine whether ratings differ for young adults with normal hearing (YNH listeners) and older adults with hearing impairment (OHI listeners), and to explore effects of certain talker characteristics…
Ben-Aderet, Tobey; Gallego-Abenza, Mario; Reby, David; Mathevon, Nicolas
Pet-directed speech is strikingly similar to infant-directed speech, a peculiar speaking pattern with higher pitch and slower tempo known to engage infants' attention and promote language learning. Here, we report the first investigation of potential factors modulating the use of dog-directed speech, as well as its immediate impact on dogs' behaviour. We recorded adult participants speaking in front of pictures of puppies, adult and old dogs, and analysed the quality of their speech. We then performed playback experiments to assess dogs' reaction to dog-directed speech compared with normal speech. We found that human speakers used dog-directed speech with dogs of all ages and that the acoustic structure of dog-directed speech was mostly independent of dog age, except for sound pitch, which was relatively higher when communicating with puppies. Playback demonstrated that, in the absence of other non-auditory cues, puppies were highly reactive to dog-directed speech, and that the pitch was a key factor modulating their behaviour, suggesting that this specific speech register has a functional value in young dogs. Conversely, older dogs did not react differentially to dog-directed speech compared with normal speech. The fact that speakers continue to use dog-directed speech with older dogs therefore suggests that this speech pattern may mainly be a spontaneous attempt to facilitate interactions with non-verbal listeners. © 2017 The Author(s).
Zhang Shuai-Shuai; Shu Qi; Sheng Yu-Bo; Zhou Lan
Entanglement purification distills high-quality entanglement from low-quality entanglement using local operations and classical communication. It is one of the key technologies in long-distance quantum communication. We discuss an entanglement purification protocol (EPP) with spontaneous parametric down-conversion (SPDC) sources, in contrast to previous EPPs with multi-copy mixed states, which require ideal entanglement sources. We show that the SPDC source is not an obstacle to purification, but can benefit the fidelity of the purified mixed state. This EPP works with linear optics and is feasible with current experimental technology.
Represented speech refers to speech where we reference somebody. Represented speech is an important phenomenon in everyday conversation, health care communication, and qualitative research. This case will draw first from a case study on physicians’ workplace learning and second from a case study on nurses’ apprenticeship learning. The aim of the case is to guide the qualitative researcher to use own and others’ voices in the interview and to be sensitive to represented speech in everyday conversation. Moreover, reported speech matters to health professionals who aim to represent the voice of their patients. Qualitative researchers and students might learn to encourage interviewees to elaborate different voices or perspectives. Qualitative researchers working with natural speech might pay attention to how people talk and use represented speech. Finally, represented speech might be relevant…
Sergienko, A. V.; Shih, Y. H.; Pittman, T. B.; Rubin, M. H.
Simultaneous entanglement in spin and space-time of a two-photon quantum state generated in type-2 spontaneous parametric down-conversion is demonstrated by the observation of quantum interference with 98% visibility in a simple beam-splitter (Hanburry Brown-Twiss) anticorrelation experiment. The nonlocal cancellation of two-photon probability amplitudes as a result of this double entanglement allows us to demonstrate two different types of Bell's inequality violations in one experimental setup.
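The 98% figure quoted above is a fringe visibility; the standard definition is a one-line helper. The example counts below are invented, chosen only so that they yield V = 0.98:

```python
def visibility(i_max, i_min):
    """Interference fringe visibility V = (Imax - Imin) / (Imax + Imin)."""
    return (i_max - i_min) / (i_max + i_min)

# Hypothetical coincidence counts at the fringe maximum and minimum.
print(visibility(990, 10))  # -> 0.98
```

Visibilities above 1/sqrt(2) (about 71%) are the usual benchmark for demonstrating Bell-inequality violations, which is why a 98% value is significant.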
Hughes, V.W.; Ni, B.; Arnold, K.P.
We have searched for spontaneous conversion of muonium (M) to antimuonium (M̄) by a method involving detection of high-Z muonic X rays. A beam of M atoms with keV energies, produced by electron pickup by μ⁺ from a foil, travels in vacuum and in a magnetic-field-free environment to a high-Z target. The event signatures used were a double coincidence of two muonic X rays of the target material and a triple coincidence that also required detection of secondary electrons ejected when M strikes the target. Partial analysis of our 8 × 10⁶ triggers indicates upper limits on the effective M → M̄ four-fermion coupling constant of G_MM̄ ≤ 30 G_F (90% C.L.) and G_MM̄ ≤ 8 G_F (90% C.L.), respectively, from the two signatures. This begins to probe predictions of the left-right symmetric theory with a doubly-charged Higgs triplet.
According to the body-specificity hypothesis, people with different bodily characteristics should form correspondingly different mental representations, even in highly abstract conceptual domains. In a previous test of this proposal, right- and left-handers were found to associate positive ideas like intelligence, attractiveness, and honesty with their dominant side and negative ideas with their non-dominant side. The goal of the present study was to determine whether 'body-specific' associations of space and valence can be observed beyond the laboratory in spontaneous behavior, and whether these implicit associations have visible consequences. We analyzed speech and gesture (3012 spoken clauses, 1747 gestures) from the final debates of the 2004 and 2008 US presidential elections, which involved two right-handers (Kerry, Bush) and two left-handers (Obama, McCain). Blind, independent coding of speech and gesture allowed objective hypothesis testing. Right- and left-handed candidates showed contrasting associations between gesture and speech. In both of the left-handed candidates, left-hand gestures were associated more strongly with positive-valence clauses and right-hand gestures with negative-valence clauses; the opposite pattern was found in both right-handed candidates. Speakers associate positive messages more strongly with dominant-hand gestures and negative messages with non-dominant-hand gestures, revealing a hidden link between action and emotion. This pattern cannot be explained by conventions in language or culture, which associate 'good' with 'right' but not with 'left'; rather, results support and extend the body-specificity hypothesis. Furthermore, results suggest that the hand speakers use to gesture may have unexpected (and probably unintended) communicative value, providing the listener with a subtle index of how the speaker feels about the content of the co-occurring speech.
Kornienko, Vladimir V.; Kitaeva, Galiya Kh.; Sedlmeir, Florian; Leuchs, Gerd; Schwefel, Harald G. L.
We study a calibration scheme for terahertz wave nonlinear-optical detectors based on spontaneous parametric down-conversion. Contrary to the usual low wavelength pump in the green, we report here on the observation of spontaneous parametric down-conversion originating from an in-growth poled lithium niobate crystal pumped with a continuous wave 50 mW, 795 nm diode laser system, phase-matched to a terahertz frequency idler wave. Such a system is more compact and allows for longer poling periods as well as lower losses in the crystal. Filtering the pump radiation by a rubidium-87 vapor cell allowed the frequency-angular spectra to be obtained down to ˜0.5 THz or ˜1 nm shift from the pump radiation line. The presence of an amplified spontaneous emission "pedestal" in the diode laser radiation spectrum significantly hampers the observation of spontaneous parametric down-conversion spectra, in contrast to conventional narrowband gas lasers. Benefits of switching to longer pump wavelengths are pointed out, such as collinear optical-terahertz phase-matching in bulk crystals.
Kicken, Ria; Ernes, Elise; Hoogenberg-Engbers, Ilja
The paper reports on case studies in which an Authoring Concept Mapping Kit was incorporated as a didactic tool in the teaching of children with severe speech-language difficulties. The Kit was introduced to replace methods such as topic webs, or complement others such as conversation exchange. … practice has been transformed and improved. The children’s perspective on the topic comes through in the teachers’ opinions. Concept mapping turned out to enhance meaning negotiation, active inquiry and collaboration during teaching interactive learning language. Teachers reported that it had great impact on children’s language development, vocabulary and spontaneous speech, while it had minimal impact on the way activities were performed in everyday classes.
Centini, M.; Peřina ml., Jan; Sciscione, L.; Sibilia, C.; Scalora, M.; Bloemer, M.J.; Bertolotti, M.
Roč. 72, 03 (2005), 033806/1-033806/11 ISSN 1050-2947 R&D Projects: GA MŠk(CZ) OC P11.003 Institutional research plan: CEZ:AV0Z10100522 Keywords : photon pair * photonic crystals * spontaneous parametric down-conversion Subject RIV: BH - Optics, Masers, Lasers Impact factor: 2.997, year: 2005
Our purpose is to illuminate compliances with, and resistances to, what we are calling "compulsory fluency", which we define as conventions for what constitutes competent speech. We achieve our purpose through a study of day-to-day communication between a woman with less conventional speech and her support-providing family members and friends. Drawing from McRuer's (2006) compulsory able-bodiedness and Kafer's (2013) compulsory able-mindedness, we use "compulsory fluency" to refer to a form of articulation that is standardized and idealized and imposed on all speakers, including those whose speech is less conventional. We see compulsory fluency as central to North American conceptions of personhood, which are tied to individual ability to speak for one's self (Brueggemann, 2005). In this paper, we trace some North American principles for linguistic competence to outline widely held ideals of receptive and expressive language use, namely, conventions for how language should be understood and expressed. Using Critical Disability Studies (Goodley, 2013; McRuer, 2006) together with a feminist framework of relational autonomy (Nedelsky, 1989), our goal is to focus on experiences of people with less conventional speech and draw attention to power in communication as it flows in idiosyncratic and intersubjective fashion (Mackenzie & Stoljar, 2000; Westlund, 2009). In other words, we use a critical disability and feminist framing to call attention to less conventional forms of communication competence and, in this process, we challenge assumptions about what constitutes competent speech. As part of a larger qualitative study, we conduct a conversation analysis informed by Rapley and Antaki (1996) to examine day-to-day verbal, vocal and non-verbal communications of a young woman who self-identifies as "having autism" (pseudonym Addison) in interaction with her support-providing family members and friends. We illustrate a multitude of Addison's compliances with…
Osorio, Clara I; Valencia, Alejandra; Torres, Juan P
In most configurations aimed at generating entangled photons based on spontaneous parametric down conversion (SPDC), the generated pairs of photons are required to be entangled in only one degree of freedom. Any distinguishing information coming from the other degrees of freedom that characterize the photon should be suppressed to avoid correlations with the degree of freedom of interest. However, this suppression is not always possible. Here, we show how the frequency information available affects the purity of the two-photon state in space, revealing a correlation between the frequency and the space degrees of freedom. This correlation should be taken into account to calculate the total amount of entanglement between the photons.
Guardiola, Mathilde; Bertrand, Roxane
This paper investigates how and when interactional convergence is established by participants in conversation. We analyze sequences of storytelling using an original method that combines Conversation Analysis and a corpus-based approach. In storytelling, the participant in the position of "listener" is expected to produce either generic or specific responses adapted to the storyteller's narrative. The listener's behavior produced within the current activity is a cue of his/her interactional alignment. We show here that the listener can produce a specific type of (aligned) response, which we term a reported speech utterance in echo. The participant who is not telling the story is nonetheless able to animate the characters, while reversing the usual asymmetric roles of storyteller and listener. The use of this device is a way for the listener to display his/her stance toward the events told by the storyteller. If the listener's stance is congruent with that of the storyteller, this reveals a high degree of affiliation between the participants. We present seventeen excerpts from a collection of 94 instances of Echo Reported Speech (ERS) which we examined using the concepts of alignment and affiliation in order to show how different kinds of convergent sequences are constructed. We demonstrate that this phenomenon is mainly used by the listener to align and affiliate with the storyteller by means of reformulative, enumerative, or overbidding ERS. We also show that in affiliative sequences, reported speech can be used by the listener in a humorous way in order to temporarily disalign. This disalignment constitutes a potential starting point for an oblique sequence, which, if accepted and continued by the storyteller, gives rise to a highly convergent sequence.
Wynn, Camille J; Borrie, Stephanie A; Sellers, Tyra P
Conversational entrainment, a phenomenon whereby people modify their behaviors to match their communication partner, has been evidenced as critical to successful conversation. It is plausible that deficits in entrainment contribute to the conversational breakdowns and social difficulties exhibited by people with autism spectrum disorder (ASD). This study examined speech rate entrainment in children and adult populations with and without ASD. Sixty participants, including typically developing children, children with ASD, typically developed adults, and adults with ASD, participated in a quasi-conversational paradigm with a pseudoconfederate. The confederate's speech rate was digitally manipulated to create slow and fast speech rate conditions. Typically developed adults entrained their speech rate in the quasi-conversational paradigm, using a faster rate during the fast speech rate conditions and a slower rate during the slow speech rate conditions. This entrainment pattern was not evident in adults with ASD or in either child population. Findings suggest that speech rate entrainment is a developmentally acquired skill and offer preliminary evidence of speech rate entrainment deficits in adults with ASD. Impairments in this area may contribute to the conversational breakdowns and social difficulties experienced by this population. Future work is needed to advance this area of inquiry.
Movements and behaviour synchronise during social interaction at many levels, often unintentionally. During smooth conversation, for example, participants adapt to each other's speech rates. Here we aimed to find out to what extent speakers adapt their turn-taking rhythms during a story-building game. Nine sex-matched dyads of adults (12 males, 6 females) created two 5-min stories by contributing to them alternatingly, one word at a time. The participants were located in different rooms, with an audio connection during one story and audiovisual during the other. They were free to select the topic of the story. Although the participants received no instructions regarding the timing of the story building, their word rhythms were highly entrained (R̄ = 0.70, p < 0.001) even though the rhythms as such were unstable (R̄ = 0.14 for pooled data). Such high entrainment in the absence of a steady word rhythm occurred in every individual story, independently of whether the subjects were connected via an audio-only or audiovisual link. The observed entrainment was of similar strength to typical entrainment in finger-tapping tasks, where participants are specifically instructed to synchronize their behaviour. Thus speech seems to spontaneously induce strong entrainment between conversation partners, likely reflecting automatic alignment of their semantic and syntactic processes.
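The entrainment index reported above (R̄) is a circular statistic, the mean resultant length of relative phases. A minimal sketch of how such a value is computed; the phase values below are invented:

```python
import cmath
import math

def mean_resultant_length(phases):
    """Circular statistic R-bar in [0, 1]: 1 = perfectly entrained phases,
    0 = uniformly spread phases. `phases` are relative phases in radians
    (e.g. one partner's word onsets expressed within the other's word cycle)."""
    total = sum(cmath.exp(1j * p) for p in phases)
    return abs(total) / len(phases)

tight = [0.1, -0.05, 0.12, 0.0, -0.08]               # clustered -> R-bar near 1
spread = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]  # uniform -> R-bar near 0
print(mean_resultant_length(tight))
print(mean_resultant_length(spread))
```

Each phase is treated as a unit vector; averaging the vectors and taking the length of the result rewards clustering regardless of where on the circle the cluster sits.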
In the dominant theoretical framework, human communication is modeled as the faithful transmission of information. This implies that when people are involved in communicational exchanges, they should be sensitive to the success with which information is transmitted, easily detecting when conversations lack coherence. The expectation that humans are good at detecting conversational incoherence is in line with common intuition, but there are several reasons to suspect that it might be unrealistic. First, similar intuitions have been shown to be unrealistic for a number of psychological processes. Second, faithful information transmission may conflict with other conversational goals. Third, mechanisms supporting information transmission may themselves lead to cases of incoherence being missed. To ascertain the extent to which people are insensitive to patches of serious conversational incoherence, we generated such patches in the laboratory by repeatedly crossing two unrelated conversations. Across two studies, involving both narrowly and broadly focused conversations, between 27% and 42% of the conversants did not notice that their conversations had been crossed. The results of these studies suggest that it may indeed be unrealistic to model spontaneous conversation as faithful information transmission. Rather, our results are more consistent with models of communication that view it as involving noisy and error-prone inferential processes, serving multiple independent goals.
Alaraifi, Jehad Ahmad; Amayreh, Mousa Mohammad; Saleh, Mohammad Yusef
Problem: There are no available studies on the prevalence, and distribution of speech disorders among Arabic speaking undergraduate students in Jordan. Method: A convenience sample of 400 undergraduate students at the University of Jordan was screened for speech disorders. Two spontaneous speech samples and an oral reading of a passage were…
Sameera A. Abdul-Kader; Dr. John Woods
Human-computer speech is gaining momentum as a technique of computer interaction. There has been a recent upsurge in speech-based search engines and assistants such as Siri, Google Chrome and Cortana. Natural Language Processing (NLP) toolkits such as NLTK for Python can be applied to analyse speech, and intelligent responses can be found by designing an engine to provide appropriate human-like responses. This type of programme is called a Chatbot, which is the focus of this study. This pap...
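A chatbot engine of the kind described can be reduced to pattern-response matching. The sketch below is a minimal pure-Python version in the spirit of NLTK's `nltk.chat.util.Chat` utility; the patterns and canned responses are invented for illustration:

```python
import re

# Ordered (pattern, response template) rules; the first match wins.
RULES = [
    (re.compile(r"\bmy name is (\w+)", re.I), "Nice to meet you, {0}."),
    (re.compile(r"\bhow are you\b", re.I), "I'm just a program, but thanks for asking."),
    (re.compile(r"\bbye\b", re.I), "Goodbye!"),
]
FALLBACK = "Tell me more."

def respond(utterance):
    """Return the first rule's response, substituting captured groups."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return FALLBACK
```

Captured groups let a rule echo user-supplied content back, which is the core trick behind classic pattern-matching bots such as ELIZA.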
McClain, Matthew; Romanowski, Brian
Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.
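To make the HMM classification idea concrete, here is a toy two-state HMM (LSS vs NLSS) decoded with the Viterbi algorithm. Every probability, and the discrete "voiced vs noisy" observation alphabet, is an invented stand-in for the paper's acoustic feature likelihoods:

```python
import math

states = ["LSS", "NLSS"]
log = math.log

# Transitions favour staying in the current state (speech is locally stable).
trans = {("LSS", "LSS"): log(0.9), ("LSS", "NLSS"): log(0.1),
         ("NLSS", "LSS"): log(0.2), ("NLSS", "NLSS"): log(0.8)}
# Discrete observations: 'v' = voiced/harmonic frame, 'n' = noisy frame.
emit = {("LSS", "v"): log(0.8), ("LSS", "n"): log(0.2),
        ("NLSS", "v"): log(0.1), ("NLSS", "n"): log(0.9)}
start = {"LSS": log(0.5), "NLSS": log(0.5)}

def viterbi(obs):
    """Most likely LSS/NLSS state path for an observation sequence."""
    paths = {s: (start[s] + emit[(s, obs[0])], [s]) for s in states}
    for o in obs[1:]:
        new = {}
        for s in states:
            best = max(states, key=lambda p: paths[p][0] + trans[(p, s)])
            score = paths[best][0] + trans[(best, s)] + emit[(s, o)]
            new[s] = (score, paths[best][1] + [s])
        paths = new
    return max(paths.values())[1]
```

Working in log probabilities avoids numeric underflow on long frame sequences; a real detector would replace the discrete emission table with per-frame acoustic feature likelihoods.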
Sergienko, A.V.; Walton, Z.D.; Booth, M.C.; Saleh, B.E.A.; Teich, M.C.
A new method for generating entangled photons with controllable frequency correlation via spontaneous parametric down-conversion (SPDC) is presented. The method entails initiating counter-propagating SPDC in a single-mode nonlinear waveguide by pumping with a pulsed beam perpendicular to the waveguide. In a typical spontaneous parametric down-conversion (SPDC) experiment, a photon from a monochromatic pump beam decays into two photons (often referred to as signal and idler) via interaction with a nonlinear optical crystal. While the signal and idler may be broadband individually, conservation of energy requires that the sum of their respective frequencies equals the single frequency of the monochromatic pump. This engenders frequency anti-correlation in the down-converted beams. Two developments in quantum information theory have renewed interest in the generalized states of frequency correlation. First, quantum information processes requiring the synchronized creation of multiple photon pairs have been devised, such as quantum teleportation. The requisite temporal control can be achieved by pumping the crystal with a brief pulse. The availability of pump photons of differing frequencies relaxes the strict frequency anti-correlation in the down-converted beams. Second, applications such as entanglement-enhanced clock synchronization and one-way auto-compensating quantum cryptography have been introduced that specifically require frequency correlation, as opposed to the usual frequency anticorrelation. Our method for obtaining controllable frequency entanglement entails initiating type-I SPDC (signal and idler identically polarized) in a single-mode nonlinear waveguide by pumping with a pulsed beam perpendicular to the waveguide. The down-converted photons emerge from opposite ends of the waveguide with a joint spectrum that can be varied from frequency anti-correlated to frequency correlated by adjusting the temporal and spatial characteristics of the…
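The anti-correlation argument in this abstract can be written out explicitly; a short sketch in standard SPDC notation (ω_p, ω_s, ω_i for pump, signal and idler), stated here as the generic energy-conservation relation rather than the authors' full waveguide analysis:

```latex
% Monochromatic pump: energy conservation fixes the sum frequency,
\omega_s + \omega_i = \omega_p
\quad\Rightarrow\quad
\delta\omega_s = -\delta\omega_i
\qquad \text{(frequency anti-correlation).}
% A pulsed pump of bandwidth $\Delta\omega_p$ softens the constraint to
\left| \omega_s + \omega_i - \bar{\omega}_p \right| \lesssim \Delta\omega_p ,
% admitting engineered joint spectra, including frequency-correlated pairs.
```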
Javůrek, D.; Peřina ml., Jan
Vol. 95, No. 4 (2017), pp. 1-13, article No. 043828. ISSN 2469-9926. Institutional support: RVO:68378271. Keywords: surface spontaneous * parametric down-conversion * photon pairs * layered media. Subject RIV: BH - Optics, Masers, Lasers. OECD field: Optics (including laser optics and quantum optics). Impact factor: 2.925, year: 2016
Javůrek, D.; Svozilík, J.; Peřina ml., Jan
Vol. 90, No. 4 (2014), pp. 043844-1 to 043844-12. ISSN 1050-2947. R&D Projects: GA ČR GAP205/12/0382. Institutional support: RVO:68378271. Keywords: photon pairs * orbital-angular-momentum-entangled * nonlinear ring fiber * spontaneous parametric down-conversion. Subject RIV: BH - Optics, Masers, Lasers. Impact factor: 2.808, year: 2014
The experimental addition of speech output to computer-based Esperanto lessons using speech synthesized from text is described. Because of Esperanto's phonetic spelling and simple rhythm, it is particularly easy to describe the mechanisms of Esperanto synthesis. Attention is directed to how the text-to-speech conversion is performed and the ways…
Yoder, Paul J.; Woynaroski, Tiffany; Camarata, Stephen
Purpose: There is an ongoing need to develop assessments of spontaneous speech that focus on whether the child's utterances are comprehensible to listeners. This study sought to identify the attributes of a stable ratings-based measure of speech comprehensibility, which enabled examining the criterion-related validity of an orthography-based…
Juel Henrichsen, Peter
Tools for mapping between written words and phonetic forms are essential components in many applications of speech technology, such as automatic speech recognition (ASR) and speech synthesis (TTS). Simple converters can be derived from annotated speech corpora using machine learning, and such tools … are available for almost all European languages and a great number of others. Whereas their performance is adequate for ASR and for low-quality TTS, their lack of precision makes them unfit for linguistic research purposes such as phonetic annotation of spontaneous speech recordings. A common method …
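The corpus-derived converters described above can be sketched, in heavily simplified form, as a mapping learned from aligned examples. Real grapheme-to-phoneme tools use joint-sequence or neural models; this sketch assumes a toy one-letter-to-one-phone alignment, and the transcriptions are hypothetical, not from any real corpus.

```python
from collections import Counter, defaultdict

def train_g2p(aligned_pairs):
    """Learn a trivial grapheme-to-phoneme mapping from letter-aligned
    (word, phone-sequence) pairs by counting which phone each letter
    most often corresponds to."""
    counts = defaultdict(Counter)
    for word, phones in aligned_pairs:
        assert len(word) == len(phones), "sketch assumes 1:1 alignment"
        for letter, phone in zip(word, phones):
            counts[letter][phone] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}

def transcribe(word, model):
    # Unknown letters map to "?" rather than guessing
    return [model.get(letter, "?") for letter in word]

# Hypothetical aligned corpus (toy transcriptions)
corpus = [("kat", ["k", "a", "t"]),
          ("tak", ["t", "a", "k"]),
          ("sok", ["s", "o", "k"])]
model = train_g2p(corpus)
print(transcribe("kost", model))
```

Low precision of exactly this kind (one phone per letter, no context) is why such converters suit ASR lexica better than phonetic annotation of spontaneous speech.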
Zhang, Shuai-Shuai; Shu, Qi; Zhou, Lan; Sheng, Yu-Bo
Entanglement purification distills high-quality entanglement from low-quality entanglement using local operations and classical communication. It is one of the key technologies in long-distance quantum communication. We discuss an entanglement purification protocol (EPP) with spontaneous parametric down-conversion (SPDC) sources, in contrast to previous EPPs based on multi-copy mixed states, which require ideal entanglement sources. We show that the SPDC source is not an obstacle to purification, but can benefit the fidelity of the purified mixed state. This EPP works with linear optics and is feasible with current experimental technology. Project supported by the National Natural Science Foundation of China (Grant Nos. 11474168 and 61401222), the Natural Science Foundation of Jiangsu Province, China (Grant No. BK20151502), the Qing Lan Project in Jiangsu Province, China, and a Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions, China.
This dissertation focuses on the design and evaluation of speech-based conversational interfaces for task-oriented dialogues. Conversational interfaces are software programs enabling interaction with computer devices through natural language dialogue. Even though processing conversational speech is
Hammer, Annemiek; Coene, Martine
In this study, the acquisition of Dutch finite verb morphology is investigated in children with cochlear implants (CIs) with profound hearing loss and in children with hearing aids (HAs) with moderate to severe hearing loss. Comparing these two groups of children increases our insight into how hearing experience and audibility affect the acquisition of morphosyntax. Spontaneous speech samples of 48 children with CIs and 29 children with HAs, ages 4 to 7 years, were analyzed. These language samples were analyzed by means of standardized language analysis involving mean length of utterance, the number of finite verbs produced, and target-like subject-verb agreement. The outcomes were interpreted relative to expectations based on the performance of typically developing peers with normal hearing. Outcomes of all measures were correlated with hearing level in the group of HA users and age at implantation in the group of CI users. For both groups, the number of finite verbs that were produced in a 50-utterance sample was on par with mean length of utterance and at the lower bound of the normal distribution. No significant differences were found between children with CIs and HAs on any of the measures under investigation. Yet, both groups produced more subject-verb agreement errors than are to be expected for typically developing hearing peers. No significant correlation was found between the hearing level of the children and the relevant measures of verb morphology, both with respect to the overall number of verbs that were used and the number of errors that children made. Within the group of CI users, the outcomes were significantly correlated with age at implantation. When producing finite verb morphology, profoundly deaf children wearing CIs perform similarly to their peers with moderate-to-severe hearing loss wearing HAs. Hearing loss negatively affects the acquisition of subject-verb agreement regardless of the hearing device (CI or HA) that the child is wearing. The
Centini, M.; Sciscione, L.; Sibilia, C.; Bertolotti, M.; Perina, J. Jr.; Scalora, M.; Bloemer, M.J.
A description of spontaneous parametric down-conversion in finite-length one-dimensional nonlinear photonic crystals is developed using semiclassical and quantum approaches. It is shown that if a suitable averaging is added to the semiclassical model, its results are in very good agreement with the quantum approach. We propose two structures made with GaN/AlN that generate both degenerate and nondegenerate entangled photon pairs. Both structures are designed so as to achieve a high efficiency of the nonlinear process
Silber, Ronnie F.
Two studies examined the modifications that adult speakers make in speech to disadvantaged listeners. Previous research that has focused on speech to deaf individuals and to young children has shown that adults clarify speech when addressing these two populations. Acoustic measurements suggest that the signal undergoes similar changes for both populations. Perceptual tests corroborate these results for the deaf population, but are nonsystematic in developmental studies. The differences in the findings for these populations and the nonsystematic results in the developmental literature may be due to methodological factors. The present experiments addressed these methodological questions. Studies of speech to hearing-impaired listeners have used read nonsense sentences, for which speakers received explicit clarification instructions and feedback, while in the child literature, excerpts of real-time conversations were used. Therefore, linguistic samples were not precisely matched. In this study, experiments used various linguistic materials. Experiment 1 used a children's story; experiment 2, nonsense sentences. Four mothers read both types of material in four ways: (1) in "normal" adult speech, (2) in "babytalk," (3) under the clarification instructions used in the hearing-impaired studies (instructed clear speech) and (4) in (spontaneous) clear speech without instruction. No extra practice or feedback was given. Sentences were presented to 40 normal-hearing college students with and without simultaneous masking noise. Results were separately tabulated for content and function words, and analyzed using standard statistical tests. The major finding in the study was individual variation in speaker intelligibility. "Real world" speakers vary in their baseline intelligibility. The four speakers also showed unique patterns of intelligibility as a function of each independent variable. Results were as follows. Nonsense sentences were less intelligible than story
The learning-based speech recovery approach using statistical spectral conversion has been used for some kinds of distorted speech, such as alaryngeal speech and body-conducted (or bone-conducted) speech. This approach attempts to recover clean speech (undistorted speech) from noisy speech (distorted speech) by converting the statistical models of noisy speech into those of clean speech, without prior knowledge of the characteristics and distributions of the noise source. This approach has not yet attracted many researchers in general noisy-speech enhancement because of some major problems: the difficulties of noise adaptation and the lack of noise-robust synthesizable features in different noisy environments. In this paper, we adapt state-of-the-art methods of voice conversion and speaker adaptation in speech recognition to the proposed speech recovery approach, applied in different kinds of noisy environment, especially in adverse environments with joint compensation of additive and convolutive noises. We propose to use the decorrelated wavelet packet coefficients as a low-dimensional robust synthesizable feature under noisy environments. We also propose a noise adaptation for speech recovery with the eigennoise, similar to the eigenvoice in voice conversion. The experimental results showed that the proposed approach highly outperformed traditional nonlearning-based approaches.
Babel, Molly; McAuliffe, Michael; Haber, Graham
This study examines spontaneous phonetic accommodation of a dialect with distinct categories by speakers who are in the process of merging those categories. We focus on the merger of the NEAR and SQUARE lexical sets in New Zealand English, presenting New Zealand participants with an unmerged speaker of Australian English. Mergers-in-progress are a uniquely interesting sound change as they showcase the asymmetry between speech perception and production. Yet, we examine mergers using spontaneous phonetic imitation, a phenomenon in which perceptual input necessarily influences speech production. Phonetic imitation is quantified by a perceptual measure and an acoustic calculation of mergedness using a Pillai-Bartlett trace. The results from both analyses indicate that spontaneous phonetic imitation is moderated by extra-linguistic factors such as the valence of assigned conditions and social bias. We also find evidence for a decrease in the degree of mergedness in post-exposure productions. Taken together, our results suggest that under the appropriate conditions New Zealanders phonetically accommodate to Australian English and that in the process of speech imitation, mergers-in-progress can, but do not consistently, become less merged.
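The Pillai-Bartlett trace used as the acoustic mergedness measure in the abstract above can be sketched as a minimal MANOVA-style computation on F1/F2 data. The cluster centers and spreads below are illustrative synthetic values, not figures from the study.

```python
import numpy as np

def pillai_trace(groups):
    """Pillai-Bartlett trace for a one-way MANOVA.

    groups: list of (n_i, p) arrays of acoustic measurements, e.g.
    F1/F2 values for two vowel categories. Values near 0 indicate
    merged categories; values near 1 (for two groups) indicate
    well-separated categories.
    """
    X = np.vstack(groups)
    grand = X.mean(axis=0)
    # Between-group (hypothesis) and within-group (error) scatter matrices
    H = sum(len(g) * np.outer(g.mean(0) - grand, g.mean(0) - grand) for g in groups)
    E = sum((g - g.mean(0)).T @ (g - g.mean(0)) for g in groups)
    return float(np.trace(H @ np.linalg.inv(H + E)))

rng = np.random.default_rng(0)
# Hypothetical F1/F2 clusters (Hz); centers and spreads are illustrative
near = rng.normal([400, 2100], 40, size=(50, 2))
square = rng.normal([550, 1800], 40, size=(50, 2))   # well separated from NEAR
merged = rng.normal([400, 2100], 40, size=(50, 2))   # overlaps NEAR

print(pillai_trace([near, square]))   # high: unmerged speaker
print(pillai_trace([near, merged]))   # low: merged speaker
```

A drop in this trace from pre- to post-exposure productions is the kind of evidence the study reads as decreased mergedness.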
Soto, Gloria; Clarke, Michael T.
Purpose: This study was conducted to evaluate the effects of a conversation-based intervention on the expressive vocabulary and grammatical skills of children with severe motor speech disorders and expressive language delay who use augmentative and alternative communication. Method: Eight children aged from 8 to 13 years participated in the study.…
Laganaro, Marina; Croisier, Michèle; Bagou, Odile; Assal, Frédéric
We present a 3-year follow-up study of a patient with progressive apraxia of speech (PAoS), aimed at investigating whether the theoretical organization of phonetic encoding is reflected in the progressive disruption of speech. As decreased speech rate was the most striking pattern of disruption during the first 2 years, durational analyses were carried out longitudinally on syllables excised from spontaneous, repetition and reading speech samples. The crucial result of the present study is the demonstration of an effect of syllable frequency on duration: the progressive disruption of articulation rate did not affect all syllables in the same way, but followed a gradient that was a function of the frequency of use of syllable-sized motor programs. The combination of data from this case of PAoS with previous psycholinguistic and neurolinguistic data points to a frequency organization of syllable-sized speech-motor plans. In this study we also illustrate how studying PAoS can be exploited in theoretical and clinical investigations of phonetic encoding, as it represents a unique opportunity to investigate speech while it progressively breaks down. Copyright © 2011 Elsevier Srl. All rights reserved.
Bloch, Steven; Saldert, Charlotta; Ferm, Ulrika
This study examined the nature of topic transition problems associated with acquired progressive dysarthric speech in the everyday conversation of people with motor neurone disease. Using conversation analytic methods, a video collection of five naturally occurring problematic topic transitions was identified, transcribed and analysed. These were extracted from a main collection of over 200 other-initiated repair sequences and a sub-set of 15 problematic topic transition sequences. The sequences were analysed with reference to how the participants both identified and resolved the problems. Analysis revealed that topic transition by people with dysarthria can prove problematic. Conversation partners may find transitions problematic not only because of speech intelligibility but also because of a sequential disjuncture between the dysarthric speech turn and whatever topic has come prior. In addition the treatment of problematic topic transition as a complaint reveals the potential vulnerability of people with dysarthria to judgements of competence. These findings have implications for how dysarthria is conceptualized and how specific actions in conversation, such as topic transition, might be suitable targets for clinical intervention.
Marschik, Peter B; Vollmann, Ralf; Bartl-Pokorny, Katrin D; Green, Vanessa A; van der Meer, Larah; Wolin, Thomas; Einspieler, Christa
We assessed various aspects of speech-language and communicative functions of an individual with the preserved speech variant of Rett syndrome (RTT) to describe her developmental profile over a period of 11 years. For this study, we incorporated the following data resources and methods to assess speech-language and communicative functions during pre-, peri- and post-regressional development: retrospective video analyses, medical history data, parental checklists and diaries, standardized tests on vocabulary and grammar, spontaneous speech samples and picture stories to elicit narrative competences. Despite achieving speech-language milestones, atypical behaviours were present at all times. We observed a unique developmental speech-language trajectory (including the RTT-typical regression) affecting all linguistic and socio-communicative sub-domains in the receptive as well as the expressive modality. Future research should take into consideration a potentially considerable discordance between formal and functional language use, and interpret communicative acts with caution.
Lind, Marianne; Kristoffersen, Kristian Emil; Moen, Inger; Simonsen, Hanne Gram
Functionally relevant assessment of the language production of speakers with aphasia should include assessment of connected speech production. Despite the ecological validity of everyday conversations, more controlled and monological types of texts may be easier to obtain and analyse in clinical practice. This article discusses some simple measurements for the analysis of semi-spontaneous oral text production by speakers with aphasia. Specifically, the measurements are related to the production of verbs and nouns, and the realization of different sentence types. The proposed measurements should be clinically relevant, easily applicable, and linguistically meaningful. The measurements have been applied to oral descriptions of the 'Cookie Theft' picture by eight monolingual Norwegian speakers, four with an anomic type of aphasia and four without any type of language impairment. Despite individual differences in both the clinical and the non-clinical group, most of the measurements seem to distinguish between speakers with and without aphasia.
Van Engen, Kristin J
This study investigated whether clear speech reduces the cognitive demands of lexical competition by crossing speaking style with lexical difficulty. Younger and older adults identified more words in clear versus conversational speech and more easy words than hard words. An initial analysis suggested that the effect of lexical difficulty was reduced in clear speech, but more detailed analyses within each age group showed this interaction was significant only for older adults. The results also showed that both groups improved over the course of the task and that clear speech was particularly helpful for individuals with poorer hearing: for younger adults, clear speech eliminated hearing-related differences that affected performance on conversational speech. For older adults, clear speech was generally more helpful to listeners with poorer hearing. These results suggest that clear speech affords perceptual benefits to all listeners and, for older adults, mitigates the cognitive challenge associated with identifying words with many phonological neighbors.
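The notion of "hard" words with many phonological neighbors, central to the abstract above, can be illustrated with a simple neighborhood computation (entries differing by one phoneme substitution, insertion, or deletion). The toy lexicon and transcriptions below are hypothetical, not from any specific dictionary.

```python
def neighbors(word, lexicon):
    """Return lexicon entries that differ from `word` by exactly one
    phoneme substitution, insertion, or deletion. Words are tuples of
    phoneme symbols."""
    def one_edit(a, b):
        if abs(len(a) - len(b)) > 1:
            return False
        if len(a) == len(b):                      # one substitution
            return sum(x != y for x, y in zip(a, b)) == 1
        if len(a) > len(b):                       # make a the shorter word
            a, b = b, a
        # deleting one phoneme of b must yield a
        return any(b[:i] + b[i + 1:] == tuple(a) for i in range(len(b)))
    return [w for w in lexicon if one_edit(word, w)]

# Toy phoneme-level lexicon (hypothetical transcriptions)
lex = [("k", "ae", "t"), ("b", "ae", "t"), ("k", "ae", "p"),
       ("k", "ae", "t", "s"), ("d", "ao", "g")]
print(neighbors(("k", "ae", "t"), lex))   # a dense neighborhood: a "hard" word
```

Words whose neighbor lists are long face more lexical competition, which is the cognitive demand the study argues clear speech helps mitigate.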
Tilsen, Sam; Arvaniti, Amalia
This study presents a method for analyzing speech rhythm using empirical mode decomposition of the speech amplitude envelope, which allows for extraction and quantification of syllabic- and supra-syllabic time-scale components of the envelope. The method of empirical mode decomposition of a vocalic energy amplitude envelope is illustrated in detail, and several types of rhythm metrics derived from this method are presented. Spontaneous speech extracted from the Buckeye Corpus is used to assess the effect of utterance length on metrics, and it is shown how metrics representing variability in the supra-syllabic time-scale components of the envelope can be used to identify stretches of speech with targeted rhythmic characteristics. Furthermore, the envelope-based metrics are used to characterize cross-linguistic differences in speech rhythm in the UC San Diego Speech Lab corpus of English, German, Greek, Italian, Korean, and Spanish speech elicited in read sentences, read passages, and spontaneous speech. The envelope-based metrics exhibit significant effects of language and elicitation method that argue for a nuanced view of cross-linguistic rhythm patterns.
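The envelope-based rhythm analysis described above can be sketched in simplified form: a rectify-and-smooth amplitude envelope plus a coefficient-of-variation metric. The empirical-mode-decomposition step that separates syllabic from supra-syllabic envelope components is omitted here, and the signals are synthetic stand-ins for speech.

```python
import numpy as np

def amplitude_envelope(signal, fs, win_ms=25):
    """Crude amplitude envelope: full-wave rectify, then smooth with a
    moving-average window (a stand-in for the filtering used in
    envelope-based rhythm work; no EMD is performed here)."""
    win = int(fs * win_ms / 1000)
    kernel = np.ones(win) / win
    return np.convolve(np.abs(signal), kernel, mode="same")

def envelope_variability(env):
    """Simple rhythm metric: coefficient of variation of the envelope,
    normalized so it is insensitive to overall loudness."""
    return float(env.std() / env.mean())

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
# Hypothetical signals: a carrier modulated at a 4 Hz syllable-like rate
# versus a rhythmically flat carrier.
rhythmic = (1 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 200 * t)
steady = np.sin(2 * np.pi * 200 * t)

print(envelope_variability(amplitude_envelope(rhythmic, fs)))
print(envelope_variability(amplitude_envelope(steady, fs)))
```

Metrics of this family, computed separately on syllabic- and supra-syllabic-scale envelope components, are what the study uses to compare elicitation methods and languages.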
Chakrabarty, Madhushree; Kumar, Suman; Chatterjee, Indranil; Maheshwari, Neha
The present study aims at analyzing speech samples of four Bengali speaking children with repaired cleft palates with a view to differentiate between the misarticulations arising out of a deficit in linguistic skills and structural or motoric limitations. Spontaneous speech samples were collected and subjected to a number of linguistic analyses…
Summarizes the current state of research in conversation analysis, referring primarily to six different perspectives that have developed from the philosophy, sociology, anthropology, and linguistics disciplines. These include pragmatics; speech act theory; interactional sociolinguistics; ethnomethodology; ethnography of communication; and…
Baese-Berk, Melissa M; Heffner, Christopher C; Dilley, Laura C; Pitt, Mark A; Morrill, Tuuli H; McAuley, J Devin
Humans unconsciously track a wide array of distributional characteristics in their sensory environment. Recent research in spoken-language processing has demonstrated that the speech rate surrounding a target region within an utterance influences which words, and how many words, listeners hear later in that utterance. On the basis of hypotheses that listeners track timing information in speech over long timescales, we investigated the possibility that the perception of words is sensitive to speech rate over such a timescale (e.g., an extended conversation). Results demonstrated that listeners tracked variation in the overall pace of speech over an extended duration (analogous to that of a conversation that listeners might have outside the lab) and that this global speech rate influenced which words listeners reported hearing. The effects of speech rate became stronger over time. Our findings are consistent with the hypothesis that neural entrainment by speech occurs on multiple timescales, some lasting more than an hour. © The Author(s) 2014.
Uses conversation analysis to investigate reported speech in talk-in-interaction. Beginning with an examination of direct and indirect reported speech, the article highlights some of the design features of the former, and the sequential environments in which it occurs. (Author/VWL)
Moore, Christopher A.; Ruark, Jacki L.
This investigation was designed to quantify the coordinative organization of mandibular muscles in toddlers during speech and nonspeech behaviors. Seven 15-month-olds were observed during spontaneous production of chewing, sucking, babbling, and speech. Comparison of mandibular coordination across these behaviors revealed that, even for children in the earliest stages of true word production, coordination was quite different from that observed for other behaviors. Production of true words was...
Corona, Maria [Instituto de Ciencias Nucleares, Universidad Nacional Autonoma de Mexico, apdo. postal 70-543, DF 04510 Mexico City (Mexico); Departamento de Optica, Centro de Investigacion Cientifica y de Educacion Superior de Ensenada, Apartado Postal 2732, BC 22860 Ensenada (Mexico); Garay-Palmett, Karina; U'Ren, Alfred B. [Instituto de Ciencias Nucleares, Universidad Nacional Autonoma de Mexico, apdo. postal 70-543, DF 04510 Mexico City (Mexico)
We study the third-order spontaneous parametric down-conversion (TOSPDC) process, as a means to generate entangled photon triplets. Specifically, we consider thin optical fibers as the nonlinear medium to be used as the basis for TOSPDC in configurations where phase matching is attained through the use of more than one fiber transverse modes. Our analysis in this paper, which follows from our earlier paper [Opt. Lett. 36, 190-192 (2011)], aims to supply experimentalists with the details required in order to design a TOSPDC photon-triplet source. Specifically, our analysis focuses on the photon triplet state, on the rate of emission, and on the TOSPDC phase-matching characteristics for the cases of frequency-degenerate and frequency nondegenerate TOSPDC.
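The two conditions that govern the TOSPDC process in the abstract above can be stated schematically (our notation: \(\omega\) are angular frequencies, \(\beta\) the propagation constants of the fiber modes involved; in the frequency-degenerate case \(\omega_1=\omega_2=\omega_3=\omega_p/3\)):

```latex
% Energy conservation: one pump photon splits into a photon triplet
\omega_p = \omega_1 + \omega_2 + \omega_3
% Phase matching, attained in the paper via different transverse modes
\beta(\omega_p) = \beta(\omega_1) + \beta(\omega_2) + \beta(\omega_3)
```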
Krause, Jean C.; Braida, Louis D.
Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
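The first global property identified above, increased energy in the 1000-3000 Hz range of the long-term spectrum, can be illustrated with a simple band-energy computation. The two test signals below are hypothetical synthetic stand-ins, not the study's recordings or its exact analysis procedure.

```python
import numpy as np

def band_energy_ratio(signal, fs, lo=1000.0, hi=3000.0):
    """Fraction of long-term spectral energy falling in [lo, hi] Hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    return float(spectrum[band].sum() / spectrum.sum())

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
# Hypothetical spectra: "conversational" energy mostly below 1 kHz,
# "clear" speech with extra energy in the 1-3 kHz consonant region.
conversational = np.sin(2 * np.pi * 300 * t) + 0.2 * np.sin(2 * np.pi * 2000 * t)
clear = np.sin(2 * np.pi * 300 * t) + 0.8 * np.sin(2 * np.pi * 2000 * t)

print(band_energy_ratio(conversational, fs))
print(band_energy_ratio(clear, fs))
```

A hearing-aid processing scheme motivated by this finding would boost exactly this band; the second property (modulation depth of the intensity envelope) would be measured on the envelope rather than the long-term spectrum.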
This study asks how speakers adjust their speech to their addressees, focusing on the potential roles of cognitive representations such as partner models, automatic processes such as interactive alignment, and social processes such as interactional negotiation. The nature of addressee orientation …, psycholinguistics and conversation analysis, and offers both overviews of child-directed, foreigner-directed and robot-directed speech and in-depth analyses of the processes involved in adjusting to a communication partner.
Pratt, Michael W.; And Others
Investigated relations between certain family context variables and the conversational behavior of 36 parents who were playing with their 3 year olds. Transcripts were coded for types of conversational functions and structure of parent speech. Marital satisfaction was associated with aspects of parent speech. (LB)
Introduction: Many studies have dealt with the relationship between stuttering and different linguistic factors. This study investigates the effect of syntactic complexity on the amount of speech dysfluency in stuttering Persian-speaking children, comparing them with non-stuttering ones. The results can pave the way to a better knowledge of the nature of stuttering, as well as to finding more suitable approaches to its treatment.
Materials and Methods: The participants were 10 stuttering and 10 non-stuttering Persian-speaking, monolingual children aged 4-6 years, matched by age and gender. First, a 30-minute sample of each child's spontaneous speech was recorded, and then each child's utterances were analyzed for amount of dysfluency and syntactic complexity.
Results: In both the stuttering and non-stuttering groups, there was a significant difference in the amount of dysfluency between simple and complex sentences.
Conclusion: The results of this study show that with increasing syntactic complexity in spontaneous speech, both stuttering and non-stuttering children showed more dysfluency. Also, with increasing syntactic complexity, stuttering children showed more dysfluency than non-stuttering children.
Teixeira, João Paulo; Fernandes, Anildo
Text-to-speech synthesis is the main subject of this work. We present the architecture of a generic text-to-speech conversion system, explain the functions of its various modules, and describe development techniques based on the formant model. The development of a didactic formant synthesiser in the Matlab environment is also described. This synthesiser is intended to support a didactic understanding of the formant model of speech production.
Poorjam, Amir Hossein; Hesaraki, Soheila; Safavi, Saeid
This paper proposes automatic smoking-habit detection from spontaneous telephone speech signals. In this method, each utterance is modeled using i-vector and non-negative factor analysis (NFA) frameworks, which yield a low-dimensional representation of utterances by applying factor analysis … method is evaluated on telephone speech signals of speakers whose smoking habits are known, drawn from the National Institute of Standards and Technology (NIST) 2008 and 2010 Speaker Recognition Evaluation databases. Experimental results over 1194 utterances show the effectiveness of the proposed approach … for the automatic smoking-habit detection task.
Dodge, Hiroko H; Mattek, Nora; Gregor, Mattie; Bowman, Molly; Seelye, Adriana; Ybarra, Oscar; Asgari, Meysam; Kaye, Jeffrey A
identifying MCI (vs. normals) was 0.71 (95% Confidence Interval: 0.54 - 0.89) when average proportion of word counts spoken by subjects was included univariately into the model. An ecologically valid social marker such as the proportion of spoken words produced during spontaneous conversations may be sensitive to transitions from normal cognition to MCI.
Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan
To make environmental perception by artificial intelligence more flexible, supporting software modules are needed that can automate the creation of language-specific syntax and perform further analysis for relevant decisions based on semantic functions. In the proposed approach, pairs of formal rules can be created for given sentences (in the case of natural languages) or statements (in the case of special languages) with the help of computer vision, speech recognition, or an editable-text conversion system, for further automatic improvement. In other words, we have developed an approach that can significantly improve the automation of the training process of an artificial intelligence, which as a result will give it a higher level of self-development skills, independent of us (the users). Based on this approach, we have developed a software demo version, which includes the algorithm and code implementing all of the above-mentioned components (computer vision, speech recognition, and an editable-text conversion system). The program can work in multi-stream mode and simultaneously create a syntax based on information received from several sources.
Yoder, Paul J.; Camarata, Stephen; Woynaroski, Tiffany
Purpose: This study examined whether a particular type of therapy (Broad Target Speech Recasts, BTSR) was superior to a contrast treatment in facilitating speech comprehensibility in conversations of students with Down syndrome who began treatment with initially high verbal imitation. Method: We randomly assigned 51 5- to 12-year-old students to…
Van Engen, Kristin J.; Baese-Berk, Melissa; Baker, Rachel E.; Choi, Arim; Kim, Midam; Bradlow, Ann R.
This paper describes the development of the Wildcat Corpus of native- and foreign-accented English, a corpus containing scripted and spontaneous speech recordings from 24 native speakers of American English and 52 non-native speakers of English. The core element of this corpus is a set of spontaneous speech recordings, for which a new method of…
Magalhães, Ana Tereza de Matos; Goffi-Gomez, Maria Valéria Schmidt; Hoshino, Ana Cristina; Tsuji, Robinson Koji; Bento, Ricardo Ferreira; Brito, Rubens
New technology in the Freedom® speech processor for cochlear implants was developed to improve how incoming acoustic sound is processed; this applies not only to new users, but also to previous generations of cochlear implants. The aim was to identify the contribution of this technology in users of the Nucleus 22® on speech perception tests in silence and in noise, and on audiometric thresholds. A cross-sectional cohort study was undertaken. Seventeen patients were selected. The last map based on the Spectra® was revised and optimized before starting the tests. Troubleshooting was used to identify malfunctions. To identify the contribution of the Freedom® technology for the Nucleus 22®, auditory thresholds and speech perception tests were performed in free field in sound-proof booths. Recorded monosyllables and sentences in silence and in noise (SNR = 0 dB) were presented at 60 dB SPL. The nonparametric Wilcoxon test for paired data was used to compare groups. The Freedom® technology applied to the Nucleus 22® showed a statistically significant difference in all speech perception tests and audiometric thresholds. The Freedom® technology improved speech perception performance and audiometric thresholds in patients with the Nucleus 22®.
Martínez, Angela; Felizzola Donado, Carlos Alberto; Matallana Eslava, Diana Lucía
Patients with schizophrenia and with the linguistic variants of frontotemporal dementia (FTD) share some language characteristics, such as lexical access difficulties and disordered speech with disruptions, frequent pauses, interruptions and reformulations. In schizophrenia patients this reflects a difficulty of affect expression, while in FTD patients it reflects a linguistic deficit. This study, through the analysis of a series of cases assessed in both the memory clinic and the Mental Health Unit of the HUSI-PUJ (Hospital Universitario San Ignacio), with additional language assessment (discourse analysis and acoustic analysis), presents distinctive features of the linguistic variants of FTD and of schizophrenia that can guide the specialist in finding early markers for a differential diagnosis. In patients with the language variants of FTD, 100% of cases showed difficulty in understanding complex linguistic structures, together with important speech fluency problems. In patients with schizophrenia, there are significant alterations in the expression of the suprasegmental elements of speech, as well as disruptions in discourse. We show how in-depth language assessment allows some of the rules for the speech and prosody analysis of patients with dementia and schizophrenia to be reassessed, and we suggest how elements of speech are useful in guiding the diagnosis and correlate with functional compromise in the psychiatrist's everyday practice. Copyright © 2014 Asociación Colombiana de Psiquiatría. Published by Elsevier España. All rights reserved.
Chu, Mingyuan; Kita, Sotaro
People spontaneously gesture when they speak (co-speech gestures) and when they solve problems silently (co-thought gestures). In this study, we first explored the relationship between these 2 types of gestures and found that individuals who produced co-thought gestures more frequently also produced co-speech gestures more frequently (Experiments…
Rao, K Sreenivasa
This book discusses the contribution of articulatory and excitation source information to discriminating sound units. The authors focus on the excitation source component of speech, and on the dynamics of the various articulators during speech production, for enhancing speech recognition (SR) performance. Speech recognition is analyzed for read, extempore, and conversational modes of speech. Five groups of articulatory features (AFs) are explored for speech recognition, in addition to conventional spectral features. Each chapter provides the motivation for exploring a specific feature for the SR task, discusses the methods to extract those features, and finally suggests appropriate models for capturing the sound-unit-specific knowledge from the proposed features. The authors close by discussing various combinations of spectral, articulatory and source features, and the models required to enhance the performance of SR systems.
Pilesjö, Maja Sigurd; Norén, Niklas
This Conversation Analysis study investigated how a speech and language therapist (SLT) created opportunities for communication aid use in multiparty conversation. An SLT interacted with a child with multiple disabilities and her grandparents in a home setting, using a bliss board. The analyses...
Larson, Martha; Ordelman, Roeland J.F.; de Jong, Franciska M.G.; Kraaij, Wessel; Kohler, Joachim
The spoken document retrieval research effort invested in developing broadcast news retrieval systems has yielded impressive results. This paper is the introduction to the proceedings of the third workshop aiming at the advancement of the field in less explored domains (SSCS 2009), which was organized in…
Coppens-Hofman, Marjolein C; Terband, Hayo R; Maassen, Ben A M; van Schrojenstein Lantman-De Valk, Henny M J; van Zaalen-op't Hof, Yvonne; Snik, Ad F M
In individuals with an intellectual disability, speech dysfluencies are more common than in the general population. In clinical practice, these fluency disorders are generally diagnosed and treated as stuttering rather than cluttering. The aim was to characterise the type of dysfluencies in adults with intellectual disabilities and reported speech difficulties, with an emphasis on manifestations of stuttering and cluttering, a distinction that should help optimise treatment aimed at improving fluency and intelligibility. The dysfluencies in the spontaneous speech of 28 adults (18-40 years; 16 men) with mild and moderate intellectual disabilities (IQs 40-70), who were characterised as poorly intelligible by their caregivers, were analysed using the speech norms for typically developing adults and children. The speakers were subsequently assigned to different diagnostic categories by relating their resulting dysfluency profiles to mean articulatory rate and articulatory rate variability. Twenty-two (75%) of the participants showed clinically significant dysfluencies, of whom 21% were classified as cluttering, 29% as cluttering-stuttering and 25% as clear cluttering at normal articulatory rate. The characteristic pattern of stuttering did not occur. The dysfluencies in the speech of adults with intellectual disabilities and poor intelligibility show patterns that are specific to this population. Together, the results suggest that in this specific group of dysfluent speakers, interventions should be aimed at cluttering rather than stuttering. The reader will be able to (1) describe patterns of dysfluencies in the speech of adults with intellectual disabilities that are specific to this group of people, (2) explain that a high rate of dysfluencies in speech is potentially a major determiner of poor intelligibility in adults with ID, and (3) describe suggestions for intervention focusing on cluttering rather than stuttering in dysfluent speakers with ID. Copyright © 2013 Elsevier Inc.
Gubiani, Marileda Barichello; Pagliarin, Karina Carlesso; Keske-Soares, Marcia
This study systematically reviews the literature on the main tools used to evaluate childhood apraxia of speech (CAS). The search strategy covered the Scopus, PubMed, and Embase databases. Empirical studies that used tools for assessing CAS were selected; articles were screened by two independent researchers. The search retrieved 695 articles, of which 12 were included in the study. Five tools were identified: Verbal Motor Production Assessment for Children, Dynamic Evaluation of Motor Speech Skill, The Orofacial Praxis Test, Kaufman Speech Praxis Test for Children, and Madison Speech Assessment Protocol. There are few instruments available for CAS assessment, and most of them are intended to assess praxis and/or orofacial movements, sequences of orofacial movements, articulation of syllables and phonemes, spontaneous speech, and prosody. Some tests exist for the assessment and diagnosis of CAS. However, few studies on this topic have been conducted at the national level, and few protocols are available to assess and assist in an accurate diagnosis.
Jacks, Adam; Marquardt, Thomas P.; Davis, Barbara L.
Changes in consonant and syllable-level error patterns of three children diagnosed with childhood apraxia of speech (CAS) were investigated in a 3-year longitudinal study. Spontaneous speech samples were analyzed to assess the accuracy of consonants and syllables. Consonant accuracy was low overall, with most frequent errors on middle- and…
Ekström, Seth-Reino; Borg, Erik
The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and the use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech-spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave: low octave and fast tempo had the largest effect; high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort. Music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.
Sierra Rose Dye
In early modern Scotland, thousands of people were accused and tried for the crime of witchcraft, many of whom were women. This paper examines the particular qualities associated with witches in Scottish belief – specifically speech and sexuality – in order to better understand how and why the witch hunts occurred. This research suggests that the growing emphasis on the words of witches during this period was a reflection of a mounting concern over the power and control of speech in early modern society. In looking at witchcraft as a speech crime, it is possible to explain not only why accused witches were more frequently women, but also how the persecution of individuals – both male and female – functioned to ensure that local and state authorities maintained a monopoly on powerful speech.
Vitelli, Chiara; Spagnolo, Nicolo; Toffoli, Lorenzo; Sciarrino, Fabio; De Martini, Francesco
We consider high-gain spontaneous parametric down-conversion in a noncollinear geometry as a paradigmatic scenario for investigating the quantum-to-classical transition by increasing the pump power, that is, the average number of generated photons. The possibility of observing quantum correlations in such a macroscopic quantum system through dichotomic measurement is analyzed by addressing two different measurement schemes, based on different dichotomization processes. More specifically, we investigate the persistence of nonlocality in an (n/2)-spin singlet state of increasing size by studying the change in the form of the correlations as n increases, both in the ideal case and in the presence of losses. We observe a fast decrease in the amount of Bell's inequality violation with increasing system size. This theoretical analysis is supported by the experimental observation of macro-macro correlations with an average number of about 10³ photons. Our results shed light on the extreme practical difficulty of observing nonlocality by performing such a dichotomic fuzzy measurement.
Rosa S. Gisladottir
Everyday conversation requires listeners to quickly recognize verbal actions, so-called speech acts, from the underspecified linguistic code and to prepare a relevant response within the tight time constraints of turn-taking. The goal of this study was to determine the time-course of speech act recognition by investigating oscillatory EEG activity during comprehension of spoken dialog. Participants listened to short, spoken dialogs with target utterances that delivered three distinct speech acts (Answers, Declinations, Pre-offers). The targets were identical across conditions at the lexico-syntactic and phonetic/prosodic levels but differed in the pragmatic interpretation of the speech act performed. Speech act comprehension was associated with reduced power in the alpha/beta bands just prior to Declination speech acts, relative to Answers and Pre-offers. In addition, we observed reduced power in the theta band during the beginning of Declinations, relative to Answers. Based on the role of alpha and beta desynchronization in anticipatory processes, the results are taken to indicate that anticipation plays a role in speech act recognition. Anticipation of speech acts could be critical for efficient turn-taking, allowing interactants to quickly recognize speech acts and respond within the tight time frame characteristic of conversation. The results show that anticipatory processes can be triggered by the characteristics of the interaction, including the speech act type.
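Analyses like the alpha/beta power effects above rest on estimating spectral power in a frequency band. A minimal sketch, using a synthetic oscillation and a plain DFT (a real EEG pipeline would add tapering, epoching and baseline correction):

```python
import math

def band_power(signal, fs, f_lo, f_hi):
    """Mean power over the DFT bins falling in [f_lo, f_hi] Hz (illustrative)."""
    n = len(signal)
    total, count = 0.0, 0
    for k in range(n // 2 + 1):
        if f_lo <= k * fs / n <= f_hi:
            re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(signal))
            im = sum(-s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(signal))
            total += (re * re + im * im) / n
            count += 1
    return total / max(count, 1)

# 2 s of a synthetic 10 Hz "alpha" oscillation sampled at 100 Hz
sig = [math.sin(2 * math.pi * 10 * t / 100) for t in range(200)]
alpha = band_power(sig, 100, 8, 12)
beta = band_power(sig, 100, 20, 30)
print(alpha > beta)  # the 10 Hz component dominates the alpha band
```

A power *decrease* relative to a baseline, as reported above for Declinations, would then be read as desynchronization in that band.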
This study highlights the internal conversation which takes place in Oracle Corporation Malaysia. Through the study, it will be shown how conversational analysis is used to analyze the transcription of a telephone conversation between Oracle staff. The analysis of the transcriptions applies a few basic concepts of conversational analysis: turn-taking organization and the adjacency pair. The objective of the study is to find out how internal conversations take place by focusing on the conversation itself, that is, the conversational structures spontaneously produced by people during talk, ranging from turn-taking strategies to how topics are introduced, conversation closings and so on. By looking in detail at such talk, we can gain a detailed understanding of how the staff see themselves in relation to the company that influences their daily lives. Keywords: conversational analysis, turn-taking, adjacency pairs
Moreau, Paul-Antoine; Mougin-Sisini, Joé; Devaux, Fabrice; Lantz, Eric
We demonstrate Einstein-Podolsky-Rosen (EPR) entanglement by detecting purely spatial quantum correlations in the near and far fields of spontaneous parametric down-conversion generated in a type-2 beta barium borate crystal. Full-field imaging is performed in the photon-counting regime with an electron-multiplying CCD camera. The data are used without any postselection, and we obtain a violation of Heisenberg inequalities with inferred quantities taking into account all the biphoton pairs in both the near and far fields by integration on the entire two-dimensional transverse planes. This ensures a rigorous demonstration of the EPR paradox in its original position-momentum form.
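The "Heisenberg inequalities with inferred quantities" can be written in their standard EPR (Reid-type) form; this is a textbook statement, not an equation quoted from the paper:

```latex
\Delta_{\mathrm{inf}}x \,\Delta_{\mathrm{inf}}p \;<\; \frac{\hbar}{2}
\qquad\text{whereas direct measurements obey}\qquad
\Delta x \,\Delta p \;\ge\; \frac{\hbar}{2},
```

where $\Delta_{\mathrm{inf}}x$ is the uncertainty of one photon's transverse position as inferred from a measurement on its partner (near field), and $\Delta_{\mathrm{inf}}p$ the analogous inferred momentum uncertainty (far field). An inferred product below the Heisenberg bound is the operational signature of the EPR paradox in its original position-momentum form.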
Naturally produced English clear speech has been shown to be more intelligible than English conversational speech. However, little is known about the extent of the clear speech effects in the production of nonnative English, and perception of foreign-accented English by younger and older listeners. The present study examined whether Cantonese speakers would employ the same strategies as those used by native English speakers in producing clear speech in their second language. Also, the clear s...
Robins, Sarah; Treiman, Rebecca; Rosales, Nicole
Learning about letters is an important component of emergent literacy. We explored the possibility that parent speech provides information about letters, and also that children’s speech reflects their own letter knowledge. By studying conversations transcribed in CHILDES (MacWhinney, 2000) between parents and children aged one to five, we found that alphabetic order influenced use of individual letters and letter sequences. The frequency of letters in children’s books influenced parent utterances throughout the age range studied, but children’s utterances only after age two. Conversations emphasized some literacy-relevant features of letters, such as their shapes and association with words, but not letters’ sounds. Describing these patterns and how they change over the preschool years offers important insight into the home literacy environment. PMID:25598577
Chen Haitao; Ren Jian; Che Jiaming; Hang Junbiao; Qiu Weicheng; Chen Zhongyuan
Objective: To propose a treatment protocol using video thoracoscopy for spontaneous pneumothorax. Methods: One hundred and three patients underwent video-assisted thoracoscopic surgery (VATS) treatment of spontaneous pneumothorax and hemothorax. Indications included recurrent pneumothorax, persistent air leakage following conservative therapy, complicated hemothorax, and bullae formation identified on CT scan. Results: No operative deaths occurred; the conversion rate was 2.91%, the recurrence rate was 0.97%, the complication rate was 3.81%, and the mean postoperative hospital stay was 5.6 days. Conclusions: VATS treatment of spontaneous pneumothorax is better than open chest surgery and also superior to conservative therapy.
Dick, Anthony Steven; Mok, Eva H; Raja Beharelle, Anjali; Goldin-Meadow, Susan; Small, Steven L
In everyday conversation, listeners often rely on a speaker's gestures to clarify any ambiguities in the verbal message. Using fMRI during naturalistic story comprehension, we examined which brain regions in the listener are sensitive to speakers' iconic gestures. We focused on iconic gestures that contribute information not found in the speaker's talk, compared with those that convey information redundant with the speaker's talk. We found that three regions-left inferior frontal gyrus triangular (IFGTr) and opercular (IFGOp) portions, and left posterior middle temporal gyrus (MTGp)--responded more strongly when gestures added information to nonspecific language, compared with when they conveyed the same information in more specific language; in other words, when gesture disambiguated speech as opposed to reinforced it. An increased BOLD response was not found in these regions when the nonspecific language was produced without gesture, suggesting that IFGTr, IFGOp, and MTGp are involved in integrating semantic information across gesture and speech. In addition, we found that activity in the posterior superior temporal sulcus (STSp), previously thought to be involved in gesture-speech integration, was not sensitive to the gesture-speech relation. Together, these findings clarify the neurobiology of gesture-speech integration and contribute to an emerging picture of how listeners glean meaning from gestures that accompany speech. Copyright © 2012 Wiley Periodicals, Inc.
Hengst, Julie A.; Frame, Simone R.; Neuman-Stritzel, Tiffany; Gannaway, Rachel
Reported speech, wherein one quotes or paraphrases the speech of another, has been studied extensively as a set of linguistic and discourse practices. Researchers agree that reported speech is pervasive, found across languages, and used in diverse contexts. However, to date, there have been no studies of the use of reported speech among…
In this research the role of the right hemisphere (RH) in the comprehension of speech acts (or illocutionary force) was examined. Two split-screen experiments were conducted in which participants made lexical decisions for lateralized targets after reading a brief conversation remark. On one-half of the trials the target word named the speech act performed with the…
Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta
Human utterances demonstrate temporal patterning, also referred to as rhythm. While simple oromotor behaviors (e.g., chewing) feature a salient periodical structure, conversational speech displays a time-varying quasi-rhythmic pattern. Quantification of periodicity in speech is challenging. Unimodal spectral approaches have highlighted rhythmic aspects of speech. However, speech is a complex multimodal phenomenon that arises from the interplay of articulatory, respiratory, and vocal systems. The present study addressed the question of whether a multimodal spectral approach, in the form of coherence analysis between electromyographic (EMG) and acoustic signals, would allow one to characterize rhythm in natural speech more efficiently than a unimodal analysis. The main experimental task consisted of speech production at three speaking rates; a simple oromotor task served as control. The EMG-acoustic coherence emerged as a sensitive means of tracking speech rhythm, whereas spectral analysis of either EMG or acoustic amplitude envelope alone was less informative. Coherence metrics seem to distinguish and highlight rhythmic structure in natural speech.
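The EMG-acoustic coherence analysis described above can be illustrated with a minimal sketch: two toy signals share a 5 Hz quasi-rhythm, and magnitude-squared coherence is estimated by averaging single-bin cross-spectra over segments (a real analysis would use Welch's method with windowing and overlap; the signals here are synthetic, not recordings):

```python
import cmath
import math
import random

def coherence_at(x, y, fs, freq, seg_len):
    """Magnitude-squared coherence between x and y at a single frequency,
    estimated by averaging single-bin DFT cross-spectra over segments."""
    k = round(freq * seg_len / fs)  # DFT bin closest to the target frequency
    sxy, sxx, syy = 0j, 0.0, 0.0
    for start in range(0, len(x) - seg_len + 1, seg_len):
        xs, ys = x[start:start + seg_len], y[start:start + seg_len]
        X = sum(v * cmath.exp(-2j * math.pi * k * n / seg_len) for n, v in enumerate(xs))
        Y = sum(v * cmath.exp(-2j * math.pi * k * n / seg_len) for n, v in enumerate(ys))
        sxy += X * Y.conjugate()
        sxx += abs(X) ** 2
        syy += abs(Y) ** 2
    return abs(sxy) ** 2 / (sxx * syy)

# Surrogate "EMG" and "acoustic envelope": a shared 5 Hz speech-like rhythm
# plus independent noise in each channel.
random.seed(0)
fs, seg = 100, 100
t = [n / fs for n in range(8 * seg)]
emg = [math.sin(2 * math.pi * 5 * tt) + 0.3 * random.gauss(0, 1) for tt in t]
env = [0.8 * math.sin(2 * math.pi * 5 * tt + 0.5) + 0.3 * random.gauss(0, 1) for tt in t]
print(coherence_at(emg, env, fs, 5, seg))   # high: shared rhythm
print(coherence_at(emg, env, fs, 20, seg))  # low: independent noise only
```

Coherence is high only where the two channels share consistent phase-locked structure, which is why it can pick out a quasi-rhythm that neither unimodal spectrum shows cleanly.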
Bauminger-Zviely, Nirit; Golan-Itshaky, Adi; Tubul-Lavy, Gila
In this study, we videotaped two 10-minute free-play interactions and coded speech acts (SAs) in the peer talk of 51 preschoolers (21 ASD, 30 typical), interacting with friend versus non-friend partners. Groups were matched for maternal education, IQ (verbal/nonverbal), and CA. We compared SAs by group (ASD/typical), by partner's friendship status (friend/non-friend), and by partner's disability status. The main results yielded a higher amount and diversity of SAs in the typical than in the ASD group (mainly in assertive acts, organizational devices, object-dubbing, and pretend-play); yet those categories, among others, showed better performance with friends versus non-friends. Overall, a more nuanced perception of the pragmatic deficit in ASD should be adopted, highlighting friendship as an important context for children's development of SAs.
Pittman, Todd Butler
The concept of two (or more) particle entanglement lies at the heart of many fascinating questions concerning the foundations of quantum mechanics. The counterintuitive nonlocal behavior of entangled states led Einstein, Podolsky, and Rosen (EPR) to ask their famous 1935 question, "Can quantum mechanical description of reality be considered complete?". Although the debate has been raging for more than 60 years, there is still no absolutely conclusive answer to this question. For if entangled states exist and can be observed, then accepting quantum mechanics as a complete theory requires a drastic overhaul of one's physical intuition with regard to the common-sense notions of locality and reality put forth by EPR. Contained herein are the results of research investigating various non-classical features of the two-photon entangled states produced in Type-II Spontaneous Parametric Down-Conversion (SPDC). Through a series of experiments we have made manifest the nonlocal nature of the quantum mechanical "two-photon effective wavefunction" (or biphoton) realized by certain photon-counting coincidence measurements performed on these states. In particular, we examine a special double entanglement, in which the states are seen to be simultaneously entangled in both spin and space-time variables. The observed phenomena based on this double entanglement lead to many interesting results which defy classical explanation, but are well described within the framework of quantum mechanics. The implications provide a unique perspective concerning the nature of the photon, and the concept of quantum entanglement.
This thesis aimed partly to examine the effects of gender on conversation dynamics, partly to investigate whether interaction between participants with contrasting opinions promotes cognitive development on a moral task. Another objective was to explore whether particular conversational features of interaction would have any impact upon a pair’s joint response or on each child’s moral development. The conversations were coded with regard to simultaneous speech acts, psychosocial behaviour and...
A. N. Romanenko
The paper describes several speech recognition systems for Egyptian Colloquial Arabic. The research is based on the CALLHOME Egyptian corpus. Both systems are described: a classic one, based on Hidden Markov and Gaussian Mixture Models, and a state-of-the-art one using deep neural network acoustic models. We demonstrate the contribution of speaker-dependent bottleneck features; for their extraction, three neural-network-based extractors were trained. For their training, three datasets in several languages were used: Russian, English and different Arabic dialects. We studied the possibility of applying a small Modern Standard Arabic (MSA) corpus to derive phonetic transcriptions. The experiments have shown that applying the extractor obtained on the basis of the Russian dataset significantly increases the quality of Arabic speech recognition. We have also found that using phonetic transcriptions based on Modern Standard Arabic decreases recognition quality; nevertheless, the system's results remain applicable in practice. In addition, we studied the application of the obtained models to the keyword search problem. The systems obtained demonstrate good results compared to those published before. Some ways to improve speech recognition are offered.
B.V.A.N.S.S. Prabhakar Rao
Productivity is a very important part of any organisation in general, and of the software industry in particular. Software effort estimation is a challenging task, and effort and productivity are inter-related. Every organisation requires emotionally stable employees for seamless and progressive working. In other industries this may be achievable without manpower, but software project development is a labour-intensive activity: each line of code must be delivered by a software engineer, with tools and techniques acting only as aids or supplements. Whatever the reason, the software industry has been struggling with its success rate, facing problems in delivering projects on time and within the estimated budget. To estimate the required effort of a project, it is important to know the emotional state of the team members. The responsibility of ensuring emotional contentment falls on the human resource department, which can deploy a series of systems to carry out its survey. This analysis can be done using a variety of tools, one of which is the study of emotion recognition. The data needed for this is readily available and collectable, and can be an excellent source for feedback systems. The challenge of recognising emotion in speech is convoluted primarily due to noisy recording conditions, variations in sentiment across the sample space and the exhibition of multiple emotions in a single sentence. Ambiguity in the labels of the training set also increases the complexity of the problem. Existing probabilistic models have dominated the study but present a flaw in scalability due to statistical inefficiency. The problem of sentiment prediction in spontaneous speech can thus be addressed using a hybrid system comprising a Convolutional Neural Network and…
Iberall, A. S. (1978). Cybernetics offers a (hydrodynamic) thermodynamic view of brain activities: an alternative to reflexology. In F. Brambilla, P. K… …spontaneous speech that have the effect of equalizing the number of syllables per foot, thus making the speaker's output more isochronous
Until recently, before computer systems were able to synthesize or recognize speech, speech was a capability unique to humans. The human brain has developed to differentiate between human speech and other audio occurrences. Therefore, the slowly-evolving human brain reacts in certain ways to voice stimuli, and has certain expectations regarding communication by voice. Nass affirms that the human brain operates using the same mechanisms when interacting with speech interfaces as when conversing…
Although the importance of contextual information in speech recognition has been acknowledged for a long time, it remains clearly underutilized even in state-of-the-art speech recognition systems. This article introduces a novel, methodologically hybrid approach to context-dependent speech recognition in human–machine interaction. The approach is hybrid in that it integrates aspects of both the statistical and the representational paradigms. We extend the standard statistical pattern-matching approach with a cognitively inspired and analytically tractable model with explanatory power. This methodological extension makes it possible to account for contextual information that is otherwise unavailable to speech recognition systems, and to use it to improve the post-processing of recognition hypotheses. The article introduces an algorithm for the evaluation of recognition hypotheses, illustrates it for concrete interaction domains, and discusses its implementation within two prototype conversational agents.
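A post-processing step of the kind described, re-ranking recognition hypotheses with contextual information, might be sketched as follows; the interpolation weights and the overlap score are a hypothetical stand-in, not the article's actual algorithm:

```python
def rescore(nbest, context_words, alpha=0.5):
    """Re-rank ASR n-best hypotheses by interpolating the recognizer's
    normalized score with a naive context-overlap score (hypothetical scheme)."""
    def context_score(hyp):
        words = hyp.split()
        return sum(1 for w in words if w in context_words) / max(len(words), 1)

    def combined(item):
        hyp, score = item
        return (1 - alpha) * score + alpha * context_score(hyp)

    return sorted(nbest, key=combined, reverse=True)

# Toy flight-booking domain: context knowledge rescues the right hypothesis
# even though the recognizer scored the misrecognition higher.
nbest = [("book a fright", 0.90), ("book a flight", 0.85)]
context = {"book", "flight", "seat", "reserve"}
print(rescore(nbest, context)[0][0])  # book a flight
```

The point of such post-processing is exactly this inversion: domain knowledge overrides a slightly higher acoustic score when the competing hypothesis fits the interaction context better.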
Conroy, Paul; Sage, Karen; Ralph, Matt Lambon
Naming accuracy for nouns and verbs in aphasia can vary across different elicitation contexts, for example, simple picture naming, composite picture description, narratives, and conversation. For some people with aphasia, naming may be more accurate to simple pictures as opposed to naming in spontaneous, connected speech; for others, the opposite pattern may be evident. These differences have, in some instances, been related to word class (for example, noun or verb) as well as aphasia subtype. Given that the aim of picture-naming therapies is to improve word-finding in general, these differences in naming accuracy across contexts may have important implications for the potential functional benefits of picture-naming therapies. This study aimed to explore single-word therapy for both nouns and verbs, and to answer the following questions. (1) To what extent does an increase in naming accuracy after picture-naming therapy (for both nouns and verbs) predict accurate naming of the same items in less constrained spontaneous connected speech tasks such as composite picture description and retelling of a narrative? (2) Does the word class targeted in therapy (verb or noun) dictate whether there is 'carry-over' of the therapy item to connected speech tasks? (3) Does the speed at which the picture is named after therapy predict whether it will also be used appropriately in connected speech tasks? Seven participants with aphasia of varying degrees of severity and subtype took part in ten therapy sessions over five weeks. A set of potentially useful items was collected from control participant accounts of the Cookie Theft Picture Description and the Cinderella Story from the Quantitative Production Analysis. Twenty-four of these words (twelve verbs and twelve nouns) were collated for each participant, on the basis that they had failed to name them in either simple picture naming or connected speech tasks (picture-supported narrative and unsupported retelling of a narrative)…
Havik, E.; Bastiaanse, Y.R.M.
Background: Cross-linguistic investigation of agrammatic speech in speakers of different languages allows us to test theoretical accounts of the nature of agrammatism. A significant feature of the speech of many agrammatic speakers is a problem with article production. Mansson and Ahlsen (2001)…
Peelle, Jonathan E.; Sommers, Mitchell S.
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported…
Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities (Report and Order).
The purpose of this study was to investigate the differences between spontaneous and elicited expressive communication in Australian and Taiwanese children with autism who were nonverbal or had limited speech. Thirty-four children with autism (17 Australian and 17 Taiwanese children) participated in this study. Each participant was observed for 2…
John, Peter; Wellmann, J.; Appell, J.E.
This My Practice session presents a novel online tool for practising verbal communication in a maritime setting. It is based on low-fi ChatBot simulation exercises which employ computer-based dialogue systems. The ChatBot exercises are equipped with an automatic speech recognition engine specifically designed for maritime communication. The speech input and output functionality enables learners to communicate with the computer freely and spontaneously. The exercises replicate real communicati...
Nielsen, Jens Bo
Reliable methods for assessing speech intelligibility are essential within hearing research, audiology, and related areas. Such methods can be used for obtaining a better understanding of how speech intelligibility is affected by, e.g., various environmental factors or different types of hearing impairment. In this thesis, two sentence-based tests for speech intelligibility in Danish were developed. The first test is the Conversational Language Understanding Evaluation (CLUE), which is based on the principles of the original American-English Hearing in Noise Test (HINT). The second test is a modified version of CLUE where the speech material and the scoring rules have been reconsidered. An extensive validation of the modified test was conducted with both normal-hearing and hearing-impaired listeners. The validation showed that the test produces reliable results for both groups of listeners.
Carlile, Simon; Corkhill, Caitlin
To hear out a conversation against other talkers, listeners must overcome energetic and informational masking. Although largely attributed to top-down processes, informational masking has also been demonstrated using unintelligible speech and amplitude-modulated maskers, suggesting bottom-up processes. We examined the role of speech-like amplitude modulations in informational masking using a spatial masking release paradigm. Separating a target talker from two masker talkers produced a 20 dB improvement in speech...
Saldert, Charlotta; Bauer, Malin
It is known that Parkinson’s disease is often accompanied by a motor speech disorder, which results in impaired communication. However, people with Parkinson’s disease may also have impaired word retrieval (anomia) and other communicative problems, which have a negative impact on their ability to participate in conversations with family as well as healthcare staff. The aim of the present study was to explore effects of impaired speech and language on communication and how this is managed by people with Parkinson’s disease and their spouses. Using a qualitative method based on Conversation Analysis, in-depth analyses were performed on natural conversational interaction in five dyads including elderly men who were at different stages of Parkinson’s disease. The findings showed that the motor speech disorder in combination with word retrieval difficulties and adaptations, such as using communication strategies, may result in atypical utterances that are difficult for communication partners to understand. The coexistence of several communication problems compounds the difficulties faced in conversations and individuals with Parkinson’s disease are often dependent on cooperation with their communication partner to make themselves understood. PMID:28946714
This research study is based on the analysis of speech in three Spanish conversation classes. Research questions are: What is the ratio of English and Spanish spoken in class? Is classroom speech more predominant in students or the instructor? And, are teachers' beliefs in regards to the use of English and Spanish consistent with their classroom…
The article deals with a number of relevant methodological issues. First of all, the author analyses the psychological peculiarities of dialogic speech and states that dialogue is the product of at least two persons. In this view, dialogic speech, unlike monologic speech, happens impromptu and is not prepared in advance. Dialogic speech is mainly situational in character. The linguistic nature of dialogic speech, in the author's opinion, lies in the process of exchanging replies, which are coherent in structural and functional character. The author classifies dialogue groups by the number of replies and by communicative parameters. The basic goal of teaching dialogic speech is to develop the abilities and skills that enable learners to exchange replies. The author distinguishes two basic stages of teaching dialogic speech: 1. Training the ability to exchange replies during communicative exercises. 2. Developing these skills through exercises of a creative nature during a group dialogue, conversation or debate.
Radel, Rémi; Sarrazin, Philippe; Jehu, Marie; Pelletier, Luc
This study examines whether motivation can be primed through unattended speech. Study 1 used a dichotic-listening paradigm and repeated strength measures. In comparison to the baseline condition, in which the unattended channel was composed only of neutral words, the presence of words related to high (low) intensity of motivation led participants to exert more (less) strength when squeezing a hand dynamometer. In a second study, a barely audible conversation was played while participants' attention was occupied by a demanding task. Participants who were exposed to a conversation depicting intrinsic motivation performed better and persevered longer in a subsequent word-fragment completion task than those exposed to the same conversation made unintelligible. These findings suggest that motivation can be primed without attention. © 2013 The British Psychological Society.
In this paper, we describe the design and development of a database for Indonesian diphone synthesis, which uses segments of recorded speech to convert text to speech and save the result as an audio file such as WAV or MP3. Building the Indonesian diphone database involved several steps. First, the diphone database was developed: a list of sample words was created, consisting of diphones organized by priority (looking first for diphones located in the middle of a word, then at the beginning or end); the sample words were recorded and segmented; and the diphones were extracted with the tool Diphone Studio 1.3. Second, the system was developed using Microsoft Visual Delphi 6.0, including the conversion of input numbers, acronyms, words, and sentences into diphone representations. Two kinds of conversion are involved in the Indonesian text-to-speech system: converting the input text into phonemes, and converting the phonemes into speech. The method used in this research is diphone concatenative synthesis, in which recorded sound segments are collected; every segment consists of a diphone (two phonemes). This synthesizer can produce speech with a high degree of naturalness. The Indonesian text-to-speech system can differentiate special phonemes, as in 'Beda' and 'Bedak', but samples of other specific words must be added to the system. The system can also handle texts with abbreviations, and there is a facility for adding such words.
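The diphone-concatenation step described above can be sketched in a few lines. This is an illustrative sketch only; the data layout and function names are assumptions, not the authors' actual Delphi implementation. A diphone database maps each two-phoneme label to a recorded sample buffer, and synthesis looks up and concatenates the buffers for a word:

```python
# Minimal sketch of diphone concatenative synthesis (illustrative only;
# not the authors' Delphi system).

def word_to_diphones(phonemes):
    """Turn a phoneme sequence into diphone labels; '_' marks the
    silence at word edges, so 'beda' yields _b, be, ed, da, a_."""
    units = ["_"] + list(phonemes) + ["_"]
    return [units[i] + units[i + 1] for i in range(len(units) - 1)]

def synthesize(phonemes, diphone_db):
    """Concatenate the recorded sample buffers of each diphone."""
    samples = []
    for diphone in word_to_diphones(phonemes):
        samples.extend(diphone_db[diphone])  # KeyError = missing recording
    return samples

# Toy database: each diphone maps to a (very short) sample buffer.
db = {"_b": [0.0, 0.1], "be": [0.2, 0.3], "ed": [0.4], "da": [0.5], "a_": [0.0]}
wave = synthesize(["b", "e", "d", "a"], db)  # the word 'beda'
```

A real system would additionally smooth the joins (e.g. by cross-fading) and prefer the mid-word recording of a diphone when several variants exist, as the prioritization step in the abstract suggests.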
… speaking to their 18-month-old children (infant-directed speech, IDS) as opposed to an adult (adult-directed speech, ADS). Caregivers were recorded talking about toy animals in conversations with their child and with an adult interlocutor. The toy names were designed to elicit Danish contrasts differing … the Euclidean F1/F2 differences between vowels, F0 of the stressed (first) syllable in the toy name, as well as the duration of the stressed syllable, the vowels, and the fricatives. Results of the acoustic differences between ADS and IDS were compared to the results of parents' reports on the children …
The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately hearing-impaired subjects. Running-speech (just-follow-conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises: International Collegium of Rehabilitative Audiology (ICRA) noise and speech-spectrum-filtered noise (SPN). All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; high octave and slow tempo, the smallest. Music had a lower masking effect than ICRA noise with two or six speakers at normal vocal effort (P<.01) and than SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments, and creating a balance between speech perception and privacy in social settings.
Consumer Guide: Speech-to-Speech Relay Service. Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...
Spoken text differs from written text in its context dependency, turn-taking organization, and dynamic structure. EFL learners, however, sometimes find it difficult to produce the typical characteristics of spoken language, particularly in casual talk. When asked to conduct a conversation, some of them tend to be script-based, which is considered unnatural. Using the theory of Thornbury (2005), this paper aims to analyze the characteristics of spoken language in casual conversation, which cover spontaneity, interactivity, interpersonality, and coherence. This study used discourse analysis to reveal these four features in the turns and moves of three casual conversations. The findings indicate that not all sub-features were used in the conversations: the spontaneity features were used 132 times; the interactivity features 1081 times; the interpersonality features 257 times; and the coherence (negotiation) features 526 times. The results also show that some participants dominantly produce some sub-features naturally, and vice versa. This finding is expected to provide a model of how spoken interaction should be carried out. More importantly, it could raise English teachers' and lecturers' awareness of teaching the features of spoken language, so that students can develop their communicative competence as native speakers of English do.
Purpose: The current study sought to investigate the separate effects of dysarthria and cognitive status on global speech timing, speech hesitation, and linguistic complexity characteristics, and how these speech behaviors influence listener impressions, for three connected speech tasks presumed to differ in cognitive-linguistic demand, in four carefully defined speaker groups: 1) MS with cognitive deficits (MSCI), 2) MS with clinically diagnosed dysarthria and intact cognition (MSDYS), 3) MS without dysarthria or cognitive deficits (MS), and 4) healthy talkers (CON). The relationship between neuropsychological test scores and speech-language production and perceptual variables for speakers with cognitive deficits was also explored. Methods: 48 speakers participated, including 36 individuals reporting a neurological diagnosis of MS and 12 healthy talkers. The three MS groups and the control group each contained 12 speakers (8 women and 4 men). Cognitive function was quantified using standard clinical tests of memory, information processing speed, and executive function. A standard z-score of ≤ -1.50 indicated deficits in a given cognitive domain. Three certified speech-language pathologists determined the clinical diagnosis of dysarthria for speakers with MS. Experimental speech tasks of interest included audio-recordings of an oral reading of the Grandfather passage and two spontaneous speech samples in the form of Familiar and Unfamiliar descriptive discourse. Various measures of spoken language were of interest. Suprasegmental acoustic measures included speech rate and articulatory rate. Linguistic speech hesitation measures included pause frequency (i.e., silent and filled pauses), mean silent pause duration, grammatical appropriateness of pauses, and interjection frequency. For the two discourse samples, three standard measures of language complexity were obtained: subordination index, inter-sentence cohesion adequacy, and lexical diversity. Ten listeners
Primary characteristics of the verbal communication of preschoolers with Specific Language Impairment in spontaneous speech
Debora Maria Befi-Lopes
PURPOSE: To verify the phonological performance in spontaneous speech of preschoolers with Specific Language Impairment (SLI). METHODS: The subjects were 27 children with SLI aged between three years and five years and 11 months, who attended speech-language pathology therapy. Subjects who carried out at least 50% of the phonological assessment (word naming and imitation tasks), or whose speech intelligibility allowed analysis, were selected. Speech samples were obtained from a pragmatics evaluation and from discourse elicited by pictures. Analyses considered the use of developmental (DP) and idiosyncratic phonological processes (IP) in spontaneous speech. RESULTS: The descriptive statistics (mean DP and IP) showed large within-group variability. There was no variation in the number of processes according to age (DP: p=0.38; IP: p=0.72), but there was a prevalence of DP at all ages, in both tests (Z=-6.327; p<0.001). The occurrence of DP and IP was higher in the pragmatics evaluation (p<0.001), situation in
Florence Gacoin Marks
The paper deals with the transformation of Flaubert's free indirect speech in the film Madame Bovary by Claude Chabrol. Conversion of free indirect speech into direct speech, or into narration by an external narrator (voice-over), cannot be avoided; it does, however, pose many problems because of the potential ambiguity (polyphony) of free indirect speech. In such cases, Chabrol often finds effective solutions which bring the film closer to Flaubert's style. Nevertheless, it remains clear t...
Heinemann, Trine; Wagner, Johannes
This paper investigates how speakers who are about to produce, or in the midst of producing, reported speech and thought (RT), temporarily abandon the production of RT to include other material. Using Conversation Analysis, we identify three positions in which RT is abandoned temporarily...
Davis, Matthew H.; Coleman, Martin R.; Absalom, Anthony R.; Rodd, Jennifer M.; Johnsrude, Ingrid S.; Matta, Basil F.; Owen, Adrian M.; Menon, David K.
We used functional MRI and the anesthetic agent propofol to assess the relationship among neural responses to speech, successful comprehension, and conscious awareness. Volunteers were scanned while listening to sentences containing ambiguous words, matched sentences without ambiguous words, and signal-correlated noise (SCN). During three scanning sessions, participants were nonsedated (awake), lightly sedated (a slowed response to conversation), and deeply sedated (no conversational response...
Griffiths, Sarah; Barnes, Rebecca; Britten, Nicky; Wilkinson, Ray
Around 70% of people who develop Parkinson's disease (PD) experience speech and voice changes. Clinicians often find that when asked about their primary communication concerns, PD clients will talk about the difficulties they have 'getting into' conversations. This is an important area for clients and it has implications for quality of life and clinical management. To review the extant literature on PD and communication impairments in order to reveal key topic areas, the range of methodologies applied, and any gaps in knowledge relating to PD and social interaction and how these might be usefully addressed. A systematic search of a number of key databases and available grey literature regarding PD and communication impairment was conducted (including motor speech changes, intelligibility, cognitive/language changes) to obtain a sense of key areas and methodologies applied. Research applying conversation analysis in the field of communication disability was also reviewed to illustrate the value of this methodology in uncovering common interactional difficulties, and in revealing the use of strategic collaborative competencies in naturally occurring conversation. In addition, available speech and language therapy assessment and intervention approaches to PD were examined with a view to their effectiveness in promoting individualized intervention planning and advice-giving for everyday interaction. A great deal has been written about the deficits underpinning communication changes in PD and the impact of communication disability on the self and others as measured in a clinical setting. Less is known about what happens for this client group in everyday conversations outside of the clinic. Current speech and language therapy assessments and interventions focus on the individual and are largely impairment based or focused on compensatory speaker-oriented techniques. A conversation analysis approach would complement basic research on what actually happens in everyday
Wöllmer, Martin; Marchi, Erik; Squartini, Stefano; Schuller, Björn
Highly spontaneous, conversational, and potentially emotional and noisy speech is known to be a challenge for today's automatic speech recognition (ASR) systems, which highlights the need for advanced algorithms that improve speech features and models. Histogram equalization is an efficient method to reduce the mismatch between clean and noisy conditions by normalizing all moments of the probability distribution of the feature vector components. In this article, we propose to combine histogram equalization and multi-condition training for robust keyword detection in noisy speech. To better cope with conversational speaking styles, we show how contextual information can be effectively exploited in a multi-stream ASR framework that dynamically models context-sensitive phoneme estimates generated by a long short-term memory neural network. The proposed techniques are evaluated on the SEMAINE database, a corpus containing emotionally colored conversations with a cognitive system, the "Sensitive Artificial Listener".
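The histogram-equalization idea above can be illustrated with a small sketch: each feature component is mapped through the empirical CDF of the observed sample and then through the inverse CDF of a reference distribution (a unit Gaussian here), so the equalized features follow the reference distribution regardless of the distortion the noise introduced. This is a generic sketch of the technique under those assumptions, not the authors' implementation:

```python
from statistics import NormalDist

def histogram_equalize(values, ref=NormalDist(0.0, 1.0)):
    """Map each value through the sample's empirical CDF, then through
    the inverse CDF of the reference distribution (unit Gaussian)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    equalized = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the empirical CDF strictly inside (0, 1)
        equalized[i] = ref.inv_cdf((rank + 0.5) / n)
    return equalized

# Whatever the scale or offset of the noisy features, the output follows
# the reference Gaussian; rank order of the inputs is preserved.
feats = [5.0, 2.0, 9.0]
eq = histogram_equalize(feats)  # the middle value maps to the Gaussian median
```

In a real ASR front end this mapping would be estimated per feature dimension over many frames, typically with a smoothed histogram rather than the raw ranks used here.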
Lind, Marianne; Kristoffersen, Kristian Emil; Moen, Inger; Simonsen, Hanne Gram
Functionally relevant assessment of the language production of speakers with aphasia should include assessment of connected speech production. Despite the ecological validity of everyday conversations, more controlled and monological types of texts may be easier to obtain and analyse in clinical practice. This article discusses some simple…
de Carvalho-Teles, Viviane; Pegoraro-Krook, Maria Inês; Lauris, José Roberto Pereira
Most patients who have undergone resection of the maxillae due to benign or malignant tumors in the palatomaxillary region present with speech and swallowing disorders. Coupling of the oral and nasal cavities increases nasal resonance, resulting in hypernasality and unintelligible speech. Prosthodontic rehabilitation of maxillary resections with effective separation of the oral and nasal cavities can improve speech and esthetics, and assist the psychosocial adjustment of the patient as well. The objective of this study was to evaluate the efficacy of the palatal obturator prosthesis on speech intelligibility and resonance of 23 patients with age ranging from 18 to 83 years (Mean = 49.5 years), who had undergone inframedial-structural maxillectomy. The patients were requested to count from 1 to 20, to repeat 21 words and to spontaneously speak for 15 seconds, once with and again without the prosthesis, for tape recording purposes. The resonance and speech intelligibility were judged by 5 speech language pathologists from the tape recordings samples. The results have shown that the majority of patients (82.6%) significantly improved their speech intelligibility, and 16 patients (69.9%) exhibited a significant hypernasality reduction with the obturator in place. The results of this study indicated that maxillary obturator prosthesis was efficient to improve the speech intelligibility and resonance in patients who had undergone maxillectomy. PMID:19089242
Kalinowski, Joseph; Saltuklaroglu, Tim
'Choral speech', 'unison speech', or 'imitation speech' has long been known to immediately induce reflexive, spontaneous, and natural-sounding fluency, even in the most severe cases of stuttering. Unlike typical post-therapeutic speech, a hallmark characteristic of choral speech is the sense of 'invulnerability' to stuttering, regardless of phonetic context, situational environment, or audience size. We suggest that choral speech immediately inhibits stuttering by engaging mirror systems of neurons, innate primitive neuronal substrates that dominate the initial phases of language development due to their predisposition to reflexively imitate gestural action sequences in a fluent manner. Since mirror systems are primordial in nature, they take precedence over the much later developing stuttering pathology. We suggest that stuttering may best be ameliorated by reengaging mirror neurons via choral speech or one of its derivatives (using digital signal processing technology) to provide gestural mirrors, which are nature's way of immediately overriding the central stuttering block. Copyright 2003 Elsevier Science Ltd.
Vogel, Adam P; Block, Susan; Kefalianos, Elaina; Onslow, Mark; Eadie, Patricia; Barth, Ben; Conway, Laura; Mundt, James C; Reilly, Sheena
To investigate the feasibility of adopting automated interactive voice response (IVR) technology for remotely capturing standardized speech samples from stuttering children. Participants were ten 6-year-old stuttering children. Their parents called a toll-free number from their homes and were prompted to elicit speech from their children using a standard protocol involving conversation, picture description and games. The automated IVR system was implemented using an off-the-shelf telephony software program and delivered by a standard desktop computer. The software infrastructure utilizes voice over internet protocol. Speech samples were automatically recorded during the calls. Video recordings were simultaneously acquired in the home at the time of the call to evaluate the fidelity of the telephone-collected samples. Key outcome measures included syllables spoken, percentage of syllables stuttered and an overall rating of stuttering severity using a 10-point scale. Data revealed a high level of relative reliability in terms of intra-class correlation between the video and telephone acquired samples on all outcome measures during the conversation task. Findings were less consistent for speech samples during picture description and games. Results suggest that IVR technology can be used successfully to automate remote capture of child speech samples.
This book covers language modeling and automatic speech recognition for inflective languages (e.g. Slavic languages), which represent roughly half of the languages spoken in Europe. These languages do not perform as well as English in speech recognition systems and it is therefore harder to develop an application with sufficient quality for the end user. The authors describe the most important language features for the development of a speech recognition system. This is then presented through the analysis of errors in the system and the development of language models and their inclusion in speech recognition systems, which specifically address the errors that are relevant for targeted applications. The error analysis is done with regard to morphological characteristics of the word in the recognized sentences. The book is oriented towards speech recognition with large vocabularies and continuous and even spontaneous speech. Today such applications work with a rather small number of languages compared to the nu...
Özçaliskan, Seyda; Adamson, Lauren B.; Dimitrova, Nevena; Bailey, Jhonelle; Schmuck, Lauren
Early spontaneous gesture, specifically deictic gesture, predicts subsequent vocabulary development in typically developing (TD) children. Here, we ask whether deictic gesture plays a similar role in predicting later vocabulary size in children with Down Syndrome (DS), who have been shown to have difficulties in speech production, but strengths in…
Zahra Sadat Ghoreishi
Objectives: Lexical access is the process by which the basic conceptual, syntactic and morpho-phonological information of words is activated. Most studies of lexical access have focused on picture naming; there is hardly any previous research in normal Persian-speaking participants on other parameters of lexical access, such as verbal fluency and the analysis of connected speech. This study investigates lexical access performance in normal speakers in relation to age, sex and education. Methods: The performance of 120 adult Persian speakers on three tasks, including picture naming, verbal fluency and connected speech, was examined using the "Persian Lexical Access Assessment Package". Performance was compared between two gender groups (male/female), three education groups (below 5 years, between 5 and 12 years, above 12 years) and three age groups (18-35 years, 36-55 years, 56-75 years). Results: According to the findings, picture naming improved with increasing education and declined with increasing age. The performance of participants in phonological and semantic verbal fluency showed improvement with age and education. No significant difference was seen between males and females on the verbal fluency task. In the analysis of connected speech there were no significant differences between the age and education groups; only mean length of utterance was significantly higher in males than in females. Discussion: The findings could serve as a preliminary reference for comparing normal subjects and patients on lexical access tasks; furthermore, they could guide the planning of treatment goals for patients with word-finding problems according to age, gender and education.
Nayar, Harry S; Cray, James J; MacIsaac, Zoe M; Argenta, Anne E; Ford, Matthew D; Fenton, Regina A; Losee, Joseph E; Grunwaldt, Lorelei J
Velopharyngeal insufficiency (VPI) occurs in a nontrivial number of cases following cleft palate repair. We hypothesize that a conversion Furlow palatoplasty allows long-term correction of VPI resulting from a failed primary palate repair, obviating the need for pharyngoplasty and its attendant comorbidities. A retrospective review of patients undergoing a conversion Furlow palatoplasty between 2003 and 2010 was performed. Patients were grouped according to the type of preceding palatal repair. Velopharyngeal insufficiency was assessed using the Pittsburgh Weighted Speech Scale (PWSS). Scores were recorded and compared preoperatively and postoperatively at three sequential visits. Sixty-two patients met inclusion criteria and were grouped by preceding repair: straight-line repair (n = 37), straight-line repair with subsequent oronasal fistula (n = 14), or pharyngeal flap (n = 11). Median PWSS scores at individual visits were as follows: preoperative = 11, first postoperative = 3 (mean, 114.0 ± 6.7 days), second postoperative = 1 (mean, 529.0 ± 29.1 days), and most recent postoperative = 3 (mean, 1368.6 ± 76.9 days). There was a significant difference between preoperative and postoperative PWSS scores in the entire cohort (P …), with the exception of the second to the most recent visit. There were no differences between postoperative PWSS scores in the operative subgroupings (P > 0.05). Eight patients failed to improve and showed no differences in PWSS scores over time (P > 0.05). Patients with a PWSS score of 7 or greater (n = 8) at the first postoperative visit (0-6 months) displayed improvement in speech at the most recent visit (P …). Future studies should elucidate which factors predict the success of this technique following failed palate repair.
Rhone, Ariane E; Nourski, Kirill V; Oya, Hiroyuki; Kawasaki, Hiroto; Howard, Matthew A; McMurray, Bob
In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas.
Walters, F. Scott
Speech act theory-based, second language pragmatics testing (SLPT) raises test-validation issues owing to a lack of correspondence with empirical conversational data. On the assumption that conversation analysis (CA) provides a more accurate account of language use, it is suggested that CA serve as a more empirically valid basis for SLPT…
Santiago Omar Caballero Morales
Dysarthria is a motor speech disorder due to weakness or poor coordination of the speech muscles. This condition can be caused by a stroke, traumatic brain injury, or a degenerative neurological disease. Commonly, people with this disorder also have muscular dystrophy, which restricts their use of switches or keyboards for communication or for control of assistive devices (i.e., an electric wheelchair or a service robot). In this case, speech recognition is an attractive alternative for interaction with and control of service robots, despite the difficulty of achieving robust recognition performance. In this paper we present a speech recognition system for interaction between Mexican Spanish dysarthric speakers and a service robot. The core of the system consists of a Speaker Adaptive (SA) recognition system trained with normal speech. Features such as on-line control of language model perplexity and the addition of vocabulary contribute to high recognition performance; others, such as assessment and text-to-speech (TTS) synthesis, contribute to a more complete interaction with a service robot. Live tests were performed with two mildly dysarthric speakers, achieving recognition accuracies of 90-95% for spontaneous speech and 95-100% of accomplished simulated service robot tasks.
Carenini, Giuseppe; Murray, Gabriel
Due to the Internet Revolution, human conversational data -- in written forms -- are accumulating at a phenomenal rate. At the same time, improvements in speech technology enable many spoken conversations to be transcribed. Individuals and organizations engage in email exchanges, face-to-face meetings, blogging, texting and other social media activities. The advances in natural language processing provide ample opportunities for these "informal documents" to be analyzed and mined, thus creating numerous new and valuable applications. This book presents a set of computational methods
Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.
This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...
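The three-component cascade described above can be pictured as simple function composition. The sketch below is illustrative only, not the authors' system: all three stages are hypothetical stand-in stubs (the names, toy lexicon, and byte "waveform" are assumptions), marking the seams where a real ASR engine, MT model, and TTS synthesizer would plug in.

```python
# Illustrative sketch of a speech-to-speech translation cascade.
# Each stage is a toy stub; real engines would replace them.

def speech_recognition(audio: bytes) -> str:
    """Stub ASR: map audio to a source-language transcript."""
    return "hello world"  # placeholder transcript

def machine_translation(text: str) -> str:
    """Stub MT: map source text to target-language text via a toy lexicon."""
    lexicon = {"hello": "bonjour", "world": "monde"}  # hypothetical lexicon
    return " ".join(lexicon.get(w, w) for w in text.split())

def speech_synthesis(text: str) -> bytes:
    """Stub TTS: map target text to an audio waveform (here, just bytes)."""
    return text.encode("utf-8")  # placeholder "waveform"

def speech_to_speech(audio: bytes) -> bytes:
    # The cascade the abstract names: recognition -> translation -> synthesis.
    return speech_synthesis(machine_translation(speech_recognition(audio)))

print(speech_to_speech(b"..."))  # prints b'bonjour monde'
```

The point of the composition is the paper's: errors and quality trade-offs at any one seam (including the often-ignored synthesis stage) propagate to the final output.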
Yao, Bo; Belin, Pascal; Scheepers, Christoph
In human communication, direct speech (e.g., Mary said: "I'm hungry") is perceived to be more vivid than indirect speech (e.g., Mary said [that] she was hungry). However, for silent reading, the representational consequences of this distinction are still unclear. Although many of us share the intuition of an "inner voice," particularly during silent reading of direct speech statements in text, there has been little direct empirical confirmation of this experience so far. Combining fMRI with eye tracking in human volunteers, we show that silent reading of direct versus indirect speech engenders differential brain activation in voice-selective areas of the auditory cortex. This suggests that readers are indeed more likely to engage in perceptual simulations (or spontaneous imagery) of the reported speaker's voice when reading direct speech as opposed to meaning-equivalent indirect speech statements as part of a more vivid representation of the former. Our results may be interpreted in line with embodied cognition and form a starting point for more sophisticated interdisciplinary research on the nature of auditory mental simulation during reading.
Picciotto, C.E.; Zahir, M.S.
A new mechanism is used to calculate μ⁻ → e⁺ conversion in nuclei, based on the existence of a doubly charged Higgs scalar. The scalar is part of a triplet which generates the spontaneous breakdown of B-L symmetry in an extension of the standard model, as proposed by Gelmini and Roncadelli. We find a limit for conversion rates which is comparable to those of earlier calculations.
Schalling, Ellika; Johansson, Kerstin; Hartelius, Lena
Changes in communicative functions are common in Parkinson's disease (PD), but there are only limited data provided by individuals with PD on how these changes are perceived, what their consequences are, and what type of intervention is provided. This study aimed to present self-reported information about speech and communication, the impact on communicative participation, and the amount and type of speech-language pathology services received by people with PD. Respondents with PD recruited via the Swedish Parkinson's Disease Society filled out a questionnaire accessed via a Web link or provided in a paper version. Of 188 respondents, 92.5% reported at least one symptom related to communication; the most common symptoms were weak voice, word-finding difficulties, imprecise articulation, and getting off topic in conversation. The speech and communication problems resulted in restricted communicative participation for between a quarter and a third of the respondents, and their speech caused embarrassment sometimes or more often to more than half. Forty-five percent of the respondents had received speech-language pathology services. Most respondents reported both speech and language symptoms, and many experienced restricted communicative participation. Access to speech-language pathology services is still inadequate. Services should also address cognitive/linguistic aspects to meet the needs of people with PD. © 2018 S. Karger AG, Basel.
Berger, Stephanie; Niebuhr, Oliver; Fischer, Kerstin
The research initiative Innovating Speech EliCitation Techniques (INSPECT) aims to describe and quantify how recording methods, situations and materials influence speech production in lab-speech experiments. On this basis, INSPECT aims to develop methods that reliably stimulate specific patterns and styles of speech, like expressive or conversational speech or different types of emphatic accents. The present study investigates if and how different text highlighting methods (yellow background, bold, capital letters, italics, and underlining) make speakers reinforce the level of perceived prominence…
Hirano, Shigeru; Naito, Yasushi; Kojima, Hisayoshi
We review the literature on speech processing in the central nervous system as demonstrated by positron emission tomography (PET). Activation study using PET has been proved to be a useful and non-invasive method of investigating the speech processing system in normal subjects. In speech recognition, the auditory association areas and lexico-semantic areas called Wernicke's area play important roles. Broca's area, motor areas, supplementary motor cortices and the prefrontal area have been proved to be related to speech output. Visual speech stimulation activates not only the visual association areas but also the temporal region and prefrontal area, especially in lexico-semantic processing. Higher level speech processing, such as conversation which includes auditory processing, vocalization and thinking, activates broad areas in both hemispheres. This paper also discusses problems to be resolved in the future.
Duran, Nicholas; Fusaroli, Riccardo; Paxton, Alexandra
…-based NLP tools, the procedure begins by taking conversational partners' turns and converting each into a lemmatized sequence of words, assigning part-of-speech tags and computing high-dimensional semantic vectors per each utterance. Words and part-of-speech tags are further sequenced into n-grams…
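The preprocessing steps named in that abstract (lemmatize each turn, tag parts of speech, then sequence into n-grams) can be sketched in a few lines. This is a hedged toy illustration, not the authors' pipeline: the lemma table and tag table below are invented stand-ins, and a real system would use an NLP library for both steps and would also compute the semantic vectors the abstract mentions.

```python
# Toy sketch: lemmatize a conversational turn, POS-tag it, and build n-grams.
# TOY_LEMMAS and TOY_TAGS are hypothetical stand-ins for real NLP models.

TOY_LEMMAS = {"said": "say", "words": "word", "was": "be"}
TOY_TAGS = {"say": "VERB", "word": "NOUN", "be": "VERB", "the": "DET"}

def lemmatize(turn: str) -> list[str]:
    """Map each token of a turn to its lemma (identity if unknown)."""
    return [TOY_LEMMAS.get(w, w) for w in turn.lower().split()]

def pos_tag(lemmas: list[str]) -> list[str]:
    """Assign a coarse part-of-speech tag per lemma ('X' if unknown)."""
    return [TOY_TAGS.get(lemma, "X") for lemma in lemmas]

def ngrams(seq: list[str], n: int) -> list[tuple[str, ...]]:
    """Slide a window of width n over the sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

turn = "the words was said"
lemmas = lemmatize(turn)   # ['the', 'word', 'be', 'say']
tags = pos_tag(lemmas)     # ['DET', 'NOUN', 'VERB', 'VERB']
print(ngrams(tags, 2))     # bigrams over the POS-tag sequence
```

Sequencing tags rather than raw words, as here, is what lets such procedures compare the structure of partners' turns independently of their vocabulary.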
The objective of this research is to document the positive vocal development of pre-lingually deaf children who received a cochlear implant at an early age. The presented research compares the vocal speech expressions of three hearing-impaired children and two children with normal hearing from 10 months to 5 years of age. Comparisons of the spontaneous vocal expressions were conducted by sonagraphic analyses. Awareness of one's own voice as well as the voices of others is essential for the child's continuous vocal development from crying to speech. Supra-segmental factors, such as rhythm, dynamics and melody, play a very important role in this development.
Carlile, Simon; Corkhill, Caitlin
To hear out a conversation against other talkers, listeners overcome energetic and informational masking. Informational masking is largely attributed to top-down processes, but it has also been demonstrated using unintelligible speech and amplitude-modulated maskers, suggesting bottom-up processes. We examined the role of speech-like amplitude modulations in informational masking using a spatial masking release paradigm. Spatially separating a target talker from two masker talkers produced a 20 dB improvement in speech reception threshold; 40% of this was attributed to a release from informational masking. When the across-frequency temporal modulations of the masker talkers are decorrelated, the speech becomes unintelligible, although the within-frequency modulation characteristics remain identical. With this masker, informational masking accounted for 37% of the spatial unmasking. This unintelligible and highly differentiable masker is unlikely to involve top-down processes. These data provide strong evidence of bottom-up masking involving speech-like, within-frequency modulations, and show that this presumably low-level process can be modulated by selective spatial attention.
C. Hendriks, Richard; Gerkmann, Timo; Jensen, Jesper
As speech processing devices like mobile phones, voice-controlled devices, and hearing aids have increased in popularity, people expect them to work anywhere and at any time without user intervention. However, the presence of acoustical disturbances limits the use of these applications, degrades their performance, or causes the user difficulties in understanding the conversation or appreciating the device. A common way to reduce the effects of such disturbances is through the use of single-microphone noise reduction algorithms for speech enhancement. The field of single-microphone noise reduction…
Kojo, Nobuto; Tokutomi, Takashi; Eguchi, Gihachirou; Takagi, Shigeyuki; Matsumoto, Tomie; Sasaguri, Yasuyuki; Shigemori, Minoru
In a 46-year-old female with a 1-month history of gait and speech disturbances, computed tomography (CT) demonstrated mass lesions of slightly high density in the left basal ganglia and left frontal lobe. The lesions were markedly enhanced by contrast medium. The patient received no specific treatment, but her clinical manifestations gradually abated and the lesions decreased in size. Five months after her initial examination, the lesions were absent on CT scans; only a small area of low density remained. Residual clinical symptoms included mild right hemiparesis and aphasia. After 14 months the patient again deteriorated, and a CT scan revealed mass lesions in the right frontal lobe and the pons. However, no enhancement was observed in the previously affected regions. A biopsy revealed malignant lymphoma. Despite treatment with steroids and radiation, the patient's clinical status progressively worsened and she died 27 months after initial presentation. Seven other cases of spontaneous regression of primary malignant lymphoma have been reported. In this case, the mechanism of the spontaneous regression was not clear, but changes in immunologic status may have been involved. (author)
Nose, Takashi; Kobayashi, Takao
In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values have correlation with human perception.
Evans, Julia L.; Alibali, Martha W.; McNeil, Nicole M.
Explores the extent to which children with specific language impairment (SLI) with severe phonological working memory deficits express knowledge uniquely in gesture as compared to speech. Using a paradigm in which gesture-speech relationships have been studied extensively, children with SLI and conversation judgment-matched, typically developing…
Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne
Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and
Ireland, David; Atay, Christina; Liddle, Jacki; Bradford, Dana; Lee, Helen; Rushin, Olivia; Mullins, Thomas; Angus, Dan; Wiles, Janet; McBride, Simon; Vogel, Adam
People with neurological conditions such as Parkinson's disease and dementia are known to have difficulties in language and communication. This paper presents initial testing of an artificial conversational agent, called Harlie. Harlie runs on a smartphone and is able to converse with the user on a variety of topics. A description of the application and a sample dialog are provided to illustrate the various roles chat-bots can play in the management of neurological conditions. Harlie can be used for measuring voice and communication outcomes during the daily life of the user, and for gaining information about challenges encountered. Moreover, it is anticipated that she may also have an educational and support role.
This paper presents an analysis of misunderstandings in conversation caused by different interpretations of speech act labels between the speaker and the hearer. Misunderstanding in these comic series causes various emotional effects in the hearer involved in the conversation. The hearer might feel happy, impressed, embarrassed, or even proud of what the speaker conveys through his or her utterance, depending on the face wants used and intended between the participants in the conversation. According to Goffman in Brown and Levinson (1987), "face is something that is emotionally invested, and that can be lost, maintained, or enhanced, and must be constantly attended to in interaction" (p. 60). There are two kinds of face wants: the positive one is called the face saving act, while the negative one is called the face threatening act. The data in this paper are taken from the Tintin and Asterix comic series. The theories used cover the area of pragmatics, especially the taxonomy of speech act theory (Yule, 1996; Mey, 2001; Leech, 1991) and the notion of face by Erving Goffman (as cited in Yule, 1996; Thomas, 1995). This paper therefore tries to show how the misinterpretation of speech act labels affects the participants in a conversation.
Kell, Christian A; Neumann, Katrin; Behrens, Marion; von Gudenberg, Alexander W; Giraud, Anne-Lise
We previously reported speaking-related activity changes associated with assisted recovery induced by a fluency shaping therapy program and unassisted recovery from developmental stuttering (Kell et al., Brain 2009). While assisted recovery re-lateralized activity to the left hemisphere, unassisted recovery was specifically associated with the activation of the left BA 47/12 in the lateral orbitofrontal cortex. These findings suggested plastic changes in speaking-related functional connectivity between left hemispheric speech network nodes. We reanalyzed these data involving 13 stuttering men before and after fluency shaping, 13 men who recovered spontaneously from their stuttering, and 13 male control participants, and examined functional connectivity during overt vs. covert reading by means of psychophysiological interactions computed across left cortical regions involved in articulation control. Persistent stuttering was associated with reduced auditory-motor coupling and enhanced integration of somatosensory feedback between the supramarginal gyrus and the prefrontal cortex. Assisted recovery reduced this hyper-connectivity and increased functional connectivity between the articulatory motor cortex and the auditory feedback processing anterior superior temporal gyrus. In spontaneous recovery, both auditory-motor coupling and integration of somatosensory feedback were normalized. In addition, activity in the left orbitofrontal cortex and superior cerebellum appeared uncoupled from the rest of the speech production network. These data suggest that therapy and spontaneous recovery normalizes the left hemispheric speaking-related activity via an improvement of auditory-motor mapping. By contrast, long-lasting unassisted recovery from stuttering is additionally supported by a functional isolation of the superior cerebellum from the rest of the speech production network, through the pivotal left BA 47/12. Copyright © 2017 Elsevier Inc. All rights reserved.
Frank H Guenther
Brain-machine interfaces (BMIs) involving electrodes implanted into the human cerebral cortex have recently been developed in an attempt to restore function to profoundly paralyzed individuals. Current BMIs for restoring communication can provide important capabilities via a typing process, but unfortunately they are only capable of slow communication rates. In the current study we use a novel approach to speech restoration in which we decode continuous auditory parameters for a real-time speech synthesizer from neuronal activity in motor cortex during attempted speech. Neural signals recorded by a Neurotrophic Electrode implanted in a speech-related region of the left precentral gyrus of a human volunteer suffering from locked-in syndrome, characterized by near-total paralysis with spared cognition, were transmitted wirelessly across the scalp and used to drive a speech synthesizer. A Kalman filter-based decoder translated the neural signals generated during attempted speech into continuous parameters for controlling a synthesizer that provided immediate (within 50 ms) auditory feedback of the decoded sound. Accuracy of the volunteer's vowel productions with the synthesizer improved quickly with practice, with a 25% improvement in average hit rate (from 45% to 70%) and a 46% decrease in average endpoint error from the first to the last block of a three-vowel task. Our results support the feasibility of neural prostheses that may have the potential to provide near-conversational synthetic speech output for individuals with severely impaired speech motor control. They also provide an initial glimpse into the functional properties of neurons in speech motor cortical areas.
Fluent aphasia of the anomic and Wernicke's type is characterized by word retrieval difficulties. However, in fluent aphasic speech, grammatical deviations have been observed as well. There is debate as to whether these grammatical problems are caused by the word retrieval deficit, by an additional
Trouvain, Jürgen; Truong, Khiet Phuong
In this study, we analysed laughter in dyadic conversational interaction. We attempted to categorise patterns of speaking and laughing activity in conversation in order to gain more insight into how speaking and laughing are timed and related to each other. Special attention was paid to a particular
Davis, Matthew H.; Coleman, Martin R.; Absalom, Anthony R.; Rodd, Jennifer M.; Johnsrude, Ingrid S.; Matta, Basil F.; Owen, Adrian M.; Menon, David K.
We used functional MRI and the anesthetic agent propofol to assess the relationship among neural responses to speech, successful comprehension, and conscious awareness. Volunteers were scanned while listening to sentences containing ambiguous words, matched sentences without ambiguous words, and signal-correlated noise (SCN). During three scanning sessions, participants were nonsedated (awake), lightly sedated (a slowed response to conversation), and deeply sedated (no conversational response, rousable by loud command). Bilateral temporal-lobe responses for sentences compared with signal-correlated noise were observed at all three levels of sedation, although prefrontal and premotor responses to speech were absent at the deepest level of sedation. Additional inferior frontal and posterior temporal responses to ambiguous sentences provide a neural correlate of semantic processes critical for comprehending sentences containing ambiguous words. However, this additional response was absent during light sedation, suggesting a marked impairment of sentence comprehension. A significant decline in postscan recognition memory for sentences also suggests that sedation impaired encoding of sentences into memory, with left inferior frontal and temporal lobe responses during light sedation predicting subsequent recognition memory. These findings suggest a graded degradation of cognitive function in response to sedation such that “higher-level” semantic and mnemonic processes can be impaired at relatively low levels of sedation, whereas perceptual processing of speech remains resilient even during deep sedation. These results have important implications for understanding the relationship between speech comprehension and awareness in the healthy brain in patients receiving sedation and in patients with disorders of consciousness. PMID:17938125
Cleary, Rebecca A; Poliakoff, Ellen; Galpin, Adam; Dick, Jeremy P R; Holler, Judith
Parkinson's disease (PD) can impact enormously on speech communication. One aspect of non-verbal behaviour closely tied to speech is co-speech gesture production. In healthy people, co-speech gestures can add significant meaning and emphasis to speech. There is, however, little research into how this important channel of communication is affected in PD. The present study provides a systematic analysis of co-speech gestures which spontaneously accompany the description of actions in a group of PD patients (N = 23, Hoehn and Yahr Stage III or less) and age-matched healthy controls (N = 22). The analysis considers different co-speech gesture types, using established classification schemes from the field of gesture research. The analysis focuses on the rate of these gestures as well as on their qualitative nature. In doing so, the analysis attempts to overcome several methodological shortcomings of research in this area. Contrary to expectation, gesture rate was not significantly affected in our patient group, with relatively mild PD. This indicates that co-speech gestures could compensate for speech problems. However, while gesture rate seems unaffected, the qualitative precision of gestures representing actions was significantly reduced. This study demonstrates the feasibility of carrying out fine-grained, detailed analyses of gestures in PD and offers insights into an as yet neglected facet of communication in patients with PD. Based on the present findings, an important next step is the closer investigation of the qualitative changes in gesture (including different communicative situations) and an analysis of the heterogeneity in co-speech gesture production in PD. Copyright © 2011 Elsevier Ltd. All rights reserved.
Conversational agents have become a strong alternative for enhancing educational systems with intelligent communicative capabilities, providing motivation and engagement, increasing meaningful learning, and helping in the acquisition of meta-cognitive skills. In this paper, we present Geranium, a multimodal conversational agent that helps children to appreciate and protect their environment. The system, which integrates an interactive chatbot, has been developed by means of a modular and scalable framework that eases building pedagogic conversational agents that can interact with students using speech and natural language.
Leclercq, Anne-Lise; Suaire, Pauline; Moyse, Astrid
The aim of this study was to establish normative data on the speech disfluencies of normally fluent French-speaking children at age 4, an age at which stuttering has begun in 95% of children who stutter (Yairi & Ambrose, 2013). Fifty monolingual French-speaking children who do not stutter participated in the study. Analyses of a conversational speech sample comprising 250-550 words revealed an average of 10% total disfluencies, 2% stuttering-like disfluencies and around 8% non-stuttered disfluencies. Possible explanations for these high speech disfluency frequencies are discussed, including explanations linked to French in particular. The results shed light on the importance of normative data specific to each language.
Zion Golumbic, Elana M.; Poeppel, David; Schroeder, Charles E.
The human capacity for processing speech is remarkable, especially given that information in speech unfolds over multiple time scales concurrently. Similarly notable is our ability to filter out extraneous sounds and focus our attention on one conversation, epitomized by the 'Cocktail Party' effect. Yet, the neural mechanisms underlying on-line speech decoding and attentional stream selection are not well understood. We review findings from behavioral and neurophysiological investigations that underscore the importance of the temporal structure of speech for achieving these perceptual feats. We discuss the hypothesis that entrainment of ambient neuronal oscillations to speech's temporal structure, across multiple time scales, serves to facilitate its decoding and underlies the selection of an attended speech stream over other competing input. In this regard, speech decoding and attentional stream selection are examples of 'active sensing', emphasizing an interaction between proactive and predictive top-down modulation of neuronal dynamics and bottom-up sensory input. PMID:22285024
Levelt, W.J.M.; Roelofs, A.P.A.; Meyer, A.S.
Preparing words in speech production is normally a fast and accurate process. We generate them two or three per second in fluent conversation; and overtly naming a clear picture of an object can easily be initiated within 600 ms after picture onset. The underlying process, however, is exceedingly
Roman Aleksandrovich Vasilyev
This work proposes a method for the phonetic analysis of speech: extracting a list of elementary speech units, such as individual phonemes, from a continuous stream of a specific speaker's informal conversation. A practical algorithm for speaker identification, the process of determining which of a set of speakers is talking, is described.
Shohreh Shahpouri Arani
This paper aims at finding out the forms and functions of directive speech acts uttered by Persian-speaking children. The goal is to discover the distinct strategies applied by speakers of nursery-school age regarding three parameters: the choice of form, the negotiation of communicative goals within conversation, and the protection of face. The data collected for this purpose are based on actual school conversational situations that were audio-recorded in four nursery schools during classroom work and playtime activities. The children, who are the subjects of this study, are of both sexes and various social backgrounds. The results revealed that (1) the investigation of children's directive speech acts confirms that they are aware of the social parameters of talk (Andersen-Slosberg, 1990; Ervin-Tripp et al., 1990); (2) they use linguistic forms different from those used by adults as politeness markers, such as polite 2nd-person-plural subject agreement on the verb and the words "please" and "thank you"; (3) they use declaratives with illocutionary force in order to mark distance (Georgalidou, 2001). Keywords: Iranian children's speech; Directive speech act; Politeness; Conversational analysis; Persian
Pick, Adi; Zhen, Bo; Miller, Owen D; Hsu, Chia W; Hernandez, Felipe; Rodriguez, Alejandro W; Soljačić, Marin; Johnson, Steven G
We present a general theory of spontaneous emission at exceptional points (EPs): exotic degeneracies in non-Hermitian systems. Our theory extends beyond spontaneous emission to any light-matter interaction described by the local density of states (e.g., absorption, thermal emission, and nonlinear frequency conversion). Whereas traditional spontaneous-emission theories imply infinite enhancement factors at EPs, we derive finite bounds on the enhancement, proving maximum enhancement of 4 in passive systems with second-order EPs and significantly larger enhancements (exceeding 400×) in gain-aided and higher-order EP systems. In contrast to non-degenerate resonances, which are typically associated with Lorentzian emission curves in systems with low losses, EPs are associated with non-Lorentzian lineshapes, leading to enhancements that scale nonlinearly with the resonance quality factor. Our theory can be applied to dispersive media, with proper normalization of the resonant modes.
Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A
Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Howard, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p < .05). Correlations between inner speech and the language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech owing to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542
Hargrove, Patricia M.; Pittelko, Stephen; Fillingane, Evan; Rustman, Emily; Lund, Bonnie
The purpose of this research was to compare selected speech and paralinguistic skills of speakers with Williams syndrome (WS) and typically developing peers and to demonstrate the feasibility of providing preexisting databases to students to facilitate graduate research. In a series of three studies, conversational samples of 12 adolescents with…
Přibilová, Anna; Přibil, Jiří
Vol. 48, No. 12 (2006), pp. 1691-1703. ISSN 0167-6393. R&D Projects: GA MŠk(CZ) OC 277.001; GA AV ČR(CZ) 1QS108040569. Other grants: MŠk(SK) 102/VTP/2000; MŠk(SK) 1/3107/06. Institutional research plan: CEZ:AV0Z20670512. Keywords: signal processing; speech processing; speech synthesis. Subject RIV: JA - Electronics; Optoelectronics, Electrical Engineering. Impact factor: 0.678, year: 2006.
Objective: Many tinnitus patients complain about difficulties regarding speech comprehension. In spite of the high clinical relevance, little is known about underlying mechanisms and predisposing factors. Here, we performed an exploratory investigation in a large sample of tinnitus patients to (1) estimate the prevalence of speech comprehension difficulties among tinnitus patients, (2) compare subjective reports of speech comprehension difficulties with objective measurements in a standardized speech comprehension test, and (3) explore underlying mechanisms by analyzing the relationship between speech comprehension difficulties and peripheral hearing function (pure tone audiogram), as well as with co-morbid hyperacusis as a central auditory processing disorder. Subjects and Methods: Speech comprehension was assessed in 361 tinnitus patients presenting between 07/2012 and 08/2014 at the Interdisciplinary Tinnitus Clinic at the University of Regensburg. The assessment included standard audiological assessment (pure tone audiometry, tinnitus pitch and loudness matching), the Goettingen sentence test (in quiet) for speech audiometric evaluation, two questions about hyperacusis, and two questions about speech comprehension in quiet and noisy environments ("How would you rate your ability to understand speech?"; "How would you rate your ability to follow a conversation when multiple people are speaking simultaneously?"). Results: Subjectively reported speech comprehension deficits are frequent among tinnitus patients, especially in noisy environments (cocktail party situation). 74.2% of all investigated patients showed disturbed speech comprehension (indicated by values above 21.5 dB SPL in the Goettingen sentence test). Subjective speech comprehension complaints (both in general and in noisy environments) were correlated with hearing level and with audiologically-assessed speech comprehension ability. In contrast, co-morbid hyperacusis was only correlated
Liu, Jie; Koshizuka, Seiichi; Oka, Yoshiaki
A computer code PROVER-I is developed for the propagation phase of vapor explosion. A new thermal fragmentation model is proposed with three kinds of time scale for modeling instant fragmentation, spontaneous nucleation fragmentation and normal boiling fragmentation. The energetics of ex-vessel vapor explosion is investigated based on different fragmentation models. A higher pressure peak and a larger mechanical energy conversion ratio are obtained by spontaneous nucleation fragmentation. A smaller energy conversion ratio results from normal boiling fragmentation. When the delay time in the thermal fragmentation model is near 0.0 ms, the pressure propagation behavior tends to be analogous to that in hydrodynamic fragmentation. If the delay time is longer, pressure attenuation occurs at the shock front. A high energy conversion ratio (>4%) is obtained in a small vapor volume fraction together with spontaneous nucleation fragmentation. These results are consistent with fuel-coolant interaction experiments with alumina melt. However, in larger vapor volume fraction conditions (α_v > 0.3), the vapor explosion is weak. For corium melt, a coarse mixture with a void fraction of more than 30% can be generated in the pre-mixing process because of its physical properties. In a mixture with such a high void fraction the energetic vapor explosion hardly takes place. (author)
Başkent, Deniz; Gaudrain, Etienne
Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level
Heimbauer, Lisa A; Beran, Michael J; Owren, Michael J
A long-standing debate concerns whether humans are specialized for speech perception, which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words, asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuographic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users. Experiment 2 tested "impossibly unspeechlike" sine-wave (SW) synthesis, which reduces speech to just three moving tones. Although receiving only intermittent and noncontingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate but improved in experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human. Copyright © 2011 Elsevier Ltd. All rights reserved.
Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.
This paper provides an interface between machine translation and a speech synthesis system for converting English speech to Tamil text in an English-to-Tamil speech-to-speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation and text-to-speech synthesis. Many procedures for the integration of speech recognition and machine translation have been proposed, but the speech synthesis component has not yet been considered. In this paper, we focus on the integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of the speech synthesis and machine translation components and of their integration. Here we implement a hybrid machine translation system (a combination of rule-based and statistical machine translation) and a concatenative, syllable-based speech synthesis technique. In order to retain the naturalness and intelligibility of the synthesized speech, Auto Associative Neural Network (AANN) prosody prediction is used in this work. The results of this investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.
Mortensen, Johannes; Tøndering, John
Voice onset time has been reported to vary with the height of the vowel following the stop consonant. This paper investigates the effects of vowel height on VOT in CV sequences with stop consonants in Danish spontaneous speech. A significant effect of vowel height on VOT was found...
Melo, Roberta Michelon; Mota, Helena Bolli; Berti, Larissa Cristina
This study used acoustic and articulatory analyses to characterize the contrast between alveolar and velar stops with typical speech data, comparing the parameters (acoustic and articulatory) of adults and children with typical speech development. The sample consisted of 20 adults and 15 children with typical speech development. The analyzed corpus was organized through five repetitions of each target-word (/'kapə/, /'tapə/, /'galo/ and /'daɾə/). These words were inserted into a carrier phrase and the participant was asked to name them spontaneously. Simultaneous audio and video data were recorded (tongue ultrasound images). The data were submitted to acoustic analyses (voice onset time; spectral peak and burst spectral moments; vowel/consonant transition and relative duration measures) and articulatory analyses (proportion of significant axes of the anterior and posterior tongue regions and description of tongue curves). Acoustic and articulatory parameters were effective to indicate the contrast between alveolar and velar stops, mainly in the adult group. Both speech analyses showed statistically significant differences between the two groups. The acoustic and articulatory parameters provided signals to characterize the phonic contrast of speech. One of the main findings in the comparison between adult and child speech was evidence of articulatory refinement/maturation even after the period of segment acquisition.
Lowit, Anja; Kuschmann, Anja
Purpose: The autosegmental-metrical (AM) framework represents an established methodology for intonational analysis in unimpaired speaker populations but has found little application in describing intonation in motor speech disorders (MSDs). This study compared the intonation patterns of unimpaired participants (CON) and those with Parkinson's…
It has been suggested that intra- and inter-speaker variability in speech are correlated. Interlocutors have been shown to converge on various phonetic dimensions. In addition, speakers imitate the phonetic properties of voices they are exposed to in shadowing, repetition, and even passive listening tasks. We review three theoretical accounts of speech imitation and convergence phenomena: (i) the Episodic Theory (ET) of speech perception and production (Goldinger, 1998); (ii) the Motor Theory (MT) of speech perception (Liberman and Whalen, 2000; Galantucci et al., 2006); (iii) Communication Accommodation Theory (CAT; Giles et al., 1991; Giles and Coupland, 1991). We argue that no account is able to explain all the available evidence. In particular, there is a need to integrate low-level, mechanistic accounts (like ET and MT) and higher-level accounts (like CAT). We propose that this is possible within the framework of an integrated theory of production and comprehension (Pickering & Garrod, in press). Similarly to both ET and MT, this theory assumes parity between production and perception. Uniquely, however, it posits that listeners simulate speakers' utterances by computing forward-model predictions at many different levels, which are then compared to the incoming phonetic input. In our account, phonetic imitation can be achieved via the same mechanism that is responsible for sensorimotor adaptation, i.e. the correction of prediction errors. In addition, the model assumes that the degree to which sensory prediction errors lead to motor adjustments is context-dependent. The notion of context subsumes both the preceding linguistic input and non-linguistic attributes of the situation (e.g., the speaker's and listener's social identities, their conversational roles, the listener's intention to imitate).
The phenomenology of muon number violation in gauge theories of weak and electromagnetic interactions is studied. In the first chapter a brief introduction to the concept of muon number and to spontaneously broken gauge theories is given. A review of the phenomenology and experimental situation regarding different muon number violating processes is made in the second chapter. A detailed phenomenological study of the μe conversion process μ⁻ + (A,Z) → e⁻ + (A,Z) is given in the third chapter. In the fourth chapter some specific gauge theories incorporating spontaneously broken horizontal gauge symmetries between different fermion generations are discussed, with special reference to muon number violation in these theories. The μe conversion process seems to be a good channel in which to search for muon number violation if it occurs. The K_L-K_S mass difference is likely to constrain muon number violating rates to lie far below present experimental limits unless neutral currents changing strangeness by two units are suppressed.
The article presents characteristic speech patterns of the psychologist-mediator on the basis of a five-stage model of professional speech behavior that involves the following five speech activities: introductory talks with the conflict parties; clarifying the parties' positions; finding the optimal solution to the problem; persuasion of the legality of a compromise; and execution of the agreement between the parties. Each of these stages of the mediation process has been analyzed in terms of the mental and speech activities of the specialist, and the structure of the mediator's communication has subsequently been derived. The concept of a "strategy of verbal behavior" in the professional activity of a psychologist-mediator has been described in terms of its correlation with the type of negotiation behavior of the disputants. The basic types of opponents' behavior in negotiations, namely avoidance, concession, denial, and aggression, have been specified. The correspondence of the mediator's speech-behavior strategy to his chosen style of mediation has been established. The tactics and logic of the mediator's speech behavior according to the stages of the mediation conversation have been determined. It has been found that the mediator's tactics imply the application of specific professional speech skills to conduct a dialogue in accordance with the chosen strategy, as well as with the emotional and verbal reactions of the conflict sides in the process of communication.
Scheurich, Rebecca; Zamm, Anna; Palmer, Caroline
The ability to flexibly adapt one’s behavior is critical for social tasks such as speech and music performance, in which individuals must coordinate the timing of their actions with others. Natural movement frequencies, also called spontaneous rates, constrain synchronization accuracy between partners during duet music performance, whereas musical training enhances synchronization accuracy. We investigated the combined influences of these factors on the flexibility with which individuals can synchronize their actions with sequences at different rates. First, we developed a novel musical task capable of measuring spontaneous rates in both musicians and non-musicians in which participants tapped the rhythm of a familiar melody while hearing the corresponding melody tones. The novel task was validated by similar measures of spontaneous rates generated by piano performance and by the tapping task from the same pianists. We then implemented the novel task with musicians and non-musicians as they synchronized tapping of a familiar melody with a metronome at their spontaneous rates, and at rates proportionally slower and faster than their spontaneous rates. Musicians synchronized more flexibly across rates than non-musicians, indicated by greater synchronization accuracy. Additionally, musicians showed greater engagement of error correction mechanisms than non-musicians. Finally, differences in flexibility were characterized by more recurrent (repetitive) and patterned synchronization in non-musicians, indicative of greater temporal rigidity. PMID:29681872
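The error-correction mechanisms described in the synchronization study above are commonly formalized as first-order linear phase correction, in which each inter-tap interval is adjusted by a fraction of the previous tap's asynchrony. A minimal simulation of that idea follows; this is a generic textbook model, not the authors' analysis, and all parameter values are illustrative:

```python
# Minimal simulation of first-order linear phase correction in
# metronome synchronization (a standard sensorimotor-synchronization
# model; illustrative only, not the study's actual analysis).
import random

def simulate(alpha, n_taps=200, own_period=520.0, metronome=500.0, seed=1):
    """Mean absolute asynchrony (ms) for correction gain `alpha`.

    Each inter-tap interval is the tapper's spontaneous period minus a
    correction proportional to the previous asynchrony, plus timekeeper
    noise. The spontaneous rate (520 ms) deliberately differs from the
    metronome (500 ms), so correction is needed to stay synchronized.
    """
    rng = random.Random(seed)
    asyn, total = 0.0, 0.0
    for _ in range(n_taps):
        interval = own_period - alpha * asyn + rng.gauss(0.0, 10.0)
        asyn += interval - metronome   # asynchrony to the next beat
        total += abs(asyn)
    return total / n_taps

weak, strong = simulate(alpha=0.1), simulate(alpha=0.8)
# Stronger error correction keeps taps closer to the metronome.
```

With weak correction the asynchrony settles near (own_period - metronome) / alpha, so a larger gain yields much smaller asynchronies, mirroring the musicians' greater synchronization accuracy and flexibility.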
Hilverman, Caitlin; Clough, Sharice; Duff, Melissa C; Cook, Susan Wagner
During conversation, people integrate information from co-speech hand gestures with information in spoken language. For example, after hearing the sentence, "A piece of the log flew up and hit Carl in the face" while viewing a gesture directed at the nose, people tend to later report that the log hit Carl in the nose (information only in gesture) rather than in the face (information in speech). The cognitive and neural mechanisms that support the integration of gesture with speech are unclear. One possibility is that the hippocampus, known for its role in relational memory and information integration, is necessary for integrating gesture and speech. To test this possibility, we examined how patients with hippocampal amnesia and healthy and brain-damaged comparison participants express information from gesture in a narrative retelling task. Participants watched videos of an experimenter telling narratives that included hand gestures that contained supplementary information. Participants were asked to retell the narratives, and their spoken retellings were assessed for the presence of information from gesture. For features that had been accompanied by supplementary gesture, patients with amnesia retold fewer of these features overall, and fewer of their retellings matched the speech from the narrative. Yet their retellings included features containing information that had been present uniquely in gesture, in amounts that were not reliably different from comparison groups. Thus, a functioning hippocampus is not necessary for gesture-speech integration over short timescales. Providing unique information in gesture may enhance communication for individuals with declarative memory impairment, possibly via non-declarative memory mechanisms. Copyright © 2018. Published by Elsevier Ltd.
Nie, Yingjiu; Galvin, John J; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie
This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet was significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.
Best, Virginia; Keidser, Gitte; Freeston, Katrina; Buchholz, Jörg M
Many listeners with hearing loss report particular difficulties with multitalker communication situations, but these difficulties are not well predicted using current clinical and laboratory assessment tools. The overall aim of this work is to create new speech tests that capture key aspects of multitalker communication situations and ultimately provide better predictions of real-world communication abilities and the effect of hearing aids. A test of ongoing speech comprehension introduced previously was extended to include naturalistic conversations between multiple talkers as targets, and a reverberant background environment containing competing conversations. In this article, we describe the development of this test and present a validation study. Thirty listeners with normal hearing participated in this study. Speech comprehension was measured for one-, two-, and three-talker passages at three different signal-to-noise ratios (SNRs), and working memory ability was measured using the reading span test. Analyses were conducted to examine passage equivalence, learning effects, and test-retest reliability, and to characterize the effects of number of talkers and SNR. Although we observed differences in difficulty across passages, it was possible to group the passages into four equivalent sets. Using this grouping, we achieved good test-retest reliability and observed no significant learning effects. Comprehension performance was sensitive to the SNR but did not decrease as the number of talkers increased. Individual performance showed associations with age and reading span score. This new dynamic speech comprehension test appears to be valid and suitable for experimental purposes. Further work will explore its utility as a tool for predicting real-world communication ability and hearing aid benefit. American Academy of Audiology.
Apraxia of speech (AOS), also known as acquired ...
The relationship of one aspect of conversational style, the degree of directness in the sending and interpretation of messages, to ethnicity was investigated in a comparison of the communication styles of Greeks and Americans. It was hypothesized that Greeks tend to be more indirect in speech than Americans, and that English speakers of Greek…
Chevret, Patrick; Ebissou, Ange; Parizet, Etienne
In open-plan offices, ambient noise made up of intelligible conversations is generally perceived as one of the most important annoyances for tasks requiring concentration. This annoyance has been shown to decrease task performance and to cause health problems for people in the medium and long term (tiredness, stress, etc.). Consequently, the improvement of working conditions should involve the evaluation of speech annoyance, which could give rise to recommendati...
Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M J; Poletto, Christopher J; Ludlow, Christy L
The issue of whether speech is supported by the same neural substrates as non-speech vocal tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as nonsense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial activation overlap between speech and non-speech function in these regions. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings posit a more general role of the previously proposed "auditory dorsal stream" in the left hemisphere: to support the production of vocal tract gestures that are not limited to speech processing.
Jerry D. Gibson
Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
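The linear prediction model at the heart of the codecs surveyed above can be sketched in a few lines: an all-pole predictor is fitted to a signal frame via the Levinson-Durbin recursion, and the small prediction residual is what a codec actually quantizes and transmits. This is a generic textbook sketch, not code from the paper; the frame here is a synthetic stand-in for a voiced-speech segment:

```python
# Sketch of linear-predictive (LPC) analysis, the model underlying
# speech codecs. Generic textbook construction, not the paper's code.
import math

def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for LPC coefficients a[1..order]."""
    a = [0.0] * (order + 1)
    err = r[0]                              # prediction-error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                       # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)                # residual energy shrinks
    return a[1:], err

# A frame of a decaying sinusoid stands in for a voiced-speech segment.
frame = [math.sin(0.3 * n) * 0.99 ** n for n in range(240)]
coeffs, residual = levinson_durbin(autocorr(frame, 10), 10)
# The predictor removes most of the frame energy: residual << r[0].
```

A real codec would then quantize the coefficients (or an equivalent representation such as line spectral pairs) and the excitation, which is where the standards mentioned in the abstract differ.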
Jacobs, Naomi; Garnham, Alan
The primary functional role of conversational hand gestures in narrative discourse is disputed. A novel experimental technique investigated whether gestures function primarily to aid speech production by the speaker, or communication to the listener. The experiment involved repeated narration of a cartoon story or stories to a single or multiple…
Bergmann, Christopher; Sprenger, Simone A; Schmid, Monika S
Fluent speech depends on the availability of well-established linguistic knowledge and routines for speech planning and articulation. A lack of speech fluency in late second-language (L2) learners may point to a deficiency of these representations, due to incomplete acquisition. Experiments on bilingual language processing have shown, however, that there are strong reasons to believe that multilingual speakers experience co-activation of the languages they speak. We have studied to what degree language co-activation affects fluency in the speech of bilinguals, comparing a monolingual German control group with two bilingual groups: 1) first-language (L1) attriters, who had fully acquired German before emigrating to an L2 English environment, and 2) immersed L2 learners of German (L1: English). We have analysed the temporal fluency and the incidence of disfluency markers (pauses, repetitions and self-corrections) in spontaneous film retellings. Our findings show that learners speak more slowly than controls and attriters. Also, on each count, the speech of at least one of the bilingual groups contains more disfluency markers than the retellings of the control group. Generally speaking, both bilingual groups (learners and attriters) are equally (dis)fluent and significantly more disfluent than the monolingual speakers. Given that the L1 attriters are unaffected by incomplete acquisition, we interpret these findings as evidence for language competition during speech production. Copyright © 2015. Published by Elsevier B.V.
Forrest, William Craig; Novelli, Cornelius
Maynard Mack, Emeritus Sterling Professor of English at Yale, discusses his pioneering work with the oral interpretation of literature in the graduate and undergraduate English classroom, making the point that such oral techniques need not be limited to the drama or speech departments. (Author/SJL)
Goffman, Lisa; Ertmer, David J; Erdle, Christa
A method is presented for examining change in motor patterns used to produce linguistic contrasts. In this case study, the method is applied to a child receiving new auditory input following cochlear implantation. This child experienced hearing loss at age 3 years and received a multichannel cochlear implant at age 7 years. Data collection points occurred both pre- and postimplant and included acoustic and kinematic analyses. Overall, this child's speech output was transcribed as accurate across the pre- and postimplant periods. Postimplant, with the onset of new auditory experience, acoustic durations showed a predictable maturational change, usually decreasing in duration. Conversely, the spatiotemporal stability of speech movements initially became more variable postimplantation. The auditory perturbations experienced by this child during development led to changes in the physiological underpinnings of speech production, even when speech output was perceived as accurate.
Lee, Ji Young; Lee, Jin Tae; Heo, Hye Jeong; Choi, Chul-Hee; Choi, Seong Hee; Lee, Kyungjae
Background and Objectives: People usually converse in real-life background noise. They experience more difficulty understanding speech in noise than in a quiet environment. The present study investigated how speech recognition in real-life background noise is affected by the type of noise, signal-to-noise ratio (SNR), and age. Subjects and Methods: Eighteen young adults and fifteen middle-aged adults with normal hearing participated in the present study. Three types of noise [subway noise, vacu...
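SNR conditions like those manipulated in the study above are constructed by scaling the noise against the speech before mixing. A minimal, generic sketch follows; the signal and noise here are synthetic stand-ins, not the study's actual stimuli:

```python
# Sketch of how a signal-to-noise ratio (SNR) condition is built:
# the noise is scaled so that 20*log10(rms_signal / rms_noise) hits
# the target value, then the two are summed. Generic illustration.
import math
import random

def rms(x):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR, then add."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20.0))
    return [s + gain * n for s, n in zip(speech, noise)]

rng = random.Random(0)
speech = [math.sin(0.05 * n) for n in range(8000)]      # stand-in "speech"
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]      # stand-in noise
mixture = mix_at_snr(speech, noise, snr_db=0.0)         # 0 dB: equal powers
```

At 0 dB SNR the scaled noise carries the same power as the speech, so the mixture's level rises by roughly 3 dB over the speech alone; lower SNRs make the condition progressively harder.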
Zahir, M.S.; Picciotto, C.E.
We study the process μ⁻ → e⁺ in nuclei with the emission of a Majoron, with B-L symmetry spontaneously broken by the vacuum expectation value of a Higgs triplet. We find that this mechanism may contribute to the μ⁻ → e⁺ conversion rate at the same level as predicted by earlier models.
… (iii) two sentence comprehension tasks (sentence-picture matching, plausibility judgments), and (iv) two sensory-motor tasks (a non-word repetition task and the BDAE repetition subtest). Our results indicate that the neural bases of speech perception are task-dependent. The syllable discrimination and sensory-motor tasks all identified a dorsal temporal-parietal voxel cluster, including area Spt, primary auditory and somatosensory cortex. Conversely, the auditory comprehension task identified left mid-temporal regions. This suggests that syllable discrimination deficits do not stem from impairments in the perceptual analysis of speech sounds but rather involve temporary maintenance of the stimulus trace and/or the similarity comparison process. The ventral stream (anterior and posterior clusters in the superior and middle temporal gyri) was associated with both sentence tasks. However, the dorsal stream's involvement was more selective: inferior frontal regions were identified in the sentence-to-picture matching task, not the semantic plausibility task. Within the sentence-to-picture matching task, these inferior frontal regions were only identified by the trials with the most difficult sentences. This suggests that the dorsal stream's contribution to sentence comprehension is not driven by perception per se. These initial findings highlight the task-dependent nature of speech processing, challenge claims regarding any specific motor region being critical for speech perception, and refute the notion that speech perception relies on dorsal stream auditory-motor systems.
Human-Computer Interaction is one of the pervasive application areas of computer science, developing multimodal interaction for information sharing. The conversation agent is the core component for developing interfaces between a system and a user, with applied AI for proper responses. In this paper, the interactive system plays a vital role in improving knowledge in the domain of health through an intelligent interface between machine and human using text and speech. The primary aim is to enrich knowledge and help the user in the domain of health using a conversation agent that offers immediate responses with a human-companion feel.
For George Herbert Mead, thinking amounts to holding an "inner conversation of gestures". Such a conception does not seem especially original at first glance. What makes it truly original is the "social-behavioral" approach of which it is a part, and, particularly, two ideas. The first is that the conversation in question is a conversation of gestures or attitudes; the second, that thought and reflexive intelligence arise from the internalization of an external process supported by the social mechanism of communication: that of conduct organization. It is important, then, to understand what distinguishes such ideas from those of the founder of behavioral psychology, John B. Watson, for whom thinking amounts to nothing other than subvocal speech.
Novelli-Olmstead, Tina; Ling, Daniel
Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…
Chesters, Jennifer; Möttönen, Riikka; Watkins, Kate E
See Crinion (doi:10.1093/brain/awy075) for a scientific commentary on this article. Stuttering is a neurodevelopmental condition affecting 5% of children and persisting in 1% of adults. Promoting lasting fluency improvement in adults who stutter is a particular challenge, so novel interventions to improve outcomes are of value. Previous work in patients with acquired motor and language disorders reported enhanced benefits of behavioural therapies when paired with transcranial direct current stimulation. Here, we report the results of the first trial investigating whether transcranial direct current stimulation can improve speech fluency in adults who stutter. We predicted that applying anodal stimulation to the left inferior frontal cortex during speech production with temporary fluency inducers would result in longer-lasting fluency improvements. Thirty male adults who stutter completed a randomized, double-blind, controlled trial of anodal transcranial direct current stimulation over left inferior frontal cortex. Fifteen participants received 20 min of 1-mA stimulation on five consecutive days while speech fluency was temporarily induced using choral and metronome-timed speech. The other 15 participants received the same speech fluency intervention with sham stimulation. Speech fluency during reading and conversation was assessed at baseline, before and after the stimulation on each day of the 5-day intervention, and at 1 and 6 weeks after the end of the intervention. Anodal stimulation combined with speech fluency training significantly reduced the percentage of disfluent speech measured 1 week after the intervention compared with fluency intervention alone. At 6 weeks after the intervention, this improvement was maintained during reading but not during conversation. Outcome scores at both post-intervention time points on a clinical assessment tool (the Stuttering Severity Instrument, version 4) also showed significant improvement in the group receiving
Florence Gacoin Marks
The paper deals with the transformation of Flaubert's free indirect speech in the film Madame Bovary by Claude Chabrol. Conversion of free indirect speech into direct speech or into narration by an external narrator (voice-over) cannot be avoided; it does, however, pose many problems because of the potential ambiguousness (polyphony) of free indirect speech. In such cases, Chabrol often finds effective solutions which bring the film closer to Flaubert's style. Nevertheless, it remains clear that film adaptations of literary masterpieces entail serious losses. Therefore teachers must convince students that film adaptations of high-quality literature can never replace literary works themselves. Literature and film are two different media, based on distinct types of artistic expression, and should be presented as such to students. As a consequence, film adaptations of literary masterpieces should not uncritically be considered as the most suitable material in all teaching contexts. Key words: Gustave Flaubert, Claude Chabrol, Madame Bovary, free indirect speech, literature and film
Davidow, Jason H; Grossman, Heather L; Edge, Robin L
Voluntary stuttering techniques involve persons who stutter purposefully interjecting disfluencies into their speech. Little research has been conducted on the impact of these techniques on the speech pattern of persons who stutter. The present study examined whether changes in the frequency of voluntary stuttering accompanied changes in stuttering frequency, articulation rate, speech naturalness, and speech effort. In total, 12 persons who stutter aged 16-34 years participated. Participants read four 300-syllable passages during a control condition, and three voluntary stuttering conditions that involved attempting to produce purposeful, tension-free repetitions of initial sounds or syllables of a word for two or more repetitions (i.e., bouncing). The three voluntary stuttering conditions included bouncing on 5%, 10%, and 15% of syllables read. Friedman tests and follow-up Wilcoxon signed ranks tests were conducted for the statistical analyses. Stuttering frequency, articulation rate, and speech naturalness were significantly different between the voluntary stuttering conditions. Speech effort did not differ between the voluntary stuttering conditions. Stuttering frequency was significantly lower during the three voluntary stuttering conditions compared to the control condition, and speech effort was significantly lower during two of the three voluntary stuttering conditions compared to the control condition. Due to changes in articulation rate across the voluntary stuttering conditions, it is difficult to conclude, as has been suggested previously, that voluntary stuttering is the reason for stuttering reductions found when using voluntary stuttering techniques. Additionally, future investigations should examine different types of voluntary stuttering over an extended period of time to determine their impact on stuttering frequency, speech rate, speech naturalness, and speech effort.
Hux, Karen; Knollman-Porter, Kelly; Brown, Jessica; Wallace, Sarah E
Using text-to-speech technology to provide simultaneous written and auditory content presentation may help compensate for chronic reading challenges if people with aphasia can understand synthetic speech output; however, inherent auditory comprehension challenges experienced by people with aphasia may make understanding synthetic speech difficult. This study's purpose was to compare the preferences and auditory comprehension accuracy of people with aphasia when listening to sentences generated with digitized natural speech, Alex synthetic speech (i.e., Macintosh platform), or David synthetic speech (i.e., Windows platform). The methodology required each of 20 participants with aphasia to select one of four images corresponding in meaning to each of 60 sentences comprising three stimulus sets. Results revealed significantly better accuracy given digitized natural speech than either synthetic speech option; however, individual participant performance analyses revealed three patterns: (a) comparable accuracy regardless of speech condition for 30% of participants, (b) comparable accuracy between digitized natural speech and one, but not both, synthetic speech option for 45% of participants, and (c) greater accuracy with digitized natural speech than with either synthetic speech option for remaining participants. Ranking and Likert-scale rating data revealed a preference for digitized natural speech and David synthetic speech over Alex synthetic speech. Results suggest many individuals with aphasia can comprehend synthetic speech options available on popular operating systems. Further examination of synthetic speech use to support reading comprehension through text-to-speech technology is thus warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M.J.; Poletto, Christopher J.; Ludlow, Christy L.
The issue of whether speech is supported by the same neural substrates as non-speech vocal-tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as nonsense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech sylla...
Tanaka, Ryo; Kameyama, Hitoshi; Nagahashi, Masayuki; Kanda, Tatsuo; Ichikawa, Hiroshi; Hanyu, Takaaki; Ishikawa, Takashi; Kobayashi, Takashi; Sakata, Jun; Kosugi, Shin-Ichi; Wakai, Toshifumi
Idiopathic spontaneous pneumoperitoneum is a rare condition that is characterized by intraperitoneal gas for which no clear etiology has been identified. We report here a case of idiopathic spontaneous pneumoperitoneum, which was successfully managed by conservative treatment. A 77-year-old woman who was bedridden with speech disability as a sequela of brain hemorrhage presented at our hospital with a 1-day history of abdominal distention. On physical examination, she had stable vital signs and slight epigastric tenderness on deep palpation without any other signs of peritonitis. A chest radiograph and computed tomography showed that a large amount of free gas extended into the upper abdominal cavity. Esophagogastroduodenoscopy revealed no perforation of the upper gastrointestinal tract. The patient was diagnosed with idiopathic spontaneous pneumoperitoneum, and conservative treatment was selected. The abdominal distension rapidly disappeared, and the patient resumed oral intake on the 5th hospital day without deterioration of symptoms. Knowledge of this rare disease and accurate diagnosis with findings of clinical imaging might contribute towards refraining from unnecessary laparotomy.
This CD is a multimedia presentation of the programme of safety upgrading of Bohunice V1 NPP. This chapter consists of an introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; (4) Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering
Jørgensen, Søren; Dau, Torsten
The speech-based envelope power spectrum model (sEPSM) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multi-resolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating...
Rumbach, Anna F; Rose, Tanya A; Cheah, Mynn
To explore Australian speech-language pathologists' use of non-speech oral motor exercises, and rationales for using/not using non-speech oral motor exercises in clinical practice. A total of 124 speech-language pathologists practising in Australia, working with paediatric and/or adult clients with speech sound difficulties, completed an online survey. The majority of speech-language pathologists reported that they did not use non-speech oral motor exercises when working with paediatric or adult clients with speech sound difficulties. However, more than half of the speech-language pathologists working with adult clients who have dysarthria reported using non-speech oral motor exercises with this population. The most frequently reported rationale for using non-speech oral motor exercises in speech sound difficulty management was to improve awareness/placement of articulators. The majority of speech-language pathologists agreed there is no clear clinical or research evidence base to support non-speech oral motor exercise use with clients who have speech sound difficulties. This study provides an overview of Australian speech-language pathologists' reported use and perceptions of non-speech oral motor exercises' applicability and efficacy in treating paediatric and adult clients who have speech sound difficulties. The research findings provide speech-language pathologists with insight into how and why non-speech oral motor exercises are currently used, and adds to the knowledge base regarding Australian speech-language pathology practice of non-speech oral motor exercises in the treatment of speech sound difficulties. Implications for Rehabilitation Non-speech oral motor exercises refer to oral motor activities which do not involve speech, but involve the manipulation or stimulation of oral structures including the lips, tongue, jaw, and soft palate. Non-speech oral motor exercises are intended to improve the function (e.g., movement, strength) of oral structures. The
Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A
The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise. In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg
Ravishankar, C., Hughes Network Systems, Germantown, MD
Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk, and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk, and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence, the end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term applicable to these techniques, often used interchangeably with speech coding, is voice coding. This term is more generic in the sense that the
Cho, S.; Swanson, D.G.
The fourth-order mode conversion equation is solved completely via the Green's function to include an inhomogeneous source term. This Green's function itself contains all the plasma responsive effects such as mode conversion and absorption, and can be used to describe the spontaneous emission. In the course of the analysis, the reciprocity relations between coupling parameters are proved
Arya, Kamal Narayan; Pandian, Shanta
Broca's aphasia is the most challenging communication deficit in stroke. The left inferior frontal gyrus (IFG), a key region of the mirror-neuron system, gets lesioned in Broca's aphasia. Mirror therapy (MT), a form of action-observation, may trigger the mirror neurons. The aim of this study was to report a case of a poststroke subject with Broca's aphasia, who exhibited an inadvertent and significant improvement in speech after MT for the paretic upper limb. The patient, 20 months post-stroke, underwent MT through goal-directed tasks. He presented with a total absence of spontaneous speech, writing, and naming. After 45 sessions of task-based MT for the upper limb, he showed tremendous recovery in expressive communication. He had fluent and comprehensive communication, however, with a low pitch and minor pronunciation errors. He showed a substantial change (from 18/100 to 79/100) on the Communicative Effective Index, particularly on items such as expressing emotions, one-to-one conversation, naming, and spontaneous conversation. Copyright © 2014 Elsevier Ltd. All rights reserved.
Groenewold, Rimke; Armstrong, Elizabeth
Previous research has shown that speakers with aphasia rely on enactment more often than non-brain-damaged language users. Several studies have been conducted to explain this observed increase, demonstrating that spoken language containing enactment is easier to produce and is more engaging to the conversation partner. This paper describes the effects of the occurrence of enactment in casual conversation involving individuals with aphasia on its level of conversational assertiveness. To evaluate whether and to what extent the occurrence of enactment in speech of individuals with aphasia contributes to its conversational assertiveness. Conversations between a speaker with aphasia and his wife (drawn from AphasiaBank) were analysed in several steps. First, the transcripts were divided into moves, and all moves were coded according to the systemic functional linguistics (SFL) framework. Next, all moves were labelled in terms of their level of conversational assertiveness, as defined in the previous literature. Finally, all enactments were identified and their level of conversational assertiveness was compared with that of non-enactments. Throughout their conversations, the non-brain-damaged speaker was more assertive than the speaker with aphasia. However, the speaker with aphasia produced more enactments than the non-brain-damaged speaker. The moves of the speaker with aphasia containing enactment were more assertive than those without enactment. The use of enactment in the conversations under study positively affected the level of conversational assertiveness of the speaker with aphasia, a competence that is important for speakers with aphasia because it contributes to their floor time, chances to be heard seriously and degree of control over the conversation topic. © 2018 The Authors International Journal of Language & Communication Disorders published by John Wiley & Sons Ltd on behalf of Royal College of Speech and Language Therapists.
Lebib, Riadh; Papo, David; Douiri, Abdel; de Bode, Stella; Gillon Dowens, Margaret; Baudonnière, Pierre-Marie
Lipreading reliably improves speech perception during face-to-face conversation. Within the range of good dubbing, however, adults tolerate some audiovisual (AV) discrepancies, and lipreading can then give rise to confusion. We used event-related brain potentials (ERPs) to study the perceptual strategies governing the intermodal processing of dynamic and bimodal speech stimuli, either congruently dubbed or not. Electrophysiological analyses revealed that non-coherent audiovisual dubbings modulated in amplitude an endogenous ERP component, the N300, which we compared to a 'N400-like effect' reflecting the difficulty of integrating these conflicting pieces of information. This result adds further support for the existence of a cerebral system underlying 'integrative processes' lato sensu. Further studies should take advantage of this 'N400-like effect' with AV speech stimuli to open new perspectives in the domain of psycholinguistics.
Imad Hayif Sameer
The theory of speech acts, which clarifies what people do when they speak, is not about individual words or sentences that form the basic elements of human communication, but rather about particular speech acts that are performed when uttering words. A speech act is the attempt at doing something purely by speaking. Many things can be done by speaking. Speech acts are studied under what is called speech act theory, and belong to the domain of pragmatics. In this paper, two Egyptian inaugural speeches from El-Sadat and El-Sisi, belonging to different periods, were analyzed to find out whether there were differences within this genre in the same culture or not. The study showed that there was a very small difference between these two speeches, which were analyzed according to Searle's theory of speech acts. In El-Sadat's speech, commissives came to occupy the first place. Meanwhile, in El-Sisi's speech, assertives occupied the first place. Within the speeches of one culture, we can find that the differences depended on the circumstances that surrounded the elections of the Presidents at the time. Speech acts were tools they used to convey what they wanted and to obtain support from their audiences.
Kojo, Nobuto; Tokutomi, Takashi; Eguchi, Gihachirou; Takagi, Shigeyuki; Matsumoto, Tomie; Sasaguri, Yasuyuki; Shigemori, Minoru.
In a 46-year-old female with a 1-month history of gait and speech disturbances, computed tomography (CT) demonstrated mass lesions of slightly high density in the left basal ganglia and left frontal lobe. The lesions were markedly enhanced by contrast medium. The patient received no specific treatment, but her clinical manifestations gradually abated and the lesions decreased in size. Five months after her initial examination, the lesions were absent on CT scans; only a small area of low density remained. Residual clinical symptoms included mild right hemiparesis and aphasia. After 14 months the patient again deteriorated, and a CT scan revealed mass lesions in the right frontal lobe and the pons. However, no enhancement was observed in the previously affected regions. A biopsy revealed malignant lymphoma. Despite treatment with steroids and radiation, the patient's clinical status progressively worsened and she died 27 months after initial presentation. Seven other cases of spontaneous regression of primary malignant lymphoma have been reported. In this case, the mechanism of the spontaneous regression was not clear, but changes in immunologic status may have been involved.
Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas
Assistive speech-enabled systems are proposed to help both French and English speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of the Speech Quality (PESQ) value of 5% and more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
Kwon, Osung; Park, Kwang-Kyoon; Ra, Young-Sik; Kim, Yong-Su; Kim, Yoon-Ho
Generation of time-bin entangled photon pairs requires the use of the Franson interferometer which consists of two spatially separated unbalanced Mach-Zehnder interferometers through which the signal and idler photons from spontaneous parametric down-conversion (SPDC) are made to transmit individually. There have been two SPDC pumping regimes where the scheme works: the narrowband regime and the double-pulse regime. In the narrowband regime, the SPDC process is pumped by a narrowband cw laser with the coherence length much longer than the path length difference of the Franson interferometer. In the double-pulse regime, the longitudinal separation between the pulse pair is made equal to the path length difference of the Franson interferometer. In this paper, we propose another regime by which the generation of time-bin entanglement is possible and demonstrate the scheme experimentally. In our scheme, differently from the previous approaches, the SPDC process is pumped by a cw multi-mode (i.e., short coherence length) laser and makes use of the coherence revival property of such a laser. The high-visibility two-photon Franson interference demonstrates clearly that high-quality time-bin entanglement source can be developed using inexpensive cw multi-mode diode lasers for various quantum communication applications.
Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias
Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed...
Martinelli, Eugenio; Mencattini, Arianna; Daprati, Elena; Di Natale, Corrado
Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Although numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speeches are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc.).
Mauszycki, Shannon C.; Wambaugh, Julie L.; Cameron, Rosalea M.
Purpose: Early apraxia of speech (AOS) research has characterized errors as being variable, resulting in a number of different error types being produced on repeated productions of the same stimuli. Conversely, recent research has uncovered greater consistency in errors, but there are limited data examining sound errors over time (more than one…
Smotrova, Tetyana; Lantolf, James P.
The purpose of the present study is to investigate the mediational function of the gesture-speech interface in the instructional conversation that emerged as teachers attempted to explain the meaning of English words to their students in two EFL classrooms in the Ukraine. Its analytical framework is provided by Vygotsky's sociocultural psychology…
Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam
Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we are presenting the multimodal speech capture system (MSCS) that records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. Collected speech modalities, tongue motion, lips gestures, and voice are visualized not only in real-time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities by a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective, and may vary from one SLP to another.
van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.
Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…
Chiodo, Liliane; Majerus, Steve; Mottron, Laurent
The distinction between autism and Asperger syndrome has been abandoned in the DSM-5. However, this clinical categorization largely overlaps with the presence or absence of a speech onset delay, which is associated with clinical, cognitive, and neural differences. It is unknown whether these different speech development pathways and associated cognitive differences are involved in the heterogeneity of the restricted interests that characterize autistic adults. This study tested the hypothesis that speech onset delay, or conversely, early mastery of speech, orients the nature and verbal reporting of adult autistic interests. The occurrence of a priori defined descriptors for perceptual and thematic dimensions was determined, as well as the perceived function and benefits, in the responses of autistic people to a semi-structured interview on their intense interests. The number of words, grammatical categories, and proportion of perceptual/thematic descriptors were computed and compared between groups by variance analyses. The participants comprised 40 autistic adults grouped according to the presence (N = 20) or absence (N = 20) of speech onset delay, as well as 20 non-autistic adults, also with intense interests, matched for non-verbal intelligence using Raven's Progressive Matrices. The overall nature, function, and benefit of intense interests were similar across autistic subgroups, and between autistic and non-autistic groups. However, autistic participants with a history of speech onset delay used more perceptual than thematic descriptors when talking about their interests, whereas the opposite was true for autistic individuals without speech onset delay. This finding remained significant after controlling for linguistic differences observed between the two groups. Verbal reporting, but not the nature or positive function, of intense interests differed between adult autistic individuals depending on their speech acquisition history: oral reporting of
Nejime, Y; Aritsuka, T; Imamura, T; Ifukube, T; Matsushima, J
A real-time hand-sized portable device that slows speech without changing the pitch is proposed for people with hearing impairment. By using this device, people can listen to fast speech at a comfortable speed. A combination of solid-state memory recording and real-time digital signal processing with a single-chip processor enables this unique function. A simplified pitch-synchronous time-scale-modification algorithm is proposed to minimize the complexity of the DSP operation. Unlike the traditional algorithm, this dynamic-processing algorithm reduces distortion even when the expansion rate is only just above 1. Seven out of 10 elderly hearing-impaired listeners showed improvement in a sentence recognition test when using speech-rate conversion with the largest expansion rate, although no improvement was observed in a word recognition test. Some subjects who showed large improvement had limited auditory temporal resolution, but the correlation was not significant. The results suggest that, unlike conventional hearing aids, this device can be used to overcome the deterioration of auditory ability by improving the transfer of information from short-term (echoic) memory into a more stable memory trace in the human auditory system.
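The core operation here, time-scale modification that changes duration but not pitch, can be illustrated with a minimal overlap-add (OLA) sketch. This is not the paper's simplified pitch-synchronous algorithm; plain OLA (with illustrative frame and hop sizes) merely shows how reading input frames at a smaller hop than they are written stretches the signal:

```python
import numpy as np

def ola_time_stretch(x, rate, frame=1024, hop_out=256):
    """Stretch (rate > 1) or compress a signal without changing pitch
    using plain overlap-add: analysis hop = synthesis hop / rate."""
    hop_in = int(hop_out / rate)          # smaller input hop => longer output
    win = np.hanning(frame)
    n_frames = max(1, (len(x) - frame) // hop_in)
    y = np.zeros(n_frames * hop_out + frame)
    norm = np.zeros_like(y)
    for i in range(n_frames):
        seg = x[i * hop_in : i * hop_in + frame] * win
        y[i * hop_out : i * hop_out + frame] += seg
        norm[i * hop_out : i * hop_out + frame] += win
    norm[norm < 1e-8] = 1.0               # avoid division by zero at the edges
    return y / norm
```

A pitch-synchronous variant would additionally align each frame to the local pitch period before adding, which is what reduces the distortion reported above.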
This study was conducted to determine the effectiveness of using the high-tech speech-generating device with Proloquo2Go app to reduce echolalic utterances in a student with autism during conversational speech. After observing that the iPad device with several apps was used by the students and that it served as a communication device, language…
Miyazawa, Kouki; Shinya, Takahito; Martin, Andrew; Kikuchi, Hideaki; Mazuka, Reiko
Infant-directed speech (IDS) is known to differ from adult-directed speech (ADS) in a number of ways, and it has often been argued that some of these IDS properties facilitate infants' acquisition of language. An influential study in support of this view is Kuhl et al. (1997), which found that vowels in IDS are produced with expanded first and second formants (F1/F2) on average, indicating that the vowels are acoustically further apart in IDS than in ADS. These results have been interpreted to mean that the way vowels are produced in IDS makes infants' task of learning vowel categories easier. The present paper revisits this interpretation by means of a thorough analysis of IDS vowels using a large-scale corpus of Japanese natural utterances. We will show that the expansion of F1/F2 values does occur in spontaneous IDS even when the vowels' prosodic position, lexical pitch accent, and lexical bias are accounted for. When IDS vowels are compared to carefully read speech (CS) by the same mothers, however, larger variability among IDS vowel tokens means that the acoustic distances among vowels are farther apart only in CS, but not in IDS when compared to ADS. Finally, we will show that IDS vowels are significantly more breathy than ADS or CS vowels. Taken together, our results demonstrate that even though expansion of formant values occurs in spontaneous IDS, this expansion cannot be interpreted as an indication that the acoustic distances among vowels are farther apart, as is the case in CS. Instead, we found that IDS vowels are characterized by breathy voice, which has been associated with the communication of emotional affect. Copyright © 2017 Elsevier B.V. All rights reserved.
Rostkowska, Hanna; Lapinski, Leszek; Nowak, Maciej J
Spontaneous thiol → thione hydrogen-atom transfer has been investigated for molecules of thiourea trapped in Ar, Ne, normal-H2 (n-H2) and normal-D2 (n-D2) low-temperature matrices. The most stable thione isomer was the only form of the compound present in the matrices after their deposition. According to MP2/6-311++G(2d,p) calculations, the thiol tautomer should be higher in energy by 62.5 kJ mol-1. This less stable thiol form of the compound was photochemically generated in a thione → thiol process, occurring upon UV irradiation of the matrix. Subsequently, a very slow spontaneous conversion of the thiol tautomer into the thione form was observed for the molecules isolated in Ar, Ne, n-H2 and n-D2 matrices kept at 3.5 K and in the dark. Since the thiol → thione transformation in thiourea is a process involving the dissociation of a chemical bond, the barrier for this hydrogen-atom transfer is very high (104-181 kJ mol-1). Crossing such a high potential-energy barrier at a temperature as low as 3.5 K is possible only by hydrogen-atom tunneling. The experimentally measured time constants of this tunneling process: 52 h (Ar), 76 h (Ne), 94 h (n-H2) and 94 h (n-D2), do not differ much from one another. Hence, the dependence of the tunneling rate on the matrix environment is not drastic. The progress of the thiol → thione conversion was also monitored for Ar matrices at different temperatures: 3.5 K, 9 K and 15 K. For this temperature range, the experiments revealed no detectable temperature dependence of the rate of the tunneling process.
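Assuming, as is standard for tunneling decay, that the photogenerated thiol population decays by first-order kinetics N(t) = N0·exp(−t/τ), the reported time constants translate directly into remaining fractions; a small sketch:

```python
import math

def fraction_remaining(t_hours, tau_hours):
    """First-order (exponential) decay: fraction of the photogenerated
    thiol form still present after t_hours, for time constant tau_hours."""
    return math.exp(-t_hours / tau_hours)

# With tau = 52 h (Ar matrix): about 36.8% of the thiol remains after
# one time constant, and the half-life is tau * ln(2), roughly 36 h.
```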
Dalton, B M; Bedrosian, J L
The communicative performance of 4 preoperational-level adolescents, using limited speech, gestures, and communication board techniques, was examined in a two-part investigation. In Part 1, each subject participated in an academic interaction with a teacher in a therapy room. Data were transcribed and coded for communication mode, function, and role. Two subjects were found to predominantly use the speech mode, while the remaining 2 predominantly used board and one other mode. The majority of productions consisted of responses to requests, and the initiator role was infrequently occupied. These findings were similar to those reported in previous investigations conducted in classroom settings. In Part 2, another examination of the communicative performance of these subjects was conducted in spontaneous interactions involving speaking and nonspeaking peers in a therapy room. Using the same data analysis procedures, gesture and speech modes predominated for 3 of the subjects in the nonspeaking peer interactions. The remaining subject exhibited minimal interaction. No consistent pattern of mode usage was exhibited across the speaking peer interactions. In the nonspeaking peer interactions, request predominated. In contrast, a variety of communication functions was exhibited in the speaking peer interactions. Both the initiator and the maintainer roles were occupied in the majority of interactions. Pertinent variables and clinical implications are discussed.
People often use spontaneous gestures when talking about space, such as when giving directions. In a recent study from our lab, we examined whether focal brain-injured individuals' naming of motion event components of manner and path (represented in English by verbs and prepositions, respectively) is impaired selectively, and whether gestures compensate for impairment in speech. Left- or right-hemisphere damaged patients and elderly control participants were asked to describe motion events (e.g., walking around) depicted in brief videos. Results suggest that producing verbs and prepositions can be separately impaired in the left hemisphere and that gesture production compensates for naming impairments when damage involves specific areas in the left temporal cortex.
Harrison N. Jones
Multiple reports have described patients with disordered articulation and prosody, often following acute aphasia, dysarthria, or apraxia of speech, which results in the perception by listeners of a foreign-like accent. These features led to the term foreign accent syndrome (FAS), a speech disorder with perceptual features that suggest an indistinct, non-native speaking accent. Also known as pseudoforeign accent, the speech does not typically match a specific foreign accent, but is rather a constellation of speech features that result in the perception of a foreign accent by listeners. The primary etiologies of FAS are cerebrovascular accidents or traumatic brain injuries which affect cortical and subcortical regions critical to expressive speech and language production. Far fewer cases of FAS associated with psychiatric conditions have been reported. We will present the clinical history, neurological examination, neuropsychological assessment, cognitive-behavioral and biofeedback assessments, and motor speech examination of a patient with FAS without a known vascular, traumatic, or infectious precipitant. Repeated multidisciplinary examinations of this patient provided convergent evidence in support of FAS secondary to conversion disorder. We discuss these findings and their implications for evaluation and treatment of rare neurological and psychiatric conditions.
Demir, Özlem Ece; Levine, Susan C.; Goldin-Meadow, Susan
Speakers of all ages spontaneously gesture as they talk. These gestures predict children's milestones in vocabulary and sentence structure. We ask whether gesture serves a similar role in the development of narrative skill. Children were asked to retell a story conveyed in a wordless cartoon at age 5 and then again at 6, 7, and 8. Children's narrative structure in speech improved across these ages. At age 5, many of the children expressed a character's viewpoint in gesture, and these children were more likely to tell better-structured stories at the later ages than children who did not produce character-viewpoint gestures at age 5. In contrast, framing narratives from a character's perspective in speech at age 5 did not predict later narrative structure in speech. Gesture thus continues to act as a harbinger of change even as it assumes new roles in relation to discourse. PMID:25088361
Van Lancker Sidtis, Diana; Rogers, Tiffany; Godier, Violette; Tagliati, Michele; Sidtis, John J.
Purpose: Speaking, which naturally occurs in different modes or "tasks" such as conversation and repetition, relies on intact basal ganglia nuclei. Recent studies suggest that voice and fluency parameters are differentially affected by speech task. In this study, the authors examine the effects of subcortical functionality on voice and fluency,…
De Looze, Céline; Moreau, Noémie; Renié, Laurent; Kelly, Finnian; Ghio, Alain; Rico, Audrey; Audoin, Bertrand; Viallet, François; Pelletier, Jean; Petrone, Caterina
Cognitive impairment (CI) affects 40-65% of patients with multiple sclerosis (MS). CI can have a negative impact on a patient's everyday activities, such as engaging in conversations. Speech production planning ability is crucial for successful verbal interactions and thus for preserving social and occupational skills. This study investigates the effect of cognitive-linguistic demand and CI on speech production planning in MS, as reflected in speech prosody. A secondary aim is to explore the clinical potential of prosodic features for the prediction of an individual's cognitive status in MS. A total of 45 subjects, that is, 22 healthy controls (HC) and 23 patients in early stages of relapsing-remitting MS, underwent neuropsychological tests probing specific cognitive processes involved in speech production planning. All subjects also performed a read speech task, in which they had to read isolated sentences manipulated for phonological length. Results show that the speech of MS patients with CI is mainly affected at the temporal level (articulation and speech rate, pause duration). Regression analyses further indicate that rate measures are correlated with working memory scores. In addition, linear discriminant analysis shows the ROC AUC of identifying MS patients with CI is 0.70 (95% confidence interval: 0.68-0.73). Our findings indicate that prosodic planning is deficient in patients with MS-CI and that the scope of planning depends on patients' cognitive abilities. We discuss how speech-based approaches could be used as an ecological method for the assessment and monitoring of CI in MS. © 2017 The British Psychological Society.
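The reported discrimination figure can be unpacked: ROC AUC is the probability that a randomly chosen patient with CI receives a higher classifier score than a randomly chosen control (the Mann-Whitney U formulation). A minimal sketch; the scores and labels below are illustrative, not the study's data:

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC AUC as the Mann-Whitney U statistic: the probability that a
    randomly chosen positive (label 1) outscores a randomly chosen
    negative (label 0); ties count as half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC of 0.70, as in the abstract, means a patient outscores a control in 70% of such random pairings.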
Jiang, Hongyan; Qiu, Hongbing; He, Ning; Liao, Xin
For optoacoustic communication from in-air platforms to submerged apparatus, a method based on speech recognition and variable laser-pulse repetition rates is proposed, which realizes character encoding and transmission for speech. First, the theory and spectral characteristics of laser-generated underwater sound are analyzed; then character conversion and encoding for speech, as well as the pattern of codes for laser modulation, are studied; finally, experiments to verify the system design are carried out. Results show that the optoacoustic system, in which laser modulation is controlled by speech-to-character baseband codes, improves flexibility in the receiving location of underwater targets as well as real-time performance in information transmission. In the overwater transmitter, a pulse laser is triggered by speech signals at several repetition rates randomly selected in the range of one to fifty Hz; in the underwater receiver, the laser-pulse repetition rate and data are recovered from the preamble and information codes of the corresponding laser-generated sound. When the energy of the laser pulse is appropriate, real-time transmission of speaker-independent speech can be realized, which eases the problem of limited underwater bandwidth and provides a technical approach for air-sea communication.
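The abstract does not spell out the code table, but the scheme, characters carried by the laser's pulse repetition rate, can be sketched with a hypothetical mapping of a small alphabet onto the 1-50 Hz range (the alphabet and one-rate-per-character assignment are assumptions for illustration):

```python
RATES = list(range(1, 51))                              # candidate repetition rates, Hz
ALPHABET = "abcdefghijklmnopqrstuvwxyz 0123456789.,?"   # 40 symbols <= 50 rates

def encode(message):
    """Map each character to a laser-pulse repetition rate (Hz)."""
    return [RATES[ALPHABET.index(c)] for c in message.lower()]

def decode(rates):
    """Recover the characters from the detected repetition rates."""
    return "".join(ALPHABET[RATES.index(r)] for r in rates)
```

The real system additionally frames the transmission with preamble and information codes, as described above, so the receiver can lock onto the repetition rate before decoding.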
Nørholm, Sidsel Marie
This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based on this model. The basic model used in this thesis is the harmonic model which is a commonly used model for describing the voiced part of the speech signal. We show that it can be beneficial to extend the model to take inharmonicities or the non-stationarity of speech into account. Extending the model...
A 45-year-old man presented with diminution of vision in the left eye following a firecracker injury. Best corrected visual acuity (BCVA) was 20/20 in the right eye and 20/125 in the left eye. Fundus examination revealed vitreous hemorrhage, a macular hole, and submacular hemorrhage in the left eye. The patient underwent vitrectomy, tissue plasminogen activator (tPA)-assisted evacuation of the submacular hemorrhage, internal limiting membrane (ILM) peeling, and 14% C3F8 gas insufflation. After two months, the BCVA remained 20/125 and optical coherence tomography (OCT) showed type 2 macular hole closure. On follow-up, seven months after surgery, BCVA improved to 20/80, N6, with type 1 closure of the macular hole. The clinical findings were confirmed on OCT. Delayed and spontaneous conversion of the traumatic macular hole can occur several months after the primary surgery and may be associated with improved visual outcome. Larger studies are required to better understand the factors implicated in such a phenomenon.
Schoenmaker, Esther; van de Par, Steven
Speech intelligibility in multitalker settings improves when the target speaker is spatially separated from the interfering speakers. A factor that may contribute to this improvement is the improved detectability of target-speech components due to binaural interaction in analogy to the Binaural Masking Level Difference (BMLD). This would allow listeners to hear target speech components within specific time-frequency intervals that have a negative SNR, similar to the improvement in the detectability of a tone in noise when these contain disparate interaural difference cues. To investigate whether these negative-SNR target-speech components indeed contribute to speech intelligibility, a stimulus manipulation was performed where all target components were removed when local SNRs were smaller than a certain criterion value. It can be expected that for sufficiently high criterion values target speech components will be removed that do contribute to speech intelligibility. For spatially separated speakers, assuming that a BMLD-like detection advantage contributes to intelligibility, degradation in intelligibility is expected already at criterion values below 0 dB SNR. However, for collocated speakers it is expected that higher criterion values can be applied without impairing speech intelligibility. Results show that degradation of intelligibility for separated speakers is only seen for criterion values of 0 dB and above, indicating a negligible contribution of a BMLD-like detection advantage in multitalker settings. These results show that the spatial benefit is related to a spatial separation of speech components at positive local SNRs rather than to a BMLD-like detection improvement for speech components at negative local SNRs.
An experimental Dutch keyboard-to-speech system has been developed to explore the possibilities and limitations of Dutch speech synthesis in a communication aid for the speech impaired. The system uses diphones and a formant synthesizer chip for speech synthesis. Input to the system is in
Rehana, Ridha; Silitonga, Sortha
One aim of this article is to show, through a concrete example, how speech function and speech role are used in a movie. The illustrative example is taken from the dialogue of the movie Up. Central to the analysis is the form of the dialogue in Up that contains speech functions and speech roles, i.e., statement, offer, question, command, giving, and demanding. 269 dialogues delivered by the actors were interpreted, and the use of speech function and speech role was identified in them.
In many early childhood classrooms, visual arts experiences occur around a communal arts table. A shared workspace allows for spontaneous conversation and exploration of the art-making process of peers and teachers. In this setting, conversation can play an important role in visual arts experiences as children explore new media, skills, and ideas.…
When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72-82% (freely-read CDS) and 90-98% (rhythmically-regular CDS) of stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across
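The three AM timescales can be illustrated by band-pass filtering an amplitude envelope. The sketch below uses an FFT brick-wall filter and illustrative band edges (roughly 0.9-2.5 Hz for stress, 2.5-12 Hz for syllables, 12-40 Hz for onset-rime units) rather than the S-AMPH model's actual filterbank:

```python
import numpy as np

def am_band(envelope, fs, lo, hi):
    """Isolate one amplitude-modulation band of an amplitude envelope
    with an FFT brick-wall band-pass filter (lo and hi in Hz)."""
    spec = np.fft.rfft(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0   # zero bins outside the band
    return np.fft.irfft(spec, n=len(envelope))
```

For an envelope `env` sampled at `fs` Hz, `am_band(env, fs, 0.9, 2.5)` would approximate the Stress AM, `am_band(env, fs, 2.5, 12)` the Syllable AM, and `am_band(env, fs, 12, 40)` the Phoneme AM.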
Larm, Petra; Hongisto, Valtteri
During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
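Both indices reduce to a weighted combination of band-wise apparent SNRs. In the STI convention, each band's SNR is clipped to ±15 dB and mapped linearly onto [0, 1] before weighting; a minimal sketch (the equal weights in the test are placeholders, the standardized octave-band weights come from IEC 60268-16):

```python
import numpy as np

def transmission_index(snr_db_bands, weights):
    """STI-style index: clip each band's apparent SNR to +/-15 dB, map it
    linearly onto [0, 1], and average with the given band weights."""
    snr = np.clip(np.asarray(snr_db_bands, dtype=float), -15.0, 15.0)
    ti = (snr + 15.0) / 30.0          # per-band transmission index in [0, 1]
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * ti) / np.sum(w))
```

The STI, RASTI and SII differ mainly in which frequency bands are used, how the apparent SNR per band is obtained, and the weighting applied, which is why equipment and loudspeaker choice can shift the measured values as reported above.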
Huijbregts, M.A.H.; de Jong, Franciska M.G.
In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because
Zhang, Lianbin; Tang, Bo; Wu, Jinbo; Li, Renyuan; Wang, Peng
Self-healing hydrophobic light-to-heat conversion membranes for interfacial solar heating are fabricated by deposition of light-to-heat conversion material of polypyrrole onto porous stainless steel mesh, followed by hydrophobic fluoroalkylsilane modification. The mesh-based membranes spontaneously stay at the water–air interface, collect and convert solar light into heat, and locally heat only the water surface for an enhanced evaporation.
The purpose of this study is to examine speech intelligibility of children with primary speech sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...
Albelda Marco, Marta
This paper focuses on the advantages of teaching and learning a foreign language with and through spoken discursive corpora, and especially colloquial and conversational ones. The benefits of developing oral competence and communicative skills in language learners using colloquial conversations are presented and discussed. In this paper, we characterise the colloquial conversation and the features that define this register and discursive genre. Being the most natural and original way to communicate among human beings, the colloquial conversation is the most common means to communicate, and therefore, this genre should have a greater presence in foreign-language classrooms. Secondly, we expound on the advantages of teaching using colloquial conversation corpora, particularly resulting from their contextualisation (the linguistic input is learnt in its real and authentic context) and from their oral and conversational features (prosodic elements and interactional mechanisms). Thirdly, the paper provides a list of corpora of colloquial conversations that are available in Spanish, focusing on the Val.Es.Co. colloquial corpus (peninsular Spanish oral corpus; Briz et al., 2002; Cabedo & Pons, online, www.valesco.es). Finally, a set of pragmatic applications of corpora in the foreign-language classroom is offered, in particular using the Val.Es.Co. colloquial corpus: functions of discourse markers and interjections (whose meanings change depending on the context), strategies of turn-taking, ways of introducing new topics in dialogues, mechanisms of keeping or "stealing" the turn, devices to introduce direct speech, attitudes expressed by falling and rising intonations, hedges and intensifiers, and so on. In general, this paper aims to offer ideas, resources and materials to make students more competent in communication using authentic discursive oral corpora.
... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...
Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole
Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.
Medical records are a critical component of a patient's treatment. However, documentation of patient-related information is considered a secondary activity in the provision of healthcare services, often leading to incomplete medical records and patient data of low quality. Advances in information technology (IT) in the health system and registration of information in electronic health records (EHR) using speech-to-text conversion software have facilitated service delivery. This narrative review is a literature search with the help of libraries, books, conference proceedings, databases of Science Direct, PubMed, Proquest, Springer, SID (Scientific Information Database), and search engines such as Yahoo and Google. I used the following keywords and their combinations: speech recognition, automatic report documentation, voice to text software, healthcare, information, and voice recognition. Due to lack of knowledge of other languages, I searched all texts in English or Persian with no time limits. Of a total of 70 articles, only 42 were selected. Speech-to-text conversion technology offers opportunities to improve the documentation process of medical records, reduce cost and time of recording information, enhance the quality of documentation, improve the quality of services provided to patients, and support healthcare providers in legal matters. Healthcare providers should recognize the impact of this technology on service delivery.
Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Although numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speeches are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc.).
So, Wing-Chee; Wong, Miranda Kit-Yi; Lui, Ming; Yip, Virginia
Previous work leaves open the question of whether children with autism spectrum disorders aged 6-12 years have delay in producing gestures compared to their typically developing peers. This study examined gestural production among school-aged children in a naturalistic context and how their gestures are semantically related to the accompanying speech. Delay in gestural production was found in children with autism spectrum disorders through their middle to late childhood. Compared to their typically developing counterparts, children with autism spectrum disorders gestured less often and used fewer types of gestures, in particular markers, which carry culture-specific meaning. Typically developing children's gestural production was related to language and cognitive skills, but among children with autism spectrum disorders, gestural production was more strongly related to the severity of socio-communicative impairment. Gesture impairment also included the failure to integrate speech with gesture: in particular, supplementary gestures are absent in children with autism spectrum disorders. The findings extend our understanding of gestural production in school-aged children with autism spectrum disorders during spontaneous interaction. The results can help guide new therapies for gestural production for children with autism spectrum disorders in middle and late childhood. © The Author(s) 2014.
Maruyama, Tsukasa; Takeuchi, Hikaru; Taki, Yasuyuki; Motoki, Kosuke; Jeong, Hyeonjeong; Kotozaki, Yuka; Nakagawa, Seishu; Nouchi, Rui; Iizuka, Kunio; Yokoyama, Ryoichi; Yamamoto, Yuki; Hanawa, Sugiko; Araki, Tsuyoshi; Sakaki, Kohei; Sasaki, Yukako; Magistro, Daniele; Kawashima, Ryuta
Time-compressed speech is an artificial form of rapidly presented speech. Training with time-compressed speech in a second language (TCSSL) leads to adaptation to TCSSL. Here, we newly investigated the effects of 4 weeks of training with TCSSL on diverse cognitive functions and neural systems using the fractional amplitude of spontaneous low-frequency fluctuations (fALFF), resting-state functional connectivity (RSFC) with the left superior temporal gyrus (STG), fractional anisotropy (FA), and regional gray matter volume (rGMV) of young adults by magnetic resonance imaging. There were no significant differences in change of performance on measures of cognitive functions or second language skills after training with TCSSL compared with that of the active control group. However, compared with the active control group, training with TCSSL was associated with increased fALFF, RSFC, and FA and decreased rGMV involving areas in the left STG. These results provide no evidence of a far transfer effect of time-compressed speech training on a wide range of cognitive functions and second language skills in young adults. However, they demonstrate effects of time-compressed speech training on gray and white matter structures as well as on resting-state intrinsic activity and connectivity involving the left STG, which plays a key role in listening comprehension.
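The fALFF measure used in this study is commonly computed as the amplitude spectrum of a voxel's time series summed over a low-frequency band, divided by the sum over the entire spectrum. The sketch below assumes the conventional 0.01-0.08 Hz band and a repetition time of 2 s; these are standard defaults, not values taken from the paper.

```python
# Minimal sketch of fALFF: ratio of low-frequency amplitude to total
# amplitude in a resting-state time series. Band edges and TR are
# conventional assumptions, not the study's parameters.
import numpy as np

def falff(ts, tr=2.0, band=(0.01, 0.08)):
    amp = np.abs(np.fft.rfft(ts - np.mean(ts)))      # amplitude spectrum
    freqs = np.fft.rfftfreq(len(ts), d=tr)           # bin frequencies (Hz)
    low = amp[(freqs >= band[0]) & (freqs <= band[1])].sum()
    total = amp[freqs > 0].sum()
    return low / total if total > 0 else 0.0

# A pure 0.05 Hz oscillation lies inside the band, so fALFF is high.
t = np.arange(200) * 2.0                             # 200 volumes, TR = 2 s
slow = np.sin(2 * np.pi * 0.05 * t)
print(round(falff(slow), 2))  # 1.0
```

A signal dominated by faster fluctuations would instead yield a value near zero, which is the contrast the measure is designed to capture.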
Sujatha, R.; Khandelwa, Prakhar; Gupta, Anusha; Anand, Nayan
A long time ago our society accepted the notion of treating people with disabilities not as unviable and disabled but as differently-abled, recognizing their skills beyond their disabilities. The next step has to be taken by our scientific community: to normalize the lives of people with disabilities so that they are treated no differently from anyone else. The primary step in this direction would be to normalize communication between people. People with impaired speech, impaired vision or impaired hearing face difficulties while having a casual conversation with others. Any form of communication feels so strenuous that the impaired end up communicating just the important information and avoid casual conversation. To normalize conversation for the impaired, we need a simple and compact device that facilitates the conversation by providing the information in the desired form.
The quality of a telecommunication voice service is largely influenced by the quality of the transmission system. Nevertheless, the analysis, synthesis and prediction of quality should take into account its multidimensional aspects. Quality can be regarded as a point where the perceived characteristics and the desired or expected ones meet. A schematic is presented which classifies different entities that contribute to the quality of a service, taking into account conversational, user as well as service related contributions. Starting from this concept, perceptively relevant constituents of speech communication quality are identified. The perceptive factors result from elements of the transmission configuration. A simulation model is developed and implemented which allows the most relevant parameters of traditional transmission configurations to be manipulated, in real time and for the conversation situation. Inputs into the simulation are instrumentally measurable quality elements commonly used in tra...
Viswanathan, Navin; Kokkinakis, Kostas; Williams, Brittany T.
Purpose: The purpose of this study was to evaluate whether listeners with normal hearing perceiving noise-vocoded speech-in-speech demonstrate better intelligibility of target speech when the background speech was mismatched in language (linguistic release from masking [LRM]) and/or location (spatial release from masking [SRM]) relative to the…
Avivi-Reich, Meital; Daneman, Meredyth; Schneider, Bruce A
Multi-talker conversations challenge the perceptual and cognitive capabilities of older adults and those listening in their second language (L2). In older adults these difficulties could reflect declines in the auditory, cognitive, or linguistic processes supporting speech comprehension. The tendency of L2 listeners to invoke some of the semantic and syntactic processes from their first language (L1) may interfere with speech comprehension in L2. These challenges might also force them to reorganize the ways in which they perceive and process speech, thereby altering the balance between the contributions of bottom-up vs. top-down processes to speech comprehension. Younger and older L1s as well as young L2s listened to conversations played against a babble background, with or without spatial separation between the talkers and masker, when the spatial positions of the stimuli were specified either by loudspeaker placements (real location), or through use of the precedence effect (virtual location). After listening to a conversation, the participants were asked to answer questions regarding its content. Individual hearing differences were compensated for by creating the same degree of difficulty in identifying individual words in babble. Once compensation was applied, the number of questions correctly answered increased when a real or virtual spatial separation was introduced between babble and talkers. There was no evidence that performance differed between real and virtual locations. The contribution of vocabulary knowledge to dialog comprehension was found to be larger in the virtual conditions than in the real whereas the contribution of reading comprehension skill did not depend on the listening environment but rather differed as a function of age and language proficiency. The results indicate that the acoustic scene and the cognitive and linguistic competencies of listeners modulate how and when top-down resources are engaged in aid of speech comprehension.
Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.
Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech…
Simmons-Mackie, Nina; Savage, Meghan C; Worrall, Linda
A diverse literature addresses elements of conversation therapy in aphasia including intervention rooted in conversation analysis, partner training, group therapy and behavioural intervention. Currently there is no resource for clinicians or researchers that defines and organizes this information into a coherent synopsis describing various conversation therapy practices. To organize information from varied sources into a descriptive overview of conversation therapy for aphasia. Academic search engines were employed to identify research articles published between 1950 and September 2013 reporting on conversation therapy for aphasia. Thirty articles met criteria for review and were identified as primary sources for the qualitative review. Using qualitative methodology, relevant data were extracted from articles and categories were identified to create a descriptive taxonomy of conversation therapy for aphasia. Conversation interventions were divided into descriptive categories including: treatment participants (person with aphasia, partner, dyad), primary guiding orientation (conversation analysis, social model, behavioural, relationship centred), service delivery (individual, group), focus of intervention (generic/individualized; problem/solution oriented; compensatory), training methods (explicit/implicit; external/embedded), activities or tasks, and outcomes measured. Finally, articles were categorized by research design. There was marked variation in conversation therapy approaches and outcome measures reported and a notable gap in information about one-on-one conversation therapy for individuals with aphasia. This review provides a description of various conversation therapy approaches and identified gaps in the existing literature. Valid measures of natural conversation, research on one-on-one conversation approaches for individuals with aphasia, and a systematic body of evidence consisting of high quality research are needed. © 2014 Royal College of Speech
Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH
This paper reviews state-of-the-art automatic speech recognition (ASR) based approaches for the speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from a speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that transfers human speech into transcript text by matching it with the system's library. This is particularly useful in speech rehabilitation therapies as it provides accurate, real-time evaluation of speech input from an individual with a speech disorder. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback on their mistakes. However, the accuracy of ASR depends on many factors such as phoneme recognition, speech continuity, speaker and environmental differences, as well as our depth of knowledge of human language understanding. Hence, the review examines recent developments in ASR technologies and their performance for individuals with speech and language disorders.
Speech and Language Delay. What is a speech and language delay? A speech and language delay ...
Davidson, Lisa S; Skinner, Margaret W; Holstad, Beth A; Fears, Beverly T; Richter, Marie K; Matusofsky, Margaret; Brenner, Christine; Holden, Timothy; Birath, Amy; Kettel, Jerrica L; Scollie, Susan
The purpose of this study was to examine the effects of a wider instantaneous input dynamic range (IIDR) setting on speech perception and comfort in quiet and noise for children wearing the Nucleus 24 implant system and the Freedom speech processor. In addition, children's ability to understand soft and conversational level speech in relation to aided sound-field thresholds was examined. Thirty children (age, 7 to 17 years) with the Nucleus 24 cochlear implant system and the Freedom speech processor with two different IIDR settings (30 versus 40 dB) were tested on the Consonant Nucleus Consonant (CNC) word test at 50 and 60 dB SPL, the Bamford-Kowal-Bench Speech in Noise Test, and a loudness rating task for four-talker speech noise. Aided thresholds for frequency-modulated tones, narrowband noise, and recorded Ling sounds were obtained with the two IIDRs and examined in relation to CNC scores at 50 dB SPL. Speech Intelligibility Indices were calculated using the long-term average speech spectrum of the CNC words at 50 dB SPL measured at each test site and aided thresholds. Group mean CNC scores at 50 dB SPL with the 40 dB IIDR were significantly higher than those with the 30 dB IIDR, whereas scores on the Bamford-Kowal-Bench Speech in Noise Test were not significantly different for the two IIDRs. Significantly improved aided thresholds at 250 to 6000 Hz as well as higher Speech Intelligibility Indices afforded improved audibility for speech presented at soft levels (50 dB SPL). These results indicate that an increased IIDR provides improved word recognition for soft levels of speech without compromising comfort of higher levels of speech sounds or sentence recognition in noise.
Lametti, Daniel R.; Rochet-Capellan, Amélie; Neufeld, Emily; Shiller, Douglas M.
Recent studies of human speech motor learning suggest that learning is accompanied by changes in auditory perception. But what drives the perceptual change? Is it a consequence of changes in the motor system? Or is it a result of sensory inflow during learning? Here, subjects participated in a speech motor-learning task involving adaptation to altered auditory feedback and were subsequently tested for perceptual change. In two separate experiments, involving two different auditory perceptual continua, we show that changes in the speech motor system that accompany learning drive changes in auditory speech perception. Specifically, we obtained changes in speech perception when adaptation to altered auditory feedback led to speech production that fell into the phonetic range of the speech perceptual tests. However, a similar change in perception was not observed when the auditory feedback that subjects received during learning fell into the phonetic range of the perceptual tests. This indicates that the central motor outflow associated with vocal sensorimotor adaptation drives changes to the perceptual classification of speech sounds. PMID:25080594
Raglio, Alfredo; Oasi, Osmano; Gianotti, Marta; Rossi, Agnese; Goulene, Karine; Stramba-Badiale, Marco
The aim of this research was to evaluate the effects of active music therapy (MT) based on free improvisation (relational approach) in addition to speech language therapy (SLT), compared with SLT alone (communicative-pragmatic approach: Promoting Aphasics' Communicative Effectiveness), in stroke patients with chronic aphasia. The experimental group (n = 10) was randomized to 30 individual MT sessions over 15 weeks in addition to 30 individual SLT sessions, while the control group (n = 10) was randomized to only 30 SLT sessions during the same period. Psychological and speech language assessments were made before (T0) and after (T1) the treatments. The study shows a significant improvement in spontaneous speech in the experimental group (Aachener Aphasie subtest: p = 0.020; Cohen's d = 0.35); 50% of the experimental group also showed an improvement in vitality scores of the Short Form Health Survey (chi-square test = 4.114; p = 0.043). The current trial highlights the possibility that the combined use of MT and SLT can lead to a better result in the rehabilitation of patients with aphasia than SLT alone.
Malmenholt, Ann; Lohmander, Anette; McAllister, Anita
The purpose of this study was to investigate current knowledge of the diagnosis of childhood apraxia of speech (CAS) in Sweden and to compare speech characteristics and symptoms to those of earlier survey findings in mainly English speakers. In a web-based questionnaire, 178 Swedish speech-language pathologists (SLPs) anonymously answered questions about their perception of typical speech characteristics for CAS. They rated their own assessment skills and estimated clinical occurrence. The seven top speech characteristics reported as typical for children with CAS were: inconsistent speech production (85%), sequencing difficulties (71%), oro-motor deficits (63%), vowel errors (62%), voicing errors (61%), consonant cluster deletions (54%), and prosodic disturbance (53%). Motor-programming deficits, described as a lack of automatization of speech movements, were perceived by 82%. All listed characteristics were consistent with the American Speech-Language-Hearing Association (ASHA) consensus-based features, Strand's 10-point checklist, and the diagnostic model proposed by Ozanne. The mode for clinical occurrence was 5%. The number of suspected cases of CAS in the clinical caseload was approximately one new patient per year per SLP. The results support and add to findings from studies of CAS in English-speaking children, with similar speech characteristics regarded as typical. Possibly, these findings could contribute to cross-linguistic consensus on CAS characteristics.
Ijaz, Tazvin; Nasir, Attikah; Sarfraz, Naema; Ijaz, Shirmeen
To revise the conversion disorder scale and to establish the psychometric properties of the revised scale. This case-control study was conducted from February to June 2014, at the Government College University, Lahore, Pakistan, and comprised schoolchildren and children with conversion disorder. In order to generate items for the revised version of the conversion disorder scale, seven practising mental health professionals were consulted. A list of 42 items was finalised for expert ratings. After empirical validation, a scale of 40 items was administered to the participants and factor analysis was conducted. Of the 240 participants, 120 (50%) were schoolchildren (control group) and 120 (50%) were children with conversion disorder (clinical group). The results of factor analysis revealed five factors (swallowing and speech symptoms, motor symptoms, sensory symptoms, weakness and fatigue, and mixed symptoms) and retention of all 40 items of the revised version of the conversion disorder scale. Concurrent validity of the revised scale was found to be 0.81, which was significantly high. Similarly, discriminant validity of the scale was also high, as the clinical and control groups differed significantly. The revised conversion disorder scale was 76% sensitive in predicting conversion disorder, while specificity showed that the scale was 73% accurate in specifying participants of the control group. The revised version of the conversion disorder scale was a reliable and valid tool to be used for screening of children with conversion disorder.
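The reported 76% sensitivity and 73% specificity are simple ratios over the confusion counts of the screening decision. A small sketch follows; the true-positive and true-negative counts below are hypothetical, chosen only so that the ratios reproduce the study's percentages for the two groups of 120 children.

```python
# Sensitivity: fraction of clinical-group children the scale flags.
# Specificity: fraction of controls the scale correctly clears.
def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

# Hypothetical counts: 91 of 120 clinical cases flagged,
# 88 of 120 controls cleared.
print(round(sensitivity(91, 29), 2))   # 0.76
print(round(specificity(88, 32), 2))   # 0.73
```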
Cong, D-K; Sharikadze, M; Staude, G; Deubel, H; Wolf, W
We studied the mutual cross-talk between spontaneous eye blinks and continuous, self-paced unimanual and bimanual tapping. Both types of motor activities were analyzed with regard to their time-structure in synchronization-continuation tapping tasks which involved different task instructions, namely "standard" finger tapping (Experiment 1), "strong" tapping (Experiment 2) requiring more forceful finger movements, and "impulse-like" tapping (Experiment 3) where upward-downward finger movements had to be very fast. In a further control condition (Experiment 4), tapping was omitted altogether. The results revealed a prominent entrainment of spontaneous blink behavior by the manual tapping, with bimanual tapping being more effective than unimanual tapping, and with the "strong" and "impulse-like" tapping showing the largest effects on blink timing. Conversely, we found no significant effects of the eye blinks on the timing of the taps across all experiments. The findings suggest a functional overlap of the motor control structures responsible for voluntary, rhythmic finger movements and eye blinking behavior.
Eskelund, Kasper; Andersen, Tobias
Speech perception is audiovisual as evidenced by the McGurk effect in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate...... of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task and this could increase...... visual detection task. In our first experiment, observers presented with congruent and incongruent audiovisual sine-wave speech stimuli did only show a McGurk effect when informed of the speech nature of the stimulus. Performance on the secondary visual task was very good, thus supporting the finding...
Speech-Language Therapy (KidsHealth / For Parents) ... most kids with speech and/or language disorders. Speech Disorders, Language Disorders, and Feeding Disorders ...
Gopi, E S
Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.
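As a flavor of the introductory material such a text covers, here is one of the most basic speech-processing steps: splitting a signal into frames and computing short-time energy. The sketch is in Python purely for illustration; the book's own examples are in MATLAB, and the frame parameters here are arbitrary.

```python
# Short-time energy: a basic speech feature used, e.g., for simple
# voice-activity detection. The signal is cut into frames of length
# frame_len, advanced by hop samples, and each frame's energy is the
# sum of squared samples.
def short_time_energy(signal, frame_len, hop):
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(s * s for s in frame))
    return energies

sig = [0, 0, 0, 0, 1, -1, 1, -1]                    # silence, then activity
print(short_time_energy(sig, frame_len=4, hop=4))   # [0, 4]
```

The same framing scheme underlies the feature-extraction methods (PCA, LDA, MFCC-style pipelines) the book builds up later.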
Developmental apraxia of speech (DAS) in children is a speech disorder, supposed to have a neurological origin, which is commonly considered to result from particular deficits in speech processing (i.e., phonological planning, motor programming). However, the label DAS has often been used as
Textbooks play an important role in English Language Teaching (ELT), particularly in the English as a Foreign Language (EFL) context, where they provide the primary linguistic input. The present research was an attempt to comparatively evaluate the Touchstone series in terms of compliment and complaint speech acts. Four Touchstone textbooks (Book 1, Book 2, Book 3, and Book 4) were selected and content analysis was done using Olshtain and Weinbach's (1993) complaint strategies and Wolfson and Manes' (1980) classification of compliments. The frequencies and percentages of compliment and complaint speech acts were obtained. Data analysis showed that, first, the total frequency of the complaint speech act was higher in Touchstone Book 4 than in the other three textbooks; second, the frequency of complaint and compliment speech acts in the Writing section was quite low, but the Conversation section had a high frequency of the compliment speech act in the Touchstone series; third, the expression of annoyance or disapproval complaint strategy was frequently used in the Touchstone series; fourth, the compliment strategy of 'noun phrase + looks/is (intensifier) adjective' was very frequent in the Touchstone series; finally, there was a significant difference between the frequencies of the two speech acts, in general, in the four Touchstone textbooks. Considering the weaknesses and strengths of the Touchstone series, implications for teachers, material developers, and textbook writers are provided.
The study is aimed at the search for H-plus-electron centers of luminescence and the investigation of the conversion of H centers into I centers via the luminescence of H-plus-electron centers in alkali iodide crystals. KI, RbI and NaI crystals were studied at 12 K. H and F centers were created by irradiation with ultraviolet light corresponding to the absorption band of anion excitons. This was followed by excitation of the electron centers with red light. The spectra of stimulated recombination luminescence were studied. The luminescence of H-plus-electron centers was observed, and it was concluded that this center forms on immobile H centers. In the case of stable H centers, the optically stimulated conversion of H centers into I centers occurs. It is proposed that the spontaneous annihilation of closely spaced unstable F and H centers leads to the creation of H-plus-electron luminescence centers and to the spontaneous H-to-I-center conversion.
Chen, Yu-Chen; Chen, Huiyou; Jiang, Liang; Bo, Fan; Xu, Jin-Jing; Mao, Cun-Nan; Salvi, Richard; Yin, Xindao; Lu, Guangming; Gu, Jian-Ping
Purpose : Presbycusis, age-related hearing loss, is believed to involve neural changes in the central nervous system, which is associated with an increased risk of cognitive impairment. The goal of this study was to determine if presbycusis disrupted spontaneous neural activity in specific brain areas involved in auditory processing, attention and cognitive function using resting-state functional magnetic resonance imaging (fMRI) approach. Methods : Hearing and resting-state fMRI measurements were obtained from 22 presbycusis patients and 23 age-, sex- and education-matched healthy controls. To identify changes in spontaneous neural activity associated with age-related hearing loss, we compared the amplitude of low-frequency fluctuations (ALFF) and regional homogeneity (ReHo) of fMRI signals in presbycusis patients vs. controls and then determined if these changes were linked to clinical measures of presbycusis. Results : Compared with healthy controls, presbycusis patients manifested decreased spontaneous activity mainly in the superior temporal gyrus (STG), parahippocampal gyrus (PHG), precuneus and inferior parietal lobule (IPL) as well as increased neural activity in the middle frontal gyrus (MFG), cuneus and postcentral gyrus (PoCG). A significant negative correlation was observed between ALFF/ReHo activity in the STG and average hearing thresholds in presbycusis patients. Increased ALFF/ReHo activity in the MFG was positively correlated with impaired Trail-Making Test B (TMT-B) scores, indicative of impaired cognitive function involving the frontal lobe. Conclusions : Presbycusis patients have disrupted spontaneous neural activity reflected by ALFF and ReHo measurements in several brain regions; these changes are associated with specific cognitive performance and speech/language processing. These findings mainly emphasize the crucial role of aberrant resting-state ALFF/ReHo patterns in presbycusis patients and will lead to a better understanding of the
Greene, Beth G; Logan, John S; Pisoni, David B
We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.
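Segmental intelligibility on the MRT reduces to percent correct over forced-choice word-identification trials, tallied per voice. The sketch below is a hedged illustration: the response words are invented, and the study's actual scoring and item structure may differ in detail.

```python
# MRT-style scoring: percent of forced-choice trials in which the
# listener's chosen word matches the presented word.
def mrt_percent_correct(responses, answers):
    correct = sum(r == a for r, a in zip(responses, answers))
    return 100.0 * correct / len(answers)

answers   = ["bat", "led", "pin", "sum", "dig"]   # presented words
responses = ["bat", "led", "tin", "sum", "dig"]   # one initial-consonant error
print(mrt_percent_correct(responses, answers))    # 80.0
```

Scores like this, computed per synthesis system, are what allow the voices to be grouped into the quality categories the study reports.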
Hearnshaw, Stephanie; Baker, Elise; Munro, Natalie
To investigate whether Australian-English speaking children with and without speech sound disorder (SSD) differ in their overall speech perception accuracy. Additionally, to investigate differences in the perception of specific phonemes and the association between speech perception and speech production skills. Twenty-five Australian-English speaking children aged 48-60 months participated in this study. The SSD group included 12 children and the typically developing (TD) group included 13 children. Children completed routine speech and language assessments in addition to an experimental Australian-English lexical and phonetic judgement task based on Rvachew's Speech Assessment and Interactive Learning System (SAILS) program (Rvachew, 2009). This task included eight words across four word-initial phonemes: /k, ɹ, ʃ, s/. Children with SSD showed significantly poorer perceptual accuracy on the lexical and phonetic judgement task compared with TD peers. The phonemes /ɹ/ and /s/ were most frequently perceived in error across both groups. Additionally, the phoneme /ɹ/ was most commonly produced in error. There was also a positive correlation between overall speech perception and speech production scores. Children with SSD perceived speech less accurately than their typically developing peers. The findings suggest that an Australian-English variation of a lexical and phonetic judgement task similar to the SAILS program is promising and worthy of a larger scale study. Copyright © 2017 Elsevier Inc. All rights reserved.
Kello, Christopher T; Bella, Simone Dalla; Médé, Butovens; Balasubramaniam, Ramesh
Humans talk, sing and play music. Some species of birds and whales sing long and complex songs. All these behaviours and sounds exhibit hierarchical structure-syllables and notes are positioned within words and musical phrases, words and motives in sentences and musical phrases, and so on. We developed a new method to measure and compare hierarchical temporal structures in speech, song and music. The method identifies temporal events as peaks in the sound amplitude envelope, and quantifies event clustering across a range of timescales using Allan factor (AF) variance. AF variances were analysed and compared for over 200 different recordings from more than 16 different categories of signals, including recordings of speech in different contexts and languages, musical compositions and performances from different genres. Non-human vocalizations from two bird species and two types of marine mammals were also analysed for comparison. The resulting patterns of AF variance across timescales were distinct to each of four natural categories of complex sound: speech, popular music, classical music and complex animal vocalizations. Comparisons within and across categories indicated that nested clustering in longer timescales was more prominent when prosodic variation was greater, and when sounds came from interactions among individuals, including interactions between speakers, musicians, and even killer whales. Nested clustering also was more prominent for music compared with speech, and reflected beat structure for popular music and self-similarity across timescales for classical music. In summary, hierarchical temporal structures reflect the behavioural and social processes underlying complex vocalizations and musical performances. © 2017 The Author(s).
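The Allan factor computation described above (temporal events counted in windows of increasing size, then a count-difference variance per timescale) can be sketched as follows. This is a minimal illustration of the AF statistic itself, not the authors' pipeline; the function name and the toy event trains are illustrative.

```python
import numpy as np

def allan_factor(event_times, timescales, duration):
    """Allan factor variance of an event sequence at several timescales.

    AF(T) = E[(N_{k+1} - N_k)^2] / (2 * E[N_k]),
    where N_k is the number of events in the k-th window of length T.
    """
    afs = []
    for T in timescales:
        n_windows = int(duration // T)
        edges = np.arange(n_windows + 1) * T
        counts, _ = np.histogram(event_times, bins=edges)
        diffs = np.diff(counts)
        mean_count = counts.mean()
        afs.append(np.mean(diffs ** 2) / (2 * mean_count) if mean_count > 0 else np.nan)
    return np.array(afs)

# A perfectly regular event train shows no clustering (AF near 0 at
# window sizes that are multiples of the period), while a random
# (Poisson-like) train gives AF near 1 across timescales.
periodic = np.arange(0, 100, 0.5)            # one event every 0.5 s
rng = np.random.default_rng(0)
poisson = np.sort(rng.uniform(0, 100, 200))  # ~2 events/s, random
scales = [1.0, 2.0, 5.0]
af_periodic = allan_factor(periodic, scales, 100.0)
af_poisson = allan_factor(poisson, scales, 100.0)
```

Rising AF with timescale would indicate nested clustering of the kind the entry associates with prosodic variation and interaction.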
Millman, Rebecca E; Mattys, Sven L; Gouws, André D; Prendergast, Garreth
Verbal communication in noisy backgrounds is challenging. Understanding speech in background noise that fluctuates in intensity over time is particularly difficult for hearing-impaired listeners with a sensorineural hearing loss (SNHL). The reduction in fast-acting cochlear compression associated with SNHL exaggerates the perceived fluctuations in intensity in amplitude-modulated sounds. SNHL-induced changes in the coding of amplitude-modulated sounds may have a detrimental effect on the ability of SNHL listeners to understand speech in the presence of modulated background noise. To date, direct evidence for a link between magnified envelope coding and deficits in speech identification in modulated noise has been absent. Here, magnetoencephalography was used to quantify the effects of SNHL on phase locking to the temporal envelope of modulated noise (envelope coding) in human auditory cortex. Our results show that SNHL enhances the amplitude of envelope coding in posteromedial auditory cortex, whereas it enhances the fidelity of envelope coding in posteromedial and posterolateral auditory cortex. This dissociation was more evident in the right hemisphere, demonstrating functional lateralization in enhanced envelope coding in SNHL listeners. However, enhanced envelope coding was not perceptually beneficial. Our results also show that both hearing thresholds and, to a lesser extent, magnified cortical envelope coding in left posteromedial auditory cortex predict speech identification in modulated background noise. We propose a framework in which magnified envelope coding in posteromedial auditory cortex disrupts the segregation of speech from background noise, leading to deficits in speech perception in modulated background noise. SIGNIFICANCE STATEMENT People with hearing loss struggle to follow conversations in noisy environments. Background noise that fluctuates in intensity over time poses a particular challenge. Using magnetoencephalography, we demonstrate
Hasse Jørgensen, Stina
About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavilion, the Venice Biennale 2011.
Anne Birgitta Nilsen
The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance. The aim of the article is to contribute to a more thorough understanding of the nature of hate speech by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech; it is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience. The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the…
Iuzzini-Seigel, Jenya; Hogan, Tiffany P.; Green, Jordan R.
Purpose: The current research sought to determine (a) if speech inconsistency is a core feature of childhood apraxia of speech (CAS) or if it is driven by comorbid language impairment that affects a large subset of children with CAS and (b) if speech inconsistency is a sensitive and specific diagnostic marker that can differentiate between CAS and…
Todman, John; Morrison, Zara
TALK (Talk Aid using pre-Loaded Knowledge) is a computer system linked to a speech synthesizer which enables nonspeaking people to engage in real-time social conversation. TALK provides categories of general comments that can be used whenever a suitable specific response is unavailable. Results are reported of a study evaluating effectiveness of…
Human locomotion typically creates noise, a possible consequence of which is the masking of sound signals originating in the surroundings. When walking side by side, people often subconsciously synchronize their steps. The neurophysiological and evolutionary background of this behavior is unclear. The present study investigated the potential of the sound created by walking to mask perception of speech, and compared the masking produced by walking in step with that produced by unsynchronized walking. The masking sound (footsteps on gravel) and the target sound (speech) were presented through the same speaker to 15 normal-hearing subjects. The original recorded walking sound was modified to mimic the sound of two individuals walking in step or walking out of synchrony. The participants were instructed to adjust the sound level of the target sound until they could just comprehend the speech signal (the "just follow conversation," or JFC, level) when it was presented simultaneously with synchronized or unsynchronized walking sound at 40 dBA, 50 dBA, 60 dBA, or 70 dBA. Synchronized walking sounds produced slightly less masking of speech than did unsynchronized sound. The median JFC threshold in the synchronized condition was 38.5 dBA, while the corresponding value for the unsynchronized condition was 41.2 dBA. Combined results at all sound pressure levels showed an improvement in the signal-to-noise ratio (SNR) for synchronized footsteps; the median difference was 2.7 dB and the mean difference was 1.2 dB [P < 0.001, repeated-measures analysis of variance (RM-ANOVA)]. The difference was significant for masker levels of 50 dBA and 60 dBA, but not for 40 dBA or 70 dBA. This study provides evidence that synchronized walking may reduce the masking potential of footsteps.
Hastings, P.J.; Quah, S.-K.; Borstel, R.C. von
Strains of yeast carrying mutations in many of the steps in the pathways that repair radiation-induced damage to DNA have enhanced spontaneous mutation rates. Indeed, most strains isolated on the basis of enhanced spontaneous mutation carry mutations in DNA repair systems. This suggests that much spontaneous mutation arises through mutagenic repair of spontaneous lesions.
We investigated whether variation in the degree of reduction also has a systematic effect on the attributes we ascribe to the speaker who produces the speech signal. A perception experiment was carried out for German in which 46 listeners judged whether or not speakers showing 3 different combinations of segmental and prosodic reduction levels (unreduced, moderately reduced, strongly reduced) are appropriately described by 13 physical, social, and cognitive attributes. The experiment shows that clear speech is not mere speech, and less clear speech is not just reduced either. Rather, results revealed a complex interplay of reduction levels and perceived speaker attributes in which moderate reduction can make a better impression on listeners than no reduction. In addition to its relevance in reduction models and theories, this interplay is instructive for various fields of speech application, from social robotics to charisma…
Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato
We investigated whether audiovisual synchrony perception for speech could change after observation of the audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is re-calibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, participants performed an audiovisual simultaneity judgment task (i.e., a direct measurement of the audiovisual synchrony perception) in terms of the speech signal after observation of the speech stimuli which had a constant audiovisual lag. The results showed that the “simultaneous” responses (i.e., proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, we adopted the McGurk identification task (i.e., an indirect measurement of the audiovisual synchrony perception) to exclude the possibility that this modulation of synchrony perception was solely attributable to the response strategy using stimuli identical to those of Experiment 1. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, it was shown that audiovisual synchrony perception for speech could be modulated following exposure to constant lag both in direct and indirect measurement. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.
Calderwood, Patricia; Mazza, Morgan Aboud; Ruel, Abiah Clarke; Favano, Amy; Jean-Guilluame, Vonick; McNeill, Daniel; Stenerson, Carolyn
In this paper we examine aspects of the construction of authentic membership, competence, and sense of shared purpose within a professional community of educators accomplished by a class of pre-service teachers during a spontaneous electronic conversation. Implications for teacher education are considered.
Uhrin, Dominik; Chmelikova, Zdenka; Tovarek, Jaromir; Partila, Pavol; Voznak, Miroslav
This article describes a system for evaluating the credibility of recordings with emotional character. The sound recordings form a Czech-language database for training and testing speech emotion recognition systems. These systems are designed to detect human emotions in the voice. Information about a speaker's emotional state is useful to the security forces and to emergency call services. Personnel in action (soldiers, police officers, firefighters) are often exposed to stress; information about their emotional state, carried in the voice, can help a dispatcher adapt commands during an intervention. Call agents of an emergency call service must recognize the mental state of the caller to adjust the mood of the conversation; in this case, evaluating the caller's psychological state is the key factor for a successful intervention. A quality database of sound recordings is essential for creating such systems. Quality databases exist, such as the Berlin Database of Emotional Speech or HUMAINE, but actors created these databases in an audio studio, which means the recordings contain simulated emotions, not real ones. Our research aims at creating a database of Czech emotional recordings of real human speech. Collecting sound samples for the database is only one of the tasks; another, no less important, is to evaluate the significance of the recordings from the perspective of emotional states. The design of a methodology for evaluating the credibility of emotional recordings is described in this article, and the results describe the advantages and applicability of the developed method.
Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a…
Salmi, Juha; Koistinen, Olli-Pekka; Glerean, Enrico; Jylänki, Pasi; Vehtari, Aki; Jääskeläinen, Iiro P; Mäkelä, Sasu; Nummenmaa, Lauri; Nummi-Kuisma, Katarina; Nummi, Ilari; Sams, Mikko
During a conversation or when listening to music, auditory and visual information are combined automatically into audiovisual objects. However, it is still poorly understood how specific type of visual information shapes neural processing of sounds in lifelike stimulus environments. Here we applied multi-voxel pattern analysis to investigate how naturally matching visual input modulates supratemporal cortex activity during processing of naturalistic acoustic speech, singing and instrumental music. Bayesian logistic regression classifiers with sparsity-promoting priors were trained to predict whether the stimulus was audiovisual or auditory, and whether it contained piano playing, speech, or singing. The predictive performances of the classifiers were tested by leaving one participant at a time for testing and training the model using the remaining 15 participants. The signature patterns associated with unimodal auditory stimuli encompassed distributed locations mostly in the middle and superior temporal gyrus (STG/MTG). A pattern regression analysis, based on a continuous acoustic model, revealed that activity in some of these MTG and STG areas were associated with acoustic features present in speech and music stimuli. Concurrent visual stimulus modulated activity in bilateral MTG (speech), lateral aspect of right anterior STG (singing), and bilateral parietal opercular cortex (piano). Our results suggest that specific supratemporal brain areas are involved in processing complex natural speech, singing, and piano playing, and other brain areas located in anterior (facial speech) and posterior (music-related hand actions) supratemporal cortex are influenced by related visual information. Those anterior and posterior supratemporal areas have been linked to stimulus identification and sensory-motor integration, respectively. Copyright © 2017 Elsevier Inc. All rights reserved.
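The leave-one-participant-out evaluation used for the classifiers above can be sketched as below. This is a schematic stand-in, not the study's analysis: it uses synthetic data and a simple nearest-centroid classifier in place of the Bayesian sparse logistic regression the authors trained, in order to show only the cross-validation logic of holding out one participant per fold.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_trials, n_voxels = 16, 20, 50

# Synthetic multi-voxel data: two stimulus conditions whose mean
# activity patterns differ; shape (subjects, trials, voxels).
X = np.concatenate([rng.normal(0.0, 1.0, (n_subjects, n_trials, n_voxels)),
                    rng.normal(0.8, 1.0, (n_subjects, n_trials, n_voxels))], axis=1)
y = np.concatenate([np.zeros(n_trials), np.ones(n_trials)])

accuracies = []
for test_subj in range(n_subjects):
    train = np.delete(np.arange(n_subjects), test_subj)
    # Fit: class centroids over all training subjects' trials.
    Xtr = X[train].reshape(-1, n_voxels)
    ytr = np.tile(y, len(train))
    c0 = Xtr[ytr == 0].mean(axis=0)
    c1 = Xtr[ytr == 1].mean(axis=0)
    # Predict each held-out trial by its nearest class centroid.
    Xte = X[test_subj]
    d0 = np.linalg.norm(Xte - c0, axis=1)
    d1 = np.linalg.norm(Xte - c1, axis=1)
    pred = (d1 < d0).astype(float)
    accuracies.append((pred == y).mean())

mean_acc = float(np.mean(accuracies))
```

Holding out a whole participant, rather than random trials, tests whether the decoded pattern generalizes across brains instead of memorizing subject-specific idiosyncrasies.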
Martin Ofelia POPESCU
The article presents a concise speech-correction intervention program for dyslalia, in conjunction with the development of intra- and interpersonal capacities and the social integration of children with speech disorders. The program's main objectives are: increasing the potential for individual social integration by correcting speech disorders while developing intra- and interpersonal capacity, and increasing the potential of children and community groups for social integration by optimizing the socio-relational context of children with speech disorders. The program included 60 children/students with dyslalic speech disorders (monomorphic and polymorphic dyslalia) from 11 educational institutions (6 kindergartens and 5 schools/secondary schools) affiliated with the inter-school logopedic centre (CLI) of Targu Jiu city and areas of Gorj district. The program was implemented under the assumption that therapeutic-formative intervention to correct speech disorders and facilitate social integration would, in combination with the correction of pronunciation disorders, optimize the social integration of children with speech disorders. The results confirm the hypothesis and attest to the efficiency of the intervention program.
Frick, Maria; Riionheimo, Helka
Through a conversation analytic investigation of Finnish-Estonian bilingual (direct) reported speech (i.e., voicing) by Finns who live in Estonia, this study shows how code-switching is used as a double contextualization device. The code-switched voicings are shaped by the on-going interactional situation, serving its needs by opening up a context…
Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang
Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to health listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its functions of suppressing masking-speech signals.
Language therapy has shifted from a medical focus to a preventive focus. Difficulties remain in this preventive work, however, because more space is still devoted to the correction of language disorders. Since speech disorders are the most frequently occurring dysfunction, the preventive work carried out to avoid their appearance acquires special importance. Speech education from early childhood makes it easier to prevent the appearance of speech disorders in children. The present work aims to offer different activities for the prevention of speech disorders.
Parametric down-conversion is a second-order nonlinear optical process annihilating a pump photon and creating a pair of photons in the signal and idler modes. Then, by using two parametric down-converters and introducing a path indistinguishability for the two generated idler modes, a quantum coherence between two conjugate signal beams can be induced. Such a double spontaneous or stimulated parametric down-conversion scheme has been used to demonstrate quantum spectroscopy and imaging with undetected idler photons via measuring one-photon interference between their correlated signal beams. Recently, we considered another quantum optical measurement scheme utilizing W-type tripartite entangled signal photons that can be generated by employing three spontaneous parametric down-conversion crystals and by inducing coherences or path-indistinguishabilities between their correlated idler beams and between quantum vacuum fields. Here, we consider an extended triple stimulated parametric down-conversion scheme for quantum optical measurement of sample properties with undetected idler photons. Noting the real effect of vacuum field indistinguishability on the fringe visibility as well as the role of zero-point field energy in the interferometry, we show that this scheme is an ideal and efficient way to create a coherent state of W-type entangled signal photons. We anticipate that this scheme would be of critical use in further developing quantum optical measurements in spectroscopy and microscopy with undetected photons.
Schönmeyr, Björn; Wendby, Lisa; Sharma, Mitali; Jacobson, Lia; Restrepo, Carolina; Campbell, Alex
Many patients with cleft palate deformities worldwide receive treatment at a later age than is recommended for normal speech to develop. The outcomes after late palate repairs in terms of speech and quality of life (QOL) still remain largely unstudied. In the current study, questionnaires were used to assess the patients' perception of speech and QOL before and after primary palate repair. All of the patients were operated at a cleft center in northeast India and had a cleft palate with a normal lip or with a cleft lip that had been previously repaired. A total of 134 patients (7-35 years) were interviewed preoperatively and 46 patients (7-32 years) were assessed in the postoperative survey. The survey showed that scores based on the speech handicap index, concerning speech and speech-related QOL, did not improve postoperatively. In fact, the questionnaires indicated that the speech became more unpredictable (P …), yet patients reported that their self-confidence had improved after the operation. Thus, the majority of interviewed patients who underwent late primary palate repair were satisfied with the surgery. At the same time, speech and speech-related QOL did not improve according to the speech handicap index-based survey. Speech predictability may even become worse and nasal regurgitation may increase after late palate repair, according to these results.
Avivi-Reich, Meital; Jakubczyk, Agnes; Daneman, Meredyth; Schneider, Bruce A
We investigated how age and linguistic status affected listeners' ability to follow and comprehend 3-talker conversations, and the extent to which individual differences in language proficiency predict speech comprehension under difficult listening conditions. Younger and older EL1 (English-as-a-first-language) listeners, as well as young EL2 listeners, listened to 3-talker conversations, with or without spatial separation between talkers, either in quiet or against a moderate or high 12-talker babble background, and were asked to answer questions regarding their contents. After compensating for individual differences in speech recognition, no significant differences in conversation comprehension were found among the groups. As expected, conversation comprehension decreased as the babble level increased. Individual differences in reading comprehension skill contributed positively to performance in younger EL1s, and to a lesser degree in young EL2s, but not in older EL1s. Vocabulary knowledge was significantly and positively related to performance only at the intermediate babble level. The results indicate that the manner in which spoken language comprehension is achieved is modulated by the listener's age and linguistic status.
Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Herve
To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. baez) coupled to non-intact (excised onsets) auditory speech (signified…
This section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition; place them in an integrated phonetic-phonological perspective; and relate them in more or less explicit ways to aspects of speech technology. Therefore, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations.
Following a multi-talker conversation relies on the ability to rapidly and efficiently shift the focus of spatial attention from one talker to another. The current study investigated the listening costs associated with shifts in spatial attention during conversational turn-taking in 16 normally-hearing listeners using a novel sentence recall task. Three pairs of syntactically fixed but semantically unpredictable matrix sentences, recorded from a single male talker, were presented concurrently through an array of three loudspeakers (directly ahead and +/-30° azimuth). Subjects attended to one spatial location, cued by a tone, and followed the target conversation from one sentence to the next using the call-sign at the beginning of each sentence. Subjects were required to report the last three words of each sentence (speech recall task) or answer multiple-choice questions related to the target material (speech comprehension task). The reading span test, attention network test, and trail making test were also administered to assess working memory, attentional control, and executive function. There was a 10.7 ± 1.3% decrease in word recall, a pronounced primacy effect, and a rise in masker confusion errors and word omissions when the target switched location between sentences. Switching costs were independent of the location, direction, and angular size of the spatial shift, but did appear to be load dependent and were only significant for complex questions requiring multiple cognitive operations. Reading span scores were positively correlated with total words recalled, and negatively correlated with switching costs and word omissions. Task switching speed (Trail-B time) was also significantly correlated with recall accuracy. Overall, this study highlights (i) the listening costs associated with shifts in spatial attention and (ii) the important role of working memory in maintaining goal-relevant information and extracting meaning from dynamic multi-talker conversations.
Schalling, Ellika; Hartelius, Lena
Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.
Lee, Jimin; Hustad, Katherine C.; Weismer, Gary
Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…
Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z
The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.
Golabbakhsh, Marzieh; Rajaei, Ali; Derakhshan, Mahmoud; Sadri, Saeed; Taheri, Masoud; Adibi, Peyman
Acoustic monitoring of swallow frequency has become important as the frequency of spontaneous swallowing can be an index for dysphagia and related complications. In addition, it can be employed as an objective quantification of ingestive behavior. Commonly, swallowing complications are manually detected using videofluoroscopy recordings, which require expensive equipment and exposure to radiation. In this study, a noninvasive automated technique is proposed that uses breath and swallowing recordings obtained via a microphone located over the laryngopharynx. Nonlinear diffusion filters were used in which a scale-space decomposition of recorded sound at different levels extract swallows from breath sounds and artifacts. This technique was compared to manual detection of swallows using acoustic signals on a sample of 34 subjects with Parkinson's disease. A speech language pathologist identified five subjects who showed aspiration during the videofluoroscopic swallowing study. The proposed automated method identified swallows with a sensitivity of 86.67 %, a specificity of 77.50 %, and an accuracy of 82.35 %. These results indicate the validity of automated acoustic recognition of swallowing as a fast and efficient approach to objectively estimate spontaneous swallow frequency.
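A minimal sketch of the scale-space idea, assuming a pre-computed amplitude envelope (this illustrates nonlinear diffusion filtering in general, not the authors' exact decomposition; `detect_swallows` and its thresholds are hypothetical):

```python
import numpy as np

def perona_malik_1d(signal, n_iter=50, kappa=0.1, dt=0.2):
    """One-dimensional nonlinear (Perona-Malik-style) diffusion.

    Strong transients (candidate swallows) diffuse slowly because the
    conductivity shrinks at large gradients; low-level breath noise and
    artifacts are smoothed away.
    """
    u = signal.astype(float).copy()
    for _ in range(n_iter):
        grad = np.diff(u, append=u[-1])            # forward difference
        c = 1.0 / (1.0 + (grad / kappa) ** 2)      # edge-stopping conductivity
        flux = c * grad
        u += dt * np.diff(flux, prepend=flux[0])   # divergence of the flux
    return u

def detect_swallows(envelope, thresh=0.5):
    """Mark samples whose coarse-scale (diffused) amplitude stays high."""
    coarse = perona_malik_1d(envelope)
    return coarse > thresh * coarse.max()
```

Diffusing at several iteration counts would give the multi-level scale-space decomposition the abstract describes; a single coarse scale is shown here for brevity.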
Alicia C. Weeks MD
Tumor lysis syndrome (TLS) is a known complication of malignancy and its treatment. The incidence varies with malignancy type, but it is most common with hematologic neoplasms during cytotoxic treatment. Spontaneous TLS is thought to be rare. This case study concerns a 62-year-old female admitted with multisystem organ failure, with a subsequent diagnosis of aggressive B cell lymphoma. On admission, laboratory abnormalities included renal failure, elevated uric acid (20.7 mg/dL), and 3+ amorphous urates on urinalysis. Oliguric renal failure persisted despite aggressive hydration and diuretic use, requiring initiation of hemodialysis prior to chemotherapy. Antihyperuricemic therapy and hemodialysis were used to resolve hyperuricemia. However, due to multisystem organ dysfunction syndrome with extremely poor prognosis, the patient ultimately expired in the setting of a terminal ventilator wean. Although our patient did not meet current TLS criteria, she required hemodialysis due to uric acid nephropathy, a complication of TLS. This poses the clinical question of whether adequate diagnostic criteria exist for spontaneous TLS and whether the lack of currently accepted guidelines has resulted in the underestimation of its incidence. Allopurinol and rasburicase are commonly used for prevention and treatment of TLS. Although both drugs decrease uric acid levels, allopurinol mechanistically prevents formation of the substrate that rasburicase acts to solubilize. These drugs were administered together in our patient, although no established guidelines recommend combined use. This raises the clinical question of whether combined therapy is truly beneficial or, conversely, detrimental to patient outcomes.
De Keyser, Kim; Santens, Patrick; Bockstael, Annelies; Botteldooren, Dick; Talsma, Durk; De Vos, Stefanie; Van Cauwenberghe, Mieke; Verheugen, Femke; Corthals, Paul; De Letter, Miet
Purpose: This study investigated the possible relationship between hypokinetic speech production and speech intensity perception in patients with Parkinson's disease (PD). Method: Participants included 14 patients with idiopathic PD and 14 matched healthy controls (HCs) with normal hearing and cognition. First, speech production was objectified…
Borrie, Stephanie A
This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.
Hurkmans, Josephus Johannes Stephanus
Apraxia of Speech (AoS) is a neurogenic speech disorder. A wide variety of behavioural methods have been developed to treat AoS. Various therapy programmes use musical elements to improve speech production. A unique therapy programme combining elements of speech therapy and music therapy is called
Lewis, James R
Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application
Poole, Matthew L.; Brodtmann, Amy; Darby, David; Vogel, Adam P.
Purpose: Our purpose was to create a comprehensive review of speech impairment in frontotemporal dementia (FTD), primary progressive aphasia (PPA), and progressive apraxia of speech in order to identify the most effective measures for diagnosis and monitoring, and to elucidate associations between speech and neuroimaging. Method: Speech and…
Agus, Trevor R; Akeroyd, Michael A; Noble, William; Bhullar, Navjot
Many of the items in the "Speech, Spatial, and Qualities of Hearing" scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.
Lynne E Bernstein
This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread, diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to the multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in the TVSA.
Spontaneous pneumobilia without previous surgery or interventional procedures indicates an abnormal biliary-enteric communication, most commonly a cholelithiasis-related gallbladder perforation. Conversely, choledocho-duodenal fistulisation (CDF) from a duodenal bulb ulcer is currently exceptional, reflecting the low prevalence of peptic disease. The combination of clinical data (occurrence in middle-aged males, ulcer history, absent jaundice and cholangitis) and CT findings (including pneumobilia, a normal gallbladder, and adhesion with a fistulous track between the posterior duodenum and pancreatic head) allows diagnosis of CDF and differentiation from the usual gallstone-related biliary fistulas requiring surgery. Conversely, ulcer-related CDF is effectively treated medically, with surgery reserved for poorly controlled symptoms or major complications.
One of the goals of text-to-speech (TTS) systems is to produce natural-sounding synthesised speech. Towards this end various natural language processing (NLP) tasks are performed to model the prosodic aspects of the TTS voice. One of the fundamental...
...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities... proposed compensation rates for Interstate TRS, Speech-to-Speech Services (STS), Captioned Telephone... costs reported in the data submitted to NECA by VRS providers. In this regard, document DA 10-761 also...
Silverberg, Lee J.; Raff, Lionel M.
Thermodynamic spontaneity-equilibrium criteria require that in a single-reaction system, reactions in either the forward or reverse direction at equilibrium be nonspontaneous. Conversely, the concept of dynamic equilibrium holds that forward and reverse reactions both occur at equal rates at equilibrium to the extent allowed by kinetic…
Gallardo, L.F.; Möller, S.; Beerends, J.
The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility
Gillespie, Alex; Corti, Kevin
This article examines advances in research methods that enable experimental substitution of the speaking body in unscripted face-to-face communication. A taxonomy of six hybrid social agents is presented by combining three types of bodies (mechanical, virtual, and human) with either an artificial or human speech source. Our contribution is to introduce and explore the significance of two particular hybrids: (1) the cyranoid method that enables humans to converse face-to-face through the medium of another person's body, and (2) the echoborg method that enables artificial intelligence to converse face-to-face through the medium of a human body. These two methods are distinct in being able to parse the unique influence of the human body when combined with various speech sources. We also introduce a new framework for conceptualizing the body's role in communication, distinguishing three levels: self's perspective on the body, other's perspective on the body, and self's perspective of other's perspective on the body. Within each level the cyranoid and echoborg methodologies make important research questions tractable. By conceptualizing and synthesizing these methods, we outline a novel paradigm of research on the role of the body in unscripted face-to-face communication.
Elizabeth Anggraeni Amalo
The teaching of English expressions has always been done through conversation samples in the form of written texts, audio recordings, and videos. In the meantime, the development of computer-aided learning technology has made autonomous language learning possible. Games, as computer-aided learning technology products, can serve as a medium for educational content such as language teaching and learning. The visual novel is considered a conversational game genre suitable for combination with English-expressions material. Unlike other click-based visual novel games, the visual novel game in this research implements speech recognition as the interaction trigger. Hence, this paper elaborates how visual novel games are utilized to deliver English expressions with speech-recognition commands for the interaction. This research used a Research and Development (R&D) method with an experimental design, using control and experimental groups to measure the game's effectiveness in enhancing students' mastery of English expressions. ANOVA was utilized to test for significant differences between the control and experimental groups. It is expected that the results of this development and experiment can benefit English teaching and learning, especially regarding English expressions.
Ygual-Fernandez, A; Cervera-Merida, J F
In the treatment of speech disorders by means of speech therapy two antagonistic methodological approaches are applied: non-verbal ones, based on oral motor exercises (OME), and verbal ones, which are based on speech processing tasks with syllables, phonemes and words. In Spain, OME programmes are called 'programas de praxias', and are widely used and valued by speech therapists. To review the studies conducted on the effectiveness of OME-based treatments applied to children with speech disorders and the theoretical arguments that could justify, or not, their usefulness. Over the last few decades evidence has been gathered about the lack of efficacy of this approach to treat developmental speech disorders and pronunciation problems in populations without any neurological alteration of motor functioning. The American Speech-Language-Hearing Association has advised against its use taking into account the principles of evidence-based practice. The knowledge gathered to date on motor control shows that the pattern of mobility and its corresponding organisation in the brain are different in speech and other non-verbal functions linked to nutrition and breathing. Neither the studies on their effectiveness nor the arguments based on motor control studies recommend the use of OME-based programmes for the treatment of pronunciation problems in children with developmental language disorders.
Auditory sensation is often thought of as a bottom-up process, yet the brain exerts top-down control to affect how and what we hear. We report the discovery that the magnitude of top-down influence varies across individuals as a result of differences in linguistic background and executive function. Participants were 32 normal-hearing individuals (23 female) varying in language background (11 English monolinguals, 10 Korean-English late bilinguals, and 11 Korean-English early bilinguals) as well as in cognitive abilities (working memory, cognitive control). To assess efferent control over inner ear function, participants were presented with speech sounds (e.g., /ba/, /pa/) in one ear while spontaneous otoacoustic emissions (SOAEs) were measured in the contralateral ear. SOAEs are associated with the amplification of sound in the cochlea, and can be used as an index of top-down efferent activity. Individuals with bilingual experience and those with better cognitive control experienced larger reductions in the amplitude of SOAEs in response to speech stimuli, likely as a result of greater efferent suppression of amplification in the cochlea. This suppression may aid in the critical task of speech perception by minimizing the disruptive effects of noise. In contrast, individuals with better working memory exert less control over the cochlea, possibly due to a greater capacity to process complex stimuli at later stages. These findings demonstrate that even peripheral mechanics of auditory perception are shaped by top-down cognitive and linguistic influences.
...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...; speech-to-speech (STS); pay-per-call (900) calls; types of calls; and equal access to interexchange... of a report, due April 16, 2011, addressing whether it is necessary for the waivers to remain in...
Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan
A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exist in the spatial and temporal domains. As a result, automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It offers fast data/text entry, a small overall size, and light weight. In addition, this design frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when facing constraints on computational resources.
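The beamforming front end of such a pipeline can be illustrated with a simple delay-and-sum sketch (illustrative only; the developed system's multichannel processing is more sophisticated, and in practice the `delays` would come from a source-localization step):

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Average microphone channels after compensating integer-sample delays.

    channels: array of shape (n_mics, n_samples)
    delays:   per-microphone arrival delays in samples
    Coherent speech adds up in phase; uncorrelated noise partially cancels,
    so the averaged output has a higher signal-to-noise ratio.
    """
    n_mics, _ = channels.shape
    out = np.zeros(channels.shape[1])
    for ch, d in zip(channels, delays):
        out += np.roll(ch, -d)   # advance the channel by its delay
    return out / n_mics
```

With M microphones carrying uncorrelated noise, the noise power in the output drops by roughly a factor of M while the aligned speech is preserved.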
Harley, Trevor A.
Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…
Pressman, Peter S; Simpson, Michaela; Gola, Kelly; Shdo, Suzanne M; Spinelli, Edoardo G; Miller, Bruce L; Gorno-Tempini, Maria Luisa; Rankin, Katherine; Levenson, Robert W
We performed an observational study of laughter during seminaturalistic conversations between patients with dementia and familial caregivers. Patients were diagnosed with (1) behavioural variant frontotemporal dementia (bvFTD), (2) right temporal variant frontotemporal dementia (rtFTD), (3) semantic variant of primary progressive aphasia (svPPA), (4) non-fluent variant primary progressive aphasia (nfvPPA) or (5) early onset Alzheimer's disease (eoAD). We hypothesised that those with bvFTD would laugh less in response to their own speech than other dementia groups or controls, while those with rtFTD would laugh less regardless of who was speaking. Patients with bvFTD (n=39), svPPA (n=19), rtFTD (n=14), nfvPPA (n=16), eoAD (n=17) and healthy controls (n=156) were recorded (video and audio) while discussing a problem in their relationship with a healthy control companion. Using the audio track only, laughs were identified by trained coders and then further classed by an automated algorithm as occurring during or shortly after the participant's own vocalisation ('self' context) or during or shortly after the partner's vocalisation ('partner' context). Individuals with bvFTD, eoAD or rtFTD laughed less across both contexts of self and partner than the other groups. Those with bvFTD laughed less relative to their own speech compared with healthy controls. Those with nfvPPA laughed more in the partner context compared with healthy controls. Laughter in response to one's own vocalisations or those of a conversational partner may be a clinically useful measure in dementia diagnosis.
Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang
Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, electrodermal (skin conductance) responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening effort when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.
Cao, Houwei; Savran, Arman; Verma, Ragini; Nenkova, Ani
In this article we investigate which representations of acoustics and word usage are most suitable for predicting dimensions of affect (arousal, valence, power, and expectancy) in spontaneous interactions. Our experiments are based on the AVEC 2012 challenge dataset. For lexical representations, we compare corpus-independent features based on psychological word norms of emotional dimensions, as well as corpus-dependent representations. We find that a corpus-dependent bag-of-words approach with mutual information between word and emotion dimensions is by far the best representation. For the analysis of acoustics, we zero in on the question of granularity. We confirm on our corpus that utterance-level features are more predictive than word-level features. Further, we study more detailed representations in which the utterance is divided into regions of interest (ROI), each with a separate representation. We introduce two ROI representations, which significantly outperform less informed approaches. In addition, we show that acoustic models of emotion can be improved considerably by taking into account annotator agreement and training the model on a smaller but more reliable dataset. Finally, we discuss the potential for improving prediction by combining the lexical and acoustic modalities. Simple fusion methods do not lead to consistent improvements over lexical classifiers alone, but do improve over acoustic models.
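The mutual-information scoring of words against emotion dimensions can be sketched as follows, assuming word presence is binarized and each emotion dimension is discretized into bins (the function name and data layout are hypothetical, for illustration only):

```python
import numpy as np

def word_emotion_mi(docs, labels):
    """Mutual information between a word's presence and a discrete emotion bin.

    docs:   list of token lists (one per utterance)
    labels: parallel list of discretized emotion-dimension values
    Returns a dict mapping each word to its MI score in nats.
    """
    n = len(docs)
    vocab = {w for d in docs for w in d}
    label_counts = {}
    for l in labels:
        label_counts[l] = label_counts.get(l, 0) + 1
    mi = {}
    for w in vocab:
        present = [w in d for d in docs]
        score = 0.0
        for x in (True, False):
            px = sum(p == x for p in present) / n
            if px == 0:
                continue  # word always (or never) present; skip empty cell
            for y, cy in label_counts.items():
                pxy = sum(1 for p, l in zip(present, labels)
                          if p == x and l == y) / n
                if pxy > 0:
                    score += pxy * np.log(pxy / (px * cy / n))
        mi[w] = score
    return mi
```

Words whose presence covaries with an emotion bin get high scores; function words that appear everywhere score near zero, which is the basis for the feature selection the abstract describes.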
Guddattu, Vasudeva; Krishna, Y.
The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…
This contribution presents two spectral amplitude estimators for acoustical background noise suppression based on maximum a posteriori estimation and super-Gaussian statistical modelling of the speech DFT amplitudes. The probability density function of the speech spectral amplitude is modelled with a simple parametric function, which allows a high approximation accuracy for Laplace- or Gamma-distributed real and imaginary parts of the speech DFT coefficients. Also, the statistical model can be adapted to optimally fit the distribution of the speech spectral amplitudes for a specific noise reduction system. Based on the super-Gaussian statistical model, computationally efficient maximum a posteriori speech estimators are derived, which outperform the commonly applied Ephraim-Malah algorithm.
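For orientation, estimators of this family apply a frequency-dependent gain to the noisy DFT amplitudes. Below is the decision-directed Wiener baseline that super-Gaussian MAP estimators are typically compared against (an illustrative sketch, not the paper's derived gain rule):

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, prev_clean_power=None, alpha=0.98):
    """Per-bin Wiener gain with a decision-directed a priori SNR estimate.

    noisy_power / noise_power: periodogram and noise-floor estimates per bin.
    prev_clean_power: clean-speech power estimate from the previous frame
    (None for the first frame). The returned gain multiplies the noisy
    spectral amplitude in each bin.
    """
    gamma = noisy_power / noise_power                  # a posteriori SNR
    ml_xi = np.maximum(gamma - 1.0, 0.0)               # ML a priori SNR
    if prev_clean_power is None:
        xi = ml_xi
    else:
        # decision-directed smoothing across frames
        xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_xi
    return xi / (1.0 + xi)
```

A super-Gaussian MAP estimator keeps the same a priori/a posteriori SNR machinery but replaces the final gain expression with one derived from the parametric Laplace/Gamma-style amplitude prior.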
A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher-level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed-syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
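The isochronous retiming described here amounts to moving each anchor point (a syllable onset or envelope peak) onto a uniform grid spanning the same total duration; a minimal sketch of computing that target grid (the helper name is hypothetical):

```python
def isochronous_grid(anchor_times):
    """Map measured anchor times (seconds) onto an isochronous grid with
    the same start, end, and number of anchors. The speech between
    anchors would then be time-stretched to match the new intervals."""
    n = len(anchor_times)
    t0, t1 = anchor_times[0], anchor_times[-1]
    period = (t1 - t0) / (n - 1)   # e.g. ~0.4 s spacing for 2.5 Hz retiming
    return [t0 + i * period for i in range(n)]
```

The matched anisochronous control condition would instead jitter these target times while keeping the same total distortion, isolating the effect of periodicity itself.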
An introduction is given to the anatomy and function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal, and speech intelligibility. The lecture note is written for the course Fundamentals of Acoustics and Noise Control (51001).
Hwee Ling Lee
This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech, or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians had practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practice fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practice was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and, to a marginally significant degree, to natural speech.
Sayyahi, Fateme; Soleymani, Zahra; Akbari, Mohammad; Bijankhan, Mahmood; Dolatshahi, Behrooz
The present study examined the relationship between gap detection threshold and speech error consistency in children with speech sound disorder. The participants were children five to six years of age who were categorized into three groups: typical speech, consistent speech disorder (CSD), and inconsistent speech disorder (ISD). The phonetic gap detection threshold test used for this study is a validated test comprising six syllables with inter-stimulus intervals between 20-300 ms. The participants were asked to listen to the recorded stimuli three times and indicate whether they heard one or two sounds. There was no significant difference between the typical and CSD groups (p=0.55), but there were significant differences in performance between the ISD and CSD groups and between the ISD and typical groups (p=0.00). The ISD group discriminated between speech sounds at a higher threshold. Children with inconsistent speech errors could not distinguish speech sounds during time-limited phonetic discrimination. It is suggested that inconsistency in speech reflects inconsistency in auditory perception, caused by a high gap detection threshold.
Rosenblum, Lawrence D.
Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal s...
Zuk, Jennifer; Iuzzini-Seigel, Jenya; Cabbage, Kathryn; Green, Jordan R.; Hogan, Tiffany P.
Purpose: Childhood apraxia of speech (CAS) is hypothesized to arise from deficits in speech motor planning and programming, but the influence of abnormal speech perception in CAS on these processes is debated. This study examined speech perception abilities among children with CAS with and without language impairment compared to those with…
Abdulmohsen A. Dashti
Within the framework of sociolinguistic phonological change, this study investigates the [j] sound in the speech of Kuwaitis: [j] is the predominant form and characterizes the sedentary population, which is made up of both indigenous and non-indigenous groups, while [ʤ] is the realisation of the Bedouins, who are also part of the indigenous population. Although [ʤ] is the classical variant, it has for some time been regarded by Kuwaitis as the stigmatized form, and [j] as the one that carries prestige. This study examines the change of status of [j] and [ʤ] in the speech of Kuwaitis. The main hypothesis is that [j] no longer carries prestige. To test this hypothesis, 40 Kuwaitis of different genders, ages, educational backgrounds, and social networks were selected to be interviewed. Their speech was phonetically transcribed and then quantitatively and qualitatively analyzed. Results indicate that the [j] variant is undergoing a change of status, and that social parameters and the significant political and social changes that Kuwait has undergone recently have triggered this linguistic shift.
It is becoming increasingly apparent that all forms of communication, including voice, will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. The book outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networks. Offering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the
Saldert, Charlotta; Hartelius, Lena
In this case study, we investigated the use of repetition in an individual with a neurogenic communication disorder. We present an analysis of interaction in natural conversations between a woman with advanced Huntington's disease (HD), whose speech had been described as sometimes characterised by echolalia, and her personal assistant. The conversational interaction is analysed on a sequential level, and recurrent patterns are explored. Although the ability of the person with HD to interact is affected by chorea, word retrieval problems and reduced comprehension, she takes an active part in conversation. The conversational partner's contributions are often adapted to her communicative ability as they are formulated as questions or suggestions that can be elaborated on or responded to with a simple 'yes' or 'no'. The person with HD often repeats the words of her conversational partner in a way that extends her contributions and shows listenership, and this use of repetition is also frequent in ordinary conversations between non-brain-damaged individuals. The results show that the conversation partners in this case cooperate in making the conversation proceed and evolve, and that verbal repetition is used in a way that works as a strategy for compensating for the impairment.
The role of speech production mechanisms in difficult speech comprehension is the subject of on-going debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension…
This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.
Davidow, Jason H.
Background: Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control…
Marjanovic, Nicholas; Piccinini, Giacomo; Kerr, Kevin; Esmailbeigi, Hananeh
Speech is an important aspect of human communication; individuals with speech impairment are unable to communicate vocally in real time. Our team has developed the TongueToSpeech (TTS) device with the goal of augmenting speech communication for the vocally impaired. The proposed device is a wearable wireless assistive device that incorporates a capacitive touch keyboard interface embedded inside a discrete retainer. The device connects to a computer, tablet or smartphone via a Bluetooth connection. The developed TTS application converts text typed by the tongue into audible speech. Our studies have concluded that an 8-contact-point configuration between the tongue and the TTS device yields the best user precision and speed performance. On average, typing with the TTS device inside the oral cavity takes 2.5 times longer than typing the same phrase with the pointer finger on a T9 (Text on 9 keys) keyboard. In conclusion, we have developed a discrete noninvasive wearable device that allows vocally impaired individuals to communicate in real time.
Holler, Judith; Schubotz, Louise; Kelly, Spencer; Hagoort, Peter; Schuetze, Manuela; Özyürek, Aslı
In human face-to-face communication, language comprehension is a multi-modal, situated activity. However, little is known about how we combine information from different modalities during comprehension, and how perceived communicative intentions, often signaled through visual signals, influence this process. We explored this question by simulating a multi-party communication context in which a speaker alternated her gaze between two recipients. Participants viewed speech-only or speech+gesture object-related messages when being addressed (direct gaze) or unaddressed (gaze averted to other participant). They were then asked to choose which of two object images matched the speaker's preceding message. Unaddressed recipients responded significantly more slowly than addressees for speech-only utterances. However, perceiving the same speech accompanied by gestures sped unaddressed recipients up to a level identical to that of addressees. That is, when unaddressed recipients' speech processing suffers, gestures can enhance the comprehension of a speaker's message. We discuss our findings with respect to two hypotheses attempting to account for how social eye gaze may modulate multi-modal language comprehension.
Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean
Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode.
Forsgren, Emma; Antonsson, Malin; Saldert, Charlotta
This paper reports on the adaptation of a training programme for conversation partners of persons with Parkinson's disease, and a protocol for assessment of possible changes in conversational interaction as a result of intervention. We present data from an explorative multiple case study with three individuals with Parkinson's disease and their spouses. Repeated analysis of natural conversational interaction and measures of the participants' perception of communication as well as measures of different cognitive abilities were obtained. The results show that the communication in all three dyads was affected by both speech and language problems and that the conversation training model and the assessment protocol may work well after minor adjustments. Influence of different aspects of cognition on communication is discussed.
Phifer, Gregg, Ed.
The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…
Eklund, Robert; Ingvar, Martin
Spontaneously produced Unfilled Pauses (UPs) and Filled Pauses (FPs) were played to subjects in an fMRI experiment. For both stimuli increased activity was observed in the Primary Auditory Cortex (PAC). However, FPs, but not UPs, elicited modulation in the Supplementary Motor Area (SMA), Brodmann Area 6. Our results provide neurocognitive confirmation of the alleged difference between FPs and other kinds of speech disfluency and could also provide a partial explanation for the previously repo...
Fujii, Shinya; Wan, Catherine Y
For thousands of years, human beings have engaged in rhythmic activities such as drumming, dancing, and singing. Rhythm can be a powerful medium to stimulate communication and social interactions, due to the strong sensorimotor coupling. For example, the mere presence of an underlying beat or pulse can result in spontaneous motor responses such as hand clapping, foot stepping, and rhythmic vocalizations. Examining the relationship between rhythm and speech is fundamental not only to our understanding of the origins of human communication but also in the treatment of neurological disorders. In this paper, we explore whether rhythm has therapeutic potential for promoting recovery from speech and language dysfunctions. Although clinical studies are limited to date, existing experimental evidence demonstrates rich rhythmic organization in both music and language, as well as overlapping brain networks that are crucial in the design of rehabilitation approaches. Here, we propose the "SEP" hypothesis, which postulates that (1) "sound envelope processing" and (2) "synchronization and entrainment to pulse" may help stimulate brain networks that underlie human communication. Ultimately, we hope that the SEP hypothesis will provide a useful framework for facilitating rhythm-based research in various patient populations.
Drijvers, L.; Özyürek, A.
Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech
Souchon, Nicolas; Maio, Gregory R; Hanel, Paul H P; Bardin, Brigitte
We conducted five studies testing whether an implicit measure of favorability toward power over universalism values predicts spontaneous prejudice and discrimination. Studies 1 (N = 192) and 2 (N = 86) examined correlations between spontaneous favorability toward power (vs. universalism) values, achievement (vs. benevolence) values, and a spontaneous measure of prejudice toward ethnic minorities. Study 3 (N = 159) tested whether conditioning participants to associate power values with positive adjectives and universalism values with negative adjectives (or inversely) affects spontaneous prejudice. Study 4 (N = 95) tested whether decision bias toward female handball players could be predicted by spontaneous attitude toward power (vs. universalism) values. Study 5 (N = 123) examined correlations between spontaneous attitude toward power (vs. universalism) values, spontaneous importance toward power (vs. universalism) values, and spontaneous prejudice toward Black African people. Spontaneous positivity toward power (vs. universalism) values was associated with spontaneous negativity toward minorities and predicted gender bias in a decision task, whereas the explicit measures did not. These results indicate that the implicit assessment of evaluative responses attached to human values helps to model value-attitude-behavior relations.
E. A. Borisova
The research objective is to describe speech therapy work focused on developing speech fluency in preschool-age children who stutter. Stuttering is a complex disorder of the articulation organs in which the tempo-rhythmical organisation of utterances is disturbed; it leads to breakdowns in communication, negatively influences the individual development of the child, and generates mental strain and specific features of the emotional-volitional sphere, causing undesirable character traits such as shyness, indecision, isolation, and negativism. The author notes that early stutter correction among junior preschool-aged children is a topical and immediate issue. Methods. Drawing on clinical, physiological, psychological and psychologic-pedagogical positions, the author summarizes the theoretical framework and describes an experimental-practical approbation of the author's method for developing speech fluency and eliminating stuttering in preschool children. The correction process, aimed at forming spontaneous, non-convulsive speech, proceeds in stages: 1. applying a restraint mode to decrease incorrect verbal output; 2. training exercises for long phonatory and speech expiration; 3. developing coordination and rhythm of movements to help pronounce words and phrases; 4. forming situational speech, at first in short sentences, then passing to longer ones; 5. training coherent text statements. The research presents analyses of post-experimental diagnostic examinations of stuttering preschool children, proving the efficiency of the author's applied method. Scientific novelty. The findings demonstrate a specific approach to the correction and elimination of stuttering in preschool children. The proposed approach consists of complementary directions of speech therapy work which are combined in the following way: coherent speech
The precise conversion of arbitrary text into its corresponding phoneme sequence (grapheme-to-phoneme, or G2P, conversion) is implemented in speech synthesis and recognition, pronunciation learning software, spoken term detection and spoken document retrieval systems. Because the quality of this module plays an important role in the performance of such systems and many problems regarding G2P conversion have been reported, we propose a novel two-stage model-based approach, implemented using an existing weighted finite-state transducer-based G2P conversion framework, to improve the performance of the G2P conversion model. The first-stage model is built for automatic conversion of words to phonemes, while the second-stage model utilizes the input graphemes and output phonemes obtained from the first stage to determine the best final output phoneme sequence. Additionally, we designed new grapheme generation rules, which enable extra detail for the vowel and consonant graphemes appearing within a word. When compared with previous approaches, the evaluation results indicate that our approach using rules focusing on the vowel graphemes slightly improved the accuracy on the out-of-vocabulary dataset and consistently increased the accuracy on the in-vocabulary dataset.
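The abstract above describes trained, WFST-based G2P conversion; the basic task it solves can be illustrated with a minimal sketch. The toy rule table and phoneme labels below are illustrative assumptions, not the paper's learned model: real systems learn grapheme-to-phoneme alignments from a pronunciation lexicon rather than from hand-written rules.

```python
# Illustrative longest-match grapheme-to-phoneme (G2P) conversion.
# RULES is a toy mapping from grapheme chunks to phoneme labels; a trained
# G2P model (e.g., the WFST-based approach described above) would learn
# such mappings from data instead.
RULES = {
    "sh": ["SH"], "ph": ["F"], "ee": ["IY"], "oo": ["UW"],
    "s": ["S"], "p": ["P"], "t": ["T"], "k": ["K"], "n": ["N"],
    "i": ["IH"], "a": ["AE"], "e": ["EH"], "o": ["AA"], "u": ["AH"],
}

def g2p(word):
    """Greedy left-to-right conversion, preferring the longest matching rule."""
    phonemes, i = [], 0
    max_len = max(len(g) for g in RULES)
    while i < len(word):
        for span in range(max_len, 0, -1):  # try longer grapheme chunks first
            chunk = word[i:i + span]
            if chunk in RULES:
                phonemes.extend(RULES[chunk])
                i += span
                break
        else:
            i += 1  # skip characters with no rule (e.g., silent letters)
    return phonemes

print(g2p("sheep"))  # ['SH', 'IY', 'P']
```

Greedy longest-match is exactly where such rule systems break down (context-dependent vowels, irregular words), which is why the two-stage statistical modelling described in the abstract is needed.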
Stekelenburg, J.J.; Keetels, M.N.; Vroomen, J.H.M.
Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect.
Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: A biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; Phonetic factors in letter detection; categorical perception; Short-term recall by deaf signers of American sign language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaires; and vowel information in postvocalic frictions.
Spectral integration (or formant averaging) was proposed in vowel perception research to account for the observation that a reduction of the intensity of one of two closely spaced formants (as in /u/) produced a predictable shift in vowel quality [Delattre et al., Word 8, 195-210 (1952)]. A related observation was reported in psychoacoustics, indicating that when the components of a two-tone periodic complex differ in amplitude and frequency, its perceived pitch is shifted toward that of the more intense tone [Helmholtz, App. XIV (1875/1948)]. Subsequent research in both fields focused on the frequency interval that separates these two spectral components, in an attempt to determine the size of the bandwidth for spectral integration to occur. This talk will review the accumulated evidence for and against spectral integration within the hypothesized limit of 3.5 Bark for static and dynamic signals in speech perception and psychoacoustics. Based on similarities in the processing of speech and non-speech sounds, it is suggested that spectral integration may reflect a general property of the auditory system. A larger frequency bandwidth, possibly close to 3.5 Bark, may be utilized in integrating acoustic information, including speech, complex signals, or sound quality of a violin.
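The 3.5-Bark integration bandwidth discussed above can be made concrete with a short sketch. The Hz-to-Bark conversion below uses Traunmüller's approximation, and the formant values in the usage lines are illustrative assumptions, not data from the studies cited:

```python
import math

def hz_to_bark(f):
    """Convert frequency in Hz to the Bark scale (Traunmüller approximation)."""
    return 26.81 * f / (1960.0 + f) - 0.53

def within_integration_band(f1, f2, limit=3.5):
    """True if two components fall within the hypothesized 3.5-Bark limit."""
    return abs(hz_to_bark(f1) - hz_to_bark(f2)) <= limit

# Closely spaced formants (illustrative back-vowel F1/F2) may integrate:
print(within_integration_band(250, 595))   # True
# Widely spaced formants (illustrative front-vowel F1/F2) do not:
print(within_integration_band(300, 2300))  # False
```

Because the Bark scale compresses high frequencies, a fixed 3.5-Bark window corresponds to a much wider Hz interval at high frequencies than at low ones, which matters when comparing integration claims across speech and non-speech signals.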
Ossewaarde, Roelant; Jonkers, Roel; Jalvingh, Fedor; Bastiaanse, Yvonne
Measurement of speech parameters in casual speech of dementia patients. Roelant Adriaan Ossewaarde (1,2), Roel Jonkers (1), Fedor Jalvingh (1,3), Roelien Bastiaanse (1). (1) CLCG, University of Groningen (NL); (2) HU University of Applied Sciences Utrecht (NL); (3) St. Marienhospital - Vechta, Geriatric Clinic Vechta
Purpose: to compare the frequency of disfluencies and speech rate in spontaneous speech and reading in adults with and without stuttering under non-altered and delayed auditory feedback (NAF, DAF). Methods: participants were 30 adults: 15 who stutter (Research Group - RG) and 15 who do not (Control Group - CG). The procedures were: audiological assessment and speech fluency evaluation in two listening conditions, normal and delayed auditory feedback (delayed by 100 milliseconds using Fono Tools software). Results: DAF caused a significant improvement in the fluency of spontaneous speech in the RG when compared to speech under NAF. The effect of DAF was different in the CG, as it increased the common disfluencies and the total number of disfluencies in spontaneous speech and reading, and also increased the frequency of stuttering-like disfluencies in reading. The intergroup analysis showed significant differences in the two speech tasks for the two listening conditions in the frequency of stuttering-like disfluencies and in the total number of disfluencies, and in the syllables- and words-per-minute rates under NAF. Conclusion: the results demonstrated that delayed auditory feedback promoted fluency in the spontaneous speech of adults who stutter, without interfering with speech rate. In non-stuttering adults it increased the number of common disfluencies and the total number of disfluencies, and reduced speech rate in spontaneous speech and reading.
Pennington, Lindsay; Virella, Daniel; Mjøen, Tone; da Graça Andrada, Maria; Murray, Janice; Colver, Allan; Himmelmann, Kate; Rackauskaite, Gija; Greitane, Andra; Prasauskiene, Audrone; Andersen, Guro; de la Cruz, Javier
Surveillance registers monitor the prevalence of cerebral palsy and the severity of resulting impairments across time and place. The motor disorders of cerebral palsy can affect children's speech production and limit their intelligibility. We describe the development of a scale to classify children's speech performance for use in cerebral palsy surveillance registers, and its reliability across raters and across time. Speech and language therapists, other healthcare professionals and parents classified the speech of 139 children with cerebral palsy (85 boys, 54 girls; mean age 6.03 years, SD 1.09) from observation and previous knowledge of the children. Another group of health professionals rated children's speech from information in their medical notes. With the exception of parents, raters reclassified children's speech at least four weeks after their initial classification. Raters were asked to rate how easy the scale was to use and how well the scale described the child's speech production using Likert scales. Inter-rater reliability was moderate to substantial (k>.58 for all comparisons). Test-retest reliability was substantial to almost perfect for all groups (k>.68). Over 74% of raters found the scale easy or very easy to use; 66% of parents and over 70% of health care professionals judged the scale to describe children's speech well or very well. We conclude that the Viking Speech Scale is a reliable tool to describe the speech performance of children with cerebral palsy, which can be applied through direct observation of children or through case note review.
Buchan, Julie N; Paré, Martin; Munhall, Kevin G
During face-to-face conversation the face provides auditory and visual linguistic information, and also conveys information about the identity of the speaker. This study investigated behavioral strategies involved in gathering visual information while watching talking faces. The effects of varying talker identity and varying the intelligibility of speech (by adding acoustic noise) on gaze behavior were measured with an eyetracker. Varying the intelligibility of the speech by adding noise had a noticeable effect on the location and duration of fixations. When noise was present subjects adopted a vantage point that was more centralized on the face by reducing the frequency of the fixations on the eyes and mouth and lengthening the duration of their gaze fixations on the nose and mouth. Varying talker identity resulted in a more modest change in gaze behavior that was modulated by the intelligibility of the speech. Although subjects generally used similar strategies to extract visual information in both talker variability conditions, when noise was absent there were more fixations on the mouth when viewing a different talker every trial as opposed to the same talker every trial. These findings provide a useful baseline for studies examining gaze behavior during audiovisual speech perception and perception of dynamic faces.
Nagasawa, Tetsuro; Juhász, Csaba; Rothermel, Robert; Hoechstetter, Karsten; Sood, Sandeep; Asano, Eishi
High-frequency oscillations (HFOs) at ≧80 Hz of nonepileptic nature spontaneously emerge from human cerebral cortex. In 10 patients with extra-occipital lobe epilepsy, we compared the spectral-spatial characteristics of HFOs spontaneously arising from the nonepileptic occipital cortex with those of HFOs driven by a visual task as well as epileptogenic HFOs arising from the extra-occipital seizure focus. We identified spontaneous HFOs at ≧80 Hz with a mean duration of 330 msec intermittently emerging from the occipital cortex during interictal slow-wave sleep. The spectral frequency band of spontaneous occipital HFOs was similar to that of visually-driven HFOs. Spontaneous occipital HFOs were spatially sparse and confined to smaller areas, whereas visually-driven HFOs involved larger areas including the more rostral sites. Neither spectral frequency band nor amplitude of spontaneous occipital HFOs significantly differed from those of epileptogenic HFOs. Spontaneous occipital HFOs were strongly locked to the phase of delta activity, but the strength of delta-phase coupling decayed from 1 to 3 Hz. Conversely, epileptogenic extra-occipital HFOs were locked to the phase of delta activity about equally in the range from 1 to 3 Hz. The occipital cortex spontaneously generates physiological HFOs which may stand out on electrocorticography traces as prominently as pathological HFOs arising from elsewhere; this observation should be taken into consideration during presurgical evaluation. Coupling of spontaneous delta and HFOs may increase the understanding of the significance of delta-oscillations during slow-wave sleep. Further studies are warranted to determine whether delta-phase coupling distinguishes physiological from pathological HFOs or simply differs across anatomical locations. PMID:21432945
Drijvers, Linda; Ozyurek, Asli
Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Method:…
Zhang, Ying; Wu, Ningjing; Bai, Xinyan; Xu, Changqi; Liu, Yi; Wang, Yong
The objective of this study is to report for the first time the spontaneous polymerization phenomenon of self-etch dental adhesives induced by hydroxylapatite (HAp). Model self-etch adhesives were prepared by using a monomer mixture of bis[2-(methacryloyloxy)ethyl] phosphate (2MP) with 2-hydroxyethyl methacrylate (HEMA). The initiator system consisted of camphorquinone (CQ, 0.022 mmol/g) and ethyl 4-dimethylaminobenzoate (4E, 0.022-0.088 mmol/g). HAp (2-8 wt.%) was added to the neat model adhesive. In a dark environment, the polymerization was monitored in-situ using ATR/FT-IR, and the mechanical properties of the polymerized adhesives were evaluated using the nanoindentation technique. Results indicated that spontaneous polymerization was not observed in the absence of HAp. However, as different amounts of HAp were incorporated into the adhesives, spontaneous polymerization was induced. Higher HAp content led to a higher degree of conversion (DC), a higher rate of polymerization (RP) and a shorter induction period (IP). In addition, higher 4E content also elevated DC and RP and reduced IP of the adhesives. Nanoindentation results suggested that the Young's modulus of the polymerized adhesives showed a similar dependence on HAp and 4E contents. In summary, interaction with HAp could induce spontaneous polymerization of the model self-etch adhesives. This result provides important information for understanding the initiation mechanism of self-etch adhesives, and may be of clinical significance for strengthening the adhesive/dentin interface.
Naidu, D.H.R.; Srinivasan, S.
Several speech enhancement approaches utilize trained models of clean speech data, such as codebooks, Gaussian mixtures, and hidden Markov models. These models are typically trained on neutral clean speech data, without any emotion. However, in practical scenarios, emotional speech is a common
Brouwer, Susanne; Van Engen, Kristin J.; Calandruccio, Lauren; Bradlow, Ann R.
This study examined whether speech-on-speech masking is sensitive to variation in the degree of similarity between the target and the masker speech. Three experiments investigated whether speech-in-speech recognition varies across different background speech languages (English vs Dutch) for both English and Dutch targets, as well as across variation in the semantic content of the background speech (meaningful vs semantically anomalous sentences), and across variation in listener status vis-à-vis the target and masker languages (native, non-native, or unfamiliar). The results showed that the more similar the target speech is to the masker speech (e.g., same vs different language, same vs different levels of semantic content), the greater the interference on speech recognition accuracy. Moreover, the listener's knowledge of the target and the background language modulates the size of the release from masking. These factors had an especially strong effect on masking effectiveness in highly unfavorable listening conditions. Overall, this research provided evidence that the degree of target-masker similarity plays a significant role in speech-in-speech recognition. The results also give insight into how listeners assign their resources differently depending on whether they are listening to their first or second language. PMID:22352516
Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias
Seeing the talker's articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers often fail to perceive as speech. While audiovisual integration in the identification task only occurred when observers were informed of the speech-like nature of SWS, integration occurred in the detection task both for informed and naïve observers. This shows that both speech-specific and general mechanisms underlie audiovisual integration of speech.
Moulin-Frier, Clément; Arbib, Michael A
The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
tecture, are either wrapped natural-language processing (NLP) components or objects developed from scratch using the architecture's API. GATE is ... framework, we put together a demonstration Arabic-to-English speech translation system using both internally developed (Arabic speech recognition and MT ... conditions of our Arabic S2S demonstration system described earlier. Once again, the data size was varied and eighty identical requests were
Wijngaarden, S.J. van; Bronkhorst, A.W.; Houtgast, T.; Steeneken, H.J.M.
While the Speech Transmission Index (STI) is widely applied for prediction of speech intelligibility in room acoustics and telecommunication engineering, it is unclear how to interpret STI values when non-native talkers or listeners are involved. Based on subjectively measured psychometric functions
Maas, Edwin; Mailend, Marja-Liisa
Purpose: The purpose of this article is to present an argument for the use of online reaction time (RT) methods in the study of apraxia of speech (AOS) and to review the existing small literature in this area and the contributions it has made to our fundamental understanding of speech planning (deficits) in AOS. Method: Following a brief…
Jørgensen, Søren; Dau, Torsten
The speech-based envelope power spectrum model (sEPSM) [Jørgensen and Dau (2011). J. Acoust. Soc. Am. 130 (3), 1475-1487] estimates the envelope signal-to-noise ratio (SNRenv) of distorted speech and accurately describes the speech recognition thresholds (SRT) for normal-hearing listeners. The model is evaluated by comparing predictions to measured data from [Kjems et al. (2009). J. Acoust. Soc. Am. 126 (3), 1415-1426], where speech is mixed with four different interferers, including speech-shaped noise, bottle noise, car noise, and cafe noise. The model accounts well for the differences in intelligibility observed for the different interferers. None of the standardized models successfully describe these data.
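The SNRenv idea underlying the sEPSM, comparing the envelope power of the noisy mixture against that of the noise alone, can be sketched roughly as follows. This is a simplifying illustration, not the published model: the single 1-8 Hz modulation band, the crude FFT-based filtering, and the power-subtraction floor are all assumptions made here for brevity.

```python
import numpy as np

def envelope_power(signal, fs, f_lo=1.0, f_hi=8.0):
    """AC-coupled envelope power of `signal` in one modulation band (toy version)."""
    n = len(signal)
    # Hilbert envelope via the analytic signal
    spec = np.fft.fft(signal)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    env = np.abs(np.fft.ifft(spec * h))
    env = env - env.mean()  # remove DC before band-limiting
    # crude modulation band-pass via FFT masking
    freqs = np.abs(np.fft.fftfreq(n, d=1.0 / fs))
    band = (freqs >= f_lo) & (freqs <= f_hi)
    env_band = np.fft.ifft(np.fft.fft(env) * band).real
    return np.mean(env_band ** 2)

def snr_env_db(noisy_speech, noise, fs):
    """SNRenv in dB: envelope power attributable to speech relative to the noise alone."""
    p_mix = envelope_power(noisy_speech, fs)
    p_noise = envelope_power(noise, fs)
    # subtract the noise envelope power from the mixture (floored at a small value)
    p_speech_est = max(p_mix - p_noise, 1e-12)
    return 10.0 * np.log10(p_speech_est / p_noise)
```

A strongly amplitude-modulated "speech-like" signal mixed with weak noise yields a higher SNRenv than the same signal attenuated before mixing, which is the direction of effect the model relies on.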
The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.
Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.
Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…
Castellanos, Irina; Kronenberger, William G; Beer, Jessica; Henning, Shirley C; Colson, Bethany G; Pisoni, David B
Speech and language measures during grade school predict adolescent speech-language outcomes in children who receive cochlear implants (CIs), but no research has examined whether speech and language functioning at even younger ages is predictive of long-term outcomes in this population. The purpose of this study was to examine whether early preschool measures of speech and language performance predict speech-language functioning in long-term users of CIs. Early measures of speech intelligibility and receptive vocabulary (obtained during preschool ages of 3-6 years) in a sample of 35 prelingually deaf, early-implanted children predicted speech perception, language, and verbal working memory skills up to 18 years later. Age of onset of deafness and age at implantation added additional variance to preschool speech intelligibility in predicting some long-term outcome scores, but the relationship between preschool speech-language skills and later speech-language outcomes was not significantly attenuated by the addition of these hearing history variables. These findings suggest that speech and language development during the preschool years is predictive of long-term speech and language functioning in early-implanted, prelingually deaf children. As a result, measures of speech-language functioning at preschool ages can be used to identify and adjust interventions for very young CI users who may be at long-term risk for suboptimal speech and language outcomes.
Kuliev, Anver; Janzen, Jeanine Cieslak; Zlatopolsky, Zev; Kirillova, Irina; Ilkevitch, Yury; Verlinsky, Yury
Due to the limitations of preimplantation genetic diagnosis (PGD) for chromosomal rearrangements by interphase fluorescent in-situ hybridization (FISH) analysis, a method for obtaining chromosomes from single blastomeres was introduced by their fusion with enucleated or intact mouse zygotes, followed by FISH analysis of the resulting heterokaryons. Although this allowed a significant improvement in the accuracy of testing of both maternally and paternally derived translocations, it is still labour intensive and requires the availability of fertilized mouse oocytes, also creating ethical issues related to the formation of interspecies heterokaryons. This method was modified with a chemical conversion procedure that has now been clinically applied for the first time on 877 embryos from PGD cycles for chromosomal rearrangements and has become the method of choice for performing PGD for structural rearrangements. This is presented within the context of overall experience of 475 PGD cycles for translocations with pre-selection and transfer of balanced or normal embryos in 342 (72%) of these cycles, which resulted in 131 clinical pregnancies (38%), with healthy deliveries of 113 unaffected children. The spontaneous abortion rate in these cycles was as low as 17%, which confirms an almost five-fold reduction of spontaneous abortion rate following PGD for chromosomal rearrangements. 2010 Reproductive Healthcare Ltd. Published by Elsevier Ltd. All rights reserved.
Sikveland, A.; Öttl, A.; Amdal, I.; Ernestus, M.; Svendsen, T.; Edlund, J.
Spontal-N is a corpus of spontaneous, interactional Norwegian. To our knowledge, it is the first corpus of Norwegian in which the majority of speakers have spent significant parts of their lives in Sweden, and in which the recorded speech displays varying degrees of interference from Swedish. The corpus consists of studio quality audio- and video-recordings of four 30-minute free conversations between acquaintances, and a manual orthographic transcription of the entire material. On basis of t...
Kayasith, Prakasith; Theeramunkong, Thanaruk
Measuring the severity of dysarthria by manually evaluating a speaker's speech with standard perception-based assessment methods is a tedious and subjective task. This paper presents an automated approach to assess the speech quality of a dysarthric speaker with cerebral palsy. With consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a certain word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate of an individual dysarthric speaker's speech before the exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
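The abstract does not give the formula for Ψ, but the general idea of combining within-word consistency with between-word distinction can be sketched as below. The feature representation, the cosine similarity measure, and the product used to combine the two factors are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np

def clarity_index(word_tokens):
    """Toy clarity score: high when repetitions of the same word are acoustically
    similar (consistency) and different words are far apart (distinction).

    word_tokens: dict mapping word -> list of 1-D feature vectors
    (e.g. averaged MFCCs), one per recorded repetition.
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    # consistency: mean pairwise cosine similarity within each word
    cons = []
    for vecs in word_tokens.values():
        for i in range(len(vecs)):
            for j in range(i + 1, len(vecs)):
                cons.append(cos(vecs[i], vecs[j]))
    consistency = float(np.mean(cons))

    # distinction: one minus mean similarity between word centroids
    cents = [np.mean(vecs, axis=0) for vecs in word_tokens.values()]
    dist = []
    for i in range(len(cents)):
        for j in range(i + 1, len(cents)):
            dist.append(1.0 - cos(cents[i], cents[j]))
    distinction = float(np.mean(dist))

    # combine: both factors must be high for a high score
    return consistency * distinction
```

A speaker whose repetitions cluster tightly and whose words occupy distinct regions of feature space scores higher than one whose words overlap, mirroring the consistency/distinction trade-off the abstract describes.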
Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc
Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…
Oommen, Elizabeth R; McCarthy, John W
In childhood apraxia of speech (CAS), children exhibit varying levels of speech intelligibility depending on the nature of errors in articulation and prosody. Augmentative and alternative communication (AAC) strategies are beneficial, and commonly adopted with children with CAS. This study focused on the decision-making process and strategies adopted by speech-language pathologists (SLPs) when simultaneously implementing interventions that focused on natural speech and AAC. Eight SLPs, with significant clinical experience in CAS and AAC interventions, participated in an online focus group. Thematic analysis revealed eight themes: key decision-making factors; treatment history and rationale; benefits; challenges; therapy strategies and activities; collaboration with team members; recommendations; and other comments. Results are discussed along with clinical implications and directions for future research.
Tan, Zheng-Hua; Lindberg, Børge
The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within...
Christiner, Markus; Reiterer, Susanne M
In previous research on speech imitation, musicality, and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer's sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and auditory memory with singing fitting better into the category of "speech" on the productive level and "music" on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. (1) Motor flexibility and the ability to sing improve language and musical function. (2) Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. (3) The ability to sing improves the memory span of the auditory working memory.
Sandor, Aniko; Moses, Haifa
Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.
Allen, Winfred G., Jr., Ed.
The Freedom of Speech Newsletter is the communication medium for the Freedom of Speech Interest Group of the Western Speech Communication Association. The newsletter contains such features as a statement of concern by the National Ad Hoc Committee Against Censorship; Reticence and Free Speech, an article by James F. Vickrey discussing the subtle…
Vích, Robert; Nouza, J.; Vondra, Martin
-, no. 5042 (2008), pp. 136-148, ISSN 0302-9743. R&D projects: GA AV ČR 1ET301710509; GA AV ČR 1QS108040569. Institutional research plan: CEZ:AV0Z20670512. Keywords: speech recognition; speech processing. Subject RIV: JA - Electronics; Optoelectronics, Electrical Engineering
This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with a focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling).
Farshid Tayari Ashtiani
The present study investigated the impact of English verbal songs on connected speech aspects of adult English learners' speech production. Forty participants were selected based on their performance on a piloted and validated version of the NELSON test given to 60 intermediate English learners in a language institute in Tehran. They were then equally distributed into control and experimental groups and received a validated pretest of reading aloud and speaking in English. Afterward, the treatment was performed in 18 sessions by singing preselected songs chosen according to criteria such as popularity, familiarity, and the amount and speed of speech delivery. In the end, the posttests of reading aloud and speaking in English were administered. The results revealed that the treatment had statistically positive effects on the connected speech aspects of English learners' speech production at the .05 level of significance. Meanwhile, the results showed no significant difference between the experimental group's mean scores on the posttests of reading aloud and speaking. It was thus concluded that providing EFL learners with English verbal songs could positively affect connected speech aspects of both modes of speech production, reading aloud and speaking. The findings of this study have pedagogical implications for language teachers, who should be aware and knowledgeable of the benefits of verbal songs for promoting the speech production of language learners in terms of naturalness and fluency. Keywords: English Verbal Songs, Connected Speech, Speech Production, Reading Aloud, Speaking
Meyer, Antje S; Alday, Phillip M; Decuyper, Caitlin; Knudsen, Birgit
As conversation is the most important way of using language, linguists and psychologists should combine forces to investigate how interlocutors deal with the cognitive demands arising during conversation. Linguistic analyses of corpora of conversation are needed to understand the structure of conversations, and experimental work is indispensable for understanding the underlying cognitive processes. We argue that joint consideration of corpus and experimental data is most informative when the utterances elicited in a lab experiment match those extracted from a corpus in relevant ways. This requirement to compare like with like seems obvious but is not trivial to achieve. To illustrate this approach, we report two experiments where responses to polar (yes/no) questions were elicited in the lab and the response latencies were compared to gaps between polar questions and answers in a corpus of conversational speech. We found, as expected, that responses were given faster when they were easy to plan and planning could be initiated earlier than when they were harder to plan and planning was initiated later. Overall, in all but one condition, the latencies were longer than one would expect based on the analyses of corpus data. We discuss the implication of this partial match between the data sets and more generally how corpus and experimental data can best be combined in studies of conversation.
Agus, Trevor R.; Akeroyd, Michael A.; Noble, William; Bhullar, Navjot
Many of the items in the “Speech, Spatial, and Qualities of Hearing” scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol.43, 85–99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively ...
Shearer, William M.
Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…
Heldal, Magnus; Atar, Dan
Recent-onset (duration ≤ 1 week) atrial fibrillation (AF) has a high rate of spontaneous conversion to sinus rhythm (SR); nevertheless, anti-arrhythmic drugs (AADs) are given for conversion purposes. We assessed the effect of AADs by systematically reviewing the literature on the conversion rates of the available drugs. PubMed searches were performed using the terms "drug name", "atrial fibrillation", and "clinical study/RCT", generating a list of 1302 titles. These titles, including abstracts or complete papers when needed, were reviewed for recent onset of AF, the use of a control group, and the endpoint of SR within 24 hours. Postoperative and intensive care settings were excluded. Five AADs were demonstrated to have an effect: Amiodarone, Ibutilide (only one study, and a risk of torsade de pointes), Flecainide and Propafenone (only to be used in patients without structural heart disease), and Vernakalant. The time to conversion differed markedly: Vernakalant converted after 10 minutes, while Amiodarone converted only after 24 hours; Propafenone and Flecainide had conversion times in between. For a rapid response in a broad group of patients, Vernakalant appears to be a reasonable first choice, while Flecainide and Propafenone can be used in patients without structural heart disease.
Basilakos, Alexandra; Rorden, Chris; Bonilha, Leonardo; Moser, Dana; Fridriksson, Julius
Acquired apraxia of speech (AOS) is a motor speech disorder caused by brain damage. AOS often co-occurs with aphasia, a language disorder in which patients may also demonstrate speech production errors. The overlap of speech production deficits in both disorders has raised questions on whether AOS emerges from a unique pattern of brain damage or as a subelement of the aphasic syndrome. The purpose of this study was to determine whether speech production errors in AOS and aphasia are associated with distinctive patterns of brain injury. Forty-three patients with history of a single left-hemisphere stroke underwent comprehensive speech and language testing. The AOS Rating Scale was used to rate speech errors specific to AOS versus speech errors that can also be associated with both AOS and aphasia. Localized brain damage was identified using structural magnetic resonance imaging, and voxel-based lesion-impairment mapping was used to evaluate the relationship between speech errors specific to AOS, those that can occur in AOS or aphasia, and brain damage. The pattern of brain damage associated with AOS was most strongly associated with damage to cortical motor regions, with additional involvement of somatosensory areas. Speech production deficits that could be attributed to AOS or aphasia were associated with damage to the temporal lobe and the inferior precentral frontal regions. AOS likely occurs in conjunction with aphasia because of the proximity of the brain areas supporting speech and language, but the neurobiological substrate for each disorder differs. © 2015 American Heart Association, Inc.
Maize was traditionally the second most common staple food in Indonesia. Conversion to maize flour has been accomplished to improve its convenience. Traditionally, maize flour is produced by soaking the kernels in water followed by grinding. It was reported that the final physicochemical characteristics of the maize flour were influenced by spontaneous fermentation which occurred during soaking. This research aimed to isolate and identify important microorganisms that grew during fermentation so that a standardized starter culture could be developed for a more controlled fermentation process. Soaking of maize grits was conducted in sterile water (grits:water = 1:2, w/v) in a closed container at room temperature (±28°C) for 72 hours. After 0, 4, 12, 24, 36, 48, and 72 hours, water and maize grits were sampled and tested for the presence of mold, yeast, and lactic acid bacteria (LAB). Isolates obtained from the spontaneous fermentation were reinoculated into the appropriate media containing starch to observe their amylolytic activity. Individual isolates were then identified: mold by the slide culture method, while yeast and LAB by biochemical rapid kits, i.e. API 20C AUX and API CH50, respectively. The number of each microorganism was plotted against time to obtain the growth curves of the microorganisms during spontaneous fermentation. The microorganisms were identified as Penicillium chrysogenum, P. citrinum, A. flavus, A. niger, Rhizopus stolonifer, R. oryzae, Fusarium oxysporum, Acremonium strictum, Candida famata, Kodamaea ohmeri, Candida krusei/incospicua, Lactobacillus plantarum 1a, Pediococcus pentosaceus, L. brevis 1, L. plantarum 1b, and L. paracasei ssp. paracasei 3. Four molds and one yeast were amylolytic, while none of the LAB was capable of starch hydrolysis. The growth curves suggested that the amylolytic mold and yeast grew to hydrolyze starch during the course of fermentation, while the LABs benefited from the hydrolyzed products and dominated the later
Alim Sabur Ajibola
Stuttered speech is dysfluency-rich speech, more prevalent in males than females. It has been associated with insufficient air pressure or poor articulation, even though the root causes are more complex. Its primary features include prolonged and repetitive speech, while its secondary features include anxiety, fear, and shame. This study used LPC analysis and synthesis algorithms to reconstruct stuttered speech. The results were evaluated using cepstral distance, Itakura-Saito distance, mean square error, and likelihood ratio; these measures implied perfect speech reconstruction quality. ASR was used for further testing, and the results showed that all the reconstructed speech samples were perfectly recognized, while only three samples of the original speech were perfectly recognized.
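The LPC analysis-synthesis pipeline this abstract relies on can be sketched as below: autocorrelation-method LPC solved with the Levinson-Durbin recursion, inverse filtering to obtain the residual, all-pole resynthesis, and mean-square-error scoring. The signal, frame length, and model order in the usage are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

def lpc_coeffs(x, order):
    """LPC coefficients a[0..order] (with a[0] = 1) via the autocorrelation
    method, solved with the Levinson-Durbin recursion."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = np.array([1.0])
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for this model order
        acc = r[i] + np.dot(a[1:], r[1:i][::-1])
        k = -acc / err
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        err *= 1.0 - k * k
    return a

def residual(a, x):
    """Inverse-filter x through A(z) to obtain the prediction residual."""
    return np.convolve(a, x)[: len(x)]

def synthesize(a, excitation):
    """Run the excitation through the all-pole filter 1/A(z)."""
    p = len(a) - 1
    y = np.zeros(len(excitation))
    for n in range(len(y)):
        acc = excitation[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def mse(x, y):
    """Mean square error between original and reconstructed signals."""
    return float(np.mean((np.asarray(x) - np.asarray(y)) ** 2))
```

Resynthesizing from the exact residual reconstructs the frame up to floating-point error; in practice the residual would be coded or replaced, and the distance measures then quantify the resulting degradation.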
Sneed, Don; Stonecipher, Harry W.
The ultimate test of the speech-action dichotomy, as it relates to symbolic speech to be considered by the courts, may be the fasting of prison inmates who use hunger strikes to protest the conditions of their confinement or to make political statements. While hunger strikes have been utilized by prisoners for years as a means of protest, it was…
Aziz, Azza Adel; Shohdi, Sahar; Osman, Dalia Mostafa; Habib, Emad Iskander
Childhood apraxia of speech is a neurological childhood speech-sound disorder in which the precision and consistency of movements underlying speech are impaired in the absence of neuromuscular deficits. Children with childhood apraxia of speech and those with multiple phonological disorder share some common phonological errors that can be misleading in diagnosis. This study posed the question of whether there is a significant difference in language, speech and non-speech oral performances between children with childhood apraxia of speech, children with multiple phonological disorder and normal children that can be used for differential diagnosis. Thirty pre-school children between the ages of 4 and 6 years served as participants. Each child belonged to one of three subject groups: Group 1: multiple phonological disorder; Group 2: suspected cases of childhood apraxia of speech; Group 3: control group with no communication disorder. Assessment procedures included parent interviews, testing of non-speech oral motor skills and testing of speech skills. The data showed that children with suspected childhood apraxia of speech had significantly lower language scores only in their expressive abilities. Non-speech tasks did not identify significant differences between the childhood apraxia of speech and multiple phonological disorder groups except for those which required two sequential motor performances. In speech tasks, both consonant and vowel accuracy were significantly lower and more inconsistent in the childhood apraxia of speech group than in the multiple phonological disorder group. Syllable number, shape and sequence accuracy differed significantly between the childhood apraxia of speech group and the other two groups. In addition, children with childhood apraxia of speech showed greater difficulty in processing prosodic features, indicating a clear need to address these variables for the differential diagnosis and treatment of children with childhood apraxia of speech.
Carbonell, Kathy M.
One of the lasting concerns in audiology is the unexplained individual difference in speech perception performance, even among individuals with similar audiograms. One proposal is that there are cognitive/perceptual individual differences underlying this vulnerability and that these differences are present in normal hearing (NH) individuals but do not reveal themselves in studies that use clear speech produced in quiet (because of a ceiling effect). However, previous studies have failed to uncover cognitive/perceptual variables that explain much of the variance in NH performance on more challenging degraded speech tasks. This lack of strong correlations may be due either to examining the wrong measures (e.g., working memory capacity) or to there being no reliable differences in degraded speech performance in NH listeners (i.e., variability in performance is due to measurement noise). The proposed project has three aims: the first is to establish whether there are reliable individual differences in degraded speech performance for NH listeners that are sustained both across degradation types (speech in noise, compressed speech, noise-vocoded speech) and across multiple testing sessions. The second aim is to establish whether there are reliable differences in NH listeners' ability to adapt their phonetic categories based on short-term statistics, both across tasks and across sessions; and the third is to determine whether performance on degraded speech perception tasks is correlated with performance on phonetic adaptability tasks, thus establishing a possible explanatory variable for individual differences in speech perception for NH and hearing-impaired listeners.
Meijers, A.W.M.; Tsohatzidis, S.L.
From its early development in the 1960s, speech act theory always had an individualistic orientation. It focused exclusively on speech acts performed by individual agents. Paradigmatic examples are ‘I promise that p’, ‘I order that p’, and ‘I declare that p’. There is a single speaker and a single
Светлана Викторовна Иванова
Researchers of discourse and media communication have noted that popular discursive and communicative practices tend toward hybridization and convergence. Discourse, understood as language in use, is flexible; consequently, one and the same text can represent several types of discourse. A vivid example of this tendency is the American commencement speech (also commencement address or graduation speech). A commencement speech is an address to university graduates which, in line with the modern trend, is delivered by outstanding media personalities (politicians, athletes, actors, etc.). The objective of this study is to define the specific ways in which polydiscursive practices are realized within commencement speech. The research involves discursive, contextual, stylistic and definitive analyses. Methodologically, the study is based on discourse analysis theory; in particular, the notion of a discursive practice as a verbalized social practice makes up the conceptual basis of the research. The research draws upon a hundred commencement speeches delivered by prominent representatives of American society from the 1980s to the present. In brief, commencement speech belongs to the institutional discourse that public speaking embodies. Its institutional parameters are well represented in speeches delivered by people in power, such as American and university presidents. Nevertheless, as the results of the research indicate, its institutional character is not its only feature. Conceptual information analysis makes it possible to classify commencement speech as didactic discourse, since it is aimed at teaching university graduates how to deal with the challenges life is rich in. Discursive practices of personal discourse are also actively integrated into commencement speech discourse; more than that, existential discursive practices also find their way into the discourse under study. Commencement
Hurkmans, Joost; Jonkers, Roel; de Bruijn, Madeleen; Boonstra, Anne M.; Hartman, Paul P.; Arendzen, Hans; Reinders - Messelink, Heelen
Background: Several studies using musical elements in the treatment of neurological language and speech disorders have reported improvement of speech production. One such programme, Speech-Music Therapy for Aphasia (SMTA), integrates speech therapy and music therapy (MT) to treat the individual with
; speech-to-speech translation; language identification. ... interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers.
Basilakos, Alexandra; Rorden, Chris; Bonilha, Leonardo; Moser, Dana; Fridriksson, Julius
Background and Purpose: Acquired apraxia of speech (AOS) is a motor speech disorder caused by brain damage. AOS often co-occurs with aphasia, a language disorder in which patients may also demonstrate speech production errors. The overlap of speech production deficits in both disorders has raised questions regarding whether AOS emerges from a unique pattern of brain damage or as a sub-element of the aphasic syndrome. The purpose of this study was to determine whether speech production errors in AOS and aphasia are associated with distinctive patterns of brain injury. Methods: Forty-three patients with a history of a single left-hemisphere stroke underwent comprehensive speech and language testing. The Apraxia of Speech Rating Scale was used to rate speech errors specific to AOS versus speech errors that can also be associated with AOS and/or aphasia. Localized brain damage was identified using structural MRI, and voxel-based lesion-impairment mapping was used to evaluate the relationship between speech errors specific to AOS, those that can occur in AOS and/or aphasia, and brain damage. Results: Speech errors specific to AOS were most strongly associated with damage to cortical motor regions, with additional involvement of somatosensory areas. Speech production deficits that could be attributed to AOS and/or aphasia were associated with damage to the temporal lobe and the inferior pre-central frontal regions. Conclusion: AOS likely occurs in conjunction with aphasia due to the proximity of the brain areas supporting speech and language, but the neurobiological substrate for each disorder differs. PMID:25908457
Madsen, Sara Miay Kim; Whiteford, Kelly L.; Oxenham, Andrew J.
Recent studies disagree on whether musicians have an advantage over non-musicians in understanding speech in noise. However, it has been suggested that musicians may be able to use differences in fundamental frequency (F0) to better understand target speech in the presence of interfering talkers....... Here we studied a relatively large (N=60) cohort of young adults, equally divided between non-musicians and highly trained musicians, to test whether the musicians were better able to understand speech either in noise or in a two-talker competing speech masker. The target speech and competing speech...... were presented with either their natural F0 contours or on a monotone F0, and the F0 difference between the target and masker was systematically varied. As expected, speech intelligibility improved with increasing F0 difference between the target and the two-talker masker for both natural and monotone...
Full Text Available Recurrent respiratory papillomatosis is a relatively uncommon disease that presents clinically with symptoms ranging from hoarseness to severe dyspnea. Human papillomavirus types 6 and 11 are important in the etiology of papillomas and are most probably transmitted from mother to child during birth. Although spontaneous remission is frequent, pulmonary spread and/or malignant transformation resulting in death has been reported. CO2 laser evaporation of papillomas and adjuvant drug therapy using lymphoblastoid interferon-alpha are the most common treatments. However, several other treatments have been tried, with varying success. In the present report, a case of laryngeal papillomatosis presenting with chronic stridor and delayed speech is described.
Elmahdy, Mohamed; Minker, Wolfgang
Novel Techniques for Dialectal Arabic Speech describes approaches to improve automatic speech recognition for dialectal Arabic. Since speech resources for dialectal Arabic speech recognition are very sparse, the authors describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic speech recognition, while assuming that MSA is always a second language for all Arabic speakers. In this book, Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect. ECA is the first ranked Arabic dialect in terms of number of speakers, and a high quality ECA speech corpus with accurate phonetic transcription has been collected. MSA acoustic models were trained using news broadcast speech. In order to cross-lingually use MSA in dialectal Arabic speech recognition, the authors have normalized the phoneme sets for MSA and ECA. After this normalization, they have applied state-of-the-art acoustic model adaptation techniques like Maximum Likelihood Linear Regression (MLLR) and M...
Full Text Available The superlative quantifiers, at least and at most, are commonly assumed to have the same truth-conditions as the comparative quantifiers more than and fewer than. However, as Geurts & Nouwen (2007) have demonstrated, this is wrong, and several theories have been proposed to account for them. In this paper we propose that superlative quantifiers are illocutionary operators; specifically, they modify meta-speech acts. Meta-speech acts are operators that do not express a speech act, but a willingness to make or refrain from making a certain speech act. The classic example is speech act denegation, e.g. I don't promise to come, where the speaker is explicitly refraining from performing the speech act of promising. What denegations do is delimit the future development of the conversation; that is, they delimit future admissible speech acts. Hence we call them meta-speech acts. They are not moves in a game, but rather commitments to behave in certain ways in the future. We formalize the notion of meta-speech acts as commitment development spaces, which are rooted graphs: the root of the graph describes the commitment development up to the current point in conversation; the continuations from the root describe the admissible future directions. We define and formalize the meta-speech act GRANT, which indicates that the speaker, while not necessarily subscribing to a proposition, refrains from asserting its negation. We propose that superlative quantifiers are quantifiers over GRANTs. Thus, Mary petted at least three rabbits means that the minimal number n such that the speaker GRANTs that Mary petted n rabbits is n = 3. In other words, the speaker denies that Mary petted two, one, or no rabbits, but GRANTs that she petted more. We formalize this interpretation of superlative quantifiers in terms of commitment development spaces, and show how the truth conditions that are derived from it are partly entailed and partly conversationally...
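The GRANT-based reading of "at least" described above can be made concrete in a few lines. The sketch below is a toy model only; the set representation and function names are illustrative, not the paper's formalization in terms of commitment development spaces:

```python
# Toy model of the GRANT analysis of superlative quantifiers.
# A speaker's commitments are represented as the set of counts n for
# which the speaker GRANTs "Mary petted n rabbits"; everything below
# the minimal GRANTed count is denied.

def at_least_value(granted_counts):
    """'At least n' holds where n is the minimal GRANTed count."""
    return min(granted_counts)

def denied_counts(granted_counts):
    """Counts below the minimal GRANT are denied outright."""
    return set(range(min(granted_counts)))

# "Mary petted at least three rabbits": the speaker GRANTs 3, 4, 5, ...
granted = {3, 4, 5, 6}
assert at_least_value(granted) == 3          # minimal n with a GRANT
assert denied_counts(granted) == {0, 1, 2}   # two, one, or no rabbits denied
```

On this view the quantifier does not assert an exact count: it delimits which future assertions the speaker is committed to refraining from.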
... to being completely unable to speak or understand speech. Causes include Hearing disorders and deafness Voice problems, ... or those caused by cleft lip or palate Speech problems like stuttering Developmental disabilities Learning disorders Autism ...
Błeszyński, Jacek Jarosław
Speech of people with autism is recognised as one of the basic diagnostic, therapeutic and theoretical problems. One of the most common symptoms of autism in children is echolalia, described here as being of different types and severity. This paper presents the results of studies into different levels of echolalia, both in normally developing children and in children diagnosed with autism, discusses the differences between simple echolalia and echolalic speech - which can be considered to b...
Shriberg, Lawrence D.; Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.
Purpose: The goal of this article is to introduce the pause marker (PM), a single-sign diagnostic marker proposed to discriminate early or persistent childhood apraxia of speech (CAS) from speech delay.
In order to obtain an articulatory analysis of speech production, the model is improved. The standard model, as used in LPC analysis, to a large extent only models the acoustic properties of the speech signal, as opposed to articulatory modelling of speech production. In spite of this, the LPC model...... is by far the most widely used model in speech technology....
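As a concrete illustration of the standard acoustic (non-articulatory) modelling the entry refers to, a minimal LPC estimate via the autocorrelation method and the Levinson-Durbin recursion might look as follows. This is a textbook sketch, not the improved articulatory model the abstract describes:

```python
import numpy as np

def lpc(x, order):
    """Estimate LPC coefficients a[0..order] (with a[0] = 1) by the
    autocorrelation method using the Levinson-Durbin recursion."""
    n = len(x)
    # Autocorrelation for lags 0..order
    r = np.array([x[: n - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]  # correlation of residual
        k = -acc / err                        # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k                    # updated prediction error
    return a, err

# Sanity check on a synthetic AR(2) signal:
# x[t] = 0.5 x[t-1] - 0.3 x[t-2] + e[t], so a should approach [1, -0.5, 0.3]
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for t in range(2, len(x)):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + e[t]
a, _ = lpc(x, 2)
```

The recovered coefficients model only the spectral envelope of the signal, which is exactly the limitation the entry contrasts with articulatory modelling.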
Baese-Berk, Melissa M; Dilley, Laura C; Schmidt, Stephanie; Morrill, Tuuli H; Pitt, Mark A
Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said one small step for a man, instead of one small step for man. What he said is unclear in part because function words like a can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of for and for a in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of for and for a. The results suggest that the distributions of for and for a overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word a varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate.
Shin, Yu-Jeong; Ko, Seung-O
Velopharyngeal dysfunction in cleft palate patients following primary palate repair may result in nasal air emission, hypernasality, articulation disorder, and poor intelligibility of speech. Among conservative treatment methods, a speech aid prosthesis combined with speech therapy is widely used. However, because treatment takes a long time (more than a year) and predictability is low, some clinicians prefer surgical intervention. The purpose of this report is therefore to draw attention to the effectiveness of speech aid prostheses by presenting a successfully treated case. In this clinical report, a speech bulb reduction program with intensive speech therapy was applied to a patient with velopharyngeal dysfunction, and treatment was completed in 5 months, an unusually short period for speech aid therapy. Furthermore, the advantages of pre-operative speech aid therapy are discussed.
Payer, Michael; Agosti, Reto
Spontaneous idiopathic acute spinal subdural hematomas (SSDH) are exceedingly rare. Neurological symptoms are usually severe, and rapid diagnosis with MRI is mandatory. Surgical evacuation has frequently been used therapeutically; however, spontaneous recovery in mild cases has also been reported. We present a case of spontaneous recovery from severe paraparesis after a spontaneous acute SSDH, and review the English-language literature.
Jørgensen, Søren; Cubick, Jens; Dau, Torsten
In the development process of modern telecommunication systems, such as mobile phones, it is common practice to use computer models to objectively evaluate the transmission quality of the system, instead of time-consuming perceptual listening tests. Such models have typically focused on the quality...... of the transmitted speech, while little or no attention has been paid to speech intelligibility. The present study investigated to what extent three state-of-the-art speech intelligibility models could predict the intelligibility of noisy speech transmitted through mobile phones. Sentences from the Danish...... Dantale II speech material were mixed with three different kinds of background noise, transmitted through three different mobile phones, and recorded at the receiver via a local network simulator. The speech intelligibility of the transmitted sentences was assessed by six normal-hearing listeners...
Galilee, Alena; Stefanidou, Chrysi; McCleery, Joseph P
Previous event-related potential (ERP) research utilizing oddball stimulus paradigms suggests diminished processing of speech versus non-speech sounds in children with an Autism Spectrum Disorder (ASD). However, brain mechanisms underlying these speech processing abnormalities, and to what extent they are related to poor language abilities in this population remain unknown. In the current study, we utilized a novel paired repetition paradigm in order to investigate ERP responses associated with the detection and discrimination of speech and non-speech sounds in 4- to 6-year old children with ASD, compared with gender and verbal age matched controls. ERPs were recorded while children passively listened to pairs of stimuli that were either both speech sounds, both non-speech sounds, speech followed by non-speech, or non-speech followed by speech. Control participants exhibited N330 match/mismatch responses measured from temporal electrodes, reflecting speech versus non-speech detection, bilaterally, whereas children with ASD exhibited this effect only over temporal electrodes in the left hemisphere. Furthermore, while the control groups exhibited match/mismatch effects at approximately 600 ms (central N600, temporal P600) when a non-speech sound was followed by a speech sound, these effects were absent in the ASD group. These findings suggest that children with ASD fail to activate right hemisphere mechanisms, likely associated with social or emotional aspects of speech detection, when distinguishing non-speech from speech stimuli. Together, these results demonstrate the presence of atypical speech versus non-speech processing in children with ASD when compared with typically developing children matched on verbal age.
Chung, Tae Sub; Suh, Jung Ho; Kim, Dong Ik; Kim, Gwi Eon; Hong, Won Phy; Lee, Won Sang
A total laryngectomee requires some form of alaryngeal speech for communication. Generally, esophageal speech is regarded as the most available and comfortable technique for alaryngeal speech. But esophageal speech is difficult to train, so many patients are unable to attain it for communication. To understand the mechanism of esophageal speech in total laryngectomees, evaluation of anatomical changes in the pharyngoesophageal segment is very important. We used video fluoroscopy to evaluate the pharyngoesophageal segment during esophageal speech. Eighteen total laryngectomees were evaluated with video fluoroscopy from Dec. 1986 to May 1987 at Y.U.M.C. Our results were as follows: 1. The pseudoglottis is the most important factor for esophageal speech; it was visualized in 7 of the 8 cases in the excellent esophageal speech group. 2. The two cases with a longer A-P diameter at the pseudoglottis had the best quality of esophageal speech. 3. Mucosal vibration at the pharyngoesophageal segment produced excellent esophageal speech in two cases. 4. The causes of failed esophageal speech were poor aerophagia (6 cases), absence of a pseudoglottis (4 cases), and poor air ejection (3 cases). 5. Aerophagia synchronized with diaphragmatic motion in the 8 cases of excellent esophageal speech.
Full Text Available In patients suffering from head and neck cancer, speech intelligibility is often restricted. For assessment and outcome measurement, automatic speech recognition systems have previously been shown to be appropriate for objective and quick evaluation of intelligibility. In this study we investigate the applicability of the method to speech disorders caused by head and neck cancer. Intelligibility was quantified by running speech recognition on recordings of a standard text read by 41 German laryngectomized patients with cancer of the larynx or hypopharynx and by 49 German patients who had suffered from oral cancer. The speech recognizer provides the percentage of correctly recognized words of a sequence, that is, the word recognition rate. Automatic evaluation was compared to perceptual ratings by a panel of experts and to an age-matched control group. Both patient groups showed significantly lower word recognition rates than the control group. Automatic speech recognition yielded word recognition rates that agreed with the experts' evaluation of intelligibility at a significant level. Automatic speech recognition thus serves as a low-effort means to objectify and quantify the most important aspect of pathologic speech: intelligibility. The system was successfully applied to voice and speech disorders.
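The word recognition rate this study relies on comes from aligning the recognizer output with the reference text. The sketch below uses a simple longest-common-subsequence alignment as a stand-in for a full ASR scoring tool; the function name and example sentences are made up for illustration:

```python
def word_recognition_rate(reference, hypothesis):
    """Percentage of reference words recovered by the recognizer,
    counted via a longest-common-subsequence word alignment."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    m, n = len(ref), len(hyp)
    # lcs[i][j]: matched words between ref[:i] and hyp[:j]
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            lcs[i + 1][j + 1] = (lcs[i][j] + 1 if ref[i] == hyp[j]
                                 else max(lcs[i][j + 1], lcs[i + 1][j]))
    return 100.0 * lcs[m][n] / m

# 4 of the 6 reference words are recognized -> about 66.7%
wrr = word_recognition_rate("the patient reads a standard text",
                            "the patient weeds standard text")
```

Production scoring tools additionally distinguish substitutions, deletions, and insertions, but the percentage-of-correct-words figure the abstract describes reduces to an alignment count like this one.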
This PhD thesis in human-computer interfaces (informatics) studies the case of the anaesthesia record used during medical operations and the possibility to supplement it with speech recognition facilities. Problems and limitations have been identified with the traditional paper-based anaesthesia...... and inaccuracies in the anaesthesia record. Supplementing the electronic anaesthesia record interface with speech input facilities is proposed as one possible solution to a part of the problem. The testing of the various hypotheses has involved the development of a prototype of an electronic anaesthesia record...... interface with speech input facilities in Danish. The evaluation of the new interface was carried out in a full-scale anaesthesia simulator. This has been complemented by laboratory experiments on several aspects of speech recognition for this type of use, e.g. the effects of noise on speech recognition...
...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities... for telecommunications relay services (TRS) by eliminating standards for Internet-based relay services... comments, identified by CG Docket No. 03-123, by any of the following methods: Electronic Filers: Comments...
Conversational prosody or tone of voice (e.g. intonation, pauses, speech rate, etc.) plays an essential role in our daily communication. Research studies in various contexts have shown that prosody can function as an interactional device for the management of our social interaction (Hellermann, 2003; Wennerstrom, 2001; Wells and Macfarlane, 1998; Couper-Kuhlen, 1996). However, little research attention has been paid to the pedagogical implications of conversational prosody in classroom teaching...
A. A. Karpov
Full Text Available We present an analytical survey of current tasks in the area of computational paralinguistics, as well as recent achievements of automatic systems for paralinguistic analysis of conversational speech. Paralinguistics studies non-verbal aspects of human communication and speech, such as natural emotions, accents, psycho-physiological states, pronunciation features, speaker's voice parameters, etc. We describe the architecture of a baseline computer system for acoustical paralinguistic analysis, its main components, and useful speech processing methods. We present some information on an international contest called the Computational Paralinguistics Challenge (ComParE), which has been held each year since 2009 in the framework of the INTERSPEECH conference organized by the International Speech Communication Association. We present the sub-challenges (tasks) that were proposed at the ComParE Challenges in 2009-2016, and analyze the winning computer systems and the obtained results for each sub-challenge. The last completed challenge, ComParE-2015, was organized in September 2015 in Germany and proposed 3 sub-challenges: (1) the Degree of Nativeness (DN) sub-challenge, determination of the nativeness degree of speakers based on acoustics; (2) the Parkinson's Condition (PC) sub-challenge, recognition of the degree of Parkinson's condition based on speech analysis; and (3) the Eating Condition (EC) sub-challenge, determination of the eating condition state during speaking or a dialogue, and classification of the consumed food type (one of seven classes of food) by the speaker. In the last sub-challenge (EC), the winner was a joint Turkish-Russian team consisting of the authors of the given paper. We have developed the most efficient computer-based system for detection and classification of the corresponding (EC) acoustical paralinguistic events. The paper deals with the architecture of this system, its main modules and methods, as well as the description of the training and evaluation...
Sandor, A.; Moses, H. R.
Currently on the International Space Station (ISS) and other space vehicles, Caution & Warning (C&W) alerts are represented with various auditory tones that correspond to the type of event. This system relies on the crew's ability to remember what each tone represents in a high-stress, high-workload environment when responding to the alert. Furthermore, crew receive this training a year or more in advance of the mission, which makes remembering the semantic meaning of the alerts more difficult. The current system works for missions conducted close to Earth, where ground operators can assist as needed. On long-duration missions, however, crews will need to handle off-nominal events autonomously. There is evidence that speech alarms may be easier and faster to recognize, especially during an off-nominal event. The Information Presentation Directed Research Project (FY07-FY09), funded by the Human Research Program, included several studies investigating C&W alerts. The studies evaluated tone alerts currently in use with NASA flight deck displays along with candidate speech alerts. A follow-on study used four types of speech alerts to investigate how quickly various types of auditory alerts with and without a speech component - either at the beginning or at the end of the tone - can be identified. Even though crew were familiar with the tone alerts from training or direct mission experience, alerts starting with a speech component were identified faster than alerts starting with a tone. The current study replicated the results from the previous study in a more rigorous experimental design to determine whether the candidate speech alarms are ready for transition to operations or whether more research is needed. Four types of alarms (caution, warning, fire, and depressurization) were presented to participants in both tone and speech formats in laboratory settings and later in the Human Exploration Research Analog (HERA). In the laboratory study, the alerts were presented by software and participants were...
Schall, Sonja; von Kriegstein, Katharina
It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.
Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars
Speech is both beautiful and informative. In this work, a conceptual study of speech, through investigation of the Tower of Babel, the archetypal phonemes, and a study of the reasons for the use of language, is undertaken in order to create an artistic work investigating the nature of speech. ...... The artwork is presented at the Re:New festival in May 2008....
Theys, Catherine; van Wieringen, Astrid; De Nil, Luc F.
This study presents survey data on 58 Dutch-speaking patients with neurogenic stuttering following various neurological injuries. Stroke was the most prevalent cause of stuttering in our patients, followed by traumatic brain injury, neurodegenerative diseases, and other causes. Speech and non-speech characteristics were analyzed separately for…
Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders provides a survey of methods designed to aid clinicians in the diagnosis and monitoring of speech disorders such as dysarthria and dyspraxia, with an emphasis on the signal processing techniques, statistical validity of the results presented in the literature, and the appropriateness of methods that do not require specialized equipment, rigorously controlled recording procedures or highly skilled personnel to interpret results. Such techniques offer the promise of a simple and cost-effective, yet objective, assessment of a range of medical conditions, which would be of great value to clinicians. The ideal scenario would begin with the collection of examples of the clients’ speech, either over the phone or using portable recording devices operated by non-specialist nursing staff. The recordings could then be analyzed initially to aid diagnosis of conditions, and subsequently to monitor the clients’ progress and res...
Ding, Nai; Patel, Aniruddh D; Chen, Lin; Butler, Henry; Luo, Cheng; Poeppel, David
Speech and music have structured rhythms. Here we discuss a major acoustic correlate of spoken and musical rhythms, the slow (0.25-32 Hz) temporal modulations in sound intensity, and compare the modulation properties of speech and music. We analyze these modulations using over 25 h of speech and over 39 h of recordings of Western music. We show that the speech modulation spectrum is highly consistent across 9 languages (including languages with typologically different rhythmic characteristics). A different, but similarly consistent, modulation spectrum is observed for music, including classical music played by single instruments of different types, symphonic, jazz, and rock. The temporal modulations of speech and music show broad but well-separated peaks around 5 Hz and 2 Hz, respectively. These acoustically dominant time scales may be intrinsic features of speech and music, a possibility which should be investigated using more culturally diverse samples in each domain. Distinct modulation timescales for speech and music could facilitate their perceptual analysis and neural processing.
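The slow-modulation analysis described here can be sketched in its simplest form: extract the broadband amplitude envelope, downsample it, and take its spectrum. This is a minimal illustration under those assumptions, not the paper's full pipeline (which uses auditory-style filterbanks and hours of recordings):

```python
import numpy as np
from scipy.signal import hilbert

def modulation_spectrum(x, fs, env_fs=64):
    """Spectrum of slow intensity modulations (up to env_fs/2 Hz):
    Hilbert envelope -> downsample -> windowed FFT."""
    env = np.abs(hilbert(x))          # broadband amplitude envelope
    step = int(fs // env_fs)          # downsample to ~env_fs Hz
    env = env[::step]
    env = env - env.mean()            # remove DC before the FFT
    spec = np.abs(np.fft.rfft(env * np.hanning(len(env))))
    freqs = np.fft.rfftfreq(len(env), d=step / fs)
    return freqs, spec

# A 500 Hz carrier amplitude-modulated at 4 Hz (a speech-like rate):
fs = 16000
t = np.arange(4 * fs) / fs
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
freqs, spec = modulation_spectrum(x, fs)
# the modulation spectrum should peak near 4 Hz
```

With `env_fs=64`, the analysis covers modulation frequencies up to 32 Hz, matching the 0.25-32 Hz range the abstract discusses.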
Christiner, Markus; Reiterer, Susanne M.
In previous research on speech imitation, musicality, and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer's sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and auditory memory with singing fitting better into the category of “speech” on the productive level and “music” on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. (1) Motor flexibility and the ability to sing improve language and musical function. (2) Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. (3) The ability to sing improves the memory span of the auditory working memory. PMID:24319438
Shriberg, Lawrence D.; Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.
Purpose: Previous articles in this supplement described rationale for and development of the pause marker (PM), a diagnostic marker of childhood apraxia of speech (CAS), and studies supporting its validity and reliability. The present article assesses the theoretical coherence of the PM with speech processing deficits in CAS. Method: PM and other…
Lockton, Elaine; Adams, Catherine; Collins, Anna
Background: Children who have social communication disorder (CwSCD) demonstrate persistent difficulties with language pragmatics in conversations and other verbal interactions. Speech-language interventions for these children often include promotion of metapragmatic awareness (MPA); that is, the ability to identify explicitly and reflect upon…
Moharir, Madhavi; Barnett, Noel; Taras, Jillian; Cole, Martha; Ford-Jones, E Lee; Levin, Leo
Failure to recognize and intervene early in speech and language delays can lead to multifaceted and potentially severe consequences for early child development and later literacy skills. While routine evaluation of speech and language during well-child visits is recommended, there is no standardized (office) approach to facilitate this. Furthermore, extensive wait times for speech and language pathology consultation represent valuable lost time for the child and family. Using speech and language expertise and paediatric collaboration, key content for an office-based tool was developed. The tool's aims are: early and accurate identification of speech and language delays, as well as of children at risk for literacy challenges; appropriate referral to speech and language services when required; and teaching, and thus empowering, parents to create rich and responsive language environments at home. Using this tool, in combination with the Canadian Paediatric Society's Read, Speak, Sing and Grow Literacy Initiative, physicians will be better positioned to offer practical strategies to caregivers for enhancing children's speech and language capabilities. The tool represents a strategy to evaluate speech and language delays: it depicts age-specific linguistic/phonetic milestones, suggests interventions, and provides a practical interim approach while the family is waiting for formal speech and language therapy consultation.
Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading. © 2015 American Society of Law, Medicine & Ethics, Inc.
Bruce, Carolyn; Braidwood, Ursula; Newton, Caroline
Evidence shows that speakers adjust their speech depending on the demands of the listener. However, it is unclear whether people with acquired communication disorders can and do make similar adaptations. This study investigated the impact of different conversational settings on the intelligibility of a speaker with acquired communication difficulties. Twenty-eight assessors listened to recordings of the speaker reading aloud 40 words and 32 sentences to a listener who was either face-to-face or unseen. The speaker's ability to convey information was measured by the accuracy of assessors' orthographic transcriptions of the words and sentences. Assessors' scores were significantly higher in the unseen condition for the single-word task, particularly if they had heard the face-to-face condition first. Scores for the sentence task were significantly higher in the second presentation regardless of the condition. The results of this study suggest that therapy conducted in situations where the client is not able to see their conversation partner may encourage them to perform at a higher level and increase the clarity of their speech. Readers will be able to describe: (1) the range of conversational adjustments made by speakers without communication difficulties; (2) differences between these tasks in offering contextual information to the listener; and (3) the potential for using challenging communicative situations to improve the performance of adults with communication disorders. Copyright © 2013 Elsevier Inc. All rights reserved.
Benesty, Jacob; Chen, Jingdong
We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red…
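One of the classic single-microphone noise-reduction techniques surveyed in books of this kind is spectral subtraction. The sketch below is a minimal, illustrative implementation (not the book's own method): it assumes the leading frames of the signal contain noise only, estimates the noise magnitude spectrum from them, subtracts that estimate frame by frame, and reconstructs the signal by overlap-add with the noisy phase.

```python
import numpy as np

def spectral_subtraction(signal, frame_len=256, noise_frames=10, floor=0.01):
    """Minimal spectral-subtraction sketch; assumes the first
    `noise_frames` frames of `signal` are noise-only."""
    hop = frame_len // 2
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    out = np.zeros(len(signal))

    # Estimate the noise magnitude spectrum from the leading frames
    noise_mag = np.zeros(frame_len)
    for i in range(noise_frames):
        frame = signal[i * hop:i * hop + frame_len] * window
        noise_mag += np.abs(np.fft.fft(frame))
    noise_mag /= noise_frames

    # Subtract the noise estimate from each frame's magnitude spectrum,
    # keep the noisy phase, and overlap-add the enhanced frames
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_len] * window
        spec = np.fft.fft(frame)
        mag = np.abs(spec) - noise_mag
        mag = np.maximum(mag, floor * noise_mag)  # spectral floor
        clean = np.real(np.fft.ifft(mag * np.exp(1j * np.angle(spec))))
        out[i * hop:i * hop + frame_len] += clean * window
    return out
```

Applied to a signal whose opening segment is noise-only, the function returns an enhanced signal of the same length; the spectral floor keeps residual "musical noise" from over-subtraction in check.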
The authors analyse the effect of speech rate variation on Afrikaans phone stability from an acoustic perspective. Specifically, they introduce two techniques for the acoustic analysis of speech rate variation, apply these techniques to an Afrikaans...
Hitch, Graham J.; And Others
Reports on experiments to determine effects of overt speech on children's use of inner speech in short-term memory. Word length and phonemic similarity had greater effects on older children and when pictures were labeled at presentation. Suggests that speaking or listening to speech activates an internal articulatory loop. (Author/GH)