WorldWideScience

Sample records for tasks tseltal speakers

  1. y la cultura tseltal

    Directory of Open Access Journals (Sweden)

    Antonio Paoli

    2006-01-01

    Full Text Available A general overview of the Mayan languages is presented; their forms of variation are briefly exemplified, and a methodological suggestion is made for gradually approaching an understanding of the Tseltal language and culture. In five short phrases, suggested as a first step into this beautiful language, one can already perceive the striking contrast between the worldviews encoded in Tseltal and in Spanish. The article introduces some key grammatical structures of Tseltal that are common to the great majority of Mayan languages. The second half of the article approaches the notions translated as “education”, “autonomy”, and “identity” in the Tseltal world, through a sociolinguistic analysis and a humanistic approach to this indigenous people of southeastern Mexico.

  2. A Nonword Repetition Task for Speakers with Misarticulations: The Syllable Repetition Task (SRT)

    Science.gov (United States)

    Shriberg, Lawrence D.; Lohmeier, Heather L.; Campbell, Thomas F.; Dollaghan, Christine A.; Green, Jordan R.; Moore, Christopher A.

    2009-01-01

    Purpose: Conceptual and methodological confounds occur when non(sense) word repetition tasks are administered to speakers who do not have the target speech sounds in their phonetic inventories or who habitually misarticulate targeted speech sounds. In this article, the authors (a) describe a nonword repetition task, the Syllable Repetition Task…

  3. Elementos de la praxis y del corpus del conocimiento etnoecológico tseltal en comunidades de la Sierra Norte de Chiapas

    Directory of Open Access Journals (Sweden)

    José Ramón Rodríguez Moreno

    2014-01-01

    Full Text Available Strategies for using and managing ecosystems, such as those practiced by the Tseltal people in the Sierra Norte of Chiapas, have persistently been described from a “scientific” Western perspective as low-productivity systems that squander natural resources. This study reviews aspects of the cosmos, corpus, and praxis of the knowledge held in Tseltal communities, taking as a guiding thread a specific crop of the Mexican countryside, maize, together with other expressions of local knowledge, which make it possible to dismantle the marginalization that the Western view has imposed on the local knowledge system.

  4. Ecoturismo y reapropiación social de recursos naturales entre los tseltales de El Corralito, Oxchuc, Chiapas

    Directory of Open Access Journals (Sweden)

    Julio César Sánchez Morales

    2012-01-01

    Full Text Available The aim of this article is to analyze the process of social reappropriation of natural resources among the Tseltal people of the community of El Corralito, municipality of Oxchuc, Chiapas, following the implementation of an ecotourism project on their lands since 2002. This process of social reappropriation has encompassed social, cultural, and economic elements at the level of the Tseltal group. To understand these processes, we examined the following dimensions: the presence of innovation and experimentation in organizational, economic, and environmental matters, as well as local participation and the capacity for agency in the development of the ecotourism project in the area. The work relied on qualitative methodological tools: in-depth interviews, ethnographies, participant observation, and surveys, the latter yielding quantitative data.

  5. Speaker Recognition

    DEFF Research Database (Denmark)

    Mølgaard, Lasse Lohilahti; Jørgensen, Kasper Winther

    2005-01-01

    Speaker recognition is basically divided into speaker identification and speaker verification. Verification is the task of automatically determining if a person really is the person he or she claims to be. This technology can be used as a biometric feature for verifying the identity of a person...
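
    The identification/verification split described above reduces to two simple decision rules over a trial score. The sketch below illustrates this with cosine scoring over fixed-length speaker embeddings; the embedding extractor, enrolled set, and threshold are illustrative assumptions, not part of the cited work.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two fixed-length speaker embeddings.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(test_emb, claimed_emb, threshold=0.7):
    # Verification: accept the claimed identity if the score clears a threshold.
    return cosine(test_emb, claimed_emb) >= threshold

def identify(test_emb, enrolled):
    # Identification: pick the enrolled speaker with the highest score.
    scores = {name: cosine(test_emb, emb) for name, emb in enrolled.items()}
    return max(scores, key=scores.get), scores

# Toy usage with random vectors standing in for a real embedding extractor.
rng = np.random.default_rng(0)
enrolled = {name: rng.normal(size=64) for name in ["alice", "bob"]}
probe = enrolled["alice"] + 0.1 * rng.normal(size=64)
print(identify(probe, enrolled)[0])    # -> "alice"
print(verify(probe, enrolled["bob"]))  # -> almost certainly False
```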

  6. Autonomía, socialización y comunidad tseltal.

    Directory of Open Access Journals (Sweden)

    Antonio Paoli

    2002-01-01

    Full Text Available Autonomy among the Tseltal people in the state of Chiapas, Mexico, is characterized, and some fundamental patterns of traditional socialization and social integration are presented. The article presents, as hypotheses, some tendencies concerning the breakdown of ancestral forms of integration and solidarity. We approach their language, their social organization, and their political history. These three dimensions are complementary, since we aim to place the reader in a broad perspective from which to understand socialization, and thereby social integration and some of its major ruptures. We attempt to present an introduction to their autonomy, chiefly the small community and the Indian comarca or community of communities, while also addressing the family sphere and the Indian pueblo.

  7. Aprendizaje tseltal: construir conocimientos con la alegría del corazón

    Directory of Open Access Journals (Sweden)

    Jorge Urdapilleta Carrasco

    2016-07-01

    Full Text Available With the goal of an intercultural education based on understanding and including the values of Tseltal culture, the main elements identified in different everyday-life situations are described and then analyzed through interpretive educational ethnography in two cases: the family setting and a training space. The aim is to examine the influence of these values on the social construction of knowledge; the study concludes that it is the “laughter of the heart” and practical application that most strongly foster meaningful learning.

  8. Lo etnojuvenil. Un análisis sobre el cambio sociocultural entre tsotsiles, tseltales y choles

    Directory of Open Access Journals (Sweden)

    Tania Cruz Salazar

    2017-01-01

    Full Text Available I study how indigenous youth is being reconfigured as a new generational ethnicity, in a way that moves away from communal “custom”. My reflections are based on accounts that illustrate the ethnic condition of young people in a new era, with a vision different from that of their parents and grandparents, and with pursuits, values, emotions, and expectations that point in new directions. I used the ethnographic method and conducted interviews, informal conversations, participant observation, and bibliographic and newspaper review over seven years in Tsotsil, Tseltal, and Chol communities. This material revealed the cross-cutting content of the ethno-youthful, the new analytical category I propose.

  9. Utterance Verification for Text-Dependent Speaker Recognition

    DEFF Research Database (Denmark)

    Kinnunen, Tomi; Sahidullah, Md; Kukanov, Ivan

    2016-01-01

    Text-dependent automatic speaker verification naturally calls for the simultaneous verification of speaker identity and spoken content. These two tasks can be achieved with automatic speaker verification (ASV) and utterance verification (UV) technologies. While both have been addressed previously...

  10. The neural correlates of agrammatism: Evidence from aphasic and healthy speakers performing an overt picture description task

    Directory of Open Access Journals (Sweden)

    Eva Schönberger

    2014-03-01

    Full Text Available Functional brain imaging studies have improved our knowledge of the neural localization of language functions and the functional recovery after a lesion. However, the neural correlates of agrammatic symptoms in aphasia remain largely unknown. The present fMRI study examined the neural correlates of morpho-syntactic encoding and agrammatic errors in continuous language production by combining three approaches. First, the neural mechanisms underlying natural morpho-syntactic processing in a picture description task were analyzed in 15 healthy speakers. Second, agrammatic-like speech behavior was induced in the same group of healthy speakers to study the underlying functional processes by limiting the utterance length. In a third approach, five agrammatic participants performed the picture description task to gain insights in the neural correlates of agrammatism and the functional reorganization of language processing after stroke. In all approaches, utterances were analyzed for syntactic completeness, complexity and morphology. Event-related data analysis was conducted by defining every clause-like unit (CLU as an event with its onset-time and duration. Agrammatic and correct CLUs were contrasted. Due to the small sample size as well as heterogeneous lesion sizes and sites with lesion foci in the insula lobe, inferior frontal, superior temporal and inferior parietal areas the activation patterns in the agrammatic speakers were analyzed on a single subject level. In the group of healthy speakers, posterior temporal and inferior parietal areas were associated with greater morpho-syntactic demands in complete and complex CLUs. The intentional manipulation of morpho-syntactic structures and the omission of function words were associated with additional inferior frontal activation. Overall, the results revealed that the investigation of the neural correlates of agrammatic language production can be reasonably conducted with an overt language production

  11. The neural correlates of agrammatism: Evidence from aphasic and healthy speakers performing an overt picture description task.

    Science.gov (United States)

    Schönberger, Eva; Heim, Stefan; Meffert, Elisabeth; Pieperhoff, Peter; da Costa Avelar, Patricia; Huber, Walter; Binkofski, Ferdinand; Grande, Marion

    2014-01-01

    Functional brain imaging studies have improved our knowledge of the neural localization of language functions and the functional reorganization after a lesion. However, the neural correlates of agrammatic symptoms in aphasia remain largely unknown. The present fMRI study examined the neural correlates of morpho-syntactic encoding and agrammatic errors in continuous language production by combining three approaches. First, the neural mechanisms underlying natural morpho-syntactic processing in a picture description task were analyzed in 15 healthy speakers. Second, agrammatic-like speech behavior was induced in the same group of healthy speakers to study the underlying functional processes by limiting the utterance length. In a third approach, five agrammatic participants performed the picture description task to gain insights in the neural correlates of agrammatism and the functional reorganization of language processing after stroke. In all approaches, utterances were analyzed for syntactic completeness, complexity, and morphology. Event-related data analysis was conducted by defining every clause-like unit (CLU) as an event with its onset-time and duration. Agrammatic and correct CLUs were contrasted. Due to the small sample size as well as heterogeneous lesion sizes and sites with lesion foci in the insula lobe, inferior frontal, superior temporal and inferior parietal areas the activation patterns in the agrammatic speakers were analyzed on a single subject level. In the group of healthy speakers, posterior temporal and inferior parietal areas were associated with greater morpho-syntactic demands in complete and complex CLUs. The intentional manipulation of morpho-syntactic structures and the omission of function words were associated with additional inferior frontal activation. Overall, the results revealed that the investigation of the neural correlates of agrammatic language production can be reasonably conducted with an overt language production paradigm.

  12. An introduction to application-independent evaluation of speaker recognition systems

    NARCIS (Netherlands)

    Leeuwen, D.A. van; Brümmer, N.

    2007-01-01

    In the evaluation of speaker recognition systems - an important part of speaker classification [1] - the trade-off between missed speakers and false alarms has always been an important diagnostic tool. NIST has defined the task of speaker detection with the associated Detection Cost Function (DCF) to
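
    The detection cost function mentioned here weights miss and false-alarm rates by application-dependent costs and a target prior. A minimal sketch of that standard formula follows; the parameter values are the classic NIST SRE defaults, used only as an example.

```python
def detection_cost(p_miss, p_fa, p_target=0.01, c_miss=10.0, c_fa=1.0):
    """NIST-style detection cost: cost-weighted sum of miss and false-alarm rates."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

# Example: a system missing 10% of targets and falsely accepting 2% of impostors.
print(detection_cost(p_miss=0.10, p_fa=0.02))  # 0.0298
```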

  13. Physiological responses at short distances from a parametric speaker

    Directory of Open Access Journals (Sweden)

    Lee Soomin

    2012-06-01

    Full Text Available In recent years, parametric speakers have been used in various circumstances. In our previous studies, we verified that the physiological burden of the sound of a parametric speaker set at 2.6 m from the subjects was lower than that of a general speaker. However, nothing had yet been demonstrated about the effects of the sound of a parametric speaker at shorter distances between the parametric speaker and the human body. We therefore studied this effect on physiological functions and task performance. Nine male subjects participated in this study. They completed three consecutive sessions: a 20-minute quiet period as a baseline, a 30-minute mental task period with general speakers or parametric speakers, and a 20-minute recovery period. We measured the electrocardiogram (ECG), photoplethysmogram (PTG), electroencephalogram (EEG), and systolic and diastolic blood pressure. Four experiments, crossing a speaker condition (general speaker vs. parametric speaker) with a distance condition (0.3 m and 1.0 m), were conducted at the same time of day on separate days. To examine the effects of speaker and distance, three-way repeated-measures ANOVAs (speaker factor x distance factor x time factor) were conducted. In conclusion, the physiological responses did not differ significantly between the speaker conditions or between the distance conditions. Meanwhile, the physiological burden increased with time independently of speaker condition and distance condition. In summary, the effects of the parametric speaker observed at a distance of 2.6 m were not obtained at distances of 1 m or less.
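
    As a rough illustration of the three-way repeated-measures analysis described above (speaker x distance x time, all within-subject), here is a sketch using statsmodels. The synthetic data frame, the 'hr' outcome column, and the number of time blocks are assumptions for illustration, not the study's actual measurements.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic long-format data: one outcome value (e.g., mean heart rate)
# per subject x speaker type x distance x time block.
rng = np.random.default_rng(0)
rows = [
    {"subject": s, "speaker": sp, "distance": d, "time": t,
     "hr": 70 + 2 * t + rng.normal(scale=1.0)}
    for s in range(1, 10)                 # nine subjects, as in the study
    for sp in ("general", "parametric")
    for d in ("0.3m", "1.0m")
    for t in (1, 2, 3)                    # three time blocks (illustrative)
]
df = pd.DataFrame(rows)

# Three-way repeated-measures ANOVA: speaker x distance x time, within subjects.
res = AnovaRM(data=df, depvar="hr", subject="subject",
              within=["speaker", "distance", "time"]).fit()
print(res)
```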

  14. Understanding speaker attitudes from prosody by adults with Parkinson's disease.

    Science.gov (United States)

    Monetta, Laura; Cheang, Henry S; Pell, Marc D

    2008-09-01

    The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease (PD), with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical 'pseudo-utterances' were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice, and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than healthy control participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).

  15. Presenting and processing information in background noise: A combined speaker-listener perspective.

    Science.gov (United States)

    Bockstael, Annelies; Samyn, Laurie; Corthals, Paul; Botteldooren, Dick

    2018-01-01

    Transferring information orally in background noise is challenging, for both speaker and listener. Successful transfer depends on complex interaction between characteristics related to listener, speaker, task, background noise, and context. To fully assess the underlying real-life mechanisms, experimental design has to mimic this complex reality. In the current study, the effects of different types of background noise have been studied in an ecologically valid test design. Documentary-style information had to be presented by the speaker and simultaneously acquired by the listener in four conditions: quiet, unintelligible multitalker babble, fluctuating city street noise, and little varying highway noise. For both speaker and listener, the primary task was to focus on the content that had to be transferred. In addition, for the speakers, the occurrence of hesitation phenomena was assessed. The listener had to perform an additional secondary task to address listening effort. For the listener the condition with the most eventful background noise, i.e., fluctuating city street noise, appeared to be the most difficult with markedly longer duration of the secondary task. In the same fluctuating background noise, speech appeared to be less disfluent, suggesting a higher level of concentration from the speaker's side.

  16. Speaker diarization system on the 2007 NIST rich transcription meeting recognition evaluation

    Science.gov (United States)

    Sun, Hanwu; Nwe, Tin Lay; Koh, Eugene Chin Wei; Bin, Ma; Li, Haizhou

    2007-09-01

    This paper presents a speaker diarization system developed at the Institute for Infocomm Research (I2R) for the NIST Rich Transcription 2007 (RT-07) evaluation task. We describe in detail our primary approaches to speaker diarization under the Multiple Distant Microphone (MDM) conditions in the conference room scenario. Our proposed system consists of six modules: (1) a normalized least-mean-square (NLMS) adaptive filter for speaker direction estimation via Time Difference of Arrival (TDOA); (2) initial speaker clustering via a two-stage TDOA histogram quantization approach; (3) multiple-microphone speaker data alignment via GCC-PHAT Time Delay Estimation (TDE) across all distant microphone channel signals; (4) a speaker clustering algorithm based on GMM modelling; (5) non-speech removal via a speech/non-speech verification mechanism; and (6) silence removal via a "Double-Layer Windowing" (DLW) method. We achieve an error rate of 31.02% on the 2006 Spring (RT-06s) MDM evaluation task and a competitive overall error rate of 15.32% on the NIST Rich Transcription 2007 (RT-07) MDM evaluation task.
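
    Several of the modules above hinge on time-delay estimation between microphone pairs. The snippet below is a generic GCC-PHAT delay estimator in NumPy, a textbook formulation given here only to make the TDE step concrete; it is not the I2R system's code.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """GCC-PHAT time-offset estimate between x and y, in seconds
    (negative when y lags x under this sign convention)."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12          # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(fs)

# Toy check: y is x delayed by 5 samples, so the estimate is about -5 samples here.
fs = 16000
x = np.random.randn(1024)
y = np.concatenate((np.zeros(5), x))[:1024]
print(gcc_phat(x, y, fs) * fs)
```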

  17. Quantile Acoustic Vectors vs. MFCC Applied to Speaker Verification

    Directory of Open Access Journals (Sweden)

    Mayorga-Ortiz Pedro

    2014-02-01

    Full Text Available In this paper we describe speaker and command recognition experiments based on quantile vectors and Gaussian Mixture Modelling (GMM). Over the past several years, GMM and MFCC have become two of the dominant approaches for modelling speaker and speech recognition applications. However, memory and computational costs are important drawbacks, because autonomous systems suffer processing and power consumption constraints; thus, a good trade-off between accuracy and computational requirements is mandatory. We decided to explore another approach (quantile vectors) in several tasks and compared it with MFCC. Quantile acoustic vectors are proposed for speaker verification and command recognition tasks, and the results showed very good recognition efficiency. This method offered a good trade-off between computation time, feature vector complexity, and overall recognition efficiency.
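
    As a rough sketch of the quantile-vector idea contrasted with MFCC above, the code below summarizes each analysis frame by a handful of amplitude quantiles. The exact quantile construction used in the paper may differ, so treat the frame length, hop, and quantile levels as illustrative assumptions.

```python
import numpy as np

def quantile_vectors(signal, frame_len=400, hop=160,
                     qs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Per-frame quantile features: one vector of amplitude quantiles per frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        feats.append(np.quantile(np.abs(frame), qs))
    return np.array(feats)

# Toy usage on 1 second of noise at 16 kHz: about 98 frames x 5 quantiles.
x = np.random.randn(16000)
print(quantile_vectors(x).shape)
```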

  18. Tipología de productores de ganado bovino en la región indígena XIV Tulijá-tseltal-chol de Chiapas, México

    Directory of Open Access Journals (Sweden)

    Jorge Antonio Velázquez Avendaño

    2015-01-01

    Full Text Available In order to characterize the typology of producers in socio-economic region XIV Tulijá-Tseltal-Chol in the state of Chiapas, Mexico, a regional study of the production systems was carried out. Direct interviews were conducted with producers (317) from agricultural production units, visiting the municipalities of the region, which are characterized by ethnic groups of Mayan origin (Tseltal, Chol, and Tsotsil) together with a strong presence of mestizos and descendants of Europeans. Of 48 preselected variables, 11 were retained for multivariate analysis. The results show that four types of producers can be distinguished; they share productive activities, such as providing mineral salts and taking part in animal health campaigns, but differ in other respects, such as educational level. Another shared trait is the low level of technological development, which appears to reflect scant promotion of technologies that could improve productive capacity and which seems to be limited by the amount of available land. It is concluded that the described typology can be used as a classification for understanding and promoting regional production: resources allocated to support it should be accompanied by differentiated policies and by the generation of appropriate modern technology, truly applicable to the geo-ecological conditions of the region, together with promotion of its use that recognizes biodiversity and sustainable production as part of development strategies.

  19. The AMI speaker diarization system for NIST RT06s meeting data

    NARCIS (Netherlands)

    Leeuwen, D.A. van; Huijbregts, Marijn

    2006-01-01

    We describe the systems submitted to the NIST RT06s evaluation for the Speech Activity Detection (SAD) and Speaker Diarization (SPKR) tasks. For speech activity detection, a new analysis methodology is presented that generalizes the Detection Error Tradeoff analysis commonly used in speaker

  20. The AMI speaker diarization system for NIST RT06s meeting data

    NARCIS (Netherlands)

    van Leeuwen, David A.; Huijbregts, M.A.H.

    2007-01-01

    We describe the systems submitted to the NIST RT06s evaluation for the Speech Activity Detection (SAD) and Speaker Diarization (SPKR) tasks. For speech activity detection, a new analysis methodology is presented that generalizes the Detection Error Tradeoff analysis commonly used in speaker detection

  1. Cross-cultural differences in adult Theory of Mind abilities: A comparison of native-English speakers and native-Chinese speakers on the Self/Other Differentiation task.

    Science.gov (United States)

    Bradford, Elisabeth Ef; Jentzsch, Ines; Gomez, Juan-Carlos; Chen, Yulu; Zhang, Da; Su, Yanjie

    2018-02-01

    Theory of Mind (ToM) refers to the ability to compute and attribute mental states to ourselves and other people. It is currently unclear whether ToM abilities are universal or whether they can be culturally influenced. To address this question, this research explored potential differences in engagement of ToM processes between two different cultures, Western (individualist) and Chinese (collectivist), using a sample of healthy adults. Participants completed a computerised false-belief task, in which they attributed beliefs to either themselves or another person, in a matched design, allowing direct comparison between "Self"- and "Other"-oriented conditions. Results revealed that both native-English speakers and native-Chinese individuals responded significantly faster to self-oriented than other-oriented questions. Results also showed that when a trial required a "perspective-shift," participants from both cultures were slower to shift from Self-to-Other than from Other-to-Self. Results indicate that despite differences in collectivism scores, culture does not influence task performance, with similar results found for both Western and non-Western participants, suggesting core and potentially universal similarities in the ToM mechanism across these two cultures.

  2. Estrategia audiovisual de comunicación política en la Selva en Chiapas: la experiencia de los comunicadores tseltales Mariano Estrada y Arturo Pérez

    Directory of Open Access Journals (Sweden)

    Delmar Ulises Méndez Gómez

    2018-01-01

    Full Text Available In Mexico, the training of communicators who carry out their work in their communities and remain tied to the indigenous peoples to which they belong has been important in recent decades, because it has broadened and circulated the voices of those who have been rendered invisible and denied, and has enabled the exercise of their right to communication and information. This text examines the audiovisual work of the Tseltal communicators Mariano Estrada and Arturo Pérez, natives of the Selva region of Chiapas, their training process, the political meaning of their work, and their communication strategies.

  3. Recognition of speaker-dependent continuous speech with KEAL

    Science.gov (United States)

    Mercier, G.; Bigorgne, D.; Miclet, L.; Le Guennec, L.; Querre, M.

    1989-04-01

    A description of the speaker-dependent continuous speech recognition system KEAL is given. An unknown utterance is recognized by means of the following procedures: acoustic analysis, phonetic segmentation and identification, and word and sentence analysis. The combination of feature-based, speaker-independent coarse phonetic segmentation with speaker-dependent statistical classification techniques is one of the main design features of the acoustic-phonetic decoder. The lexical access component is essentially based on a statistical dynamic programming technique which aims at matching a phonemic lexical entry containing various phonological forms against a phonetic lattice. Sentence recognition is achieved by use of a context-free grammar and a parsing algorithm derived from Earley's parser. A speaker adaptation module allows some of the system parameters to be adjusted by matching known utterances with their acoustical representation. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. Continuously spoken sentences extracted from a 'pseudo-Logo' language are analyzed and results are presented.

  4. Speaker Linking and Applications using Non-Parametric Hashing Methods

    Science.gov (United States)

    2016-09-08

    Douglas Sturim and William M. Campbell, MIT Lincoln Laboratory, Lexington, MA. … with many approaches [1, 2]. For this paper, we focus on using i-vectors [2], but the methods apply to any embedding. For the task of speaker QBE and
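
    This record (partially garbled in extraction) concerns linking recordings by speaker using embeddings such as i-vectors together with non-parametric hashing. As a generic illustration of that idea only, the sketch below buckets unit-normalized embeddings with random-hyperplane locality-sensitive hashing; this is a standard LSH construction, not the paper's specific method.

```python
import numpy as np
from collections import defaultdict

def lsh_buckets(embeddings, n_bits=16, seed=0):
    """Group unit-normalized embeddings by a random-hyperplane LSH signature."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_bits, embeddings.shape[1]))
    buckets = defaultdict(list)
    for idx, emb in enumerate(embeddings):
        bits = (planes @ emb > 0).astype(int)   # one sign bit per hyperplane
        buckets[tuple(bits)].append(idx)
    return buckets

# Toy usage: near-duplicate embeddings tend to share a bucket, giving candidate
# speaker links that can then be scored more carefully.
rng = np.random.default_rng(1)
base = rng.normal(size=(5, 64))
noisy = base + 0.01 * rng.normal(size=base.shape)
embs = np.vstack([base, noisy])
embs /= np.linalg.norm(embs, axis=1, keepdims=True)
print(lsh_buckets(embs))
```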

  5. Direct Speaker Gaze Promotes Trust in Truth-Ambiguous Statements.

    Science.gov (United States)

    Kreysa, Helene; Kessler, Luise; Schweinberger, Stefan R

    2016-01-01

    A speaker's gaze behaviour can provide perceivers with a multitude of cues which are relevant for communication, thus constituting an important non-verbal interaction channel. The present study investigated whether direct eye gaze of a speaker affects the likelihood of listeners believing truth-ambiguous statements. Participants were presented with videos in which a speaker produced such statements with either direct or averted gaze. The statements were selected through a rating study to ensure that participants were unlikely to know a-priori whether they were true or not (e.g., "sniffer dogs cannot smell the difference between identical twins"). Participants indicated in a forced-choice task whether or not they believed each statement. We found that participants were more likely to believe statements by a speaker looking at them directly, compared to a speaker with averted gaze. Moreover, when participants disagreed with a statement, they were slower to do so when the statement was uttered with direct (compared to averted) gaze, suggesting that the process of rejecting a statement as untrue may be inhibited when that statement is accompanied by direct gaze.

  6. Optimization of multilayer neural network parameters for speaker recognition

    Science.gov (United States)

    Tovarek, Jaromir; Partila, Pavol; Rozhon, Jan; Voznak, Miroslav; Skapa, Jan; Uhrin, Dominik; Chmelikova, Zdenka

    2016-05-01

    This article discusses the impact of multilayer neural network parameters on speaker identification. The main task of speaker identification is to find a specific person in a known set of speakers; that is, to determine whether the voice of an unknown speaker (the wanted person) belongs to a group of reference speakers from the voice database. One of the requirements was to develop a text-independent system, meaning that the wanted person is classified regardless of content and language. A multilayer neural network has been used for speaker identification in this research. An artificial neural network (ANN) needs parameters such as the activation function of the neurons, the steepness of the activation functions, the learning rate, the maximum number of iterations, and the number of neurons in the hidden and output layers. ANN accuracy and validation time are directly influenced by these parameter settings, and different roles require different settings. Identification accuracy and ANN validation time were evaluated with the same input data but different parameter settings. The goal was to find parameters for the neural network with the highest precision and shortest validation time. The input data of the neural networks are Mel-frequency cepstral coefficients (MFCCs), which describe the properties of the vocal tract. Audio samples were recorded for all speakers in a laboratory environment. The data were split into training, testing, and validation sets of 70, 15, and 15%. The result of the research described in this article is a different parameter setting for the multilayer neural network for four speakers.
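
    A minimal sketch of the setup the abstract describes (MFCC-style inputs, a multilayer network, a 70/15/15 split) using scikit-learn is given below. The synthetic features and the hyperparameter values are placeholders, not the data or the settings the study arrived at.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data: in the study these would be MFCC vectors extracted from
# recordings of four speakers; here they are random 13-dimensional features.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 13)) + np.repeat(np.arange(4), 500)[:, None]
y = np.repeat(np.arange(4), 500)

# 70 / 15 / 15 split into training, testing, and validation sets, as in the abstract.
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.30, random_state=0)
X_te, X_val, y_te, y_val = train_test_split(X_rest, y_rest, test_size=0.50, random_state=0)

# The knobs below (hidden layer sizes, activation, learning rate, iteration cap)
# are the parameters the abstract says were tuned; the values here are guesses.
clf = MLPClassifier(hidden_layer_sizes=(64, 32), activation="tanh",
                    learning_rate_init=0.001, max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:      ", clf.score(X_te, y_te))
print("validation accuracy:", clf.score(X_val, y_val))
```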

  7. Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

    Directory of Open Access Journals (Sweden)

    Umit H. Yapanel

    2008-08-01

    Full Text Available A proven method for achieving effective automatic speech recognition (ASR) in the presence of speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed which require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on-the-fly within a newly proposed PMVDR acoustic front end. The novel aspect of the algorithm is that, whereas conventional front-end processing with PMVDR and VTLN needs two separate warping phases, the proposed BISN method uses a single speaker-dependent warp to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER) by 24%, and (ii) a diverse noisy speech task (SPINE 2), where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.
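
    To make the notion of a single speaker-dependent frequency warp concrete, here is a generic piecewise-linear VTLN-style warping function. It illustrates the kind of warp that BISN folds into the front end; the cutoff ratio and warp factor are illustrative, and this is not the PMVDR/BISN implementation itself.

```python
import numpy as np

def vtln_warp(freqs, alpha, f_nyquist=8000.0, f_cut_ratio=0.85):
    """Piecewise-linear VTLN warp: scale frequencies by alpha below a cutoff,
    then interpolate linearly so the Nyquist frequency maps to itself."""
    f_cut = f_cut_ratio * f_nyquist
    freqs = np.asarray(freqs, dtype=float)
    return np.where(
        freqs <= f_cut,
        alpha * freqs,
        alpha * f_cut + (f_nyquist - alpha * f_cut)
        * (freqs - f_cut) / (f_nyquist - f_cut),
    )

# A warp factor of 1.1 stretches the low band (shorter vocal tract), while the
# segment above the cutoff is compressed so the band edge is preserved.
print(vtln_warp([500, 1000, 4000, 7500, 8000], alpha=1.1))
```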

  8. Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

    Directory of Open Access Journals (Sweden)

    Umit H. Yapanel

    2008-01-01

    Full Text Available A proven method for achieving effective automatic speech recognition (ASR) in the presence of speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed which require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on-the-fly within a newly proposed PMVDR acoustic front end. The novel aspect of the algorithm is that, whereas conventional front-end processing with PMVDR and VTLN needs two separate warping phases, the proposed BISN method uses a single speaker-dependent warp to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER) by 24%, and (ii) a diverse noisy speech task (SPINE 2), where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.

  9. A Method to Integrate GMM, SVM and DTW for Speaker Recognition

    Directory of Open Access Journals (Sweden)

    Ing-Jr Ding

    2014-01-01

    Full Text Available This paper develops an effective and efficient scheme to integrate the Gaussian mixture model (GMM), support vector machine (SVM), and dynamic time warping (DTW) for automatic speaker recognition. GMM and SVM are two popular classifiers for speaker recognition applications. DTW is a fast and simple template matching method frequently seen in speech recognition applications. In this work, DTW is not used for speech recognition; instead, it is employed as a verifier to confirm valid speakers. The proposed combination scheme of GMM, SVM and DTW, called SVMGMM-DTW, is a two-phase verification process: GMM-SVM verification in the first phase and DTW verification in the second phase. By providing a double check on the identity of a speaker, it becomes difficult for impostors to pass the security protection; therefore, the safety of speaker recognition systems is largely increased. A series of experiments designed around door access control applications demonstrated the superiority of the developed SVMGMM-DTW scheme in speaker recognition accuracy.
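
    The DTW verification stage described above compares a test feature sequence against a claimed speaker's template. A plain dynamic-programming DTW distance such as the one sketched below (pure NumPy, Euclidean local cost, an assumption rather than the paper's exact configuration) is enough to illustrate the second-phase check.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences
    a (n x d) and b (m x d), using Euclidean local costs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Second-phase check: accept only if the test utterance is close enough to the
# claimed speaker's stored template (the threshold here is purely illustrative).
template = np.random.randn(40, 13)                 # e.g., 40 frames of 13-dim features
test = template + 0.05 * np.random.randn(40, 13)
print(dtw_distance(test, template) < 25.0)
```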

  10. Speaker segmentation and clustering

    OpenAIRE

    Kotti, M; Moschou, V; Kotropoulos, C

    2008-01-01

    This survey focuses on two challenging speech processing topics, namely: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker...

  11. Musical Sophistication and the Effect of Complexity on Auditory Discrimination in Finnish Speakers

    Science.gov (United States)

    Dawson, Caitlin; Aalto, Daniel; Šimko, Juraj; Vainio, Martti; Tervaniemi, Mari

    2017-01-01

    Musical experiences and native language are both known to affect auditory processing. The present work aims to disentangle the influences of native language phonology and musicality on behavioral and subcortical sound feature processing in a population of musically diverse Finnish speakers as well as to investigate the specificity of enhancement from musical training. Finnish speakers are highly sensitive to duration cues since in Finnish, vowel and consonant duration determine word meaning. Using a correlational approach with a set of behavioral sound feature discrimination tasks, brainstem recordings, and a musical sophistication questionnaire, we find no evidence for an association between musical sophistication and more precise duration processing in Finnish speakers either in the auditory brainstem response or in behavioral tasks, but they do show an enhanced pitch discrimination compared to Finnish speakers with less musical experience and show greater duration modulation in a complex task. These results are consistent with a ceiling effect set for certain sound features which corresponds to the phonology of the native language, leaving an opportunity for music experience-based enhancement of sound features not explicitly encoded in the language (such as pitch, which is not explicitly encoded in Finnish). Finally, the pattern of duration modulation in more musically sophisticated Finnish speakers suggests integrated feature processing for greater efficiency in a real world musical situation. These results have implications for research into the specificity of plasticity in the auditory system as well as to the effects of interaction of specific language features with musical experiences. PMID:28450829

  12. Musical Sophistication and the Effect of Complexity on Auditory Discrimination in Finnish Speakers.

    Science.gov (United States)

    Dawson, Caitlin; Aalto, Daniel; Šimko, Juraj; Vainio, Martti; Tervaniemi, Mari

    2017-01-01

    Musical experiences and native language are both known to affect auditory processing. The present work aims to disentangle the influences of native language phonology and musicality on behavioral and subcortical sound feature processing in a population of musically diverse Finnish speakers as well as to investigate the specificity of enhancement from musical training. Finnish speakers are highly sensitive to duration cues since in Finnish, vowel and consonant duration determine word meaning. Using a correlational approach with a set of behavioral sound feature discrimination tasks, brainstem recordings, and a musical sophistication questionnaire, we find no evidence for an association between musical sophistication and more precise duration processing in Finnish speakers either in the auditory brainstem response or in behavioral tasks, but they do show an enhanced pitch discrimination compared to Finnish speakers with less musical experience and show greater duration modulation in a complex task. These results are consistent with a ceiling effect set for certain sound features which corresponds to the phonology of the native language, leaving an opportunity for music experience-based enhancement of sound features not explicitly encoded in the language (such as pitch, which is not explicitly encoded in Finnish). Finally, the pattern of duration modulation in more musically sophisticated Finnish speakers suggests integrated feature processing for greater efficiency in a real world musical situation. These results have implications for research into the specificity of plasticity in the auditory system as well as to the effects of interaction of specific language features with musical experiences.

  13. Direct Speaker Gaze Promotes Trust in Truth-Ambiguous Statements.

    Directory of Open Access Journals (Sweden)

    Helene Kreysa

    Full Text Available A speaker's gaze behaviour can provide perceivers with a multitude of cues which are relevant for communication, thus constituting an important non-verbal interaction channel. The present study investigated whether direct eye gaze of a speaker affects the likelihood of listeners believing truth-ambiguous statements. Participants were presented with videos in which a speaker produced such statements with either direct or averted gaze. The statements were selected through a rating study to ensure that participants were unlikely to know a-priori whether they were true or not (e.g., "sniffer dogs cannot smell the difference between identical twins"). Participants indicated in a forced-choice task whether or not they believed each statement. We found that participants were more likely to believe statements by a speaker looking at them directly, compared to a speaker with averted gaze. Moreover, when participants disagreed with a statement, they were slower to do so when the statement was uttered with direct (compared to averted) gaze, suggesting that the process of rejecting a statement as untrue may be inhibited when that statement is accompanied by direct gaze.

  14. Ordered short-term memory differs in signers and speakers: Implications for models of short-term memory

    OpenAIRE

    Bavelier, Daphne; Newport, Elissa L.; Hall, Matt; Supalla, Ted; Boutla, Mrim

    2008-01-01

    Capacity limits in linguistic short-term memory (STM) are typically measured with forward span tasks in which participants are asked to recall lists of words in the order presented. Using such tasks, native signers of American Sign Language (ASL) exhibit smaller spans than native speakers (Boutla, Supalla, Newport, & Bavelier, 2004). Here, we test the hypothesis that this population difference reflects differences in the way speakers and signers maintain temporal order information in short-te...

  15. On the status of the phoneme /b/ in heritage speakers of Spanish

    Directory of Open Access Journals (Sweden)

    Rajiv Rao

    2014-12-01

    Full Text Available This study examined intervocalic productions of /b/ in heritage speakers of Spanish residing in the United States. Eleven speakers were divided into two groups based on at-home exposure to Spanish, and subsequently completed reading and picture description tasks eliciting productions of intervocalic /b/ showing variation in word position, syllable stress, and orthography. The mixed-effects results revealed that while both groups manifested three clear phonetic categories, the group with more at-home experience followed a phonological rule of spirantization to a pure approximant to a higher degree across the data. The less-target-like stop and tense approximant allophones appeared more in the reading task, in stressed syllables, and in the less experienced group. Word boundary position interacted with group and task to induce less-target-like forms as well. The findings emphasize the influence of language background, linguistic context, orthography, and cognitive demands of tasks in accounting for heritage phonetics and phonology.

  16. Does a Speaking Task Affect Second Language Comprehensibility?

    Science.gov (United States)

    Crowther, Dustin; Trofimovich, Pavel; Isaacs, Talia; Saito, Kazuya

    2015-01-01

    The current study investigated task effects on listener perception of second language (L2) comprehensibility (ease of understanding). Sixty university-level adult speakers of English from 4 first language (L1) backgrounds (Chinese, Romance, Hindi, Farsi), with 15 speakers per group, were recorded performing 2 tasks (IELTS long-turn speaking task…

  17. Speaker Authentication

    CERN Document Server

    Li, Qi (Peter)

    2012-01-01

    This book focuses on use of voice as a biometric measure for personal authentication. In particular, "Speaker Recognition" covers two approaches in speaker authentication: speaker verification (SV) and verbal information verification (VIV). The SV approach attempts to verify a speaker’s identity based on his/her voice characteristics while the VIV approach validates a speaker’s identity through verification of the content of his/her utterance(s). SV and VIV can be combined for new applications. This is still a new research topic with significant potential applications. The book provides with a broad overview of the recent advances in speaker authentication while giving enough attention to advanced and useful algorithms and techniques. It also provides a step by step introduction to the current state of the speaker authentication technology, from the fundamental concepts to advanced algorithms. We will also present major design methodologies and share our experience in developing real and successful speake...

  18. Comparing headphone and speaker effects on simulated driving.

    Science.gov (United States)

    Nelson, T M; Nilsson, T H

    1990-12-01

    Twelve persons drove for three hours in an automobile simulator while listening to music at sound level 63dB over stereo headphones during one session and from a dashboard speaker during another session. They were required to steer a mountain highway, maintain a certain indicated speed, shift gears, and respond to occasional hazards. Steering and speed control were dependent on visual cues. The need to shift and the hazards were indicated by sound and vibration effects. With the headphones, the driver's average reaction time for the most complex task presented--shifting gears--was about one-third second longer than with the speaker. The use of headphones did not delay the development of subjective fatigue.

  19. The effects of L2 proficiency level on the processing of wh-questions among Dutch second language speakers of English

    NARCIS (Netherlands)

    Jackson, C.N.; Hell, J.G. van

    2011-01-01

    Using a self-paced reading task, the present study explores how Dutch-English L2 speakers parse English wh-subject-extractions and wh-object-extractions. Results suggest that English native speakers and highly-proficient Dutch–English L2 speakers do not always exhibit measurable signs of on-line

  20. A fundamental residue pitch perception bias for tone language speakers

    Science.gov (United States)

    Petitti, Elizabeth

    A complex tone composed of only higher-order harmonics typically elicits a pitch percept equivalent to the tone's missing fundamental frequency (f0). When judging the direction of residue pitch change between two such tones, however, listeners may have completely opposite perceptual experiences depending on whether they are biased to perceive changes based on the overall spectrum or the missing f0 (harmonic spacing). Individual differences in residue pitch change judgments are reliable and have been associated with musical experience and functional neuroanatomy. Tone languages put greater pitch processing demands on their speakers than non-tone languages, and we investigated whether these lifelong differences in linguistic pitch processing affect listeners' bias for residue pitch. We asked native tone language speakers and native English speakers to perform a pitch judgment task for two tones with missing fundamental frequencies. Given tone pairs with ambiguous pitch changes, listeners were asked to judge the direction of pitch change, where the direction of their response indicated whether they attended to the overall spectrum (exhibiting a spectral bias) or the missing f0 (exhibiting a fundamental bias). We found that tone language speakers are significantly more likely to perceive pitch changes based on the missing f0 than English speakers. These results suggest that tone-language speakers' privileged experience with linguistic pitch fundamentally tunes their basic auditory processing.
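
    For readers unfamiliar with residue pitch, the snippet below synthesizes a complex tone from higher harmonics only, so the fundamental is physically absent yet typically still heard. The specific harmonic numbers and frequencies are illustrative and are not the stimuli used in the study.

```python
import numpy as np

def missing_f0_tone(f0, harmonics=(4, 5, 6, 7, 8), fs=44100, dur=0.5):
    """Complex tone built from higher harmonics of f0; f0 itself is absent."""
    t = np.arange(int(fs * dur)) / fs
    tone = sum(np.sin(2 * np.pi * h * f0 * t) for h in harmonics)
    return tone / np.max(np.abs(tone))

# Two tones whose spectral components rise while their missing f0 falls create
# the kind of ambiguous pitch-change judgment described in the abstract.
a = missing_f0_tone(200, harmonics=(4, 5, 6))   # components at 800, 1000, 1200 Hz
b = missing_f0_tone(180, harmonics=(5, 6, 7))   # components at 900, 1080, 1260 Hz
print(a.shape, b.shape)
```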

  1. Speech variability effects on recognition accuracy associated with concurrent task performance by pilots

    Science.gov (United States)

    Simpson, C. A.

    1985-01-01

    In the present study of the responses of pairs of pilots to aircraft warning classification tasks using an isolated word, speaker-dependent speech recognition system, the induced stress was manipulated by means of different scoring procedures for the classification task and by the inclusion of a competitive manual control task. Both speech patterns and recognition accuracy were analyzed, and recognition errors were recorded by type for an isolated word speaker-dependent system and by an offline technique for a connected word speaker-dependent system. While errors increased with task loading for the isolated word system, there was no such effect for task loading in the case of the connected word system.

  2. The Acquisition of English Focus Marking by Non-Native Speakers

    Science.gov (United States)

    Baker, Rachel Elizabeth

    This dissertation examines Mandarin and Korean speakers' acquisition of English focus marking, which is realized by accenting particular words within a focused constituent. It is important for non-native speakers to learn how accent placement relates to focus in English because appropriate accent placement and realization makes a learner's English more native-like and easier to understand. Such knowledge may also improve their English comprehension skills. In this study, 20 native English speakers, 20 native Mandarin speakers, and 20 native Korean speakers participated in four experiments: (1) a production experiment, in which they were recorded reading the answers to questions, (2) a perception experiment, in which they were asked to determine which word in a recording was the last prominent word, (3) an understanding experiment, in which they were asked whether the answers in recorded question-answer pairs had context-appropriate prosody, and (4) an accent placement experiment, in which they were asked which word they would make prominent in a particular context. Finally, a new group of native English speakers listened to utterances produced in the production experiment, and determined whether the prosody of each utterance was appropriate for its context. The results of the five experiments support a novel predictive model for second language prosodic focus marking acquisition. This model holds that both transfer of linguistic features from a learner's native language (L1) and features of their second language (L2) affect learners' acquisition of prosodic focus marking. As a result, the model includes two complementary components: the Transfer Component and the L2 Challenge Component. The Transfer Component predicts that prosodic structures in the L2 will be more easily acquired by language learners that have similar structures in their L1 than those who do not, even if there are differences between the L1 and L2 in how the structures are realized. The L2

  3. Variation among heritage speakers: Sequential vs. simultaneous bilinguals

    Directory of Open Access Journals (Sweden)

    Teresa Lee

    2013-08-01

    Full Text Available This study examines the differences in the grammatical knowledge of two types of heritage speakers of Korean. Early simultaneous bilinguals are exposed to both English and the heritage language from birth, whereas early sequential bilinguals are exposed to the heritage language first and then to English upon schooling. A listening comprehension task involving relative clauses was conducted with 51 beginning-level Korean heritage speakers. The results showed that the early sequential bilinguals exhibited much more accurate knowledge than the early simultaneous bilinguals, who lacked rudimentary knowledge of Korean relative clauses. Drawing on the findings of adult and child Korean L1 data on the acquisition of relative clauses, the performance of each group is discussed with respect to attrition and incomplete acquisition of the heritage language.

  4. Vocal caricatures reveal signatures of speaker identity

    Science.gov (United States)

    López, Sabrina; Riera, Pablo; Assaneo, María Florencia; Eguía, Manuel; Sigman, Mariano; Trevisan, Marcos A.

    2013-12-01

    What are the features that impersonators select to elicit a speaker's identity? We built a voice database of public figures (targets) and imitations produced by professional impersonators. They produced one imitation based on their memory of the target (caricature) and another one after listening to the target audio (replica). A set of naive participants then judged identity and similarity of pairs of voices. Identity was better evoked by the caricatures, while replicas were perceived to be closer to the targets in terms of voice similarity. We used these data to map relevant acoustic dimensions for each task. Our results indicate that speaker identity is mainly associated with vocal tract features, while perception of voice similarity is related to vocal fold parameters. We therefore show the way in which acoustic caricatures emphasize identity features at the cost of losing similarity, which allows drawing an analogy with caricatures in the visual space.

  5. Visual and auditory digit-span performance in native and nonnative speakers

    NARCIS (Netherlands)

    Olsthoorn, N.M.; Andringa, S.; Hulstijn, J.H.

    2014-01-01

    We compared 121 native and 114 non-native speakers of Dutch (with 35 different first languages) on four digit-span tasks, varying modality (visual/auditory) and direction (forward/backward). An interaction was observed between nativeness and modality, such that, while natives performed better than

  6. Speaker's voice as a memory cue.

    Science.gov (United States)

    Campeanu, Sandra; Craik, Fergus I M; Alain, Claude

    2015-02-01

    Speaker's voice occupies a central role as the cornerstone of auditory social interaction. Here, we review the evidence suggesting that speaker's voice constitutes an integral context cue in auditory memory. Investigation into the nature of voice representation as a memory cue is essential to understanding auditory memory and the neural correlates which underlie it. Evidence from behavioral and electrophysiological studies suggest that while specific voice reinstatement (i.e., same speaker) often appears to facilitate word memory even without attention to voice at study, the presence of a partial benefit of similar voices between study and test is less clear. In terms of explicit memory experiments utilizing unfamiliar voices, encoding methods appear to play a pivotal role. Voice congruency effects have been found when voice is specifically attended at study (i.e., when relatively shallow, perceptual encoding takes place). These behavioral findings coincide with neural indices of memory performance such as the parietal old/new recollection effect and the late right frontal effect. The former distinguishes between correctly identified old words and correctly identified new words, and reflects voice congruency only when voice is attended at study. Characterization of the latter likely depends upon voice memory, rather than word memory. There is also evidence to suggest that voice effects can be found in implicit memory paradigms. However, the presence of voice effects appears to depend greatly on the task employed. Using a word identification task, perceptual similarity between study and test conditions is, like for explicit memory tests, crucial. In addition, the type of noise employed appears to have a differential effect. While voice effects have been observed when white noise is used at both study and test, using multi-talker babble does not confer the same results. In terms of neuroimaging research modulations, characterization of an implicit memory effect

  7. Processing advantage for emotional words in bilingual speakers.

    Science.gov (United States)

    Ponari, Marta; Rodríguez-Cuadrado, Sara; Vinson, David; Fox, Neil; Costa, Albert; Vigliocco, Gabriella

    2015-10-01

    Effects of emotion on word processing are well established in monolingual speakers. However, studies that have assessed whether affective features of words undergo the same processing in a native and nonnative language have provided mixed results: Studies that have found differences between native language (L1) and second language (L2) processing attributed the difference to the fact that L2 learned late in life would not be processed affectively, because affective associations are established during childhood. Other studies suggest that adult learners show similar effects of emotional features in L1 and L2. Differences in affective processing of L2 words can be linked to age and context of learning, proficiency, language dominance, and degree of similarity between L2 and L1. Here, in a lexical decision task on tightly matched negative, positive, and neutral words, highly proficient English speakers from typologically different L1s showed the same facilitation in processing emotionally valenced words as native English speakers, regardless of their L1, the age of English acquisition, or the frequency and context of English use.

  8. A general auditory bias for handling speaker variability in speech? Evidence in humans and songbirds

    Directory of Open Access Journals (Sweden)

    Buddhamas Kriengwatana

    2015-08-01

    Full Text Available Different speakers produce the same speech sound differently, yet listeners are still able to reliably identify the speech sound. How listeners can adjust their perception to compensate for speaker differences in speech, and whether these compensatory processes are unique only to humans, is still not fully understood. In this study we compare the ability of humans and zebra finches to categorize vowels despite speaker variation in speech in order to test the hypothesis that accommodating speaker and gender differences in isolated vowels can be achieved without prior experience with speaker-related variability. Using a behavioural Go/No-go task and identical stimuli, we compared Australian English adults’ (naïve to Dutch) and zebra finches’ (naïve to human speech) ability to categorize /ɪ/ and /ɛ/ vowels of a novel Dutch speaker after learning to discriminate those vowels from only one other speaker. Experiments 1 and 2 presented vowels of two speakers interspersed or blocked, respectively. Results demonstrate that categorization of vowels is possible without prior exposure to speaker-related variability in speech for zebra finches, and in non-native vowel categories for humans. Therefore, this study is the first to provide evidence for what might be a species-shared auditory bias that may supersede speaker-related information during vowel categorization. It additionally provides behavioural evidence contradicting a prior hypothesis that accommodation of speaker differences is achieved via the use of formant ratios. Therefore, investigations of alternative accounts of vowel normalization that incorporate the possibility of an auditory bias for disregarding inter-speaker variability are warranted.

  9. Working with Speakers.

    Science.gov (United States)

    Pestel, Ann

    1989-01-01

    The author discusses working with speakers from business and industry to present career information at the secondary level. Advice for speakers is presented, as well as tips for program coordinators. (CH)

  10. On the optimization of a mixed speaker array in an enclosed space using the virtual-speaker weighting method

    Science.gov (United States)

    Peng, Bo; Zheng, Sifa; Liao, Xiangning; Lian, Xiaomin

    2018-03-01

    In order to achieve sound field reproduction in a wide frequency band, multiple-type speakers are used. The reproduction accuracy is not only affected by the signals sent to the speakers, but also depends on the position and the number of each type of speaker. The method of optimizing a mixed speaker array is investigated in this paper. A virtual-speaker weighting method is proposed to optimize both the position and the number of each type of speaker. In this method, a virtual-speaker model is proposed to quantify the increment of controllability of the speaker array when the speaker number increases. While optimizing a mixed speaker array, the gain of the virtual-speaker transfer function is used to determine the priority orders of the candidate speaker positions, which optimizes the position of each type of speaker. Then the relative gain of the virtual-speaker transfer function is used to determine whether the speakers are redundant, which optimizes the number of each type of speaker. Finally the virtual-speaker weighting method is verified by reproduction experiments of the interior sound field in a passenger car. The results validate that the optimum mixed speaker array can be obtained using the proposed method.
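
    The two-step use of the virtual-speaker gain described above (ranking candidate positions, then pruning redundant speakers) can be illustrated with a rough greedy-selection sketch. The gain used below, the relative reduction in least-squares reproduction error at a set of control points, is a stand-in assumption rather than the paper's virtual-speaker transfer function, and the sketch collapses the position and number decisions into a single loop; all names are illustrative.

```python
import numpy as np

def select_speakers(H, target, gain_threshold=0.05):
    """Greedy selection of speaker positions for sound-field reproduction.

    H      : (n_control_points, n_candidates) complex transfer functions from
             each candidate speaker position to each control point (given).
    target : (n_control_points,) desired complex sound pressure.
    A candidate's "gain" is taken here as the relative reduction in the
    least-squares reproduction error it contributes (a stand-in for the
    virtual-speaker gain); candidates adding less than gain_threshold are
    treated as redundant and the selection stops.
    """
    chosen = []
    err = np.linalg.norm(target)
    while len(chosen) < H.shape[1]:
        best, best_gain, best_err = None, 0.0, None
        for j in range(H.shape[1]):
            if j in chosen:
                continue
            A = H[:, chosen + [j]]
            w, *_ = np.linalg.lstsq(A, target, rcond=None)
            new_err = np.linalg.norm(target - A @ w)
            gain = (err - new_err) / (err + 1e-12)
            if gain > best_gain:
                best, best_gain, best_err = j, gain, new_err
        if best is None or best_gain < gain_threshold:
            break  # remaining candidates are redundant
        chosen.append(best)
        err = best_err
    return chosen
```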

  11. Accent modulates access to word meaning: Evidence for a speaker-model account of spoken word recognition.

    Science.gov (United States)

    Cai, Zhenguang G; Gilbert, Rebecca A; Davis, Matthew H; Gaskell, M Gareth; Farrar, Lauren; Adler, Sarah; Rodd, Jennifer M

    2017-11-01

    Speech carries accent information relevant to determining the speaker's linguistic and social background. A series of web-based experiments demonstrate that accent cues can modulate access to word meaning. In Experiments 1-3, British participants were more likely to retrieve the American dominant meaning (e.g., hat meaning of "bonnet") in a word association task if they heard the words in an American than a British accent. In addition, results from a speeded semantic decision task (Experiment 4) and sentence comprehension task (Experiment 5) confirm that accent modulates on-line meaning retrieval such that comprehension of ambiguous words is easier when the relevant word meaning is dominant in the speaker's dialect. Critically, neutral-accent speech items, created by morphing British- and American-accented recordings, were interpreted in a similar way to accented words when embedded in a context of accented words (Experiment 2). This finding indicates that listeners do not use accent to guide meaning retrieval on a word-by-word basis; instead they use accent information to determine the dialectic identity of a speaker and then use their experience of that dialect to guide meaning access for all words spoken by that person. These results motivate a speaker-model account of spoken word recognition in which comprehenders determine key characteristics of their interlocutor and use this knowledge to guide word meaning access. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  12. Hemispheric lateralization of linguistic prosody recognition in comparison to speech and speaker recognition.

    Science.gov (United States)

    Kreitewolf, Jens; Friederici, Angela D; von Kriegstein, Katharina

    2014-11-15

    Hemispheric specialization for linguistic prosody is a controversial issue. While it is commonly assumed that linguistic prosody and emotional prosody are preferentially processed in the right hemisphere, neuropsychological work directly comparing processes of linguistic prosody and emotional prosody suggests a predominant role of the left hemisphere for linguistic prosody processing. Here, we used two functional magnetic resonance imaging (fMRI) experiments to clarify the role of left and right hemispheres in the neural processing of linguistic prosody. In the first experiment, we sought to confirm previous findings showing that linguistic prosody processing compared to other speech-related processes predominantly involves the right hemisphere. Unlike previous studies, we controlled for stimulus influences by employing a prosody and speech task using the same speech material. The second experiment was designed to investigate whether a left-hemispheric involvement in linguistic prosody processing is specific to contrasts between linguistic prosody and emotional prosody or whether it also occurs when linguistic prosody is contrasted against other non-linguistic processes (i.e., speaker recognition). Prosody and speaker tasks were performed on the same stimulus material. In both experiments, linguistic prosody processing was associated with activity in temporal, frontal, parietal and cerebellar regions. Activation in temporo-frontal regions showed differential lateralization depending on whether the control task required recognition of speech or speaker: recognition of linguistic prosody predominantly involved right temporo-frontal areas when it was contrasted against speech recognition; when contrasted against speaker recognition, recognition of linguistic prosody predominantly involved left temporo-frontal areas. The results show that linguistic prosody processing involves functions of both hemispheres and suggest that recognition of linguistic prosody is based on

  13. The Mechanism of Speech Processing in Congenital Amusia: Evidence from Mandarin Speakers

    OpenAIRE

    Liu, Fang; Jiang, Cunmei; Thompson, William Forde; Xu, Yi; Yang, Yufang; Stewart, Lauren

    2012-01-01

    Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimin...

  14. Compliment Responses of Thai and Punjabi Speakers of English in Thailand

    Science.gov (United States)

    Sachathep, Sukchai

    2014-01-01

    This variational pragmatics (VP) study investigates the similarities and differences of compliment responses (CR) between Thai and Punjabi speakers of English in Thailand, focusing on the strategies used in CR when the microsociolinguistic variables are integrated into the Discourse Completion Task (DCT). The participants were 20 Thai and 20…

  15. Measuring Cognitive Task Demands Using Dual-Task Methodology, Subjective Self-Ratings, and Expert Judgments: A Validation Study

    Science.gov (United States)

    Revesz, Andrea; Michel, Marije; Gilabert, Roger

    2016-01-01

    This study explored the usefulness of dual-task methodology, self-ratings, and expert judgments in assessing task-generated cognitive demands as a way to provide validity evidence for manipulations of task complexity. The participants were 96 students and 61 English as a second language (ESL) teachers. The students, 48 English native speakers and…

  16. Supervised and Unsupervised Speaker Adaptation in the NIST 2005 Speaker Recognition Evaluation

    National Research Council Canada - National Science Library

    Hansen, Eric G; Slyh, Raymond E; Anderson, Timothy R

    2006-01-01

    Starting in 2004, the annual NIST Speaker Recognition Evaluation (SRE) has added an optional unsupervised speaker adaptation track where test files are processed sequentially and one may update the target model...

  17. Perceptual and acoustic analysis of lexical stress in Greek speakers with dysarthria.

    Science.gov (United States)

    Papakyritsis, Ioannis; Müller, Nicole

    2014-01-01

    The study reported in this paper investigated the abilities of Greek speakers with dysarthria to signal lexical stress at the single word level. Three speakers with dysarthria and two unimpaired control participants were recorded completing a repetition task of a list of words consisting of minimal pairs of Greek disyllabic words contrasted by lexical stress location only. Fourteen listeners were asked to determine the attempted stress location for each word pair. Acoustic analyses of duration and intensity ratios, both within and across words, were undertaken to identify possible acoustic correlates of the listeners' judgments concerning stress location. Acoustic and perceptual data indicate that while each participant with dysarthria in this study had some difficulty in signaling stress unambiguously, the pattern of difficulty was different for each speaker. Further, it was found that the relationship between the listeners' judgments of stress location and the acoustic data was not conclusive.

  18. The effect on recognition memory of noise cancelling headphones in a noisy environment with native and nonnative speakers

    Directory of Open Access Journals (Sweden)

    Brett R C Molesworth

    2014-01-01

    Full Text Available Noise has the potential to impair cognitive performance. For nonnative speakers, the effect of noise on performance is more severe than for their native counterparts. What remains unknown is the effectiveness of countermeasures such as noise attenuating devices in such circumstances. Therefore, the main aim of the present research was to examine the effectiveness of active noise attenuating countermeasures in the presence of simulated aircraft noise for both native and nonnative English speakers. Thirty-two participants, half native English speakers and half native German speakers, completed four recognition (cued recall) tasks presented in English under four different audio conditions, all in the presence of simulated aircraft noise. The results of the research indicated that in simulated aircraft noise at 65 dB(A), performance of nonnative English speakers was poorer than for native English speakers. The beneficial effects of noise cancelling headphones in improving the signal to noise ratio led to an improved performance for nonnative speakers. These results have particular importance for organizations operating in a safety-critical environment such as aviation.

  19. Encoding, rehearsal, and recall in signers and speakers: shared network but differential engagement.

    Science.gov (United States)

    Bavelier, D; Newman, A J; Mukherjee, M; Hauser, P; Kemeny, S; Braun, A; Boutla, M

    2008-10-01

    Short-term memory (STM), or the ability to hold verbal information in mind for a few seconds, is known to rely on the integrity of a frontoparietal network of areas. Here, we used functional magnetic resonance imaging to ask whether a similar network is engaged when verbal information is conveyed through a visuospatial language, American Sign Language, rather than speech. Deaf native signers and hearing native English speakers performed a verbal recall task, where they had to first encode a list of letters in memory, maintain it for a few seconds, and finally recall it in the order presented. The frontoparietal network described to mediate STM in speakers was also observed in signers, with its recruitment appearing independent of the modality of the language. This finding supports the view that signed and spoken STM rely on similar mechanisms. However, deaf signers and hearing speakers differentially engaged key structures of the frontoparietal network as the stages of STM unfold. In particular, deaf signers relied to a greater extent than hearing speakers on passive memory storage areas during encoding and maintenance, but on executive process areas during recall. This work opens new avenues for understanding similarities and differences in STM performance in signers and speakers.

  20. The relation between working memory and language comprehension in signers and speakers.

    Science.gov (United States)

    Emmorey, Karen; Giezen, Marcel R; Petrich, Jennifer A F; Spurgeon, Erin; O'Grady Farnady, Lucinda

    2017-06-01

    This study investigated the relation between linguistic and spatial working memory (WM) resources and language comprehension for signed compared to spoken language. Sign languages are both linguistic and visual-spatial, and therefore provide a unique window on modality-specific versus modality-independent contributions of WM resources to language processing. Deaf users of American Sign Language (ASL), hearing monolingual English speakers, and hearing ASL-English bilinguals completed several spatial and linguistic serial recall tasks. Additionally, their comprehension of spatial and non-spatial information in ASL and spoken English narratives was assessed. Results from the linguistic serial recall tasks revealed that the often reported advantage for speakers on linguistic short-term memory tasks does not extend to complex WM tasks with a serial recall component. For English, linguistic WM predicted retention of non-spatial information, and both linguistic and spatial WM predicted retention of spatial information. For ASL, spatial WM predicted retention of spatial (but not non-spatial) information, and linguistic WM did not predict retention of either spatial or non-spatial information. Overall, our findings argue against strong assumptions of independent domain-specific subsystems for the storage and processing of linguistic and spatial information and furthermore suggest a less important role for serial encoding in signed than spoken language comprehension. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Ordered Short-Term Memory Differs in Signers and Speakers: Implications for Models of Short-Term Memory

    Science.gov (United States)

    Bavelier, Daphne; Newport, Elissa L.; Hall, Matt; Supalla, Ted; Boutla, Mrim

    2008-01-01

    Capacity limits in linguistic short-term memory (STM) are typically measured with forward span tasks in which participants are asked to recall lists of words in the order presented. Using such tasks, native signers of American Sign Language (ASL) exhibit smaller spans than native speakers ([Boutla, M., Supalla, T., Newport, E. L., & Bavelier, D.…

  2. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment

    Science.gov (United States)

    2015-10-01


  3. Speaker identification for the improvement of the security communication between law enforcement units

    Science.gov (United States)

    Tovarek, Jaromir; Partila, Pavol

    2017-05-01

    This article discusses speaker identification for improving the security of communication between law enforcement units. The main task of this research was to develop a text-independent speaker identification system that can be used for real-time recognition. The system is designed for identification in the open set, meaning that the unknown speaker can be anyone. Communication itself is secured, but the authorization of the communication parties has to be checked: the system must decide whether the unknown speaker is authorized for the given action. The calls are recorded by an IP telephony server and these recordings are then evaluated using classification. If the system determines that the speaker is not authorized, it sends a warning message to the administrator. This message can reveal, for example, a stolen phone or another unusual situation. The administrator then performs the appropriate actions. Our proposed system uses a multilayer neural network for classification, consisting of three layers (input layer, hidden layer, and output layer). The number of neurons in the input layer corresponds to the length of the speech feature vector, and the output layer represents the classified speakers. The artificial neural network classifies the speech signal frame by frame, but the final decision is made over the complete record; this rule substantially increases the accuracy of the classification. Input data for the neural network are thirteen Mel-frequency cepstral coefficients, which describe the behavior of the vocal tract and are the features most commonly used for speaker recognition. Parameters for training, testing, and validation were extracted from recordings of authorized users. Recording conditions for the training data correspond with the real traffic of the system (sampling frequency, bit rate). The main benefit of the research is the developed text-independent speaker identification system, which is applied to secure communication between law enforcement units.
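
    As a rough sketch of the pipeline described above (thirteen MFCCs per frame, a multilayer network classifying frame by frame, and a final decision over the complete record), the following uses librosa and scikit-learn; the hidden-layer size, sampling rate, rejection threshold, and helper names are assumptions, not details taken from the article.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def frame_features(path, sr=8000):
    """Thirteen MFCCs per frame (frames as rows), as in the described system."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

def train(recordings):
    """recordings: dict speaker_id -> list of wav paths from authorized users."""
    X, y = [], []
    for spk, paths in recordings.items():
        for p in paths:
            feats = frame_features(p)
            X.append(feats)
            y.extend([spk] * len(feats))
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
    clf.fit(np.vstack(X), y)  # input layer = 13 MFCCs, output layer = speakers
    return clf

def identify(clf, path, reject_below=0.6):
    """Classify frame by frame, then decide over the complete record by
    averaging the frame posteriors; low confidence means 'not authorized'."""
    post = clf.predict_proba(frame_features(path)).mean(axis=0)
    best = int(np.argmax(post))
    if post[best] < reject_below:
        return None, post[best]   # unknown / unauthorized speaker
    return clf.classes_[best], post[best]
```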

  4. When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

    Science.gov (United States)

    Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

    2017-11-01

    Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Similar speaker recognition using nonlinear analysis

    International Nuclear Information System (INIS)

    Seo, J.P.; Kim, M.S.; Baek, I.C.; Kwon, Y.H.; Lee, K.S.; Chang, S.W.; Yang, S.I.

    2004-01-01

    Speech features in conventional speaker identification systems are usually obtained by linear methods in spectral space. However, these methods have the drawback that speakers with similar voices cannot be distinguished, because the characteristics of their voices are also similar in spectral space. To overcome this difficulty with linear methods, we propose to use the correlation exponent in nonlinear space as a new feature vector for identifying speakers with similar voices. We show that our proposed method surprisingly reduces the error rate of the speaker identification system for speakers with similar voices
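
    The abstract does not specify how the correlation exponent is computed; one standard way to obtain such a nonlinear feature is the Grassberger-Procaccia correlation sum over a delay-embedded signal, sketched below. The embedding dimension, delay, and radius grid are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist

def correlation_exponent(x, dim=5, delay=4, max_points=2000):
    """Correlation exponent of a speech signal: slope of log C(r) vs log r,
    where C(r) is the correlation sum of the delay-embedded signal."""
    x = np.asarray(x, dtype=float)[:max_points + (dim - 1) * delay]
    n = len(x) - (dim - 1) * delay
    # Delay embedding: rows are (x[t], x[t+delay], ..., x[t+(dim-1)*delay])
    emb = np.column_stack([x[i * delay: i * delay + n] for i in range(dim)])
    d = pdist(emb)            # pairwise distances between embedded points
    d = d[d > 0]
    radii = np.logspace(np.log10(np.percentile(d, 1)),
                        np.log10(np.percentile(d, 50)), 10)
    C = np.array([(d < r).mean() for r in radii])   # correlation sum C(r)
    slope, _ = np.polyfit(np.log(radii), np.log(C + 1e-12), 1)
    return slope
```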

  6. Improving Language Production Using Subtitled Similar Task Videos

    Science.gov (United States)

    Arslanyilmaz, Abdurrahman; Pedersen, Susan

    2010-01-01

    This study examines the effects of subtitled similar task videos on language production by nonnative speakers (NNSs) in an online task-based language learning (TBLL) environment. Ten NNS-NNS dyads collaboratively completed four communicative tasks, using an online TBLL environment specifically designed for this study and a chat tool in…

  7. Complimenting Functions by Native English Speakers and Iranian EFL Learners: A Divergence or Convergence

    Directory of Open Access Journals (Sweden)

    Ali Akbar Ansarin

    2016-01-01

    Full Text Available The study of the compliment speech act has been under investigation on many occasions in recent years. In this study, an attempt is made to explore appraisals performed by native English speakers and Iranian EFL learners to find out how these two groups diverge from or converge with each other with regard to complimenting patterns and norms. The participants of the study were 60 advanced Iranian EFL learners who spoke Persian as their first language and 60 native English speakers. Through a written Discourse Completion Task comprising eight different scenarios, compliments were analyzed with regard to topics (performance, personality, possession, and skill), functions (explicit, implicit, and opt-out), gender differences, and the common positive adjectives used by the two groups of native and nonnative participants. The findings suggested that native English speakers praised individuals more implicitly in comparison with Iranian EFL learners and that native speakers provided opt-outs more frequently than Iranian EFL learners did. The analysis of data by Chi-square showed that gender and macro functions are independent of each other among Iranian EFL learners’ compliments, while for native speakers, gender played a significant role in the distribution of appraisals. Iranian EFL learners’ complimenting patterns converge more towards those of native English speakers. Moreover, both groups favored explicit compliments; however, Iranian EFL learners were more inclined to provide explicit compliments. It can be concluded that there were more similarities than differences between Iranian EFL learners and native English speakers regarding the compliment speech act. The results of this study can benefit researchers, teachers, material developers, and EFL learners.

  8. Lexical access in a bilingual speaker with dementia: Changes over time.

    Science.gov (United States)

    Lind, Marianne; Simonsen, Hanne Gram; Ribu, Ingeborg Sophie Bjønness; Svendsen, Bente Ailin; Svennevig, Jan; de Bot, Kees

    2018-01-01

    In this article, we explore the naming skills of a bilingual English-Norwegian speaker diagnosed with Primary Progressive Aphasia, in each of his languages across three different speech contexts: confrontation naming, semi-spontaneous narrative (picture description), and conversation, and at two points in time: 12 and 30 months post diagnosis, respectively. The results are discussed in light of two main theories of lexical retrieval in healthy, elderly speakers: the Transmission Deficit Hypothesis and the Inhibitory Deficit Theory. Our data show that, consistent with the participant's premorbid use of and proficiency in the two languages, his performance in his L2 is lower than in his L1, but this difference diminishes as the disease progresses. This is the case across the three speech contexts; however, the difference is smaller in the narrative task, where his performance is very low in both languages already at the first measurement point. Despite his word finding problems, he is able to take active part in conversation, particularly in his L1 and more so at the first measurement point. In addition to the task effect, we find effects of word class, frequency, and cognateness on his naming skills. His performance seems to support the Transmission Deficit Hypothesis. By combining different tools and methods of analysis, we get a more comprehensive picture of the impact of the dementia on the speaker's languages from an intra-individual as well as an inter-individual perspective, which may be useful in research as well as in clinical practice.

  9. Objective measures to improve the selection of training speakers in HMM-based child speech synthesis

    CSIR Research Space (South Africa)

    Govender, Avashna

    2016-12-01

    Full Text Available Building synthetic child voices is considered a difficult task due to the challenges associated with data collection. As a result, speaker adaptation in conjunction with Hidden Markov Model (HMM)-based synthesis has become prevalent in this domain...

  10. Using Reversed MFCC and IT-EM for Automatic Speaker Verification

    Directory of Open Access Journals (Sweden)

    Sheeraz Memon

    2012-01-01

    Full Text Available This paper proposes a text-independent automatic speaker verification system using IMFCC (Inverse/Reverse Mel-Frequency Cepstral Coefficients) and IT-EM (Information Theoretic Expectation Maximization). To perform speaker verification, feature extraction using the Mel scale has been widely applied and has established good results. The IMFCC is based on an inverse Mel scale and effectively captures information available in the high-frequency formants, which is ignored by the MFCC. In this paper, fusion of MFCC and IMFCC at the input level is proposed. GMMs (Gaussian Mixture Models) trained with EM (Expectation Maximization) have been widely used for classification in text-independent verification; however, EM suffers from convergence issues. In this paper we use our proposed IT-EM, which converges faster, to train speaker models. IT-EM uses information-theoretic principles such as PDE (Parzen Density Estimation) and the KL (Kullback-Leibler) divergence measure. IT-EM adapts the weights, means, and covariances, like EM; however, the IT-EM process is not performed on feature vector sets but on a set of centroids obtained using an IT (information-theoretic) metric. The IT-EM process simultaneously diminishes the divergence measure between the PDE estimate of the feature distribution within a given class and the centroid distribution within the same class. The feature-level fusion and IT-EM are tested on the task of speaker verification using NIST2001 and NIST2004. The experimental evaluation validates that MFCC/IMFCC gives better results than the conventional delta/MFCC feature set. The MFCC/IMFCC feature vector is also much smaller than the delta MFCC vector, reducing the computational burden as well. The IT-EM method also showed faster convergence than the conventional EM method, and thus leads to higher speaker recognition scores.
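
    As a hedged illustration of the input-level fusion, the sketch below computes MFCCs and an IMFCC-style feature obtained by flipping the Mel filterbank along the frequency axis (one common construction; the paper's exact inverted filterbank may differ), then concatenates the two per frame. The IT-EM training itself is not reproduced here, and all parameter values are illustrative.

```python
import numpy as np
import librosa
from scipy.fftpack import dct

def mfcc_imfcc_fusion(y, sr, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Per-frame fusion of MFCC and an IMFCC-style feature (frames as rows)."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2   # power spectrum
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    # MFCC: Mel filterbank -> log -> DCT
    mfcc = dct(np.log(mel_fb @ S + 1e-10), type=2, axis=0, norm='ortho')[:n_ceps]
    # IMFCC (assumed construction): flip the filterbank along the frequency
    # axis so the narrow filters cover the high-frequency formant region
    imel_fb = mel_fb[::-1, ::-1]
    imfcc = dct(np.log(imel_fb @ S + 1e-10), type=2, axis=0, norm='ortho')[:n_ceps]
    return np.vstack([mfcc, imfcc]).T   # input-level fusion: frames x 2*n_ceps
```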

  11. Grammatical Planning Units during Real-Time Sentence Production in Speakers with Agrammatic Aphasia and Healthy Speakers

    Science.gov (United States)

    Lee, Jiyeon; Yoshida, Masaya; Thompson, Cynthia K.

    2015-01-01

    Purpose: Grammatical encoding (GE) is impaired in agrammatic aphasia; however, the nature of such deficits remains unclear. We examined grammatical planning units during real-time sentence production in speakers with agrammatic aphasia and control speakers, testing two competing models of GE. We queried whether speakers with agrammatic aphasia…

  12. Wh-question intonation in Peninsular Spanish: Multiple contours and the effect of task type

    Directory of Open Access Journals (Sweden)

    Nicholas C. Henriksen

    2009-06-01

    Full Text Available This paper reports on an experimental investigation of wh-question intonation in Peninsular Spanish. Speech data were collected from six Peninsular Spanish speakers from León, Spain, and oral production data were elicited under two conditions: a computerized sentence reading task and an information gap task-oriented dialogue. The latter task was an adaptation of the HCRC Map Task method (cf. Anderson et al., 1991) and was designed to elicit multiple wh-question productions in an unscripted and more spontaneous speech style than the standard sentence reading task. Results indicate that four contours exist in the tonal inventory of the six speakers. The two most frequent contours were a final rise contour and a nuclear circumflex contour. Systematic task-based differences were found for four of the six speakers, indicating that sentence reading task data alone may not accurately reflect spontaneous speech tonal patterns (cf. Cruttenden, 2007; but see also Lickley, Schepman, & Ladd, 2005). The experimental findings serve to clarify a number of assumptions about the syntax-prosody interface underlying wh-question utterance signaling; they also have implications for research methods in intonation and task-based variation in laboratory phonology.

  13. Task choice and semantic interference in picture naming

    NARCIS (Netherlands)

    Piai, V.; Roelofs, A.P.A.; Schriefers, H.J.

    2015-01-01

    Evidence from dual-task performance indicates that speakers prefer not to select simultaneous responses in picture naming and another unrelated task, suggesting a response selection bottleneck in naming. In particular, when participants respond to tones with a manual response and name pictures with

  14. The 2016 NIST Speaker Recognition Evaluation

    Science.gov (United States)

    2017-08-20

    impact on system performance. Index Terms: NIST evaluation, NIST SRE, speaker detection, speaker recognition, speaker verification. Second, there were two training conditions in SRE16, namely fixed and open. In the fixed training condition, participants were only

  15. Arctic Visiting Speakers Series (AVS)

    Science.gov (United States)

    Fox, S. E.; Griswold, J.

    2011-12-01

    The Arctic Visiting Speakers (AVS) Series funds researchers and other arctic experts to travel and share their knowledge in communities where they might not otherwise connect. Speakers cover a wide range of arctic research topics and can address a variety of audiences including K-12 students, graduate and undergraduate students, and the general public. Host applications are accepted on an on-going basis, depending on funding availability. Applications need to be submitted at least 1 month prior to the expected tour dates. Interested hosts can choose speakers from an online Speakers Bureau or invite a speaker of their choice. Preference is given to individuals and organizations hosting speakers that reach a broad audience and the general public. AVS tours are encouraged to span several days, allowing ample time for interactions with faculty, students, local media, and community members. Applications for both domestic and international visits will be considered. Applications for international visits should involve participation of more than one host organization and must include either a US-based speaker or a US-based organization. This is a small but important program that educates the public about Arctic issues. There have been 27 tours since 2007 that have impacted communities across the globe including: Gatineau, Quebec, Canada; St. Petersburg, Russia; Piscataway, New Jersey; Cordova, Alaska; Nuuk, Greenland; Elizabethtown, Pennsylvania; Oslo, Norway; Inari, Finland; Borgarnes, Iceland; San Francisco, California; and Wolcott, Vermont, to name a few. Tours have included lectures to K-12 schools, college and university students, tribal organizations, Boy Scout troops, science center and museum patrons, and the general public. There are approximately 300 attendees at each AVS tour; roughly 4,100 people have been reached since 2007. The expectations for each tour are extremely manageable. Hosts must submit a schedule of events and a tour summary to be posted online

  16. Teaching English to speakers of other languages an introduction

    CERN Document Server

    Nunan, David

    2015-01-01

    David Nunan's dynamic learner-centered teaching style has informed and inspired countless TESOL educators around the world. In this fresh, straightforward introduction to teaching English to speakers of other languages he presents teaching techniques and procedures along with the underlying theory and principles. Complex theories and research studies are explained in a clear and comprehensible manner without trivializing them. Practical examples of how to develop teaching materials and tasks from sound principles provide rich illustrations of theoretical constructs.

  17. Hybrid Speaker Recognition Using Universal Acoustic Model

    Science.gov (United States)

    Nishimura, Jun; Kuroda, Tadahiro

    We propose a novel speaker recognition approach using a speaker-independent universal acoustic model (UAM) for sensornet applications. In sensornet applications such as “Business Microscope”, interactions among knowledge workers in an organization can be visualized by sensing face-to-face communication using wearable sensor nodes. In conventional studies, speakers are detected by comparing the energy of input speech signals among the nodes. However, there are often synchronization errors among the nodes, which degrade speaker recognition performance. By focusing on properties of the speaker's acoustic channel, the UAM can provide robustness against such synchronization errors. The overall speaker recognition accuracy is improved by combining the UAM with the energy-based approach. For 0.1 s speech inputs and 4 subjects, a speaker recognition accuracy of 94% is achieved at synchronization errors of less than 100 ms.

  18. Pitch perception and production in congenital amusia: Evidence from Cantonese speakers

    OpenAIRE

    Liu, Fang; Chan, Alice H. D.; Ciocca, Valter; Roquet, Catherine; Peretz, Isabelle; Wong, Patrick C. M.

    2016-01-01

    This study investigated pitch perception and production in speech and music in individuals with congenital amusia (a disorder of musical pitch processing) who are native speakers of Cantonese, a tone language with a highly complex tonal system. Sixteen Cantonese-speaking congenital amusics and 16 controls performed a set of lexical tone perception, production, singing, and psychophysical pitch threshold tasks. Their tone production accuracy and singing proficiency were subsequently judged by ...

  19. Text-Independent Speaker Identification Using the Histogram Transform Model

    DEFF Research Database (Denmark)

    Ma, Zhanyu; Yu, Hong; Tan, Zheng-Hua

    2016-01-01

    In this paper, we propose a novel probabilistic method for the task of text-independent speaker identification (SI). In order to capture dynamic information during SI, we design super-MFCC features by cascading three neighboring Mel-frequency cepstral coefficient (MFCC) frames together. These super-MFCC vectors are utilized for probabilistic model training such that the speaker’s characteristics can be sufficiently captured. The probability density function (PDF) of the aforementioned super-MFCC features is estimated by the recently proposed histogram transform (HT) method. To recedes
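
    The super-MFCC construction (each frame cascaded with its two neighbors) is simple to sketch, as below; the edge-padding choice is an assumption, and the histogram-transform density estimation that follows it in the paper is not reproduced here.

```python
import numpy as np

def super_frames(mfcc, context=1):
    """Cascade each MFCC frame with its neighbors; context=1 gives the
    three-frame super-MFCC vectors described in the abstract.
    mfcc: (n_frames, n_coeffs) array; returns (n_frames, n_coeffs * (2*context+1))."""
    T = len(mfcc)
    padded = np.pad(mfcc, ((context, context), (0, 0)), mode='edge')  # edge padding assumed
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])
```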

  20. Multimodal Speaker Diarization.

    Science.gov (United States)

    Noulas, A; Englebienne, G; Krose, B J A

    2012-01-01

    We present a novel probabilistic framework that fuses information coming from the audio and video modality to perform speaker diarization. The proposed framework is a Dynamic Bayesian Network (DBN) that is an extension of a factorial Hidden Markov Model (fHMM) and models the people appearing in an audiovisual recording as multimodal entities that generate observations in the audio stream, the video stream, and the joint audiovisual space. The framework is very robust to different contexts, makes no assumptions about the location of the recording equipment, and does not require labeled training data as it acquires the model parameters using the Expectation Maximization (EM) algorithm. We apply the proposed model to two meeting videos and a news broadcast video, all of which come from publicly available data sets. The results acquired in speaker diarization are in favor of the proposed multimodal framework, which outperforms the single modality analysis results and improves over the state-of-the-art audio-based speaker diarization.

  1. Physiological Indices of Bilingualism: Oral–Motor Coordination and Speech Rate in Bengali–English Speakers

    Science.gov (United States)

    Chakraborty, Rahul; Goffman, Lisa; Smith, Anne

    2009-01-01

    Purpose: To examine how age of immersion and proficiency in a 2nd language influence speech movement variability and speaking rate in both a 1st language and a 2nd language. Method: A group of 21 Bengali–English bilingual speakers participated. Lip and jaw movements were recorded. For all 21 speakers, lip movement variability was assessed based on productions of Bengali (L1; 1st language) and English (L2; 2nd language) sentences. For analyses related to the influence of L2 proficiency on speech production processes, participants were sorted into low- (n = 7) and high-proficiency (n = 7) groups. Lip movement variability and speech rate were evaluated for both of these groups across L1 and L2 sentences. Results: Surprisingly, adult bilingual speakers produced equally consistent speech movement patterns in their production of L1 and L2. When groups were sorted according to proficiency, highly proficient speakers were marginally more variable in their L1. In addition, there were some phoneme-specific effects, most markedly that segments not shared by both languages were treated differently in production. Consistent with previous studies, movement durations were longer for less proficient speakers in both L1 and L2. Interpretation: In contrast to those of child learners, the speech motor systems of adult L2 speakers show a high degree of consistency. Such lack of variability presumably contributes to protracted difficulties with acquiring nativelike pronunciation in L2. The proficiency results suggest bidirectional interactions across L1 and L2, which is consistent with hypotheses regarding interference and the sharing of phonological space. A slower speech rate in less proficient speakers implies that there are increased task demands on speech production processes. PMID:18367680

  2. English Speakers Attend More Strongly than Spanish Speakers to Manner of Motion when Classifying Novel Objects and Events

    Science.gov (United States)

    Kersten, Alan W.; Meissner, Christian A.; Lechuga, Julia; Schwartz, Bennett L.; Albrechtsen, Justin S.; Iglesias, Adam

    2010-01-01

    Three experiments provide evidence that the conceptualization of moving objects and events is influenced by one's native language, consistent with linguistic relativity theory. Monolingual English speakers and bilingual Spanish/English speakers tested in an English-speaking context performed better than monolingual Spanish speakers and bilingual…

  3. Nonoccurrence of Negotiation of Meaning in Task-Based Synchronous Computer-Mediated Communication

    Science.gov (United States)

    Van Der Zwaard, Rose; Bannink, Anne

    2016-01-01

    This empirical study investigated the occurrence of meaning negotiation in an interactive synchronous computer-mediated second language (L2) environment. Sixteen dyads (N = 32) consisting of nonnative speakers (NNSs) and native speakers (NSs) of English performed 2 different tasks using videoconferencing and written chat. The data were coded and…

  4. Communication Interface for Mexican Spanish Dysarthric Speakers

    Directory of Open Access Journals (Sweden)

    Gladys Bonilla-Enriquez

    2012-03-01

    Full Text Available Dysarthria is a motor speech disorder due to weakness or poor coordination of the speech muscles. This condition can be caused by a stroke, cerebral palsy, or by a traumatic brain injury. For Mexican people with this condition there are few, if any, assistive technologies to improve their social interaction skills. In this paper we present our advances towards the development of a communication interface for dysarthric speakers whose native language is Mexican Spanish. We propose a methodology that relies on (1) special design of a training normal-speech corpus with limited resources, (2) standard speaker adaptation, and (3) control of language model perplexity, to achieve high Automatic Speech Recognition (ASR) accuracy. The interface allows the user and therapist to perform tasks such as dynamic speaker adaptation, vocabulary adaptation, and text-to-speech synthesis. Live tests were performed with a user with mild dysarthria, achieving accuracies of 93%-95% for spontaneous speech.

  5. The Speaker Gender Gap at Critical Care Conferences.

    Science.gov (United States)

    Mehta, Sangeeta; Rose, Louise; Cook, Deborah; Herridge, Margaret; Owais, Sawayra; Metaxa, Victoria

    2018-06-01

    To review women's participation as faculty at five critical care conferences over 7 years. Retrospective analysis of five scientific programs to identify the proportion of females and each speaker's profession based on conference conveners, program documents, or internet research. Three international (European Society of Intensive Care Medicine, International Symposium on Intensive Care and Emergency Medicine, Society of Critical Care Medicine) and two national (Critical Care Canada Forum, U.K. Intensive Care Society State of the Art Meeting) annual critical care conferences held between 2010 and 2016. Female faculty speakers. None. Male speakers outnumbered female speakers at all five conferences, in all 7 years. Overall, women represented 5-31% of speakers, and female physicians represented 5-26% of speakers. Nursing and allied health professional faculty represented 0-25% of speakers; in general, more than 50% of allied health professionals were women. Over the 7 years, Society of Critical Care Medicine had the highest representation of female (27% overall) and nursing/allied health professional (16-25%) speakers; notably, male physicians substantially outnumbered female physicians in all years (62-70% vs 10-19%, respectively). Women's representation on conference program committees ranged from 0% to 40%, with Society of Critical Care Medicine having the highest representation of women (26-40%). The female proportions of speakers, physician speakers, and program committee members increased significantly over time at the Society of Critical Care Medicine and U.K. Intensive Care Society State of the Art Meeting conferences. There is a speaker gender gap at critical care conferences, with male faculty outnumbering female faculty. This gap is more marked among physician speakers than among speakers representing nursing and allied health professionals. Several organizational strategies can address this gender gap.

  6. Audiovisual perceptual learning with multiple speakers.

    Science.gov (United States)

    Mitchel, Aaron D; Gerfen, Chip; Weiss, Daniel J

    2016-05-01

    One challenge for speech perception is between-speaker variability in the acoustic parameters of speech. For example, the same phoneme (e.g. the vowel in "cat") may have substantially different acoustic properties when produced by two different speakers and yet the listener must be able to interpret these disparate stimuli as equivalent. Perceptual tuning, the use of contextual information to adjust phonemic representations, may be one mechanism that helps listeners overcome obstacles they face due to this variability during speech perception. Here we test whether visual contextual cues to speaker identity may facilitate the formation and maintenance of distributional representations for individual speakers, allowing listeners to adjust phoneme boundaries in a speaker-specific manner. We familiarized participants to an audiovisual continuum between /aba/ and /ada/. During familiarization, the "b-face" mouthed /aba/ when an ambiguous token was played, while the "D-face" mouthed /ada/. At test, the same ambiguous token was more likely to be identified as /aba/ when paired with a stilled image of the "b-face" than with an image of the "D-face." This was not the case in the control condition when the two faces were paired equally with the ambiguous token. Together, these results suggest that listeners may form speaker-specific phonemic representations using facial identity cues.

  7. Speakers' choice of frame in binary choice

    Directory of Open Access Journals (Sweden)

    Marc van Buiten

    2009-02-01

    Full Text Available A distinction is proposed between recommending for preferred choice options and recommending against non-preferred choice options. In binary choice, both recommendation modes are logically, though not psychologically, equivalent. We report empirical evidence showing that speakers recommending for preferred options predominantly select positive frames, which are less common when speakers recommend against non-preferred options. In addition, option attractiveness is shown to affect speakers' choice of frame, and adoption of recommendation mode. The results are interpreted in terms of three compatibility effects: (i) recommendation mode-valence framing compatibility: speakers' preference for positive framing is enhanced under recommending for and diminished under recommending against instructions; (ii) option attractiveness-valence framing compatibility: speakers' preference for positive framing is more pronounced for attractive than for unattractive options; and (iii) recommendation mode-option attractiveness compatibility: speakers are more likely to adopt a recommending for approach for attractive than for unattractive binary choice pairs.

  8. Linguistic and Cognitive Effects of Bilingualism with Regional Minority Languages: A Study of Sardinian–Italian Adult Speakers

    Science.gov (United States)

    Garraffa, Maria; Obregon, Mateo; Sorace, Antonella

    2017-01-01

    This study explores the effects of bilingualism in Sardinian as a regional minority language on the linguistic competence in Italian as the dominant language and on non-linguistic cognitive abilities. Sardinian/Italian adult speakers and monolingual Italian speakers living in the same geographical area of Sardinia were compared in two kinds of tasks: (a) verbal and non-verbal cognitive tasks targeting working memory and attentional control and (b) tasks of linguistic abilities in Italian focused on the comprehension of sentences differing in grammatical complexity. Although no difference was found between bilinguals and monolinguals in the cognitive control of attention, bilinguals performed better on working memory tasks. Bilinguals with lower formal education were found to be faster at comprehension of one type of complex sentence (center embedded object relative clauses). In contrast, bilinguals and monolinguals with higher education showed comparable slower processing of complex sentences. These results show that the effects of bilingualism are modulated by type of language experience and education background: positive effects of active bilingualism on the dominant language are visible in bilinguals with lower education, whereas the effects of higher literacy in Italian obliterate those of active bilingualism in bilinguals and monolinguals with higher education. PMID:29163288

  9. Speaker-dependent Dictionary-based Speech Enhancement for Text-Dependent Speaker Verification

    DEFF Research Database (Denmark)

    Thomsen, Nicolai Bæk; Thomsen, Dennis Alexander Lehmann; Tan, Zheng-Hua

    2016-01-01

    The problem of text-dependent speaker verification under noisy conditions is becoming ever more relevant, due to increased usage for authentication in real-world applications. Classical methods for noise reduction such as spectral subtraction and Wiener filtering introduce distortion and do not perform well in this setting. In this work we compare the performance of different noise reduction methods under different noise conditions in terms of speaker verification when the text is known and the system is trained on clean data (mis-matched conditions). We furthermore propose a new approach based on dictionary-based noise reduction and compare it to the baseline methods.
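
    The abstract names spectral subtraction and Wiener filtering as the classical baselines. For reference, a minimal spectral-subtraction sketch is shown below; the assumption of a leading noise-only segment and the over-subtraction and flooring parameters are illustrative, and the proposed dictionary-based method itself is not reproduced.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, noise_seconds=0.5, alpha=2.0, floor=0.02,
                         nperseg=512):
    """Classic spectral subtraction: estimate the noise magnitude spectrum from
    a leading noise-only segment, subtract it frame by frame, keep the phase."""
    f, t, Z = stft(noisy, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    hop = nperseg // 2                            # default 50% overlap
    n_noise = max(int(noise_seconds * fs / hop), 1)
    noise_mag = mag[:, :n_noise].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)  # spectral floor
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return enhanced
```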

  10. Apology Strategy in English By Native Speaker

    Directory of Open Access Journals (Sweden)

    Mezia Kemala Sari

    2016-05-01

    Full Text Available This research discussed apology strategies in English by native speakers. This descriptive study is framed within pragmatics and based on the strategy categories of the coding manual found in the CCSARP (Cross-Cultural Speech Act Realization Project). The goals of this study were to describe the apology strategies used in English by native speakers and to identify the factors influencing them. Data were collected through a questionnaire in the form of a Discourse Completion Test, which was distributed to 30 native speakers. Data were classified based on the degree of familiarity and the social distance between speaker and hearer, and the native speakers' responses were then classified by the strategy types in the coding manual. The results show that native speakers' apologies are brief, with the most frequent pattern being IFID plus offer of repair plus taking on responsibility, while alerters, explanations, and downgrading appear less frequently. The factors that influence native speakers' apology utterances are the social situation, the degree of familiarity, and the degree of the offence: the more serious the offence, the more complex the utterances the speaker tends to produce.

  11. Pitch Correlogram Clustering for Fast Speaker Identification

    Directory of Open Access Journals (Sweden)

    Nitin Jhanwar

    2004-12-01

    Full Text Available Gaussian mixture models (GMMs) are commonly used in text-independent speaker identification systems. However, for large speaker databases, their high computational run-time limits their use in online or real-time speaker identification situations. Two-stage identification systems, in which the database is partitioned into clusters based on some proximity criteria and only a single-cluster GMM is run in every test, have been suggested in the literature to speed up the identification process. However, most clustering algorithms used have shown limited success, apparently because the clustering and GMM feature spaces used are derived from similar speech characteristics. This paper presents a new clustering approach based on the concept of a pitch correlogram that captures frame-to-frame pitch variations of a speaker rather than short-time spectral characteristics like cepstral coefficients, spectral slopes, and so forth. The effectiveness of this two-stage identification process is demonstrated on the IVIE corpus of 110 speakers. The overall system achieves a run-time advantage of 500% as well as a 10% reduction of error in overall speaker identification.
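
    A loose sketch of the two-stage idea follows: summarize each enrolled speaker's frame-to-frame pitch variation as a 2-D histogram (a stand-in for the paper's pitch correlogram), cluster those summaries, and at test time run the full GMM scoring only within the matched cluster. The histogram binning, cluster count, and use of k-means are assumptions, and any F0 tracker can supply the pitch contours.

```python
import numpy as np
from sklearn.cluster import KMeans

def pitch_correlogram(f0, bins=20, f_range=(60.0, 400.0)):
    """2-D histogram of (F0[t], F0[t+1]) pairs over voiced frames; f0 <= 0
    marks unvoiced frames."""
    voiced = f0 > 0
    pairs = np.column_stack([f0[:-1], f0[1:]])[voiced[:-1] & voiced[1:]]
    H, _, _ = np.histogram2d(pairs[:, 0], pairs[:, 1], bins=bins,
                             range=[f_range, f_range], density=True)
    return H.ravel()

def build_clusters(correlograms, n_clusters=4):
    """Stage 1: partition enrolled speakers by their pitch-dynamics summaries.
    correlograms: dict speaker_id -> flattened correlogram from enrollment audio."""
    ids = list(correlograms)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(np.vstack([correlograms[s] for s in ids]))
    return km, dict(zip(ids, labels))

# Stage 2 (not shown): at test time, score full GMMs only for the speakers whose
# cluster label equals km.predict(test_correlogram.reshape(1, -1))[0].
```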

  12. Who spoke when? Audio-based speaker location estimation for diarization

    NARCIS (Netherlands)

    Dadvar, M.

    2011-01-01

    Speaker diarization is the process that detects active speakers and groups those speech signals that have been uttered by the same speaker. Generally, we can find two main applications for speaker diarization. Automatic Speech Recognition systems make use of the speaker-homogeneous clusters to adapt

  13. Speaker-specific variability of phoneme durations

    CSIR Research Space (South Africa)

    Van Heerden, CJ

    2007-11-01

    Full Text Available The durations of phonemes vary for different speakers. To this end, the correlations between phoneme durations across different speakers are studied and a novel approach to predict unknown phoneme durations from the values of known phoneme durations for a

  14. Unsupervised Speaker Change Detection for Broadcast News Segmentation

    DEFF Research Database (Denmark)

    Jørgensen, Kasper Winther; Mølgaard, Lasse Lohilahti; Hansen, Lars Kai

    2006-01-01

    This paper presents a speaker change detection system for news broadcast segmentation based on a vector quantization (VQ) approach. The system does not make any assumption about the number of speakers or speaker identity. The system uses mel frequency cepstral coefficients and change detection...
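
    A minimal sketch of VQ-based change detection in this spirit: train a small codebook on one analysis window of MFCC frames and compare the quantization distortion of the adjacent window against it, flagging peaks as candidate speaker changes. Window lengths, codebook size, and the use of k-means for codebook training are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def vq_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def change_scores(mfcc, win=200, step=50, codewords=16):
    """Slide two adjacent windows over the MFCC stream; a high distortion of the
    right window under the left window's codebook (relative to the left window's
    own distortion) suggests a speaker change between the windows."""
    scores = []
    for start in range(0, len(mfcc) - 2 * win, step):
        left = mfcc[start:start + win]
        right = mfcc[start + win:start + 2 * win]
        cb = KMeans(n_clusters=codewords, n_init=4, random_state=0).fit(left).cluster_centers_
        scores.append(vq_distortion(right, cb) / (vq_distortion(left, cb) + 1e-9))
    return np.array(scores)   # peaks mark candidate change points
```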

  15. A New Database for Speaker Recognition

    DEFF Research Database (Denmark)

    Feng, Ling; Hansen, Lars Kai

    2005-01-01

    In this paper we discuss properties of speech databases used for speaker recognition research and evaluation, and we characterize some popular standard databases. The paper presents a new database called ELSDSR dedicated to speaker recognition applications. The main characteristics of this database...

  16. Relationship between deficits of verbal short-term memory and auditory impairment among Cantonese speakers with aphasia

    Directory of Open Access Journals (Sweden)

    Diana W.L. Ho

    2014-04-01

    Our results suggested that Cantonese speakers demonstrated more difficulties in non-word and lexical decision tasks, potentially due to the additional linguistic factor of lexical tone (Cutler & Chen, 1997). The importance of lexical tone in the phonological processing of tonal languages, which warrants further investigation, should be highlighted for clinical assessment as well as intervention.

  17. Speaker Segmentation and Clustering Using Gender Information

    Science.gov (United States)

    2006-02-01

    Gender information is used in the first stages of segmentation and in the clustering of opposite-gender speaker files for speaker diarization of news broadcasts. (Report AFRL-HE-WP-TP-2006-0026, Air Force Research Laboratory, February 2006; author: Brian M. Ore, General Dynamics.)

  18. (En)countering native-speakerism global perspectives

    CERN Document Server

    Holliday, Adrian; Swan, Anne

    2015-01-01

    The book addresses the issue of native-speakerism, an ideology based on the assumption that 'native speakers' of English have a special claim to the language itself, through critical qualitative studies of the lived experiences of practising teachers and students in a range of scenarios.

  19. Robust speaker recognition in noisy environments

    CERN Document Server

    Rao, K Sreenivasa

    2014-01-01

    This book discusses speaker recognition methods that deal with realistic, variable noisy environments. The text covers authentication systems that are robust to noisy background environments, function in real time and can be incorporated in mobile devices. The book focuses on different approaches to enhance the accuracy of speaker recognition in the presence of varying background environments. The authors examine: (a) feature compensation using multiple background models, (b) feature mapping using data-driven stochastic models, (c) design of a supervector-based GMM-SVM framework for robust speaker recognition, (d) total variability modeling (i-vectors) in a discriminative framework and (e) a boosting method to fuse evidence from multiple SVM models.

  20. FPGA Implementation for GMM-Based Speaker Identification

    Directory of Open Access Journals (Sweden)

    Phaklen EhKan

    2011-01-01

    Full Text Available In today's society, highly accurate personal identification systems are required. Passwords or PIN numbers can be forgotten or forged and are no longer considered to offer a high level of security. The use of biological features, biometrics, is becoming widely accepted as the next level for security systems. Biometric-based speaker identification is a method of identifying persons from their voice. Speaker-specific characteristics exist in speech signals due to different speakers having different resonances of the vocal tract. These differences can be exploited by extracting feature vectors such as Mel-Frequency Cepstral Coefficients (MFCCs) from the speech signal. A well-known statistical modelling process, the Gaussian Mixture Model (GMM), then models the distribution of each speaker's MFCCs in a multidimensional acoustic space. The GMM-based speaker identification system has features that make it promising for hardware acceleration. This paper describes the hardware implementation for classification of a text-independent GMM-based speaker identification system. The aim was to produce a system that can perform simultaneous identification of large numbers of voice streams in real time. This has important potential applications in security and in automated call centres. A speedup factor of ninety was achieved compared to a software implementation on a standard PC.
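
    The MFCC-plus-GMM pipeline that the FPGA work accelerates can be prototyped in a few lines. The sketch below, assuming librosa and scikit-learn, trains one diagonal-covariance GMM per enrolled speaker and identifies a test file by the highest average log-likelihood; the 16 kHz sample rate, 13 coefficients and 16 mixtures are illustrative choices, not the paper's configuration.

    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    def mfcc_features(wav_path, n_mfcc=13):
        y, sr = librosa.load(wav_path, sr=16000)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (n_frames, n_mfcc)

    def train_speaker_models(enroll_wavs, n_mix=16):
        # enroll_wavs: dict speaker_id -> list of enrollment wav paths
        models = {}
        for spk, paths in enroll_wavs.items():
            feats = np.vstack([mfcc_features(p) for p in paths])
            models[spk] = GaussianMixture(n_mix, covariance_type='diag',
                                          random_state=0).fit(feats)
        return models

    def identify(test_wav, models):
        feats = mfcc_features(test_wav)
        # average per-frame log-likelihood under each speaker's GMM
        scores = {spk: gmm.score(feats) for spk, gmm in models.items()}
        return max(scores, key=scores.get)

    The per-frame Gaussian evaluations inside the scoring loop are the part of such a system that lends itself most naturally to hardware parallelisation.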

  1. A system of automatic speaker recognition on a minicomputer

    International Nuclear Information System (INIS)

    El Chafei, Cherif

    1978-01-01

    This study describes a system of automatic speaker recognition based on voice pitch. The pre-processing consists of extracting the speakers' discriminating characteristics from the pitch. The recognition program first performs a preselection and then calculates the distance between the characteristics of the speaker to be recognized and those of the speakers already recorded. A recognition experiment was carried out with 15 speakers and comprised 566 tests spread over an intermittent period of four months. The discriminating characteristics used offer several interesting qualities. The algorithms for measuring the characteristics, on the one hand, and for classifying the speakers, on the other, are simple. The results obtained in real time on a minicomputer are satisfactory; they could probably be improved by considering other discriminating speaker characteristics, but this was unfortunately beyond our means. (author) [fr

  2. Speaker diarization system using HXLPS and deep neural network

    Directory of Open Access Journals (Sweden)

    V. Subba Ramaiah

    2018-03-01

    Full Text Available In general, speaker diarization is defined as the process of segmenting the input speech signal and grouping the homogeneous regions with regard to speaker identity. The main idea behind this system is that it is able to discriminate between speakers by assigning a label to each speaker's signal. Due to the rapid growth of broadcast and meeting data, speaker diarization has become a demanding step in enhancing the readability of speech transcriptions. To address this, Holoentropy with the eXtended Linear Prediction using autocorrelation Snapshot (HXLPS) and a deep neural network (DNN) are proposed for the speaker diarization system. The HXLPS extraction method is newly developed by incorporating Holoentropy with the XLPS. Once the features are obtained, speech and non-speech signals are detected by a Voice Activity Detection (VAD) method. Then, an i-vector representation of every segmented signal is obtained using a Universal Background Model (UBM). Consequently, the DNN is utilized to assign a label to each speaker signal, which is then clustered according to the speaker label. The performance is analysed using evaluation metrics such as tracking distance, false alarm rate and diarization error rate. The proposed method achieves better diarization performance, with a DER of 1.36% with respect to the lambda value and a DER of 2.23% with respect to the frame length. Keywords: Speaker diarization, HXLPS feature extraction, Voice activity detection, Deep neural network, Speaker clustering, Diarization Error Rate (DER)
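
    The overall flow (voice activity detection, a per-segment representation, then grouping segments by speaker) can be illustrated with a much simpler stand-in pipeline. The sketch below substitutes an energy-based VAD, mean-MFCC segment embeddings and agglomerative clustering for the paper's HXLPS features, UBM i-vectors and DNN labelling; it shows only the shape of a diarization system, not the proposed method.

    import numpy as np
    import librosa
    from sklearn.cluster import AgglomerativeClustering

    def diarize(wav_path, seg_len=1.5, n_speakers=2, energy_quantile=0.3):
        y, sr = librosa.load(wav_path, sr=16000)
        hop = int(seg_len * sr)
        starts = range(0, max(len(y) - hop, 1), hop)
        segments = [(s / sr, y[s:s + hop]) for s in starts]
        # crude energy-based VAD: drop the quietest segments as non-speech
        energies = np.array([np.mean(c ** 2) for _, c in segments])
        threshold = np.quantile(energies, energy_quantile)
        kept_times, embeddings = [], []
        for (t, c), e in zip(segments, energies):
            if e > threshold:
                kept_times.append(t)
                embeddings.append(librosa.feature.mfcc(y=c, sr=sr, n_mfcc=20).mean(axis=1))
        # group the speech segments into speaker-homogeneous clusters
        labels = AgglomerativeClustering(n_clusters=n_speakers).fit_predict(np.stack(embeddings))
        return list(zip(kept_times, labels))   # (segment start time, speaker label)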

  3. Speaker emotion recognition: from classical classifiers to deep neural networks

    Science.gov (United States)

    Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

    2018-04-01

    Speaker emotion recognition is considered among the most challenging tasks in recent years. In fact, automatic systems for security, medicine or education can be improved when the affective state of speech is taken into account. In this paper, a twofold approach for speech emotion classification is proposed: first, a relevant set of features is adopted; second, numerous supervised training techniques, involving classic methods as well as deep learning, are experimented with. Experimental results indicate that a deep architecture can improve classification performance on two affective databases, the Berlin Dataset of Emotional Speech and the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset.

  4. Lexical Access in Persian Normal Speakers: Picture Naming, Verbal Fluency and Spontaneous Speech

    Directory of Open Access Journals (Sweden)

    Zahra Sadat Ghoreishi

    2014-06-01

    Full Text Available Objectives: Lexical access is the process by which the basic conceptual, syntactic and morpho-phonological information of words is activated. Most studies of lexical access have focused on picture naming. There is hardly any previous research on other parameters of lexical access, such as verbal fluency and analysis of connected speech, in normal Persian-speaking participants. This study investigates lexical access performance in normal speakers with respect to age, sex and education. Methods: The performance of 120 adult Persian speakers in three tasks, including picture naming, verbal fluency and connected speech, was examined using the "Persian Lexical Access Assessment Package". The performance of participants was compared across two gender groups (male/female), three education groups (below 5 years, between 5 and 12 years, above 12 years) and three age groups (18-35 years, 36-55 years, 56-75 years). Results: According to the findings, picture-naming performance increased with education and decreased with age. The performance of participants in phonological and semantic verbal fluency showed improvement with age and education. No significant difference was seen between males and females in the verbal fluency task. In the analysis of connected speech there were no significant differences between the age and education groups, and only mean length of utterance was significantly higher in males than in females. Discussion: The findings could serve as a preliminary scale for comparison between normal subjects and patients on lexical access tasks; furthermore, they could guide the planning of treatment goals for patients with word-finding problems according to age, gender and education.

  5. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    Science.gov (United States)

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess the speech quality of a dysarthric speaker with cerebral palsy. With consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a certain word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate for an individual dysarthric speaker before the exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were conducted on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
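
    The consistency/distinction idea lends itself to a simple distance-based recipe: measure how far repetitions of the same word lie from each other and how far different words lie from each other, and reward speech that is internally consistent yet mutually distinct. The sketch below, using DTW over per-word feature sequences, follows that general recipe only; it is not the paper's exact definition of Ψ.

    import numpy as np
    from scipy.spatial.distance import cdist

    def dtw_distance(a, b):
        # plain DTW between two feature sequences of shape (frames, dims)
        cost = cdist(a, b)
        n, m = cost.shape
        acc = np.full((n + 1, m + 1), np.inf)
        acc[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
        return acc[n, m] / (n + m)          # length-normalised path cost

    def clarity_index(word_tokens):
        # word_tokens: dict word -> list of feature sequences (repeated productions)
        intra, inter = [], []
        words = list(word_tokens)
        for w in words:                      # consistency: same word, different repetitions
            reps = word_tokens[w]
            intra += [dtw_distance(a, b) for i, a in enumerate(reps) for b in reps[i + 1:]]
        for i, w1 in enumerate(words):       # distinction: different words
            for w2 in words[i + 1:]:
                inter += [dtw_distance(a, b) for a in word_tokens[w1] for b in word_tokens[w2]]
        # consistent (small intra-word) and distinct (large inter-word) speech -> larger index
        return np.mean(inter) / (np.mean(intra) + 1e-9)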

  6. Learning speaker-specific characteristics with a deep neural architecture.

    Science.gov (United States)

    Chen, Ke; Salman, Ahmad

    2011-11-01

    Speech signals convey various yet mixed information, ranging from linguistic to speaker-specific information. However, most acoustic representations characterize all of these different kinds of information as a whole, which could hinder either a speech or a speaker recognition (SR) system from producing better performance. In this paper, we propose a novel deep neural architecture (DNA) especially for learning speaker-specific characteristics from mel-frequency cepstral coefficients, an acoustic representation commonly used in both speech recognition and SR, which results in a speaker-specific overcomplete representation. In order to learn intrinsic speaker-specific characteristics, we formulate an objective function consisting of contrastive losses in terms of speaker similarity/dissimilarity and data reconstruction losses used as regularization to normalize the interference of non-speaker-related information. Moreover, we employ a hybrid learning strategy for learning the parameters of the deep neural networks: local yet greedy layerwise unsupervised pretraining for initialization and global supervised learning for the ultimate discriminative goal. With four Linguistic Data Consortium (LDC) benchmarks and two non-English corpora, we demonstrate that our overcomplete representation is robust in characterizing various speakers, no matter whether their utterances have been used in training our DNA, and is highly insensitive to the text and language spoken. Extensive comparative studies suggest that our approach yields favorable results in speaker verification and segmentation. Finally, we discuss several issues concerning our proposed approach.
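
    A minimal PyTorch sketch of the kind of objective described above: a contrastive term that pulls same-speaker encodings together and pushes different-speaker encodings apart, plus a reconstruction term acting as a regulariser. The two-layer encoder/decoder, the margin and the loss weighting are illustrative assumptions, not the authors' architecture or training recipe.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SpeakerEncoder(nn.Module):
        def __init__(self, in_dim=39, hid=256, emb=128):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU(), nn.Linear(hid, emb))
            self.decoder = nn.Sequential(nn.Linear(emb, hid), nn.ReLU(), nn.Linear(hid, in_dim))

        def forward(self, x):
            z = self.encoder(x)
            return z, self.decoder(z)

    def contrastive_reconstruction_loss(model, x1, x2, same_speaker, margin=1.0, recon_weight=0.1):
        # x1, x2: (batch, in_dim) MFCC-based frames; same_speaker: (batch,) floats in {0, 1}
        z1, r1 = model(x1)
        z2, r2 = model(x2)
        d = F.pairwise_distance(z1, z2)
        # pull same-speaker pairs together, push different-speaker pairs beyond the margin
        contrastive = same_speaker * d.pow(2) + (1 - same_speaker) * F.relu(margin - d).pow(2)
        # reconstruction acts as regularisation against discarding non-speaker information too aggressively
        recon = F.mse_loss(r1, x1) + F.mse_loss(r2, x2)
        return contrastive.mean() + recon_weight * recon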

  7. Robustness-related issues in speaker recognition

    CERN Document Server

    Zheng, Thomas Fang

    2017-01-01

    This book presents an overview of speaker recognition technologies with an emphasis on dealing with robustness issues. Firstly, the book gives an overview of speaker recognition, such as the basic system framework, categories under different criteria, performance evaluation and its development history. Secondly, with regard to robustness issues, the book presents three categories, including environment-related issues, speaker-related issues and application-oriented issues. For each category, the book describes the current hot topics, existing technologies, and potential research focuses in the future. The book is a useful reference book and self-learning guide for early researchers working in the field of robust speech recognition.

  8. Real Time Recognition Of Speakers From Internet Audio Stream

    Directory of Open Access Journals (Sweden)

    Weychan Radoslaw

    2015-09-01

    Full Text Available In this paper we present an automatic speaker recognition technique based on lossy (encoded) speech signal streams from Internet radio. We show the influence of the audio encoder (e.g., bitrate) on the speaker model quality. The model of each speaker was calculated with the use of the Gaussian mixture model (GMM) approach. Both the speaker recognition and the further analysis were realized with the use of short utterances to facilitate real-time processing. The neighborhoods of the speaker models were analyzed with the use of the ISOMAP algorithm. The experiments were based on four 1-hour public debates with 7–8 speakers (including the moderator), acquired from Polish Internet radio services. The presented software was developed in the MATLAB environment.
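
    The neighbourhood analysis mentioned above can be approximated by representing each short-utterance GMM by a supervector of its component means and projecting these with ISOMAP; the sketch below, assuming scikit-learn, does exactly that. Using stacked Gaussian means as the model representation and sorting components for comparability are simplifying assumptions, not the authors' procedure.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.manifold import Isomap

    def gmm_supervector(features, n_mix=8):
        # features: (n_frames, dim) MFCCs from one speaker's short utterances
        gmm = GaussianMixture(n_mix, covariance_type='diag', random_state=0).fit(features)
        order = np.argsort(gmm.means_[:, 0])      # fix a component order for comparability
        return gmm.means_[order].ravel()

    def speaker_model_map(per_speaker_features, n_neighbors=3):
        names = sorted(per_speaker_features)
        supervectors = np.stack([gmm_supervector(per_speaker_features[s]) for s in names])
        coords = Isomap(n_neighbors=n_neighbors, n_components=2).fit_transform(supervectors)
        return dict(zip(names, coords))           # 2-D position of each speaker model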

  9. Accent Attribution in Speakers with Foreign Accent Syndrome

    Science.gov (United States)

    Verhoeven, Jo; De Pauw, Guy; Pettinato, Michele; Hirson, Allen; Van Borsel, John; Marien, Peter

    2013-01-01

    Purpose: The main aim of this experiment was to investigate the perception of Foreign Accent Syndrome in comparison to speakers with an authentic foreign accent. Method: Three groups of listeners attributed accents to conversational speech samples of 5 FAS speakers which were embedded amongst those of 5 speakers with a real foreign accent and 5…

  10. Task choice and semantic interference in picture naming

    OpenAIRE

    Piai, V.; Roelofs, A.P.A.; Schriefers, H.J.

    2015-01-01

    Evidence from dual-task performance indicates that speakers prefer not to select simultaneous responses in picture naming and another unrelated task, suggesting a response selection bottleneck in naming. In particular, when participants respond to tones with a manual response and name pictures with superimposed semantically related or unrelated distractor words, semantic interference in naming tends to be constant across stimulus onset asynchronies (SOAs) between the tone stimulus and the pic...

  11. Differential Modulation of Performance in Insight and Divergent Thinking Tasks with tDCS

    Science.gov (United States)

    Goel, Vinod; Eimontaite, Iveta; Goel, Amit; Schindler, Igor

    2015-01-01

    While both insight and divergent thinking tasks are used to study creativity, there are reasons to believe that the two may call upon very different mechanisms. To explore this hypothesis, we administered a verbal insight task (riddles) and a divergent thinking task (verbal fluency) to 16 native English speakers and 16 non-native English speakers…

  12. Comparison of Diarization Tools for Building Speaker Database

    Directory of Open Access Journals (Sweden)

    Eva Kiktova

    2015-01-01

    Full Text Available This paper compares open source diarization toolkits (LIUM, DiarTK, ALIZE-Lia_Ral), which were designed for extraction of speaker identity from audio records without any prior information about the analysed data. The comparative study of the diarization tools was performed for three different types of analysed data (broadcast news (BN) and TV shows). The corresponding values of the achieved DER measure are presented here. The automatic speaker diarization system developed by LIUM was able to identify speech segments belonging to speakers at a very good level. Its segmentation outputs can be used to build a speaker database.
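
    The DER measure reported by such comparisons combines missed speech, false-alarm speech and speaker-confusion time over the total amount of reference speech. A simplified frame-based version is sketched below; it assumes the hypothesis speaker labels have already been mapped to the reference labels and applies no forgiveness collar, unlike standard scoring tools such as NIST md-eval.

    import numpy as np

    def frame_der(ref, hyp, non_speech=-1):
        # ref, hyp: integer arrays of per-frame speaker labels; non_speech marks silence
        ref, hyp = np.asarray(ref), np.asarray(hyp)
        scored = ref != non_speech                              # frames with reference speech
        missed = np.sum(scored & (hyp == non_speech))           # speech labelled as silence
        false_alarm = np.sum(~scored & (hyp != non_speech))     # silence labelled as speech
        confusion = np.sum(scored & (hyp != non_speech) & (ref != hyp))  # wrong speaker
        return (missed + false_alarm + confusion) / max(int(np.sum(scored)), 1)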

  13. The Role of Task and Listener Characteristics in Second Language Listening

    Science.gov (United States)

    Brunfaut, Tineke; Révész, Andrea

    2015-01-01

    This study investigated the relationship between second language (L2) listening and a range of task and listener characteristics. More specifically, for a group of 93 nonnative English speakers, the researchers examined the extent to which linguistic complexity of the listening task input and response, and speed and explicitness of the input, were…

  14. CUERPO, CH’ULEL Y LAB ELEMENTOS DE LA CONFIGURACIÓN DE LA PERSONA TSELTAL EN YAJALÓN, CHIAPAS

    Directory of Open Access Journals (Sweden)

    Óscar Sánchez Carrillo

    2007-12-01

    Full Text Available This article analyzes the relationship between the different soul entities and their bodily counterpart, the chanul, which configure and constitute the Tseltal person in the communities of the municipality of Yajalón, in northern Chiapas. The objective is to set out the actors' representations and notions of the body and the soul entities that reside within it: the ch'ulel and the lab, among other creatures, yalak', that inhabit it. The person is thus configured with the sole purpose of tracing the line of life and its destiny in the Balumilal, the Earth-Cosmos. It is not surprising that the sacred language, k'opontik Dios, in the prayers and chants of the various religious and therapeutic rites, establishes a parallelism between the human body and the humanized Earth, a space within which an extraordinary number of supernatural beings, yalak' and chambalam, reside, and which at the same time tolerates humans and animals on its surface.

  15. Role of Speaker Cues in Attention Inference

    Directory of Open Access Journals (Sweden)

    Jin Joo Lee

    2017-10-01

    Full Text Available Current state-of-the-art approaches to emotion recognition primarily focus on modeling the nonverbal expressions of the sole individual without reference to contextual elements such as the co-presence of the partner. In this paper, we demonstrate that the accurate inference of listeners’ social-emotional state of attention depends on accounting for the nonverbal behaviors of their storytelling partner, namely their speaker cues. To gain a deeper understanding of the role of speaker cues in attention inference, we conduct investigations into real-world interactions of children (5–6 years old) storytelling with their peers. Through in-depth analysis of human–human interaction data, we first identify nonverbal speaker cues (i.e., backchannel-inviting cues) and listener responses (i.e., backchannel feedback). We then demonstrate how speaker cues can modify the interpretation of attention-related backchannels as well as serve as a means to regulate the responsiveness of listeners. We discuss the design implications of our findings toward our primary goal of developing attention recognition models for storytelling robots, and we argue that social robots can proactively use speaker cues to form more accurate inferences about the attentive state of their human partners.

  16. Speaker and Observer Perceptions of Physical Tension during Stuttering.

    Science.gov (United States)

    Tichenor, Seth; Leslie, Paula; Shaiman, Susan; Yaruss, J Scott

    2017-01-01

    Speech-language pathologists routinely assess physical tension during evaluation of those who stutter. If speakers experience tension that is not visible to clinicians, then judgments of severity may be inaccurate. This study addressed this potential discrepancy by comparing judgments of tension by people who stutter and expert clinicians to determine if clinicians could accurately identify the speakers' experience of physical tension. Ten adults who stutter were audio-video recorded in two speaking samples. Two board-certified specialists in fluency evaluated the samples using the Stuttering Severity Instrument-4 and a checklist adapted for this study. Speakers rated their tension using the same forms, and then discussed their experiences in a qualitative interview so that themes related to physical tension could be identified. The degree of tension reported by speakers was higher than that observed by specialists. Tension in parts of the body that were less visible to the observer (chest, abdomen, throat) was reported more by speakers than by specialists. The thematic analysis revealed that speakers' experience of tension changes over time and that these changes may be related to speakers' acceptance of stuttering. The lack of agreement between speaker and specialist perceptions of tension suggests that using self-reports is a necessary component for supporting the accurate diagnosis of tension in stuttering. © 2018 S. Karger AG, Basel.

  17. Forensic speaker recognition

    NARCIS (Netherlands)

    Meuwly, Didier

    2013-01-01

    The aim of forensic speaker recognition is to establish links between individuals and criminal activities, through audio speech recordings. This field is multidisciplinary, combining predominantly phonetics, linguistics, speech signal processing, and forensic statistics. On these bases, expert-based

  18. Data-Model Relationship in Text-Independent Speaker Recognition

    Directory of Open Access Journals (Sweden)

    Stapert Robert

    2005-01-01

    Full Text Available Text-independent speaker recognition systems such as those based on Gaussian mixture models (GMMs) do not include time sequence information (TSI) within the model itself. The level of importance of TSI in speaker recognition is an interesting question and one addressed in this paper. Recent work has shown that the utilisation of higher-level information such as idiolect, pronunciation, and prosodics can be useful in reducing speaker recognition error rates. In accordance with these developments, the aim of this paper is to show that as more data becomes available, the basic GMM can be enhanced by utilising TSI, even in a text-independent mode. This paper presents experimental work incorporating TSI into the conventional GMM. The resulting system, known as the segmental mixture model (SMM), embeds dynamic time warping (DTW) into a GMM framework. Results are presented on the 2000-speaker SpeechDat Welsh database, which show improved speaker recognition performance with the SMM.

  19. Inferring speaker attributes in adductor spasmodic dysphonia: ratings from unfamiliar listeners.

    Science.gov (United States)

    Isetti, Derek; Xuereb, Linnea; Eadie, Tanya L

    2014-05-01

    To determine whether unfamiliar listeners' perceptions of speakers with adductor spasmodic dysphonia (ADSD) differ from control speakers on the parameters of relative age, confidence, tearfulness, and vocal effort and are related to speaker-rated vocal effort or voice-specific quality of life. Twenty speakers with ADSD (including 6 speakers with ADSD plus tremor) and 20 age- and sex-matched controls provided speech recordings, completed a voice-specific quality-of-life instrument (Voice Handicap Index; Jacobson et al., 1997), and rated their own vocal effort. Twenty listeners evaluated speech samples for relative age, confidence, tearfulness, and vocal effort using rating scales. Listeners judged speakers with ADSD as sounding significantly older, less confident, more tearful, and more effortful than control speakers (p < .01). Increased vocal effort was strongly associated with decreased speaker confidence (rs = .88-.89) and sounding more tearful (rs = .83-.85). Self-rated speaker effort was moderately related (rs = .45-.52) to listener impressions. Listeners' perceptions of confidence and tearfulness were also moderately associated with higher Voice Handicap Index scores (rs = .65-.70). Unfamiliar listeners judge speakers with ADSD more negatively than control speakers, with judgments extending beyond typical clinical measures. The results have implications for counseling and understanding the psychosocial effects of ADSD.

  20. Examining age-related differences in auditory attention control using a task-switching procedure.

    Science.gov (United States)

    Lawo, Vera; Koch, Iring

    2014-03-01

    Using a novel task-switching variant of dichotic selective listening, we examined age-related differences in the ability to intentionally switch auditory attention between 2 speakers defined by their sex. In our task, young (M age = 23.2 years) and older adults (M age = 66.6 years) performed a numerical size categorization on spoken number words. The task-relevant speaker was indicated by a cue prior to auditory stimulus onset. The cuing interval was either short or long and varied randomly trial by trial. We found clear performance costs with instructed attention switches. These auditory attention switch costs decreased with prolonged cue-stimulus interval. Older adults were generally much slower (but not more error prone) than young adults, but switching-related effects did not differ across age groups. These data suggest that the ability to intentionally switch auditory attention in a selective listening task is not compromised in healthy aging. We discuss the role of modality-specific factors in age-related differences.

  1. Speaker-dependent Multipitch Tracking Using Deep Neural Networks

    Science.gov (United States)

    2015-01-01

    ...sentences spoken by each of 34 speakers (18 male, 16 female). Two male and two female speakers (No. 1, 2, 18, 20, same as [30]), denoted MA1, MA2, FE1 and FE2, form the evaluated speaker pairs (MA1-MA2, MA1-FE1, MA1-FE2, MA2-FE1, MA2-FE2, FE1-FE2). [Figure 6: Multipitch tracking results on a test mixture (pbbv6n and priv3n) for the MA1-MA2 speaker pair.]

  2. Joint Single-Channel Speech Separation and Speaker Identification

    DEFF Research Database (Denmark)

    Mowlaee, Pejman; Saeidi, Rahim; Tan, Zheng-Hua

    2010-01-01

    In this paper, we propose a closed loop system to improve the performance of single-channel speech separation in a speaker-independent scenario. The system is composed of two interconnected blocks: a separation block and a speaker identification block. The improvement is accomplished by incorporating ... enhances the quality of the separated output signals. To assess the improvements, the results are reported in terms of PESQ for both target and masked signals.

  3. Communicating with the crowd: speakers use abstract messages when addressing larger audiences.

    Science.gov (United States)

    Joshi, Priyanka D; Wakslak, Cheryl J

    2014-02-01

    Audience characteristics often shape communicators' message framing. Drawing from construal level theory, we suggest that when speaking to many individuals, communicators frame messages in terms of superordinate characteristics that focus attention on the essence of the message. On the other hand, when communicating with a single individual, communicators increasingly describe events and actions in terms of their concrete details. Using different communication tasks and measures of construal, we show that speakers communicating with many individuals, compared with 1 person, describe events more abstractly (Study 1), describe themselves as more trait-like (Study 2), and use more desirability-related persuasive messages (Study 3). Furthermore, speakers' motivation to communicate with their audience moderates their tendency to frame messages based on audience size (Studies 3 and 4). This audience-size abstraction effect is eliminated when a large audience is described as homogeneous, suggesting that people use abstract construal strategically in order to connect across a disparate group of individuals (Study 5). Finally, we show that participants' experienced fluency in communication is influenced by the match between message abstraction and audience size (Study 6).

  4. Automatic Speaker Recognition for Mobile Forensic Applications

    Directory of Open Access Journals (Sweden)

    Mohammed Algabri

    2017-01-01

    Full Text Available Presently, lawyers, law enforcement agencies, and judges in courts use speech and other biometric features to recognize suspects. In general, speaker recognition is used for discriminating people based on their voices. The process of determining whether a suspected speaker is the source of a trace is called forensic speaker recognition. In such applications, the voice samples are most probably noisy, the recording sessions might mismatch each other, the sessions might not contain sufficient recording for recognition purposes, and the suspect voices are recorded through a mobile channel. The identification of a person through his voice within a forensic-quality context is challenging. In this paper, we propose a method for forensic speaker recognition for the Arabic language; the King Saud University Arabic Speech Database is used for obtaining experimental results. The advantage of this database is that each speaker’s voice is recorded in both clean and noisy environments, through a microphone and a mobile channel. This diversity facilitates its usage in forensic experimentation. Mel-Frequency Cepstral Coefficients are used for feature extraction and the Gaussian mixture model-universal background model is used for speaker modeling. Our approach has shown low equal error rates (EER) in noisy environments and with very short test samples.
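
    Two pieces of the evaluation above are easy to make concrete: the GMM-UBM decision score (the average log-likelihood ratio between a speaker model and a universal background model) and the equal error rate, the operating point where false acceptances and false rejections meet. The sketch below assumes scikit-learn GMMs and omits MAP adaptation of the speaker model from the UBM.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def llr_score(test_feats, speaker_gmm, ubm):
        # average per-frame log-likelihood ratio: speaker model vs background model
        return speaker_gmm.score(test_feats) - ubm.score(test_feats)

    def equal_error_rate(target_scores, impostor_scores):
        # sweep thresholds over all observed scores and return the point where FAR ~= FRR
        thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
        best_gap, eer = np.inf, 1.0
        for t in thresholds:
            far = np.mean(impostor_scores >= t)   # false acceptance rate
            frr = np.mean(target_scores < t)      # false rejection rate
            if abs(far - frr) < best_gap:
                best_gap, eer = abs(far - frr), (far + frr) / 2
        return eer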

  5. Request Strategies in Everyday Interactions of Persian and English Speakers

    Directory of Open Access Journals (Sweden)

    Shiler Yazdanfar

    2016-12-01

    Full Text Available Cross-cultural studies of speech acts in different linguistic contexts might have interesting implications for language researchers and practitioners. Drawing on Speech Act Theory, the present study aimed at conducting a comparative study of the request speech act in Persian and English. Specifically, the study endeavored to explore the request strategies used in daily interactions of Persian and English speakers based on directness level and supportive moves. To this end, English and Persian TV series were observed and requestive utterances were transcribed. The utterances were then categorized based on Blum-Kulka and Olshtain’s Cross-Cultural Study of Speech Act Realization Pattern (CCSARP) for directness level and internal and external mitigation devices. According to the results, although speakers of both languages opted for the direct level as their most frequently used strategy in their daily interactions, the English speakers used more conventionally indirect strategies than the Persian speakers did, and the Persian speakers used more non-conventionally indirect strategies than the English speakers did. Furthermore, the analyzed data revealed the fact that American English speakers use more mitigation devices in their daily interactions with friends and family members than Persian speakers.

  6. Speaker Reliability Guides Children's Inductive Inferences about Novel Properties

    Science.gov (United States)

    Kim, Sunae; Kalish, Charles W.; Harris, Paul L.

    2012-01-01

    Prior work shows that children can make inductive inferences about objects based on their labels rather than their appearance (Gelman, 2003). A separate line of research shows that children's trust in a speaker's label is selective. Children accept labels from a reliable speaker over an unreliable speaker (e.g., Koenig & Harris, 2005). In the…

  7. Comparison of different speech tasks among adults who stutter and adults who do not stutter

    Directory of Open Access Journals (Sweden)

    Ana Paula Ritto

    2016-03-01

    Full Text Available OBJECTIVES: In this study, we compared the performance of both fluent speakers and people who stutter in three different speaking situations: monologue speech, oral reading and choral reading. This study follows the assumption that the neuromotor control of speech can be influenced by external auditory stimuli in both speakers who stutter and speakers who do not stutter. METHOD: Seventeen adults who stutter and seventeen adults who do not stutter were assessed in three speaking tasks: monologue, oral reading (solo reading aloud) and choral reading (reading in unison with the evaluator). Speech fluency and rate were measured for each task. RESULTS: The participants who stuttered had a lower frequency of stuttering during choral reading than during monologue and oral reading. CONCLUSIONS: According to the dual premotor system model, choral speech enhanced fluency by providing external cues for the timing of each syllable, compensating for deficient internal cues.

  8. Guest Speakers in School-Based Sexuality Education

    Science.gov (United States)

    McRee, Annie-Laurie; Madsen, Nikki; Eisenberg, Marla E.

    2014-01-01

    This study, using data from a statewide survey (n = 332), examined teachers' practices regarding the inclusion of guest speakers to cover sexuality content. More than half of teachers (58%) included guest speakers. In multivariate analyses, teachers who taught high school, had professional preparation in health education, or who received…

  9. The Communication of Public Speaking Anxiety: Perceptions of Asian and American Speakers.

    Science.gov (United States)

    Martini, Marianne; And Others

    1992-01-01

    Finds that U.S. audiences perceive Asian speakers to have more speech anxiety than U.S. speakers, even though Asian speakers do not self-report higher anxiety levels. Confirms that speech state anxiety is not communicated effectively between speakers and audiences for Asian or U.S. speakers. (SR)

  10. The mechanism of speech processing in congenital amusia: evidence from Mandarin speakers.

    Directory of Open Access Journals (Sweden)

    Fang Liu

    Full Text Available Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results.

  11. The mechanism of speech processing in congenital amusia: evidence from Mandarin speakers.

    Science.gov (United States)

    Liu, Fang; Jiang, Cunmei; Thompson, William Forde; Xu, Yi; Yang, Yufang; Stewart, Lauren

    2012-01-01

    Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results.

  12. Task choice and semantic interference in picture naming.

    Science.gov (United States)

    Piai, Vitória; Roelofs, Ardi; Schriefers, Herbert

    2015-05-01

    Evidence from dual-task performance indicates that speakers prefer not to select simultaneous responses in picture naming and another unrelated task, suggesting a response selection bottleneck in naming. In particular, when participants respond to tones with a manual response and name pictures with superimposed semantically related or unrelated distractor words, semantic interference in naming tends to be constant across stimulus onset asynchronies (SOAs) between the tone stimulus and the picture-word stimulus. In the present study, we examine whether semantic interference in picture naming depends on SOA in case of a task choice (naming the picture vs reading the word of a picture-word stimulus) based on tones. This situation requires concurrent processing of the tone stimulus and the picture-word stimulus, but not a manual response to the tones. On each trial, participants either named a picture or read aloud a word depending on the pitch of a tone, which was presented simultaneously with picture-word onset or 350 ms or 1000 ms before picture-word onset. Semantic interference was present with tone pre-exposure, but absent when tone and picture-word stimulus were presented simultaneously. Against the background of the available studies, these results support an account according to which speakers tend to avoid concurrent response selection, but can engage in other types of concurrent processing, such as task choices. Copyright © 2015 Elsevier B.V. All rights reserved.

  13. Application of Native Speaker Models for Identifying Deviations in Rhetorical Moves in Non-Native Speaker Manuscripts

    Directory of Open Access Journals (Sweden)

    Assef Khalili

    2016-06-01

    Full Text Available Introduction: Explicit teaching of the generic conventions of a text genre, usually extracted from native-speaker (NS) manuscripts, has long been emphasized in the teaching of Academic Writing in English for Specific Purposes (henceforth ESP) classes, both in theory and practice. While consciousness-raising about rhetorical structure can be instrumental for non-native speakers (NNS), it has to be admitted that most work done in the field of ESP has tended to focus almost exclusively on NS productions, giving scant attention to NNS manuscripts. That is, having outlined established norms for good writing on the basis of NS productions, few have been inclined to provide a descriptive account of NNS attempts at producing a research article (RA) in English. That is what we have tried to do in the present research. Methods: We randomly selected 20 RAs in dentistry and used two well-established models for the results and discussion sections to describe the move structure of these articles and show the points of divergence from the established norms. Results: The results pointed to significant divergences that could seriously compromise the quality of an RA. Conclusion: It is believed that the insights gained on the deviations in NNS manuscripts could prove very useful in designing syllabi for ESP classes.

  14. Speaker Clustering for a Mixture of Singing and Reading (Preprint)

    Science.gov (United States)

    2012-03-01

    Speaker diarization [2, 3], which answers the question of "who spoke when?", is a combination of speaker segmentation and clustering. Although it is possible to ... focuses on speaker clustering, the techniques developed here can be applied to speaker diarization.

  15. Human and automatic speaker recognition over telecommunication channels

    CERN Document Server

    Fernández Gallardo, Laura

    2016-01-01

    This work addresses the evaluation of the human and the automatic speaker recognition performances under different channel distortions caused by bandwidth limitation, codecs, and electro-acoustic user interfaces, among other impairments. Its main contribution is the demonstration of the benefits of communication channels of extended bandwidth, together with an insight into how speaker-specific characteristics of speech are preserved through different transmissions. It provides sufficient motivation for considering speaker recognition as a criterion for the migration from narrowband to enhanced bandwidths, such as wideband and super-wideband.

  16. Electrophysiology of subject-verb agreement mediated by speakers' gender.

    Science.gov (United States)

    Hanulíková, Adriana; Carreiras, Manuel

    2015-01-01

    An important property of speech is that it explicitly conveys features of a speaker's identity such as age or gender. This event-related potential (ERP) study examined the effects of social information provided by a speaker's gender, i.e., the conceptual representation of gender, on subject-verb agreement. Despite numerous studies on agreement, little is known about syntactic computations generated by speaker characteristics extracted from the acoustic signal. Slovak is well suited to investigate this issue because it is a morphologically rich language in which agreement involves features for number, case, and gender. The grammaticality of a sentence can be evaluated by checking a speaker's gender as conveyed by his/her voice. We examined how conceptual information about speaker gender, which is not syntactic but rather social and pragmatic in nature, is interpreted for the computation of agreement patterns. ERP responses to verbs disagreeing with the speaker's gender (e.g., a sentence including a masculine verbal inflection spoken by a female person: 'the neighbors were upset because I *stole-MASC plums') elicited a larger early posterior negativity compared to correct sentences. When the agreement was purely syntactic and did not depend on the speaker's gender, a disagreement between a formally marked subject and the verb inflection (e.g., 'the woman-FEM *stole-MASC plums') resulted in a larger P600 preceded by a larger anterior negativity compared to the control sentences. This result is in line with proposals according to which the recruitment of non-syntactic information such as the gender of the speaker results in N400-like effects, while formally marked syntactic features lead to structural integration as reflected in a LAN/P600 complex.

  17. Speakers of different languages process the visual world differently.

    Science.gov (United States)

    Chabal, Sarah; Marian, Viorica

    2015-06-01

    Language and vision are highly interactive. Here we show that people activate language when they perceive the visual world, and that this language information impacts how speakers of different languages focus their attention. For example, when searching for an item (e.g., clock) in the same visual display, English and Spanish speakers look at different objects. Whereas English speakers searching for the clock also look at a cloud, Spanish speakers searching for the clock also look at a gift, because the Spanish names for gift (regalo) and clock (reloj) overlap phonologically. These different looking patterns emerge despite an absence of direct language input, showing that linguistic information is automatically activated by visual scene processing. We conclude that the varying linguistic information available to speakers of different languages affects visual perception, leading to differences in how the visual world is processed. (c) 2015 APA, all rights reserved.

  18. Multimodal Speaker Diarization

    NARCIS (Netherlands)

    Noulas, A.; Englebienne, G.; Kröse, B.J.A.

    2012-01-01

    We present a novel probabilistic framework that fuses information coming from the audio and video modality to perform speaker diarization. The proposed framework is a Dynamic Bayesian Network (DBN) that is an extension of a factorial Hidden Markov Model (fHMM) and models the people appearing in an

  19. Language production in a shared task: Cumulative Semantic Interference from self- and other-produced context words.

    Science.gov (United States)

    Hoedemaker, Renske S; Ernst, Jessica; Meyer, Antje S; Belke, Eva

    2017-01-01

    This study assessed the effects of semantic context in the form of self-produced and other-produced words on subsequent language production. Pairs of participants performed a joint picture naming task, taking turns while naming a continuous series of pictures. In the single-speaker version of this paradigm, naming latencies have been found to increase for successive presentations of exemplars from the same category, a phenomenon known as Cumulative Semantic Interference (CSI). As expected, the joint-naming task showed a within-speaker CSI effect, such that naming latencies increased as a function of the number of category exemplars named previously by the participant (self-produced items). Crucially, we also observed an across-speaker CSI effect, such that naming latencies slowed as a function of the number of category members named by the participant's task partner (other-produced items). The magnitude of the across-speaker CSI effect did not vary as a function of whether or not the listening participant could see the pictures their partner was naming. The observation of across-speaker CSI suggests that the effect originates at the conceptual level of the language system, as proposed by Belke's (2013) Conceptual Accumulation account. Whereas self-produced and other-produced words both resulted in a CSI effect on naming latencies, post-experiment free recall rates were higher for self-produced than other-produced items. Together, these results suggest that both speaking and listening result in implicit learning at the conceptual level of the language system but that these effects are independent of explicit learning as indicated by item recall. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Electropalatographic (EPG) assessment of tongue-to-palate contacts in dysarthric speakers following TBI.

    Science.gov (United States)

    Kuruvilla, Mili S; Murdoch, Bruce E; Goozee, Justine V

    2008-09-01

    The aim of the investigation was to compare EPG-derived spatial and timing measures between a group of 11 dysarthric individuals post-severe TBI and 10 age- and sex-matched neurologically non-impaired individuals. Participants in the TBI group were diagnosed with dysarthria ranging in severity from mild to moderate-severe. Each participant from the TBI and comparison groups was fitted with a custom-made artificial acrylic palate that recorded lingual-palatal contact during target consonant production in sentence- and syllable-repetition tasks at a habitual rate and loudness level. Analysis of temporal parameters between the comparison and TBI groups revealed prolonged durations of the various phases of consonant production, which were attributed to articulatory slowness, impaired speech motor control, impaired accuracy, and impaired coordination of articulatory movements in the dysarthric speakers post-TBI. For the spatial measurements, quantitative analysis, as well as visual inspection of the tongue-to-palate contact diagrams, indicated spatial aberrations in dysarthric speech post-TBI. Both the spatial and temporal aberrations may have at least partially caused the perceptual judgement of articulatory impairments in the dysarthric speakers.

  1. Studies on inter-speaker variability in speech and its application in ...

    Indian Academy of Sciences (India)

    ...tic representation of vowel realizations by different speakers ... in regional background, education level and gender of speaker. A more ... formal maps such as bilinear transform and its generalizations for speaker normalization. Since...

  2. Examining Age-Related Differences in Auditory Attention Control Using a Task-Switching Procedure

    OpenAIRE

    Vera Lawo; Iring Koch

    2014-01-01

    Objectives. Using a novel task-switching variant of dichotic selective listening, we examined age-related differences in the ability to intentionally switch auditory attention between 2 speakers defined by their sex.

  3. Content-specific coordination of listeners' to speakers' EEG during communication.

    Science.gov (United States)

    Kuhlen, Anna K; Allefeld, Carsten; Haynes, John-Dylan

    2012-01-01

    Cognitive neuroscience has recently begun to extend its focus from the isolated individual mind to two or more individuals coordinating with each other. In this study we uncover a coordination of neural activity between the ongoing electroencephalogram (EEG) of two people: a person speaking and a person listening. The EEG of one set of twelve participants ("speakers") was recorded while they were narrating short stories. The EEG of another set of twelve participants ("listeners") was recorded while watching audiovisual recordings of these stories. Specifically, listeners watched the superimposed videos of two speakers simultaneously and were instructed to attend either to one or the other speaker. This allowed us to isolate neural coordination due to processing the communicated content from the effects of sensory input. We find several neural signatures of communication: First, the EEG is more similar among listeners attending to the same speaker than among listeners attending to different speakers, indicating that listeners' EEG reflects content-specific information. Secondly, listeners' EEG activity correlates with the attended speakers' EEG, peaking at a time delay of about 12.5 s. This correlation takes place not only between homologous, but also between non-homologous brain areas in speakers and listeners. A semantic analysis of the stories suggests that listeners coordinate with speakers at the level of complex semantic representations, so-called "situation models". With this study we link a coordination of neural activity between individuals directly to verbally communicated information.
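
    The core of the reported speaker-listener coupling analysis is a correlation computed at a range of time lags between two signals. A stripped-down single-channel sketch is given below; real EEG analyses work over many channels and frequency bands with proper statistics, so this only illustrates where a peak at a delay such as 12.5 s would come from. It assumes equally long, equally sampled time courses.

    import numpy as np

    def lagged_correlation(speaker_ts, listener_ts, sfreq, max_lag_s=20.0):
        # positive lag means the listener's activity follows the speaker's
        max_lag = int(max_lag_s * sfreq)
        lags = np.arange(0, max_lag + 1)
        rs = []
        for lag in lags:
            s = speaker_ts[:len(speaker_ts) - lag]
            l = listener_ts[lag:lag + len(s)]
            rs.append(np.corrcoef(s, l)[0, 1])
        return lags / sfreq, np.array(rs)

    # usage: lags, rs = lagged_correlation(spk, lst, sfreq=250.0); peak_lag = lags[np.argmax(rs)]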

  4. Forensic Speaker Recognition Law Enforcement and Counter-Terrorism

    CERN Document Server

    Patil, Hemant

    2012-01-01

    Forensic Speaker Recognition: Law Enforcement and Counter-Terrorism is an anthology of the research findings of 35 speaker recognition experts from around the world. The volume provides a multidimensional view of the complex science involved in determining whether a suspect’s voice truly matches forensic speech samples, collected by law enforcement and counter-terrorism agencies, that are associated with the commission of a terrorist act or other crimes. While addressing such topics as the challenges of forensic case work, handling speech signal degradation, analyzing features of speaker recognition to optimize voice verification system performance, and designing voice applications that meet the practical needs of law enforcement and counter-terrorism agencies, this material all sounds a common theme: how the rigors of forensic utility are demanding new levels of excellence in all aspects of speaker recognition. The contributors are among the most eminent scientists in speech engineering and signal process...

  5. Gricean Semantics and Vague Speaker-Meaning

    OpenAIRE

    Schiffer, Stephen

    2017-01-01

    Presentations of Gricean semantics, including Stephen Neale’s in “Silent Reference,” totally ignore vagueness, even though virtually every utterance is vague. I ask how Gricean semantics might be adjusted to accommodate vague speaker-meaning. My answer is that it can’t accommodate it: the Gricean program collapses in the face of vague speaker-meaning. The Gricean might, however, find some solace in knowing that every other extant meta-semantic and semantic program is in the same boat.

  6. Effect of lisping on audience evaluation of male speakers.

    Science.gov (United States)

    Mowrer, D E; Wahl, P; Doolan, S J

    1978-05-01

    The social consequences of adult listeners' first impression of lisping were evaluated in two studies. Five adult speakers were rated by adult listeners with regard to speaking ability, intelligence, education, masculinity, and friendship. Results from both studies indicate that listeners rate adult speakers who demonstrate frontal lisping lower than nonlispers in all five categories investigated. Efforts to correct frontal lisping are justifiable on the basis of the poor impression lisping speakers make on the listener.

  7. Brain activations during bimodal dual tasks depend on the nature and combination of component tasks

    Directory of Open Access Journals (Sweden)

    Emma eSalo

    2015-02-01

    Full Text Available We used functional magnetic resonance imaging to investigate brain activations during nine different dual tasks in which the participants were required to simultaneously attend to concurrent streams of spoken syllables and written letters. They performed a phonological, spatial or simple (speaker-gender or font-shade) discrimination task within each modality. We expected to find activations associated specifically with dual tasking especially in the frontal and parietal cortices. However, no brain areas showed systematic dual task enhancements common to all dual tasks. Further analysis revealed that dual tasks including component tasks that were, according to Baddeley’s model, modality atypical, that is, the auditory spatial task or the visual phonological task, were not associated with enhanced frontal activity. In contrast, for other dual tasks, activity specifically associated with dual tasking was found in the left or bilateral frontal cortices. Enhanced activation in parietal areas, however, appeared not to be specifically associated with dual tasking per se, but rather with intermodal attention switching. We also expected effects of dual tasking in left frontal supramodal phonological processing areas when both component tasks required phonological processing, and in right parietal supramodal spatial processing areas when both tasks required spatial processing. However, no such effects were found during these dual tasks compared with their component tasks performed separately. Taken together, the current results indicate that activations during dual tasks depend in a complex manner on the specific demands of the component tasks.

  8. Consistency between verbal and non-verbal affective cues: a clue to speaker credibility.

    Science.gov (United States)

    Gillis, Randall L; Nilsen, Elizabeth S

    2017-06-01

    Listeners are exposed to inconsistencies in communication; for example, when speakers' words (i.e. verbal) are discrepant with their demonstrated emotions (i.e. non-verbal). Such inconsistencies introduce ambiguity, which may render a speaker a less credible source of information. Two experiments examined whether children make credibility discriminations based on the consistency of speakers' affect cues. In Experiment 1, school-age children (7- to 8-year-olds) preferred to solicit information from consistent speakers (e.g. those who provided a negative statement with negative affect), over novel speakers, to a greater extent than they preferred to solicit information from inconsistent speakers (e.g. those who provided a negative statement with positive affect) over novel speakers. Preschoolers (4- to 5-year-olds) did not demonstrate this preference. Experiment 2 showed that school-age children's ratings of speakers were influenced by speakers' affect consistency when the attribute being judged was related to information acquisition (speakers' believability, "weird" speech), but not general characteristics (speakers' friendliness, likeability). Together, findings suggest that school-age children are sensitive to, and use, the congruency of affect cues to determine whether individuals are credible sources of information.

  9. Cost-Sensitive Learning for Emotion Robust Speaker Recognition

    Directory of Open Access Journals (Sweden)

    Dongdong Li

    2014-01-01

    Full Text Available In the field of information security, voice is one of the most important parts of biometrics. Especially with the development of voice communication through the Internet or telephone system, huge voice data resources are accessed. In speaker recognition, the voiceprint can be applied as a unique password for the user to prove his/her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of the speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technology to reweight the probability of test affective utterances at the pitch envelope level, which can effectively enhance the robustness of emotion-dependent speaker recognition. Based on that technology, a new architecture of the recognition system as well as its components is proposed in this paper. The experiment conducted on the Mandarin Affective Speech Corpus shows that an improvement of 8% in identification rate over traditional speaker recognition is achieved.

  10. Cost-sensitive learning for emotion robust speaker recognition.

    Science.gov (United States)

    Li, Dongdong; Yang, Yingchun; Dai, Weihui

    2014-01-01

    In the field of information security, voice is one of the most important parts of biometrics. Especially with the development of voice communication through the Internet or telephone system, huge voice data resources are accessed. In speaker recognition, the voiceprint can be applied as a unique password for the user to prove his/her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of the speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technology to reweight the probability of test affective utterances at the pitch envelope level, which can effectively enhance the robustness of emotion-dependent speaker recognition. Based on that technology, a new architecture of the recognition system as well as its components is proposed in this paper. The experiment conducted on the Mandarin Affective Speech Corpus shows that an improvement of 8% in identification rate over traditional speaker recognition is achieved.
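
    As a rough illustration of the cost-sensitive rescoring idea described above, the hypothetical sketch below fuses emotion-dependent speaker scores with weights discounted by a cost vector; the emotion posterior, the cost values, and the fusion rule are assumptions for illustration, not the authors' published configuration.

      import numpy as np

      def cost_sensitive_fuse(loglik_by_emotion, emotion_posterior, emotion_cost):
          """Fuse emotion-dependent speaker scores into one score per speaker.

          loglik_by_emotion : (n_speakers, n_emotions) log-likelihoods of the test
                              utterance under each speaker's emotion-dependent model
          emotion_posterior : (n_emotions,) estimated emotion of the test utterance
          emotion_cost      : (n_emotions,) assumed misclassification cost per emotion
          """
          # down-weight emotion branches that are expensive to get wrong
          weights = np.asarray(emotion_posterior, dtype=float) / np.asarray(emotion_cost, dtype=float)
          weights /= weights.sum()
          return np.asarray(loglik_by_emotion) @ weights   # (n_speakers,)

      # hypothetical usage:
      # scores = cost_sensitive_fuse(loglik, post, cost); identified = scores.argmax()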

  11. Young Children's Sensitivity to Speaker Gender When Learning from Others

    Science.gov (United States)

    Ma, Lili; Woolley, Jacqueline D.

    2013-01-01

    This research explores whether young children are sensitive to speaker gender when learning novel information from others. Four- and 6-year-olds ("N" = 144) chose between conflicting statements from a male versus a female speaker (Studies 1 and 3) or decided which speaker (male or female) they would ask (Study 2) when learning about the functions…

  12. Fluency profile: comparison between Brazilian and European Portuguese speakers.

    Science.gov (United States)

    Castro, Blenda Stephanie Alves e; Martins-Reis, Vanessa de Oliveira; Baptista, Ana Catarina; Celeste, Letícia Correa

    2014-01-01

    The purpose of the study was to compare the speech fluency of Brazilian Portuguese speakers with that of European Portuguese speakers. The study participants were 76 individuals of any ethnicity or skin color aged 18-29 years. Of the participants, 38 lived in Brazil and 38 in Portugal. Speech samples from all participants were obtained and analyzed according to the variables of typology and frequency of speech disruptions and speech rate. Descriptive and inferential statistical analyses were performed to assess the association between the fluency profile and linguistic variant variables. We found that the speech rate of European Portuguese speakers was higher than that of Brazilian Portuguese speakers in words per minute (p=0.004). The qualitative distribution of the typology of common dysfluencies was also compared between the groups. As a fluency reference for European Portuguese speakers is not available, speech therapists in Portugal can use the same speech fluency assessment as has been used in Brazil to establish a diagnosis of stuttering, especially in regard to typical and stuttering dysfluencies, with care taken when evaluating speech rate.

  13. Cross-cultural differences in mental representations of time: evidence from an implicit nonlinguistic task.

    Science.gov (United States)

    Fuhrman, Orly; Boroditsky, Lera

    2010-11-01

    Across cultures people construct spatial representations of time. However, the particular spatial layouts created to represent time may differ across cultures. This paper examines whether people automatically access and use culturally specific spatial representations when reasoning about time. In Experiment 1, we asked Hebrew and English speakers to arrange pictures depicting temporal sequences of natural events, and to point to the hypothesized location of events relative to a reference point. In both tasks, English speakers (who read left to right) arranged temporal sequences to progress from left to right, whereas Hebrew speakers (who read right to left) arranged them from right to left, replicating previous work. In Experiments 2 and 3, we asked the participants to make rapid temporal order judgments about pairs of pictures presented one after the other (i.e., to decide whether the second picture showed a conceptually earlier or later time-point of an event than the first picture). Participants made responses using two adjacent keyboard keys. English speakers were faster to make "earlier" judgments when the "earlier" response needed to be made with the left response key than with the right response key. Hebrew speakers showed exactly the reverse pattern. Asking participants to use a space-time mapping inconsistent with the one suggested by writing direction in their language created interference, suggesting that participants were automatically creating writing-direction consistent spatial representations in the course of their normal temporal reasoning. It appears that people automatically access culturally specific spatial representations when making temporal judgments even in nonlinguistic tasks. Copyright © 2010 Cognitive Science Society, Inc.

  14. A hybrid generative-discriminative approach to speaker diarization

    NARCIS (Netherlands)

    Noulas, A.K.; van Kasteren, T.; Kröse, B.J.A.

    2008-01-01

    In this paper we present a sound probabilistic approach to speaker diarization. We use a hybrid framework where a distribution over the number of speakers at each point of a multimodal stream is estimated with a discriminative model. The output of this process is used as input in a generative model

  15. Noise Reduction with Microphone Arrays for Speaker Identification

    Energy Technology Data Exchange (ETDEWEB)

    Cohen, Z

    2011-12-22

    Reducing acoustic noise in audio recordings is an ongoing problem that plagues many applications. This noise is hard to reduce because of interfering sources and the non-stationary behavior of the overall background noise. Many single-channel noise reduction algorithms exist but are limited in that the more the noise is reduced, the more the signal of interest is distorted, because the signal and noise overlap in frequency. Specifically, acoustic background noise causes problems in the area of speaker identification. Recording a speaker in the presence of acoustic noise ultimately limits the performance and confidence of speaker identification algorithms. In situations where it is impossible to control the environment where the speech sample is taken, noise reduction filtering algorithms need to be developed to clean the recorded speech of background noise. Because single-channel noise reduction algorithms would distort the speech signal, the overall challenge of this project was to see if spatial information provided by microphone arrays could be exploited to aid in speaker identification. The goals are: (1) Test the feasibility of using microphone arrays to reduce background noise in speech recordings; (2) Characterize and compare different multichannel noise reduction algorithms; (3) Provide recommendations for using these multichannel algorithms; and (4) Ultimately answer the question - Can the use of microphone arrays aid in speaker identification?
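
    For context, a delay-and-sum beamformer is the simplest way spatial information from a microphone array can be exploited before speaker identification; the sketch below is a generic illustration under a far-field assumption, not one of the report's evaluated algorithms.

      import numpy as np

      def delay_and_sum(signals, fs, mic_positions, source_direction, c=343.0):
          """Steer a microphone array toward one direction and average channels.

          signals          : (n_mics, n_samples) synchronously sampled recordings
          mic_positions    : (n_mics, 3) microphone coordinates in metres
          source_direction : (3,) unit vector from the array toward the speaker
          """
          n_mics, n_samples = signals.shape
          # far-field arrival-time differences (sign convention is an assumption)
          delays = mic_positions @ source_direction / c
          delays -= delays.min()
          freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
          out = np.zeros(n_samples)
          for m in range(n_mics):
              spectrum = np.fft.rfft(signals[m])
              # fractional-sample delay applied as a linear phase shift
              spectrum *= np.exp(-2j * np.pi * freqs * delays[m])
              out += np.fft.irfft(spectrum, n=n_samples)
          return out / n_mics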

  16. Parametric Representation of the Speaker's Lips for Multimodal Sign Language and Speech Recognition

    Science.gov (United States)

    Ryumin, D.; Karpov, A. A.

    2017-05-01

    In this article, we propose a new method for parametric representation of the human lip region. The functional diagram of the method is described, and implementation details with an explanation of its key stages and features are given. The results of automatic detection of the regions of interest are illustrated. The speed of the method on several computers with different performance levels is reported. This universal method allows applying a parametric representation of the speaker's lips to the tasks of biometrics, computer vision, machine learning, and automatic recognition of faces, elements of sign languages, and audio-visual speech, including lip-reading.

  17. LEARNING VECTOR QUANTIZATION FOR ADAPTED GAUSSIAN MIXTURE MODELS IN AUTOMATIC SPEAKER IDENTIFICATION

    Directory of Open Access Journals (Sweden)

    IMEN TRABELSI

    2017-05-01

    Full Text Available Speaker Identification (SI) aims at automatically identifying an individual by extracting and processing information from his/her voice. Speaker voice is a robust biometric modality that has a strong impact in several application areas. In this study, a new combination learning scheme has been proposed based on the Gaussian mixture model-universal background model (GMM-UBM) and Learning Vector Quantization (LVQ) for automatic text-independent speaker identification. Feature vectors, constituted by the Mel Frequency Cepstral Coefficients (MFCC) extracted from the speech signal, are used for training and testing on the New England subset of the TIMIT database. The best results obtained were 90% for gender-independent speaker identification, 97% for male speakers, and 93% for female speakers on the test data using 36 MFCC features.
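
    A minimal closed-set sketch of the MFCC-plus-GMM part of such a pipeline is shown below (assuming librosa and scikit-learn are available); it trains one diagonal-covariance GMM per speaker rather than the full GMM-UBM/LVQ combination proposed in the paper.

      import numpy as np
      import librosa                      # assumed available for MFCC extraction
      from sklearn.mixture import GaussianMixture

      def mfcc_features(path, n_mfcc=36):
          y, sr = librosa.load(path, sr=16000)
          return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)

      def train_speaker_models(train_files_by_speaker, n_components=64):
          models = {}
          for speaker, files in train_files_by_speaker.items():
              feats = np.vstack([mfcc_features(f) for f in files])
              models[speaker] = GaussianMixture(n_components=n_components,
                                                covariance_type='diag').fit(feats)
          return models

      def identify(test_file, models):
          feats = mfcc_features(test_file)
          # GaussianMixture.score returns the average per-frame log-likelihood
          scores = {spk: gmm.score(feats) for spk, gmm in models.items()}
          return max(scores, key=scores.get)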

  18. Visual speaker gender affects vowel identification in Danish

    DEFF Research Database (Denmark)

    Larsen, Charlotte; Tøndering, John

    2013-01-01

    The experiment examined the effect of visual speaker gender on the vowel perception of 20 native Danish-speaking subjects. Auditory stimuli consisting of a continuum between /muːlə/ ‘muzzle’ and /moːlə/ ‘pier’ generated using TANDEM-STRAIGHT matched with video clips of a female and a male speaker...

  19. Bilingual and Monolingual Children Prefer Native-Accented Speakers

    Directory of Open Access Journals (Sweden)

    Andre L. Souza

    2013-12-01

    Full Text Available Adults and young children prefer to affiliate with some individuals rather than others. Studies have shown that monolingual children show in-group biases for individuals who speak their native language without a foreign accent (Kinzler, Dupoux, & Spelke, 2007). Some studies have suggested that bilingual children are less influenced than monolinguals by language variety when attributing personality traits to different speakers (Anisfeld & Lambert, 1964), which could indicate that bilinguals have fewer in-group biases and perhaps greater social flexibility. However, no previous studies have compared monolingual and bilingual children’s reactions to speakers with unfamiliar foreign accents. In the present study, we investigated the social preferences of 5-year-old English and French monolinguals and English-French bilinguals. Contrary to our predictions, both monolingual and bilingual preschoolers preferred to be friends with native-accented speakers over speakers who spoke their dominant language with an unfamiliar foreign accent. This result suggests that both monolingual and bilingual children have strong preferences for in-group members who use a familiar language variety, and that bilingualism does not lead to generalized social flexibility.

  20. Bilingual and monolingual children prefer native-accented speakers.

    Science.gov (United States)

    Souza, André L; Byers-Heinlein, Krista; Poulin-Dubois, Diane

    2013-01-01

    Adults and young children prefer to affiliate with some individuals rather than others. Studies have shown that monolingual children show in-group biases for individuals who speak their native language without a foreign accent (Kinzler et al., 2007). Some studies have suggested that bilingual children are less influenced than monolinguals by language variety when attributing personality traits to different speakers (Anisfeld and Lambert, 1964), which could indicate that bilinguals have fewer in-group biases and perhaps greater social flexibility. However, no previous studies have compared monolingual and bilingual children's reactions to speakers with unfamiliar foreign accents. In the present study, we investigated the social preferences of 5-year-old English and French monolinguals and English-French bilinguals. Contrary to our predictions, both monolingual and bilingual preschoolers preferred to be friends with native-accented speakers over speakers who spoke their dominant language with an unfamiliar foreign accent. This result suggests that both monolingual and bilingual children have strong preferences for in-group members who use a familiar language variety, and that bilingualism does not lead to generalized social flexibility.

  1. Differences in Sickness Allowance Receipt between Swedish Speakers and Finnish Speakers in Finland

    Directory of Open Access Journals (Sweden)

    Kaarina S. Reini

    2017-12-01

    Full Text Available Previous research has documented lower disability retirement and mortality rates of Swedish speakers as compared with Finnish speakers in Finland. This paper is the first to compare the two language groups with regard to the receipt of sickness allowance, which is an objective health measure reflecting a less severe state of poor health. Register-based data covering the years 1988-2011 are used. We estimate logistic regression models with generalized estimating equations to account for repeated observations at the individual level. We find that Swedish-speaking men have approximately 30 percent lower odds of receiving sickness allowance than Finnish-speaking men, whereas the difference in women is about 15 percent. In correspondence with previous research on all-cause mortality at working ages, we find no language-group difference in sickness allowance receipt in the socially most successful subgroup of the population.

  2. On the improvement of speaker diarization by detecting overlapped speech

    OpenAIRE

    Hernando Pericás, Francisco Javier

    2010-01-01

    Simultaneous speech in meeting environment is responsible for a certain amount of errors caused by standard speaker diarization systems. We are presenting an overlap detection system for far-field data based on spectral and spatial features, where the spatial features obtained on different microphone pairs are fused by means of principal component analysis. Detected overlap segments are applied for speaker diarization in order to increase the purity of speaker clusters an...

  3. Comprehending non-native speakers: theory and evidence for adjustment in manner of processing.

    Science.gov (United States)

    Lev-Ari, Shiri

    2014-01-01

    Non-native speakers have lower linguistic competence than native speakers, which renders their language less reliable in conveying their intentions. We suggest that expectations of lower competence lead listeners to adapt their manner of processing when they listen to non-native speakers. We propose that listeners use cognitive resources to adjust by increasing their reliance on top-down processes and extracting less information from the language of the non-native speaker. An eye-tracking study supports our proposal by showing that when following instructions by a non-native speaker, listeners make more contextually-induced interpretations. Those with relatively high working memory also increase their reliance on context to anticipate the speaker's upcoming reference, and are less likely to notice lexical errors in the non-native speech, indicating that they take less information from the speaker's language. These results contribute to our understanding of the flexibility in language processing and have implications for interactions between native and non-native speakers.

  4. Role of Speaker Cues in Attention Inference

    OpenAIRE

    Jin Joo Lee; Cynthia Breazeal; David DeSteno

    2017-01-01

    Current state-of-the-art approaches to emotion recognition primarily focus on modeling the nonverbal expressions of the sole individual without reference to contextual elements such as the co-presence of the partner. In this paper, we demonstrate that the accurate inference of listeners’ social-emotional state of attention depends on accounting for the nonverbal behaviors of their storytelling partner, namely their speaker cues. To gain a deeper understanding of the role of speaker cues in at...

  5. Speaker recognition through NLP and CWT modeling.

    Energy Technology Data Exchange (ETDEWEB)

    Brown-VanHoozer, A.; Kercel, S. W.; Tucker, R. W.

    1999-06-23

    The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time duration (<30 seconds), and with better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results, but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "huge population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (or verbal predicate cues, e.g., see, sound, feel, etc.) while the secondary modalities would be characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space, and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms as well as the formant frequency are evident in the CWT output. More significantly, the CWT reveals significant
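
    To illustrate the wavelet-based line of investigation, the sketch below computes a complex-Morlet scalogram of a short vowel segment with plain NumPy; it only shows how CWT magnitudes expose formant and glottal-excitation structure and is not the authors' fast CWT algorithm.

      import numpy as np

      def morlet_cwt_magnitude(x, fs, freqs, w0=6.0):
          """Return |CWT(x)| with one row per analysis frequency (Hz)."""
          x = np.asarray(x, dtype=float)
          t = (np.arange(len(x)) - len(x) / 2) / fs
          out = np.empty((len(freqs), len(x)))
          for i, f in enumerate(freqs):
              scale = w0 / (2 * np.pi * f)             # scale whose centre frequency is f
              u = t / scale
              wavelet = np.exp(1j * w0 * u) * np.exp(-0.5 * u ** 2) / np.sqrt(scale)
              # correlation with the wavelet, implemented as a convolution
              out[i] = np.abs(np.convolve(x, wavelet[::-1].conj(), mode='same'))
          return out

      # hypothetical usage on a short vowel segment sampled at 8 kHz:
      # scalogram = morlet_cwt_magnitude(vowel, fs=8000, freqs=np.linspace(80, 3500, 128))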

  6. Race in Conflict with Heritage: "Black" Heritage Language Speaker of Japanese

    Science.gov (United States)

    Doerr, Neriko Musha; Kumagai, Yuri

    2014-01-01

    "Heritage language speaker" is a relatively new term to denote minority language speakers who grew up in a household where the language was used or those who have a family, ancestral, or racial connection to the minority language. In research on heritage language speakers, overlap between these 2 definitions is often assumed--that is,…

  7. Are Cantonese-speakers really descriptivists? Revisiting cross-cultural semantics.

    Science.gov (United States)

    Lam, Barry

    2010-05-01

    In an article in Cognition, Machery, Mallon, Nichols, and Stich [Machery, E., Mallon, R., Nichols, S., & Stich, S. (2004). Semantics cross-cultural style. Cognition, 92, B1-B12] present data which purport to show that East Asian Cantonese-speakers tend to have descriptivist intuitions about the referents of proper names, while Western English-speakers tend to have causal-historical intuitions about proper names. Machery et al. take this finding to support the view that some intuitions, the universality of which they claim is central to philosophical theories, vary according to cultural background. Machery et al. conclude from their findings that the philosophical methodology of consulting intuitions about hypothetical cases is flawed vis a vis the goal of determining truths about some philosophical domains like philosophical semantics. In the following study, three new vignettes in English were given to Western native English-speakers, and Cantonese translations were given to native Cantonese-speaking immigrants from a Cantonese community in Southern California. For all three vignettes, questions were given to elicit intuitions about the referent of a proper name and the truth-value of an uttered sentence containing a proper name. The results from this study reveal that East Asian Cantonese-speakers do not differ from Western English-speakers in ways that support Machery et al.'s conclusions. This new data concerning the intuitions of Cantonese-speakers raises questions about whether cross-cultural variation in answers to questions on certain vignettes reveals genuine differences in intuitions, or whether such differences stem from non-intuitional differences, such as differences in linguistic competence. Copyright 2009 Elsevier B.V. All rights reserved.

  8. Analysis of human scream and its impact on text-independent speaker verification.

    Science.gov (United States)

    Hansen, John H L; Nandwana, Mahesh Kumar; Shokouhi, Navid

    2017-04-01

    Scream is defined as sustained, high-energy vocalizations that lack phonological structure. Lack of phonological structure is how scream is identified from other forms of loud vocalization, such as "yell." This study investigates the acoustic aspects of screams and addresses those that are known to prevent standard speaker identification systems from recognizing the identity of screaming speakers. It is well established that speaker variability due to changes in vocal effort and Lombard effect contribute to degraded performance in automatic speech systems (i.e., speech recognition, speaker identification, diarization, etc.). However, previous research in the general area of speaker variability has concentrated on human speech production, whereas less is known about non-speech vocalizations. The UT-NonSpeech corpus is developed here to investigate speaker verification from scream samples. This study considers a detailed analysis in terms of fundamental frequency, spectral peak shift, frame energy distribution, and spectral tilt. It is shown that traditional speaker recognition based on the Gaussian mixture models-universal background model framework is unreliable when evaluated with screams.

  9. Robust Digital Speech Watermarking For Online Speaker Recognition

    Directory of Open Access Journals (Sweden)

    Mohammad Ali Nematollahi

    2015-01-01

    Full Text Available A robust and blind digital speech watermarking technique has been proposed for online speaker recognition systems, based on the Discrete Wavelet Packet Transform (DWPT) and multiplication to embed the watermark in the amplitudes of the wavelet subbands. In order to minimize the degradation effect of the watermark, subbands are selected where less speaker-specific information is available (500 Hz–3500 Hz and 6000 Hz–7000 Hz). Experimental results on the Texas Instruments Massachusetts Institute of Technology (TIMIT), Massachusetts Institute of Technology (MIT), and Mobile Biometry (MOBIO) corpora show that the degradation for speaker verification and identification is 1.16% and 2.52%, respectively. Furthermore, the proposed watermark technique provides sufficient robustness against different signal processing attacks.
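
    The sketch below (assuming the PyWavelets package) shows the general shape of multiplicative embedding in wavelet-packet subbands; the node paths, decomposition depth, and embedding strength are illustrative assumptions rather than the subband selection used in the paper.

      import numpy as np
      import pywt   # PyWavelets, assumed available

      def embed_watermark(speech, bits, alpha=0.01, wavelet='db8',
                          level=4, node_paths=('aad', 'dda')):
          """Multiplicatively embed +/-1 bits into selected wavelet-packet nodes."""
          wp = pywt.WaveletPacket(data=np.asarray(speech, dtype=float),
                                  wavelet=wavelet, mode='symmetric', maxlevel=level)
          symbols = np.where(np.asarray(bits) > 0, 1.0, -1.0)
          for path in node_paths:
              coeffs = np.array(wp[path].data)
              n = min(len(coeffs), len(symbols))
              # scale each coefficient slightly up or down according to the bit
              coeffs[:n] *= 1.0 + alpha * symbols[:n]
              wp[path] = coeffs
          return wp.reconstruct(update=False)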

  10. The Main Concept Analysis: Validation and sensitivity in differentiating discourse produced by unimpaired English speakers from individuals with aphasia and dementia of Alzheimer type.

    Science.gov (United States)

    Kong, Anthony Pak-Hin; Whiteside, Janet; Bargmann, Peggy

    2016-10-01

    Discourse from speakers with dementia and aphasia is associated with comparable but not identical deficits, necessitating appropriate methods to differentiate them. The current study aims to validate the Main Concept Analysis (MCA) to be used for eliciting and quantifying discourse among native typical English speakers and to establish its norm, and investigate the validity and sensitivity of the MCA to compare discourse produced by individuals with fluent aphasia, non-fluent aphasia, or dementia of Alzheimer's type (DAT), and unimpaired elderly. Discourse elicited through a sequential picture description task was collected from 60 unimpaired participants to determine the MCA scoring criteria; 12 speakers with fluent aphasia, 12 with non-fluent aphasia, 13 with DAT, and 20 elderly participants from the healthy group were compared on the finalized MCA. Results of MANOVA revealed significant univariate omnibus effects of speaker group as an independent variable on each main concept index. MCA profiles differed significantly between all participant groups except dementia versus fluent aphasia. Correlations between the MCA performances and the Western Aphasia Battery and Cognitive Linguistic Quick Test were found to be statistically significant among the clinical groups. The MCA was appropriate to be used among native speakers of English. The results also provided further empirical evidence of discourse deficits in aphasia and dementia. Practitioners can use the MCA to evaluate discourse production systemically and objectively.

  11. A Joint Approach for Single-Channel Speaker Identification and Speech Separation

    DEFF Research Database (Denmark)

    Mowlaee, Pejman; Saeidi, Rahim; Christensen, Mads Græsbøll

    2012-01-01

    In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification, a single-channel speaker identification algorithm is proposed which provides an estimate of the signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose a sinusoidal model-based algorithm. The speech separation algorithm consists of a double-talk/single-talk detector followed by a minimum mean square error estimator of sinusoidal parameters for finding optimal codevectors from pre-trained speaker codebooks. In evaluating the proposed system, we report the objective and subjective results as well as speaker identification and automatic speech recognition accuracy. The results show that the proposed system performs as well as the best of the state-of-the-art in terms of perceived quality, while its performance in terms of speaker identification and automatic speech recognition results...
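
    As a toy illustration of sinusoidal modelling, the sketch below picks the strongest spectral peaks of one frame as (amplitude, frequency, phase) triplets; the codebook search and MMSE estimation described above are not reproduced here.

      import numpy as np
      from scipy.signal import find_peaks

      def sinusoidal_params(frame, fs, n_sines=20):
          """Return (amplitudes, frequencies in Hz, phases) of the strongest peaks."""
          windowed = frame * np.hanning(len(frame))
          spectrum = np.fft.rfft(windowed)
          magnitude = np.abs(spectrum)
          peaks, _ = find_peaks(magnitude)
          # keep the n_sines largest spectral peaks
          top = peaks[np.argsort(magnitude[peaks])[::-1][:n_sines]]
          freqs_hz = top * fs / len(frame)
          return magnitude[top], freqs_hz, np.angle(spectrum[top])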

  12. A Study on Metadiscoursive Interaction in the MA Theses of the Native Speakers of English and the Turkish Speakers of English

    Science.gov (United States)

    Köroglu, Zehra; Tüm, Gülden

    2017-01-01

    This study has been conducted to evaluate the TM usage in the MA theses written by the native speakers (NSs) of English and the Turkish speakers (TSs) of English. The purpose is to compare the TM usage in the introduction, results and discussion, and conclusion sections by both groups' randomly selected MA theses in the field of ELT between the…

  13. Speaker Recognition from Emotional Speech Using I-vector Approach

    Directory of Open Access Journals (Sweden)

    MACKOVÁ Lenka

    2014-05-01

    Full Text Available In recent years the concept of i-vectors has become very popular and successful in the field of speaker verification. The basic principle of i-vectors is that each utterance is represented by a fixed-length feature vector of low dimension. In the literature, recordings obtained from telephones or microphones have been used for speaker verification. The aim of this experiment was to perform speaker verification using a speaker model trained with emotional recordings on an i-vector basis. The Mel Frequency Cepstral Coefficients (MFCC), log energy, and their delta and acceleration coefficients were used in the feature extraction process. As classification methods of the verification system, the Mahalanobis distance metric in combination with Eigen Factor Radial normalization was used and, in a second approach, the Cosine Distance Scoring (CSS) metric with Within-class Covariance Normalization as channel compensation was employed. This verification system used emotional recordings of male subjects from the freely available German emotional database (Emo-DB).
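
    For reference, cosine distance scoring with within-class covariance normalization can be sketched in a few lines of NumPy, assuming i-vectors have already been extracted elsewhere; this is a generic illustration, not the paper's exact setup.

      import numpy as np

      def wccn_projection(ivectors, speaker_labels):
          """Cholesky factor B with B @ B.T equal to the inverse within-class covariance."""
          ivectors = np.asarray(ivectors, dtype=float)
          speaker_labels = np.asarray(speaker_labels)
          dim = ivectors.shape[1]
          W = np.zeros((dim, dim))
          speakers = np.unique(speaker_labels)
          for s in speakers:
              X = ivectors[speaker_labels == s]
              Xc = X - X.mean(axis=0)
              W += Xc.T @ Xc / len(X)
          W /= len(speakers)
          return np.linalg.cholesky(np.linalg.inv(W))

      def cosine_score(enroll_ivector, test_ivector, B):
          e, t = B.T @ enroll_ivector, B.T @ test_ivector
          return float(e @ t / (np.linalg.norm(e) * np.linalg.norm(t)))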

  14. Segmentation of the Speaker's Face Region with Audiovisual Correlation

    Science.gov (United States)

    Liu, Yuyu; Sato, Yoichi

    The ability to find the speaker's face region in a video is useful for various applications. In this work, we develop a novel technique to find this region within different time windows, which is robust against changes of view, scale, and background. The main thrust of our technique is to integrate audiovisual correlation analysis into a video segmentation framework. We analyze the audiovisual correlation locally by computing quadratic mutual information between our audiovisual features. The computation of quadratic mutual information is based on the probability density functions estimated by kernel density estimation with adaptive kernel bandwidth. The results of this audiovisual correlation analysis are incorporated into graph-cut-based video segmentation to obtain a globally optimal extraction of the speaker's face region. The setting of any heuristic threshold in this segmentation is avoided by learning the correlation distributions of speaker and background via expectation maximization. Experimental results demonstrate that our method can detect the speaker's face region accurately and robustly for different views, scales, and backgrounds.
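
    A minimal estimate of Euclidean-distance quadratic mutual information with Gaussian Parzen windows is sketched below for scalar per-frame features; a fixed kernel bandwidth is assumed here, whereas the method above uses adaptive bandwidths.

      import numpy as np

      def gaussian_gram(values, sigma):
          # pairwise Gaussian kernel evaluations between all frames
          d = values[:, None] - values[None, :]
          return np.exp(-0.5 * (d / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

      def quadratic_mutual_information(audio_feat, visual_feat, sigma_a=1.0, sigma_v=1.0):
          """Euclidean-distance QMI between two per-frame feature sequences."""
          Ka = gaussian_gram(np.asarray(audio_feat, dtype=float), sigma_a)
          Kv = gaussian_gram(np.asarray(visual_feat, dtype=float), sigma_v)
          v_joint = np.mean(Ka * Kv)                        # joint information potential
          v_marginal = Ka.mean() * Kv.mean()                # product of the marginals
          v_cross = np.mean(Ka.mean(axis=1) * Kv.mean(axis=1))
          # approaches zero when the two sequences are statistically independent
          return v_joint + v_marginal - 2.0 * v_cross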

  15. The TNO speaker diarization system for NIST RT05s meeting data

    NARCIS (Netherlands)

    Leeuwen, D.A. van

    2006-01-01

    The TNO speaker diarization system is based on a standard BIC segmentation and clustering algorithm. Since correct speech detection appears to be essential for the NIST Rich Transcription speaker diarization evaluation measure, we have developed a speech activity detector (SAD) as well.

  16. Internal request modification by first and second language speakers ...

    African Journals Online (AJOL)

    This study focuses on the question of whether Luganda English speakers would negatively transfer into their English speech the use of syntactic and lexical down graders resulting in pragmatic failure. Data were collected from Luganda and Luganda English speakers by means of a Discourse Completion Test (DCT) ...

  17. Speaker Input Variability Does Not Explain Why Larger Populations Have Simpler Languages.

    Science.gov (United States)

    Atkinson, Mark; Kirby, Simon; Smith, Kenny

    2015-01-01

    A learner's linguistic input is more variable if it comes from a greater number of speakers. Higher speaker input variability has been shown to facilitate the acquisition of phonemic boundaries, since data drawn from multiple speakers provides more information about the distribution of phonemes in a speech community. It has also been proposed that speaker input variability may have a systematic influence on individual-level learning of morphology, which can in turn influence the group-level characteristics of a language. Languages spoken by larger groups of people have less complex morphology than those spoken in smaller communities. While a mechanism by which the number of speakers could have such an effect is yet to be convincingly identified, differences in speaker input variability, which is thought to be larger in larger groups, may provide an explanation. By hindering the acquisition, and hence faithful cross-generational transfer, of complex morphology, higher speaker input variability may result in structural simplification. We assess this claim in two experiments which investigate the effect of such variability on language learning, considering its influence on a learner's ability to segment a continuous speech stream and acquire a morphologically complex miniature language. We ultimately find no evidence to support the proposal that speaker input variability influences language learning and so cannot support the hypothesis that it explains how population size determines the structural properties of language.

  18. Signal-to-Signal Ratio Independent Speaker Identification for Co-channel Speech Signals

    DEFF Research Database (Denmark)

    Saeidi, Rahim; Mowlaee, Pejman; Kinnunen, Tomi

    2010-01-01

    In this paper, we consider speaker identification for the co-channel scenario in which speech mixture from speakers is recorded by one microphone only. The goal is to identify both of the speakers from their mixed signal. High recognition accuracies have already been reported when an accurately...

  19. Does verbatim sentence recall underestimate the language competence of near-native speakers?

    Directory of Open Access Journals (Sweden)

    Judith Schweppe

    2015-02-01

    Full Text Available Verbatim sentence recall is widely used to test the language competence of native and non-native speakers since it involves comprehension and production of connected speech. However, we assume that, to maintain surface information, sentence recall relies particularly on attentional resources, which differentially affects native and non-native speakers. Since even in near-natives language processing is less automatized than in native speakers, processing a sentence in a foreign language plus retaining its surface may result in a cognitive overload. We contrasted sentence recall performance of German native speakers with that of highly proficient non-natives. Non-natives recalled the sentences significantly poorer than the natives, but performed equally well on a cloze test. This implies that sentence recall underestimates the language competence of good non-native speakers in mixed groups with native speakers. The findings also suggest that theories of sentence recall need to consider both its linguistic and its attentional aspects.

  20. Are Cantonese-Speakers Really Descriptivists? Revisiting Cross-Cultural Semantics

    Science.gov (United States)

    Lam, Barry

    2010-01-01

    In an article in "Cognition" [Machery, E., Mallon, R., Nichols, S., & Stich, S. (2004). "Semantics cross-cultural style." "Cognition, 92", B1-B12] present data which purports to show that East Asian Cantonese-speakers tend to have descriptivist intuitions about the referents of proper names, while Western English-speakers tend to have…

  1. The immediate and chronic influence of spatio-temporal metaphors on the mental representations of time in English, Mandarin, and Mandarin-English speakers

    Directory of Open Access Journals (Sweden)

    Vicky T. Lai

    2013-04-01

    Full Text Available In this paper we examine whether experience with spatial metaphors for time has an influence on people’s representation of time. In particular we ask whether spatiotemporal metaphors can have both chronic and immediate effects on temporal thinking. In Study 1, we examine the prevalence of ego-moving representations for time in Mandarin speakers, English speakers, and Mandarin-English (ME) bilinguals. As predicted by observations in linguistic analyses, we find that Mandarin speakers are less likely to take an ego-moving perspective than are English speakers. Further, we find that ME bilinguals tested in English are less likely to take an ego-moving perspective than are English monolinguals (an effect of L1 on meaning-making in L2), and also that ME bilinguals tested in Mandarin are more likely to take an ego-moving perspective than are Mandarin monolinguals (an effect of L2 on meaning-making in L1). These findings demonstrate that habits of metaphor use in one language can influence temporal reasoning in another language, suggesting the metaphors can have a chronic effect on patterns in thought. In Study 2 we test Mandarin speakers using either horizontal or vertical metaphors in the immediate context of the task. We find that Mandarin speakers are more likely to construct front-back representations of time when understanding front-back metaphors, and more likely to construct up-down representations of time when understanding up-down metaphors. These findings demonstrate that spatiotemporal metaphors can also have an immediate influence on temporal reasoning. Taken together, these findings demonstrate that the metaphors we use to talk about time have both immediate and long-term consequences for how we conceptualize and reason about this fundamental domain of experience.

  2. Utilising Tree-Based Ensemble Learning for Speaker Segmentation

    DEFF Research Database (Denmark)

    Abou-Zleikha, Mohamed; Tan, Zheng-Hua; Christensen, Mads Græsbøll

    2014-01-01

    In audio and speech processing, accurate detection of the changing points between multiple speakers in speech segments is an important stage for several applications such as speaker identification and tracking. Bayesian Information Criteria (BIC)-based approaches are the most traditionally used...... for a certain condition, the model becomes biased to the data used for training limiting the model’s generalisation ability. In this paper, we propose a BIC-based tuning-free approach for speaker segmentation through the use of ensemble-based learning. A forest of segmentation trees is constructed in which each...... tree is trained using a sampled version of the speech segment. During the tree construction process, a set of randomly selected points in the input sequence is examined as potential segmentation points. The point that yields the highest ΔBIC is chosen and the same process is repeated for the resultant...
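
    For orientation, the ΔBIC score that such segmentation trees evaluate at each candidate point can be written directly with NumPy; the sketch below uses full-covariance Gaussians and a tunable penalty weight, the kind of hand-tuned parameter the ensemble approach above aims to avoid.

      import numpy as np

      def delta_bic(features, t, lam=1.0):
          """BIC gain from splitting an (N, d) feature matrix at frame index t.

          A positive value supports a speaker change at t; both parts should
          contain comfortably more than d frames for stable covariances.
          """
          def logdet_cov(X):
              sign, logdet = np.linalg.slogdet(np.cov(X, rowvar=False))
              return logdet

          N, d = features.shape
          penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(N)
          return (0.5 * N * logdet_cov(features)
                  - 0.5 * t * logdet_cov(features[:t])
                  - 0.5 * (N - t) * logdet_cov(features[t:])
                  - penalty)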

  3. Teaching Portuguese to Spanish Speakers: A Case for Trilingualism

    Science.gov (United States)

    Carvalho, Ana M.; Freire, Juliana Luna; da Silva, Antonio J. B.

    2010-01-01

    Portuguese is the sixth-most-spoken native language in the world, with approximately 240,000,000 speakers. Within the United States, there is a growing demand for K-12 language programs to engage the community of Portuguese heritage speakers. According to the 2000 U.S. census, 85,000 school-age children speak Portuguese at home. As a result, more…

  4. Aerodynamic Characteristics of Syllable and Sentence Productions in Normal Speakers.

    Science.gov (United States)

    Thiel, Cedric; Yang, Jin; Crawley, Brianna; Krishna, Priya; Murry, Thomas

    2018-01-08

    Aerodynamic measures of subglottic air pressure (Ps) and airflow rate (AFR) are used to select behavioral voice therapy versus surgical treatment for voice disorders. However, these measures are usually taken during a series of syllables, which differs from conversational speech. Repeated syllables do not share the variation found in even simple sentences, and patients may use their best rather than typical voice unless specifically instructed otherwise. This study examined the potential differences in estimated Ps and AFR in syllable and sentence production and their effects on a measure of vocal efficiency in normal speakers. Prospective study. Measures of estimated Ps, AFR, and aerodynamic vocal efficiency (AVE) were obtained from 19 female and four male speakers ages 22-44 years with no history of voice disorders. Subjects repeated a series of /pa/ syllables and a sentence at comfortable effort level into a face mask with a pressure-sensing tube between the lips. AVE varies as a function of the speech material in normal subjects. Ps measures were significantly higher for the sentence-production samples than for the syllable-production samples. AFR was higher during sentence production than syllable production, but the difference was not statistically significant. AVE values were significantly higher for syllable versus sentence productions. The results suggest that subjects increase Ps and AFR in sentence compared with syllable production. Speaking task is a critical factor when considering measures of AVE, and this preliminary study provides a basis for further aerodynamic studies of patient populations. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  5. Speaker Introductions at Internal Medicine Grand Rounds: Forms of Address Reveal Gender Bias.

    Science.gov (United States)

    Files, Julia A; Mayer, Anita P; Ko, Marcia G; Friedrich, Patricia; Jenkins, Marjorie; Bryan, Michael J; Vegunta, Suneela; Wittich, Christopher M; Lyle, Melissa A; Melikian, Ryan; Duston, Trevor; Chang, Yu-Hui H; Hayes, Sharonne N

    2017-05-01

    Gender bias has been identified as one of the drivers of gender disparity in academic medicine. Bias may be reinforced by gender subordinating language or differential use of formality in forms of address. Professional titles may influence the perceived expertise and authority of the referenced individual. The objective of this study is to examine how professional titles were used in same- and mixed-gender speaker introductions at Internal Medicine Grand Rounds (IMGR). A retrospective observational study of video-archived speaker introductions at consecutive IMGR was conducted at two different locations (Arizona, Minnesota) of an academic medical center. Introducers and speakers at IMGR were physician and scientist peers holding MD, PhD, or MD/PhD degrees. The primary outcome was whether or not a speaker's professional title was used during the first form of address during speaker introductions at IMGR. As secondary outcomes, we evaluated whether or not the speaker's professional title was used in any form of address during the introduction. Three hundred twenty-one forms of address were analyzed. Female introducers were more likely to use professional titles when introducing any speaker during the first form of address compared with male introducers (96.2% [102/106] vs. 65.6% [141/215]). In female same-gender dyads, introducers used a formal title in the first form of address 97.8% (45/46) of the time, compared with male dyads, who utilized a formal title 72.4% (110/152) of the time (p = 0.007). In mixed-gender dyads where the introducer was female and the speaker male, formal titles were used 95.0% (57/60) of the time, whereas male introducers of female speakers utilized professional titles 49.2% (31/63) of the time; women introduced by men were thus less likely to be addressed by professional title than were men introduced by men. Differential formality in speaker introductions may amplify isolation, marginalization, and professional discomfiture expressed by women faculty in academic medicine.

  6. Shhh… I Need Quiet! Children's Understanding of American, British, and Japanese-accented English Speakers.

    Science.gov (United States)

    Bent, Tessa; Holt, Rachael Frush

    2018-02-01

    Children's ability to understand speakers with a wide range of dialects and accents is essential for efficient language development and communication in a global society. Here, the impact of regional dialect and foreign-accent variability on children's speech understanding was evaluated in both quiet and noisy conditions. Five- to seven-year-old children ( n = 90) and adults ( n = 96) repeated sentences produced by three speakers with different accents-American English, British English, and Japanese-accented English-in quiet or noisy conditions. Adults had no difficulty understanding any speaker in quiet conditions. Their performance declined for the nonnative speaker with a moderate amount of noise; their performance only substantially declined for the British English speaker (i.e., below 93% correct) when their understanding of the American English speaker was also impeded. In contrast, although children showed accurate word recognition for the American and British English speakers in quiet conditions, they had difficulty understanding the nonnative speaker even under ideal listening conditions. With a moderate amount of noise, their perception of British English speech declined substantially and their ability to understand the nonnative speaker was particularly poor. These results suggest that although school-aged children can understand unfamiliar native dialects under ideal listening conditions, their ability to recognize words in these dialects may be highly susceptible to the influence of environmental degradation. Fully adult-like word identification for speakers with unfamiliar accents and dialects may exhibit a protracted developmental trajectory.

  7. Beyond the Language: Listener Comments on Extra-Linguistic Cues in Perception Tasks

    Science.gov (United States)

    Gnevsheva, Ksenia

    2016-01-01

    We know little about what raters rely on when participating in accentedness perception tasks as their qualitative comments are rarely scrutinised. At the same time, we know that (assumed) social information influences listener behaviour. This study investigates rater attitudes to and stereotypes about speakers of different varieties of English,…

  8. Effects of context and word class on lexical retrieval in Chinese speakers with anomic aphasia.

    Science.gov (United States)

    Law, Sam-Po; Kong, Anthony Pak-Hin; Lai, Loretta Wing-Shan; Lai, Christy

    2015-01-01

    Differences in processing nouns and verbs have been investigated intensely in psycholinguistics and neuropsychology in past decades. However, the majority of studies examining retrieval of these word classes have involved tasks of single word stimuli or responses. While the results have provided rich information for addressing issues about grammatical class distinctions, it is unclear whether they have adequate ecological validity for understanding lexical retrieval in connected speech which characterizes daily verbal communication. Previous investigations comparing retrieval of nouns and verbs in single word production and connected speech have reported either discrepant performance between the two contexts with presence of word class dissociation in picture naming but absence in connected speech, or null effects of word class. In addition, word finding difficulties have been found to be less severe in connected speech than picture naming. However, these studies have failed to match target stimuli of the two word classes and between tasks on psycholinguistic variables known to affect performance in response latency and/or accuracy. The present study compared lexical retrieval of nouns and verbs in picture naming and connected speech from picture description, procedural description, and story-telling among 19 Chinese speakers with anomic aphasia and their age, gender, and education matched healthy controls, to understand the influence of grammatical class on word production across speech contexts when target items were balanced for confounding variables between word classes and tasks. Elicitation of responses followed the protocol of the AphasiaBank consortium (http://talkbank.org/AphasiaBank/). Target words for confrontation naming were based on well-established naming tests, while those for narrative were drawn from a large database of normal speakers. Selected nouns and verbs in the two contexts were matched for age-of-acquisition (AoA) and familiarity

  9. Google Home: smart speaker as environmental control unit.

    Science.gov (United States)

    Noda, Kenichiro

    2017-08-23

    Environmental Control Units (ECU) are devices or systems that allow a person to control appliances in their home or work environment. Such a system can be utilized by clients with physical and/or functional disability to enhance their ability to control their environment, to promote independence, and to improve their quality of life. Over the last several years, there has been an emergence of several inexpensive, commercially available, voice-activated smart speakers on the market, such as Google Home and Amazon Echo. These smart speakers are equipped with far-field microphones that support voice recognition and allow for completely hands-free operation for various purposes, including playing music, information retrieval, and, most importantly, environmental control. Clients with disability could utilize these features to turn the unit into a simple ECU that is completely voice activated and wirelessly connected to appliances. Smart speakers, with their ease of setup, low cost, and versatility, may be a more affordable and accessible alternative to the traditional ECU. Implications for Rehabilitation: Environmental Control Units (ECU) enable independence for physically and functionally disabled clients, and reduce the burden and frequency of demands on carers. Traditional ECUs can be costly and may require clients to learn specialized skills to use. Smart speakers have the potential to be used as a new-age ECU by overcoming these barriers, and can be used by a wider range of clients.

  10. An exploratory study of voice change associated with healthy speakers after transcutaneous electrical stimulation to laryngeal muscles.

    Science.gov (United States)

    Fowler, Linda P; Gorham-Rowan, Mary; Hapner, Edie R

    2011-01-01

    The purpose of this study was to determine if measurable changes in fundamental frequency (F(0)) and relative sound level (RSL) occurred in healthy speakers after transcutaneous electrical stimulation (TES) as applied via VitalStim (Chattanooga Group, Chattanooga, TN). A prospective, repeated-measures design. Ten healthy female and 10 healthy male speakers, 20-53 years of age, participated in the study. All participants were nonsmokers and reported negative history for voice disorders. Participants received 1 hour of TES while engaged in eating, drinking, and conversation to simulate a typical dysphagia therapy protocol. Voice recordings were obtained before and immediately after TES. The voice samples consisted of a sustained vowel task and reading of the Rainbow Passage. Measurements of F(0) and RSL were obtained using TF32 (Milenkovic, 2005, University of Wisconsin). The participants also reported any sensations 5 minutes and 24 hours after TES. Measurable changes in F(0) and RSL were found for both tasks but were variable in direction and magnitude. These changes were not statistically significant. Subjective comments ranged from reports of a vocal warm-up feeling to delayed onset muscle soreness. These findings demonstrate that application of TES produces measurable changes in F(0) and RSL. However, the direction and magnitude of these changes are highly variable. Further research is needed to determine factors that may affect the extent to which TES contributes to significant changes in voice. Copyright © 2011 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  11. Key-note speaker: Predictors of weight loss after preventive Health consultations

    DEFF Research Database (Denmark)

    Lous, Jørgen; Freund, Kirsten S

    2018-01-01

    Invited keynote speaker at the conference Preventive Medicine and Public Health Conference 2018, July 16-17, London.

  12. Automaticity and stability of adaptation to a foreign-accented speaker

    NARCIS (Netherlands)

    Witteman, M.J.; Bardhan, N.P.; Weber, A.C.; McQueen, J.M.

    2015-01-01

    In three cross-modal priming experiments we asked whether adaptation to a foreign-accented speaker is automatic, and whether adaptation can be seen after a long delay between initial exposure and test. Dutch listeners were exposed to a Hebrew-accented Dutch speaker with two types of Dutch words:

  13. Dysprosody and Stimulus Effects in Cantonese Speakers with Parkinson's Disease

    Science.gov (United States)

    Ma, Joan K.-Y.; Whitehill, Tara; Cheung, Katherine S.-K.

    2010-01-01

    Background: Dysprosody is a common feature in speakers with hypokinetic dysarthria. However, speech prosody varies across different types of speech materials. This raises the question of what is the most appropriate speech material for the evaluation of dysprosody. Aims: To characterize the prosodic impairment in Cantonese speakers with…

  14. Profiles of an Acquisition Generation: Nontraditional Heritage Speakers of Spanish

    Science.gov (United States)

    DeFeo, Dayna Jean

    2018-01-01

    Though definitions vary, the literature on heritage speakers of Spanish identifies two primary attributes: a linguistic and cultural connection to the language. This article profiles four Anglo college students who grew up in bilingual or Spanish-dominant communities in the Southwest who self-identified as Spanish heritage speakers, citing…

  15. THE HUMOROUS SPEAKER: THE CONSTRUCTION OF ETHOS IN COMEDY

    Directory of Open Access Journals (Sweden)

    Maria Flávia Figueiredo

    2016-07-01

    Full Text Available Rhetoric is guided by three dimensions: logos, pathos, and ethos. Logos is the speech itself; pathos comprises the passions that the speaker, through logos, awakens in the audience; and ethos is the image that the speaker creates of himself or herself, also through logos, before an audience. There are three rhetorical genres: deliberative (which drives the audience or the judge to think about future events, characterizing them as convenient or harmful), judiciary (the audience thinks about past events in order to classify them as fair or unfair), and epideictic (the audience judges a fact that has occurred, or even the character of a person, as beautiful or not). According to Figueiredo (2014) and based on Eggs (2005), we advocate that ethos is not a mark left by the speaker only in the rhetorical genres but in any textual genre, since the simplest choices in textual construction, as results of human production, are able to reproduce something that is closely linked to the speaker, thus demarcating his or her ethos. To verify this assumption, we selected a video excerpt of the comedian Danilo Gentili, which is examined in the light of Rhetoric and Textual Linguistics. Our objective is thus to find, in the stand-up comedy genre, marks left by the speaker in the speech that characterize his or her ethos. The analysis results show that ethos, discursive genre, and communicative purpose amalgamate into an indissoluble complex in which the success of each interdepends on how the others are built.

  16. Defining "Native Speaker" in Multilingual Settings: English as a Native Language in Asia

    Science.gov (United States)

    Hansen Edwards, Jette G.

    2017-01-01

    The current study examines how and why speakers of English from multilingual contexts in Asia are identifying as native speakers of English. Eighteen participants from different contexts in Asia, including Singapore, Malaysia, India, Taiwan, and The Philippines, who self-identified as native speakers of English participated in hour-long interviews…

  17. Using Avatars for Improving Speaker Identification in Captioning

    Science.gov (United States)

    Vy, Quoc V.; Fels, Deborah I.

    Captioning is the main method for accessing television and film content by people who are deaf or hard-of-hearing. One major difficulty consistently identified by the community is that of knowing who is speaking, particularly for an off-screen narrator. A captioning system was created using a participatory design method to improve speaker identification. The final prototype contained avatars and a coloured border for identifying specific speakers. Evaluation results were very positive; however, participants also wanted to customize various components such as caption and avatar location.

  18. Sensitivity to phonological context in L2 spelling: evidence from Russian ESL speakers

    DEFF Research Database (Denmark)

    Dich, Nadya

    2010-01-01

    The study attempts to investigate factors underlying the development of spellers’ sensitivity to phonological context in English. Native English speakers and Russian speakers of English as a second language (ESL) were tested on their ability to use information about the coda to predict the spelling...... on the information about the coda when spelling vowels in nonwords. In both native and non-native speakers, context sensitivity was predicted by English word spelling; in Russian ESL speakers this relationship was mediated by English proficiency. L1 spelling proficiency did not facilitate L2 context sensitivity...

  19. An analysis of topics and vocabulary in Chinese oral narratives by normal speakers and speakers with fluent aphasia.

    Science.gov (United States)

    Law, Sam-Po; Kong, Anthony Pak-Hin; Lai, Christy

    2018-01-01

    This study analysed the topic and vocabulary of Chinese speakers based on language samples of personal recounts in a large spoken Chinese database recently made available in the public domain, i.e. Cantonese AphasiaBank ( http://www.speech.hku.hk/caphbank/search/ ). The goal of the analysis is to offer clinicians a rich source for selecting ecologically valid training materials for rehabilitating Chinese-speaking people with aphasia (PWA) in the design and planning of culturally and linguistically appropriate treatments. Discourse production of 65 Chinese-speaking PWA of fluent types (henceforth, PWFA) and their non-aphasic controls narrating an important event in their life were extracted from Cantonese AphasiaBank. Analyses of topics and vocabularies in terms of part-of-speech, word frequency, lexical semantics, and diversity were conducted. There was significant overlap in topics between the two groups. While the vocabulary was larger for controls than that of PWFA as expected, they were similar in distribution across parts-of-speech, frequency of occurrence, and the ratio of concrete to abstract items in major open word classes. Moreover, proportionately more different verbs than nouns were employed at the individual level for both speaker groups. The findings provide important implications for guiding directions of aphasia rehabilitation not only of fluent but also non-fluent Chinese aphasic speakers.

  20. Revisiting vocal perception in non-human animals: a review of vowel discrimination, speaker voice recognition, and speaker normalization

    Directory of Open Access Journals (Sweden)

    Buddhamas Kriengwatana

    2015-01-01

    Full Text Available The extent to which human speech perception evolved by taking advantage of predispositions and pre-existing features of vertebrate auditory and cognitive systems remains a central question in the evolution of speech. This paper reviews asymmetries in vowel perception, speaker voice recognition, and speaker normalization in non-human animals – topics that have not been thoroughly discussed in relation to the abilities of non-human animals, but are nonetheless important aspects of vocal perception. Throughout this paper we demonstrate that addressing these issues in non-human animals is relevant and worthwhile because many non-human animals must deal with similar issues in their natural environment. That is, they must also discriminate between similar-sounding vocalizations, determine signaler identity from vocalizations, and resolve signaler-dependent variation in vocalizations from conspecifics. Overall, we find that, although plausible, the current evidence is insufficiently strong to conclude that directional asymmetries in vowel perception are specific to humans, or that non-human animals can use voice characteristics to recognize human individuals. However, we do find some indication that non-human animals can normalize speaker differences. Accordingly, we identify avenues for future research that would greatly improve and advance our understanding of these topics.

  1. The Status of Native Speaker Intuitions in a Polylectal Grammar.

    Science.gov (United States)

    Debose, Charles E.

    A study of one speaker's intuitions about and performance in Black English is presented with relation to Saussure's "langue-parole" dichotomy. Native speakers of a language have intuitions about the static synchronic entities although the data of their speaking is variable and panchronic. These entities are in a diglossic relationship to each…

  2. Progress in the AMIDA speaker diarization system for meeting data

    NARCIS (Netherlands)

    Leeuwen, D.A. van; Konečný, M.

    2008-01-01

    In this paper we describe the AMIDA speaker diarization system as it was submitted to the NIST Rich Transcription evaluation 2007 for conference room data. This is done in the context of the history of this system and other speaker diarization systems. One of the goals of our system is to have as
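
    As background, the sketch below illustrates the delta-BIC criterion that standard BIC segmentation and clustering systems of this kind typically use to decide whether two adjacent segments come from the same speaker. It is a generic illustration, not the AMIDA implementation: the feature dimensionality, the penalty weight lambda and the function names are assumptions.

    ```python
    import numpy as np

    def delta_bic(x, y, lam=1.0):
        """Delta-BIC between modelling two segments separately versus jointly.

        x, y: (n_frames, n_features) arrays of acoustic features (e.g. MFCCs).
        A positive value favours two separate models, i.e. a speaker change.
        """
        z = np.vstack([x, y])
        n1, n2, n = len(x), len(y), len(z)
        d = z.shape[1]

        def logdet(a):
            # log-determinant of the sample covariance of one segment
            _, val = np.linalg.slogdet(np.cov(a, rowvar=False))
            return val

        penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
        return 0.5 * (n * logdet(z) - n1 * logdet(x) - n2 * logdet(y)) - lam * penalty

    # Toy check: two blocks of 13-dimensional frames with different means
    rng = np.random.default_rng(0)
    seg_a = rng.normal(0.0, 1.0, size=(200, 13))
    seg_b = rng.normal(1.5, 1.0, size=(200, 13))
    print(delta_bic(seg_a, seg_b) > 0)  # True suggests a change point between the segments
    ```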

  3. Speaker and Accent Variation Are Handled Differently: Evidence in Native and Non-Native Listeners

    Science.gov (United States)

    Kriengwatana, Buddhamas; Terry, Josephine; Chládková, Kateřina; Escudero, Paola

    2016-01-01

    Listeners are able to cope with between-speaker variability in speech that stems from anatomical sources (i.e. individual and sex differences in vocal tract size) and sociolinguistic sources (i.e. accents). We hypothesized that listeners adapt to these two types of variation differently because prior work indicates that adapting to speaker/sex variability may occur pre-lexically while adapting to accent variability may require learning from attention to explicit cues (i.e. feedback). In Experiment 1, we tested our hypothesis by training native Dutch listeners and Australian-English (AusE) listeners without any experience with Dutch or Flemish to discriminate between the Dutch vowels /I/ and /ε/ from a single speaker. We then tested their ability to classify /I/ and /ε/ vowels of a novel Dutch speaker (i.e. speaker or sex change only), or vowels of a novel Flemish speaker (i.e. speaker or sex change plus accent change). We found that both Dutch and AusE listeners could successfully categorize vowels if the change involved a speaker/sex change, but not if the change involved an accent change. When AusE listeners were given feedback on their categorization responses to the novel speaker in Experiment 2, they were able to successfully categorize vowels involving an accent change. These results suggest that adapting to accents may be a two-step process, whereby the first step involves adapting to speaker differences at a pre-lexical level, and the second step involves adapting to accent differences at a contextual level, where listeners have access to word meaning or are given feedback that allows them to appropriately adjust their perceptual category boundaries. PMID:27309889

  4. Children's Understanding That Utterances Emanate from Minds: Using Speaker Belief To Aid Interpretation.

    Science.gov (United States)

    Mitchell, Peter; Robinson, Elizabeth J.; Thompson, Doreen E.

    1999-01-01

    Three experiments examined 3- to 6-year olds' ability to use a speaker's utterance based on false belief to identify which of several referents was intended. Found that many 4- to 5-year olds performed correctly only when it was unnecessary to consider the speaker's belief. When the speaker gave an ambiguous utterance, many 3- to 6-year olds…

  5. Comparative Analysis of Speech Parameters for the Design of Speaker Verification Systems

    National Research Council Canada - National Science Library

    Souza, A

    2001-01-01

    Speaker verification systems are basically composed of three stages: feature extraction, feature processing and comparison of the modified features from speaker voice and from the voice that should be...

  6. Modeling methods of MEMS micro-speaker with electrostatic working principle

    Science.gov (United States)

    Tumpold, D.; Kaltenbacher, M.; Glacer, C.; Nawaz, M.; Dehé, A.

    2013-05-01

    The market for mobile devices like tablets, laptops or mobile phones is increasing rapidly. Device housings get thinner and energy efficiency is more and more important. Micro-Electro-Mechanical-System (MEMS) loudspeakers, fabricated in complementary metal oxide semiconductor (CMOS) compatible technology, merge energy-efficient driving technology with cost-economical fabrication processes. In most cases, the fabrication of such devices within the design process is a lengthy and costly task. Therefore, the need for computer modeling tools capable of precisely simulating the multi-field interactions is increasing. The accurate modeling of such MEMS devices results in a system of coupled partial differential equations (PDEs) describing the interaction between the electric, mechanical and acoustic fields. For the efficient and accurate solution we apply the Finite Element (FE) method. Thereby, we fully take the nonlinear effects into account: electrostatic force, charged moving body (loaded membrane) in an electric field, geometric nonlinearities and mechanical contact during the snap-in case between loaded membrane and stator. To efficiently handle the coupling between the mechanical and acoustic fields, we apply Mortar FE techniques, which allow different grid sizes along the coupling interface. Furthermore, we present a recently developed PML (Perfectly Matched Layer) technique, which allows limiting the acoustic computational domain even in the near field without getting spurious reflections. For computations towards the acoustic far field we use a Kirchhoff-Helmholtz integral (e.g., to compute the directivity pattern). We will present simulations of a MEMS speaker system based on a single-sided driving mechanism as well as an outlook on MEMS speakers using double stator systems (pull-pull systems), and discuss their efficiency (SPL) and quality (THD) with respect to the generated acoustic sound.
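
    The dominant nonlinearity mentioned above can be pictured with the usual lumped parallel-plate approximation of an electrostatic drive. This is a textbook idealization offered for orientation only, not the paper's finite-element model, and the symbols (electrode area A, rest gap g_0, membrane deflection x, drive voltage V) are assumptions:

    ```latex
    % Idealized parallel-plate electrostatic force on the loaded membrane
    F_{el}(x) = \frac{\varepsilon_0 A V^2}{2\,(g_0 - x)^2},
    \qquad x_{\text{pull-in}} \approx \frac{g_0}{3} \quad \text{for a linear restoring spring}
    ```

    This is why the force stiffens sharply as the membrane approaches the stator and why snap-in (pull-in) contact has to be modelled explicitly.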

  7. The (TNO) Speaker Diarization System for NIST Rich Transcription Evaluation 2005 for meeting data

    NARCIS (Netherlands)

    Leeuwen, D.A. van

    2005-01-01

    The TNO speaker diarization system is based on a standard BIC segmentation and clustering algorithm. Since correct speech detection appears to be essential for the NIST Rich Transcription speaker diarization evaluation measure, we have developed a speech activity detector (SAD) as

  8. Popular Public Discourse at Speakers' Corner: Negotiating Cultural Identities in Interaction

    DEFF Research Database (Denmark)

    McIlvenny, Paul

    1996-01-01

    , religious and general topical 'soap-box' oration. However, audiences are not passive receivers of rhetorical messages. They are active negotiators of interpretations and alignments that may conflict with the speaker's and other audience members' orientations to prior talk. Speakers' Corner is a space...

  9. Do We Perceive Others Better than Ourselves? A Perceptual Benefit for Noise-Vocoded Speech Produced by an Average Speaker.

    Directory of Open Access Journals (Sweden)

    William L Schuerman

    Full Text Available In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.
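
    For readers unfamiliar with the stimulus manipulation used above, noise vocoding keeps the slowly varying amplitude envelope in each frequency band while replacing the carrier with band-limited noise, which removes indexical cues such as voice quality. The sketch below is a minimal illustration of that idea; the band edges, envelope cutoff and filter orders are arbitrary choices, not the parameters of the study.

    ```python
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocode(x, sr, band_edges=(100, 300, 700, 1500, 4000), env_cut=30.0):
        """Minimal noise vocoder: keep per-band envelopes, replace carriers with noise."""
        rng = np.random.default_rng(0)
        out = np.zeros_like(x, dtype=float)
        env_sos = butter(2, env_cut, btype="low", fs=sr, output="sos")
        for lo, hi in zip(band_edges[:-1], band_edges[1:]):
            band_sos = butter(4, [lo, hi], btype="band", fs=sr, output="sos")
            band = sosfiltfilt(band_sos, x)
            env = sosfiltfilt(env_sos, np.abs(hilbert(band)))             # amplitude envelope
            carrier = sosfiltfilt(band_sos, rng.standard_normal(len(x)))  # band-limited noise
            out += np.clip(env, 0.0, None) * carrier
        return out / (np.max(np.abs(out)) + 1e-9)  # normalize to avoid clipping
    ```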

  10. Language-Specific Effects on Story and Procedural Narrative tasks between Korean-speaking and English-speaking Individuals with Aphasia

    Directory of Open Access Journals (Sweden)

    Jee Eun Sung

    2015-04-01

    Results suggested that Korean-speaking individuals with aphasia produced a greater number of different verbs, more verbs per utterance, and higher VNRs than English speakers. Both groups generated more words in the story task. The significant two-way interactions between language group and task type suggested that there are task-specific effects on linguistic measures across the groups. The study implies that linguistic characteristics differentially affect the language symptoms of aphasia across the different languages and task types.

  11. Using timing information in speaker verification

    CSIR Research Space (South Africa)

    Van Heerden, CJ

    2005-11-01

    Full Text Available This paper presents an analysis of temporal information as a feature for use in speaker verification systems. The relevance of temporal information in a speaker’s utterances is investigated, both with regard to improving the robustness of modern...

  12. Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus

    Directory of Open Access Journals (Sweden)

    Patterson Eric K

    2002-01-01

    Full Text Available Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties, due to background noise and multiple speakers in an application environment, are significantly reduced by the additional information provided by visual features. This paper presents information on a new audio-visual database, a feature study on moving speakers, and on baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper seeks to introduce a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE) database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper. One of these goals is to allow testing of adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed. It compares stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image transform method, and a deformable template scheme are used in this comparison to obtain visual features. This paper also presents methods and results in an attempt to make these techniques more robust to speaker movement. Finally, initial baseline speaker-independent results are included using all speakers, and conclusions as well as suggested areas of research are

  13. A simple optical method for measuring the vibration amplitude of a speaker

    OpenAIRE

    UEDA, Masahiro; YAMAGUCHI, Toshihiko; KAKIUCHI, Hiroki; SUGA, Hiroshi

    1999-01-01

    A simple optical method has been proposed for measuring the vibration amplitude of a speaker vibrating with a frequency of approximately 10 kHz. The method is based on multiple reflection between a vibrating speaker plane and a mirror parallel to that speaker plane. The multiple reflection magnifies the dispersion of the laser beam caused by the vibration and thereby makes the amplitude easy to measure. The measuring sensitivity ranges between sub-microns and 1 mm. A preliminary experim...
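
    One way to picture the magnification, as a geometric sketch rather than the authors' derivation: a reflecting plane displaced by d along its normal shifts a ray incident at angle \theta laterally by 2d\cos\theta, so if the beam strikes the vibrating surface N times before leaving the mirror pair, the accumulated shift is roughly

    ```latex
    \Delta \approx 2 N d \cos\theta
    ```

    which is why folding the beam between the speaker plane and the parallel mirror makes small displacements easier to resolve.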

  14. Coronal View Ultrasound Imaging of Movement in Different Segments of the Tongue during Paced Recital: Findings from Four Normal Speakers and a Speaker with Partial Glossectomy

    Science.gov (United States)

    Bressmann, Tim; Flowers, Heather; Wong, Willy; Irish, Jonathan C.

    2010-01-01

    The goal of this study was to quantitatively describe aspects of coronal tongue movement in different anatomical regions of the tongue. Four normal speakers and a speaker with partial glossectomy read four repetitions of a metronome-paced poem. Their tongue movement was recorded in four coronal planes using two-dimensional B-mode ultrasound…

  15. Performance Assessment and the Components of the Oral Construct across Different Tasks and Rater Groups.

    Science.gov (United States)

    Chalhoub-Deville, Micheline

    This study investigated whether different groups of native speakers assess second language learners' language skills differently for three elicitation techniques. Subjects were six learners of college-level Arabic as a second language, tape-recorded performing three tasks: participating in a modified oral proficiency interview, narrating a picture…

  16. Intelligibility of Standard German and Low German to Speakers of Dutch

    NARCIS (Netherlands)

    Gooskens, C.S.; Kürschner, Sebastian; van Bezooijen, R.

    2011-01-01

    This paper reports on the intelligibility of spoken Low German and Standard German for speakers of Dutch. Two aspects are considered. First, the relative potential for intelligibility of the Low German variety of Bremen and the High German variety of Modern Standard German for speakers of Dutch is

  17. Speaker detection for conversational robots using synchrony between audio and video

    NARCIS (Netherlands)

    Noulas, A.; Englebienne, G.; Terwijn, B.; Kröse, B.; Hanheide, M.; Zender, H.

    2010-01-01

    This paper compares different methods for detecting the speaking person when multiple persons are interacting with a robot. We evaluate the state-of-the-art speaker detection methods on the iCat robot. These methods use the synchrony between audio and video to locate the most probable speaker. We

  18. Perception of English palatal codas by Korean speakers of English

    Science.gov (United States)

    Yeon, Sang-Hee

    2003-04-01

    This study examined the perception of English palatal codas by Korean speakers of English to determine whether perception problems are the source of production problems. In particular, the study first looked at a possible first-language effect on the perception of English palatal codas. Second, a possible perceptual source of vowel epenthesis after English palatal codas was investigated. In addition, individual factors such as length of residence, TOEFL score, gender and academic status were compared to determine whether they affected the varying degrees of perception accuracy. Eleven adult Korean speakers of English as well as three native speakers of English participated in the study. Three sets of a perception test, including identification of minimally different English pseudo- or real words, were carried out. The results showed that, first, the Korean speakers perceived the English codas significantly less accurately than the Americans. Second, the study supported the idea that Koreans perceived an extra /i/ after the final affricates due to final release. Finally, none of the individual factors explained the varying degrees of perceptual accuracy; in particular, TOEFL scores and the perception test scores did not show any statistically significant association.

  19. Evaluating acoustic speaker normalization algorithms: evidence from longitudinal child data.

    Science.gov (United States)

    Kohn, Mary Elizabeth; Farrington, Charlie

    2012-03-01

    Speaker vowel formant normalization, a technique that controls for variation introduced by physical differences between speakers, is necessary in variationist studies to compare speakers of different ages, genders, and physiological makeup in order to understand non-physiological variation patterns within populations. Many algorithms have been established to reduce variation introduced into vocalic data from physiological sources. The lack of real-time studies tracking the effectiveness of these normalization algorithms from childhood through adolescence inhibits exploration of child participation in vowel shifts. This analysis compares normalization techniques applied to data collected from ten African American children across five time points. Linear regressions compare the reduction in variation attributable to age and gender for each speaker for the vowels BEET, BAT, BOT, BUT, and BOAR. A normalization technique is successful if it maintains variation attributable to a reference sociolinguistic variable, while reducing variation attributable to age. Results indicate that normalization techniques which rely on both a measure of central tendency and range of the vowel space perform best at reducing variation attributable to age, although some variation attributable to age persists after normalization for some sections of the vowel space. © 2012 Acoustical Society of America
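
    As a concrete example of the kind of algorithm being compared, the sketch below implements Lobanov normalization, a widely used technique that rescales each speaker's formants by that speaker's own mean and standard deviation (a measure of central tendency and of range). It is offered as a generic illustration, not as the specific set of procedures evaluated in the study.

    ```python
    import numpy as np

    def lobanov_normalize(f1, f2):
        """Lobanov normalization: z-score each formant within a single speaker.

        f1, f2: raw formant measurements (Hz) for one speaker's vowel tokens.
        Returns unitless values that are comparable across speakers of
        different ages, genders and vocal tract sizes.
        """
        f1 = np.asarray(f1, dtype=float)
        f2 = np.asarray(f2, dtype=float)
        z1 = (f1 - f1.mean()) / f1.std(ddof=1)
        z2 = (f2 - f2.mean()) / f2.std(ddof=1)
        return z1, z2

    # Toy usage: one speaker's BEET, BAT and BOT tokens (hypothetical values)
    z1, z2 = lobanov_normalize([310, 660, 730], [2200, 1720, 1090])
    ```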

  20. Do children go for the nice guys? The influence of speaker benevolence and certainty on selective word learning.

    Science.gov (United States)

    Bergstra, Myrthe; DE Mulder, Hannah N M; Coopmans, Peter

    2018-04-06

    This study investigated how speaker certainty (a rational cue) and speaker benevolence (an emotional cue) influence children's willingness to learn words in a selective learning paradigm. In two experiments four- to six-year-olds learnt novel labels from two speakers and, after a week, their memory for these labels was reassessed. Results demonstrated that children retained the label-object pairings for at least a week. Furthermore, children preferred to learn from certain over uncertain speakers, but they had no significant preference for nice over nasty speakers. When the cues were combined, children followed certain speakers, even if they were nasty. However, children did prefer to learn from nice and certain speakers over nasty and certain speakers. These results suggest that rational cues regarding a speaker's linguistic competence trump emotional cues regarding a speaker's affective status in word learning. However, emotional cues were found to have a subtle influence on this process.

  1. Effects of context and word class on lexical retrieval in Chinese speakers with anomic aphasia

    Science.gov (United States)

    Law, Sam-Po; Kong, Anthony Pak-Hin; Lai, Loretta Wing-Shan; Lai, Christy

    2014-01-01

    Background Differences in processing nouns and verbs have been investigated intensely in psycholinguistics and neuropsychology in past decades. However, the majority of studies examining retrieval of these word classes have involved tasks of single word stimuli or responses. While the results have provided rich information for addressing issues about grammatical class distinctions, it is unclear whether they have adequate ecological validity for understanding lexical retrieval in connected speech which characterizes daily verbal communication. Previous investigations comparing retrieval of nouns and verbs in single word production and connected speech have reported either discrepant performance between the two contexts with presence of word class dissociation in picture naming but absence in connected speech, or null effects of word class. In addition, word finding difficulties have been found to be less severe in connected speech than picture naming. However, these studies have failed to match target stimuli of the two word classes and between tasks on psycholinguistic variables known to affect performance in response latency and/or accuracy. Aims The present study compared lexical retrieval of nouns and verbs in picture naming and connected speech from picture description, procedural description, and story-telling among 19 Chinese speakers with anomic aphasia and their age, gender, and education matched healthy controls, to understand the influence of grammatical class on word production across speech contexts when target items were balanced for confounding variables between word classes and tasks. Methods & Procedures Elicitation of responses followed the protocol of the AphasiaBank consortium (http://talkbank.org/AphasiaBank/). Target words for confrontation naming were based on well-established naming tests, while those for narrative were drawn from a large database of normal speakers. Selected nouns and verbs in the two contexts were matched for age

  2. Teaching Semantic Radicals Facilitates Inferring New Character Meaning in Sentence Reading for Nonnative Chinese Speakers

    Directory of Open Access Journals (Sweden)

    Thi Phuong Nguyen

    2017-10-01

    Full Text Available This study investigates the effects of teaching semantic radicals on inferring the meanings of unfamiliar characters among nonnative Chinese speakers. A total of 54 undergraduates majoring in Chinese Language at a university in Hanoi, Vietnam, who had one year of experience learning Chinese, were assigned to two experimental groups that received an instructional intervention, called “old-for-new” semantic radical teaching, through two counterbalanced sets of semantic radicals, alongside one control group. All of the students completed pre- and post-tests of a sentence cloze task in which they were required to choose, from four options, an appropriate character that fit the sentence context. The four options shared the same phonetic radicals but had different semantic radicals. The results showed that the increases from pre-test to post-test scores were significant for the experimental groups, but not for the control group. Most importantly, the experimental groups successfully transferred the semantic radical strategy to figure out the meanings of unfamiliar characters containing semantic radicals that had not been taught. The results demonstrate the effectiveness of teaching semantic radicals for lexical inference in sentence reading for nonnative speakers, and highlight learners' ability to transfer what they have learned to acquire semantic categories of sub-lexical units (semantic radicals) in Chinese characters.

  3. Effects of Language Background on Gaze Behavior: A Crosslinguistic Comparison Between Korean and German Speakers

    Science.gov (United States)

    Goller, Florian; Lee, Donghoon; Ansorge, Ulrich; Choi, Soonja

    2017-01-01

    Languages differ in how they categorize spatial relations: While German differentiates between containment (in) and support (auf) with distinct spatial words—(a) den Kuli IN die Kappe stecken (”put pen in cap”); (b) die Kappe AUF den Kuli stecken (”put cap on pen”)—Korean uses a single spatial word (kkita) collapsing (a) and (b) into one semantic category, particularly when the spatial enclosure is tight-fit. Korean uses a different word (i.e., netha) for loose-fits (e.g., apple in bowl). We tested whether these differences influence the attention of the speaker. In a crosslinguistic study, we compared native German speakers with native Korean speakers. Participants rated the similarity of two successive video clips of several scenes where two objects were joined or nested (either in a tight or loose manner). The rating data show that Korean speakers base their rating of similarity more on tight- versus loose-fit, whereas German speakers base their rating more on containment versus support (in vs. auf). Throughout the experiment, we also measured the participants’ eye movements. Korean speakers looked equally long at the moving Figure object and at the stationary Ground object, whereas German speakers were more biased to look at the Ground object. Additionally, Korean speakers also looked more at the region where the two objects touched than did German speakers. We discuss our data in the light of crosslinguistic semantics and the extent of their influence on spatial cognition and perception. PMID:29362644

  4. Repeat what after whom? Exploring variable selectivity in a cross-dialectal shadowing task.

    Directory of Open Access Journals (Sweden)

    Abby Walker

    2015-05-01

    Full Text Available Twenty women from Christchurch, New Zealand and sixteen from Columbus, Ohio (dialect region U.S. Midland) participated in a bimodal lexical naming task where they repeated monosyllabic words after four speakers from four regional dialects: New Zealand, Australia, U.S. Inland North and U.S. Midland. The resulting utterances were acoustically analyzed, and presented to listeners on Amazon Mechanical Turk in an AXB task. Convergence is observed, but differs depending on the dialect of the speaker, the dialect of the model, the particular word class being shadowed, and the order in which dialects are presented to participants. We argue that these patterns are generally consistent with findings that convergence is promoted by a large phonetic distance between shadower and model (Babel, 2010; contra Kim, Horton & Bradlow, 2011), and greater existing variability in a vowel class (Babel, 2012). The results also suggest that more comparisons of accommodation towards different dialects are warranted, and that the investigation of the socio-indexical meaning of specific linguistic forms in context is a promising avenue for understanding variable selectivity in convergence.

  5. The Sound of Voice: Voice-Based Categorization of Speakers' Sexual Orientation within and across Languages.

    Directory of Open Access Journals (Sweden)

    Simone Sulpizio

    Full Text Available Empirical research had initially shown that English listeners are able to identify the speakers' sexual orientation based on voice cues alone. However, the accuracy of this voice-based categorization, as well as its generalizability to other languages (language-dependency and to non-native speakers (language-specificity, has been questioned recently. Consequently, we address these open issues in 5 experiments: First, we tested whether Italian and German listeners are able to correctly identify sexual orientation of same-language male speakers. Then, participants of both nationalities listened to voice samples and rated the sexual orientation of both Italian and German male speakers. We found that listeners were unable to identify the speakers' sexual orientation correctly. However, speakers were consistently categorized as either heterosexual or gay on the basis of how they sounded. Moreover, a similar pattern of results emerged when listeners judged the sexual orientation of speakers of their own and of the foreign language. Overall, this research suggests that voice-based categorization of sexual orientation reflects the listeners' expectations of how gay voices sound rather than being an accurate detector of the speakers' actual sexual identity. Results are discussed with regard to accuracy, acoustic features of voices, language dependency and language specificity.

  6. Popular Public Discourse at Speakers' Corner: Negotiating Cultural Identities in Interaction

    DEFF Research Database (Denmark)

    McIlvenny, Paul

    1996-01-01

    In this paper I examine how cultural identities are actively negotiated in popular debate at a multicultural public setting in London. Speakers at Speakers' Corner manage the local construction of group affiliation, audience response and argument in and through talk, within the context of ethnic...... in which participant 'citizens' in the public sphere can actively struggle over cultural representation and identities. Using transcribed examples of video data recorded at Speakers' Corner my paper will examine how cultural identity is invoked in the management of active participation. Audiences...... and their affiliations are regulated and made accountable through the routines of membership categorisation and the policing of cultural identities and their imaginary borders....

  7. Proficiency in English sentence stress production by Cantonese speakers who speak English as a second language (ESL).

    Science.gov (United States)

    Ng, Manwa L; Chen, Yang

    2011-12-01

    The present study examined English sentence stress produced by native Cantonese speakers who speak English as a second language (ESL). Cantonese ESL speakers' proficiency in English stress production, as perceived by English-speaking listeners, was also studied. Acoustical parameters associated with sentence stress, including fundamental frequency (F0), vowel duration, and intensity, were measured from the English sentences produced by 40 Cantonese ESL speakers. Data were compared with those obtained from 40 native speakers of American English. The speech samples were also judged for placement, degree, and naturalness of stress by eight listeners who were native speakers of American English. Results showed that Cantonese ESL speakers were able to use F0, vowel duration, and intensity to differentiate sentence stress patterns. Yet both female and male Cantonese ESL speakers exhibited consistently higher F0 in stressed words than English speakers. Overall, Cantonese ESL speakers were found to be proficient in using duration and intensity to signal sentence stress in a way comparable with English speakers. In addition, F0 and intensity were found to correlate closely with perceptual judgements, and the degree of stress with the naturalness of stress.
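
    The three acoustic correlates measured above can be extracted with standard tools once word boundaries are known. The sketch below uses librosa as one convenient option; it is not the toolchain of the study, and the word boundaries, pitch range and dB conversion are illustrative assumptions.

    ```python
    import numpy as np
    import librosa

    def stress_cues(wav_path, start_s, end_s):
        """Mean F0 (Hz), duration (s) and mean intensity (dB) for one word.

        start_s / end_s are assumed word boundaries from a manual or forced alignment.
        """
        y, sr = librosa.load(wav_path, sr=None)
        word = y[int(start_s * sr):int(end_s * sr)]
        f0, voiced_flag, _ = librosa.pyin(word, fmin=75, fmax=400, sr=sr)
        rms = librosa.feature.rms(y=word)[0]
        mean_f0 = float(np.nanmean(f0))                          # F0 averaged over voiced frames
        duration = end_s - start_s                               # word duration
        intensity_db = float(20 * np.log10(rms.mean() + 1e-9))   # rough relative intensity
        return mean_f0, duration, intensity_db
    ```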

  8. Phonological processing in Mandarin speakers with congenital amusia.

    Science.gov (United States)

    Wang, Xiao; Peng, Gang

    2014-12-01

    Although there is an emerging consensus that both musical and linguistic pitch processing can be problematic for individuals with a developmental disorder termed congenital amusia, the nature of such a pitch-processing deficit, especially that demonstrated in a speech setting, remains unclear. Therefore, this study tested the performance of native Mandarin speakers, both with and without amusia, on discrimination and imitation tasks for Cantonese level tones, aiming to shed light on this issue. Results suggest that the impact of the phonological deficit, coupled with that of the domain-general pitch deficit, could provide a more comprehensive interpretation of Mandarin amusics' speech impairment. Specifically, when there was a high demand for pitch sensitivity, as in fine-grained pitch discriminations, the operation of the pitch-processing deficit played the more predominant role in modulating amusics' speech performance. But when the demand was low, as in discriminating naturally produced Cantonese level tones, the impact of the phonological deficit was more pronounced compared to that of the pitch-processing deficit. However, despite their perceptual deficits, Mandarin amusics' imitation abilities were comparable to controls'. Such selective impairment in tonal perception suggests that the phonological deficit more severely implicates amusics' input pathways.

  9. Articulatory Movements during Vowels in Speakers with Dysarthria and Healthy Controls

    Science.gov (United States)

    Yunusova, Yana; Weismer, Gary; Westbury, John R.; Lindstrom, Mary J.

    2008-01-01

    Purpose: This study compared movement characteristics of markers attached to the jaw, lower lip, tongue blade, and dorsum during production of selected English vowels by normal speakers and speakers with dysarthria due to amyotrophic lateral sclerosis (ALS) or Parkinson disease (PD). The study asked the following questions: (a) Are movement…

  10. A Comparison of Coverbal Gesture Use in Oral Discourse Among Speakers With Fluent and Nonfluent Aphasia

    Science.gov (United States)

    Law, Sam-Po; Chak, Gigi Wan-Chi

    2017-01-01

    Purpose Coverbal gesture use, which is affected by the presence and degree of aphasia, can be culturally specific. The purpose of this study was to compare gesture use among Cantonese-speaking individuals: 23 neurologically healthy speakers, 23 speakers with fluent aphasia, and 21 speakers with nonfluent aphasia. Method Multimedia data of discourse samples from these speakers were extracted from the Cantonese AphasiaBank. Gestures were independently annotated on their forms and functions to determine how gesturing rate and distribution of gestures differed across speaker groups. A multiple regression was conducted to determine the most predictive variable(s) for gesture-to-word ratio. Results Although speakers with nonfluent aphasia gestured most frequently, the rate of gesture use in counterparts with fluent aphasia did not differ significantly from controls. Different patterns of gesture functions in the 3 speaker groups revealed that gesture plays a minor role in lexical retrieval whereas its role in enhancing communication dominates among the speakers with aphasia. The percentages of complete sentences and dysfluency strongly predicted the gesturing rate in aphasia. Conclusions The current results supported the sketch model of language–gesture association. The relationship between gesture production and linguistic abilities and clinical implications for gesture-based language intervention for speakers with aphasia are also discussed. PMID:28609510

  11. Gender Identification of the Speaker Using VQ Method

    Directory of Open Access Journals (Sweden)

    Vasif V. Nabiyev

    2009-11-01

    Full Text Available Speaking is the easiest and most natural form of communication between people, and intensive research has aimed to support this communication between people via computers. Systems using voice biometric technology are attracting attention, especially with regard to cost and usability. Compared with other biometric systems, the application is much more practical: for example, a voice recording can be obtained with a microphone placed in the environment, even without notifying the user, and the system can still be applied. Moreover, the possibility of remote access is another advantage of voice biometrics. This study aims to determine the gender of the speaker automatically from the speech waves, which carry personal information. If the speaker's gender can be determined and models are composed according to gender information, the success of voice recognition systems can be increased considerably. In general, speaker recognition systems are composed of two parts: feature extraction and matching. Feature extraction is the procedure in which a compact set of features representing the speech and the speaker is derived from the voice signal. Different features are used in voice applications, such as LPC, MFCC and PLP; in this study MFCC is used as the feature vector. Feature matching is the procedure in which the features derived from unknown speakers are compared with those of a known speaker group. According to the text used in the comparison, systems are divided into text-dependent and text-independent types: the same text is used in text-dependent systems, whereas different texts are used in text-independent systems. Currently, DTW and HMM are text-dependent matching methods, while VQ and GMM are text-independent ones. In this study, the VQ approach is used because of its high success rate and simple application. A text-independent system that automatically determines the speaker's gender is proposed. The proposed
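
    The two-part architecture described above (MFCC feature extraction followed by VQ matching) can be illustrated compactly: train one small codebook of MFCC vectors per gender and label a new utterance by whichever codebook quantizes its frames with the lower average distortion. The sketch below uses librosa and scikit-learn as stand-ins for whatever toolchain the authors used; the codebook size and MFCC settings are arbitrary.

    ```python
    import numpy as np
    import librosa
    from sklearn.cluster import KMeans

    def mfcc_frames(wav_path, n_mfcc=13):
        """Return an utterance as a (frames, coefficients) matrix of MFCCs."""
        y, sr = librosa.load(wav_path, sr=None)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

    def train_codebook(wav_paths, k=32):
        """VQ codebook: k centroids fitted over all training MFCC frames."""
        frames = np.vstack([mfcc_frames(p) for p in wav_paths])
        return KMeans(n_clusters=k, n_init=10, random_state=0).fit(frames)

    def distortion(codebook, wav_path):
        """Average distance from each test frame to its nearest codeword."""
        distances = codebook.transform(mfcc_frames(wav_path))
        return distances.min(axis=1).mean()

    # Classification: the gender codebook with the smaller distortion wins, e.g.
    # codebooks = {"female": train_codebook(female_wavs), "male": train_codebook(male_wavs)}
    # label = min(codebooks, key=lambda g: distortion(codebooks[g], "unknown.wav"))
    ```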

  12. Use of the BAT with a Cantonese-Putonghua Speaker with Aphasia

    Science.gov (United States)

    Kong, Anthony Pak-Hin; Weekes, Brendan Stuart

    2011-01-01

    The aim of this article is to illustrate the use of the Bilingual Aphasia Test (BAT) with a Cantonese-Putonghua speaker. We describe G, who is a relatively young Chinese bilingual speaker with aphasia. G's communication abilities in his L2, Putonghua, were impaired following brain damage. This impairment caused specific difficulties in…

  13. Methods of Speakers' Effects on the Audience

    Directory of Open Access Journals (Sweden)

    فریبا حسینی

    2010-09-01

    Full Text Available This article focuses on four issues. The first concerns the speaker's external appearance, including facial appearance, the power of the voice, gestures and hand signals, the use of a stick, the eyebrows and height; such characteristics can have an important effect on the audience. The second concerns the internal features of the speaker, including the preacher's ethics, piety and intention, and the way these act on listeners according to their personalities, habits and emotions, knowledge and culture, and speed of learning. The third issue concerns the form of the lecture: words and phrases should be clear and correct, and mixed with parables, proverbs, poetry and Quranic verses. The final issue concerns the content: the subject of the talk should match the audience's level of understanding and should be new and interesting to them, since a talk with an innovative, fresh theme has a stronger effect on the audience. Key words: Oratory, Preacher, Audience, Influence of speech

  14. Language production in a shared task: Cumulative semantic interference from self- and other-produced context words

    NARCIS (Netherlands)

    Hoedemaker, R.S.; Ernst, J.; Meyer, A.S.; Belke, E.

    2017-01-01

    This study assessed the effects of semantic context in the form of self-produced and other-produced words on subsequent language production. Pairs of participants performed a joint picture naming task, taking turns while naming a continuous series of pictures. In the single-speaker version of this

  15. Gender parity trends for invited speakers at four prominent virology conference series.

    Science.gov (United States)

    Kalejta, Robert F; Palmenberg, Ann C

    2017-06-07

    Scientific conferences are most beneficial to participants when they showcase significant new experimental developments, accurately summarize the current state of the field, and provide strong opportunities for collaborative networking. A top-notch slate of invited speakers, assembled by conference organizers or committees, is key to achieving these goals. The perceived underrepresentation of female speakers at prominent scientific meetings is currently a popular topic for discussion, but one that often lacks supportive data. We compiled the full rosters of invited speakers over the last 35 years for four prominent international virology conferences, the American Society for Virology Annual Meeting (ASV), the International Herpesvirus Workshop (IHW), the Positive-Strand RNA Virus Symposium (PSR), and the Gordon Research Conference on Viruses & Cells (GRC). The rosters were cross-indexed by unique names, gender, year, and repeat invitations. When plotted as gender-dependent trends over time, all four conferences showed a clear proclivity for male-dominated invited speaker lists. Encouragingly, shifts toward parity are emerging within all units, but at different rates. Not surprisingly, both selection of a larger percentage of first time participants and the presence of a woman on the speaker selection committee correlated with improved parity. Session chair information was also collected for the IHW and GRC. These visible positions also displayed a strong male dominance over time that is eroding slowly. We offer our personal interpretation of these data to aid future organizers achieve improved equity among the limited number of available positions for session moderators and invited speakers. IMPORTANCE Politicians and media members have a tendency to cite anecdotes as conclusions without any supporting data. This happens so frequently now, that a name for it has emerged: fake news. Good science proceeds otherwise. The under representation of women as invited

  16. Language control in different contexts: the behavioural ecology of bilingual speakers

    Directory of Open Access Journals (Sweden)

    David William Green

    2011-05-01

    Full Text Available This paper proposes that different experimental contexts (single or dual language contexts) permit different neural loci at which words in the target language can be selected. However, in order to develop a fuller understanding of the neural circuit mediating language control we need to consider the community context in which bilingual speakers typically use their two languages (the behavioural ecology of bilingual speakers). The contrast between speakers from code-switching and non-code switching communities offers a way to increase our understanding of the cortical, subcortical and, in particular, cerebellar structures involved in language control. It will also help us identify the non-verbal behavioural correlates associated with these control processes.

  17. Artificially intelligent recognition of Arabic speaker using voice print-based local features

    Science.gov (United States)

    Mahmood, Awais; Alsulaiman, Mansour; Muhammad, Ghulam; Akram, Sheeraz

    2016-11-01

    Local features for any pattern recognition system are based on information extracted locally. In this paper, a local feature extraction technique was developed. The feature was extracted in the time-frequency plane by taking moving averages along the diagonal directions of the time-frequency plane. This feature captured time-frequency events, producing a unique pattern for each speaker that can be viewed as a voice print of the speaker. Hence, we refer to this technique as the voice print-based local feature. The proposed feature was compared to other features, including the mel-frequency cepstral coefficient (MFCC), for speaker recognition using two different databases. One of the databases used in the comparison is a subset of an LDC database that consisted of two short sentences uttered by 182 speakers. The proposed feature attained a 98.35% recognition rate compared to 96.7% for MFCC using the LDC subset.
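
    A minimal reading of the diagonal moving-average operation is sketched below: compute a log-magnitude spectrogram and, for each time-frequency cell, average a short run of cells along one diagonal (the anti-diagonal is handled analogously). The STFT settings, window length and normalization are assumptions, since the paper's exact formulation is not reproduced here.

    ```python
    import numpy as np
    import librosa

    def diagonal_moving_average(wav_path, n_fft=512, hop=128, win=5):
        """Moving average of the log-spectrogram along its main diagonal direction."""
        y, sr = librosa.load(wav_path, sr=None)
        spec = np.log1p(np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))).T  # (time, freq)
        t_max, f_max = spec.shape
        out = np.zeros((t_max - win + 1, f_max - win + 1))
        for i in range(win):
            # cell (t, f) accumulates spec[t + i, f + i], i.e. a diagonal neighbourhood
            out += spec[i:t_max - win + 1 + i, i:f_max - win + 1 + i]
        return out / win
    ```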

  18. Objective eye-gaze behaviour during face-to-face communication with proficient alaryngeal speakers: a preliminary study.

    Science.gov (United States)

    Evitts, Paul; Gallop, Robert

    2011-01-01

    There is a large body of research demonstrating the impact of visual information on speaker intelligibility in both normal and disordered speaker populations. However, there is minimal information on which specific visual features listeners find salient during conversational discourse. To investigate listeners' eye-gaze behaviour during face-to-face conversation with normal, laryngeal and proficient alaryngeal speakers. Sixty participants individually participated in a 10-min conversation with one of four speakers (typical laryngeal, tracheoesophageal, oesophageal, electrolaryngeal; 15 participants randomly assigned to one mode of speech). All speakers were > 85% intelligible and were judged to be 'proficient' by two certified speech-language pathologists. Participants were fitted with a head-mounted eye-gaze tracking device (Mobile Eye, ASL) that calculated the region of interest and mean duration of eye-gaze. Self-reported gaze behaviour was also obtained following the conversation using a 10 cm visual analogue scale. While listening, participants viewed the lower facial region of the oesophageal speaker more than the normal or tracheoesophageal speaker. Results of non-hierarchical cluster analyses showed that while listening, the pattern of eye-gaze was predominantly directed at the lower face of the oesophageal and electrolaryngeal speaker and more evenly dispersed among the background, lower face, and eyes of the normal and tracheoesophageal speakers. Finally, results show a low correlation between self-reported eye-gaze behaviour and objective regions of interest data. Overall, results suggest similar eye-gaze behaviour when healthy controls converse with normal and tracheoesophageal speakers and that participants had significantly different eye-gaze patterns when conversing with an oesophageal speaker. Results are discussed in terms of existing eye-gaze data and its potential implications on auditory-visual speech perception. © 2011 Royal College of Speech

  19. Speaker Prediction based on Head Orientations

    NARCIS (Netherlands)

    Rienks, R.J.; Poppe, Ronald Walter; van Otterlo, M.; Poel, Mannes; Poel, M.; Nijholt, A.; Nijholt, Antinus

    2005-01-01

    To gain insight into gaze behavior in meetings, this paper compares the results from a Naive Bayes classifier, Neural Networks and humans on speaker prediction in four-person meetings given solely the azimuth head angles. The Naive Bayes classifier scored 69.4% correctly, Neural Networks 62.3% and
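
    A Gaussian Naive Bayes baseline over the four participants' azimuth head angles, of the kind evaluated above, takes only a few lines to set up. The feature layout below (one angle per participant per time frame, labelled with the index of the current speaker) is an assumed encoding for illustration; the random toy data obviously yields only chance-level accuracy.

    ```python
    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_val_score

    # Toy data: each row holds the azimuth head angles (degrees) of the four
    # meeting participants at one time frame; the label is the current speaker.
    rng = np.random.default_rng(0)
    X = rng.uniform(-90.0, 90.0, size=(1000, 4))
    y = rng.integers(0, 4, size=1000)

    clf = GaussianNB()
    print(cross_val_score(clf, X, y, cv=5).mean())  # ~0.25 on random data (chance level)
    ```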

  20. An acoustic analysis of English vowels produced by speakers of seven different native-language backgrounds

    NARCIS (Netherlands)

    Heuven, van V.J.J.P.; Gooskens, C.

    2017-01-01

    We measured F1, F2 and duration of ten English monophthongs produced by American native speakers and by Danish, Norwegian, Swedish, Dutch, Hungarian and Chinese L2 speakers. We hypothesized that (i) L2 speakers would approximate the English vowels more closely as the phonological distance between

  1. Model Essay as a Feedback Tool in Task 2 of the IELTS Writing Exam Instruction for Slovene Students

    Directory of Open Access Journals (Sweden)

    Nina Bostič Bishop

    2011-05-01

    Full Text Available The paper discusses using a model essay as a feedback tool when teaching EFL writing to Slovene EFL students in the context of Task 2 of the IELTS Writing exam. In the present study, four IELTS students of two different levels were asked to write a response to a Task 2 IELTS Writing exam question and to compare it, by means of note-taking, to a model essay written by a native speaker or a native-speaker-like writer. The notes were then analyzed, and the findings offer an insight into which aspects of the English language Slovene students noticed and how frequently they noticed individual language items. An analysis of the differences and similarities in the quality and quantity of noticing depending on the students' level is also provided, along with a comparison with a Japanese study conducted by Abe in 2008. Finally, recommendations for future research are made.

  2. Speaker transfer in children's peer conversation: completing communication-aid-mediated contributions.

    Science.gov (United States)

    Clarke, Michael; Bloch, Steven; Wilkinson, Ray

    2013-03-01

    Managing the exchange of speakers from one person to another effectively is a key issue for participants in everyday conversational interaction. Speakers use a range of resources to indicate, in advance, when their turn will come to an end, and listeners attend to such signals in order to know when they might legitimately speak. Using the principles and findings from conversation analysis, this paper examines features of speaker transfer in a conversation between a boy with cerebral palsy who has been provided with a voice-output communication aid (VOCA), and a peer without physical or communication difficulties. Specifically, the analysis focuses on turn exchange where a VOCA-mediated contribution approaches completion and the child without communication needs is due to speak next.

  3. Speaker information affects false recognition of unstudied lexical-semantic associates.

    Science.gov (United States)

    Luthra, Sahil; Fox, Neal P; Blumstein, Sheila E

    2018-05-01

    Recognition of and memory for a spoken word can be facilitated by a prior presentation of that word spoken by the same talker. However, it is less clear whether this speaker congruency advantage generalizes to facilitate recognition of unheard related words. The present investigation employed a false memory paradigm to examine whether information about a speaker's identity in items heard by listeners could influence the recognition of novel items (critical intruders) phonologically or semantically related to the studied items. In Experiment 1, false recognition of semantically associated critical intruders was sensitive to speaker information, though only when subjects attended to talker identity during encoding. Results from Experiment 2 also provide some evidence that talker information affects the false recognition of critical intruders. Taken together, the present findings indicate that indexical information is able to contact the lexical-semantic network to affect the processing of unheard words.

  4. Study of audio speakers containing ferrofluid

    Energy Technology Data Exchange (ETDEWEB)

    Rosensweig, R E [34 Gloucester Road, Summit, NJ 07901 (United States); Hirota, Y; Tsuda, S [Ferrotec, 1-4-14 Kyobashi, chuo-Ku, Tokyo 104-0031 (Japan); Raj, K [Ferrotec, 33 Constitution Drive, Bedford, NH 03110 (United States)

    2008-05-21

    This work validates a method for increasing the radial restoring force on the voice coil in audio speakers containing ferrofluid. In addition, a study is made of factors influencing splash loss of the ferrofluid due to shock. Ferrohydrodynamic analysis is employed throughout to model behavior, and predictions are compared to experimental data.

  5. Designing, Modeling, Constructing, and Testing a Flat Panel Speaker and Sound Diffuser for a Simulator

    Science.gov (United States)

    Dillon, Christina

    2013-01-01

    The goal of this project was to design, model, build, and test a flat panel speaker and frame for a spherical dome structure being made into a simulator. The simulator will be a test bed for evaluating an immersive environment for human interfaces. This project focused on the loud speakers and a sound diffuser for the dome. The rest of the team worked on an Ambisonics 3D sound system, video projection system, and multi-direction treadmill to create the most realistic scene possible. The main programs utilized in this project were Pro-E and COMSOL. Pro-E was used for creating detailed figures for the fabrication of a frame that held a flat panel loud speaker. The loud speaker was made from a thin sheet of Plexiglas and 4 acoustic exciters. COMSOL, a multiphysics finite element analysis simulator, was used to model and evaluate all stages of the loud speaker, frame, and sound diffuser. Acoustical testing measurements were utilized to create polar plots from the working prototype, which were then compared to the COMSOL simulations to select the optimal design for the dome. The final goal of the project was to install the flat panel loud speaker design, in addition to a sound diffuser, onto the wall of the dome. After running tests in COMSOL on various speaker configurations, including a warped Plexiglas version, the optimal speaker design included a flat piece of Plexiglas with a rounded frame to match the curvature of the dome. Eight of these loud speakers will be mounted into an inch and a half of high-performance acoustic insulation, or Thinsulate, that will cover the inside of the dome. The following technical paper discusses these projects and explains the engineering processes used, knowledge gained, and the projected future goals of this project.

  6. Flexible spatial perspective-taking: conversational partners weigh multiple cues in collaborative tasks.

    Science.gov (United States)

    Galati, Alexia; Avraamides, Marios N

    2013-01-01

    Research on spatial perspective-taking often focuses on the cognitive processes of isolated individuals as they adopt or maintain imagined perspectives. Collaborative studies of spatial perspective-taking typically examine speakers' linguistic choices, while overlooking their underlying processes and representations. We review evidence from two collaborative experiments that examine the contribution of social and representational cues to spatial perspective choices in both language and the organization of spatial memory. Across experiments, speakers organized their memory representations according to the convergence of various cues. When layouts were randomly configured and did not afford intrinsic cues, speakers encoded their partner's viewpoint in memory, if available, but did not use it as an organizing direction. On the other hand, when the layout afforded an intrinsic structure, speakers organized their spatial memories according to the person-centered perspective reinforced by the layout's structure. Similarly, in descriptions, speakers considered multiple cues, whether available a priori or emerging during the interaction. They used partner-centered expressions more frequently (e.g., "to your right") when the partner's viewpoint was misaligned by a small offset or coincided with the layout's structure. Conversely, they used egocentric expressions more frequently when their own viewpoint coincided with the intrinsic structure or when the partner was misaligned by a computationally difficult, oblique offset. Based on these findings we advocate for a framework for flexible perspective-taking: people weigh multiple cues (including social ones) to make attributions about the relative difficulty of perspective-taking for each partner, and adapt behavior to minimize their collective effort. This framework is not specialized for spatial reasoning but instead emerges from the same principles and memory-dependent processes that govern perspective-taking in non-spatial tasks.

  7. Data requirements for speaker independent acoustic models

    CSIR Research Space (South Africa)

    Badenhorst, JAC

    2008-11-01

    When developing speech recognition systems in resource-constrained environments, careful design of the training corpus can play an important role in compensating for data scarcity. One of the factors to consider relates to the speaker composition...

  8. During Threaded Discussions Are Non-Native English Speakers Always at a Disadvantage?

    Science.gov (United States)

    Shafer Willner, Lynn

    2014-01-01

    When participating in threaded discussions, under what conditions might non-native speakers of English (NNSE) be at a comparative disadvantage to their classmates who are native speakers of English (NSE)? This study compares the threaded discussion perspectives of closely-matched NNSE and NSE adult students having different levels of threaded…

  9. Analysis of Acoustic Features in Speakers with Cognitive Disorders and Speech Impairments

    Science.gov (United States)

    Saz, Oscar; Simón, Javier; Rodríguez, W. Ricardo; Lleida, Eduardo; Vaquero, Carlos

    2009-12-01

    This work presents the results of the analysis of the acoustic features (formants and the three suprasegmental features: tone, intensity and duration) of the vowel production in a group of 14 young speakers suffering from different kinds of speech impairments due to physical and cognitive disorders. A corpus with unimpaired children's speech is used to determine the reference values for these features in speakers without any kind of speech impairment within the same domain as the impaired speakers; this is 57 isolated words. The signal processing to extract the formant and pitch values is based on a Linear Prediction Coefficients (LPCs) analysis of the segments considered as vowels in a Hidden Markov Model (HMM) based Viterbi forced alignment. Intensity and duration are also based on the outcome of the automated segmentation. As the main conclusion of the work, it is shown that intelligibility of the vowel production is lowered in impaired speakers even when the vowel is perceived as correct by human labelers. The decrease in intelligibility is due to a 30% increase in confusability in the formants map, a reduction of 50% in the discriminative power in energy between stressed and unstressed vowels, and to a 50% increase of the standard deviation in the length of the vowels. On the other hand, impaired speakers keep good control of tone in the production of stressed and unstressed vowels.
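
    The LPC-to-formant step described in this record is standard enough to sketch. Below is a minimal Python illustration of estimating formants from a single vowel frame via the roots of the LPC polynomial, assuming the vowel segment has already been located (e.g., by the HMM-based forced alignment mentioned above); the function names, model order, and thresholds are illustrative rather than taken from the paper.

```python
# Minimal sketch of LPC-based formant estimation for one vowel frame.
# Segment boundaries (from a forced alignment) are assumed to be known.
import numpy as np
from scipy.linalg import toeplitz, solve
from scipy.signal import lfilter

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: solve the normal equations for the predictor."""
    frame = frame * np.hamming(len(frame))               # taper the analysis frame
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = toeplitz(r[:order])                              # autocorrelation matrix
    a = solve(R, r[1:order + 1])                         # predictor coefficients
    return np.concatenate(([1.0], -a))                   # A(z) = 1 - sum_k a_k z^-k

def estimate_formants(frame, fs, order=12, max_bw=400.0):
    """Return candidate formant frequencies (Hz) from the roots of A(z)."""
    frame = lfilter([1.0, -0.97], [1.0], frame)          # pre-emphasis
    a = lpc_coefficients(frame, order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]                    # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -(fs / np.pi) * np.log(np.abs(roots))          # bandwidth estimate
    return sorted(f for f, b in zip(freqs, bws) if f > 90 and b < max_bw)

# Synthetic vowel-like frame (two resonances plus a little noise) at 16 kHz, 50 ms long
fs = 16000
t = np.arange(int(0.05 * fs)) / fs
frame = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
frame += 0.01 * np.random.default_rng(0).normal(size=t.size)
print(estimate_formants(frame, fs)[:2])                  # roughly [500.0, 1500.0]
```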

  10. Evaluation of Speakers at a National Continuing Medical Education (CME Course

    Directory of Open Access Journals (Sweden)

    Jannette Collins, MD, MEd, FCCP

    2002-12-01

    Purpose: Evaluations of a national radiology continuing medical education (CME) course in thoracic imaging were analyzed to determine what constitutes effective and ineffective lecturing. Methods and Materials: Evaluations of sessions and individual speakers participating in a five-day course jointly sponsored by the Society of Thoracic Radiology (STR) and the Radiological Society of North America (RSNA) were tallied by the RSNA Department of Data Management and three members of the STR Training Committee. Comments were collated and analyzed to determine the number of positive and negative comments and common themes related to ineffective lecturing. Results: Twenty-two sessions were evaluated by 234 (75.7%) of 309 professional registrants. Eighty-one speakers were evaluated by an average of 153 registrants (range, 2 – 313). Mean ratings for 10 items evaluating sessions ranged from 1.28 – 2.05 (1=most positive, 4=least positive; SD .451 - .902). The average speaker rating was 5.7 (1=very poor, 7=outstanding; SD 0.94; range 4.3 – 6.4). Total number of comments analyzed was 862, with 505 (58.6%) considered positive and 404 (46.9%) considered negative (the total number exceeds 862 as a "comment" could consist of both positive and negative statements). Poor content was mentioned most frequently, making up 107 (26.5%) of 404 negative comments, and applied to 51 (63%) of 81 speakers. Other negative comments, in order of decreasing frequency, were related to delivery, image slides, command of the English language, text slides, and handouts. Conclusions: Individual evaluations of speakers at a national CME course provided information regarding the quality of lectures that was not provided by evaluations of grouped presentations. Systematic review of speaker evaluations provided specific information related to the types and frequency of features related to ineffective lecturing. This information can be used to design CME course evaluations, design future CME…

  11. 7 CFR 247.13 - Provisions for non-English or limited-English speakers.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 4 2010-01-01 2010-01-01 false Provisions for non-English or limited-English speakers... § 247.13 Provisions for non-English or limited-English speakers. (a) What must State and local agencies do to ensure that non-English or limited-English speaking persons are aware of their rights and...

  12. Analysis of Feature Extraction Methods for Speaker Dependent Speech Recognition

    Directory of Open Access Journals (Sweden)

    Gurpreet Kaur

    2017-02-01

    Speech recognition is about what is being said, irrespective of who is saying it. Speech recognition is a growing field. Major progress is taking place in the technology of automatic speech recognition (ASR). Still, there are lots of barriers in this field in terms of recognition rate, background noise, speaker variability, speaking rate, accent, etc. Speech recognition rate mainly depends on the selection of features and feature extraction methods. This paper outlines the feature extraction techniques for speaker dependent speech recognition for isolated words. A brief survey of different feature extraction techniques like Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding Coefficients (LPCC), Perceptual Linear Prediction (PLP), and Relative Spectra Perceptual Linear Predictive (RASTA-PLP) analysis is presented and an evaluation is carried out. Speech recognition has various applications from daily use to commercial use. We have made a speaker dependent system and this system can be useful in many areas like controlling a patient vehicle using simple commands.
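
    As a concrete illustration of the kind of front-end this survey covers, the sketch below computes MFCCs (plus delta features) for an isolated word using librosa; the toolkit choice, frame sizes, and file name are assumptions, not details from the paper.

```python
# Minimal MFCC front-end for an isolated-word recognizer (sketch).
import numpy as np
import librosa

def extract_features(wav_path, n_mfcc=13):
    y, sr = librosa.load(wav_path, sr=16000)                 # mono, resampled to 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=400, hop_length=160)   # 25 ms frames, 10 ms hop
    delta = librosa.feature.delta(mfcc)                      # first-order dynamics
    return np.vstack([mfcc, delta]).T                        # shape: (frames, 2 * n_mfcc)

# Usage (hypothetical file name):
# features = extract_features("word_0001.wav")
# print(features.shape)
```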

  13. Dissociations between word and picture naming in Persian speakers with aphasia

    Directory of Open Access Journals (Sweden)

    Mehdi Bakhtiar

    2014-04-01

    Studies of patients with aphasia have found dissociations in their ability to read words and name pictures (Hillis & Caramazza, 1995; Hillis & Caramazza, 1991). Persian orthography is characterised by nearly regular orthography-phonology (OP) mappings; however, the omission of some vowels in the script makes the OP mapping of many words less predictable. The aim of this study was to compare the predictive lexico-semantic variables across reading and picture naming tasks in Persian aphasia while considering the variability across participants and items using mixed modeling. Methods and Results: A total of 21 brain-injured Persian-speaking patients suffering from aphasia were asked to name 200 normalized Snodgrass object pictures and words taken from Bakhtiar, Nilipour and Weekes (2013) in different sessions. The results showed that word naming performance was significantly better than object naming in Persian speakers with aphasia (p<0.0001). Applying McNemar's test to examine individual differences found that 18 patients showed significantly better performance in word reading compared to picture naming, 2 patients showed no difference between naming and reading (i.e. case 1 and 10), and one patient (i.e. case 5) showed significantly better naming compared to reading, χ²(1)=10.23, p<0.01 (see also Figure 1). A mixed-effect logistic regression analysis revealed that the degree of spelling transparency (i.e. the number of letters in a word divided by the number of its phonemes) had an effect on word naming (along with frequency, age of acquisition (AoA), and imageability) and picture naming (along with image agreement, AoA, word length, frequency and name agreement), with a much stronger effect on the word naming task (b=1.67, SE=0.41, z=4.05, p<0.0001) compared to the picture naming task (b=-0.64, SE=0.32, z=2, p<0.05). Conclusion: The dissociation between word naming and picture naming shown by many patients suggests at least two routes are available…
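
    The "spelling transparency" predictor is simple to operationalize. The sketch below computes it as letters per phoneme and fits a simplified fixed-effects logistic regression on toy naming-accuracy data; the actual study used mixed-effects models with random terms for participants and items, and the items, counts, and column names here are hypothetical.

```python
# Illustrative computation of spelling transparency (letters / phonemes) and a
# simplified fixed-effects logistic regression on naming accuracy (toy data).
import pandas as pd
from sklearn.linear_model import LogisticRegression

items = pd.DataFrame({
    "word":       ["ketab", "derakht", "sib"],    # hypothetical romanized Persian items
    "n_letters":  [4, 4, 3],                      # letters as written (short vowels omitted)
    "n_phonemes": [5, 6, 3],
})
items["transparency"] = items["n_letters"] / items["n_phonemes"]

# One row per patient x item response: 1 = correct, 0 = error (toy values)
trials = pd.DataFrame({
    "transparency":  [0.8, 0.8, 0.67, 0.67, 1.0, 1.0],
    "log_frequency": [2.1, 2.1, 1.4, 1.4, 3.0, 3.0],
    "correct":       [1, 0, 0, 0, 1, 1],
})
model = LogisticRegression().fit(trials[["transparency", "log_frequency"]],
                                 trials["correct"])
print(dict(zip(["transparency", "log_frequency"], model.coef_[0])))
```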

  14. Tone Language Speakers and Musicians Share Enhanced Perceptual and Cognitive Abilities for Musical Pitch: Evidence for Bidirectionality between the Domains of Language and Music

    Science.gov (United States)

    Bidelman, Gavin M.; Hutka, Stefanie; Moreno, Sylvain

    2013-01-01

    Psychophysiological evidence suggests that music and language are intimately coupled such that experience/training in one domain can influence processing required in the other domain. While the influence of music on language processing is now well-documented, evidence of language-to-music effects have yet to be firmly established. Here, using a cross-sectional design, we compared the performance of musicians to that of tone-language (Cantonese) speakers on tasks of auditory pitch acuity, music perception, and general cognitive ability (e.g., fluid intelligence, working memory). While musicians demonstrated superior performance on all auditory measures, comparable perceptual enhancements were observed for Cantonese participants, relative to English-speaking nonmusicians. These results provide evidence that tone-language background is associated with higher auditory perceptual performance for music listening. Musicians and Cantonese speakers also showed superior working memory capacity relative to nonmusician controls, suggesting that in addition to basic perceptual enhancements, tone-language background and music training might also be associated with enhanced general cognitive abilities. Our findings support the notion that tone language speakers and musically trained individuals have higher performance than English-speaking listeners for the perceptual-cognitive processing necessary for basic auditory as well as complex music perception. These results illustrate bidirectional influences between the domains of music and language. PMID:23565267

  15. Tone language speakers and musicians share enhanced perceptual and cognitive abilities for musical pitch: evidence for bidirectionality between the domains of language and music.

    Science.gov (United States)

    Bidelman, Gavin M; Hutka, Stefanie; Moreno, Sylvain

    2013-01-01

    Psychophysiological evidence suggests that music and language are intimately coupled such that experience/training in one domain can influence processing required in the other domain. While the influence of music on language processing is now well-documented, evidence of language-to-music effects have yet to be firmly established. Here, using a cross-sectional design, we compared the performance of musicians to that of tone-language (Cantonese) speakers on tasks of auditory pitch acuity, music perception, and general cognitive ability (e.g., fluid intelligence, working memory). While musicians demonstrated superior performance on all auditory measures, comparable perceptual enhancements were observed for Cantonese participants, relative to English-speaking nonmusicians. These results provide evidence that tone-language background is associated with higher auditory perceptual performance for music listening. Musicians and Cantonese speakers also showed superior working memory capacity relative to nonmusician controls, suggesting that in addition to basic perceptual enhancements, tone-language background and music training might also be associated with enhanced general cognitive abilities. Our findings support the notion that tone language speakers and musically trained individuals have higher performance than English-speaking listeners for the perceptual-cognitive processing necessary for basic auditory as well as complex music perception. These results illustrate bidirectional influences between the domains of music and language.

  16. Tone language speakers and musicians share enhanced perceptual and cognitive abilities for musical pitch: evidence for bidirectionality between the domains of language and music.

    Directory of Open Access Journals (Sweden)

    Gavin M Bidelman

    Psychophysiological evidence suggests that music and language are intimately coupled such that experience/training in one domain can influence processing required in the other domain. While the influence of music on language processing is now well-documented, evidence of language-to-music effects have yet to be firmly established. Here, using a cross-sectional design, we compared the performance of musicians to that of tone-language (Cantonese) speakers on tasks of auditory pitch acuity, music perception, and general cognitive ability (e.g., fluid intelligence, working memory). While musicians demonstrated superior performance on all auditory measures, comparable perceptual enhancements were observed for Cantonese participants, relative to English-speaking nonmusicians. These results provide evidence that tone-language background is associated with higher auditory perceptual performance for music listening. Musicians and Cantonese speakers also showed superior working memory capacity relative to nonmusician controls, suggesting that in addition to basic perceptual enhancements, tone-language background and music training might also be associated with enhanced general cognitive abilities. Our findings support the notion that tone language speakers and musically trained individuals have higher performance than English-speaking listeners for the perceptual-cognitive processing necessary for basic auditory as well as complex music perception. These results illustrate bidirectional influences between the domains of music and language.

  17. Within-category variance and lexical tone discrimination in native and non-native speakers

    NARCIS (Netherlands)

    Hoffmann, C.W.G.; Sadakata, M.; Chen, A.; Desain, P.W.M.; McQueen, J.M.; Gussenhove, C.; Chen, Y.; Dediu, D.

    2014-01-01

    In this paper, we show how acoustic variance within lexical tones in disyllabic Mandarin Chinese pseudowords affects discrimination abilities in both native and non-native speakers of Mandarin Chinese. Within-category acoustic variance did not hinder native speakers in discriminating between lexical

  18. The Acquisition of Clitic Pronouns in the Spanish Interlanguage of Peruvian Quechua Speakers.

    Science.gov (United States)

    Klee, Carol A.

    1989-01-01

    Analysis of four adult Quechua speakers' acquisition of clitic pronouns in Spanish revealed that educational attainment and amount of contact with monolingual Spanish speakers were positively related to native-like norms of competence in the use of object pronouns in Spanish. (CB)

  19. "I May Be a Native Speaker but I'm Not Monolingual": Reimagining "All" Teachers' Linguistic Identities in TESOL

    Science.gov (United States)

    Ellis, Elizabeth M.

    2016-01-01

    Teacher linguistic identity has so far mainly been researched in terms of whether a teacher identifies (or is identified by others) as a native speaker (NEST) or nonnative speaker (NNEST) (Moussu & Llurda, 2008; Reis, 2011). Native speakers are presumed to be monolingual, and nonnative speakers, although by definition bilingual, tend to be…

  20. Bridging Gaps in Common Ground: Speakers Design Their Gestures for Their Listeners

    Science.gov (United States)

    Hilliard, Caitlin; Cook, Susan Wagner

    2016-01-01

    Communication is shaped both by what we are trying to say and by whom we are saying it to. We examined whether and how shared information influences the gestures speakers produce along with their speech. Unlike prior work examining effects of common ground on speech and gesture, we examined a situation in which some speakers have the same amount…

  1. Incorporating Pass-Phrase Dependent Background Models for Text-Dependent Speaker verification

    DEFF Research Database (Denmark)

    Sarkar, Achintya Kumar; Tan, Zheng-Hua

    2018-01-01

    In this paper, we propose pass-phrase dependent background models (PBMs) for text-dependent (TD) speaker verification (SV) to integrate the pass-phrase identification process into the conventional TD-SV system, where a PBM is derived from a text-independent background model through adaptation using the utterances of a particular pass-phrase. During training, pass-phrase specific target speaker models are derived from the particular PBM using the training data for the respective target model. While testing, the best PBM is first selected for the test utterance in the maximum likelihood (ML) sense ... We show that the proposed method significantly reduces the error rates of text-dependent speaker verification for the non-target types: target-wrong and impostor-wrong, while it maintains comparable TD-SV performance when impostors speak a correct utterance with respect to the conventional system.
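
    A conceptual sketch of the PBM selection step is given below: one Gaussian mixture model per pass-phrase, with the best PBM chosen for a test utterance in the maximum-likelihood sense. In the actual system the PBMs are adapted from a text-independent background model and speaker models are in turn adapted from the selected PBM; only the selection step is illustrated here, on random stand-in features.

```python
# Conceptual sketch: pick the pass-phrase background model (PBM) that gives the
# highest average log-likelihood for a test utterance. Features are stand-ins.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in MFCC-like features for three pass-phrases (frames x dims)
passphrase_data = {p: rng.normal(loc=i, size=(500, 13))
                   for i, p in enumerate(["phrase_a", "phrase_b", "phrase_c"])}

# One GMM per pass-phrase plays the role of a PBM in this toy setup
pbms = {p: GaussianMixture(n_components=8, covariance_type="diag",
                           random_state=0).fit(X)
        for p, X in passphrase_data.items()}

def select_pbm(test_features):
    """Return the pass-phrase whose PBM gives the highest average log-likelihood."""
    scores = {p: gmm.score(test_features) for p, gmm in pbms.items()}
    return max(scores, key=scores.get), scores

test_utt = rng.normal(loc=1, size=(120, 13))          # should match "phrase_b"
best, scores = select_pbm(test_utt)
print(best, {p: round(s, 2) for p, s in scores.items()})
```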

  2. Speaker-Sex Discrimination for Voiced and Whispered Vowels at Short Durations

    OpenAIRE

    Smith, David R. R.

    2016-01-01

    Whispered vowels, produced with no vocal fold vibration, lack the periodic temporal fine structure which in voiced vowels underlies the perceptual attribute of pitch (a salient auditory cue to speaker sex). Voiced vowels possess no temporal fine structure at very short durations (below two glottal cycles). The prediction was that speaker-sex discrimination performance for whispered and voiced vowels would be similar for very short durations but, as stimulus duration increases, voiced vowel pe...

  3. Identifying the nonlinear mechanical behaviour of micro-speakers from their quasi-linear electrical response

    Science.gov (United States)

    Zilletti, Michele; Marker, Arthur; Elliott, Stephen John; Holland, Keith

    2017-05-01

    In this study model identification of the nonlinear dynamics of a micro-speaker is carried out by purely electrical measurements, avoiding any explicit vibration measurements. It is shown that a dynamic model of the micro-speaker, which takes into account the nonlinear damping characteristic of the device, can be identified by measuring the response between the voltage input and the current flowing into the coil. An analytical formulation of the quasi-linear model of the micro-speaker is first derived and an optimisation method is then used to identify a polynomial function which describes the mechanical damping behaviour of the micro-speaker. The analytical results of the quasi-linear model are compared with numerical results. This study potentially opens up the possibility of efficiently implementing nonlinear echo cancellers.
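
    A stripped-down illustration of the final identification step is sketched below: fitting polynomial damping coefficients to (velocity, damping-force) data with a least-squares optimiser. The paper obtains these quantities indirectly from voltage and current measurements via the quasi-linear model; that derivation is not reproduced here, and the synthetic data and coefficient values are stand-ins.

```python
# Fit a polynomial damping law F_damp = (c0 + c1*v + c2*v^2) * v to synthetic data.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
v = np.linspace(-0.3, 0.3, 200)                       # coil velocity, m/s
true_c = np.array([0.8, 0.0, 12.0])                   # "unknown" damping coefficients
force = (true_c[0] + true_c[1] * v + true_c[2] * v**2) * v
force += rng.normal(scale=0.002, size=v.size)         # measurement noise

def residuals(c):
    model = (c[0] + c[1] * v + c[2] * v**2) * v       # velocity-dependent damping force
    return model - force

fit = least_squares(residuals, x0=np.ones(3))
print(np.round(fit.x, 2))                             # close to [0.8, 0.0, 12.0]
```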

  4. Promoting Communities of Practice among Non-Native Speakers of English in Online Discussions

    Science.gov (United States)

    Kim, Hoe Kyeung

    2011-01-01

    An online discussion involving text-based computer-mediated communication has great potential for promoting equal participation among non-native speakers of English. Several studies claimed that online discussions could enhance the academic participation of non-native speakers of English. However, there is little research around participation…

  5. Learning foreign labels from a foreign speaker: the role of (limited) exposure to a second language.

    Science.gov (United States)

    Akhtar, Nameera; Menjivar, Jennifer; Hoicka, Elena; Sabbagh, Mark A

    2012-11-01

    Three- and four-year-olds (N = 144) were introduced to novel labels by an English speaker and a foreign speaker (of Nordish, a made-up language), and were asked to endorse one of the speaker's labels. Monolingual English-speaking children were compared to bilingual children and English-speaking children who were regularly exposed to a language other than English. All children tended to endorse the English speaker's labels when asked 'What do you call this?', but when asked 'What do you call this in Nordish?', children with exposure to a second language were more likely to endorse the foreign label than monolingual and bilingual children. The findings suggest that, at this age, exposure to, but not necessarily immersion in, more than one language may promote the ability to learn foreign words from a foreign speaker.

  6. Is the superior verbal memory span of Mandarin speakers due to faster rehearsal?

    Science.gov (United States)

    Mattys, Sven L; Baddeley, Alan; Trenkic, Danijela

    2018-04-01

    It is well established that digit span in native Chinese speakers is atypically high. This is commonly attributed to a capacity for more rapid subvocal rehearsal for that group. We explored this hypothesis by testing a group of English-speaking native Mandarin speakers on digit span and word span in both Mandarin and English, together with a measure of speed of articulation for each. When compared to the performance of native English speakers, the Mandarin group proved to be superior on both digit and word spans while predictably having lower spans in English. This suggests that the Mandarin advantage is not limited to digits. Speed of rehearsal correlated with span performance across materials. However, this correlation was more pronounced for English speakers than for any of the Chinese measures. Further analysis suggested that speed of rehearsal did not provide an adequate account of differences between Mandarin and English spans or for the advantage of digits over words. Possible alternative explanations are discussed.

  7. Phonological processing skills in 6 year old blind and sighted Persian speakers

    Directory of Open Access Journals (Sweden)

    Maryam Sadat Momen Vaghefi

    2013-03-01

    Background and Aim: Phonological processing skills include the abilities to restore, retrieve and use memorized phonological codes. The purpose of this research is to compare and evaluate phonological processing skills in 6-7 year old blind and sighted Persian speakers in Tehran, Iran. Methods: This research is an analysis-comparison study. The subjects were 24 blind and 24 sighted children. The evaluation test of reading and writing disorders in primary school students, linguistic and cognitive abilities test, and the naming subtest of the aphasia evaluation test were used as research tools. Results: Sighted children were found to perform better on phoneme recognition of nonwords and flower naming subtests, and the difference was significant (p<0.001). Blind children performed better in words and sentence memory; the difference was significant (p<0.001). There were no significant differences in other subtests. Conclusion: Blind children's better performance in memory tasks is due to the fact that they have powerful auditory memory.

  8. The native-speaker fever in English language teaching (ELT: Pitting pedagogical competence against historical origin

    Directory of Open Access Journals (Sweden)

    Anchimbe, Eric A.

    2006-01-01

    This paper discusses English language teaching (ELT) around the world, and argues that as a profession, it should emphasise pedagogical competence rather than the native-speaker requirement in the recruitment of teachers in English as a foreign language (EFL) and English as a second language (ESL) contexts. It establishes that being a native speaker does not make one automatically a competent speaker or, for that matter, a competent teacher of the language. It observes that on many grounds, including physical, sociocultural, technological and economic changes in the world as well as the status of English as official and national language in many post-colonial regions, the distinction between native and non-native speakers is no longer valid.

  9. Psychophysical Boundary for Categorization of Voiced-Voiceless Stop Consonants in Native Japanese Speakers

    Science.gov (United States)

    Tamura, Shunsuke; Ito, Kazuhito; Hirose, Nobuyuki; Mori, Shuji

    2018-01-01

    Purpose: The purpose of this study was to investigate the psychophysical boundary used for categorization of voiced-voiceless stop consonants in native Japanese speakers. Method: Twelve native Japanese speakers participated in the experiment. The stimuli were synthetic stop consonant-vowel stimuli varying in voice onset time (VOT) with…

  10. Action and object processing in brain-injured speakers of Chinese.

    Science.gov (United States)

    Arévalo, Analia L; Lu, Ching-Ching; Huang, Lydia B-Y; Bates, Elizabeth A; Dronkers, Nina F

    2011-11-01

    To see whether action and object processing across different tasks and modalities differs in brain-injured speakers of Chinese with varying fluency and lesion locations within the left hemisphere. Words and pictures representing actions and objects were presented to a group of 33 participants whose native and/or dominant language was Mandarin Chinese: 23 patients with left-hemisphere lesions due to stroke and 10 language-, age- and education-matched healthy control participants. A set of 120 stimulus items was presented to each participant in three different forms: as black and white line drawings (for picture-naming), as written words (for reading) and as aurally presented words (for word repetition). Patients were divided into groups for two separate analyses: Analysis 1 divided and compared patients based on fluency (Fluent vs. Nonfluent) and Analysis 2 compared patients based on lesion location (Anterior vs. Posterior). Both analyses yielded similar results: Fluent, Nonfluent, Anterior, and Posterior patients all produced significantly more errors when processing action (M = 0.73, SD = 0.45) relative to object (M = 0.79, SD = 0.41) stimuli, and this effect was strongest in the picture-naming task. As in our previous study with English-speaking participants using the same experimental design (Arévalo et al., 2007, Arévalo, Moineau, Saygin, Ludy, & Bates, 2005), we did not find evidence for a double-dissociation in action and object processing between groups with different lesion and fluency profiles. These combined data bring us closer to a more informed view of action/object processing in the brain in both healthy and brain-injured individuals.

  11. Does training make French speakers more able to identify lexical stress?

    OpenAIRE

    Schwab, Sandra; Llisterri, Joaquim

    2013-01-01

    This research takes the stress deafness hypothesis as a starting point (e.g. Dupoux et al., 2008), and, more specifically, the fact that French speakers present difficulties in perceiving lexical stress in a free-stress language. In this framework, we aim at determining whether a prosodic training could improve the ability of French speakers to identify the stressed syllable in Spanish words. Three groups of participants took part in this experiment. The Native group was composed of 16 speake...

  12. A Sociophonetic Study of Young Nigerian English Speakers

    African Journals Online (AJOL)

    Oladipupo

    between male and female speakers in boundary consonant deletion, (F(1, .... speech perception (Foulkes 2006, Clopper & Pisoni, 2005, Thomas 2002). ... in Nigeria, and had had the privilege of travelling to Europe and the Americas for the.

  13. Classifications of Vocalic Segments from Articulatory Kinematics: Healthy Controls and Speakers with Dysarthria

    Science.gov (United States)

    Yunusova, Yana; Weismer, Gary G.; Lindstrom, Mary J.

    2011-01-01

    Purpose: In this study, the authors classified vocalic segments produced by control speakers (C) and speakers with dysarthria due to amyotrophic lateral sclerosis (ALS) or Parkinson's disease (PD); classification was based on movement measures. The researchers asked the following questions: (a) Can vowels be classified on the basis of selected…

  14. On the Use of Complementary Spectral Features for Speaker Recognition

    Directory of Open Access Journals (Sweden)

    Sridhar Krishnan

    2007-12-01

    The most popular features for speaker recognition are Mel frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs). These features are used extensively because they characterize the vocal tract configuration, which is known to be highly speaker-dependent. In this work, several features are introduced that can characterize the vocal system in order to complement the traditional features and produce better speaker recognition models. The spectral centroid (SC), spectral bandwidth (SBW), spectral band energy (SBE), spectral crest factor (SCF), spectral flatness measure (SFM), Shannon entropy (SE), and Renyi entropy (RE) were utilized for this purpose. This work demonstrates that these features are robust in noisy conditions by simulating some common distortions that are found in the speakers' environment and a typical telephone channel. Babble noise, additive white Gaussian noise (AWGN), and a bandpass channel with 1 dB of ripple were used to simulate these noisy conditions. The results show significant improvements in classification performance for all noise conditions when these features were used to complement the MFCC and ΔMFCC features. In particular, the SC and SCF improved performance in almost all noise conditions within the examined SNR range (10–40 dB). For example, in cases where there was only one source of distortion, classification improvements of up to 8% and 10% were achieved under babble noise and AWGN, respectively, using the SCF feature.
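
    The complementary features listed above can be computed directly from a frame's magnitude spectrum. The NumPy sketch below follows common textbook definitions (the paper's exact normalisations and frame settings may differ) and returns a subset of the listed features: centroid, bandwidth, crest factor, flatness, Shannon entropy, and order-2 Renyi entropy. Spectral band energy is omitted because it depends on a choice of sub-bands.

```python
# Complementary spectral features for one analysis frame (common definitions).
import numpy as np

def spectral_features(frame, fs):
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    power = spec**2
    p = power / power.sum()                            # normalised spectral distribution

    sc  = np.sum(freqs * p)                            # spectral centroid (SC)
    sbw = np.sqrt(np.sum(((freqs - sc) ** 2) * p))     # spectral bandwidth (SBW)
    scf = spec.max() / spec.mean()                     # spectral crest factor (SCF)
    sfm = np.exp(np.mean(np.log(power + 1e-12))) / (power.mean() + 1e-12)  # flatness (SFM)
    se  = -np.sum(p * np.log2(p + 1e-12))              # Shannon entropy (SE)
    re  = -np.log2(np.sum(p**2) + 1e-12)               # Renyi entropy of order 2 (RE)
    return {"SC": sc, "SBW": sbw, "SCF": scf, "SFM": sfm, "SE": se, "RE": re}

fs = 8000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
print({k: round(v, 3) for k, v in spectral_features(frame, fs).items()})
```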

  15. Continuing Medical Education Speakers with High Evaluation Scores Use more Image-based Slides

    Directory of Open Access Journals (Sweden)

    Ferguson, Ian

    2017-01-01

    Although continuing medical education (CME) presentations are common across health professions, it is unknown whether slide design is independently associated with audience evaluations of the speaker. Based on the conceptual framework of Mayer's theory of multimedia learning, this study aimed to determine whether image use and text density in presentation slides are associated with overall speaker evaluations. This retrospective analysis of six sequential CME conferences (two annual emergency medicine conferences over a three-year period) used a mixed linear regression model to assess whether postconference speaker evaluations were associated with image fraction (percentage of image-based slides per presentation) and text density (number of words per slide). A total of 105 unique lectures were given by 49 faculty members, and 1,222 evaluations (70.1% response rate) were available for analysis. On average, 47.4% (SD=25.36) of slides had at least one educationally-relevant image (image fraction). Image fraction significantly predicted overall higher evaluation scores [F(1, 100.676)=6.158, p=0.015] in the mixed linear regression model. The mean (SD) text density was 25.61 (8.14) words/slide but was not a significant predictor [F(1, 86.293)=0.55, p=0.815]. Of note, the individual speaker [χ2(1)=2.952, p=0.003] and speaker seniority [F(3, 59.713)=4.083, p=0.011] significantly predicted higher scores. This is the first published study to date assessing the linkage between slide design and CME speaker evaluations by an audience of practicing clinicians. The incorporation of images was associated with higher evaluation scores, in alignment with Mayer's theory of multimedia learning. Contrary to this theory, however, text density showed no significant association, suggesting that these scores may be multifactorial. Professional development efforts should focus on teaching best practices in both slide design and presentation skills.

  16. Continuing Medical Education Speakers with High Evaluation Scores Use more Image-based Slides.

    Science.gov (United States)

    Ferguson, Ian; Phillips, Andrew W; Lin, Michelle

    2017-01-01

    Although continuing medical education (CME) presentations are common across health professions, it is unknown whether slide design is independently associated with audience evaluations of the speaker. Based on the conceptual framework of Mayer's theory of multimedia learning, this study aimed to determine whether image use and text density in presentation slides are associated with overall speaker evaluations. This retrospective analysis of six sequential CME conferences (two annual emergency medicine conferences over a three-year period) used a mixed linear regression model to assess whether post-conference speaker evaluations were associated with image fraction (percentage of image-based slides per presentation) and text density (number of words per slide). A total of 105 unique lectures were given by 49 faculty members, and 1,222 evaluations (70.1% response rate) were available for analysis. On average, 47.4% (SD=25.36) of slides had at least one educationally-relevant image (image fraction). Image fraction significantly predicted overall higher evaluation scores [F(1, 100.676)=6.158, p=0.015] in the mixed linear regression model. The mean (SD) text density was 25.61 (8.14) words/slide but was not a significant predictor [F(1, 86.293)=0.55, p=0.815]. Of note, the individual speaker [χ 2 (1)=2.952, p=0.003] and speaker seniority [F(3, 59.713)=4.083, p=0.011] significantly predicted higher scores. This is the first published study to date assessing the linkage between slide design and CME speaker evaluations by an audience of practicing clinicians. The incorporation of images was associated with higher evaluation scores, in alignment with Mayer's theory of multimedia learning. Contrary to this theory, however, text density showed no significant association, suggesting that these scores may be multifactorial. Professional development efforts should focus on teaching best practices in both slide design and presentation skills.
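
    The two slide-design predictors are straightforward to compute from per-slide annotations. The sketch below derives image fraction and text density per lecture and shows how a mixed linear model with a per-speaker random intercept could be specified with statsmodels, roughly mirroring the analysis described above; the toy data and column names are hypothetical, and the model fit is left commented out because it needs a realistically sized dataset.

```python
# Image fraction and text density per lecture, plus a mixed-model specification.
import pandas as pd
import statsmodels.formula.api as smf

slides = pd.DataFrame({
    "lecture":   ["L1", "L1", "L1", "L2", "L2", "L2"],
    "has_image": [1, 0, 1, 0, 0, 1],
    "n_words":   [12, 40, 8, 35, 28, 15],
})
per_lecture = slides.groupby("lecture").agg(
    image_fraction=("has_image", "mean"),       # share of slides with >= 1 relevant image
    text_density=("n_words", "mean"),           # mean words per slide
).reset_index()

evals = pd.DataFrame({
    "lecture": ["L1", "L2"], "speaker": ["S1", "S2"], "eval_score": [6.1, 5.2],
})
data = evals.merge(per_lecture, on="lecture")

# With a realistically sized dataset (not this toy one), a random intercept per
# speaker could be fit as:
# model = smf.mixedlm("eval_score ~ image_fraction + text_density",
#                     data, groups=data["speaker"]).fit()
# print(model.summary())
print(data)
```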

  17. Speaker-Sex Discrimination for Voiced and Whispered Vowels at Short Durations.

    Science.gov (United States)

    Smith, David R R

    2016-01-01

    Whispered vowels, produced with no vocal fold vibration, lack the periodic temporal fine structure which in voiced vowels underlies the perceptual attribute of pitch (a salient auditory cue to speaker sex). Voiced vowels possess no temporal fine structure at very short durations (below two glottal cycles). The prediction was that speaker-sex discrimination performance for whispered and voiced vowels would be similar for very short durations but, as stimulus duration increases, voiced vowel performance would improve relative to whispered vowel performance as pitch information becomes available. This pattern of results was shown for women's but not for men's voices. A whispered vowel needs to have a duration three times longer than a voiced vowel before listeners can reliably tell whether it's spoken by a man or woman (∼30 ms vs. ∼10 ms). Listeners were half as sensitive to information about speaker-sex when it is carried by whispered compared with voiced vowels.

  18. Infant sensitivity to speaker and language in learning a second label.

    Science.gov (United States)

    Bhagwat, Jui; Casasola, Marianella

    2014-02-01

    Two experiments examined when monolingual, English-learning 19-month-old infants learn a second object label. Two experimenters sat together. One labeled a novel object with one novel label, whereas the other labeled the same object with a different label in either the same or a different language. Infants were tested on their comprehension of each label immediately following its presentation. Infants mapped the first label at above chance levels, but they did so with the second label only when requested by the speaker who provided it (Experiment 1) or when the second experimenter labeled the object in a different language (Experiment 2). These results show that 19-month-olds learn second object labels but do not readily generalize them across speakers of the same language. The results highlight how speaker and language spoken guide infants' acceptance of second labels, supporting sociopragmatic views of word learning. Copyright © 2013 Elsevier Inc. All rights reserved.

  19. B Anand | Speakers | Indian Academy of Sciences

    Indian Academy of Sciences (India)

    However, the mechanism by which this protospacer fragment gets integrated in a directional fashion into the leader proximal end is elusive. The speaker's group identified that the leader region abutting the first CRISPR repeat localizes Integration Host Factor (IHF) and Cas1-2 complex in Escherichia coli. IHF binding to the ...

  20. L2 speakers decompose morphologically complex verbs: fMRI evidence from priming of transparent derived verbs

    Directory of Open Access Journals (Sweden)

    Sophie De Grauwe

    2014-10-01

    In this fMRI long-lag priming study, we investigated the processing of Dutch semantically transparent, derived prefix verbs. In such words, the meaning of the word as a whole can be deduced from the meanings of its parts, e.g. wegleggen 'put aside'. Many behavioral and some fMRI studies suggest that native (L1) speakers decompose transparent derived words. The brain region usually implicated in morphological decomposition is the left inferior frontal gyrus (LIFG). In non-native (L2) speakers, the processing of transparent derived words has hardly been investigated, especially in fMRI studies, and results are contradictory: Some studies find more reliance on holistic (i.e. non-decompositional) processing by L2 speakers; some find no difference between L1 and L2 speakers. In this study, we wanted to find out whether Dutch transparent derived prefix verbs are decomposed or processed holistically by German L2 speakers of Dutch. Half of the derived verbs (e.g. omvallen 'fall down') were preceded by their stem (e.g. vallen 'fall') with a lag of 4 to 6 words ('primed'); the other half (e.g. inslapen 'fall asleep') were not ('unprimed'). L1 and L2 speakers of Dutch made lexical decisions on these visually presented verbs. Both ROI analyses and whole-brain analyses showed that there was a significant repetition suppression effect for primed compared to unprimed derived verbs in the LIFG. This was true both for the analyses over L2 speakers only and for the analyses over the two language groups together. The latter did not reveal any interaction with language group (L1 vs. L2) in the LIFG. Thus, L2 speakers show a clear priming effect in the LIFG, an area that has been associated with morphological decomposition. Our findings are consistent with the idea that L2 speakers engage in decomposition of transparent derived verbs rather than processing them holistically.

  1. Brain Plasticity in Speech Training in Native English Speakers Learning Mandarin Tones

    Science.gov (United States)

    Heinzen, Christina Carolyn

    The current study employed behavioral and event-related potential (ERP) measures to investigate brain plasticity associated with second-language (L2) phonetic learning based on an adaptive computer training program. The program utilized the acoustic characteristics of Infant-Directed Speech (IDS) to train monolingual American English-speaking listeners to perceive Mandarin lexical tones. Behavioral identification and discrimination tasks were conducted using naturally recorded speech, carefully controlled synthetic speech, and non-speech control stimuli. The ERP experiments were conducted with selected synthetic speech stimuli in a passive listening oddball paradigm. Identical pre- and post- tests were administered on nine adult listeners, who completed two-to-three hours of perceptual training. The perceptual training sessions used pair-wise lexical tone identification, and progressed through seven levels of difficulty for each tone pair. The levels of difficulty included progression in speaker variability from one to four speakers and progression through four levels of acoustic exaggeration of duration, pitch range, and pitch contour. Behavioral results for the natural speech stimuli revealed significant training-induced improvement in identification of Tones 1, 3, and 4. Improvements in identification of Tone 4 generalized to novel stimuli as well. Additionally, comparison between discrimination of across-category and within-category stimulus pairs taken from a synthetic continuum revealed a training-induced shift toward more native-like categorical perception of the Mandarin lexical tones. Analysis of the Mismatch Negativity (MMN) responses in the ERP data revealed increased amplitude and decreased latency for pre-attentive processing of across-category discrimination as a result of training. There were also laterality changes in the MMN responses to the non-speech control stimuli, which could reflect reallocation of brain resources in processing pitch patterns

  2. Congenital Amusia in Speakers of a Tone Language: Association with Lexical Tone Agnosia

    Science.gov (United States)

    Nan, Yun; Sun, Yanan; Peretz, Isabelle

    2010-01-01

    Congenital amusia is a neurogenetic disorder that affects the processing of musical pitch in speakers of non-tonal languages like English and French. We assessed whether this musical disorder exists among speakers of Mandarin Chinese who use pitch to alter the meaning of words. Using the Montreal Battery of Evaluation of Amusia, we tested 117…

  3. Phoneme Error Pattern by Heritage Speakers of Spanish on an English Word Recognition Test.

    Science.gov (United States)

    Shi, Lu-Feng

    2017-04-01

    Heritage speakers acquire their native language from home use in their early childhood. As the native language is typically a minority language in the society, these individuals receive their formal education in the majority language and eventually develop greater competency with the majority than their native language. To date, there have not been specific research attempts to understand word recognition by heritage speakers. It is not clear if and to what degree we may infer from evidence based on bilingual listeners in general. This preliminary study investigated how heritage speakers of Spanish perform on an English word recognition test and analyzed their phoneme errors. A prospective, cross-sectional, observational design was employed. Twelve normal-hearing adult Spanish heritage speakers (four men, eight women, 20-38 yr old) participated in the study. Their language background was obtained through the Language Experience and Proficiency Questionnaire. Nine English monolingual listeners (three men, six women, 20-41 yr old) were also included for comparison purposes. Listeners were presented with 200 Northwestern University Auditory Test No. 6 words in quiet. They repeated each word orally and in writing. Their responses were scored by word, word-initial consonant, vowel, and word-final consonant. Performance was compared between groups with Student's t test or analysis of variance. Group-specific error patterns were primarily descriptive, but intergroup comparisons were made using 95% or 99% confidence intervals for proportional data. The two groups of listeners yielded comparable scores when their responses were examined by word, vowel, and final consonant. However, heritage speakers of Spanish misidentified significantly more word-initial consonants and had significantly more difficulty with initial /p, b, h/ than their monolingual peers. The two groups yielded similar patterns for vowel and word-final consonants, but heritage speakers made significantly

  4. Combining Behavioral and ERP Methodologies to Investigate the Differences Between McGurk Effects Demonstrated by Cantonese and Mandarin Speakers

    Directory of Open Access Journals (Sweden)

    Juan Zhang

    2018-05-01

    The present study investigated the impact of Chinese dialects on McGurk effect using behavioral and event-related potential (ERP) methodologies. Specifically, intra-language comparison of McGurk effect was conducted between Mandarin and Cantonese speakers. The behavioral results showed that Cantonese speakers exhibited a stronger McGurk effect in audiovisual speech perception compared to Mandarin speakers, although both groups performed equally in the auditory and visual conditions. ERP results revealed that Cantonese speakers were more sensitive to visual cues than Mandarin speakers, though this was not the case for the auditory cues. Taken together, the current findings suggest that the McGurk effect generated by Chinese speakers is mainly influenced by segmental phonology during audiovisual speech integration.

  5. Combining Behavioral and ERP Methodologies to Investigate the Differences Between McGurk Effects Demonstrated by Cantonese and Mandarin Speakers

    Science.gov (United States)

    Zhang, Juan; Meng, Yaxuan; McBride, Catherine; Fan, Xitao; Yuan, Zhen

    2018-01-01

    The present study investigated the impact of Chinese dialects on McGurk effect using behavioral and event-related potential (ERP) methodologies. Specifically, intra-language comparison of McGurk effect was conducted between Mandarin and Cantonese speakers. The behavioral results showed that Cantonese speakers exhibited a stronger McGurk effect in audiovisual speech perception compared to Mandarin speakers, although both groups performed equally in the auditory and visual conditions. ERP results revealed that Cantonese speakers were more sensitive to visual cues than Mandarin speakers, though this was not the case for the auditory cues. Taken together, the current findings suggest that the McGurk effect generated by Chinese speakers is mainly influenced by segmental phonology during audiovisual speech integration. PMID:29780312

  6. Communication‐related affective, behavioral, and cognitive reactions in speakers with spasmodic dysphonia

    Science.gov (United States)

    Vanryckeghem, Martine

    2017-01-01

    Objectives: To investigate the self‐perceived affective, behavioral, and cognitive reactions associated with communication of speakers with spasmodic dysphonia as a function of employment status. Study Design: Prospective cross‐sectional investigation. Methods: 148 participants with spasmodic dysphonia (SD) completed an adapted version of the Behavior Assessment Battery (BAB‐Voice), a multidimensional assessment of self‐perceived reactions to communication. The BAB‐Voice consisted of four subtests: the Speech Situation Checklist for A) Emotional Reaction (SSC‐ER) and B) Speech Disruption (SSC‐SD), C) the Behavior Checklist (BCL), and D) the Communication Attitude Test for Adults (BigCAT). Participants were assigned to groups based on employment status (working versus retired). Results: Descriptive comparison of the BAB‐Voice in speakers with SD to previously published non‐dysphonic speaker data revealed substantially higher scores associated with SD across all four subtests. Multivariate Analysis of Variance (MANOVA) revealed no significantly different BAB‐Voice subtest scores as a function of SD group status (working vs. retired). Conclusions: BAB‐Voice scores revealed that speakers with SD experienced substantial impact of their voice disorder on communication attitude, coping behaviors, and affective reactions in speaking situations as reflected in their high BAB scores. These impacts do not appear to be influenced by work status, as speakers with SD who were employed or retired experienced similar levels of affective and behavioral reactions in various speaking situations and cognitive responses. These findings are consistent with previously published pilot data. The specificity of items assessed by means of the BAB‐Voice may inform the clinician of valid patient‐centered treatment goals which target the impairment extended beyond the physiological dimension. Level of Evidence: 2b. PMID:29299525

  7. Communication-related affective, behavioral, and cognitive reactions in speakers with spasmodic dysphonia.

    Science.gov (United States)

    Watts, Christopher R; Vanryckeghem, Martine

    2017-12-01

    To investigate the self-perceived affective, behavioral, and cognitive reactions associated with communication of speakers with spasmodic dysphonia as a function of employment status. Prospective cross-sectional investigation. 148 Participants with spasmodic dysphonia (SD) completed an adapted version of the Behavior Assessment Battery (BAB-Voice), a multidimensional assessment of self-perceived reactions to communication. The BAB-Voice consisted of four subtests: the Speech Situation Checklist for A) Emotional Reaction (SSC-ER) and B) Speech Disruption (SSC-SD), C) the Behavior Checklist (BCL), and D) the Communication Attitude Test for Adults (BigCAT). Participants were assigned to groups based on employment status (working versus retired). Descriptive comparison of the BAB-Voice in speakers with SD to previously published non-dysphonic speaker data revealed substantially higher scores associated with SD across all four subtests. Multivariate Analysis of Variance (MANOVA) revealed no significantly different BAB-Voice subtest scores as a function of SD group status (working vs. retired). BAB-Voice scores revealed that speakers with SD experienced substantial impact of their voice disorder on communication attitude, coping behaviors, and affective reactions in speaking situations as reflected in their high BAB scores. These impacts do not appear to be influenced by work status, as speakers with SD who were employed or retired experienced similar levels of affective and behavioral reactions in various speaking situations and cognitive responses. These findings are consistent with previously published pilot data. The specificity of items assessed by means of the BAB-Voice may inform the clinician of valid patient-centered treatment goals which target the impairment extended beyond the physiological dimension. 2b.

  8. Schizophrenia among Sesotho speakers in South Africa | Mosotho ...

    African Journals Online (AJOL)

    Results: Core symptoms of schizophrenia among Sesotho speakers do not differ significantly from other cultures. However, the content of psychological symptoms such as delusions and hallucinations is strongly affected by cultural variables. Somatic symptoms such as headaches, palpitations, dizziness and excessive ...

  9. Sentence comprehension in Swahili-English bilingual agrammatic speakers

    NARCIS (Netherlands)

    Abuom, Tom O.; Shah, Emmah; Bastiaanse, Roelien

    For this study, sentence comprehension was tested in Swahili-English bilingual agrammatic speakers. The sentences were controlled for four factors: (1) order of the arguments (base vs. derived); (2) embedding (declarative vs. relative sentences); (3) overt use of the relative pronoun "who"; (4)

  10. An evidence-based rehabilitation program for tracheoesophageal speakers

    NARCIS (Netherlands)

    Jongmans, P.; Rossum, M.; As-Brooks, C.; Hilgers, F.; Pols, L.; Hilgers, F.J.M.; Pols, L.C.W.; van Rossum, M.; van den Brekel, M.W.M.

    2008-01-01

    Objectives: to develop an evidence-based therapy program aimed at improving tracheoesophageal speech intelligibility. The therapy program is based on particular problems found for TE speakers in a previous study as performed by the authors. Patients/Materials and Methods: 9 male laryngectomized

  11. On the same wavelength: predictable language enhances speaker-listener brain-to-brain synchrony in posterior superior temporal gyrus.

    Science.gov (United States)

    Dikker, Suzanne; Silbert, Lauren J; Hasson, Uri; Zevin, Jason D

    2014-04-30

    Recent research has shown that the degree to which speakers and listeners exhibit similar brain activity patterns during human linguistic interaction is correlated with communicative success. Here, we used an intersubject correlation approach in fMRI to test the hypothesis that a listener's ability to predict a speaker's utterance increases such neural coupling between speakers and listeners. Nine subjects listened to recordings of a speaker describing visual scenes that varied in the degree to which they permitted specific linguistic predictions. In line with our hypothesis, the temporal profile of listeners' brain activity was significantly more synchronous with the speaker's brain activity for highly predictive contexts in left posterior superior temporal gyrus (pSTG), an area previously associated with predictive auditory language processing. In this region, predictability differentially affected the temporal profiles of brain responses in the speaker and listeners respectively, in turn affecting correlated activity between the two: whereas pSTG activation increased with predictability in the speaker, listeners' pSTG activity instead decreased for more predictable sentences. Listeners additionally showed stronger BOLD responses for predictive images before sentence onset, suggesting that highly predictable contexts lead comprehenders to preactivate predicted words.
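
    The speaker-listener coupling measure at the heart of this approach reduces to correlating regional time courses across brains. The sketch below computes, for a single region of interest, the Pearson correlation between a speaker's BOLD time course and each listener's, optionally at a temporal lag; the arrays are random stand-ins and the lag handling is illustrative.

```python
# Speaker-listener neural coupling for one ROI (e.g., pSTG), per condition.
import numpy as np

rng = np.random.default_rng(0)
n_timepoints, n_listeners = 180, 9
speaker_pstg = rng.normal(size=n_timepoints)                    # speaker ROI time course
listeners_pstg = rng.normal(size=(n_listeners, n_timepoints))   # one row per listener

def speaker_listener_coupling(speaker_ts, listener_ts, lag=0):
    """Correlate the speaker's time course with each listener's, at a given lag (in TRs)."""
    if lag > 0:                                     # listener activity follows the speaker
        speaker_ts, listener_ts = speaker_ts[:-lag], listener_ts[:, lag:]
    return np.array([np.corrcoef(speaker_ts, l)[0, 1] for l in listener_ts])

coupling = speaker_listener_coupling(speaker_pstg, listeners_pstg, lag=2)
print(coupling.mean())          # group-level coupling for this condition
```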

  12. Student perceptions of native and non-native speaker language instructors: A comparison of ESL and Spanish

    Directory of Open Access Journals (Sweden)

    Laura Callahan

    2006-12-01

    Full Text Available The question of the native vs. non-native speaker status of second and foreign language instructors has been investigated chiefly from the perspective of the teacher. Anecdotal evidence suggests that students have strong opinions on the relative qualities of instruction by native and non-native speakers. Most research focuses on students of English as a foreign or second language. This paper reports on data gathered through a questionnaire administered to 55 university students: 31 students of Spanish as FL and 24 students of English as SL. Qualitative results show what strengths students believe each type of instructor has, and quantitative results confirm that any gap students may perceive between the abilities of native and non-native instructors is not so wide as one might expect based on popular notions of the issue. ESL students showed a stronger preference for native-speaker instructors overall, and were at variance with the SFL students' ratings of native-speaker instructors' performance on a number of aspects. There was a significant correlation in both groups between having a family member who is a native speaker of the target language and student preference for and self-identification with a native speaker as instructor. (English text

  13. Thermal Stresses Analysis and Optimized TTP Processes to Achieved CNT-Based Diaphragm for Thin Panel Speakers

    Directory of Open Access Journals (Sweden)

    Feng-Min Lai

    2016-01-01

    Full Text Available Industrial companies commonly use powder coating, classing, and thermal transfer printing (TTP) techniques to avoid oxidation of metallic surfaces and to stiffen speaker diaphragms. This study developed a TTP technique to fabricate a carbon nanotube (CNT) stiffened speaker diaphragm for thin panel speakers. The self-developed TTP stiffening technique did not require the high curing temperature that would degrade the mechanical properties of CNTs. In addition to increasing the stiffness of the diaphragm substrate, this technique alleviated middle and high frequency attenuation and smoothed the sound pressure curve of the thin panel speaker. The TTP technique has the advantage of being less harmful to the environment, but it introduces thermal residual stresses and some unstable connections between printed plates. Thus, this study used the numerical analysis software ANSYS to analyze the stress and thermal behavior of the workpiece so that delamination problems at the transfer interface could be avoided. The Taguchi quality engineering method was applied to identify the optimal manufacturing parameters. Finally, the optimal manufacturing parameters were employed to fabricate a CNT-based diaphragm, which was then assembled onto a speaker. The results indicated that the CNT-based diaphragm improved the smoothness of the speaker's sound pressure curve, producing a minimal high frequency dip difference (ΔdB) value.
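
    The record above mentions the Taguchi quality engineering method for selecting optimal manufacturing parameters. As a rough illustration of that step, the sketch below computes "smaller-is-better" signal-to-noise ratios for a small orthogonal array and picks the preferred level of each factor; the factors, levels, and response values are hypothetical placeholders, not the study's actual design.

```python
import numpy as np

# Minimal "smaller-is-better" Taguchi analysis sketch: each row of an L4(2^3)
# orthogonal array sets three hypothetical TTP factors (e.g., transfer
# temperature, pressure, dwell time) to level 0 or 1, and `delta_db` holds the
# measured high-frequency dip difference for each run (placeholder values).
L4 = np.array([[0, 0, 0],
               [0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]])
delta_db = np.array([2.1, 1.6, 1.2, 1.8])

# Smaller-is-better S/N ratio: -10*log10(mean(y^2)); one response per run here.
sn = -10 * np.log10(delta_db ** 2)

# Average S/N per factor level; the level with the higher mean S/N is preferred.
for f in range(L4.shape[1]):
    means = [sn[L4[:, f] == lvl].mean() for lvl in (0, 1)]
    print(f"factor {f}: level-0 S/N {means[0]:.2f} dB, level-1 S/N {means[1]:.2f} dB,"
          f" best level {int(np.argmax(means))}")
```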

  14. The Space-Time Topography of English Speakers

    Science.gov (United States)

    Duman, Steve

    2016-01-01

    English speakers talk and think about Time in terms of physical space. The past is behind us, and the future is in front of us. In this way, we "map" space onto Time. This dissertation addresses the specificity of this physical space, or its topography. Inspired by languages like Yupno (Nunez, et al., 2012) and Bamileke-Dschang (Hyman,…

  15. The Effects of Fluency Enhancing Conditions on Sensorimotor Control of Speech in Typically Fluent Speakers: An EEG Mu Rhythm Study

    Directory of Open Access Journals (Sweden)

    Tiffani Kittilstved

    2018-04-01

    Full Text Available Objective: To determine whether changes in sensorimotor control resulting from speaking conditions that induce fluency in people who stutter (PWS) can be measured using electroencephalographic (EEG) mu rhythms in neurotypical speakers. Methods: Non-stuttering (NS) adults spoke in one control condition (solo speaking) and four experimental conditions (choral speech, delayed auditory feedback (DAF), prolonged speech, and pseudostuttering). Independent component analysis (ICA) was used to identify sensorimotor μ components from EEG recordings. Time-frequency analyses measured μ-alpha (8–13 Hz) and μ-beta (15–25 Hz) event-related synchronization (ERS) and desynchronization (ERD) during each speech condition. Results: 19/24 participants contributed μ components. Relative to the control condition, the choral and DAF conditions elicited increases in μ-alpha ERD in the right hemisphere. In the pseudostuttering condition, increases in μ-beta ERD were observed in the left hemisphere. No differences were present between the prolonged speech and control conditions. Conclusions: Differences observed in the experimental conditions are thought to reflect changes in sensorimotor control. Increases in right hemisphere μ-alpha ERD likely reflect increased reliance on auditory information, including auditory feedback, during the choral and DAF conditions. In the left hemisphere, increases in μ-beta ERD during pseudostuttering may have resulted from the different movement characteristics of this task compared with the solo speaking task. Relationships to findings in stuttering are discussed. Significance: Changes in sensorimotor control related to feedforward and feedback control in fluency-enhancing speech manipulations can be measured using time-frequency decompositions of EEG μ rhythms in neurotypical speakers. This quiet, non-invasive, and temporally sensitive technique may be applied to learn more about normal sensorimotor control and fluency enhancement in PWS.
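
    Event-related desynchronization of the kind measured here is commonly quantified as the percentage change of band-limited power in a task window relative to a baseline window. The following sketch shows one conventional way to compute that quantity (band-pass filter, Hilbert envelope, percentage change); it assumes a single already-extracted μ-component time course and is not the study's actual time-frequency pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_erd(signal, fs, band, baseline, task):
    """Event-related (de)synchronization for one component/channel.

    signal   : 1-D array, a single-trial or trial-averaged time course.
    band     : (low, high) in Hz, e.g. (8, 13) for mu-alpha, (15, 25) for mu-beta.
    baseline, task : (start, end) sample indices delimiting the reference and
                     task windows.  Negative values mean ERD, positive ERS.
    """
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    power = np.abs(hilbert(filtfilt(b, a, signal))) ** 2
    ref = power[baseline[0]:baseline[1]].mean()
    act = power[task[0]:task[1]].mean()
    return 100.0 * (act - ref) / ref

# Hypothetical usage on synthetic data: 2 s baseline + 2 s "speech" at 250 Hz,
# with the 10 Hz component suppressed during the task (so ERD is negative).
fs = 250
t = np.arange(4 * fs) / fs
x = np.sin(2 * np.pi * 10 * t) * np.where(t < 2, 1.0, 0.4)
x += 0.1 * np.random.default_rng(1).standard_normal(t.size)
print(band_erd(x, fs, (8, 13), baseline=(0, 2 * fs), task=(2 * fs, 4 * fs)))
```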

  16. Does dynamic information about the speaker's face contribute to semantic speech processing? ERP evidence.

    Science.gov (United States)

    Hernández-Gutiérrez, David; Abdel Rahman, Rasha; Martín-Loeches, Manuel; Muñoz, Francisco; Schacht, Annekathrin; Sommer, Werner

    2018-07-01

    Face-to-face interactions characterize communication in social contexts. These situations are typically multimodal, requiring the integration of linguistic auditory input with facial information from the speaker. In particular, eye gaze and visual speech provide the listener with social and linguistic information, respectively. Despite the importance of this context for an ecological study of language, research on audiovisual integration has mainly focused on the phonological level, leaving aside effects on semantic comprehension. Here we used event-related potentials (ERPs) to investigate the influence of facial dynamic information on semantic processing of connected speech. Participants were presented with either a video or a still picture of the speaker, concomitant to auditory sentences. Across three experiments, we manipulated the presence or absence of the speaker's dynamic facial features (mouth and eyes) and compared the amplitudes of the semantic N400 elicited by unexpected words. Contrary to our predictions, the N400 was not modulated by dynamic facial information; therefore, semantic processing seems to be unaffected by the speaker's gaze and visual speech. Nevertheless, during the processing of expected words, dynamic faces elicited a long-lasting late posterior positivity compared to the static condition. This effect was significantly reduced when the mouth of the speaker was covered. Our findings may indicate increased attentional processing of richer communicative contexts. The present findings also demonstrate that in natural communicative face-to-face encounters, perceiving the face of a speaker in motion provides supplementary information that is taken into account by the listener, especially when auditory comprehension is non-demanding. Copyright © 2018 Elsevier Ltd. All rights reserved.

  17. Infants Selectively Pay Attention to the Information They Receive from a Native Speaker of Their Language.

    Science.gov (United States)

    Marno, Hanna; Guellai, Bahia; Vidal, Yamil; Franzoi, Julia; Nespor, Marina; Mehler, Jacques

    2016-01-01

    From the first moments of their life, infants show a preference for their native language, as well as toward speakers with whom they share the same language. This preference appears to have broad consequences in various domains later on, supporting group affiliations and collaborative actions in children. Here, we propose that infants' preference for native speakers of their language also serves a further purpose, specifically allowing them to efficiently acquire culture-specific knowledge via social learning. By selectively attending to informants who are native speakers of their language and who probably also share the same cultural background with the infant, young learners can maximize the possibility to acquire cultural knowledge. To test whether infants would preferentially attend to the information they receive from a speaker of their native language, we familiarized 12-month-old infants with a native and a foreign speaker, and then presented them with movies where each of the speakers silently gazed toward unfamiliar objects. At test, infants' looking behavior to the two objects alone was measured. Results revealed that infants preferred to look longer at the object presented by the native speaker. Strikingly, the effect was replicated also with 5-month-old infants, indicating an early development of such preference. These findings provide evidence that young infants pay more attention to the information presented by a person with whom they share the same language. This selectivity can serve as a basis for efficient social learning by influencing how infants allocate attention between potential sources of information in their environment.

  18. Towards PLDA-RBM based speaker recognition in mobile environment: Designing stacked/deep PLDA-RBM systems

    DEFF Research Database (Denmark)

    Nautsch, Andreas; Hao, Hong; Stafylakis, Themos

    2016-01-01

    … recognition: two deep architectures are presented and examined, which aim at suppressing channel effects and recovering speaker-discriminative information on back-ends trained on a small dataset. Experiments are carried out on the MOBIO SRE'13 database, which is a challenging and publicly available dataset … for mobile speaker recognition with limited amounts of training data. The experiments show that the proposed system outperforms the baseline i-vector/PLDA approach by relative gains of 31% on female and 9% on male speakers in terms of half total error rate.
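
    The half total error rate (HTER) reported above is the average of the false-acceptance and false-rejection rates at a fixed decision threshold. A minimal sketch of that metric, using placeholder scores rather than MOBIO data, might look as follows.

```python
import numpy as np

def hter(genuine_scores, impostor_scores, threshold):
    """Half total error rate at a given decision threshold.

    Scores above the threshold are accepted.  HTER is the mean of the
    false-rejection rate (genuine trials rejected) and the false-acceptance
    rate (impostor trials accepted).
    """
    frr = np.mean(np.asarray(genuine_scores) < threshold)
    far = np.mean(np.asarray(impostor_scores) >= threshold)
    return 0.5 * (far + frr)

# Placeholder scores standing in for PLDA log-likelihood ratios.
genuine = np.array([2.3, 1.1, 0.4, 3.0, 1.8])
impostor = np.array([-1.2, 0.1, -0.6, -2.0, 0.5])
print(hter(genuine, impostor, threshold=0.3))
```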

  19. Neural bases of congenital amusia in tonal language speakers.

    Science.gov (United States)

    Zhang, Caicai; Peng, Gang; Shao, Jing; Wang, William S-Y

    2017-03-01

    Congenital amusia is a lifelong neurodevelopmental disorder of fine-grained pitch processing. In this fMRI study, we examined the neural bases of congenital amusia in speakers of a tonal language, Cantonese. Previous studies on non-tonal language speakers suggest that the neural deficits of congenital amusia lie in the music-selective neural circuitry in the right inferior frontal gyrus (IFG). However, it is unclear whether this finding can generalize to congenital amusics in tonal languages. Tonal language experience has been reported to shape the neural processing of pitch, which raises the question of how tonal language experience affects the neural bases of congenital amusia. To investigate this question, we examined the neural circuitries sub-serving the processing of relative pitch interval in pitch-matched Cantonese level tone and musical stimuli in 11 Cantonese-speaking amusics and 11 musically intact controls. Cantonese-speaking amusics exhibited abnormal brain activities in a widely distributed neural network during the processing of lexical tone and musical stimuli. Whereas the controls exhibited significant activation in the right superior temporal gyrus (STG) in the lexical tone condition and in the cerebellum regardless of the lexical tone and music conditions, no activation was found in the amusics in those regions, which likely reflects a dysfunctional neural mechanism of relative pitch processing in the amusics. Furthermore, the amusics showed abnormally strong activation of the right middle frontal gyrus and precuneus when the pitch stimuli were repeated, which presumably reflects deficits of attending to repeated pitch stimuli or encoding them into working memory. No significant group difference was found in the right IFG in either the whole-brain analysis or region-of-interest analysis. These findings imply that the neural deficits in tonal language speakers might differ from those in non-tonal language speakers, and overlap partly with the

  20. Time-Contrastive Learning Based DNN Bottleneck Features for Text-Dependent Speaker Verification

    DEFF Research Database (Denmark)

    Sarkar, Achintya Kumar; Tan, Zheng-Hua

    2017-01-01

    In this paper, we present a time-contrastive learning (TCL) based bottleneck (BN) feature extraction method for speech signals with an application to text-dependent (TD) speaker verification (SV). It is well-known that speech signals exhibit quasi-stationary behavior in and only in a short interval......, and the TCL method aims to exploit this temporal structure. More specifically, it trains deep neural networks (DNNs) to discriminate temporal events obtained by uniformly segmenting speech signals, in contrast to existing DNN based BN feature extraction methods that train DNNs using labeled data...... to discriminate speakers or pass-phrases or phones or a combination of them. In the context of speaker verification, speech data of fixed pass-phrases are used for TCL-BN training, while the pass-phrases used for TCL-BN training are excluded from being used for SV, so that the learned features can be considered...
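
    The central idea of time-contrastive learning as described here is that class labels come from uniform temporal segmentation of each utterance rather than from speaker or phone annotations. The sketch below illustrates only that label-generation step under assumed parameters (frame counts and number of classes are hypothetical); the DNN training itself is not shown.

```python
import numpy as np

def tcl_labels(num_frames, num_classes):
    """Assign each frame of an utterance to one of `num_classes` temporal
    events by uniform segmentation, as a stand-in for the TCL labeling step.
    Returns an integer label per frame (0 .. num_classes-1)."""
    edges = np.linspace(0, num_frames, num_classes + 1)
    return np.searchsorted(edges, np.arange(num_frames), side="right") - 1

# Example: a 97-frame utterance divided into 10 temporal classes.
labels = tcl_labels(97, 10)
print(labels[:12], labels[-12:])
# A DNN would then be trained to predict these labels from per-frame acoustic
# features (e.g., MFCCs); activations of a narrow hidden layer would serve as
# the bottleneck features for the speaker-verification back-end.
```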

  1. Orthography-Induced Length Contrasts in the Second Language Phonological Systems of L2 Speakers of English: Evidence from Minimal Pairs.

    Science.gov (United States)

    Bassetti, Bene; Sokolović-Perović, Mirjana; Mairano, Paolo; Cerni, Tania

    2018-06-01

    Research shows that the orthographic forms ("spellings") of second language (L2) words affect speech production in L2 speakers. This study investigated whether English orthographic forms lead L2 speakers to produce English homophonic word pairs as phonological minimal pairs. Targets were 33 orthographic minimal pairs, that is to say homophonic words that would be pronounced as phonological minimal pairs if orthography affects pronunciation. Word pairs contained the same target sound spelled with one letter or two, such as the /n/ in finish and Finnish (both /'fɪnɪʃ/ in Standard British English). To test for effects of length and type of L2 exposure, we compared Italian instructed learners of English, Italian-English late bilinguals with lengthy naturalistic exposure, and English natives. A reading-aloud task revealed that Italian speakers of English L2 produce two English homophonic words as a minimal pair distinguished by different consonant or vowel length, for instance producing the target /'fɪnɪʃ/ with a short [n] or a long [nː] to reflect the number of consonant letters in the spelling of the words finish and Finnish. Similar effects were found on the pronunciation of vowels, for instance in the orthographic pair scene-seen (both /siːn/). Naturalistic exposure did not reduce orthographic effects, as effects were found both in learners and in late bilinguals living in an English-speaking environment. It appears that the orthographic form of L2 words can result in the establishment of a phonological contrast that does not exist in the target language. Results have implications for models of L2 phonological development.

  2. Switches to English during French Service Encounters: Relationships with L2 French Speakers' Willingness to Communicate and Motivation

    Science.gov (United States)

    McNaughton, Stephanie; McDonough, Kim

    2015-01-01

    This exploratory study investigated second language (L2) French speakers' service encounters in the multilingual setting of Montreal, specifically whether switches to English during French service encounters were related to L2 speakers' willingness to communicate or motivation. Over a two-week period, 17 French L2 speakers in Montreal submitted…

  3. Willing Learners yet Unwilling Speakers in ESL Classrooms

    Directory of Open Access Journals (Sweden)

    Zuraidah Ali

    2007-12-01

    Full Text Available To some of us, speech production in ESL has become so natural and integral that we seem to take it for granted. We often do not even remember how we struggled through the initial process of mastering English. Unfortunately, students who are still learning English seem to face myriad problems that make them appear unwilling or reluctant ESL speakers. This study investigates this phenomenon, which is very common in the ESL classroom. Against the background of related research findings on this matter, a qualitative study was conducted among foreign students enrolled in the Intensive English Programme (IEP) at the Institute of Liberal Studies (IKAL), University Tenaga Nasional (UNITEN). The results show and discuss the extent of truth behind this perplexing phenomenon: willing learners, yet unwilling speakers of ESL, in our effort to provide supportive learning cultures in second language acquisition (SLA) for this group of students.

  4. English exposed common mistakes made by Chinese speakers

    CERN Document Server

    Hart, Steve

    2017-01-01

    Having analysed the most common English errors made in over 600 academic papers written by Chinese undergraduates, postgraduates, and researchers, Steve Hart has written an essential, practical guide specifically for the native Chinese speaker on how to write good academic English. English Exposed: Common Mistakes Made by Chinese Speakers is divided into three main sections. The first section examines errors made with verbs, nouns, prepositions, and other grammatical classes of words. The second section focuses on problems of word choice. In addition to helping the reader find the right word, it provides instruction for selecting the right style too. The third section covers a variety of other areas essential for the academic writer, such as using punctuation, adding appropriate references, referring to tables and figures, and selecting among various English date and time phrases. Using English Exposed will allow a writer to produce material where content and ideas, not language mistakes, speak the loudest.

  5. Neural decoding of attentional selection in multi-speaker environments without access to clean sources

    Science.gov (United States)

    O'Sullivan, James; Chen, Zhuo; Herrero, Jose; McKhann, Guy M.; Sheth, Sameer A.; Mehta, Ashesh D.; Mesgarani, Nima

    2017-10-01

    Objective. People who suffer from hearing impairments can find it difficult to follow a conversation in a multi-speaker environment. Current hearing aids can suppress background noise; however, there is little that can be done to help a user attend to a single conversation amongst many without knowing which speaker the user is attending to. Cognitively controlled hearing aids that use auditory attention decoding (AAD) methods are the next step in offering help. Translating the successes in AAD research to real-world applications poses a number of challenges, including the lack of access to the clean sound sources in the environment with which to compare with the neural signals. We propose a novel framework that combines single-channel speech separation algorithms with AAD. Approach. We present an end-to-end system that (1) receives a single audio channel containing a mixture of speakers that is heard by a listener along with the listener’s neural signals, (2) automatically separates the individual speakers in the mixture, (3) determines the attended speaker, and (4) amplifies the attended speaker’s voice to assist the listener. Main results. Using invasive electrophysiology recordings, we identified the regions of the auditory cortex that contribute to AAD. Given appropriate electrode locations, our system is able to decode the attention of subjects and amplify the attended speaker using only the mixed audio. Our quality assessment of the modified audio demonstrates a significant improvement in both subjective and objective speech quality measures. Significance. Our novel framework for AAD bridges the gap between the most recent advancements in speech processing technologies and speech prosthesis research and moves us closer to the development of cognitively controlled hearable devices for the hearing impaired.
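
    Once the individual speakers have been separated, the attention-decoding step typically reduces to comparing an envelope reconstructed from the neural recordings with the envelope of each separated source and selecting the best match. The sketch below illustrates that comparison with a simple broadband Hilbert envelope and Pearson correlation; the reconstruction, the synthetic sources, and all variable names are placeholders rather than the authors' actual decoder.

```python
import numpy as np
from scipy.signal import hilbert

def envelope(x):
    """Broadband amplitude envelope of a speech waveform."""
    return np.abs(hilbert(x))

def decode_attention(reconstructed_env, separated_sources):
    """Pick the attended speaker: correlate the envelope reconstructed from
    neural data with the envelope of each automatically separated source and
    return the index of the best-matching source plus all correlations."""
    corrs = [np.corrcoef(reconstructed_env, envelope(src))[0, 1]
             for src in separated_sources]
    return int(np.argmax(corrs)), corrs

# Toy usage with synthetic "sources"; in the framework described above the
# sources would come from a single-channel separation network and the
# reconstruction from a decoder trained on auditory-cortex recordings.
rng = np.random.default_rng(2)
src_a, src_b = rng.standard_normal(8000), rng.standard_normal(8000)
recon = envelope(src_a) + 0.3 * rng.standard_normal(8000)  # listener attends A
print(decode_attention(recon, [src_a, src_b]))
```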

  6. Insight into the Attitudes of Speakers of Urban Meccan Hijazi Arabic towards their Dialect

    Directory of Open Access Journals (Sweden)

    Sameeha D. Alahmadi

    2016-04-01

    Full Text Available The current study mainly aims to examine the attitudes of speakers of Urban Meccan Hijazi Arabic (UMHA) towards their dialect, which is spoken in Mecca, Saudi Arabia. It also investigates whether the participants' age, sex and educational level have any impact on their perception of their dialect. To this end, I designed a 5-point-Likert-scale questionnaire, requiring participants to rate their attitudes towards their dialect. I asked 80 participants, whose first language is UMHA, to fill out the questionnaire. On the basis of the three independent variables, namely, age, sex and educational level, the participants were divided into three groups: old and young speakers, male and female speakers and educated and uneducated speakers. The results reveal that in general, all the groups (young and old, male and female, and educated and uneducated participants) have a sense of responsibility towards their dialect, making their attitudes towards their dialect positive. However, differences exist between the three groups. For instance, old speakers tend to express their pride of their dialect more than young speakers. The same pattern is observed in male and female groups. The results show that females may feel embarrassed to provide answers that may imply that they are not proud of their own dialect, since the majority of women in the Arab world, in general, are under more pressure to conform to the overt norms of the society than males. Therefore, I argue that most Arab women may not have the same freedom to express their opinions and feelings about various issues. Based on the results, the study concludes with some recommendations for further research.  Keywords: sociolinguistics, language attitudes, dialectology, social variables, Urban Meccan Hijazi Arabic

  7. Accuracy of MFCC-Based Speaker Recognition in Series 60 Device

    Directory of Open Access Journals (Sweden)

    Pasi Fränti

    2005-10-01

    Full Text Available A fixed-point implementation of speaker recognition based on MFCC signal processing is considered. We analyze the numerical error of the MFCC and its effect on recognition accuracy. Techniques to reduce the information loss in a converted fixed-point implementation are introduced. We increase the signal processing accuracy by adjusting the ratio of the representation accuracy of the operators and the signal. The signal processing error is found to be more important to speaker recognition accuracy than the error in the classification algorithm. The results are verified by applying the alternative technique to speech data. We also discuss the specific programming requirements imposed by Symbian and the Series 60 platform.
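
    For readers unfamiliar with the MFCC front end discussed above, the sketch below spells out the usual stages (windowing, magnitude spectrum, mel filterbank, log, DCT) and shows one crude way to probe the effect of fixed-point rounding by quantizing the log filterbank energies. It is a floating-point illustration of the idea only; the filterbank size, bit allocation, and test frame are arbitrary and unrelated to the Series 60 implementation described in the record.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular mel filters mapping an FFT magnitude spectrum to mel bands."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def quantize(x, frac_bits):
    """Round to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale

def mfcc_frame(frame, fs, n_filters=26, n_ceps=12, frac_bits=None):
    """MFCCs for one windowed frame; optionally quantize the log filterbank
    energies to mimic one error source of a fixed-point implementation."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    energies = mel_filterbank(n_filters, len(frame), fs) @ spec
    log_e = np.log(energies + 1e-8)
    if frac_bits is not None:
        log_e = quantize(log_e, frac_bits)
    return dct(log_e, type=2, norm="ortho")[:n_ceps]

# Compare floating-point and coarsely quantized coefficients on a toy frame.
fs, frame = 8000, np.random.default_rng(3).standard_normal(256)
err = mfcc_frame(frame, fs) - mfcc_frame(frame, fs, frac_bits=6)
print(np.max(np.abs(err)))
```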

  8. Lip-Synching Using Speaker-Specific Articulation, Shape and Appearance Models

    Directory of Open Access Journals (Sweden)

    Gaspard Breton

    2009-01-01

    Full Text Available We describe here the control, shape and appearance models that are built using an original photogrammetric method to capture characteristics of speaker-specific facial articulation, anatomy, and texture. Two original contributions are put forward here: the trainable trajectory formation model that predicts articulatory trajectories of a talking face from phonetic input and the texture model that computes a texture for each 3D facial shape according to articulation. Using motion capture data from different speakers and module-specific evaluation procedures, we show here that this cloning system restores detailed idiosyncrasies and the global coherence of visible articulation. Results of a subjective evaluation of the global system with competing trajectory formation models are further presented and commented.

  9. Umesh V Waghmare | Speakers | Indian Academy of Sciences

    Indian Academy of Sciences (India)

    Umesh V Waghmare. Theoretical Sciences Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur P.O., Bangalore 560 064, ... These ideas apply quite well to the dynamical structure of a crystal, as described by the dispersion of its phonons or vibrational waves. The speaker's group has shown an interesting ...

  10. A general auditory bias for handling speaker variability in speech? Evidence in humans and songbirds

    NARCIS (Netherlands)

    Kriengwatana, B.; Escudero, P.; Kerkhoven, A.H.; ten Cate, C.

    2015-01-01

    Different speakers produce the same speech sound differently, yet listeners are still able to reliably identify the speech sound. How listeners can adjust their perception to compensate for speaker differences in speech, and whether these compensatory processes are unique only to humans, is still

  11. Age differences in vocal emotion perception: on the role of speaker age and listener sex.

    Science.gov (United States)

    Sen, Antarika; Isaacowitz, Derek; Schirmer, Annett

    2017-10-24

    Older adults have greater difficulty than younger adults perceiving vocal emotions. To better characterise this effect, we explored its relation to age differences in sensory, cognitive and emotional functioning. Additionally, we examined the role of speaker age and listener sex. Participants (N = 163) aged 19-34 years and 60-85 years categorised neutral sentences spoken by ten younger and ten older speakers with a happy, neutral, sad, or angry voice. Acoustic analyses indicated that expressions from younger and older speakers denoted the intended emotion with similar accuracy. As expected, younger participants outperformed older participants and this effect was statistically mediated by an age-related decline in both optimism and working-memory. Additionally, age differences in emotion perception were larger for younger as compared to older speakers and a better perception of younger as compared to older speakers was greater in younger as compared to older participants. Last, a female perception benefit was less pervasive in the older than the younger group. Together, these findings suggest that the role of age for emotion perception is multi-faceted. It is linked to emotional and cognitive change, to processing biases that benefit young and own-age expressions, and to the different aptitudes of women and men.

  12. What makes a charismatic speaker?

    DEFF Research Database (Denmark)

    Niebuhr, Oliver; Voße, Jana; Brem, Alexander

    2016-01-01

    The former Apple CEO Steve Jobs was one of the most charismatic speakers of the past decades. However, there is, as yet, no detailed quantitative profile of his way of speaking. We used state-of-the-art computer techniques to acoustically analyze his speech behavior and relate it to reference...... samples. Our paper provides the first-ever acoustic profile of Steve Jobs, based on about 4000 syllables and 12,000 individual speech sounds from his two most outstanding and well-known product presentations: the introductions of the iPhone 4 and the iPad 2. Our results show that Steve Jobs stands out...

  13. Processing ser and estar to locate objects and events: An ERP study with L2 speakers of Spanish.

    Science.gov (United States)

    Dussias, Paola E; Contemori, Carla; Román, Patricia

    2014-01-01

    In Spanish locative constructions, a different form of the copula is selected in relation to the semantic properties of the grammatical subject: sentences that locate objects require estar while those that locate events require ser (both translated in English as 'to be'). In an ERP study, we examined whether second language (L2) speakers of Spanish are sensitive to the selectional restrictions that the different types of subjects impose on the choice of the two copulas. Twenty-four native speakers of Spanish and two groups of L2 Spanish speakers (24 beginners and 18 advanced speakers) were recruited to investigate the processing of 'object/event + estar/ser' permutations. Participants provided grammaticality judgments on correct (object + estar; event + ser) and incorrect (object + ser; event + estar) sentences while their brain activity was recorded. In line with previous studies (Leone-Fernández, Molinaro, Carreiras, & Barber, 2012; Sera, Gathje, & Pintado, 1999), the results of the grammaticality judgment for the native speakers showed that participants correctly accepted object + estar and event + ser constructions. In addition, while 'object + ser' constructions were considered grossly ungrammatical, 'event + estar' combinations were perceived as unacceptable to a lesser degree. For these same participants, ERP recording time-locked to the onset of the critical word 'en' showed a larger P600 for the ser predicates when the subject was an object than when it was an event (*La silla es en la cocina vs. La fiesta es en la cocina). This P600 effect is consistent with syntactic repair of the defining predicate when it does not fit with the adequate semantic properties of the subject. For estar predicates (La silla está en la cocina vs. *La fiesta está en la cocina), the findings showed a central-frontal negativity between 500-700 ms. Grammaticality judgment data for the L2 speakers of Spanish showed that beginners were significantly less accurate than

  14. The Effect of Noise on Relationships Between Speech Intelligibility and Self-Reported Communication Measures in Tracheoesophageal Speakers.

    Science.gov (United States)

    Eadie, Tanya L; Otero, Devon Sawin; Bolt, Susan; Kapsner-Smith, Mara; Sullivan, Jessica R

    2016-08-01

    The purpose of this study was to examine how sentence intelligibility relates to self-reported communication in tracheoesophageal speakers when speech intelligibility is measured in quiet and noise. Twenty-four tracheoesophageal speakers who were at least 1 year postlaryngectomy provided audio recordings of 5 sentences from the Sentence Intelligibility Test. Speakers also completed self-reported measures of communication-the Voice Handicap Index-10 and the Communicative Participation Item Bank short form. Speech recordings were presented to 2 groups of inexperienced listeners who heard sentences in quiet or noise. Listeners transcribed the sentences to yield speech intelligibility scores. Very weak relationships were found between intelligibility in quiet and measures of voice handicap and communicative participation. Slightly stronger, but still weak and nonsignificant, relationships were observed between measures of intelligibility in noise and both self-reported measures. However, 12 speakers who were more than 65% intelligible in noise showed strong and statistically significant relationships with both self-reported measures (R2 = .76-.79). Speech intelligibility in quiet is a weak predictor of self-reported communication measures in tracheoesophageal speakers. Speech intelligibility in noise may be a better metric of self-reported communicative function for speakers who demonstrate higher speech intelligibility in noise.

  15. Native Speakers' Perception of Non-Native English Speech

    Science.gov (United States)

    Jaber, Maysa; Hussein, Riyad F.

    2011-01-01

    This study is aimed at investigating the rating and intelligibility of different non-native varieties of English, namely French English, Japanese English and Jordanian English by native English speakers and their attitudes towards these foreign accents. To achieve the goals of this study, the researchers used a web-based questionnaire which…

  16. Behavioral and subcortical signatures of musical expertise in Mandarin Chinese speakers.

    Directory of Open Access Journals (Sweden)

    Caitlin Dawson

    Full Text Available Both musical training and native language have been shown to have experience-based plastic effects on auditory processing. However, the combined effects within individuals are unclear. Recent research suggests that musical training and tone language speaking are not clearly additive in their effects on processing of auditory features and that there may be a disconnect between perceptual and neural signatures of auditory feature processing. The literature has only recently begun to investigate the effects of musical expertise on basic auditory processing for different linguistic groups. This work provides a profile of primary auditory feature discrimination for Mandarin-speaking musicians and nonmusicians. The musicians showed enhanced perceptual discrimination for both frequency and duration as well as enhanced duration discrimination in a multifeature discrimination task, compared to nonmusicians. However, there were no differences between the groups in duration processing of nonspeech sounds at a subcortical level or in subcortical frequency representation of a nonnative tone contour, for f0 or for the first or second formant region.

  17. THE TASK TYPE EFFECT ON THE USE OF COMMUNICATION STRATEGIES

    Directory of Open Access Journals (Sweden)

    Elvir Shtavica

    2018-03-01

    Full Text Available The observation that many foreign language students encounter oral communication problems while trying to express their meaning to their partners has encouraged a number of eminent scholars to analyze the use of communication strategies in relation to the type of task activity and the level of proficiency. In this paper, the effects of task type and students' proficiency levels on the communication strategies employed by Kosovan and Bosnian speakers of English were investigated. The purpose of the study was to determine whether the students' proficiency levels and the task type influenced the choice and the number of communication strategies at the lexical level in verbal communication. The study included 20 participants in total: Kosovan and Bosnian speakers who use English as a foreign language. The subjects were selected according to their degree of proficiency (i.e., Elementary and Intermediate levels) and were asked to carry out three different types of tasks: ten minutes of oral communication, picture story narration and photographic description. The data from the assigned tasks came from audio and video recordings. The study used the taxonomy of communication strategies proposed by Tarone (1977). The communication strategies used by students at both levels were observed and compared in specific instances. It was concluded that the task type and the level of proficiency influenced the number and the choice of communication strategies in verbal performance. To account for these findings, two main aspects of the nature of the given tasks were pointed out: task context and task demands, respectively.

  18. Facial Expression Generation from Speaker's Emotional States in Daily Conversation

    Science.gov (United States)

    Mori, Hiroki; Ohshima, Koh

    A framework for generating facial expressions from emotional states in daily conversation is described. It provides a mapping between emotional states and facial expressions, where the former is represented by vectors with psychologically-defined abstract dimensions, and the latter is coded by the Facial Action Coding System. In order to obtain the mapping, parallel data with rated emotional states and facial expressions were collected for utterances of a female speaker, and a neural network was trained with the data. The effectiveness of the proposed method is verified by a subjective evaluation test. As a result, the Mean Opinion Score with respect to the suitability of the generated facial expressions was 3.86 for the speaker, which was close to that of hand-made facial expressions.
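
    The mapping described above, from an abstract emotional-state vector to Facial Action Coding System intensities, can be pictured as a small regression network. The sketch below trains a one-hidden-layer network with plain NumPy on made-up pairs of (pleasantness, arousal) ratings and Action Unit intensities; the dimensions, targets, and training scheme are illustrative assumptions, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical training pairs: 2-D emotional states (e.g., pleasantness and
# arousal in [-1, 1]) mapped to intensities of five hypothetical Action Units.
X = rng.uniform(-1, 1, size=(200, 2))            # rated emotional states
W_true = rng.uniform(-0.5, 0.5, size=(2, 5))     # unknown "ground truth" mapping
Y = np.clip(np.tanh(X @ W_true) + 0.05 * rng.standard_normal((200, 5)), -1, 1)

# One-hidden-layer network trained with plain gradient descent on MSE.
H, lr = 16, 0.05
W1, b1 = rng.standard_normal((2, H)) * 0.3, np.zeros(H)
W2, b2 = rng.standard_normal((H, 5)) * 0.3, np.zeros(5)
for epoch in range(2000):
    h = np.tanh(X @ W1 + b1)          # hidden activations
    pred = h @ W2 + b2                # predicted AU intensities
    err = pred - Y
    # Backpropagation of the mean-squared error.
    gW2 = h.T @ err / len(X);  gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = X.T @ dh / len(X);   gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

# Map a new emotional state (mildly pleasant, highly aroused) to AU intensities.
state = np.array([[0.4, 0.9]])
print(np.tanh(state @ W1 + b1) @ W2 + b2)
```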

  19. White Native English Speakers Needed: The Rhetorical Construction of Privilege in Online Teacher Recruitment Spaces

    Science.gov (United States)

    Ruecker, Todd; Ives, Lindsey

    2015-01-01

    Over the past few decades, scholars have paid increasing attention to the role of native speakerism in the field of TESOL. Several recent studies have exposed instances of native speakerism in TESOL recruitment discourses published through a variety of media, but none have focused specifically on professional websites advertising programs in…

  20. Extending Situated Language Comprehension (Accounts) with Speaker and Comprehender Characteristics: Toward Socially Situated Interpretation.

    Science.gov (United States)

    Münster, Katja; Knoeferle, Pia

    2017-01-01

    More and more findings suggest a tight temporal coupling between (non-linguistic) socially interpreted context and language processing. Still, real-time language processing accounts remain largely elusive with respect to the influence of biological (e.g., age) and experiential (e.g., world and moral knowledge) comprehender characteristics and the influence of the 'socially interpreted' context, as for instance provided by the speaker. This context could include actions, facial expressions, a speaker's voice or gaze, and gestures among others. We review findings from social psychology, sociolinguistics and psycholinguistics to highlight the relevance of (the interplay between) the socially interpreted context and comprehender characteristics for language processing. The review informs the extension of an extant real-time processing account (already featuring a coordinated interplay between language comprehension and the non-linguistic visual context) with a variable ('ProCom') that captures characteristics of the language user and with a first approximation of the comprehender's speaker representation. Extending the CIA to the sCIA (social Coordinated Interplay Account) is the first step toward a real-time language comprehension account which might eventually accommodate the socially situated communicative interplay between comprehenders and speakers.

  1. Language complexity modulates 8- and 10-year-olds' success at using their theory of mind abilities in a communication task.

    Science.gov (United States)

    Wang, J Jessica; Ali, Muna; Frisson, Steven; Apperly, Ian A

    2016-09-01

    Basic competence in theory of mind is acquired during early childhood. Nonetheless, evidence suggests that the ability to take others' perspectives in communication improves continuously from middle childhood to the late teenage years. This indicates that theory of mind performance undergoes protracted developmental changes after the acquisition of basic competence. Currently, little is known about the factors that constrain children's performance or that contribute to age-related improvement. A sample of 39 8-year-olds and 56 10-year-olds were tested on a communication task in which a speaker's limited perspective needed to be taken into account and the complexity of the speaker's utterance varied. Our findings showed that 10-year-olds were generally less egocentric than 8-year-olds. Children of both ages committed more egocentric errors when a speaker uttered complex sentences compared with simple sentences. Both 8- and 10-year-olds were affected by the demand to integrate complex sentences with the speaker's limited perspective and to a similar degree. These results suggest that long after children's development of simple visual perspective-taking, their use of this ability to assist communication is substantially constrained by the complexity of the language involved. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Omission of definite and indefinite articles in the spontaneous speech of agrammatic speakers with Broca's aphasia

    NARCIS (Netherlands)

    Havik, E.; Bastiaanse, Y.R.M.

    2004-01-01

    Background: Cross-linguistic investigation of agrammatic speech in speakers of different languages allows us to tests theoretical accounts of the nature of agrammatism. A significant feature of the speech of many agrammatic speakers is a problem with article production. Mansson and Ahlsen (2001)

  3. Flexible spatial perspective-taking: Conversational partners weigh multiple cues in collaborative tasks

    Directory of Open Access Journals (Sweden)

    Alexia eGalati

    2013-09-01

    Full Text Available Research on spatial perspective-taking often focuses on the cognitive processes of isolated individuals as they adopt or maintain imagined perspectives. Collaborative studies of spatial perspective-taking typically examine speakers' linguistic choices, while overlooking their underlying processes and representations. We review evidence from two collaborative experiments that examine the contribution of social and representational cues to spatial perspective choices in both language and the organization of spatial memory. Across experiments, speakers organized their memory representations according to the convergence of various cues. When layouts were randomly configured and did not afford intrinsic cues, speakers encoded their partner's viewpoint in memory, if available, but did not use it as an organizing direction. On the other hand, when the layout afforded an intrinsic structure, speakers organized their spatial memories according to the person-centered perspective reinforced by the layout's structure. Similarly, in descriptions, speakers considered multiple cues whether available a priori or at the interaction. They used partner-centered expressions more frequently (e.g., to your right) when the partner's viewpoint was misaligned by a small offset or coincided with the layout's structure. Conversely, they used egocentric expressions more frequently when their own viewpoint coincided with the intrinsic structure or when the partner was misaligned by a computationally difficult, oblique offset. Based on these findings we advocate for a framework for flexible perspective-taking: people weigh multiple cues (including social ones) to make attributions about the relative difficulty of perspective-taking for each partner, and adapt behavior to minimize their collective effort. This framework is not specialized for spatial reasoning but instead emerges from the same principles and memory-dependent processes that govern perspective-taking in

  4. Left hemisphere lateralization for lexical and acoustic pitch processing in Cantonese speakers as revealed by mismatch negativity.

    Science.gov (United States)

    Gu, Feng; Zhang, Caicai; Hu, Axu; Zhao, Guoping

    2013-12-01

    For nontonal language speakers, speech processing is lateralized to the left hemisphere and musical processing is lateralized to the right hemisphere (i.e., function-dependent brain asymmetry). On the other hand, acoustic temporal processing is lateralized to the left hemisphere and spectral/pitch processing is lateralized to the right hemisphere (i.e., acoustic-dependent brain asymmetry). In this study, we examine whether the hemispheric lateralization of lexical pitch and acoustic pitch processing in tonal language speakers is consistent with the patterns of function- and acoustic-dependent brain asymmetry in nontonal language speakers. Pitch contrast in both speech stimuli (syllable /ji/ in Experiment 1) and nonspeech stimuli (harmonic tone in Experiment 1; pure tone in Experiment 2) was presented to native Cantonese speakers in passive oddball paradigms. We found that the mismatch negativity (MMN) elicited by lexical pitch contrast was lateralized to the left hemisphere, which is consistent with the pattern of function-dependent brain asymmetry (i.e., left hemisphere lateralization for speech processing) in nontonal language speakers. However, the MMN elicited by acoustic pitch contrast was also left hemisphere lateralized (harmonic tone in Experiment 1) or showed a tendency for left hemisphere lateralization (pure tone in Experiment 2), which is inconsistent with the pattern of acoustic-dependent brain asymmetry (i.e., right hemisphere lateralization for acoustic pitch processing) in nontonal language speakers. The consistent pattern of function-dependent brain asymmetry and the inconsistent pattern of acoustic-dependent brain asymmetry between tonal and nontonal language speakers can be explained by the hypothesis that the acoustic-dependent brain asymmetry is the consequence of a carryover effect from function-dependent brain asymmetry. Potential evolutionary implication of this hypothesis is discussed. © 2013.

  5. Credibility of native and non-native speakers of English revisited: Do non-native listeners feel the same?

    OpenAIRE

    Hanzlíková, Dagmar; Skarnitzl, Radek

    2017-01-01

    This study reports on research stimulated by Lev-Ari and Keysar (2010) who showed that native listeners find statements delivered by foreign-accented speakers to be less true than those read by native speakers. Our objective was to replicate the study with non-native listeners to see whether this effect is also relevant in international communication contexts. The same set of statements from the original study was recorded by 6 native and 6 nonnative speakers of English. 121 non-native listen...

  6. Disrupted behaviour in grammatical morphology in French speakers with autism spectrum disorders.

    Science.gov (United States)

    Le Normand, Marie-Thérèse; Blanc, Romuald; Caldani, Simona; Bonnet-Brilhault, Frédérique

    2018-01-18

    Mixed and inconsistent findings have been reported across languages concerning grammatical morphology in speakers with Autism Spectrum Disorders (ASD). Some researchers argue for a selective sparing of grammar whereas others claim to have identified grammatical deficits. The present study aimed to investigate this issue in 26 participants with ASD speaking European French who were matched on age, gender and SES to 26 participants with typical development (TD). The groups were compared regarding their productivity and accuracy of syntactic and agreement categories using the French MOR part-of-speech tagger available from the CHILDES. The groups significantly differed in productivity with respect to nouns, adjectives, determiners, prepositions and gender markers. Error analysis revealed that ASD speakers exhibited a disrupted behaviour in grammatical morphology. They made gender, tense and preposition errors and they omitted determiners and pronouns in nominal and verbal contexts. ASD speakers may have a reduced sensitivity to perceiving and processing the distributional structure of syntactic categories when producing grammatical morphemes and agreement categories. The theoretical and cross-linguistic implications of these findings are discussed.

  7. Improving Speaker Recognition by Biometric Voice Deconstruction

    Directory of Open Access Journals (Sweden)

    Luis Miguel eMazaira-Fernández

    2015-09-01

    Full Text Available Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcibly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. This paper presents a new methodology for characterizing speakers, which benefits from the advances achieved in recent years in understanding and modelling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with a new set of biometric parameters extracted from the components resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and of the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on a database recorded over a mobile phone network under non-controlled acoustic conditions.
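
    The "deconstruction of the voice into its glottal source and vocal tract estimates" mentioned above is often approximated, in its simplest textbook form, by linear-prediction inverse filtering: an all-pole fit stands in for the vocal tract and the filtering residual for the glottal source. The sketch below shows that basic approximation on a synthetic voiced frame; it is not the parameterization used in the paper, and the model order and signal are assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order):
    """All-pole (LPC) coefficients via the autocorrelation method.
    Returns a such that the vocal-tract filter is 1 / (1 - sum a_k z^-k)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return solve_toeplitz(r[:order], r[1:order + 1])

def glottal_residual(frame, order=12):
    """Inverse-filter a voiced frame with its LPC fit; the residual is a crude
    glottal-source estimate and the LPC polynomial a crude vocal-tract estimate."""
    a = lpc(frame, order)
    residual = lfilter(np.concatenate(([1.0], -a)), [1.0], frame)
    return a, residual

# Toy usage: a synthetic "voiced" frame (impulse train through a resonant filter).
rng = np.random.default_rng(5)
excitation = np.zeros(400); excitation[::80] = 1.0            # 100 Hz pulses at 8 kHz
frame = lfilter([1.0], [1.0, -1.3, 0.845], excitation)        # fake vocal tract
frame = frame + 1e-3 * rng.standard_normal(frame.size)        # avoid a singular fit
a, src = glottal_residual(frame * np.hamming(len(frame)))
print(np.round(a[:4], 3), np.round(src[:5], 3))
```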

  8. Pragmatic Instruction May Not Be Necessary among Heritage Speakers of Spanish: A Study on Requests

    Science.gov (United States)

    Barros García, María J.; Bachelor, Jeremy W.

    2018-01-01

    This paper studies the pragmatic competence of U.S. heritage speakers of Spanish in an attempt to determine (a) the degree of pragmatic transfer from English to Spanish experienced by heritage speakers when producing different types of requests in Spanish; and (b) how to best teach pragmatics to students of Spanish as a Heritage Language (SHL).…

  9. Within the School and the Community--A Speaker's Bureau.

    Science.gov (United States)

    McClintock, Joy H.

    Student interest prompted the formation of a Speaker's Bureau in Seminole Senior High School, Seminole, Florida. First, students compiled a list of community contacts, including civic clubs, churches, retirement villages, newspaper offices, and the County School Administration media center. A letter of introduction was composed and speaking…

  10. Non-English speakers attend gastroenterology clinic appointments at higher rates than English speakers in a vulnerable patient population

    Science.gov (United States)

    Sewell, Justin L.; Kushel, Margot B.; Inadomi, John M.; Yee, Hal F.

    2009-01-01

    Goals: We sought to identify factors associated with gastroenterology clinic attendance in an urban safety net healthcare system. Background: Missed clinic appointments reduce the efficiency and availability of healthcare, but subspecialty clinic attendance among patients with established healthcare access has not been studied. Study: We performed an observational study using secondary data from administrative sources to study patients referred to, and scheduled for an appointment in, the adult gastroenterology clinic serving the safety net healthcare system of San Francisco, California. Our dependent variable was whether subjects attended or missed a scheduled appointment. Analysis included multivariable logistic regression and classification tree analysis. 1,833 patients were referred and scheduled for an appointment between 05/2005 and 08/2006. Prisoners were excluded. All patients had a primary care provider. Results: 683 patients (37.3%) missed their appointment; 1,150 (62.7%) attended. Language was highly associated with attendance in the logistic regression; non-English speakers were less likely than English speakers to miss an appointment (adjusted odds ratio 0.42 [0.28, 0.63] for Spanish, 0.56 [0.38, 0.82] for Asian language). Among patients scheduled for a gastroenterology clinic appointment, not speaking English was most strongly associated with higher attendance rates. Patient-related factors associated with not speaking English likely influence subspecialty clinic attendance rates, and these factors may differ from those affecting general healthcare access. PMID:19169147

  11. Forensic Automatic Speaker Recognition Based on Likelihood Ratio Using Acoustic-phonetic Features Measured Automatically

    Directory of Open Access Journals (Sweden)

    Huapeng Wang

    2015-01-01

    Full Text Available Forensic speaker recognition is experiencing a remarkable paradigm shift in terms of the evaluation framework and presentation of voice evidence. This paper proposes a new method of forensic automatic speaker recognition using the likelihood ratio framework to quantify the strength of voice evidence. The proposed method uses a reference database to calculate the within- and between-speaker variability. Some acoustic-phonetic features are extracted automatically using the software VoiceSauce. The effectiveness of the approach was tested using two Mandarin databases: a mobile telephone database and a landline database. The experimental results indicate that these acoustic-phonetic features do have some discriminating potential and are worth trying in discrimination. The automatic acoustic-phonetic features have acceptable discriminative performance and can provide more reliable results in evidence analysis when fused with other kinds of voice features.
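
    In a score-based likelihood-ratio framework of the kind described here, the reference database supplies a same-speaker (within-speaker) and a different-speaker (between-speaker) score distribution, and the questioned comparison is evaluated under both. The sketch below uses univariate Gaussians and placeholder scores to illustrate the computation; it is a simplification and does not reflect the paper's VoiceSauce features or fusion scheme.

```python
import numpy as np
from scipy.stats import norm

def fit_gaussian(samples):
    """Mean and standard deviation of a reference score distribution."""
    s = np.asarray(samples, dtype=float)
    return s.mean(), s.std(ddof=1)

def likelihood_ratio(score, within_params, between_params):
    """LR = p(score | same speaker) / p(score | different speakers), with both
    densities modelled as univariate Gaussians fitted to reference data."""
    mu_w, sd_w = within_params
    mu_b, sd_b = between_params
    return norm.pdf(score, mu_w, sd_w) / norm.pdf(score, mu_b, sd_b)

# Placeholder reference data: similarity scores for same-speaker and
# different-speaker comparisons drawn from a background database.
rng = np.random.default_rng(6)
same_scores = rng.normal(2.0, 0.8, 500)
diff_scores = rng.normal(-1.0, 1.0, 5000)
w, b = fit_gaussian(same_scores), fit_gaussian(diff_scores)

questioned = 1.4   # score between the questioned and suspect recordings
lr = likelihood_ratio(questioned, w, b)
print(f"LR = {lr:.1f}  (log10 LR = {np.log10(lr):.2f})")
```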

  12. Language production in a shared task: Cumulative semantic interference from self- and other-produced context words

    OpenAIRE

    Hoedemaker, R.; Ernst, J.; Meyer, A.; Belke, E.

    2017-01-01

    This study assessed the effects of semantic context in the form of self-produced and other-produced words on subsequent language production. Pairs of participants performed a joint picture naming task, taking turns while naming a continuous series of pictures. In the single-speaker version of this paradigm, naming latencies have been found to increase for successive presentations of exemplars from the same category, a phenomenon known as Cumulative Semantic Interference (CSI). As expected, th...

  13. A virtual speaker in noisy classroom conditions: supporting or disrupting children's listening comprehension?

    Science.gov (United States)

    Nirme, Jens; Haake, Magnus; Lyberg Åhlander, Viveka; Brännström, Jonas; Sahlén, Birgitta

    2018-04-05

    Seeing a speaker's face facilitates speech recognition, particularly under noisy conditions. Evidence for how it might affect comprehension of the content of the speech is more sparse. We investigated how children's listening comprehension is affected by multi-talker babble noise, with or without presentation of a digitally animated virtual speaker, and whether successful comprehension is related to performance on a test of executive functioning. We performed a mixed-design experiment with 55 (34 female) participants (8- to 9-year-olds), recruited from Swedish elementary schools. The children were presented with four different narratives, each in one of four conditions: audio-only presentation in a quiet setting, audio-only presentation in noisy setting, audio-visual presentation in a quiet setting, and audio-visual presentation in a noisy setting. After each narrative, the children answered questions on the content and rated their perceived listening effort. Finally, they performed a test of executive functioning. We found significantly fewer correct answers to explicit content questions after listening in noise. This negative effect was only mitigated to a marginally significant degree by audio-visual presentation. Strong executive function only predicted more correct answers in quiet settings. Altogether, our results are inconclusive regarding how seeing a virtual speaker affects listening comprehension. We discuss how methodological adjustments, including modifications to our virtual speaker, can be used to discriminate between possible explanations to our results and contribute to understanding the listening conditions children face in a typical classroom.

  14. Gesturing by Speakers with Aphasia: How Does It Compare?

    Science.gov (United States)

    Mol, Lisette; Krahmer, Emiel; van de Sandt-Koenderman, Mieke

    2013-01-01

    Purpose: To study the independence of gesture and verbal language production. The authors assessed whether gesture can be semantically compensatory in cases of verbal language impairment and whether speakers with aphasia and control participants use similar depiction techniques in gesture. Method: The informativeness of gesture was assessed in 3…

  15. A comparison of three speaker-intrinsic vowel formant frequency normalization algorithms for sociophonetics

    DEFF Research Database (Denmark)

    Fabricius, Anne; Watt, Dominic; Johnson, Daniel Ezra

    2009-01-01

    This paper evaluates a speaker-intrinsic vowel formant frequency normalization algorithm initially proposed in Watt & Fabricius (2002). We compare how well this routine, known as the S-centroid procedure, performs as a sociophonetic research tool in three ways: reducing variance in area ratios … from RP and Aberdeen English (northeast Scotland). We conclude that, for the data examined here, the S-centroid W&F procedure performs at least as well as the two most recognized speaker-intrinsic, vowel-extrinsic, formant-intrinsic normalization methods, Lobanov's (1971) z-score procedure and Nearey…
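
    Of the normalization procedures compared in this record, Lobanov's z-score method is the simplest to state: each formant is standardized within speaker. The sketch below implements that step and, for contrast, a crude ratio-to-centroid scaling in the spirit of the S-centroid procedure; note that the real Watt & Fabricius method derives the centroid from point vowels, whereas here it is just the grand mean of some hypothetical F1/F2 tokens.

```python
import numpy as np

def lobanov(formants):
    """Lobanov (1971) z-score normalization: for each speaker and formant,
    subtract that speaker's mean and divide by the standard deviation.
    `formants` is an (n_tokens, n_formants) array in Hz for ONE speaker."""
    f = np.asarray(formants, dtype=float)
    return (f - f.mean(axis=0)) / f.std(axis=0, ddof=1)

def s_centroid(formants, centroid):
    """S-centroid-style normalization sketch: express each formant as a ratio
    to a speaker-specific centroid value (one centroid per formant).  The
    published procedure computes the centroid from point vowels; the grand
    mean used in the example below is only a stand-in."""
    return np.asarray(formants, dtype=float) / np.asarray(centroid, dtype=float)

# Hypothetical F1/F2 measurements (Hz) for one speaker's vowel tokens.
tokens = np.array([[300.0, 2300.0],   # FLEECE-like
                   [700.0, 1100.0],   # START-like
                   [420.0, 2000.0],
                   [650.0, 1300.0]])
print(np.round(lobanov(tokens), 2))
print(np.round(s_centroid(tokens, centroid=tokens.mean(axis=0)), 2))
```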

  16. Pitch perception and production in congenital amusia: Evidence from Cantonese speakers.

    Science.gov (United States)

    Liu, Fang; Chan, Alice H D; Ciocca, Valter; Roquet, Catherine; Peretz, Isabelle; Wong, Patrick C M

    2016-07-01

    This study investigated pitch perception and production in speech and music in individuals with congenital amusia (a disorder of musical pitch processing) who are native speakers of Cantonese, a tone language with a highly complex tonal system. Sixteen Cantonese-speaking congenital amusics and 16 controls performed a set of lexical tone perception, production, singing, and psychophysical pitch threshold tasks. Their tone production accuracy and singing proficiency were subsequently judged by independent listeners, and subjected to acoustic analyses. Relative to controls, amusics showed impaired discrimination of lexical tones in both speech and non-speech conditions. They also received lower ratings for singing proficiency, producing larger pitch interval deviations and making more pitch interval errors compared to controls. Demonstrating higher pitch direction identification thresholds than controls for both speech syllables and piano tones, amusics nevertheless produced native lexical tones with comparable pitch trajectories and intelligibility as controls. Significant correlations were found between pitch threshold and lexical tone perception, music perception and production, but not between lexical tone perception and production for amusics. These findings provide further evidence that congenital amusia is a domain-general language-independent pitch-processing deficit that is associated with severely impaired music perception and production, mildly impaired speech perception, and largely intact speech production.

  17. Testing Template and Testing Concept of Operations for Speaker Authentication Technology

    National Research Council Canada - National Science Library

    Sipko, Marek M

    2006-01-01

    This thesis documents the findings of developing a generic testing template and supporting concept of operations for speaker verification technology as part of the Iraqi Enrollment via Voice Authentication Project (IEVAP...

  18. Accent, Intelligibility, and the Role of the Listener: Perceptions of English-Accented German by Native German Speakers

    Science.gov (United States)

    Hayes-Harb, Rachel; Watzinger-Tharp, Johanna

    2012-01-01

    We explore the relationship between accentedness and intelligibility, and investigate how listeners' beliefs about nonnative speech interact with their accentedness and intelligibility judgments. Native German speakers and native English learners of German produced German sentences, which were presented to 12 native German speakers in accentedness…

  19. Beyond the language given: the neural correlates of inferring speaker meaning.

    Science.gov (United States)

    Bašnáková, Jana; Weber, Kirsten; Petersson, Karl Magnus; van Berkum, Jos; Hagoort, Peter

    2014-10-01

    Even though language allows us to say exactly what we mean, we often use language to say things indirectly, in a way that depends on the specific communicative context. For example, we can use an apparently straightforward sentence like "It is hard to give a good presentation" to convey deeper meanings, like "Your talk was a mess!" One of the big puzzles in language science is how listeners work out what speakers really mean, which is a skill absolutely central to communication. However, most neuroimaging studies of language comprehension have focused on the arguably much simpler, context-independent process of understanding direct utterances. To examine the neural systems involved in getting at contextually constrained indirect meaning, we used functional magnetic resonance imaging as people listened to indirect replies in spoken dialog. Relative to direct control utterances, indirect replies engaged dorsomedial prefrontal cortex, right temporo-parietal junction and insula, as well as bilateral inferior frontal gyrus and right medial temporal gyrus. This suggests that listeners take the speaker's perspective on both cognitive (theory of mind) and affective (empathy-like) levels. In line with classic pragmatic theories, our results also indicate that currently popular "simulationist" accounts of language comprehension fail to explain how listeners understand the speaker's intended message. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  20. Brain activity during auditory and visual phonological, spatial and simple discrimination tasks.

    Science.gov (United States)

    Salo, Emma; Rinne, Teemu; Salonen, Oili; Alho, Kimmo

    2013-02-16

    We used functional magnetic resonance imaging to measure human brain activity during tasks demanding selective attention to auditory or visual stimuli delivered in concurrent streams. Auditory stimuli were syllables spoken by different voices and occurring in central or peripheral space. Visual stimuli were centrally or more peripherally presented letters in darker or lighter fonts. The participants performed a phonological, spatial or "simple" (speaker-gender or font-shade) discrimination task in either modality. Within each modality, we expected a clear distinction between brain activations related to nonspatial and spatial processing, as reported in previous studies. However, within each modality, different tasks activated largely overlapping areas in modality-specific (auditory and visual) cortices, as well as in the parietal and frontal brain regions. These overlaps may be due to effects of attention common for all three tasks within each modality or interaction of processing task-relevant features and varying task-irrelevant features in the attended-modality stimuli. Nevertheless, brain activations caused by auditory and visual phonological tasks overlapped in the left mid-lateral prefrontal cortex, while those caused by the auditory and visual spatial tasks overlapped in the inferior parietal cortex. These overlapping activations reveal areas of multimodal phonological and spatial processing. There was also some evidence for intermodal attention-related interaction. Most importantly, activity in the superior temporal sulcus elicited by unattended speech sounds was attenuated during the visual phonological task in comparison with the other visual tasks. This effect might be related to suppression of processing irrelevant speech presumably distracting the phonological task involving the letters. Copyright © 2012 Elsevier B.V. All rights reserved.

  1. Revisiting the role of language in spatial cognition: Categorical perception of spatial relations in English and Korean speakers.

    Science.gov (United States)

    Holmes, Kevin J; Moty, Kelsey; Regier, Terry

    2017-12-01

    The spatial relation of support has been regarded as universally privileged in nonlinguistic cognition and immune to the influence of language. English, but not Korean, obligatorily distinguishes support from nonsupport via basic spatial terms. Despite this linguistic difference, previous research suggests that English and Korean speakers show comparable nonlinguistic sensitivity to the support/nonsupport distinction. Here, using a paradigm previously found to elicit cross-language differences in color discrimination, we provide evidence for a difference in sensitivity to support/nonsupport between native English speakers and native Korean speakers who were late English learners and tested in a context that privileged Korean. Whereas the former group showed categorical perception (CP) when discriminating spatial scenes capturing the support/nonsupport distinction, the latter did not. An additional group of native Korean speakers-relatively early English learners tested in an English-salient context-patterned with the native English speakers in showing CP for support/nonsupport. These findings suggest that obligatory marking of support/nonsupport in one's native language can affect nonlinguistic sensitivity to this distinction, contra earlier findings, but that such sensitivity may also depend on aspects of language background and the immediate linguistic context.

  2. Age of acquisition and naming performance in Frisian-Dutch bilingual speakers with dementia.

    Science.gov (United States)

    Veenstra, Wencke S; Huisman, Mark; Miller, Nick

    2014-01-01

    Age of acquisition (AoA) of words is a recognised variable affecting language processing in speakers with and without language disorders. For bi- and multilingual speakers their languages can be differentially affected in neurological illness. Study of language loss in bilingual speakers with dementia has been relatively neglected. We investigated whether AoA of words was associated with level of naming impairment in bilingual speakers with probable Alzheimer's dementia within and across their languages. Twenty-six Frisian-Dutch bilinguals with mild to moderate dementia named 90 pictures in each language, employing items with rated AoA and other word variable measures matched across languages. Quantitative (totals correct) and qualitative (error types and (in)appropriate switching) aspects were measured. Impaired retrieval occurred in Frisian (Language 1) and Dutch (Language 2), with a significant effect of AoA on naming in both languages. Earlier acquired words were better preserved and retrieved. Performance was identical across languages, but better in Dutch when controlling for covariates. However, participants demonstrated more inappropriate code switching within the Frisian test setting. On qualitative analysis, no differences in overall error distribution were found between languages for early or late acquired words. There existed a significantly higher percentage of semantically than visually-related errors. These findings have implications for understanding problems in lexical retrieval among bilingual individuals with dementia and its relation to decline in other cognitive functions which may play a role in inappropriate code switching. We discuss the findings in the light of the close relationship between Frisian and Dutch and the pattern of usage across the life-span.

  3. Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech☆

    Science.gov (United States)

    Cao, Houwei; Verma, Ragini; Nenkova, Ani

    2015-01-01

    We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotion and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion.
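
    A minimal Python sketch of the ranking idea (an assumed formulation for illustration, not the authors' code): within one speaker, ordered pairs of utterances that differ in the annotated intensity of a target emotion are turned into difference vectors, and a linear SVM is trained on those pairs; its weight vector then scores how strongly new utterances from that speaker express the emotion. Feature dimensions and intensity labels below are hypothetical.

        import numpy as np
        from sklearn.svm import LinearSVC

        def pairwise_differences(features, intensities):
            """Build a pairwise ranking problem: for every ordered pair (i, j) with
            different target-emotion intensity, the difference x_i - x_j is labeled
            +1 if utterance i is more intense than j, and -1 otherwise."""
            X_pairs, y_pairs = [], []
            n = len(features)
            for i in range(n):
                for j in range(n):
                    if intensities[i] == intensities[j]:
                        continue
                    X_pairs.append(features[i] - features[j])
                    y_pairs.append(1 if intensities[i] > intensities[j] else -1)
            return np.array(X_pairs), np.array(y_pairs)

        # Hypothetical acoustic feature vectors for one speaker (the "query")
        rng = np.random.default_rng(0)
        X = rng.normal(size=(20, 10))
        anger_intensity = rng.integers(0, 3, size=20)   # hypothetical 0..2 annotations

        X_pairs, y_pairs = pairwise_differences(X, anger_intensity)
        ranker = LinearSVC(C=1.0).fit(X_pairs, y_pairs)

        # The learned weight vector scores new utterances; higher = more "anger-like".
        scores = X @ ranker.coef_.ravel()

    Training one such ranker per emotion and comparing their scores is one way to realize the multi-class prediction described above.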

  4. Why reference to the past is difficult for agrammatic speakers

    NARCIS (Netherlands)

    Bastiaanse, Roelien

    Many studies have shown that verb inflections are difficult to produce for agrammatic aphasic speakers: they are frequently omitted and substituted. The present article gives an overview of our search to understanding why this is the case. The hypothesis is that grammatical morphology referring to

  5. Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data

    Science.gov (United States)

    2017-08-20

    M speakers. We seek a probabilistic solution to domain adaptation, and so we encode knowledge of the out-of-domain data in prior distributions... the VB solution from (16)-(21) becomes: µ = αȳ + (1 − α)µ_out (24), Σ_a = α((1/N_T) Σ_{n=1..N_T} ⟨y_n y_nᵀ⟩ − ȳȳᵀ) + (1 − α)Σ_a^out + α(1 − α)(ȳ − µ_out)... (25) ...non-English languages and from unseen channels. An inadequate in-domain set was provided, which consisted of 2272 samples from 1164 speakers, and
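
    The equations above appear to interpolate in-domain sample statistics with out-of-domain priors through a weight α. The Python sketch below is a rough reconstruction under assumptions (the excerpt is truncated, so the final covariance term in particular is guessed as an outer product of the mean shift), not the report's actual solution.

        import numpy as np

        def adapt_mean_and_covariance(y_in, mu_out, sigma_out, alpha):
            """Blend in-domain statistics with an out-of-domain prior.

            y_in      : (N, d) in-domain vectors (e.g. i-vectors)
            mu_out    : (d,)   out-of-domain mean
            sigma_out : (d, d) out-of-domain covariance
            alpha     : weight in [0, 1]; larger means trusting in-domain data more
            """
            y_bar = y_in.mean(axis=0)
            # adapted mean: mu = alpha * y_bar + (1 - alpha) * mu_out
            mu = alpha * y_bar + (1.0 - alpha) * mu_out
            # in-domain scatter around the in-domain mean
            centered = y_in - y_bar
            s_in = centered.T @ centered / len(y_in)
            # adapted covariance: blend the scatters and add a term for the mean shift
            diff = (y_bar - mu_out)[:, None]
            sigma = (alpha * s_in + (1.0 - alpha) * sigma_out
                     + alpha * (1.0 - alpha) * (diff @ diff.T))
            return mu, sigma

        # Toy example with hypothetical 3-dimensional vectors
        rng = np.random.default_rng(1)
        mu, sigma = adapt_mean_and_covariance(rng.normal(size=(50, 3)),
                                              np.zeros(3), np.eye(3), alpha=0.6)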

  6. Infants' Understanding of False Labeling Events: The Referential Roles of Words and the Speakers Who Use Them.

    Science.gov (United States)

    Koenig, Melissa A.; Echols, Catharine H.

    2003-01-01

    Four studies examined whether 16-month-olds' responses to true/false utterances interacted with their knowledge of human agents. Findings suggested that infants are developing a critical conception of human speakers as truthful communicators and that infants understand that human speakers may provide uniquely useful information when a word fails…

  7. Teaching the Native English Speaker How to Teach English

    Science.gov (United States)

    Odhuu, Kelli

    2014-01-01

    This article speaks to teachers who have been paired with native speakers (NSs) who have never taught before, and the feelings of frustration, discouragement, and nervousness on the teacher's behalf that can occur as a result. In order to effectively tackle this situation, teachers need to work together with the NSs. Teachers in this scenario…

  8. Multi-Frame Rate Based Multiple-Model Training for Robust Speaker Identification of Disguised Voice

    DEFF Research Database (Denmark)

    Prasad, Swati; Tan, Zheng-Hua; Prasad, Ramjee

    2013-01-01

    Speaker identification systems are prone to attack when voice disguise is adopted by the user. To address this issue, our paper studies the effect of using different frame rates on the accuracy of the speaker identification system for disguised voice. In addition, a multi-frame rate based multiple-model training method is proposed. The experimental results show the superior performance of the proposed method compared to the commonly used single frame rate method for three types of disguised voice taken from the CHAINS corpus.
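
    A hedged Python sketch of the general idea (illustrative only; the paper's features, model sizes, and frame rates are not specified here, so every parameter below is an assumption): MFCCs are extracted at several frame rates by varying the hop length, one Gaussian mixture model is trained per speaker and frame rate, and at test time the best-scoring model decides the speaker.

        import numpy as np
        import librosa
        from sklearn.mixture import GaussianMixture

        def mfcc_at_frame_rate(y, sr, hop_ms):
            """13 MFCCs with a given hop size in milliseconds (frame rate = 1000 / hop_ms)."""
            hop = int(sr * hop_ms / 1000)
            return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop).T

        def train_multi_rate_models(speaker_waves, sr, hop_ms_list=(10, 20, 30)):
            """One GMM per (speaker, frame rate). speaker_waves: dict name -> waveform."""
            models = {}
            for name, y in speaker_waves.items():
                for hop_ms in hop_ms_list:
                    feats = mfcc_at_frame_rate(y, sr, hop_ms)
                    models[(name, hop_ms)] = GaussianMixture(n_components=8).fit(feats)
            return models

        def identify(y, sr, models):
            """Pick the speaker whose best-matching frame-rate model scores highest."""
            best_name, best_ll = None, -np.inf
            for (name, hop_ms), gmm in models.items():
                ll = gmm.score(mfcc_at_frame_rate(y, sr, hop_ms))   # mean log-likelihood
                if ll > best_ll:
                    best_name, best_ll = name, ll
            return best_name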

  9. Making sense of (exceptional) causal relations. A cross-cultural and cross-linguistic study.

    Science.gov (United States)

    Le Guen, Olivier; Samland, Jana; Friedrich, Thomas; Hanus, Daniel; Brown, Penelope

    2015-01-01

    In order to make sense of the world, humans tend to see causation almost everywhere. Although most causal relations may seem straightforward, they are not always construed in the same way cross-culturally. In this study, we investigate concepts of "chance," "coincidence," or "randomness" that refer to assumed relations between intention, action, and outcome in situations, and we ask how people from different cultures make sense of such non-law-like connections. Based on a framework proposed by Alicke (2000), we administered a task that aims to be a neutral tool for investigating causal construals cross-culturally and cross-linguistically. Members of four different cultural groups, rural Mayan Yucatec and Tseltal speakers from Mexico and urban students from Mexico and Germany, were presented with a set of scenarios involving various types of causal and non-causal relations and were asked to explain the described events. Three links varied as to whether they were present or not in the scenarios: Intention-to-Action, Action-to-Outcome, and Intention-to-Outcome. Our results show that causality is recognized in all four cultural groups. However, how causality and especially non-law-like relations are interpreted depends on the type of links, the cultural background and the language used. In all three groups, Action-to-Outcome is the decisive link for recognizing causality. Despite the fact that the two Mayan groups share similar cultural backgrounds, they display different ideologies regarding concepts of non-law-like relations. The data suggests that the concept of "chance" is not universal, but seems to be an explanation that only some cultural groups draw on to make sense of specific situations. Of particular importance is the existence of linguistic concepts in each language that trigger ideas of causality in the responses from each cultural group.

  10. Making sense of (exceptional) causal relations. A cross-cultural and cross-linguistic study

    Science.gov (United States)

    Le Guen, Olivier; Samland, Jana; Friedrich, Thomas; Hanus, Daniel; Brown, Penelope

    2015-01-01

    In order to make sense of the world, humans tend to see causation almost everywhere. Although most causal relations may seem straightforward, they are not always construed in the same way cross-culturally. In this study, we investigate concepts of “chance,” “coincidence,” or “randomness” that refer to assumed relations between intention, action, and outcome in situations, and we ask how people from different cultures make sense of such non-law-like connections. Based on a framework proposed by Alicke (2000), we administered a task that aims to be a neutral tool for investigating causal construals cross-culturally and cross-linguistically. Members of four different cultural groups, rural Mayan Yucatec and Tseltal speakers from Mexico and urban students from Mexico and Germany, were presented with a set of scenarios involving various types of causal and non-causal relations and were asked to explain the described events. Three links varied as to whether they were present or not in the scenarios: Intention-to-Action, Action-to-Outcome, and Intention-to-Outcome. Our results show that causality is recognized in all four cultural groups. However, how causality and especially non-law-like relations are interpreted depends on the type of links, the cultural background and the language used. In all three groups, Action-to-Outcome is the decisive link for recognizing causality. Despite the fact that the two Mayan groups share similar cultural backgrounds, they display different ideologies regarding concepts of non-law-like relations. The data suggests that the concept of “chance” is not universal, but seems to be an explanation that only some cultural groups draw on to make sense of specific situations. Of particular importance is the existence of linguistic concepts in each language that trigger ideas of causality in the responses from each cultural group. PMID:26579028

  11. Phraseology and Frequency of Occurrence on the Web: Native Speakers' Perceptions of Google-Informed Second Language Writing

    Science.gov (United States)

    Geluso, Joe

    2013-01-01

    Usage-based theories of language learning suggest that native speakers of a language are acutely aware of formulaic language due in large part to frequency effects. Corpora and data-driven learning can offer useful insights into frequent patterns of naturally occurring language to second/foreign language learners who, unlike native speakers, are…

  12. A Cross-Cultural Comparative Study of Apology Strategies Employed by Iranian EFL Learners and English Native Speakers

    Directory of Open Access Journals (Sweden)

    Elham Abedi

    2016-10-01

    Full Text Available The development of speech-act theory has provided the hearers with a better understanding of what speakers intend to perform in the act of communication. One type of speech act is apologizing. When an action or utterance has resulted in an offense, the offender needs to apologize. In the present study, an attempt was made to compare the apology strategies employed by Iranian EFL learners and those of English native speakers in order to find out the possible differences and similarities. To this end, a discourse completion test (DCT) was given to 100 male and female Iranian EFL learners and English native speakers. The respondents were supposed to complete the DCTs based on nine situations, which varied in terms of power between the interlocutors and level of imposition. This study employed Cohen and Olshtain's (1981) model to classify various types of apology strategies. The obtained results revealed some similarities along with some (statistically insignificant) differences between EFL learners and American English speakers in terms of their use of apology strategies. Furthermore, it was found that the illocutionary force indicating devices (IFIDs), such as request for forgiveness and an offer of apology, were the strategies mostly employed by the Iranian EFL learners, while taking on responsibility, such as explicit self-blame and expression of self-deficiency, were found to be the strategies mostly used by English native speakers. In terms of gender, the male and female respondents more or less used the same apology strategies in response to the situations. The findings of the present research can be used by language teachers as well as sociolinguists. Keywords: Speech act theory, Speech act of apology, Apology strategies, Iranian EFL learners, English Native speakers, Gender

  13. Evidential Uses in the Spanish of Quechua Speakers in Peru.

    Science.gov (United States)

    Escobar, Anna Maria

    1994-01-01

    Analysis of recordings of spontaneous speech of native speakers of Quechua speaking Spanish as a second language reveals that, using verbal morphological resources of Spanish, they have grammaticalized an epistemic marking system resembling that of Quechua. Sources of this process in both Quechua and Spanish are analyzed. (MSE)

  14. Openings and Closings in Telephone Conversations between Native Spanish Speakers.

    Science.gov (United States)

    Coronel-Molina, Serafin M.

    1998-01-01

    A study analyzed the opening and closing sequences of 11 dyads of native Spanish-speakers in natural telephone conversations conducted in Spanish. The objective was to determine how closely Hispanic cultural patterns of conduct for telephone conversations follow the sequences outlined in previous research. It is concluded that Spanish…

  15. Age of acquisition and naming performance in Frisian-Dutch bilingual speakers with dementia

    Directory of Open Access Journals (Sweden)

    Wencke S. Veenstra

    Full Text Available Age of acquisition (AoA) of words is a recognised variable affecting language processing in speakers with and without language disorders. For bi- and multilingual speakers their languages can be differentially affected in neurological illness. Study of language loss in bilingual speakers with dementia has been relatively neglected. OBJECTIVE: We investigated whether AoA of words was associated with level of naming impairment in bilingual speakers with probable Alzheimer's dementia within and across their languages. METHODS: Twenty-six Frisian-Dutch bilinguals with mild to moderate dementia named 90 pictures in each language, employing items with rated AoA and other word variable measures matched across languages. Quantitative (totals correct) and qualitative (error types and (in)appropriate switching) aspects were measured. RESULTS: Impaired retrieval occurred in Frisian (Language 1) and Dutch (Language 2), with a significant effect of AoA on naming in both languages. Earlier acquired words were better preserved and retrieved. Performance was identical across languages, but better in Dutch when controlling for covariates. However, participants demonstrated more inappropriate code switching within the Frisian test setting. On qualitative analysis, no differences in overall error distribution were found between languages for early or late acquired words. There existed a significantly higher percentage of semantically than visually-related errors. CONCLUSION: These findings have implications for understanding problems in lexical retrieval among bilingual individuals with dementia and its relation to decline in other cognitive functions which may play a role in inappropriate code switching. We discuss the findings in the light of the close relationship between Frisian and Dutch and the pattern of usage across the life-span.

  16. Effects of a metronome on the filled pauses of fluent speakers.

    Science.gov (United States)

    Christenfeld, N

    1996-12-01

    Filled pauses (the "ums" and "uhs" that litter spontaneous speech) seem to be a product of the speaker paying deliberate attention to the normally automatic act of talking. This is the same sort of explanation that has been offered for stuttering. In this paper we explore whether a manipulation that has long been known to decrease stuttering, synchronizing speech to the beats of a metronome, will then also decrease filled pauses. Two experiments indicate that a metronome has a dramatic effect on the production of filled pauses. This effect is not due to any simplification or slowing of the speech and supports the view that a metronome causes speakers to attend more to how they are talking and less to what they are saying. It also lends support to the connection between stutters and filled pauses.

  17. A Novel Approach in Text-Independent Speaker Recognition in Noisy Environment

    Directory of Open Access Journals (Sweden)

    Nona Heydari Esfahani

    2014-10-01

    Full Text Available In this paper, robust text-independent speaker recognition is addressed. The proposed method operates on manually silence-removed utterances that are segmented into smaller speech units containing a few phones and at least one vowel. The segments are the basic units for long-term feature extraction. Sub-band entropy is extracted directly in each segment. A robust vowel detection method is then applied to each segment to isolate a high-energy vowel that is used as the unit for pitch frequency and formant extraction. By applying a clustering technique, the extracted short-term features, namely MFCC coefficients, are combined with the long-term features. Experiments using an MLP classifier show that the average speaker recognition accuracy is 97.33% for clean speech and 61.33% in a noisy environment at -2 dB SNR, an improvement over other conventional methods.
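
    A minimal sketch of one plausible formulation of the sub-band entropy feature mentioned above (our assumption; the paper may define it differently): the power spectrum of a segment is split into a few contiguous frequency bands, and the entropy of the normalized power distribution inside each band is computed.

        import numpy as np

        def subband_entropy(segment, n_bands=4, n_fft=512):
            """Spectral entropy per frequency band for one speech segment."""
            spectrum = np.abs(np.fft.rfft(segment, n=n_fft)) ** 2
            bands = np.array_split(spectrum, n_bands)        # contiguous frequency bands
            entropies = []
            for band in bands:
                p = band / (band.sum() + 1e-12)              # normalize to a distribution
                entropies.append(-np.sum(p * np.log2(p + 1e-12)))
            return np.array(entropies)

        # Hypothetical 200 ms segment of 8 kHz speech: a 300 Hz tone plus noise
        sr = 8000
        t = np.arange(int(0.2 * sr)) / sr
        segment = np.sin(2 * np.pi * 300 * t) + 0.1 * np.random.default_rng(2).normal(size=t.size)
        print(subband_entropy(segment))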

  18. The Blame Game: Performance Analysis of Speaker Diarization System Components

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; Wooters, Chuck

    2007-01-01

    In this paper we discuss the performance analysis of a speaker diarization system similar to the system that was submitted by ICSI at the NIST RT06s evaluation benchmark. The analysis that is based on a series of oracle experiments, provides a good understanding of the performance of each system

  19. Adaptive Communication: Languages with More Non-Native Speakers Tend to Have Fewer Word Forms.

    Directory of Open Access Journals (Sweden)

    Christian Bentz

    Full Text Available Explaining the diversity of languages across the world is one of the central aims of typological, historical, and evolutionary linguistics. We consider the effect of language contact (the number of non-native speakers a language has) on the way languages change and evolve. By analysing hundreds of languages within and across language families, regions, and text types, we show that languages with greater levels of contact typically employ fewer word forms to encode the same information content (a property we refer to as lexical diversity). Based on three types of statistical analyses, we demonstrate that this variance can in part be explained by the impact of non-native speakers on information encoding strategies. Finally, we argue that languages are information encoding systems shaped by the varying needs of their speakers. Language evolution and change should be modeled as the co-evolution of multiple intertwined adaptive systems: On one hand, the structure of human societies and human learning capabilities, and on the other, the structure of language.
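
    As a toy illustration of counting word forms (our sketch; the paper's analyses control for text size and register far more carefully), the snippet below computes the average number of distinct word forms per fixed-size sample of tokens, so that a text that reuses few inflectional variants scores lower than one that uses many.

        import re

        def word_forms_per_sample(text, sample_size=1000):
            """Average number of distinct word forms per sample of `sample_size` tokens."""
            tokens = re.findall(r"\w+", text.lower())
            samples = [tokens[i:i + sample_size]
                       for i in range(0, len(tokens) - sample_size + 1, sample_size)]
            if not samples:
                return len(set(tokens))
            return sum(len(set(s)) for s in samples) / len(samples)

        # Toy comparison of two hypothetical texts of equal length
        print(word_forms_per_sample("the cat saw the cats and the dog saw the dogs " * 200))
        print(word_forms_per_sample("the cat saw the cat and the cat saw the cat " * 200))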

  20. Adaptive Communication: Languages with More Non-Native Speakers Tend to Have Fewer Word Forms

    Science.gov (United States)

    Bentz, Christian; Verkerk, Annemarie; Kiela, Douwe; Hill, Felix; Buttery, Paula

    2015-01-01

    Explaining the diversity of languages across the world is one of the central aims of typological, historical, and evolutionary linguistics. We consider the effect of language contact (the number of non-native speakers a language has) on the way languages change and evolve. By analysing hundreds of languages within and across language families, regions, and text types, we show that languages with greater levels of contact typically employ fewer word forms to encode the same information content (a property we refer to as lexical diversity). Based on three types of statistical analyses, we demonstrate that this variance can in part be explained by the impact of non-native speakers on information encoding strategies. Finally, we argue that languages are information encoding systems shaped by the varying needs of their speakers. Language evolution and change should be modeled as the co-evolution of multiple intertwined adaptive systems: On one hand, the structure of human societies and human learning capabilities, and on the other, the structure of language. PMID:26083380

  1. Do Speakers and Listeners Observe the Gricean Maxim of Quantity?

    Science.gov (United States)

    Engelhardt, Paul E.; Bailey, Karl G. D.; Ferreira, Fernanda

    2006-01-01

    The Gricean Maxim of Quantity is believed to govern linguistic performance. Speakers are assumed to provide as much information as required for referent identification and no more, and listeners are believed to expect unambiguous but concise descriptions. In three experiments we examined the extent to which naive participants are sensitive to the…

  2. Speech overlap detection in a two-pass speaker diarization system

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; Leeuwen, D.A. van; Jong, F. M. G de

    2009-01-01

    In this paper we present the two-pass speaker diarization system that we developed for the NIST RT09s evaluation. In the first pass of our system a model for speech overlap detection is generated automatically. This model is used in two ways to reduce the diarization errors due to overlapping

  3. Speech overlap detection in a two-pass speaker diarization system

    NARCIS (Netherlands)

    Huijbregts, M.; Leeuwen, D.A. van; Jong, F.M.G. de

    2009-01-01

    In this paper we present the two-pass speaker diarization system that we developed for the NIST RT09s evaluation. In the first pass of our system a model for speech overlap detection is generated automatically. This model is used in two ways to reduce the diarization errors due to overlapping

  4. Reading and Vocabulary Recommendations for Spanish for Native Speakers Materials.

    Science.gov (United States)

    Spencer, Laura Gutierrez

    1995-01-01

    Focuses on the need for appropriate materials to address the needs of native speakers of Spanish who study Spanish in American universities and high schools. The most important factors influencing the selection of readings should include the practical nature of themes for reading and vocabulary development, level of difficulty, and variety in…

  5. Linear array of photodiodes to track a human speaker for video recording

    International Nuclear Information System (INIS)

    DeTone, D; Neal, H; Lougheed, R

    2012-01-01

    Communication and collaboration using stored digital media has garnered more interest by many areas of business, government and education in recent years. This is due primarily to improvements in the quality of cameras and speed of computers. An advantage of digital media is that it can serve as an effective alternative when physical interaction is not possible. Video recordings that allow for viewers to discern a presenter's facial features, lips and hand motions are more effective than videos that do not. To attain this, one must maintain a video capture in which the speaker occupies a significant portion of the captured pixels. However, camera operators are costly, and often do an imperfect job of tracking presenters in unrehearsed situations. This creates motivation for a robust, automated system that directs a video camera to follow a presenter as he or she walks anywhere in the front of a lecture hall or large conference room. Such a system is presented. The system consists of a commercial, off-the-shelf pan/tilt/zoom (PTZ) color video camera, a necklace of infrared LEDs and a linear photodiode array detector. Electronic output from the photodiode array is processed to generate the location of the LED necklace, which is worn by a human speaker. The computer controls the video camera movements to record video of the speaker. The speaker's vertical position and depth are assumed to remain relatively constant; the video camera is sent only panning (horizontal) movement commands. The LED necklace is flashed at 70 Hz at a 50% duty cycle to provide noise-filtering capability. The benefit to using a photodiode array versus a standard video camera is its higher frame rate (4 kHz vs. 60 Hz). The higher frame rate allows for the filtering of infrared noise such as sunlight and indoor lighting, a capability absent from other tracking technologies. The system has been tested in a large lecture hall and is shown to be effective.
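
    A hedged Python sketch of how the 70 Hz modulation could be exploited to reject ambient infrared (an illustration of the lock-in principle only, not the authors' actual processing; array size, frame rate and signal levels are assumptions): each photodiode channel is projected onto 70 Hz sine and cosine references, and the channel with the strongest modulated component is taken as the necklace position.

        import numpy as np

        def locate_necklace(frames, frame_rate=4000.0, mod_freq=70.0):
            """Estimate which photodiode element sees the flashing LED necklace.

            frames: array of shape (n_samples, n_pixels) of photodiode readings.
            Returns (pixel_index, amplitude) of the strongest 70 Hz component.
            """
            n = frames.shape[0]
            t = np.arange(n) / frame_rate
            ref_sin = np.sin(2 * np.pi * mod_freq * t)
            ref_cos = np.cos(2 * np.pi * mod_freq * t)
            centered = frames - frames.mean(axis=0)      # remove DC (steady ambient light)
            # Lock-in style demodulation: project each pixel onto the 70 Hz references
            i_comp = centered.T @ ref_sin
            q_comp = centered.T @ ref_cos
            amplitude = np.hypot(i_comp, q_comp) * 2.0 / n
            pixel = int(np.argmax(amplitude))
            return pixel, float(amplitude[pixel])

        # Toy data: a 128-pixel array sampled at 4 kHz, LED blinking at 70 Hz near pixel 40
        rng = np.random.default_rng(3)
        t = np.arange(800) / 4000.0
        frames = rng.normal(0.0, 0.05, size=(800, 128))
        frames[:, 40] += 0.5 * (np.sign(np.sin(2 * np.pi * 70 * t)) + 1) / 2   # 50% duty square wave
        print(locate_necklace(frames))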

  6. Linear array of photodiodes to track a human speaker for video recording

    Science.gov (United States)

    DeTone, D.; Neal, H.; Lougheed, R.

    2012-12-01

    Communication and collaboration using stored digital media has garnered more interest by many areas of business, government and education in recent years. This is due primarily to improvements in the quality of cameras and speed of computers. An advantage of digital media is that it can serve as an effective alternative when physical interaction is not possible. Video recordings that allow for viewers to discern a presenter's facial features, lips and hand motions are more effective than videos that do not. To attain this, one must maintain a video capture in which the speaker occupies a significant portion of the captured pixels. However, camera operators are costly, and often do an imperfect job of tracking presenters in unrehearsed situations. This creates motivation for a robust, automated system that directs a video camera to follow a presenter as he or she walks anywhere in the front of a lecture hall or large conference room. Such a system is presented. The system consists of a commercial, off-the-shelf pan/tilt/zoom (PTZ) color video camera, a necklace of infrared LEDs and a linear photodiode array detector. Electronic output from the photodiode array is processed to generate the location of the LED necklace, which is worn by a human speaker. The computer controls the video camera movements to record video of the speaker. The speaker's vertical position and depth are assumed to remain relatively constant- the video camera is sent only panning (horizontal) movement commands. The LED necklace is flashed at 70Hz at a 50% duty cycle to provide noise-filtering capability. The benefit to using a photodiode array versus a standard video camera is its higher frame rate (4kHz vs. 60Hz). The higher frame rate allows for the filtering of infrared noise such as sunlight and indoor lighting-a capability absent from other tracking technologies. The system has been tested in a large lecture hall and is shown to be effective.

  7. Native Speakers as Teachers in Turkey: Non-Native Pre-Service English Teachers' Reactions to a Nation-Wide Project

    Science.gov (United States)

    Coskun, Abdullah

    2013-01-01

    Although English is now a recognized international language and the concept of native speaker is becoming more doubtful every day, the empowerment of the native speakers of English as language teaching professionals is still continuing (McKay, 2002), especially in Asian countries like China and Japan. One of the latest examples showing the…

  8. Referential Choices in a Collaborative Storytelling Task: Discourse Stages and Referential Complexity Matter.

    Science.gov (United States)

    Fossard, Marion; Achim, Amélie M; Rousier-Vercruyssen, Lucie; Gonzalez, Sylvia; Bureau, Alexandre; Champagne-Lavau, Maud

    2018-01-01

    During a narrative discourse, accessibility of the referents is rarely fixed once and for all. Rather, each referent varies in accessibility as the discourse unfolds, depending on the presence and prominence of the other referents. This leads the speaker to use various referential expressions to refer to the main protagonists of the story at different moments in the narrative. This study relies on a new, collaborative storytelling in sequence task designed to assess how speakers adjust their referential choices when they refer to different characters at specific discourse stages corresponding to the introduction, maintaining, or shift of the character in focus, in increasingly complex referential contexts. Referential complexity of the stories was manipulated through variations in the number of characters (1 vs. 2) and, for stories in which there were two characters, in their ambiguity in gender (different vs. same gender). Data were coded for the type of reference markers as well as the type of reference content (i.e., the extent of the information provided in the referential expression). Results showed that, beyond the expected effects of discourse stages on reference markers (more indefinite markers at the introduction stage, more pronouns at the maintaining stage, and more definite markers at the shift stage), the number of characters and their ambiguity in gender also modulated speakers' referential choices at specific discourse stages. For the maintaining stage, an effect of the number of characters was observed for the use of pronouns and of definite markers, with more pronouns when there was a single character, sometimes replaced by definite expressions when two characters were present in the story. For the shift stage, an effect of gender ambiguity was specifically noted for the reference content with more specific information provided in the referential expression when there was referential ambiguity. Reference content is an aspect of referential marking

  9. Referential Choices in a Collaborative Storytelling Task: Discourse Stages and Referential Complexity Matter

    Directory of Open Access Journals (Sweden)

    Marion Fossard

    2018-02-01

    Full Text Available During a narrative discourse, accessibility of the referents is rarely fixed once and for all. Rather, each referent varies in accessibility as the discourse unfolds, depending on the presence and prominence of the other referents. This leads the speaker to use various referential expressions to refer to the main protagonists of the story at different moments in the narrative. This study relies on a new, collaborative storytelling in sequence task designed to assess how speakers adjust their referential choices when they refer to different characters at specific discourse stages corresponding to the introduction, maintaining, or shift of the character in focus, in increasingly complex referential contexts. Referential complexity of the stories was manipulated through variations in the number of characters (1 vs. 2) and, for stories in which there were two characters, in their ambiguity in gender (different vs. same gender). Data were coded for the type of reference markers as well as the type of reference content (i.e., the extent of the information provided in the referential expression). Results showed that, beyond the expected effects of discourse stages on reference markers (more indefinite markers at the introduction stage, more pronouns at the maintaining stage, and more definite markers at the shift stage), the number of characters and their ambiguity in gender also modulated speakers' referential choices at specific discourse stages. For the maintaining stage, an effect of the number of characters was observed for the use of pronouns and of definite markers, with more pronouns when there was a single character, sometimes replaced by definite expressions when two characters were present in the story. For the shift stage, an effect of gender ambiguity was specifically noted for the reference content with more specific information provided in the referential expression when there was referential ambiguity. Reference content is an aspect of

  10. THE ROLE OF NON-NATIVE ENGLISH SPEAKER TEACHERS IN ENGLISH LANGUAGE LEARNING

    Directory of Open Access Journals (Sweden)

    Lutfi Ashar Mauludin

    2017-04-01

    Full Text Available Native-English Speaker Teachers (NESTs) and Non-Native English Speaker Teachers (NNESTs) have their own advantages and disadvantages. However, for English Language Learners (ELLs), NNESTs have more advantages in helping students to acquire English skills. There are at least three factors in English Language Learning that favor NNESTs: knowledge of the subject, effective communication, and understanding of students' difficulties and needs. NNESTs can explain the language clearly because they share the students' background and culture. NNESTs can also communicate effectively with students at all levels. The use of the L1 effectively helps students build their knowledge. Finally, NNESTs can provide objectives and materials that are suited to the needs of the students.

  11. Politics of Participation in Benoît Maubrey’s Speaker Sculptures

    DEFF Research Database (Denmark)

    Keylin, Vadim

    a designated number, or using Bluetooth or WiFi technologies, and express themselves freely through the sculpture. In my paper, I investigate the strategies of audience engagement that Maubrey employs and their applicability to the acoustic design of urban spaces. Through their numerous loudspeakers, Speaker...

  12. Dialocalization: Acoustic speaker diarization and visual localization as joint optimization problem

    NARCIS (Netherlands)

    Friedland, G.; Yeo, C.; Hung, H.

    2010-01-01

    The following article presents a novel audio-visual approach for unsupervised speaker localization in both time and space and systematically analyzes its unique properties. Using recordings from a single, low-resolution room overview camera and a single far-field microphone, a state-of-the-art

  13. Speech rate normalization used to improve speaker verification

    CSIR Research Space (South Africa)

    Van Heerden, CJ

    2006-11-01

    Full Text Available The EER using the normalized durations is then compared with the EER using unnormalized durations, and also with the EER when duration information is not employed. ... Proposed phoneme duration modeling: choosing parametric models. Since the duration of a phoneme... the known transcription and the speaker-specific acoustic model described above. Only one pronunciation per word was allowed, thus resulting in 49 triphones. To decide which parametric model to use for the duration density functions of the triphones...
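
    For readers unfamiliar with the metric used above, here is a generic Python sketch of how an equal error rate (EER) can be computed from verification scores (an illustration, not the paper's scoring code): sweep a decision threshold over the observed scores and report the operating point where the false acceptance and false rejection rates are (nearly) equal.

        import numpy as np

        def equal_error_rate(genuine_scores, impostor_scores):
            """Approximate EER by sweeping a threshold over all observed scores."""
            genuine = np.asarray(genuine_scores, dtype=float)
            impostor = np.asarray(impostor_scores, dtype=float)
            thresholds = np.sort(np.concatenate([genuine, impostor]))
            best_eer, best_gap = 1.0, np.inf
            for th in thresholds:
                far = np.mean(impostor >= th)    # false acceptance rate
                frr = np.mean(genuine < th)      # false rejection rate
                gap = abs(far - frr)
                if gap < best_gap:
                    best_gap, best_eer = gap, (far + frr) / 2.0
            return best_eer

        # Toy scores: genuine trials score higher on average than impostor trials
        rng = np.random.default_rng(4)
        print(equal_error_rate(rng.normal(2.0, 1.0, 500), rng.normal(0.0, 1.0, 500)))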

  14. Does the speaker's voice quality influence children's performance on a language comprehension test?

    Science.gov (United States)

    Lyberg-Åhlander, Viveka; Haake, Magnus; Brännström, Jonas; Schötz, Susanne; Sahlén, Birgitta

    2015-02-01

    A small number of studies have explored children's perception of speakers' voice quality and its possible influence on language comprehension. The aim of this explorative study was to investigate the relationship between the examiner's voice quality, the child's performance on a digital version of a language comprehension test, the Test for Reception of Grammar (TROG-2), and two measures of cognitive functioning. The participants were (n = 86) mainstreamed 8-year old children with typical language development. Two groups of children (n = 41/45) were presented with the TROG-2 through recordings of one female speaker: one group was presented with a typical voice and the other with a simulated dysphonic voice. Significant associations were found between executive functioning and language comprehension. The results also showed that children listening to the dysphonic voice achieved significantly lower scores for more difficult sentences ("the man but not the horse jumps") and used more self-corrections on simpler sentences ("the girl is sitting"). Findings suggest that a dysphonic speaker's voice may force the child to allocate capacity to the processing of the voice signal at the expense of comprehension. The findings have implications for clinical and research settings where standardized language tests are used.

  15. Using Closed-Set Speaker Identification Score Confidence to Enhance Audio-Based Collaborative Filtering for Multiple Users

    DEFF Research Database (Denmark)

    Shepstone, Sven Ewan; Tan, Zheng-Hua; Kristoffersen, Miklas Strøm

    2018-01-01

    In this paper, we utilize a closed-set speaker-identification approach to convey the ratings needed for collaborative filtering-based recommendation. Instead of explicitly providing a rating for a given program, users use a speech interface to dictate the desired rating after watching a movie. Due...... to the inaccuracies that may be imposed by a state-of-the-art speaker identification system, it is possible to mistake a user for another user in the household, especially when the users exhibit similar or identical age and gender demographics. This leads to the undesirable effect of injecting unwanted ratings...... into the collaborative rating matrix, and when the users have different tastes, can result in the recommendation of undesirable items. We therefore propose a simple confidence-based heuristic that utilizes the log-likelihood scores from the speaker identification front-end. The algorithm limits the degree to which...

  16. A Novel Approach to Speaker Weight Estimation Using a Fusion of the i-vector and NFA Frameworks

    DEFF Research Database (Denmark)

    Poorjam, Amir Hossein; Bahari, Mohamad Hasan; Van hamme, Hogo

    2017-01-01

    -negative Factor Analysis (NFA) framework which is based on a constrained factor analysis on GMM weight supervectors. Then, the available information in both Gaussian means and Gaussian weights is exploited through a feature-level fusion of the i-vectors and the NFA vectors. Finally, a least-squares support vector......This paper proposes a novel approach for automatic speaker weight estimation from spontaneous telephone speech signals. In this method, each utterance is modeled using the i-vector framework which is based on the factor analysis on Gaussian Mixture Model (GMM) mean supervectors, and the Non...... regression is employed to estimate the weight of speakers from the given utterances. The proposed approach is evaluated on spontaneous telephone speech signals of National Institute of Standards and Technology 2008 and 2010 Speaker Recognition Evaluation corpora. To investigate the effectiveness...

  17. MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks

    OpenAIRE

    Ding, Wenhao; He, Liang

    2018-01-01

    In this paper, we propose an enhanced triplet method that improves the encoding process of embeddings by jointly utilizing generative adversarial mechanism and multitasking optimization. We extend our triplet encoder with Generative Adversarial Networks (GANs) and softmax loss function. GAN is introduced for increasing the generality and diversity of samples, while softmax is for reinforcing features about speakers. For simplification, we term our method Multitasking Triplet Generative Advers...
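
    A minimal numerical sketch of the triplet component named in the title (a generic hinge formulation; the GAN and multitask softmax parts of the proposed method are not reproduced here): an anchor embedding should be closer to a same-speaker embedding than to a different-speaker embedding by at least a margin.

        import numpy as np

        def triplet_loss(anchor, positive, negative, margin=1.0):
            """Hinge-style triplet loss on speaker embeddings: the same-speaker pair
            should be closer than the different-speaker pair by at least `margin`."""
            d_pos = np.sum((anchor - positive) ** 2)
            d_neg = np.sum((anchor - negative) ** 2)
            return max(0.0, d_pos - d_neg + margin)

        # Hypothetical 128-dimensional embeddings for two speakers
        rng = np.random.default_rng(6)
        emb = rng.normal(size=(2, 128))
        print(triplet_loss(emb[0], emb[0] + 0.1 * rng.normal(size=128), emb[1]))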

  18. Diversity in the lexical and syntactic abilities of fluent aphasic speakers

    NARCIS (Netherlands)

    Bastiaanse, Y.R.M.; Edwards, S.

    In an earlier study by the authors, it was suggested that some fluent aphasic speakers exhibit subtle grammatical deficits. This paper considers how far lexical access problems might account for these deficits. For this study, spontaneous speech data collected from two groups of aphasic

  19. Procedure for inscription in the list of speakers at meetings of the Board of Governors

    International Nuclear Information System (INIS)

    2002-01-01

    Full text: 1. By Rule 23 (d) of the Provisional Rules of Procedure of the Board of Governors: 'No Governor may address the Board without having previously obtained the permission of the presiding officer. The presiding officer shall call upon speakers in the order in which they signify their desire to speak. The presiding officer may call a speaker to order if his remarks are not relevant to the subject under discussion.' The following procedures are applied generally concerning the implementation of Rule 23. 2. Governors or other members of delegations who wish to speak on an item notify the Secretary of the Board of their intention to speak. A Secretariat staff member will also be present on the podium in the Boardroom each day from 9:30 a.m. until the Board meeting commences, to receive requests from delegations to be added to the list of speakers. After the meeting commences, delegates wishing to speak should raise their flag to be recognized by the Secretary of the Board. The names of delegations are inscribed on a single list of speakers, maintained by the Secretary of the Board, in the order in which they have signified their wish to speak. 3. It may be noted that, in the course of debate on a particular item, delegations may signify an intention to speak more than once, including speaking in response to issues raised by other delegations during the debate. When two or more delegations simultaneously indicate from the floor an intention to speak, their names are inscribed in the order in which they are brought to the attention of the Secretary. 4. The established practice is to first call on delegations wishing to speak on behalf of regional groups on any specific item, followed by individual delegations, in the order in which their names are inscribed on the list of speakers. 5. Member States will appreciate that these guidelines cannot cover all contingencies and that special cases may arise from time to time. The Chairman will exercise flexibility where

  20. Increased vocal intensity due to the Lombard effect in speakers with Parkinson's disease: simultaneous laryngeal and respiratory strategies.

    Science.gov (United States)

    Stathopoulos, Elaine T; Huber, Jessica E; Richardson, Kelly; Kamphaus, Jennifer; DeCicco, Devan; Darling, Meghan; Fulcher, Katrina; Sussman, Joan E

    2014-01-01

    The objective of the present study was to investigate whether speakers with hypophonia, secondary to Parkinson's disease (PD), would increase their vocal intensity when speaking in a noisy environment (Lombard effect). The other objective was to examine the underlying laryngeal and respiratory strategies used to increase vocal intensity. Thirty-three participants with PD were included for study. Each participant was fitted with the SpeechVive™ device that played multi-talker babble noise into one ear during speech. Using acoustic, aerodynamic and respiratory kinematic techniques, the simultaneous laryngeal and respiratory mechanisms used to regulate vocal intensity were examined. Significant group results showed that most speakers with PD (26/33) were successful at increasing their vocal intensity when speaking in the condition of multi-talker babble noise. They were able to support their increased vocal intensity and subglottal pressure with combined strategies from both the laryngeal and respiratory mechanisms. Individual speaker analysis indicated that the particular laryngeal and respiratory interactions differed among speakers. The SpeechVive™ device elicited higher vocal intensities from patients with PD. Speakers used different combinations of laryngeal and respiratory physiologic mechanisms to increase vocal intensity, thus suggesting that the disease process does not uniformly affect the speech subsystems. Readers will be able to: (1) identify speech characteristics of people with Parkinson's disease (PD), (2) identify typical respiratory strategies for increasing sound pressure level (SPL), (3) identify typical laryngeal strategies for increasing SPL, (4) define the Lombard effect. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. Extending Situated Language Comprehension (Accounts with Speaker and Comprehender Characteristics: Toward Socially Situated Interpretation

    Directory of Open Access Journals (Sweden)

    Katja Münster

    2018-01-01

    Full Text Available More and more findings suggest a tight temporal coupling between (non-linguistic) socially interpreted context and language processing. Still, real-time language processing accounts remain largely elusive with respect to the influence of biological (e.g., age) and experiential (e.g., world and moral knowledge) comprehender characteristics and the influence of the ‘socially interpreted’ context, as for instance provided by the speaker. This context could include actions, facial expressions, a speaker’s voice or gaze, and gestures among others. We review findings from social psychology, sociolinguistics and psycholinguistics to highlight the relevance of (the interplay between) the socially interpreted context and comprehender characteristics for language processing. The review informs the extension of an extant real-time processing account (already featuring a coordinated interplay between language comprehension and the non-linguistic visual context) with a variable (‘ProCom’) that captures characteristics of the language user and with a first approximation of the comprehender’s speaker representation. Extending the CIA to the sCIA (social Coordinated Interplay Account) is the first step toward a real-time language comprehension account which might eventually accommodate the socially situated communicative interplay between comprehenders and speakers.

  2. Language matters: thirteen-month-olds understand that the language a speaker uses constrains conventionality.

    Science.gov (United States)

    Scott, Jessica C; Henderson, Annette M E

    2013-11-01

    Object labels are valuable communicative tools because their meanings are shared among the members of a particular linguistic community. The current research was conducted to investigate whether 13-month-old infants appreciate that object labels should not be generalized across individuals who have been shown to speak different languages. Using a visual habituation paradigm, Experiment 1 tested whether infants would generalize a new object label that was taught to them by a speaker of a foreign language to a speaker from the infant's own linguistic group. The results suggest that infants do not expect 2 individuals who have been shown to speak different languages to use the same label to refer to the same object. The results of Experiment 2 reveal that infants do not generalize a new object label that was taught to them by a speaker of their native language to an individual who had been shown to speak a foreign language. These findings offer the first evidence that by the end of the 1st year of life, infants are sensitive to the fact that the conventional nature of language is constrained by the language that a person has been shown to speak.

  3. The role of the phonological loop in English word learning: a comparison of Chinese ESL learners and native speakers.

    Science.gov (United States)

    Hamada, Megumi; Koda, Keiko

    2011-04-01

    Although the role of the phonological loop in word-retention is well documented, research in Chinese character retention suggests the involvement of non-phonological encoding. This study investigated whether the extent to which the phonological loop contributes to learning and remembering visually introduced words varies between college-level Chinese ESL learners (N = 20) and native speakers of English (N = 20). The groups performed a paired associative learning task under two conditions (control versus articulatory suppression) with two word types (regularly spelled versus irregularly spelled words) differing in degree of phonological accessibility. The results demonstrated that both groups' recall declined when the phonological loop was made less available (with irregularly spelled words and in the articulatory suppression condition), but the decline was greater for the native group. These results suggest that word learning entails phonological encoding uniformly across learners, but the contribution of phonology varies among learners with diverse linguistic backgrounds.

  4. Achieving Speaker Gender Equity at the American Society for Microbiology General Meeting.

    Science.gov (United States)

    Casadevall, Arturo

    2015-08-04

    In 2015, the American Society for Microbiology (ASM) General Meeting essentially achieved gender equity, with 48.5% of the oral presentations being given by women. The mechanisms associated with increased female participation were (i) making the Program Committee aware of gender statistics, (ii) increasing female representation among session convener teams, and (iii) direct instruction to try to avoid all-male sessions. The experience with the ASM General Meeting shows that it is possible to increase the participation of female speakers in a relatively short time and suggests concrete steps that may be taken to achieve this at other meetings. Public speaking is very important for academic advancement in science. Historically women have been underrepresented as speakers in many scientific meetings. This article describes concrete steps that were associated with achieving gender equity at a major meeting. Copyright © 2015 Casadevall.

  5. Infants' preferences for native speakers are associated with an expectation of information

    DEFF Research Database (Denmark)

    Begus, Katarina; Gliga, Teodora; Southgate, Victoria

    2016-01-01

    Humans' preference for others who share our group membership is well documented, and this heightened valuation of in-group members seems to be rooted in early development. Before 12 mo of age, infants already show behavioral preferences for others who evidence cues to same-group membership...... such as race or native language, yet the function of this selectivity remains unclear. We examine one of these social biases, the preference for native speakers, and propose that this preference may result from infants' motivation to obtain information and the expectation that interactions with native speakers...... in situations when they can expect to receive information. We then used this neural measure of anticipatory theta activity to explore the expectations of 11-mo-olds when facing social partners who either speak the infants' native language or a foreign tongue (study 2). A larger increase in theta oscillations...

  6. Pharmaceutical speakers' bureaus, academic freedom, and the management of promotional speaking at academic medical centers.

    Science.gov (United States)

    Boumil, Marcia M; Cutrell, Emily S; Lowney, Kathleen E; Berman, Harris A

    2012-01-01

    Pharmaceutical companies routinely engage physicians, particularly those with prestigious academic credentials, to deliver "educational" talks to groups of physicians in the community to help market the company's brand-name drugs. Although presented as educational, and even though they provide educational content, these events are intended to influence decisions about drug selection in ways that are not based on the suitability and effectiveness of the product, but on the prestige and persuasiveness of the speaker. A number of state legislatures and most academic medical centers have attempted to restrict physician participation in pharmaceutical marketing activities, though most restrictions are not absolute and have proven difficult to enforce. This article reviews the literature on why Speakers' Bureaus have become a lightning rod for academic/industry conflicts of interest and examines the arguments of those who defend physician participation. It considers whether the restrictions on Speakers' Bureaus are consistent with principles of academic freedom and concludes with the legal and institutional efforts to manage industry speaking. © 2012 American Society of Law, Medicine & Ethics, Inc.

  7. Production of lexical stress in non-native speakers of American English: kinematic correlates of stress and transfer.

    Science.gov (United States)

    Chakraborty, Rahul; Goffman, Lisa

    2011-06-01

    The aim was to assess the influence of second language (L2) proficiency on production characteristics of rhythmic sequences in the L1 (Bengali) and L2 (English), with emphasis on linguistic transfer. One goal was to examine, using kinematic evidence, how L2 proficiency influences the production of iambic and trochaic words, focusing on temporal and spatial aspects of prosody. A second goal was to assess whether prosodic structure influences judgment of foreign accent. Twenty Bengali-English bilingual individuals, 10 with low proficiency in English and 10 with high proficiency in English, and 10 monolingual English speakers participated. Lip and jaw movements were recorded while the bilingual participants produced Bengali and English words embedded in sentences. Lower lip movement amplitude and duration were measured in trochaic and iambic words. Six native English listeners judged the nativeness of the bilingual speakers. Evidence of L1-L2 transfer was observed through duration but not amplitude cues. More proficient L2 speakers varied duration to mark iambic stress. Perceptually, the high-proficiency group received relatively higher native-like accent ratings. Trochees were judged as more native than iambs. Even in the face of L1-L2 lexical stress transfer, nonnative speakers demonstrated knowledge of prosodic contrasts. Movement duration appears to be more amenable than amplitude to modifications.

  8. Speaker Recognition for Mobile User Authentication: An Android Solution

    OpenAIRE

    Brunet , Kevin; Taam , Karim; Cherrier , Estelle; Faye , Ndiaga; Rosenberger , Christophe

    2013-01-01

    This paper deals with a biometric solution for authentication on mobile devices. Among the possible biometric modalities, speaker recognition seems the most natural choice for a mobile phone. This work lies in the continuation of our previous work (Biosig 2012), where we evaluated a candidate algorithm in terms of performance and processing time. The proposed solution is implemented here as an Android application. Its performances are evaluated both on a public database...

  9. Multistage Data Selection-based Unsupervised Speaker Adaptation for Personalized Speech Emotion Recognition

    NARCIS (Netherlands)

    Kim, Jaebok; Park, Jeong-Sik

    This paper proposes an efficient speech emotion recognition (SER) approach that utilizes personal voice data accumulated on personal devices. A representative weakness of conventional SER systems is the user-dependent performance induced by the speaker independent (SI) acoustic model framework. But,

  10. The speaker's formant.

    Science.gov (United States)

    Bele, Irene Velsvik

    2006-12-01

    The current study concerns speaking voice quality in two groups of professional voice users, teachers (n = 35) and actors (n = 36), representing trained and untrained voices. The voice quality of text reading at two intensity levels was acoustically analyzed. The central concept was the speaker's formant (SPF), related to the perceptual characteristics "better normal voice quality" (BNQ) and "worse normal voice quality" (WNQ). The purpose of the current study was to get closer to the origin of the phenomenon of the SPF, and to discover the differences in spectral and formant characteristics between the two professional groups and the two voice quality groups. The acoustic analyses were long-term average spectrum (LTAS) and spectrographic measurements of formant frequencies. At very high intensities, the spectral slope was rather quadrangular, without a clear SPF peak. The trained voices had a higher energy level in the SPF region compared with the untrained, significantly so in loud phonation. The SPF seemed to be related to both sufficiently strong overtones and a glottal setting allowing for a lowering of F4 and a closeness of F3 and F4. However, the existence of an SPF also in the LTAS of the WNQ voices implies that more research is warranted concerning the formation of the SPF, and concerning the acoustic correlates of the BNQ voices.
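
    A minimal illustration of the long-term average spectrum (LTAS) analysis named above, computed here with Welch's method over a synthetic signal; the sampling rate, the placeholder signal, and the 3-4 kHz band used as a stand-in for the speaker's-formant region are assumptions for the sketch, not values from the study.

```python
import numpy as np
from scipy.signal import welch

fs = 16_000                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
speech = rng.standard_normal(fs * 30)         # placeholder for 30 s of read speech

# LTAS: average power spectral density over the whole reading passage
freqs, psd = welch(speech, fs=fs, nperseg=1024)
ltas_db = 10 * np.log10(psd + 1e-12)

# Relative energy in an assumed speaker's-formant region (~3-4 kHz)
band = (freqs >= 3000) & (freqs <= 4000)
spf_level = ltas_db[band].mean() - ltas_db.mean()
print(f"Energy in 3-4 kHz band relative to overall LTAS: {spf_level:.1f} dB")
```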

  11. Identifying Core Vocabulary for Urdu Language Speakers Using Augmentative Alternative Communication

    Science.gov (United States)

    Mukati, Abdul Samad

    2013-01-01

    The purpose of this research is to identify a core set of vocabulary used by native Urdu language (UL) speakers during dyadic conversation for social interaction and relationship building. This study was conducted in Karachi, Pakistan at an institution of higher education. This research seeks to distinguish between general (nonspecific…

  12. Decoding speech perception by native and non-native speakers using single-trial electrophysiological data.

    Directory of Open Access Journals (Sweden)

    Alex Brandmeyer

    Brain-computer interfaces (BCIs) are systems that use real-time analysis of neuroimaging data to determine the mental state of their user for purposes such as providing neurofeedback. Here, we investigate the feasibility of a BCI based on speech perception. Multivariate pattern classification methods were applied to single-trial EEG data collected during speech perception by native and non-native speakers. Two principal questions were asked: (1) Can differences in the perceived categories of pairs of phonemes be decoded at the single-trial level? (2) Can these same categorical differences be decoded across participants, within or between native-language groups? Results indicated that classification performance progressively increased with respect to the categorical status (within, boundary, or across) of the stimulus contrast, and was also influenced by the native language of individual participants. Classifier performance showed strong relationships with traditional event-related potential measures and behavioral responses. The results of the cross-participant analysis indicated an overall increase in average classifier performance when trained on data from all participants (native and non-native). A second cross-participant classifier trained only on data from native speakers led to an overall improvement in performance for native speakers, but a reduction in performance for non-native speakers. We also found that the native language of a given participant could be decoded on the basis of EEG data with accuracy above 80%. These results indicate that electrophysiological responses underlying speech perception can be decoded at the single-trial level, and that decoding performance systematically reflects graded changes in the responses related to the phonological status of the stimuli. This approach could be used in extensions of the BCI paradigm to support perceptual learning during second language acquisition.
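
    A minimal, hypothetical sketch of single-trial multivariate pattern classification of the kind described above, not the authors' pipeline; the synthetic EEG array, trial counts, and the choice of a linear SVM over z-scored, flattened trials are illustrative assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_trials, n_channels, n_samples = 200, 32, 128
eeg = rng.standard_normal((n_trials, n_channels, n_samples))   # single-trial EEG epochs
labels = rng.integers(0, 2, n_trials)                          # perceived phoneme category per trial

# Flatten each trial into one feature vector (channels x time points)
X = eeg.reshape(n_trials, -1)

clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=5000))
scores = cross_val_score(clf, X, labels, cv=5)                 # cross-validated decoding accuracy
print(f"Mean decoding accuracy: {scores.mean():.2f}")
```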

  13. The beneficial effect of a speaker's gestures on the listener's memory for action phrases: The pivotal role of the listener's premotor cortex.

    Science.gov (United States)

    Ianì, Francesco; Burin, Dalila; Salatino, Adriana; Pia, Lorenzo; Ricci, Raffaella; Bucciarelli, Monica

    2018-04-10

    Memory for action phrases improves in the listeners when the speaker accompanies them with gestures compared to when the speaker stays still. Since behavioral studies revealed a pivotal role of the listeners' motor system, we aimed to disentangle the role of primary motor and premotor cortices. Participants had to recall phrases uttered by a speaker in two conditions: in the gesture condition, the speaker performed gestures congruent with the action; in the no-gesture condition, the speaker stayed still. In Experiment 1, half of the participants underwent inhibitory rTMS over the hand/arm region of the left premotor cortex (PMC) and the other half over the hand/arm region of the left primary motor cortex (M1). The enactment effect disappeared only following rTMS over PMC. In Experiment 2, we detected the usual enactment effect after rTMS over vertex, thereby excluding possible nonspecific rTMS effects. These findings suggest that the information encoded in the premotor cortex is a crucial part of the memory trace. Copyright © 2018 Elsevier Inc. All rights reserved.

  14. The N400 effect during speaker-switch – Towards a conversational approach of measuring neural correlates of language

    Directory of Open Access Journals (Sweden)

    Tatiana Goregliad Fjaellingsdal

    2016-11-01

    Language occurs naturally in conversations. However, the study of the neural underpinnings of language has mainly taken place in single individuals using controlled language material. The interactive elements of a conversation (e.g., turn-taking) are often not part of neurolinguistic setups. The prime reason is the difficulty of combining open, unrestricted conversations with the requirements of neuroimaging. It is necessary to find a trade-off between the naturalness of a conversation and the restrictions imposed by neuroscientific methods to allow for ecologically more valid studies. Here we make an attempt to study the effects of a conversational element, namely turn-taking, on linguistic neural correlates, specifically the N400 effect. We focus on the physiological aspect of turn-taking, the speaker-switch, and its effect on the detectability of the N400 effect. The N400 event-related potential reflects expectation violations in a semantic context; the N400 effect describes the difference in N400 amplitude between semantically expected and unexpected items. Sentences with semantically congruent and incongruent final words were presented in two turn-taking modes: (1) reading aloud the first part of the sentence and listening to a speaker-switch for the final word, and (2) listening to the first part of the sentence and to a speaker-switch for the final word. A significant N400 effect was found for both turn-taking modes, and was not influenced by the mode itself. However, the mode significantly affected the P200, which was increased for the reading-aloud mode compared to the listening mode. Our results show that an N400 effect can be detected during a speaker-switch. Speech articulation (reading aloud) before the analyzed sentence fragment also did not impede detection of the N400 effect for the final word. The speaker-switch, however, seems to influence earlier components of the electroencephalogram, related to the processing of salient stimuli. We conclude that the N400 effect can be studied under conditions that include conversational elements such as speaker-switches.
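
    Since the record defines the N400 effect as an amplitude difference between semantically expected and unexpected items, a minimal sketch of that computation follows; the epoch layout, single electrode, 300-500 ms window, and random data are assumptions, not the study's parameters.

```python
import numpy as np

fs = 250                                  # assumed EEG sampling rate (Hz)
times = np.arange(-0.2, 0.8, 1 / fs)      # epoch from -200 ms to 800 ms around the final word
rng = np.random.default_rng(1)
congruent = rng.standard_normal((40, times.size))     # 40 trials x time, one electrode (placeholder)
incongruent = rng.standard_normal((40, times.size))

window = (times >= 0.3) & (times <= 0.5)  # typical N400 analysis window
n400_effect = incongruent[:, window].mean() - congruent[:, window].mean()
print(f"N400 effect (incongruent - congruent mean amplitude): {n400_effect:.2f} µV")
```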

  15. Does Grammatical Gender Influence Perception? A Study of Polish and French Speakers

    Directory of Open Access Journals (Sweden)

    Haertlé Izabella

    2017-12-01

    Full Text Available Can the perception of a word be influenced by its grammatical gender? Can it happen that speakers of one language perceive an object to have masculine features, while speakers of another language perceive the same object to have feminine features? Previous studies suggest that this is the case, and also that there is some supra-language gender categorisation of objects as natural/feminine and artefact/masculine. This study was an attempt to replicate these findings on another population of subjects. This is the first Polish study of this kind, comparing the perceptions of objects by Polish- and French-speaking individuals. The results of this study show that grammatical gender may cue people to assess objects as masculine or feminine. However, the findings of some previous studies, that feminine features are more often ascribed to natural objects than artifacts, were not replicated.

  16. Non-Native English Speakers and Nonstandard English: An In-Depth Investigation

    Science.gov (United States)

    Polat, Brittany

    2012-01-01

    Given the rising prominence of nonstandard varieties of English around the world (Jenkins 2007), learners of English as a second language are increasingly called on to communicate with speakers of both native and non-native nonstandard English varieties. In many classrooms around the world, however, learners continue to be exposed only to…

  17. Speaker's presentations. Energy supply security

    International Nuclear Information System (INIS)

    Pierret, Ch.

    2000-01-01

    This document is a collection of most of the papers used by the speakers of the European Seminar on Energy Supply Security organised in Paris (at the French Ministry of Economy, Finance and Industry) on 24 November 2000 by the General Direction of Energy and Raw Materials, in co-operation with the European Commission and the French Planning Office. About 250 attendees were present, including many high-level civil servants from the 15 European member states, and their questions allowed for a rich debate. It took place five days before the publication, on 29 November 2000, by the European Commission, of the Green Paper 'Towards a European Strategy for the Security of Energy Supply'. This French initiative, which took place within the framework of the French Presidency of the European Union during the second half of 2000, will bring a first impetus to the brainstorming launched by the Commission. (author)

  18. Adding More Fuel to the Fire: An Eye-Tracking Study of Idiom Processing by Native and Non-Native Speakers

    Science.gov (United States)

    Siyanova-Chanturia, Anna; Conklin, Kathy; Schmitt, Norbert

    2011-01-01

    Using eye-tracking, we investigate on-line processing of idioms in a biasing story context by native and non-native speakers of English. The stimuli are idioms used figuratively ("at the end of the day"--"eventually"), literally ("at the end of the day"--"in the evening"), and novel phrases ("at the end of the war"). Native speaker results…

  19. Priming can affect naming colours using the study-test procedure. Revealing the role of task conflict.

    Science.gov (United States)

    Sharma, Dinkar

    2016-11-14

    The Stroop paradigm has been widely used to study attention, whilst results from its use to explore implicit memory have been mixed. Using the non-colour word Stroop task, we tested contrasting predictions from the proactive-control/task-conflict model (Kalanthroff, Avnit, Henik, Davelaar & Usher, 2015) that implicate response conflict and task conflict in priming effects. Using the study-test procedure, 60 native English speakers were tested to determine whether priming effects from words that had previously been studied would cause interference when the words were presented in a colour naming task. The results replicate a finding by MacLeod (1996), who showed no differences between the response latencies to studied and unstudied words. However, this pattern held predominantly in the first half of the study, where it was also found that both studied and unstudied words in a mixed block were responded to more slowly than a block of pure unstudied words. The second half of the study showed stronger priming interference effects as well as a sequential modulation effect in which studied words slowed down the responses to studied words on the next trial. We discuss the role of proactive and reactive control processes and conclude that task conflict best explains the pattern of priming effects reported. Copyright © 2016. Published by Elsevier B.V.

  20. Correlation between low-proficiency in English and negative perceptions of what it means to be an English speaker

    Directory of Open Access Journals (Sweden)

    Kavarljit Kaur Gill

    2013-01-01

    Learning another language is very much affected by positive or negative connotations attached to the new language by the language learner. Many students entering Malaysian public universities have a low proficiency in English, despite having spent eleven years studying English in school. Could it be that the lack of progress among these students can be attributed to a negative view of what it means to be a speaker of English? This study investigated the perceptions of students at a public university, to determine whether there is a correlation between low proficiency and negative perceptions of what it means to be an English speaker. Analysis of the results showed that Malaysian students have a very positive perception of what it means to be an English speaker.

  1. A Study of the Effect of Emotional State upon the Variation of the Fundamental Frequency of a Speaker

    Directory of Open Access Journals (Sweden)

    Marius Vasile GHIURCAU

    2010-01-01

    Telephone banking or brokering, building access systems or forensics are some of the areas in which speaker recognition is continuously developing. Fundamental frequency represents an important speech feature used in these applications. In this paper we present a study of the effect of the emotional state of a speaker upon the variation of the fundamental frequency of the speech signal. Human beings are quite frequently overwhelmed by various emotions and most of the time one cannot really control these emotional states. For the purpose of our work we have used the Berlin emotional speech database, which contains utterances of 10 speakers in different emotional situations: happy, angry, fearful, bored and neutral. The mean fundamental frequency and also the standard deviation for every speaker in all the emotional states were computed. The results show a very strong influence of the emotional state upon frequency variation.
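
    A minimal, illustrative sketch (not the paper's implementation) of estimating a speaker's mean fundamental frequency and its standard deviation with a frame-based autocorrelation pitch tracker; the frame length, F0 search range, voicing threshold, and synthetic input are assumptions.

```python
import numpy as np

def frame_f0(frame, fs, fmin=75.0, fmax=400.0):
    """Return an F0 estimate (Hz) for one frame, or None if no clear periodic peak."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[frame.size - 1:]   # autocorrelation, lags 0..N-1
    lo, hi = int(fs / fmax), int(fs / fmin)                         # lag range for the F0 search band
    if hi >= ac.size or ac[0] <= 0:
        return None
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag if ac[lag] > 0.3 * ac[0] else None              # crude voicing check

fs = 16_000
t = np.arange(0, 2.0, 1 / fs)
signal = np.sin(2 * np.pi * 180 * t)        # stand-in for one utterance (true F0 = 180 Hz)

frame_len, hop = int(0.04 * fs), int(0.01 * fs)
f0s = [frame_f0(signal[i:i + frame_len], fs)
       for i in range(0, signal.size - frame_len, hop)]
f0s = np.array([f for f in f0s if f is not None])
print(f"Mean F0: {f0s.mean():.1f} Hz, SD: {f0s.std():.1f} Hz")
```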

  2. An investigation of the use of co-verbal gestures in oral discourse among Chinese speakers with fluent versus non-fluent aphasia and healthy adults

    Directory of Open Access Journals (Sweden)

    Anthony Pak Hin Kong

    2015-04-01

    Introduction Co-verbal gestures can facilitate word production among persons with aphasia (PWA) (Rose, Douglas, & Matyas, 2002) and play a communicative role for PWA to convey ideas (Sekine & Rose, 2013). Kong, Law, Kwan, Lai, and Lam (2015) recently reported a systematic approach to independently analyze gesture forms and functions in spontaneous oral discourse produced. When this annotation framework was used to compare speech-accompanying gestures used by PWA and unimpaired speakers, Kong, Law, Wat, and Lai (2013) found a significantly higher gesture-to-word ratio among PWAs. Speakers who were more severe in aphasia or produced a lower percentage of complete sentences or simple sentences in their narratives tended to use more gestures. Moreover, verbal-semantic processing impairment, but not the degree of hemiplegia, was found to affect PWAs’ employment of gestures. The current study aims to (1) investigate whether the frequency of gestural employment varied across speakers with non-fluent aphasia, fluent aphasia, and their controls, (2) examine how the distribution of gesture forms and functions differed across the three speaker groups, and (3) determine how well factors of complexity of linguistic output, aphasia severity, semantic processing integrity, and hemiplegia would predict the frequency of gesture use among PWAs. Method The participants included 23 Cantonese-speaking individuals with fluent aphasia, 21 with non-fluent aphasia, and 23 age- and education-matched controls. Three sets of language samples and video files were collected through the narrative tasks of recounting a personally important event, sequential description, and story-telling, using the Cantonese AphasiaBank protocol (Kong, Law, & Lee, 2009). While the language samples were linguistically quantified to reflect word- and sentential-level performance as well as discourse-level characteristics, the videos were annotated on the form and function of each gesture. All PWAs were

  3. Tormenta Espacial: Engaging Spanish Speakers in the Planetarium and K-12 Classroom

    Science.gov (United States)

    Salas, F.; Duncan, D.; Traub-Metlay, S.

    2008-06-01

    Reaching out to Spanish speakers is increasingly vital to workforce development and public support of space science projects. Building on a successful partnership with NASA's TIMED mission, LASP and Space Science Institute, Fiske Planetarium has translated its original planetarium show - "Space Storm" - into "Tormenta Espacial."

  4. Muchas Caras: Engaging Spanish Speakers in the Planetarium and K--12 Classroom

    Science.gov (United States)

    Traub-Metlay, S.; Salas, F.; Duncan, D.

    2008-11-01

    Reaching out to Spanish speakers is increasingly vital to workforce development and public support of space science projects. Fiske Planetarium offers Spanish translations of our newest planetarium shows, such as "Las Personas del Telescopio Hubble" ("The Many Faces of Hubble") and "Tormenta Espacial" ("Space Storm").

  5. Affective processing in bilingual speakers: disembodied cognition?

    Science.gov (United States)

    Pavlenko, Aneta

    2012-01-01

    A recent study by Keysar, Hayakawa, and An (2012) suggests that "thinking in a foreign language" may reduce decision biases because a foreign language provides a greater emotional distance than a native tongue. The possibility of such "disembodied" cognition is of great interest for theories of affect and cognition and for many other areas of psychological theory and practice, from clinical and forensic psychology to marketing, but first this claim needs to be properly evaluated. The purpose of this review is to examine the findings of clinical, introspective, cognitive, psychophysiological, and neuroimaging studies of affective processing in bilingual speakers in order to identify converging patterns of results, to evaluate the claim about "disembodied cognition," and to outline directions for future inquiry. The findings to date reveal two interrelated processing effects. First-language (L1) advantage refers to increased automaticity of affective processing in the L1 and heightened electrodermal reactivity to L1 emotion-laden words. Second-language (L2) advantage refers to decreased automaticity of affective processing in the L2, which reduces interference effects and lowers electrodermal reactivity to negative emotional stimuli. The differences in L1 and L2 affective processing suggest that in some bilingual speakers, in particular late bilinguals and foreign language users, respective languages may be differentially embodied, with the later learned language processed semantically but not affectively. This difference accounts for the reduction of framing biases in L2 processing in the study by Keysar et al. (2012). The follow-up discussion identifies the limits of the findings to date in terms of participant populations, levels of processing, and types of stimuli, puts forth alternative explanations of the documented effects, and articulates predictions to be tested in future research.

  6. Gender and Number Agreement in the Oral Production of Arabic Heritage Speakers

    Science.gov (United States)

    Albirini, Abdulkafi; Benmamoun, Elabbas; Chakrani, Brahim

    2013-01-01

    Heritage language acquisition has been characterized by various asymmetries, including the differential acquisition rates of various linguistic areas and the unbalanced acquisition of different categories within a single area. This paper examines Arabic heritage speakers' knowledge of subject-verb agreement versus noun-adjective agreement with the…

  7. Evaluation of speech errors in Putonghua speakers with cleft palate: a critical review of methodology issues.

    Science.gov (United States)

    Jiang, Chenghui; Whitehill, Tara L

    2014-04-01

    Speech errors associated with cleft palate are well established for English and several other Indo-European languages. Few articles describing the speech of Putonghua (standard Mandarin Chinese) speakers with cleft palate have been published in English language journals. Although methodological guidelines have been published for the perceptual speech evaluation of individuals with cleft palate, there has been no critical review of methodological issues in studies of Putonghua speakers with cleft palate. A literature search was conducted to identify relevant studies published over the past 30 years in Chinese language journals. Only studies incorporating perceptual analysis of speech were included. Thirty-seven articles which met inclusion criteria were analyzed and coded on a number of methodological variables. Reliability was established by having all variables recoded for all studies. This critical review identified many methodological issues. These design flaws make it difficult to draw reliable conclusions about characteristic speech errors in this group of speakers. Specific recommendations are made to improve the reliability and validity of future studies, as well to facilitate cross-center comparisons.

  8. Clear Speech - Mere Speech? How segmental and prosodic speech reduction shape the impression that speakers create on listeners

    DEFF Research Database (Denmark)

    Niebuhr, Oliver

    2017-01-01

    of reduction levels and perceived speaker attributes in which moderate reduction can make a better impression on listeners than no reduction. In addition to its relevance in reduction models and theories, this interplay is instructive for various fields of speech application from social robotics to charisma...... whether variation in the degree of reduction also has a systematic effect on the attributes we ascribe to the speaker who produces the speech signal. A perception experiment was carried out for German in which 46 listeners judged whether or not speakers showing 3 different combinations of segmental...... and prosodic reduction levels (unreduced, moderately reduced, strongly reduced) are appropriately described by 13 physical, social, and cognitive attributes. The experiment shows that clear speech is not mere speech, and less clear speech is not just reduced either. Rather, results revealed a complex interplay...

  9. "Necesita una vacuna": what Spanish-speakers want in text-message immunization reminders.

    Science.gov (United States)

    Ahlers-Schmidt, Carolyn R; Chesser, Amy; Brannon, Jennifer; Lopez, Venessa; Shah-Haque, Sapna; Williams, Katherine; Hart, Traci

    2013-08-01

    Appointment reminders help parents deal with complex immunization schedules. Preferred content of text-message reminders has been identified for English-speakers. Spanish-speaking parents of children under three years old were recruited to develop Spanish text-message immunization reminders. Structured interviews included questions about demographic characteristics, use of technology, and willingness to receive text reminders. Each participant was assigned to one user-centered design (UCD) test: card sort, needs analysis or comprehension testing. Respondents (N=54) were female (70%) and averaged 27 years of age (SD=7). A card sort of 20 immunization-related statements resulted in identification of seven pieces of critical information, which were compiled into eight example texts. These texts were ranked in the needs assessment and the top two were assessed for comprehension. All participants were able to understand the content and describe intention to act. Utilizing UCD testing, Spanish-speakers identified short, specific text content that differed from preferred content of English-speaking parents.

  10. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech.

    Science.gov (United States)

    Shahin, Antoine J; Shen, Stanley; Kerlin, Jess R

    2017-01-01

    We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that varied only in the temporal order of sensory modality, visual speech leading (exp1) or lagging (exp2) acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise-vocoded with 4, 8, 16, and 32 channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync. Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels), and when visual speech was intact rather than blurred (exp1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration, promoting the fusion of temporally distant AV percepts.

  11. Language contact phenomena in the language use of speakers of German descent and the significance of their language attitudes

    Directory of Open Access Journals (Sweden)

    Ries, Veronika

    2014-03-01

    Within the scope of my investigation on language use and language attitudes of People of German Descent from the USSR, I regularly find various language contact phenomena, such as viel bliny habn=wir gbackt (engl.: 'we cooked lots of pancakes') (cf. Ries 2011). The aim of the analysis is to examine both language use with regard to different forms of language contact and the language attitudes of the observed speakers. To be able to analyse both of these aspects and synthesize them, different types of data are required. The research is based on the following two data types: everyday conversations and interviews. In addition, the individual speakers' biography is a key part of the analysis, because it allows one to draw conclusions about language attitudes and use. This qualitative research is based on morpho-syntactic and interactional linguistic analysis of authentic spoken data. The data arise from a corpus compiled and edited by myself. My being a member of the examined group allowed me to build up an authentic corpus. The natural language use is analysed from the perspective of different language contact phenomena and potential functions of language alternations. One central issue is: How do speakers use the languages available to them, German and Russian? Structural characteristics such as code switching and discursive motives for these phenomena are discussed as results, together with the socio-cultural background of the individual speaker. Within the scope of this article I present, by way of example, the data and results of one speaker.

  12. Effect of an 8-week practice of externally triggered speech on basal ganglia activity of stuttering and fluent speakers.

    Science.gov (United States)

    Toyomura, Akira; Fujii, Tetsunoshin; Kuriki, Shinya

    2015-04-01

    The neural mechanisms underlying stuttering are not well understood. It is known that stuttering appears when persons who stutter speak in a self-paced manner, but speech fluency is temporarily increased when they speak in unison with external trigger such as a metronome. This phenomenon is very similar to the behavioral improvement by external pacing in patients with Parkinson's disease. Recent imaging studies have also suggested that the basal ganglia are involved in the etiology of stuttering. In addition, previous studies have shown that the basal ganglia are involved in self-paced movement. Then, the present study focused on the basal ganglia and explored whether long-term speech-practice using external triggers can induce modification of the basal ganglia activity of stuttering speakers. Our study of functional magnetic resonance imaging revealed that stuttering speakers possessed significantly lower activity in the basal ganglia than fluent speakers before practice, especially when their speech was self-paced. After an 8-week speech practice of externally triggered speech using a metronome, the significant difference in activity between the two groups disappeared. The cerebellar vermis of stuttering speakers showed significantly decreased activity during the self-paced speech in the second compared to the first experiment. The speech fluency and naturalness of the stuttering speakers were also improved. These results suggest that stuttering is associated with defective motor control during self-paced speech, and that the basal ganglia and the cerebellum are involved in an improvement of speech fluency of stuttering by the use of external trigger. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Intonation Contrast in Cantonese Speakers with Hypokinetic Dysarthria Associated with Parkinson's Disease

    Science.gov (United States)

    Ma, Joan K.-Y.; Whitehill, Tara L.; So, Susanne Y.-S.

    2010-01-01

    Purpose: Speech produced by individuals with hypokinetic dysarthria associated with Parkinson's disease (PD) is characterized by a number of features including impaired speech prosody. The purpose of this study was to investigate intonation contrasts produced by this group of speakers. Method: Speech materials with a question-statement contrast…

  14. Acoustic cues to perception of word stress by English, Mandarin, and Russian speakers.

    Science.gov (United States)

    Chrabaszcz, Anna; Winn, Matthew; Lin, Candise Y; Idsardi, William J

    2014-08-01

    This study investigated how listeners' native language affects their weighting of acoustic cues (such as vowel quality, pitch, duration, and intensity) in the perception of contrastive word stress. Native speakers (N = 45) of typologically diverse languages (English, Russian, and Mandarin) performed a stress identification task on nonce disyllabic words with fully crossed combinations of each of the 4 cues in both syllables. The results revealed that although the vowel quality cue was the strongest cue for all groups of listeners, pitch was the second strongest cue for the English and the Mandarin listeners but was virtually disregarded by the Russian listeners. Duration and intensity cues were used by the Russian listeners to a significantly greater extent compared with the English and Mandarin participants. Compared with when cues were noncontrastive across syllables, cues were stronger when they were in the iambic contour than when they were in the trochaic contour. Although both English and Russian are stress languages and Mandarin is a tonal language, stress perception performance of the Mandarin listeners but not of the Russian listeners is more similar to that of the native English listeners, both in terms of weighting of the acoustic cues and the cues' relative strength in different word positions. The findings suggest that tuning of second-language prosodic perceptions is not entirely predictable by prosodic similarities across languages.

  15. Self-Disclosure in Initial Interactions amongst Speakers of American and Australian English

    Science.gov (United States)

    Haugh, Michael; Carbaugh, Donal

    2015-01-01

    Getting acquainted with others is one of the most basic interpersonal communication events. Yet there has only been a limited number of studies that have examined variation in the interactional practices through which unacquainted persons become acquainted and establish relationships across speakers of the same language. The current study focuses…

  16. The Halo surrounding native English speaker teachers in Indonesia

    Directory of Open Access Journals (Sweden)

    Angga Kramadibrata

    2016-01-01

    The Native Speaker Fallacy, a commonly held belief that Native English Speaker Teachers (NESTs) are inherently better than Non-NESTs, has long been questioned by ELT researchers. However, this belief still stands strong in the general public. This research looks to understand how much a teacher’s nativeness affects a student’s attitude towards them, as well as the underlying reasons for their attitudes. Sixty-seven respondents in two groups were asked to watch an animated teaching video, after which they completed a questionnaire that used Likert scales to assess comprehensibility, clarity of explanation, engagement, and preference. The videos for both groups were identical apart from the narrator; one spoke in British English, while the other spoke in Indian English. In addition, they were also visually identified as Caucasian and Asian, respectively. The video was controlled for speed of delivery. The quantitative data were then triangulated using qualitative data collected through open questions in the questionnaire as well as from a semi-structured interview conducted with 10 respondents. The data show that there is a significant implicit preference for NEST teachers in the video, as well as in respondents’ actual classes. However, when asked explicitly, respondents didn’t rank nativeness as a very important quality in English teachers. This discrepancy between implicit and explicit attitudes might be due to a subconscious cognitive bias, namely the Halo Effect, in which humans tend to make unjustified presumptions about a person based on known but irrelevant information.

  17. Beyond the initial 140 ms, lexical decision and reading aloud are different tasks: An ERP study with topographic analysis.

    Science.gov (United States)

    Mahé, Gwendoline; Zesiger, Pascal; Laganaro, Marina

    2015-11-15

    Most of our knowledge on the time-course of the mechanisms involved in reading derived from electrophysiological studies is based on lexical decision tasks. By contrast, very few ERP studies investigated the processes involved in reading aloud. It has been suggested that the lexical decision task provides a good index of the processes occurring during reading aloud, with only late processing differences related to task response modalities. However, some behavioral studies reported different sensitivity to psycholinguistic factors between the two tasks, suggesting that print processing could differ at earlier processing stages. The aim of the present study was thus to carry out an ERP comparison between lexical decision and reading aloud in order to determine when print processing differs between these two tasks. Twenty native French speakers performed a lexical decision task and a reading aloud task with the same written stimuli. Results revealed different electrophysiological patterns on both waveform amplitudes and global topography between lexical decision and reading aloud from about 140 ms after stimulus presentation for both words and pseudowords, i.e., as early as the N170 component. These results suggest that only very early, low-level visual processes are common to the two tasks which differ in core processes. Taken together, our main finding questions the use of the lexical decision task as an appropriate paradigm to investigate reading processes and warns against generalizing its results to word reading. Copyright © 2015 Elsevier Inc. All rights reserved.

  18. Spanish Is Foreign: Heritage Speakers' Interpretations of the Introductory Spanish Language Curriculum

    Science.gov (United States)

    DeFeo, Dayna Jean

    2015-01-01

    This article presents a case study of the perceptions of Spanish heritage speakers enrolled in introductory-level Spanish foreign language courses. Despite their own identities that were linked to the United States and Spanish of the Borderlands, the participants felt that the curriculum acknowledged the Spanish of Spain and foreign countries but…

  19. Lexical and grammatical development in trilingual speakers of isiXhosa, English and Afrikaans

    Directory of Open Access Journals (Sweden)

    Anneke P. Potgieter

    2016-05-01

    Background: There is a dearth of normative data on linguistic development among child speakers of Southern African languages, especially in the case of the multilingual children who constitute the largest part of this population. This inevitably impacts on the accuracy of developmental assessments of such speakers. Already negative lay opinion on the effect of early multilingualism on language development rates could be exacerbated by the lack of developmental data, ultimately affecting choices regarding home and school language policies. Objectives: To establish whether trilinguals necessarily exhibit developmental delay when compared to monolinguals and, if so, whether this delay (1) occurs in terms of both lexical and grammatical development; and (2) in all three the trilinguals’ languages, regardless of input quantity. Method: Focusing on isiXhosa, South African English and Afrikaans, the study involved a comparison of 11 four-year-old developing trilinguals’ acquisition of vocabulary and passive constructions with that of 10 age-matched monolingual speakers of each language. Results: The trilinguals proved to be monolingual-like in their lexical development in the language to which, on average, they had been exposed most over time, that is, isiXhosa. No developmental delay was found in the trilinguals’ acquisition of passive constructions, regardless of the language of testing. Conclusion: As previously found for bilingual development, necessarily reduced quantity of exposure does not hinder lexical development in the trilinguals’ input dominant language. The overall lack of delay in their acquisition of the passive is interpreted as possible evidence of cross-linguistic bootstrapping and support for early multilingual exposure.

  20. Lexical and grammatical development in trilingual speakers of isiXhosa, English and Afrikaans

    Science.gov (United States)

    2016-01-01

    Background There is a dearth of normative data on linguistic development among child speakers of Southern African languages, especially in the case of the multilingual children who constitute the largest part of this population. This inevitably impacts on the accuracy of developmental assessments of such speakers. Already negative lay opinion on the effect of early multilingualism on language development rates could be exacerbated by the lack of developmental data, ultimately affecting choices regarding home and school language policies. Objectives To establish whether trilinguals necessarily exhibit developmental delay when compared to monolinguals and, if so, whether this delay (1) occurs in terms of both lexical and grammatical development; and (2) in all three the trilinguals’ languages, regardless of input quantity. Method Focusing on isiXhosa, South African English and Afrikaans, the study involved a comparison of 11 four-year-old developing trilinguals’ acquisition of vocabulary and passive constructions with that of 10 age-matched monolingual speakers of each language. Results The trilinguals proved to be monolingual-like in their lexical development in the language to which, on average, they had been exposed most over time, that is, isiXhosa. No developmental delay was found in the trilinguals’ acquisition of passive constructions, regardless of the language of testing. Conclusion As previously found for bilingual development, necessarily reduced quantity of exposure does not hinder lexical development in the trilinguals’ input dominant language. The overall lack of delay in their acquisition of the passive is interpreted as possible evidence of cross-linguistic bootstrapping and support for early multilingual exposure. PMID:27245133

  1. Lexical and grammatical development in trilingual speakers of isiXhosa, English and Afrikaans.

    Science.gov (United States)

    Potgieter, Anneke P

    2016-05-20

    There is a dearth of normative data on linguistic development among child speakers of Southern African languages, especially in the case of the multilingual children who constitute the largest part of this population. This inevitably impacts on the accuracy of developmental assessments of such speakers. Already negative lay opinion on the effect of early multilingualism on language development rates could be exacerbated by the lack of developmental data, ultimately affecting choices regarding home and school language policies. To establish whether trilinguals necessarily exhibit developmental delay when compared to monolinguals and, if so, whether this delay (1) occurs in terms of both lexical and grammatical development; and (2) in all three the trilinguals' languages, regardless of input quantity. Focusing on isiXhosa, South African English and Afrikaans, the study involved a comparison of 11 four-year-old developing trilinguals' acquisition of vocabulary and passive constructions with that of 10 age-matched monolingual speakers of each language. The trilinguals proved to be monolingual-like in their lexical development in the language to which, on average, they had been exposed most over time, that is, isiXhosa. No developmental delay was found in the trilinguals' acquisition of passive constructions, regardless of the language of testing. As previously found for bilingual development, necessarily reduced quantity of exposure does not hinder lexical development in the trilinguals' input dominant language. The overall lack of delay in their acquisition of the passive is interpreted as possible evidence of cross-linguistic bootstrapping and support for early multilingual exposure.

  2. Characterizing opto-electret based paper speakers by using a real-time projection Moiré metrology system

    Science.gov (United States)

    Chang, Ya-Ling; Hsu, Kuan-Yu; Lee, Chih-Kung

    2016-03-01

    Advancement of distributed piezo-electret sensors and actuators facilitates the development of various smart systems, including paper speakers, opto-piezo/electret bio-chips, etc. The array-based loudspeaker system possesses several advantages over conventional coil speakers, such as light weight, flexibility, low power consumption, and directivity. With the understanding that the performance of large-area piezo-electret loudspeakers, or even microfluidic biochip transport behavior, could be tailored by changing their dynamic behaviors, a full-field, real-time, high-resolution, non-contact metrology system was developed. In this paper, the influence of the resonance modes and transient vibrations of an array-based loudspeaker system on the acoustic output was measured using a real-time projection moiré metrology system and microphones. To make the paper speaker even more versatile, we incorporated the photosensitive material TiOPc into the original electret loudspeaker. The vibration of this newly developed opto-electret loudspeaker could be manipulated by illuminating it with different light-intensity patterns. To facilitate the tailoring process of the opto-electret loudspeaker, projection moiré was adopted to measure its vibration. By recording the projected fringes, which are modulated by the contours of the test sample, the phase-unwrapping algorithm yields a continuous phase distribution that is proportional to the object's height variations. With the aid of the projection moiré metrology system, the vibrations associated with each distinctive light pattern could be characterized. Therefore, we expect that the overall acoustic performance could be improved by finding suitable illuminating patterns. In this manuscript, the performance of the projection moiré system and the opto-electret paper speakers was cross-examined and verified by the experimental results obtained.
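
    A minimal sketch of the phase-unwrapping step mentioned in the record: a wrapped fringe phase is unwrapped into a continuous profile taken to be proportional to surface height. The fringe sensitivity, deflection profile, and scan geometry below are assumptions for the sketch, not values from the paper.

```python
import numpy as np

x = np.linspace(0.0, 0.05, 500)                      # 5 cm scan line across the sample
height = 50e-6 * np.sin(2 * np.pi * x / 0.05)        # assumed 50 µm deflection profile

sensitivity = 2 * np.pi / 20e-6                      # assumed 2*pi rad of phase per 20 µm of height
true_phase = sensitivity * height
wrapped = np.angle(np.exp(1j * true_phase))          # wrapped phase, as delivered by fringe analysis
unwrapped = np.unwrap(wrapped)                       # continuous phase profile
recovered_height = unwrapped / sensitivity           # height proportional to unwrapped phase
print(f"Max reconstruction error: {np.abs(recovered_height - height).max():.2e} m")
```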

  3. Measures of speech rhythm and the role of corpus-based word frequency: a multifactorial comparison of Spanish(-English speakers

    Directory of Open Access Journals (Sweden)

    Michael J. Harris

    2011-12-01

    In this study, we address various measures that have been employed to distinguish between syllable- and stress-timed languages. This study differs from all previous ones by (i) exploring and comparing multiple metrics within a quantitative and multifactorial perspective and by (ii) also documenting the impact of corpus-based word frequency. We begin with the basic distinctions of speech rhythms, dealing with the differences between syllable-timed languages and stress-timed languages and several methods that have been used to attempt to distinguish between the two. We then describe how these metrics were used in the current study comparing the speech rhythms of Mexican Spanish speakers and bilingual English/Spanish speakers (speakers born to Mexican parents in California). More specifically, we evaluate how well various metrics of vowel duration variability, as well as the so far understudied factor of corpus-based frequency, allow one to classify speakers as monolingual or bilingual. A binary logistic regression identifies several main effects and interactions. Most importantly, our results call the utility of a particular rhythm metric, the PVI, into question and indicate that corpus data in the form of lemma frequencies interact with two metrics of durational variability, suggesting that durational variability metrics should ideally be studied in conjunction with corpus-based frequency data.
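
    For reference, the PVI discussed above is a durational-variability rhythm metric; a minimal sketch of its normalised form (nPVI) over successive vowel durations follows, with made-up duration values rather than data from the study.

```python
import numpy as np

def npvi(durations):
    """Normalised Pairwise Variability Index over successive interval durations (e.g., vowel durations in ms)."""
    d = np.asarray(durations, dtype=float)
    pairs = np.abs(np.diff(d)) / ((d[:-1] + d[1:]) / 2)   # normalised difference of each adjacent pair
    return 100 * pairs.mean()

even_durations = [80, 85, 78, 82, 88, 81]        # fairly even vowels (syllable-timed-like pattern)
alternating_durations = [60, 140, 55, 150, 70, 160]  # long/short alternation (stress-timed-like pattern)
print(f"nPVI (even durations):        {npvi(even_durations):.1f}")
print(f"nPVI (alternating durations): {npvi(alternating_durations):.1f}")
```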

  4. Topic Continuity in Informal Conversations between Native and Non-Native Speakers of English

    Science.gov (United States)

    Morris-Adams, Muna

    2013-01-01

    Topic management by non-native speakers (NNSs) during informal conversations has received comparatively little attention from researchers, and receives surprisingly little attention in second language learning and teaching. This article reports on one of the topic management strategies employed by international students during informal, social…

  5. The effect of L1 prosodic backgrounds of Cantonese and Japanese speakers on the perception of Mandarin tones after training

    Science.gov (United States)

    So, Connie K.

    2005-04-01

    The present study investigated to what extent one's L1 prosodic background affects the learning of a new tonal system. The question as to whether native speakers of a tone language perform differently from those of a pitch accent language will be addressed. Twenty native speakers of Hong Kong Cantonese (a tone language) and Japanese (a pitch accent language) were assigned to two groups. All of them had no prior knowledge of Mandarin, and had never received any form of musical training before they participated in the study. Their performance in the identification of Mandarin tones before and after a short-term training was compared. Analysis of listeners' tonal confusions in the pretest, posttest, and generalization tests revealed that both Cantonese and Japanese listeners had more confusion for two contrastive tone pairs: Tone 1-Tone 4 and Tone 2-Tone 3. Moreover, Cantonese speakers consistently had greater difficulty than Japanese speakers in distinguishing the tones in each pair. This implies that listeners' L1 prosodic backgrounds are at work during the process of learning a new tonal system. The findings will be further discussed in terms of the Perceptual Assimilation Model (Best, 1995). [Work supported by SSHRC.]

  6. Shielding voices: The modulation of binding processes between voice features and response features by task representations.

    Science.gov (United States)

    Bogon, Johanna; Eisenbarth, Hedwig; Landgraf, Steffen; Dreisbach, Gesine

    2017-09-01

    Vocal events offer not only semantic-linguistic content but also information about the identity and the emotional-motivational state of the speaker. Furthermore, most vocal events have implications for our actions and therefore include action-related features. But the relevance and irrelevance of vocal features varies from task to task. The present study investigates binding processes for perceptual and action-related features of spoken words and their modulation by the task representation of the listener. Participants reacted with two response keys to eight different words spoken by a male or a female voice (Experiment 1) or spoken by an angry or neutral male voice (Experiment 2). There were two instruction conditions: half of participants learned eight stimulus-response mappings by rote (SR), and half of participants applied a binary task rule (TR). In both experiments, SR instructed participants showed clear evidence for binding processes between voice and response features indicated by an interaction between the irrelevant voice feature and the response. By contrast, as indicated by a three-way interaction with instruction, no such binding was found in the TR instructed group. These results are suggestive of binding and shielding as two adaptive mechanisms that ensure successful communication and action in a dynamic social environment.

  7. The Human Communication Research Centre dialogue database.

    Science.gov (United States)

    Anderson, A H; Garrod, S C; Clark, A; Boyle, E; Mullin, J

    1992-10-01

    The HCRC dialogue database consists of over 700 transcribed and coded dialogues from pairs of speakers aged from seven to fourteen. The speakers are recorded while tackling co-operative problem-solving tasks and the same pairs of speakers are recorded over two years tackling 10 different versions of our two tasks. In addition there are over 200 dialogues recorded between pairs of undergraduate speakers engaged on versions of the same tasks. Access to the database, and to its accompanying custom-built search software, is available electronically over the JANET system by contacting liz@psy.glasgow.ac.uk, from whom further information about the database and a user's guide to the database can be obtained.

  8. Selective social learning in infancy: looking for mechanisms.

    Science.gov (United States)

    Crivello, Cristina; Phillips, Sara; Poulin-Dubois, Diane

    2018-05-01

    Although there is mounting evidence that selective social learning begins in infancy, the psychological mechanisms underlying this ability are currently a controversial issue. The purpose of this study is to investigate whether theory of mind abilities and statistical learning skills are related to infants' selective social learning. Seventy-seven 18-month-olds were first exposed to a reliable or an unreliable speaker and then completed a word learning task, two theory of mind tasks, and a statistical learning task. If domain-general abilities are linked to selective social learning, then infants who demonstrate superior performance on the statistical learning task should perform better on the selective learning task, that is, should be less likely to learn words from an unreliable speaker. Alternatively, if domain-specific abilities are involved, then superior performance on theory of mind tasks should be related to selective learning performance. Findings revealed that, as expected, infants were more likely to learn a novel word from a reliable speaker. Importantly, infants who passed a theory of mind task assessing knowledge attribution were significantly less likely to learn a novel word from an unreliable speaker compared to infants who failed this task. No such effect was observed for the other tasks. These results suggest that infants who possess superior social-cognitive abilities are more apt to reject an unreliable speaker as informant. A video abstract of this article can be viewed at: https://youtu.be/zuuCniHYzqo. © 2017 John Wiley & Sons Ltd.

  9. Wavelet Packet Entropy in Speaker-Independent Emotional State Detection from Speech Signal

    Directory of Open Access Journals (Sweden)

    Mina Kadkhodaei Elyaderani

    2015-01-01

    In this paper, wavelet packet entropy is proposed for speaker-independent emotion detection from speech. After pre-processing, wavelet packet decomposition using wavelet type db3 at level 4 is calculated and the Shannon entropy in its nodes is computed for use as a feature. In addition, prosodic features such as the first four formants, jitter (pitch deviation amplitude), and shimmer (energy variation amplitude), besides MFCC features, are applied to complete the feature vector. Then, a Support Vector Machine (SVM) is used to classify the vectors in multi-class (all emotions) or two-class (each emotion versus normal state) format. 46 different utterances of a single sentence from the Berlin Emotional Speech Dataset are selected. These are uttered by 10 speakers in sadness, happiness, fear, boredom, anger, and normal emotional state. Experimental results show that the proposed features can improve emotional state detection accuracy in the multi-class situation. Furthermore, adding wavelet entropy coefficients to the other features increases the accuracy of two-class detection for anger, fear, and happiness.
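
    The pipeline above is concrete enough to sketch in code. The following is a minimal illustration, not the authors' implementation: it assumes the pywt, librosa and scikit-learn packages, uses placeholder file paths instead of the EmoDB file handling, and omits the prosodic features (formants, jitter, shimmer) mentioned in the abstract.

```python
# Minimal sketch of the described feature pipeline (not the authors' code):
# wavelet packet decomposition (db3, level 4), Shannon entropy of every terminal
# node, mean MFCCs, and an SVM classifier. Prosodic features are omitted here.
import numpy as np
import pywt                      # wavelet packet decomposition
import librosa                   # audio loading and MFCC extraction
from sklearn.svm import SVC

def shannon_entropy(coeffs, bins=64):
    """Shannon entropy of the coefficient histogram of one wavelet packet node."""
    hist, _ = np.histogram(coeffs, bins=bins, density=True)
    p = hist[hist > 0]
    p = p / p.sum()
    return float(-np.sum(p * np.log2(p)))

def wp_entropy_features(signal, wavelet="db3", level=4):
    """Entropy of the 16 terminal nodes of a level-4 wavelet packet tree."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    return np.array([shannon_entropy(node.data)
                     for node in wp.get_level(level, order="natural")])

def extract_features(wav_path):
    y, sr = librosa.load(wav_path, sr=16000)
    entropies = wp_entropy_features(y)                                # 16 values
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)  # 13 values
    return np.concatenate([entropies, mfccs])

# Hypothetical usage with a labelled list of (wav_path, emotion) pairs:
# X = np.vstack([extract_features(p) for p, _ in samples])
# y = [emotion for _, emotion in samples]
# clf = SVC(kernel="rbf").fit(X, y)
```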

  10. Bayesian Tracking within a Feedback Sensing Environment: Estimating Interacting, Spatially Constrained Complex Dynamical Systems from Multiple Sources of Controllable Devices

    Science.gov (United States)

    2014-07-25

    …composition of simple temporal structures to a speaker diarization task with the goal of segmenting conference audio in the presence of an unknown number of … application domains including neuroimaging, diverse document selection, speaker diarization, stock modeling, and target tracking. We detail each of … recall performance than competing methods in a task of discovering articles preferred by the user • a gold-standard speaker diarization method, as …

  11. An Evaluation of Native-speaker Judgements of Foreign-accented British and American English

    NARCIS (Netherlands)

    Doel, W.Z. van den

    2006-01-01

    This study is the first ever to employ a large-scale Internet survey to investigate priorities in English pronunciation training. Well over 500 native speakers from throughout the English-speaking world, including North America, the British Isles, Australia and New Zealand, were asked to detect and

  12. Participation of Second Language and Second Dialect Speakers in the Legal System.

    Science.gov (United States)

    Eades, Diana

    2003-01-01

    Overviews current theory and practice and research on second language and second dialect speakers and the language of the law. Suggests most of the studies on the topic have analyzed language in courtrooms, where access to data is much easier than in other legal settings, such as police interviews, mediation sessions, or lawyer-client interviews.…

  13. Neural Control of Rising and Falling Tones in Mandarin Speakers Who Stutter

    Science.gov (United States)

    Howell, Peter; Jiang, Jing; Peng, Danling; Lu, Chunming

    2012-01-01

    Neural control of rising and falling tones in Mandarin people who stutter (PWS) was examined by comparison with that of fluent speakers [Howell, Jiang, Peng, and Lu (2012). Neural control of fundamental frequency rise and fall in Mandarin tones. "Brain and Language, 121"(1), 35-46]. Nine PWS and nine controls were scanned. Functional…

  14. Speakers' acceptance of real-time speech exchange indicates that we use auditory feedback to specify the meaning of what we say.

    Science.gov (United States)

    Lind, Andreas; Hall, Lars; Breidegard, Björn; Balkenius, Christian; Johansson, Petter

    2014-06-01

    Speech is usually assumed to start with a clearly defined preverbal message, which provides a benchmark for self-monitoring and a robust sense of agency for one's utterances. However, an alternative hypothesis states that speakers often have no detailed preview of what they are about to say, and that they instead use auditory feedback to infer the meaning of their words. In the experiment reported here, participants performed a Stroop color-naming task while we covertly manipulated their auditory feedback in real time so that they said one thing but heard themselves saying something else. Under ideal timing conditions, two thirds of these semantic exchanges went undetected by the participants, and in 85% of all nondetected exchanges, the inserted words were experienced as self-produced. These findings indicate that the sense of agency for speech has a strong inferential component, and that auditory feedback of one's own voice acts as a pathway for semantic monitoring, potentially overriding other feedback loops. © The Author(s) 2014.

  15. Migration and interaction in a contact zone: mtDNA variation among Bantu-speakers in Southern Africa.

    Directory of Open Access Journals (Sweden)

    Chiara Barbieri

    Bantu speech communities expanded over large parts of sub-Saharan Africa within the last 4000-5000 years, reaching different parts of southern Africa 1200-2000 years ago. The Bantu languages subdivide in several major branches, with languages belonging to the Eastern and Western Bantu branches spreading over large parts of Central, Eastern, and Southern Africa. There is still debate whether this linguistic divide is correlated with a genetic distinction between Eastern and Western Bantu speakers. During their expansion, Bantu speakers would have come into contact with diverse local populations, such as the Khoisan hunter-gatherers and pastoralists of southern Africa, with whom they may have intermarried. In this study, we analyze complete mtDNA genome sequences from over 900 Bantu-speaking individuals from Angola, Zambia, Namibia, and Botswana to investigate the demographic processes at play during the last stages of the Bantu expansion. Our results show that most of these Bantu-speaking populations are genetically very homogenous, with no genetic division between speakers of Eastern and Western Bantu languages. Most of the mtDNA diversity in our dataset is due to different degrees of admixture with autochthonous populations. Only the pastoralist Himba and Herero stand out due to high frequencies of particular L3f and L3d lineages; the latter are also found in the neighboring Damara, who speak a Khoisan language and were foragers and small-stock herders. In contrast, the close cultural and linguistic relatives of the Herero and Himba, the Kuvale, are genetically similar to other Bantu-speakers. Nevertheless, as demonstrated by resampling tests, the genetic divergence of Herero, Himba, and Kuvale is compatible with a common shared ancestry with high levels of drift, while the similarity of the Herero, Himba, and Damara probably reflects admixture, as also suggested by linguistic analyses.

  16. Impact of Industry Guest Speakers on Business Students' Perceptions of Employability Skills Development

    Science.gov (United States)

    Riebe, L.; Sibson, R.; Roepen, D.; Meakins, K.

    2013-01-01

    This study provides insights into the perceptions and expectations of Australian undergraduate business students (n=150) regarding the incorporation of guest speakers into the curriculum of a leadership unit focused on employability skills development. The authors adopted a mixed methods approach. A survey was conducted, with quantitative results…

  17. Non-Native Speakers of the Language of Instruction: Self-Perceptions of Teaching Ability

    Science.gov (United States)

    Samuel, Carolyn

    2017-01-01

    Given the linguistically diverse instructor and student populations at Canadian universities, mutually comprehensible oral language may not be a given. Indeed, both instructors who are non-native speakers of the language of instruction (NNSLIs) and students have acknowledged oral communication challenges. Little is known, though, about how the…

  18. STATE-OF-THE-ART TASKS AND ACHIEVEMENTS OF PARALINGUISTIC SPEECH ANALYSIS SYSTEMS

    Directory of Open Access Journals (Sweden)

    A. A. Karpov

    2016-07-01

    We present an analytical survey of state-of-the-art tasks in the area of computational paralinguistics, as well as the recent achievements of automatic systems for paralinguistic analysis of conversational speech. Paralinguistics studies non-verbal aspects of human communication and speech, such as natural emotions, accents, psycho-physiological states, pronunciation features, speaker's voice parameters, etc. We describe the architecture of a baseline computer system for acoustical paralinguistic analysis, its main components and useful speech processing methods. We present some information on an international contest called the Computational Paralinguistics Challenge (ComParE), which has been held each year since 2009 in the framework of the international conference INTERSPEECH organized by the International Speech Communication Association. We present the sub-challenges (tasks) that were proposed at the ComParE Challenges in 2009-2016, and analyze the winning computer systems for each sub-challenge and the results obtained. The last completed ComParE-2015 Challenge was organized in September 2015 in Germany and proposed 3 sub-challenges: 1) the Degree of Nativeness (DN) sub-challenge, determination of the nativeness degree of speakers based on acoustics; 2) the Parkinson's Condition (PC) sub-challenge, recognition of the degree of Parkinson's condition based on speech analysis; 3) the Eating Condition (EC) sub-challenge, determination of the eating condition state during speaking or a dialogue, and classification of the consumed food type (one of seven classes of food) by the speaker. In the last sub-challenge (EC), the winner was a joint Turkish-Russian team consisting of the authors of the given paper. We have developed the most efficient computer-based system for detection and classification of the corresponding (EC) acoustical paralinguistic events. The paper deals with the architecture of this system, its main modules and methods, as well as the description of the training and evaluation data used.
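
    For readers unfamiliar with the baseline architecture referred to above, the sketch below shows a generic ComParE-style pipeline: utterance-level acoustic functionals extracted with openSMILE and classified with a linear SVM. It relies on the opensmile and scikit-learn Python packages and on hypothetical file lists; it is not the challenge organisers' or the authors' code.

```python
# Hedged sketch of a ComParE-style baseline: utterance-level openSMILE functionals
# fed to a linear SVM. Requires the opensmile and scikit-learn packages; the file
# lists and labels are hypothetical placeholders.
import numpy as np
import opensmile
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,    # 6373 acoustic functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)

def featurize(wav_paths):
    """One fixed-length feature vector per utterance."""
    return np.vstack([smile.process_file(p).to_numpy().ravel() for p in wav_paths])

# train_files, train_labels and test_files would come from a sub-challenge partition:
# clf = make_pipeline(StandardScaler(), LinearSVC(C=1e-3))
# clf.fit(featurize(train_files), train_labels)
# predictions = clf.predict(featurize(test_files))
```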

  19. Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.

    Science.gov (United States)

    Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu

    2018-05-01

    Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, which contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the-art diarization algorithms.

  20. Investigations of rhesus monkey video-task performance: evidence for enrichment

    Science.gov (United States)

    Washburn, D. A.; Rumbaugh, D. M.

    1992-01-01

    We have developed the Language Research Center's Computerized Test System (LRC-CTS) for psychological research. Basically, the LRC-CTS is a battery of software tasks--computerized versions of many of the classic testing paradigms of cognitive and comparative psychology--and the hardware required to administer them. An XT- or 386-compatible computer is connected to a color monitor, onto which computer-generated stimuli are presented. Sound feedback is delivered through an external speaker/amplifier, and a joystick is used as an input device. The animals reach through the mesh of their home cages to manipulate the joystick, which causes isomorphic movements of a cursor on the screen thereby allowing animals to respond according to the varied demands of the tasks. Correct responses are rewarded with a fruit-flavored chow pellet. Using this technology, we have trained and tested rhesus monkeys, a variety of apes, human adults, and normally developing or mentally retarded human children. Other labs using the LRC-CTS are beginning to report encouraging results with other monkey species as well. From this research, a number of interesting and important psychological findings have resulted. In the present paper, however, evidence will be reviewed which suggests that the LRC-CTS is an effective means of providing environmental enrichment to singly housed rhesus monkeys.

  1. Be My Guest: A Survey of Mass Communication Students' Perception of Guest Speakers

    Science.gov (United States)

    Merle, Patrick F.; Craig, Clay

    2017-01-01

    The use of guest speakers as a pedagogical technique across disciplines at the college level is hardly novel. However, empirical assessment of journalism and mass communication students' perceptions of this practice has not previously been conducted. To fill this gap, this article presents results from an online survey specifically administered to…

  2. "Feminism Lite?" Feminist Identification, Speaker Appearance, and Perceptions of Feminist and Antifeminist Messengers

    Science.gov (United States)

    Bullock, Heather E.; Fernald, Julian L.

    2003-01-01

    Drawing on a communications model of persuasion (Hovland, Janis, & Kelley, 1953), this study examined the effect of target appearance on feminists' and nonfeminists' perceptions of a speaker delivering a feminist or an antifeminist message. One hundred three college women watched one of four videotaped speeches that varied by content (profeminist…

  3. STUDENTS WRITING EMAILS TO FACULTY: AN EXAMINATION OF E-POLITENESS AMONG NATIVE AND NON-NATIVE SPEAKERS OF ENGLISH

    Directory of Open Access Journals (Sweden)

    Sigrun Biesenbach-Lucas

    2007-02-01

    This study combines interlanguage pragmatics and speech act research with computer-mediated communication and examines how native and non-native speakers of English formulate low- and high-imposition requests to faculty. While some research claims that email, due to absence of non-verbal cues, encourages informal language, other research has claimed the opposite. However, email technology also allows writers to plan and revise messages before sending them, thus affording the opportunity to edit not only for grammar and mechanics, but also for pragmatic clarity and politeness. The study examines email requests sent by native and non-native English speaking graduate students to faculty at a major American university over a period of several semesters and applies Blum-Kulka, House, and Kasper's (1989) speech act analysis framework – quantitatively to distinguish levels of directness, i.e. pragmatic clarity; and qualitatively to compare syntactic and lexical politeness devices, the request perspectives, and the specific linguistic request realization patterns preferred by native and non-native speakers. Results show that far more requests are realized through direct strategies as well as hints than conventionally indirect strategies typically found in comparative speech act studies. Politeness conventions in email, a text-only medium with little guidance in the academic institutional hierarchy, appear to be a work in progress, and native speakers demonstrate greater resources in creating e-polite messages to their professors than non-native speakers. A possible avenue for pedagogical intervention with regard to instruction in and acquisition of politeness routines in hierarchically upward email communication is presented.

  4. Effects of age and auditory and visual dual tasks on closed-road driving performance.

    Science.gov (United States)

    Chaparro, Alex; Wood, Joanne M; Carberry, Trent

    2005-08-01

    This study investigated how driving performance of young and old participants is affected by visual and auditory secondary tasks on a closed driving course. Twenty-eight participants comprising two age groups (younger, mean age = 27.3 years; older, mean age = 69.2 years) drove around a 5.1-km closed-road circuit under both single and dual task conditions. Measures of driving performance included detection and identification of road signs, detection and avoidance of large low-contrast road hazards, gap judgment, lane keeping, and time to complete the course. The dual task required participants to verbally report the sums of pairs of single-digit numbers presented through either a computer speaker (auditorily) or a dashboard-mounted monitor (visually) while driving. Participants also completed a vision and cognitive screening battery, including LogMAR visual acuity, Pelli-Robson letter contrast sensitivity, the Trails test, and the Digit Symbol Substitution (DSS) test. Drivers reported significantly fewer signs, hit more road hazards, misjudged more gaps, and increased their time to complete the course under the dual task (visual and auditory) conditions compared with the single task condition. The older participants also reported significantly fewer road signs and drove significantly more slowly than the younger participants, and this was exacerbated for the visual dual task condition. The results of the regression analysis revealed that cognitive aging (measured by the DSS and Trails test) rather than chronologic age was a better predictor of the declines seen in driving performance under dual task conditions. An overall z score was calculated, which took into account both driving and the secondary task (summing) performance under the two dual task conditions. Performance was significantly worse for the auditory dual task compared with the visual dual task, and the older participants performed significantly worse than the young subjects. These findings demonstrate

  5. Further optimisations of constant Q cepstral processing for integrated utterance and text-dependent speaker verification

    DEFF Research Database (Denmark)

    Delgado, Hector; Todisco, Massimiliano; Sahidullah, Md

    2016-01-01

    Many authentication applications involving automatic speaker verification (ASV) demand robust performance using short-duration, fixed or prompted text utterances. Text constraints not only reduce the phone-mismatch between enrollment and test utterances, which generally leads to improved performa...

  6. Analysis of error type and frequency in apraxia of speech among Portuguese speakers

    Directory of Open Access Journals (Sweden)

    Maysa Luchesi Cera

    Most studies characterizing errors in the speech of patients with apraxia involve the English language. Objectives: To analyze the types and frequency of errors produced by patients with apraxia of speech whose mother tongue was Brazilian Portuguese. Methods: 20 adults with apraxia of speech caused by stroke were assessed. The types of error committed by patients were analyzed both quantitatively and qualitatively, and frequencies compared. Results: We observed the presence of substitution, omission, trial-and-error, repetition, self-correction, anticipation, addition, reiteration and metathesis, in descending order of frequency, respectively. Omission type errors were among the most commonly occurring whereas addition errors were infrequent. These findings differed from those reported in English-speaking patients, probably owing to differences in the methodologies used for classifying error types; the inclusion of speakers with apraxia secondary to aphasia; and the difference in the structure of the Portuguese language to English in terms of syllable onset complexity and effect on motor control. Conclusions: The frequency of omission and addition errors observed differed from the frequency reported for speakers of English.

  7. Numerical investigation on vibration characteristics of a micro-speaker diaphragm considering thermoforming effects

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Kyeong Min; Park, Ke Un [Seoul National University of Science and Technology, Seoul (Korea, Republic of)]

    2013-10-15

    Micro-speaker diaphragms play an important role in generating desired sound responses, and are designed to have thin membrane shapes for flexibility in the axial direction. The micro-speaker diaphragms are formed from thin polymer film through the thermoforming process, in which local thickness reductions occur due to strain localization. This thickness reduction results in a change in vibration characteristics of the diaphragm and different sound responses from that of the original design. In this study, the effect of this thickness change in the diaphragm on its vibration characteristics is numerically investigated by coupling thermoforming simulation, structural analysis and modal analysis. Thus, the thickness change in the diaphragm is calculated from the thermoforming simulation, and reflected in the further structural and modal analyses in order to estimate the relevant stiffness and vibration modes. Comparing these simulation results with those from a diaphragm with the uniform thickness, it is found that a local thickness reduction results in the stiffness reduction and the relevant change in the natural frequencies and the corresponding vibration modes.
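
    The paper's coupled FEM workflow is not reproduced here, but classical thin-plate theory already suggests why a local thickness reduction lowers stiffness and natural frequencies: the bending stiffness is D = E h^3 / (12 (1 - nu^2)), and the natural frequencies of a uniform plate scale as sqrt(D / (rho h)), i.e. linearly with thickness h. The short sketch below illustrates that scaling with assumed polymer-film properties.

```python
# Back-of-the-envelope illustration (uniform thin-plate theory, not the coupled FEM
# workflow of the paper): natural frequencies scale linearly with thickness h, so a
# thermoforming-induced thickness reduction lowers them proportionally.
import numpy as np

E, nu, rho = 3.0e9, 0.35, 1200.0      # assumed polymer film properties (Pa, -, kg/m^3)

def bending_stiffness(h):
    return E * h**3 / (12.0 * (1.0 - nu**2))

def frequency_ratio(h_formed, h_nominal):
    """f ~ sqrt(D / (rho * h)), hence f_formed / f_nominal = h_formed / h_nominal."""
    f = lambda h: np.sqrt(bending_stiffness(h) / (rho * h))
    return f(h_formed) / f(h_nominal)

# A 20% local thinning (50 um -> 40 um) lowers the natural frequencies by about 20%:
print(frequency_ratio(40e-6, 50e-6))  # ~0.8
```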

  8. Speaker box made of composite particle board based on mushroom growing media waste

    Science.gov (United States)

    Tjahjanti, P. H.; Sutarman, Widodo, E.; Kurniawan, A. R.; Winarno, A. T.; Yani, A.

    2017-06-01

    This research aimed to use mushroom growing media waste (MGMW), to which urea, starch and polyvinyl chloride (PVC) glue were added, as a composite particle board for the manufacture of speaker boxes. Physical and mechanical testing of the particle board, including density, moisture content, thickness swelling after immersion in water, water absorption, internal bonding, modulus of elasticity, modulus of rupture and screw holding power, was carried out in accordance with Standar Nasional Indonesia (SNI) 03-2105-2006 and Japanese Industrial Standard (JIS) A 5908-2003. The optimum composition of the composite particle boards was 60% MGMW + 39% (50% urea + 50% starch) + 1% PVC glue. This optimum composition was used to create a speaker box with a hardness value of 14.9 Brinell Hardness Number; vibration testing gave Z-axis amplitude values of 0.032007 (minimum) and 0.151575 (maximum). For the acoustic test, results showed a good sound absorption coefficient at a frequency of 500 Hz and better damping absorption.

  9. Numerical investigation on vibration characteristics of a micro-speaker diaphragm considering thermoforming effects

    International Nuclear Information System (INIS)

    Kim, Kyeong Min; Park, Ke Un

    2013-01-01

    Micro-speaker diaphragms play an important role in generating desired sound responses, and are designed to have thin membrane shapes for flexibility in the axial direction. The micro-speaker diaphragms are formed from thin polymer film through the thermoforming process, in which local thickness reductions occur due to strain localization. This thickness reduction results in a change in vibration characteristics of the diaphragm and different sound responses from that of the original design. In this study, the effect of this thickness change in the diaphragm on its vibration characteristics is numerically investigated by coupling thermoforming simulation, structural analysis and modal analysis. Thus, the thickness change in the diaphragm is calculated from the thermoforming simulation, and reflected in the further structural and modal analyses in order to estimate the relevant stiffness and vibration modes. Comparing these simulation results with those from a diaphragm with the uniform thickness, it is found that a local thickness reduction results in the stiffness reduction and the relevant change in the natural frequencies and the corresponding vibration modes.

  10. Analysis of error type and frequency in apraxia of speech among Portuguese speakers.

    Science.gov (United States)

    Cera, Maysa Luchesi; Minett, Thaís Soares Cianciarullo; Ortiz, Karin Zazo

    2010-01-01

    Most studies characterizing errors in the speech of patients with apraxia involve the English language. To analyze the types and frequency of errors produced by patients with apraxia of speech whose mother tongue was Brazilian Portuguese. 20 adults with apraxia of speech caused by stroke were assessed. The types of error committed by patients were analyzed both quantitatively and qualitatively, and frequencies compared. We observed the presence of substitution, omission, trial-and-error, repetition, self-correction, anticipation, addition, reiteration and metathesis, in descending order of frequency, respectively. Omission type errors were among the most commonly occurring whereas addition errors were infrequent. These findings differed from those reported in English-speaking patients, probably owing to differences in the methodologies used for classifying error types; the inclusion of speakers with apraxia secondary to aphasia; and the difference in the structure of the Portuguese language to English in terms of syllable onset complexity and effect on motor control. The frequency of omission and addition errors observed differed from the frequency reported for speakers of English.

  11. Speaker comfort and increase of voice level in lecture rooms

    DEFF Research Database (Denmark)

    Brunskog, Jonas; Gade, Anders Christian; Bellester, G P

    2008-01-01

    Teachers often suffer health problems or tension related to their voice. These problems may be related to their working environment, including the room acoustics of the lecture rooms, which force them to stress their voices. The present paper describes a first effort in finding relationships between … were also measured in the rooms and subjective impressions from about 20 persons who had experience talking in these rooms were collected as well. Analysis of the data revealed significant differences in the sound power produced by the speaker in the different rooms. It was also found

  12. Developing a Speaker Identification System for the DARPA RATS Project

    DEFF Research Database (Denmark)

    Plchot, O; Matsoukas, S; Matejka, P

    2013-01-01

    This paper describes the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state of the art detection capabilities on audio from highly degraded communication channels. ...... such as CFCCs out-perform MFCC front-ends on noisy audio, and (c) fusion of multiple systems provides 24% relative improvement in EER compared to the single best system when using a novel SVM-based fusion algorithm that uses side information such as gender, language, and channel id....

  13. An automatic speech recognition system with speaker-independent identification support

    Science.gov (United States)

    Caranica, Alexandru; Burileanu, Corneliu

    2015-02-01

    The novelty of this work relies on the application of an open source research software toolkit (CMU Sphinx) to train, build and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to successfully decode offline voice commands on embedded hardware, such as an ARMv6 low-cost SoC, the Raspberry Pi. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low cost voice automation systems.
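
    As a rough illustration of how such an offline decoder can be driven from Python, the sketch below uses the pocketsphinx bindings in keyword-spotting mode. The keyphrase, threshold and GPIO comment are hypothetical placeholders, not the authors' configuration.

```python
# Minimal offline voice-command loop in the spirit of the system described above,
# using the CMU Sphinx (pocketsphinx) Python bindings in keyword-spotting mode.
from pocketsphinx import LiveSpeech

speech = LiveSpeech(
    lm=False,                  # disable full language-model decoding
    keyphrase='lights on',     # hypothetical voice command
    kws_threshold=1e-20,       # detection threshold; tune per microphone and room
)

for phrase in speech:          # blocks on the default microphone
    print('Detected command:', phrase)
    # On a Raspberry Pi, one could toggle a GPIO pin here (e.g. with RPi.GPIO).
```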

  14. Limited data speaker identification

    Indian Academy of Sciences (India)

    …recognition can be either identification or verification depending on the task objective. … like Bayesian formalism, voting method and Dempster–Shafer (D–S) theory … self-organizing map (SOM) (Kohonen 1990), learning vector quantization …

  15. Effects of low speed wind on the recognition/identification and pass-through communication tasks of auditory situation awareness afforded by military hearing protection/enhancement devices and tactical communication and protective systems.

    Science.gov (United States)

    Lee, Kichol; Casali, John G

    2016-01-01

    To investigate the effect of controlled low-speed wind-noise on the auditory situation awareness performance afforded by military hearing protection/enhancement devices (HPED) and tactical communication and protective systems (TCAPS). Recognition/identification and pass-through communications tasks were separately conducted under three wind conditions (0, 5, and 10 mph). Subjects wore two in-ear-type TCAPS, one earmuff-type TCAPS, a Combat Arms Earplug in its 'open' or pass-through setting, and an EB-15LE electronic earplug. Devices with electronic gain systems were tested under two gain settings: 'unity' and 'max'. Testing without any device (open ear) was conducted as a control. Ten subjects were recruited from the student population at Virginia Tech. Audiometric requirements were 25 dBHL or better at 500, 1000, 2000, 4000, and 8000 Hz in both ears. Performance on the interaction of communication task-by-device was significantly different only in 0 mph wind speed. The between-device performance differences varied with azimuthal speaker locations. It is evident from this study that stable (non-gusting) wind speeds up to 10 mph did not significantly degrade recognition/identification task performance and pass-through communication performance of the group of HPEDs and TCAPS tested. However, the various devices performed differently as the test sound signal speaker location was varied and it appears that physical as well as electronic features may have contributed to this directional result.

  16. Direct Measurement of the Speed of Sound Using a Microphone and a Speaker

    Science.gov (United States)

    Gómez-Tejedor, José A.; Castro-Palacio, Juan C.; Monsoriu, Juan A.

    2014-01-01

    We present a simple and accurate experiment to obtain the speed of sound in air using a conventional speaker and a microphone connected to a computer. A free open source digital audio editor and recording computer software application allows determination of the time-of-flight of the wave for different distances, from which the speed of sound is…
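
    The analysis step implied above is a straight-line fit: plotting speaker-microphone distance against measured time-of-flight and taking the slope as the speed of sound. A minimal sketch with made-up measurements follows.

```python
# Sketch of the analysis step: fit distance against time-of-flight and read the
# speed of sound off the slope. The measurements below are made-up illustrations.
import numpy as np

distance_m = np.array([0.5, 1.0, 1.5, 2.0, 2.5])                  # speaker-microphone distances
time_s = np.array([1.47e-3, 2.92e-3, 4.38e-3, 5.83e-3, 7.30e-3])  # measured delays

slope, intercept = np.polyfit(time_s, distance_m, 1)
print(f"speed of sound ~ {slope:.0f} m/s")    # about 343 m/s at room temperature
```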

  17. The Role of Statistical Learning and Working Memory in L2 Speakers' Pattern Learning

    Science.gov (United States)

    McDonough, Kim; Trofimovich, Pavel

    2016-01-01

    This study investigated whether second language (L2) speakers' morphosyntactic pattern learning was predicted by their statistical learning and working memory abilities. Across three experiments, Thai English as a Foreign Language (EFL) university students (N = 140) were exposed to either the transitive construction in Esperanto (e.g., "tauro…

  18. The Influence of Language Proficiency on Book Search Behaviour

    DEFF Research Database (Denmark)

    Skov, Mette; Bogers, Toine

    2015-01-01

    In this paper we describe our participation in the Interactive Social Book Search task at CLEF 2015. We focus our analysis on differences in search behaviour between native and non-native speakers of English. The analysis is based on both questionnaire and log data. 49 participants out of the 192 total participants are native speakers and the remaining 143 participants are non-native speakers. In general, results show surprisingly few differences in search behaviour between native and non-native speakers. Non-native speakers spent more time on both the focused and the open task than the native speakers, but no significant differences were found in relation to number of queries, query length, depth of results inspection, number of books added to the book-bag, or length of notes explaining why a book was added to the book-bag.

  19. Online Matchmaking: It's Not Just for Dating Sites Anymore! Connecting the Climate Voices Science Speakers Network to Educators

    Science.gov (United States)

    Wegner, Kristin; Herrin, Sara; Schmidt, Cynthia

    2015-01-01

    Scientists play an integral role in the development of climate literacy skills - for both teachers and students alike. By partnering with local scientists, teachers can gain valuable insights into the science practices highlighted by the Next Generation Science Standards (NGSS), as well as a deeper understanding of cutting-edge scientific discoveries and local impacts of climate change. For students, connecting to local scientists can provide a relevant connection to climate science and STEM skills. Over the past two years, the Climate Voices Science Speakers Network (climatevoices.org) has grown to a robust network of nearly 400 climate science speakers across the United States. Formal and informal educators, K-12 students, and community groups connect with our speakers through our interactive map-based website and invite them to meet through face-to-face and virtual presentations, such as webinars and podcasts. But creating a common language between scientists and educators requires coaching on both sides. In this presentation, we will present the "nitty-gritty" of setting up scientist-educator collaborations, as well as the challenges and opportunities that arise from these partnerships. We will share the impact of these collaborations through case studies, including anecdotal feedback and metrics.

  20. Effects of traumatic brain injury on a virtual reality social problem solving task and relations to cortical thickness in adolescence.

    Science.gov (United States)

    Hanten, Gerri; Cook, Lori; Orsten, Kimberley; Chapman, Sandra B; Li, Xiaoqi; Wilde, Elisabeth A; Schnelle, Kathleen P; Levin, Harvey S

    2011-02-01

    Social problem solving was assessed in 28 youth ages 12-19 years (15 with moderate to severe traumatic brain injury (TBI), 13 uninjured) using a naturalistic, computerized virtual reality (VR) version of the Interpersonal Negotiations Strategy interview (Yeates, Schultz, & Selman, 1991). In each scenario, processing load condition was varied in terms of number of characters and amount of information. Adolescents viewed animated scenarios depicting social conflict in a virtual microworld environment from an avatar's viewpoint, and were questioned on four problem solving steps: defining the problem, generating solutions, selecting solutions, and evaluating the likely outcome. Scoring was based on a developmental scale in which responses were judged as impulsive, unilateral, reciprocal, or collaborative, in order of increasing score. Adolescents with TBI were significantly impaired on the summary VR-Social Problem Solving (VR-SPS) score in Condition A (2 speakers, no irrelevant information), p=0.005; in Condition B (2 speakers+irrelevant information), p=0.035; and Condition C (4 speakers+irrelevant information), p=0.008. Effect sizes (Cohen's D) were large (A=1.40, B=0.96, C=1.23). Significant group differences were strongest and most consistent for defining the problems and evaluating outcomes. The relation of task performance to cortical thickness of specific brain regions was also explored, with significant relations found with orbitofrontal regions, the frontal pole, the cuneus, and the temporal pole. Results are discussed in the context of specific cognitive and neural mechanisms underlying social problem solving deficits after childhood TBI. Copyright © 2010 Elsevier Ltd. All rights reserved.

  1. Vliv temporálních manipulací na vnímání kompetence mluvčího / Effect of Temporal Manipulations on the Perception of Speaker Competence

    Directory of Open Access Journals (Sweden)

    Zuzana Berkovcová

    2016-06-01

    Speech communication research based on psychological methods currently stands at the forefront of scientific interest. Speech is an integral part of the social identity of a person and has a significant impact on the perception of the speaker by his surroundings. The present study aims to chart the effect of the temporal organization of utterances on the perception of a speaker's competence. Recordings of four Spanish native speakers were manipulated in a way which destabilized the regular temporal structure of their utterances. A perception test was administered to forty Czech listeners differing in level of proficiency with the Spanish language. The aim of the test was to reveal the listeners' positive or negative judgments of the original (regular) and manipulated (dysfluent) items. The basis for the perception test was the Big Five personality traits model, with the factor evaluated being the speakers' competence, that is, the ability and readiness to deal effectively with tasks. The results confirmed our main hypothesis, which assumed that both groups of listeners would, in terms of competence, evaluate the manipulated items negatively. Students of Spanish studies were more perceptive of the temporal manipulations, most likely due to their familiarity with the prosodic structure of Spanish and their understanding of the meaning of the tested utterances.

  2. 78 FR 65511 - Death of Thomas S. Foley Former Speaker of the House of Representatives

    Science.gov (United States)

    2013-10-31

    Federal Register, Vol. 78, No. 211 (Thursday, October 31, 2013), Part IV, Presidential Documents, Title 3 -- The President: Proclamation 9046 -- Death of Thomas S. Foley, Former Speaker of the House of Representatives. [Page 65513]

  3. Teaching Standard Italian to Dialect Speakers: A Pedagogical Perspective of Linguistic Systems in Contact

    Science.gov (United States)

    Danesi, Marcel

    1974-01-01

    The teaching of standard Italian to speakers of Italian dialects both in Italy and in North America is discussed, specifically through a specialized pedagogical program within the framework of a sociolinguistic and psycholinguistic perspective, and based on a structural analysis of linguistic systems in contact. Italian programs in Toronto are…

  4. Revealing Word Order: Using Serial Position in Binomials to Predict Properties of the Speaker

    Science.gov (United States)

    Iliev, Rumen; Smirnova, Anastasia

    2016-01-01

    Three studies test the link between word order in binomials and psychological and demographic characteristics of a speaker. While linguists have already suggested that psychological, cultural and societal factors are important in choosing word order in binomials, the vast majority of relevant research was focused on general factors and on broadly…

  5. Children with Autism Understand Indirect Speech Acts: Evidence from a Semi-Structured Act-Out Task.

    Directory of Open Access Journals (Sweden)

    Mikhail Kissine

    Children with Autism Spectrum Disorder are often said to present a global pragmatic impairment. However, there is some observational evidence that context-based comprehension of indirect requests may be preserved in autism. In order to provide experimental confirmation to this hypothesis, indirect speech act comprehension was tested in a group of 15 children with autism between 7 and 12 years and a group of 20 typically developing children between 2:7 and 3:6 years. The aim of the study was to determine whether children with autism can display genuinely contextual understanding of indirect requests. The experiment consisted of a three-pronged semi-structured task involving Mr Potato Head. In the first phase a declarative sentence was uttered by one adult as an instruction to put a garment on a Mr Potato Head toy; in the second the same sentence was uttered as a comment on a picture by another speaker; in the third phase the same sentence was uttered as a comment on a picture by the first speaker. Children with autism complied with the indirect request in the first phase and demonstrated the capacity to inhibit the directive interpretation in phases 2 and 3. TD children had some difficulty in understanding the indirect instruction in phase 1. These results call for a more nuanced view of pragmatic dysfunction in autism.

  6. “Learning from real life and not books”: A gamified approach to Business English task design in transatlantic telecollaboration

    Directory of Open Access Journals (Sweden)

    Ana Sevilla-Pavón

    2017-05-01

    This paper deals with task design in the context of a telecollaboration project which was carried out in a Business English course among students from Spain and the United States. The goal was to provide students with opportunities to develop linguistic, intercultural and digital competences by interacting and collaborating online with native speakers of the target language. A task-based approach was adopted and enriched by gamification, the different tasks being designed with a view towards engaging students intrinsically in the learning process. This was achieved by means of the adoption of gamification strategies and techniques such as the use of points, performance graphs, quests, avatars, a reward system, peer assessment and the use of social media. Via technological immersion, students from both sides of the Atlantic Ocean were required to work together online to complete different tasks while exchanging peer feedback and assessment. The paper analyses and discusses participants' views and perceptions about the gamified telecollaboration exchange. The quantitative and qualitative data were gathered by means of pre- and post-treatment questionnaires. Results indicate that students found this way of learning beneficial in terms of the development of different skills and competences (namely linguistic, digital and intercultural) and motivation.

  7. Acoustic Analysis Method for Flat Panel Speaker Driven by Giant Magnetostrictive-Material-Based Exciter(Linear Motor concerning Daily Life)

    OpenAIRE

    兪, 炳振 (YOO, Byungjin); 平田, 勝弘 (HIRATA, Katsuhiro); 大西, 敦郎 (OONISHI, Atsurou); Osaka University (大阪大学)

    2011-01-01

    This paper presents a coupled analysis method of electromagnetic-structural-acoustic fields for a flat panel speaker driven by a giant magnetostrictive material (GMM) based exciter, designed using the finite element method (FEM). The acoustic field created by the flat panel speaker driven by the GMM exciter relies on the vibration of the flat panel caused by the magnetostrictive phenomenon of the GMM when a magnetic field is applied. In this case, to predict the sound pressure level (SPL) at audio frequency r...

  8. Coupled Electro-Magneto-Mechanical-Acoustic Analysis Method Developed by Using 2D Finite Element Method for Flat Panel Speaker Driven by Magnetostrictive-Material-Based Actuator

    Science.gov (United States)

    Yoo, Byungjin; Hirata, Katsuhiro; Oonishi, Atsurou

    In this study, a coupled analysis method for flat panel speakers driven by a giant magnetostrictive material (GMM) based actuator was developed. The sound field produced by a flat panel speaker that is driven by a GMM actuator depends on the vibration of the flat panel, which results from the magnetostrictive property of the GMM. In this case, to predict the sound pressure level (SPL) in the audio-frequency range, it is necessary to take into account not only the magnetostriction property of the GMM but also the effect of eddy currents and the vibration characteristics of the actuator and the flat panel. In this paper, a coupled electromagnetic-structural-acoustic analysis method is presented; this method was developed by using the finite element method (FEM). This analysis method is used to predict the performance of a flat panel speaker in the audio-frequency range. The validity of the analysis method is verified by comparison with the measurement results of a prototype speaker.

  9. Early-Stage Chunking of Finger Tapping Sequences by Persons Who Stutter and Fluent Speakers

    Science.gov (United States)

    Smits-Bandstra, Sarah; De Nil, Luc F.

    2013-01-01

    This research note explored the hypothesis that chunking differences underlie the slow finger-tap sequencing performance reported in the literature for persons who stutter (PWS) relative to fluent speakers (PNS). Early-stage chunking was defined as an immediate and spontaneous tendency to organize a long sequence into pauses, for motor planning,…

  10. English Language Schooling, Linguistic Realities, and the Native Speaker of English in Hong Kong

    Science.gov (United States)

    Hansen Edwards, Jette G.

    2018-01-01

    The study employs a case study approach to examine the impact of educational backgrounds on nine Hong Kong tertiary students' English and Cantonese language practices and identifications as native speakers of English and Cantonese. The study employed both survey and interview data to probe the participants' English and Cantonese language use at…

  11. Reflecting on the dichotomy native-non native speakers in an EFL context

    OpenAIRE

    Mariño, Claudia

    2011-01-01

    This article provides a discussion based on constructs about the dichotomy between native and non-native speakers. Several models and examples are displayed about the spreading of the English language with the intention of understanding its development in the whole world and in Colombia, specifically. Then, some possible definitions are given to the term "native speaker" and its conceptualization is described as both reality and myth. One of the main reasons for writing this article is grounded on...

  12. –ED ALLOMORPHS AND LINGUISTIC KNOWLEDGE OF MALAY SPEAKERS OF ENGLISH: A DESCRIPTIVE AND CORRELATIONAL STUDY

    Directory of Open Access Journals (Sweden)

    Maskanah Mohammad Lotfie

    2017-09-01

    Malay is a language from the Austronesian family and, unlike the Indo-European-originated English, it does not generally have inflectional temporal markers. Investigating this from a cross-linguistic influence perspective, differences between the languages could mean difficulties for Malay speakers in acquiring features of English. The objectives of this study are to investigate Malay speakers' pronunciation of the English language –ed allomorphs – [d], [t] and [ɪd]/[əd] – and the relationship between the morphophonological forms and two types of linguistic knowledge, one of which is implicit while the other is explicit. Data were collated from fifty participants who are social science undergraduates and English majors who speak English as a second language. Four instruments were used to gauge the respondents' verbal use of –ed allomorphs as well as their implicit and explicit knowledge of the allomorphs. Results indicate that the students' verbal usage of the target items either lacks approximation to Standard English pronunciation or is largely dropped altogether. Results also suggest a moderate relationship between implicit and explicit knowledge of the allomorphs and their verbal production by Malay speakers of English. The finding illuminates the acquisition problems of English language speakers whose mother tongue does not share similar inflectional markers. Pedagogical solutions can help learners of the English language to approximate Standard English and, in the long run, enhance effective communication and increase chances of employability.
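
    The three allomorphs investigated here follow a regular phonological rule: [ɪd]/[əd] after stem-final /t/ or /d/, [t] after other voiceless consonants, and [d] elsewhere. The toy function below encodes that rule for illustration; the ASCII phoneme labels are an assumption, not the study's instrument.

```python
# Toy illustration of regular -ed allomorphy: [ɪd] after stem-final /t/ or /d/,
# [t] after other voiceless consonants, [d] elsewhere. The ASCII phoneme labels
# are an assumption for illustration only.
VOICELESS = {"p", "k", "f", "s", "sh", "ch", "th"}   # voiceless finals other than /t/

def ed_allomorph(final_phoneme: str) -> str:
    if final_phoneme in {"t", "d"}:
        return "[ɪd]"            # e.g. "wanted", "needed"
    if final_phoneme in VOICELESS:
        return "[t]"             # e.g. "walked", "missed"
    return "[d]"                 # voiced consonants and vowels, e.g. "played", "robbed"

for verb, final in [("want", "t"), ("walk", "k"), ("play", "ey")]:
    print(verb + "ed", "->", ed_allomorph(final))
```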

  13. [Understanding the symbolic values of Japanese onomatopoeia: comparison of Japanese and Chinese speakers].

    Science.gov (United States)

    Haryu, Etsuko; Zhao, Lihua

    2007-10-01

    Do non-native speakers of the Japanese language understand the symbolic values of Japanese onomatopoeia matching a voiced/unvoiced consonant with a big/small sound made by a big/small object? In three experiments, participants who were native speakers of Japanese, Japanese-learning Chinese, or Chinese without knowledge of the Japanese language were shown two pictures. One picture was of a small object making a small sound, such as a small vase being broken, and the other was of a big object making a big sound, such as a big vase being broken. Participants were presented with two novel onomatopoetic words with voicing contrasts, e.g., /dachan/ vs. /tachan/, and were told that each word corresponded to one of the two pictures. They were then asked to match the words to the corresponding pictures. Chinese without knowledge of Japanese performed only at chance level, whereas Japanese and Japanese-learning Chinese successfully matched a voiced/unvoiced consonant with a big/small object respectively. The results suggest that the key to understanding the symbolic values of voicing contrasts in Japanese onomatopoeia is some basic knowledge that is intrinsic to the Japanese language.

  14. QUANTITATIVE REDUCTION OF VOWEL GRAPHS “A” AND “O” POSITIONED AFTER THE HARD CONSONANTS IN THE SPEECH OF NATIVE AND NON-NATIVE RUSSIAN SPEAKERS IN LITHUANIA

    Directory of Open Access Journals (Sweden)

    Danutė Balšaitytė

    2015-04-01

    This article analyses the absolute duration (ms) of stressed Russian vowels /a/, /o/ (graphs: "a", "o") and their allophones in unstressed positions after the hard consonants in the pronunciation of native and non-native Russian speakers in Lithuania. The results of the conducted spectral analysis reveal the specificities of quantitative reduction in the speech of the Russian speakers in Lithuania and the Lithuanian speakers who are learning the Russian language. These specificities are influenced by the interaction of the two phonetic systems. In realising "a" and "o", speakers of both languages violate the relations of unstressed vowel duration that are peculiar to the contemporary Russian language: the post-stressed vowels in closed syllables are shorter than the pre-stressed vowels; the first pre-stressed syllable differs from the second pre-stressed and post-stressed syllables by a longer vowel duration. Both Russians and Lithuanians pronounce vowels longer in post-stressed syllables than in pre-stressed syllables. This corresponds to the qualitative reduction of the Lithuanian language vowels /a:/ and /o:/. There are certain differences in the qualitative reduction of the vowels "a" and "o" between native and non-native Russian speakers in Lithuania. The Russian speakers in Lithuania pronounce the second pre-stressed vowel longer than the first pre-stressed vowel; this corresponds to the degree of reduction of pre-stressed vowels "a" and "o" in the standardised Russian language. These degrees of quantitative reduction in the Lithuanian pronunciation are peculiar only to "a" in the Russian language. According to the duration ratio, the unstressed allophones of "a" and "o" in the Russian language are closer to the unstressed /a:/ and /o:/ in the Lithuanian language in the pronunciation of Russian-Lithuanian bilinguals than in the pronunciation of Lithuanian speakers.

  15. Disadvantages of publishing biomedical research articles in English for non-native speakers of English

    Directory of Open Access Journals (Sweden)

    Mohsen Rezaeian

    2015-05-01

    OBJECTIVES: English has become the most frequently used language for scientific communication in the biomedical field. Therefore, scholars from all over the world try to publish their findings in English. This trend has a number of advantages, along with several disadvantages. METHODS: In the current article, the most important disadvantages of publishing biomedical research articles in English for non-native speakers of English are reviewed. RESULTS: The most important disadvantages of publishing biomedical research articles in English for non-native speakers may include: Overlooking, either unintentionally or even deliberately, the most important local health problems; failure to carry out groundbreaking research due to limited medical research budgets; violating generally accepted codes of publication ethics and committing research misconduct and publications in open-access scam/predatory journals rather than prestigious journals. CONCLUSIONS: The above mentioned disadvantages could eventually result in academic establishments becoming irresponsible or, even worse, corrupt. In order to avoid this, scientists, scientific organizations, academic institutions, and scientific associations all over the world should design and implement a wider range of collaborative and comprehensive plans.

  16. Disadvantages of publishing biomedical research articles in English for non-native speakers of English.

    Science.gov (United States)

    Rezaeian, Mohsen

    2015-01-01

    English has become the most frequently used language for scientific communication in the biomedical field. Therefore, scholars from all over the world try to publish their findings in English. This trend has a number of advantages, along with several disadvantages. In the current article, the most important disadvantages of publishing biomedical research articles in English for non-native speakers of English are reviewed. The most important disadvantages of publishing biomedical research articles in English for non-native speakers may include: Overlooking, either unintentionally or even deliberately, the most important local health problems; failure to carry out groundbreaking research due to limited medical research budgets; violating generally accepted codes of publication ethics and committing research misconduct and publications in open-access scam/predatory journals rather than prestigious journals. The above mentioned disadvantages could eventually result in academic establishments becoming irresponsible or, even worse, corrupt. In order to avoid this, scientists, scientific organizations, academic institutions, and scientific associations all over the world should design and implement a wider range of collaborative and comprehensive plans.

  17. Tracking Multiple Statistics: Simultaneous Learning of Object Names and Categories in English and Mandarin Speakers.

    Science.gov (United States)

    Chen, Chi-Hsin; Gershkoff-Stowe, Lisa; Wu, Chih-Yi; Cheung, Hintat; Yu, Chen

    2017-08-01

    Two experiments were conducted to examine adult learners' ability to extract multiple statistics in simultaneously presented visual and auditory input. Experiment 1 used a cross-situational learning paradigm to test whether English speakers were able to use co-occurrences to learn word-to-object mappings and concurrently form object categories based on the commonalities across training stimuli. Experiment 2 replicated the first experiment and further examined whether speakers of Mandarin, a language in which final syllables of object names are more predictive of category membership than English, were able to learn words and form object categories when trained with the same type of structures. The results indicate that both groups of learners successfully extracted multiple levels of co-occurrence and used them to learn words and object categories simultaneously. However, marked individual differences in performance were also found, suggesting possible interference and competition in processing the two concurrent streams of regularities. Copyright © 2016 Cognitive Science Society, Inc.
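
    Cross-situational learning of the kind used in these experiments can be approximated by a simple co-occurrence model: each ambiguous trial pairs several words with several objects, and accumulating counts across trials gradually isolates the correct word-object mappings. The sketch below is an illustration of the paradigm with hypothetical stimuli, not the authors' analysis.

```python
# Minimal co-occurrence model of cross-situational word learning (an illustration of
# the paradigm, not the authors' analysis): each trial pairs several words with
# several objects, and counts accumulated across trials disambiguate the mapping.
from collections import defaultdict

trials = [                                   # hypothetical training trials
    ({"bosa", "gasser"}, {"dog", "cup"}),    # (words heard, objects in view)
    ({"bosa", "manu"},   {"dog", "ball"}),
    ({"gasser", "manu"}, {"cup", "ball"}),
]

counts = defaultdict(int)
for words, objects in trials:
    for w in words:
        for o in objects:
            counts[(w, o)] += 1              # co-occurrence count for every pair

all_objects = {o for _, objects in trials for o in objects}
for w in ("bosa", "gasser", "manu"):
    best = max(all_objects, key=lambda o: counts[(w, o)])
    print(w, "->", best)                     # each word settles on its most frequent object
```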

  18. Insight into the Attitudes of Speakers of Urban Meccan Hijazi Arabic towards Their Dialect

    Science.gov (United States)

    Alahmadi, Sameeha D.

    2016-01-01

    The current study mainly aims to examine the attitudes of speakers of Urban Meccan Hijazi Arabic (UMHA) towards their dialect, which is spoken in Mecca, Saudi Arabia. It also investigates whether the participants' age, sex and educational level have any impact on their perception of their dialect. To this end, I designed a 5-point-Likert-scale…

  19. Wavelet Packet Entropy in Speaker-Independent Emotional State Detection from Speech Signal

    OpenAIRE

    Mina Kadkhodaei Elyaderani; Seyed Hamid Mahmoodian; Ghazaal Sheikhi

    2015-01-01

    In this paper, wavelet packet entropy is proposed for speaker-independent emotion detection from speech. After pre-processing, a wavelet packet decomposition using the db3 wavelet at level 4 is computed, and the Shannon entropy of its nodes is used as a feature. In addition, prosodic features such as the first four formants, jitter (pitch deviation amplitude), and shimmer (energy variation amplitude), together with MFCC features, are used to complete the feature vector. Then, Support Vect...
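
    The abstract names the feature-extraction recipe but no implementation. The sketch below is a hypothetical illustration of the wavelet-packet-entropy part only (db3 wavelet, level 4, Shannon entropy per terminal node), written with the PyWavelets package; the input signal, frame length, and sampling rate are placeholder assumptions, and the prosodic and MFCC features mentioned in the abstract are not shown.

```python
# Hypothetical sketch (not the authors' code): wavelet packet entropy features
# using the db3 wavelet at decomposition level 4, via PyWavelets.
import numpy as np
import pywt

def wavelet_packet_entropy(signal, wavelet="db3", level=4):
    """Return one Shannon-entropy value per level-4 wavelet packet node."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    entropies = []
    for node in wp.get_level(level, order="natural"):
        coeffs = np.asarray(node.data, dtype=float)
        energy = coeffs ** 2
        p = energy / (energy.sum() + 1e-12)           # normalise node energies
        entropies.append(float(-np.sum(p * np.log2(p + 1e-12))))
    return np.array(entropies)                         # 2**level = 16 features

# Example: 16 entropy features for a placeholder 1-second frame at 16 kHz.
features = wavelet_packet_entropy(np.random.randn(16000))
```

    In a full system of the kind the abstract describes, these 16 entropy values would simply be concatenated with the formant, jitter, shimmer, and MFCC features before classification.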

  20. Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions

    DEFF Research Database (Denmark)

    Ma, Ning; Brown, Guy J.; May, Tobias

    2015-01-01

    This paper presents a novel machine-hearing system that exploits deep neural networks (DNNs) and head movements for binaural localisation of multiple speakers in reverberant conditions. DNNs are used to map binaural features, consisting of the complete crosscorrelation function (CCF) and interaural...
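
    The abstract is cut off before it finishes listing the binaural features, so the following is only a hedged sketch of the two feature types it names or implies: a cross-correlation function (CCF) over a small lag range plus an interaural level difference. The frame length, sampling rate, and the +/-1 ms lag range are assumptions rather than values from the paper, and the DNN and head-movement components are not shown.

```python
# Hypothetical per-frame binaural features: cross-correlation function (CCF)
# over a +/-1 ms lag range plus an interaural level difference (ILD) in dB.
import numpy as np

def binaural_features(left, right, fs=16000, max_lag_ms=1.0):
    max_lag = int(fs * max_lag_ms / 1000)                  # 16 samples at 16 kHz
    full = np.correlate(left, right, mode="full")          # lags -(N-1)..(N-1)
    centre = len(left) - 1                                 # index of zero lag
    denom = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2)) + 1e-12
    ccf = full[centre - max_lag: centre + max_lag + 1] / denom
    ild = 10 * np.log10((np.sum(left ** 2) + 1e-12) /
                        (np.sum(right ** 2) + 1e-12))      # level difference, dB
    return np.concatenate([ccf, [ild]])                    # one vector per frame

# Example with a placeholder 20 ms frame (320 samples) per ear.
feat = binaural_features(np.random.randn(320), np.random.randn(320))
```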

  1. Lifting the Curtain on the Wizard of Oz: Biased Voice-Based Impressions of Speaker Size

    Science.gov (United States)

    Rendall, Drew; Vokey, John R.; Nemeth, Christie

    2007-01-01

    The consistent, but often wrong, impressions people form of the size of unseen speakers are not random but rather point to a consistent misattribution bias, one that the advertising, broadcasting, and entertainment industries also routinely exploit. The authors report 3 experiments examining the perceptual basis of this bias. The results indicate…

  2. The Role of Speaker Identification in Korean University Students' Attitudes towards Five Varieties of English

    Science.gov (United States)

    Yook, Cheongmin; Lindemann, Stephanie

    2013-01-01

    This study investigates how the attitudes of 60 Korean university students towards five varieties of English are affected by the identification of the speaker's nationality and ethnicity. The study employed both a verbal guise technique and questions eliciting overt beliefs and preferences related to learning English. While the majority of the…

  3. The "Tse Tsa Watle" Speaker Series: An Example of Ensemble Leadership and Generative Adult Learning

    Science.gov (United States)

    McKendry, Virginia

    2017-01-01

    This chapter examines an Indigenous speaker series formed to foster intercultural partnerships at a Canadian university. Using ensemble leadership and generative learning theories to make sense of the project, the author argues that ensemble leadership is key to designing the generative learning adult learners need in an era of ambiguity.

  4. The effects of ethnicity, musicianship, and tone language experience on pitch perception.

    Science.gov (United States)

    Zheng, Yi; Samuel, Arthur G

    2018-02-01

    Language and music are intertwined: music training can facilitate language abilities, and language experiences can also help with some music tasks. Possible language-music transfer effects are explored in two experiments in this study. In Experiment 1, we tested native Mandarin, Korean, and English speakers on a pitch discrimination task with two types of sounds: speech sounds and fundamental frequency (F0) patterns derived from speech sounds. To control for factors that might influence participants' performance, we included cognitive ability tasks testing memory and intelligence. In addition, two music skill tasks were used to examine general transfer effects from language to music. Prior studies showing that tone language speakers have an advantage on pitch tasks have been taken as support for three alternative hypotheses: specific transfer effects, general transfer effects, and an ethnicity effect. In Experiment 1, musicians outperformed non-musicians on both speech and F0 sounds, suggesting a music-to-language transfer effect. Korean and Mandarin speakers performed similarly, and they both outperformed English speakers, providing some evidence for an ethnicity effect. Alternatively, this could be due to population selection bias. In Experiment 2, we recruited Chinese Americans approximating the native English speakers' language background to further test the ethnicity effect. Chinese Americans, regardless of their tone language experiences, performed similarly to their non-Asian American counterparts in all tasks. Therefore, although this study provides additional evidence of transfer effects across music and language, it casts doubt on the contribution of ethnicity to differences observed in pitch perception and general music abilities.

  5. Integrating single-point vibrometer and full-field electronic speckle pattern interferometer to evaluate a micro-speaker

    Science.gov (United States)

    Chang, Wen-Chi; Chen, Yu-Chi; Chien, Chih-Jen; Wang, An-Bang; Lee, Chih-Kung

    2011-04-01

    A testing system containing an advanced vibrometer/interferometer device (AVID) and a high-speed electronic speckle pattern interferometer (ESPI) was developed. AVID is a laser Doppler vibrometer that can detect single-point linear and angular velocity with DC to 20 MHz bandwidth and nanometer resolution. In swept-frequency mode, the frequency response of the structure of interest can be measured from mHz to MHz. The ESPI experimental setup can be used to measure full-field out-of-plane displacement. A 5-1 phase shifting method and a correlation algorithm were used to analyze the phase difference between the reference signal and the speckle signal scattered from the sample surface. To show the efficiency and effectiveness of AVID and ESPI, we designed a micro-speaker composed of a plate with fixed boundaries and two piezo-actuators attached to the sides of the plate. The AVID was used to measure the vibration of one of the piezo-actuators, and the ESPI was adopted to measure the two-dimensional out-of-plane displacement of the plate. A microphone was used to measure the acoustic response created by the micro-speaker. Driving signals included random, sinusoidal, and amplitude-modulated high-frequency carrier signals, among others. The angular response induced by the amplitude-modulated high-frequency carrier signal was found to be significantly narrower than the frequency responses created by the other types of driving signals. The validity of the newly developed NDE system is detailed by comparing the vibration signal of the micro-speaker with the acoustic field it generates.
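
    The abstract refers to a "5-1 phase shifting method" without giving a formula. As a rough illustration of this class of technique, the sketch below applies the standard five-frame (Hariharan-style) phase-shifting formula, which recovers the wrapped phase from five interferograms captured at pi/2 phase steps; the authors' exact variant and their correlation algorithm may differ, and the synthetic frames are placeholders.

```python
# Standard five-frame phase-shifting formula (Hariharan), shown only to
# illustrate phase retrieval from phase-stepped interferograms; it is not
# necessarily the exact "5-1" variant used by the authors.
import numpy as np

def five_frame_phase(i1, i2, i3, i4, i5):
    """Wrapped phase from five intensity maps captured at pi/2 phase steps."""
    return np.arctan2(2.0 * (i2 - i4), 2.0 * i3 - i1 - i5)

# Example with synthetic 64x64 interferograms (placeholder data).
frames = [np.random.rand(64, 64) for _ in range(5)]
phase_map = five_frame_phase(*frames)      # values in (-pi, pi], radians
```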

  6. HotTips for Speakers: 25 Surefire Ways To Engage and Captivate Any Group or Audience.

    Science.gov (United States)

    Abernathy, Rob; Reardon, Mark

    From managing stage fright to keeping the audience hanging on their every word, experienced public speakers have the techniques to make every presentation memorable. This book contains a collection of 25 strategies for public speaking that have already worked for many people. Each "HotTip" (strategy) has been tested and used with…

  7. Linguistic skills of adult native speakers, as a function of age and level of education

    NARCIS (Netherlands)

    Mulder, K.; Hulstijn, J.H.

    2011-01-01

    This study assessed, in a sample of 98 adult native speakers of Dutch, how their lexical skills and their speaking proficiency varied as a function of their age and level of education and profession (EP). Participants, categorized in terms of their age (18-35, 36-50, and 51-76 years old) and the

  8. The role of fundamental frequency and formants in the perception of speaker sex

    Science.gov (United States)

    Hillenbrand, James M.

    2005-09-01

    The purpose of this study was to determine the relative contributions of fundamental frequency (F0) and formants in controlling the speaker-sex percept. A source-filter synthesizer was used to create four versions of 25 sentences spoken by men: (1) unmodified synthesis; (2) F0 only shifted up toward values typical of women; (3) formants only shifted up toward values typical of women; and (4) both F0 and formants shifted up. Identical methods were used to generate four comparable versions of 25 sentences spoken by women (e.g., unmodified synthesis, F0 only shifted down toward values typical of men, etc.). Listening tests showed: (1) perceived talker sex for the unmodified synthesis conditions was nearly always correct; (2) shifting both F0 and formants was usually effective (~82%) in changing the perceived sex of the utterance; (3) shifting either F0 or formants alone was usually ineffective in changing the perceived sex of the utterance. Both F0 and formants are apparently needed to specify speaker sex, though even together these cues are not entirely effective. Results also suggested that F0 is just slightly more important than formants, despite the fact that the male-female difference in F0 is proportionally much larger than the difference in formants. [Work supported by NIH.]
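
    The abstract describes shifting F0 and formants "toward values typical of women" (or men) before resynthesis, but it does not report the scale factors used. The sketch below is purely illustrative: the 1.7x F0 ratio and 15% formant ratio are rough male-to-female averages from the phonetics literature, not values taken from this study, and the source-filter resynthesis step itself is omitted.

```python
# Illustrative parameter shifts for the four synthesis conditions described in
# the abstract. Scale factors are rough literature averages, not study values.
F0_SCALE = 1.7        # mean adult female F0 is roughly 1.7x the male mean
FORMANT_SCALE = 1.15  # female formant frequencies are roughly 15% higher

def shift_parameters(f0_contour, formant_tracks, shift_f0, shift_formants):
    """Scale an F0 contour (Hz) and per-frame formant lists (Hz) male -> female."""
    f0 = [f * F0_SCALE if shift_f0 else f for f in f0_contour]
    formants = [[F * FORMANT_SCALE if shift_formants else F for F in frame]
                for frame in formant_tracks]
    return f0, formants

# The four conditions used for the male-talker sentences:
conditions = {
    "unmodified synthesis":    (False, False),
    "F0 only shifted":         (True,  False),
    "formants only shifted":   (False, True),
    "F0 and formants shifted": (True,  True),
}
```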

  9. Vocabulary Use by Low, Moderate, and High ASL-Proficient Writers Compared to Hearing ESL and Monolingual Speakers.

    Science.gov (United States)

    Singleton, Jenny L; Morgan, Dianne; DiGello, Elizabeth; Wiles, Jill; Rivers, Rachel

    2004-01-01

    The written English vocabulary of 72 deaf elementary school students of various proficiency levels in American Sign Language (ASL) was compared with the performance of 60 hearing English-as-a-second-language (ESL) speakers and 61 hearing monolingual speakers of English, all of similar age. Students were asked to retell "The Tortoise and the Hare" story (previously viewed on video) in a writing activity. Writing samples were later scored for total number of words, use of words known to be highly frequent in children's writing, redundancy in writing, and use of English function words. All deaf writers showed significantly lower use of function words as compared to their hearing peers. Low-ASL-proficient students demonstrated a highly formulaic writing style, drawing mostly on high-frequency words and repetitive use of a limited range of function words. The moderate- and high-ASL-proficient deaf students' writing was not formulaic and incorporated novel, low-frequency vocabulary to communicate their thoughts. The moderate- and high-ASL students' performance revealed a departure from findings one might expect based on previous studies with deaf writers and their vocabulary use. The writing of the deaf writers also differed from the writing of hearing ESL speakers. Implications for deaf education and literacy instruction are discussed, with special attention to the fact that ASL-proficient, deaf second-language learners of English may be approaching English vocabulary acquisition in ways that are different from hearing ESL learners.
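
    The four writing measures listed in the abstract (total words, high-frequency word use, redundancy, and function-word use) lend themselves to a simple illustration. The sketch below is hypothetical: the tiny word lists are placeholders for the established children's-writing frequency lists and function-word inventories such a study would use, and the redundancy formula is one plausible proxy, not the authors' exact metric.

```python
# Hypothetical scoring of a writing sample on the four measures named in the
# abstract; word lists and the redundancy proxy are illustrative placeholders.
HIGH_FREQUENCY = {"the", "and", "was", "ran", "fast", "slow", "won", "race"}
FUNCTION_WORDS = {"the", "a", "an", "and", "but", "of", "to", "in", "is", "was"}

def score_sample(text):
    tokens = text.lower().split()
    total = len(tokens)
    return {
        "total_words": total,
        "high_frequency_ratio": sum(t in HIGH_FREQUENCY for t in tokens) / total,
        "redundancy": 1 - len(set(tokens)) / total,     # share of repeated tokens
        "function_word_ratio": sum(t in FUNCTION_WORDS for t in tokens) / total,
    }

print(score_sample("the tortoise was slow but the tortoise won the race"))
```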

  10. A Comparative Study on the Use of Compliment Response Strategies by Persian and English Native Speakers

    Science.gov (United States)

    Shabani, Mansour; Zeinali, Maryam

    2015-01-01

    The significance of pragmatic knowledge and politeness strategies has recently been emphasized in language learning and teaching. Most communication failures originate in the lack of pragmatic awareness which is evident among EFL learners while communicating with English native speakers. The present study aimed at investigating compliment response…

  11. TEACHING TURKISH AS SPOKEN IN TURKEY TO TURKIC SPEAKERS - TÜRK DİLLİLERE TÜRKİYE TÜRKÇESİ ÖĞRETİMİ NASIL OLMALIDIR?

    Directory of Open Access Journals (Sweden)

    Ali TAŞTEKİN

    2015-12-01

    Full Text Available Attributing different titles to the activity of teaching Turkish to non-native speakers is related to the perspective of those who conduct this activity. If Turkish Language teaching centres are sub-units of Schools of Foreign Languages and Departments of Foreign Languages of our Universities or teachers have a foreign language background, then the title “Teaching Turkish as a Foreign Language” is adopted and claimed to be universal. In determining success at teaching and learning, the psychological perception of the educational activity and the associational power of the words used are far more important factors than the teacher, students, educational environment and educational tools. For this reason, avoiding the negative connotations of the adjective “foreign” in the activity of teaching foreigners Turkish as spoken in Turkey would be beneficial. In order for the activity of Teaching Turkish as Spoken in Turkey to Turkic Speakers to be successful, it is crucial to dwell on the formal and contextual quality of the books written for this purpose. Almost none of the course books and supplementary books in the field of teaching Turkish to non-native speakers have taken Teaching Turkish as Spoken in Turkey to Turkic Speakers into consideration. The books written for the purpose of teaching Turkish to non-native speakers should be examined thoroughly in terms of content and method and should be organized in accordance with the purpose and level of readiness of the target audience. Activities of Teaching Turkish as Spoken in Turkey to Turkic Speakers are still conducted at public and private primary and secondary schools and colleges as well as private courses by self-educated teachers who are trained within a master-apprentice relationship. Turkic populations who had long been parted by necessity have found the opportunity to reunite and turn towards common objectives after the dissolution of The Union of Soviet Socialist Republics. This recent

  12. Valence, arousal, and task effects in emotional prosody processing

    Directory of Open Access Journals (Sweden)

    Silke ePaulmann

    2013-06-01

    Full Text Available Previous research suggests that emotional prosody processing is a highly rapid and complex process. In particular, it has been shown that different basic emotions can be differentiated in an early event-related brain potential (ERP) component, the P200. Often, the P200 is followed by later, long-lasting ERPs such as the late positive complex (LPC). The current experiment set out to explore to what extent emotionality and arousal can modulate these previously reported ERP components. In addition, we also investigated the influence of task demands (implicit vs. explicit evaluation of stimuli). Participants listened to pseudo-sentences (sentences with no lexical content) spoken in six different emotions or in a neutral tone of voice while they either rated the arousal level of the speaker or their own arousal level. Results confirm that different emotional intonations can first be differentiated in the P200 component, reflecting a first emotional encoding of the stimulus, possibly including a valence tagging process. A marginally significant arousal effect was also found in this time window, with high-arousing stimuli eliciting a stronger P200 than low-arousing stimuli. The P200 component was followed by a long-lasting positive ERP between 400 and 750 ms. In this late time window, both emotion and arousal effects were found. No effects of task were observed in either time window. Taken together, the results suggest that emotion-relevant details are robustly decoded during early and late processing stages, while arousal information is only reliably taken into consideration at a later stage of processing.
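
    The abstract quantifies its effects as amplitude differences in an early P200 window and a later 400-750 ms window. The sketch below shows only the generic window-averaging step behind such an analysis; the epochs array is placeholder data, and the P200 window bounds used here (150-250 ms) are an assumption, since the abstract does not state them.

```python
# Generic ERP time-window averaging: mean amplitude per trial in a P200 window
# and in the 400-750 ms window mentioned in the abstract. Data are placeholders.
import numpy as np

def window_mean(epochs, times, t_start, t_end):
    """epochs: (n_trials, n_samples) in microvolts; times: (n_samples,) in seconds."""
    mask = (times >= t_start) & (times < t_end)
    return epochs[:, mask].mean(axis=1)              # one mean amplitude per trial

fs = 250.0                                           # assumed sampling rate, Hz
times = np.arange(-0.2, 1.0, 1.0 / fs)               # -200 ms to +1000 ms epochs
epochs = np.random.randn(40, times.size)             # 40 placeholder trials

p200_mean = window_mean(epochs, times, 0.150, 0.250).mean()   # assumed P200 window
late_mean = window_mean(epochs, times, 0.400, 0.750).mean()   # LPC-type window
```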

  13. Native Speaker Norms and China English: From the Perspective of Learners and Teachers in China

    Science.gov (United States)

    He, Deyuan; Zhang, Qunying

    2010-01-01

    This article explores the question of whether the norms based on native speakers of English should be kept in English teaching in an era when English has become World Englishes. This is an issue that has been keenly debated in recent years, not least in the pages of "TESOL Quarterly." However, "China English" in such debates…

  14. Instrumental Analysis of the English Stops Produced by Arabic Speakers of English

    Directory of Open Access Journals (Sweden)

    Noureldin Mohamed Abdelaal

    2017-07-01

    Full Text Available This study reports the findings of research conducted on ten (10) Arab students who were enrolled in a master of English applied linguistics program at Universiti Putra Malaysia. The research aimed at instrumentally analyzing the English stops produced by Arab learners in terms of voice onset time (VOT); identifying the effect of their mother tongue on producing the English stops; and determining the extent to which Arabic speakers of English differentiate between minimal pairs in terms of pronunciation. The findings of the study showed that some of the subjects’ VOT values were similar to those of native speakers of English. It was also found that the subjects could differentiate in terms of aspiration or voicing between /p/ and /b/, which refutes the assumption that Arab learners have a problem producing the /p/ sound with appropriate aspiration. However, they did not show a significant difference in pronunciation between /t/ and /d/ or between /k/ and /g/. Moreover, there is a limited effect of the L1 on producing some stops (e.g., /t/ and /g/). However, for the /b/ sound, it cannot be inferred that there is interference from the mother tongue, because its VOT value is almost the same in English and Arabic. This research suggests that teachers need to enhance Arab learners’ pronunciation of some minimal pairs such as /t/ and /d/ or /k/ and /g/.
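
    Voice onset time itself is a simple measurement: the interval between the release burst of a stop and the onset of voicing. The sketch below illustrates that arithmetic and a rough long-lag (aspirated) classification; the 25 ms threshold is a common rule of thumb from the VOT literature, not a criterion reported in this study, and the timestamps are invented.

```python
# Voice onset time (VOT): voicing onset minus burst release, in milliseconds.
# Positive VOT = voicing lag (as in aspirated stops); negative VOT = voicing lead.
def vot_ms(burst_time_s, voicing_onset_s):
    return (voicing_onset_s - burst_time_s) * 1000.0

def is_long_lag(vot, threshold_ms=25.0):
    """Rough aspirated/unaspirated split; the threshold is a rule of thumb."""
    return vot > threshold_ms

vot = vot_ms(burst_time_s=0.512, voicing_onset_s=0.578)    # invented timestamps
print(round(vot), "ms, long-lag:", is_long_lag(vot))       # 66 ms, long-lag: True
```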

  15. The Understanding of English Emotion Words by Chinese and Japanese Speakers of English as a Lingua Franca

    DEFF Research Database (Denmark)

    Mosekjær, Stine

    In this thesis I investigate the understanding and use of the English emotion words guilty, ashamed, and proud by Japanese and Chinese speakers of English as a lingua franca. By exploring empirical data I examine (1) how Japanese and Chinese participants understand and use the three stimulus words..., (2) if their understanding and use differ from that of native English speakers, and (3) if so, what these differences are. In the thesis 65 participants are investigated. The participants consist of 20 native Japanese and 23 native Chinese. For comparison, a group of 22 British native English... The framework, which is based on the theoretical notion of the word as an image-idea pair as suggested by the theory of linguistic supertypes, consists of three tests each addressing three different aspects of the understanding and use of the stimulus words: the Free Association test (FA test), the Context...

  16. L2 Romanian Influence in the Acquisition of the English Passive by L1 Speakers of Hungarian

    Directory of Open Access Journals (Sweden)

    Tankó Enikő

    2015-03-01

    Full Text Available The main question to be investigated is to what extent native speakers of Hungarian understand and acquire the English passive voice, as there is no generalized syntactic passive construction in Hungarian. As we will show, native speakers of Hungarian tend to use the predicative verbal adverbial construction when translating English passive sentences, as this construction is the closest syntactic equivalent of the English passive voice. Another question to be investigated is whether L2 Romanian works as a facilitating factor in the process of acquiring the L3 English passive voice. If all our subjects, Hungarian students living in Romania, were Hungarian-Romanian bilinguals, it would be obvious that knowledge of Romanian helps them in acquiring the English passive. However, as it will be shown, the bilingualism hypothesis is disconfirmed. Still, passive knowledge of Romanian influences to some extent the acquisition of the English passive voice.

  17. Barriers beyond words: cancer, culture, and translation in a community of Russian speakers.

    Science.gov (United States)

    Dohan, Daniel; Levintova, Marya

    2007-11-01

    Language and culture relate in complex ways. Addressing this complexity in the context of language translation is a challenge when caring for patients with limited English proficiency (LEP). The objective of this study is to examine processes of care related to language, culture, and translation in an LEP population. We used community-based participatory research to examine the experiences of Russian-speaking cancer patients in San Francisco, California. A Russian Cancer Information Taskforce (RCIT), including community-based organizations, local government, and clinics, participated in all phases of the study. The participants were a purposeful sample of 74 individuals. The RCIT shaped research themes and facilitated access to participants. The methods were focus groups, individual interviews, and participant observation. The RCIT reviewed data and provided guidance in interpreting results. Four themes emerged: (1) local Russian-language resources were seen as inadequate and relatively unavailable compared to those in other non-English languages; (2) a taboo about the word "cancer" led to language "games" surrounding disclosure; (3) this taboo, and other dynamics of care, reflected expectations that Russian speakers derived from experiences in their countries of origin; (4) using interpreters as cultural brokers or establishing support groups for Russian speakers could help address barriers. The language barriers experienced by this LEP population reflect cultural and linguistic issues. Providers should consider partnering with trained interpreters to address the intertwining of language and culture.

  18. Evaluation of a speaker identification system with and without fusion using three databases in the presence of noise and handset effects

    Science.gov (United States)

    S. Al-Kaltakchi, Musab T.; Woo, Wai L.; Dlay, Satnam; Chambers, Jonathon A.

    2017-12-01

    In this study, a speaker identification system is considered, consisting of a feature extraction stage that utilizes both power-normalized cepstral coefficients (PNCCs) and Mel-frequency cepstral coefficients (MFCCs). Normalization is applied by employing cepstral mean and variance normalization (CMVN) and feature warping (FW), together with acoustic modeling using a Gaussian mixture model-universal background model (GMM-UBM). The main contributions are comprehensive evaluations of the effect of both additive white Gaussian noise (AWGN) and non-stationary noise (NSN) (with and without a G.712 type handset) upon identification performance. In particular, three NSN types with varying signal-to-noise ratios (SNRs) were tested, corresponding to street traffic, a bus interior, and a crowded talking environment. The performance evaluation also considered the effect of late fusion techniques based on score fusion, namely mean, maximum, and linear weighted sum fusion. The databases employed were TIMIT, SITW, and NIST 2008; 120 speakers were selected from each database to yield 3600 speech utterances. As recommendations from the study, mean fusion is found to yield the best overall performance in terms of speaker identification accuracy (SIA) with noisy speech, whereas linear weighted sum fusion is best overall for original database recordings.
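
    The three late-fusion rules compared in the abstract (mean, maximum, and linear weighted sum of per-system scores) are easy to state concretely. The sketch below is a generic illustration: the toy score vectors and the 0.7/0.3 weights are placeholders rather than values from the study, and the GMM-UBM scoring that would produce such scores is not shown.

```python
# Generic score-level (late) fusion of two speaker-ID subsystems:
# mean, maximum, and linear weighted sum. Scores and weights are placeholders.
import numpy as np

def fuse(scores_a, scores_b, method="mean", weights=(0.7, 0.3)):
    a, b = np.asarray(scores_a, float), np.asarray(scores_b, float)
    if method == "mean":
        return (a + b) / 2.0
    if method == "max":
        return np.maximum(a, b)
    if method == "weighted":
        return weights[0] * a + weights[1] * b
    raise ValueError(f"unknown fusion method: {method}")

# Per-enrolled-speaker scores from, say, an MFCC-based and a PNCC-based system.
mfcc_scores = [-1.2, 0.4, 2.3]
pncc_scores = [-0.8, 0.9, 1.7]
identified = int(np.argmax(fuse(mfcc_scores, pncc_scores, "mean")))   # index 2
```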

  19. Examining the Native Speakers' Understanding of Communicative Purposes of a Written Genre in Modern Standard Chinese.

    Science.gov (United States)

    Yunxia, Zhu

    1997-01-01

    Examines the different attitudes of native speakers in understanding a written genre of Modern Standard Chinese--sales letters. The study focuses on the use of formulaic components appearing in real Chinese sales letters and compares these components with the advice given in textbooks. Findings reveal a gap between business teaching and business…

  20. Noun and verb knowledge in monolingual preschool children across 17 languages: Data from Cross-linguistic Lexical Tasks (LITMUS-CLT).

    Science.gov (United States)

    Haman, Ewa; Łuniewska, Magdalena; Hansen, Pernille; Simonsen, Hanne Gram; Chiat, Shula; Bjekić, Jovana; Blažienė, Agnė; Chyl, Katarzyna; Dabašinskienė, Ineta; Engel de Abreu, Pascale; Gagarina, Natalia; Gavarró, Anna; Håkansson, Gisela; Harel, Efrat; Holm, Elisabeth; Kapalková, Svetlana; Kunnari, Sari; Levorato, Chiara; Lindgren, Josefin; Mieszkowska, Karolina; Montes Salarich, Laia; Potgieter, Anneke; Ribu, Ingeborg; Ringblom, Natalia; Rinker, Tanja; Roch, Maja; Slančová, Daniela; Southwood, Frenette; Tedeschi, Roberta; Tuncer, Aylin Müge; Ünal-Logacev, Özlem; Vuksanović, Jasmina; Armon-Lotem, Sharon

    2017-01-01

    This article investigates the cross-linguistic comparability of the newly developed lexical assessment tool Cross-linguistic Lexical Tasks (LITMUS-CLT). LITMUS-CLT is a part of the Language Impairment Testing in Multilingual Settings (LITMUS) battery (Armon-Lotem, de Jong & Meir, 2015). Here we analyse results on receptive and expressive word knowledge tasks for nouns and verbs across 17 languages from eight different language families: Baltic (Lithuanian), Bantu (isiXhosa), Finnic (Finnish), Germanic (Afrikaans, British English, South African English, German, Luxembourgish, Norwegian, Swedish), Romance (Catalan, Italian), Semitic (Hebrew), Slavic (Polish, Serbian, Slovak) and Turkic (Turkish). The participants were 639 monolingual children aged 3;0-6;11 living in 15 different countries. Differences in vocabulary size were small among 16 of the languages, but isiXhosa-speaking children knew significantly fewer words than speakers of the other languages. There was a robust effect of word class: accuracy was higher for nouns than verbs. Furthermore, comprehension was more advanced than production. Results are discussed in the context of cross-linguistic comparisons of lexical development in monolingual and bilingual populations.