robust speaker identification: Topics by WorldWideScience.org

Sample records for robust speaker identification

Robust Digital Speech Watermarking For Online Speaker Recognition

Directory of Open Access Journals (Sweden)

Mohammad Ali Nematollahi

2015-01-01

A robust and blind digital speech watermarking technique has been proposed for online speaker recognition systems based on Discrete Wavelet Packet Transform (DWPT) and multiplication to embed the watermark in the amplitudes of the wavelet’s subbands. To minimize the degradation caused by the watermark, subbands carrying less speaker-specific information are selected (500 Hz–3500 Hz and 6000 Hz–7000 Hz). Experimental results on Texas Instruments Massachusetts Institute of Technology (TIMIT), Massachusetts Institute of Technology (MIT), and Mobile Biometry (MOBIO) show that the degradation for speaker verification and identification is 1.16% and 2.52%, respectively. Furthermore, the proposed watermark technique provides sufficient robustness against different signal processing attacks.
LEARNING VECTOR QUANTIZATION FOR ADAPTED GAUSSIAN MIXTURE MODELS IN AUTOMATIC SPEAKER IDENTIFICATION

Directory of Open Access Journals (Sweden)

IMEN TRABELSI

2017-05-01

Speaker Identification (SI) aims at automatically identifying an individual by extracting and processing information from his/her voice. Speaker voice is a robust biometric modality that has a strong impact in several application areas. In this study, a new combination learning scheme has been proposed based on Gaussian mixture model-universal background model (GMM-UBM) and Learning vector quantization (LVQ) for automatic text-independent speaker identification. Feature vectors, constituted by the Mel Frequency Cepstral Coefficients (MFCC) extracted from the speech signal, are used to train the system on the New England subset of the TIMIT database. The best results on test data, obtained using 36 MFCC features, are 90% for gender-independent speaker identification, 97% for male speakers and 93% for female speakers.
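As a rough illustration of the LVQ part of the scheme described above, the following minimal sketch shows one step of the textbook LVQ1 update rule: the nearest prototype is pulled toward a training vector with the same label and pushed away otherwise. It is not the authors' GMM-UBM/LVQ combination; the function name `lvq1_step`, the learning rate and the toy vectors are assumptions made for illustration.

```python
def lvq1_step(prototypes, labels, x, y, lr=0.1):
    """Apply one LVQ1 update in place and return the index of the updated prototype.

    prototypes: list of feature vectors (lists of floats)
    labels:     class label of each prototype
    x, y:       one training vector and its class label
    """
    # Find the prototype closest to x (squared Euclidean distance).
    k = min(
        range(len(prototypes)),
        key=lambda i: sum((p - xi) ** 2 for p, xi in zip(prototypes[i], x)),
    )
    # Attract the prototype if the labels agree, repel it otherwise.
    sign = 1.0 if labels[k] == y else -1.0
    prototypes[k] = [p + sign * lr * (xi - p) for p, xi in zip(prototypes[k], x)]
    return k


# Hypothetical 2-D prototypes for two speakers.
protos = [[0.0, 0.0], [10.0, 10.0]]
proto_labels = ["spk_a", "spk_b"]
lvq1_step(protos, proto_labels, [1.0, 1.0], "spk_a", lr=0.5)
print(protos[0])
```

In practice the prototypes would live in the same feature space as the speaker models (for example MFCC-derived vectors), and the learning rate would decay over training.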
Cost-Sensitive Learning for Emotion Robust Speaker Recognition

Directory of Open Access Journals (Sweden)

Dongdong Li

2014-01-01

In the field of information security, voice is one of the most important parts in biometrics. Especially, with the development of voice communication through the Internet or telephone system, huge voice data resources are accessed. In speaker recognition, voiceprint can be applied as the unique password for the user to prove his/her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of the speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technology to reweight the probability of test affective utterances at the pitch envelope level, which can effectively enhance the robustness of emotion-dependent speaker recognition. Based on that technology, a new architecture of recognition system as well as its components is proposed in this paper. The experiment conducted on the Mandarin Affective Speech Corpus shows that an improvement of 8% in identification rate over the traditional speaker recognition is achieved.
Cost-sensitive learning for emotion robust speaker recognition.

Science.gov (United States)

Li, Dongdong; Yang, Yingchun; Dai, Weihui

2014-01-01

In the field of information security, voice is one of the most important parts in biometrics. Especially, with the development of voice communication through the Internet or telephone system, huge voice data resources are accessed. In speaker recognition, voiceprint can be applied as the unique password for the user to prove his/her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of the speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technology to reweight the probability of test affective utterances at the pitch envelope level, which can effectively enhance the robustness of emotion-dependent speaker recognition. Based on that technology, a new architecture of recognition system as well as its components is proposed in this paper. The experiment conducted on the Mandarin Affective Speech Corpus shows that an improvement of 8% in identification rate over the traditional speaker recognition is achieved.
Robustness-related issues in speaker recognition

CERN Document Server

Zheng, Thomas Fang

2017-01-01

This book presents an overview of speaker recognition technologies with an emphasis on dealing with robustness issues. Firstly, the book gives an overview of speaker recognition, such as the basic system framework, categories under different criteria, performance evaluation and its development history. Secondly, with regard to robustness issues, the book presents three categories, including environment-related issues, speaker-related issues and application-oriented issues. For each category, the book describes the current hot topics, existing technologies, and potential research focuses in the future. The book is a useful reference book and self-learning guide for early researchers working in the field of robust speech recognition.
Robust speaker recognition in noisy environments

CERN Document Server

Rao, K Sreenivasa

2014-01-01

This book discusses speaker recognition methods to deal with realistic variable noisy environments. The text covers authentication systems that are robust to noisy background environments, function in real time, and can be incorporated in mobile devices. The book focuses on different approaches to enhance the accuracy of speaker recognition in the presence of varying background environments. The authors examine: (a) feature compensation using multiple background models, (b) feature mapping using data-driven stochastic models, (c) design of a supervector-based GMM-SVM framework for robust speaker recognition, (d) total variability modeling (i-vectors) in a discriminative framework, and (e) a boosting method to fuse evidence from multiple SVM models.
Multi-Frame Rate Based Multiple-Model Training for Robust Speaker Identification of Disguised Voice

DEFF Research Database (Denmark)

Prasad, Swati; Tan, Zheng-Hua; Prasad, Ramjee

2013-01-01

Speaker identification systems are prone to attack when voice disguise is adopted by the user. To address this issue, our paper studies the effect of using different frame rates on the accuracy of the speaker identification system for disguised voice. In addition, a multi-frame rate based multiple......-model training method is proposed. The experimental results show the superior performance of the proposed method compared to the commonly used single frame rate method for three types of disguised voice taken from the CHAINS corpus....
Pitch Correlogram Clustering for Fast Speaker Identification

Directory of Open Access Journals (Sweden)

Nitin Jhanwar

2004-12-01

Gaussian mixture models (GMMs) are commonly used in text-independent speaker identification systems. However, for large speaker databases, their high computational run-time limits their use in online or real-time speaker identification situations. Two-stage identification systems, in which the database is partitioned into clusters based on some proximity criteria and only a single-cluster GMM is run in every test, have been suggested in the literature to speed up the identification process. However, most clustering algorithms used have shown limited success, apparently because the clustering and GMM feature spaces used are derived from similar speech characteristics. This paper presents a new clustering approach based on the concept of a pitch correlogram that captures frame-to-frame pitch variations of a speaker rather than short-time spectral characteristics like cepstral coefficients, spectral slopes, and so forth. The effectiveness of this two-stage identification process is demonstrated on the IVIE corpus of 110 speakers. The overall system achieves a run-time advantage of 500% as well as a 10% reduction of error in overall speaker identification.
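The two-stage idea in the abstract above (pick one cluster first, then score only the speakers in that cluster) can be sketched as follows. This is a simplified illustration, not the paper's pitch-correlogram method: the 1-D clustering feature, the cluster names and the precomputed model scores are all hypothetical stand-ins.

```python
def nearest_cluster(feature, centroids):
    """Return the id of the cluster whose centroid is closest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(feature, centroids[c])),
    )


def two_stage_identify(cluster_feature, centroids, members, score):
    """Stage 1 picks one cluster; stage 2 scores only that cluster's speakers."""
    cluster = nearest_cluster(cluster_feature, centroids)
    return max(members[cluster], key=score)


# Hypothetical toy data: a 1-D clustering feature and precomputed model scores.
centroids = {"low_pitch": [110.0], "high_pitch": [210.0]}
members = {"low_pitch": ["spk1", "spk2"], "high_pitch": ["spk3", "spk4"]}
model_scores = {"spk1": -12.0, "spk2": -9.5, "spk3": -8.0, "spk4": -20.0}

print(two_stage_identify([120.0], centroids, members, model_scores.get))
```

Note the trade-off this exposes: only the selected cluster is searched, so a speaker outside it can never be returned, which is why the quality of the clustering feature matters.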
FPGA Implementation for GMM-Based Speaker Identification

Directory of Open Access Journals (Sweden)

Phaklen EhKan

2011-01-01

In today's society, highly accurate personal identification systems are required. Passwords or PINs can be forgotten or forged and are no longer considered to offer a high level of security. The use of biological features, biometrics, is becoming widely accepted as the next level for security systems. Biometric-based speaker identification is a method of identifying persons from their voice. Speaker-specific characteristics exist in speech signals due to different speakers having different resonances of the vocal tract. These differences can be exploited by extracting feature vectors such as Mel-Frequency Cepstral Coefficients (MFCCs) from the speech signal. A well-known statistical modelling process, the Gaussian Mixture Model (GMM), then models the distribution of each speaker's MFCCs in a multidimensional acoustic space. The GMM-based speaker identification system has features that make it promising for hardware acceleration. This paper describes the hardware implementation for classification of a text-independent GMM-based speaker identification system. The aim was to produce a system that can perform simultaneous identification of large numbers of voice streams in real time. This has important potential applications in security and in automated call centre applications. A speedup factor of ninety was achieved compared to a software implementation on a standard PC.
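The classification step that such a system accelerates is, in its standard textbook form, a log-likelihood comparison across per-speaker GMMs. The sketch below shows that computation in plain software, assuming diagonal covariances; it says nothing about the paper's hardware design, and the model values and speaker names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under one speaker's GMM.

    gmm is a list of (weight, mean, var) tuples, one per mixture component.
    """
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(comp)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total


def identify(frames, speaker_models):
    """Return the speaker whose GMM gives the highest total log-likelihood."""
    return max(
        speaker_models,
        key=lambda s: gmm_log_likelihood(frames, speaker_models[s]),
    )


# Toy 2-D "MFCC" models (hypothetical values, for illustration only).
models = {
    "alice": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "bob": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
test_frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
print(identify(test_frames, models))
```

Each frame and each mixture component is scored independently before the sums are taken, which is the kind of regular, parallel arithmetic that suits hardware pipelines.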
Noise Reduction with Microphone Arrays for Speaker Identification

Energy Technology Data Exchange (ETDEWEB)

Cohen, Z

2011-12-22

Reducing acoustic noise in audio recordings is an ongoing problem that plagues many applications. This noise is hard to reduce because of interfering sources and non-stationary behavior of the overall background noise. Many single channel noise reduction algorithms exist but are limited in that the more the noise is reduced, the more the signal of interest is distorted, because the signal and noise overlap in frequency. Specifically, acoustic background noise causes problems in the area of speaker identification. Recording a speaker in the presence of acoustic noise ultimately limits the performance and confidence of speaker identification algorithms. In situations where it is impossible to control the environment where the speech sample is taken, noise reduction filtering algorithms need to be developed to clean the recorded speech of background noise. Because single channel noise reduction algorithms would distort the speech signal, the overall challenge of this project was to see if spatial information provided by microphone arrays could be exploited to aid in speaker identification. The goals are: (1) Test the feasibility of using microphone arrays to reduce background noise in speech recordings; (2) Characterize and compare different multichannel noise reduction algorithms; (3) Provide recommendations for using these multichannel algorithms; and (4) Ultimately answer the question: can the use of microphone arrays aid in speaker identification?
Developing a Speaker Identification System for the DARPA RATS Project

DEFF Research Database (Denmark)

Plchot, O; Matsoukas, S; Matejka, P

2013-01-01

This paper describes the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state of the art detection capabilities on audio from highly degraded communication channels. ...... such as CFCCs out-perform MFCC front-ends on noisy audio, and (c) fusion of multiple systems provides 24% relative improvement in EER compared to the single best system when using a novel SVM-based fusion algorithm that uses side information such as gender, language, and channel id....
A Joint Approach for Single-Channel Speaker Identification and Speech Separation

DEFF Research Database (Denmark)

Mowlaee, Pejman; Saeidi, Rahim; Christensen, Mads Græsbøll

2012-01-01

In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification a single-channel speaker identification algorithm is proposed which provides an estimate of signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose a sinusoidal model-based algorithm. The speech separation algorithm consists of a double-talk/single-talk detector followed by a minimum mean square error estimator of sinusoidal parameters for finding optimal codevectors from pre-trained speaker codebooks. In evaluating the proposed system, we start from...... ) accuracy, here, we report the objective and subjective results as well. The results show that the proposed system performs as well as the best of the state-of-the-art in terms of perceived quality while its performance in terms of speaker identification and automatic speech recognition results......
Signal-to-Signal Ratio Independent Speaker Identification for Co-channel Speech Signals

DEFF Research Database (Denmark)

Saeidi, Rahim; Mowlaee, Pejman; Kinnunen, Tomi

2010-01-01

In this paper, we consider speaker identification for the co-channel scenario in which speech mixture from speakers is recorded by one microphone only. The goal is to identify both of the speakers from their mixed signal. High recognition accuracies have already been reported when an accurately...
Using Avatars for Improving Speaker Identification in Captioning

Science.gov (United States)

Vy, Quoc V.; Fels, Deborah I.

Captioning is the main method for accessing television and film content by people who are deaf or hard-of-hearing. One major difficulty consistently identified by the community is that of knowing who is speaking, particularly for an off-screen narrator. A captioning system was created using a participatory design method to improve speaker identification. The final prototype contained avatars and a coloured border for identifying specific speakers. Evaluation results were very positive; however, participants also wanted to customize various components such as caption and avatar location.
Speaker identification for the improvement of the security communication between law enforcement units

Science.gov (United States)

Tovarek, Jaromir; Partila, Pavol

2017-05-01

This article discusses speaker identification for improving the security of communication between law enforcement units. The main task of this research was to develop a text-independent speaker identification system that can be used for real-time recognition. The system is designed for open-set identification, meaning that the unknown speaker can be anyone. The communication itself is secured, but the authorization of the communicating parties has to be checked: we have to decide whether the unknown speaker is authorized for the given action. The calls are recorded by an IP telephony server and these recordings are then evaluated using classification. If the system determines that the speaker is not authorized, it sends a warning message to the administrator. This message can indicate, for example, a stolen phone or another unusual situation. The administrator then performs the appropriate actions. The proposed system uses a multilayer neural network for classification, consisting of three layers (input layer, hidden layer, and output layer). The number of neurons in the input layer corresponds to the length of the speech feature vector. The output layer represents the classified speakers. The artificial neural network classifies the speech signal frame by frame, but the final decision is made over the complete recording. This rule substantially increases the accuracy of the classification. The input data for the neural network are thirteen Mel-frequency cepstral coefficients, which describe the behavior of the vocal tract. These parameters are the most commonly used for speaker recognition. Parameters for training, testing and validation were extracted from recordings of authorized users. Recording conditions for training data correspond to the real traffic of the system (sampling frequency, bit rate). The main benefit of the research is the developed text-independent speaker identification system, which is applied to secure communication between law enforcement units.
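The abstract above states that frames are classified individually while the decision is taken over the whole recording, and that the set is open. One plausible way to combine the two is sketched below: average the per-frame network outputs and reject the recording when the best average falls under a threshold. The averaging rule, the threshold value and all names are assumptions, since the abstract does not specify them.

```python
def decide_speaker(frame_posteriors, speakers, threshold=0.6):
    """Average per-frame class scores over a recording and make one decision.

    Returns the best-scoring speaker, or None when the averaged score is
    below `threshold` (treated here as an unknown or unauthorized speaker).
    """
    n = len(frame_posteriors)
    avg = [
        sum(frame[k] for frame in frame_posteriors) / n
        for k in range(len(speakers))
    ]
    best = max(range(len(speakers)), key=lambda k: avg[k])
    return speakers[best] if avg[best] >= threshold else None


# Hypothetical network outputs for two recordings of three frames each.
speakers = ["officer_a", "officer_b"]
clear = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]
ambiguous = [[0.5, 0.5], [0.55, 0.45], [0.45, 0.55]]

print(decide_speaker(clear, speakers))
print(decide_speaker(ambiguous, speakers))
```

In the first recording one frame favours the wrong speaker, yet the recording-level decision is still correct, which illustrates why deciding over the complete record can be more accurate than deciding per frame.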
Performance of svm, k-nn and nbc classifiers for text-independent speaker identification with and without modelling through merging models

Directory of Open Access Journals (Sweden)

Yussouf Nahayo

2016-04-01

This paper proposes some methods of robust text-independent speaker identification based on Gaussian Mixture Model (GMM). We implemented a combination of the GMM model with a set of classifiers such as Support Vector Machine (SVM), K-Nearest Neighbour (K-NN), and Naive Bayes Classifier (NBC). In order to improve the identification rate, we developed a combination of hybrid systems by using a validation technique. The experiments were performed on the dialect DR1 of the TIMIT corpus. The results showed better performance for the developed technique compared to the individual techniques.
Robust Speaker Authentication Based on Combined Speech and Voiceprint Recognition

Science.gov (United States)

Malcangi, Mario

2009-08-01

Personal authentication is becoming increasingly important in many applications that have to protect proprietary data. Passwords and personal identification numbers (PINs) prove not to be robust enough to ensure that unauthorized people do not use them. Biometric authentication technology may offer a secure, convenient, accurate solution but sometimes fails due to its intrinsically fuzzy nature. This research aims to demonstrate that combining two basic speech processing methods, voiceprint identification and speech recognition, can provide a very high degree of robustness, especially if fuzzy decision logic is used.
Speaker Recognition

DEFF Research Database (Denmark)

Mølgaard, Lasse Lohilahti; Jørgensen, Kasper Winther

2005-01-01

Speaker recognition is basically divided into speaker identification and speaker verification. Verification is the task of automatically determining if a person really is the person he or she claims to be. This technology can be used as a biometric feature for verifying the identity of a person...
Similar speaker recognition using nonlinear analysis

International Nuclear Information System (INIS)

Seo, J.P.; Kim, M.S.; Baek, I.C.; Kwon, Y.H.; Lee, K.S.; Chang, S.W.; Yang, S.I.

2004-01-01

Speech features of the conventional speaker identification system are usually obtained by linear methods in spectral space. However, these methods have the drawback that speakers with similar voices cannot be distinguished, because the characteristics of their voices are also similar in spectral space. To overcome this difficulty with linear methods, we propose to use the correlation exponent in the nonlinear space as a new feature vector for speaker identification among persons with similar voices. We show that our proposed method surprisingly reduces the error rate of the speaker identification system for speakers with similar voices.
Using Closed-Set Speaker Identification Score Confidence to Enhance Audio-Based Collaborative Filtering for Multiple Users

DEFF Research Database (Denmark)

Shepstone, Sven Ewan; Tan, Zheng-Hua; Kristoffersen, Miklas Strøm

2018-01-01

In this paper, we utilize a closed-set speaker-identification approach to convey the ratings needed for collaborative filtering-based recommendation. Instead of explicitly providing a rating for a given program, users use a speech interface to dictate the desired rating after watching a movie. Due...... to the inaccuracies that may be imposed by a state-of-the-art speaker identification system, it is possible to mistake a user for another user in the household, especially when the users exhibit similar or identical age and gender demographics. This leads to the undesirable effect of injecting unwanted ratings...... into the collaborative rating matrix, and when the users have different tastes, can result in the recommendation of undesirable items. We therefore propose a simple confidence-based heuristic that utilizes the log-likelihood scores from the speaker identification front-end. The algorithm limits the degree to which...

Text-Independent Speaker Identification Using the Histogram Transform Model

DEFF Research Database (Denmark)

Ma, Zhanyu; Yu, Hong; Tan, Zheng-Hua

2016-01-01

In this paper, we propose a novel probabilistic method for the task of text-independent speaker identification (SI). In order to capture the dynamic information during SI, we design super-MFCC features by cascading three neighboring Mel-frequency cepstral coefficient (MFCC) frames together....... These super-MFCC vectors are utilized for probabilistic model training such that the speaker’s characteristics can be sufficiently captured. The probability density function (PDF) of the aforementioned super-MFCC features is estimated by the recently proposed histogram transform (HT) method. To recedes...
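The cascading of three neighboring MFCC frames mentioned above can be sketched as a simple concatenation. The abstract does not say which three frames are joined or how the edges are handled; the version below assumes previous, current and next frame, and drops the first and last frame.

```python
def stack_neighbors(frames):
    """Concatenate each frame with its previous and next neighbour.

    Edge frames have no complete neighbourhood and are dropped, so T input
    frames of dimension D yield T - 2 stacked vectors of dimension 3 * D.
    """
    return [
        frames[t - 1] + frames[t] + frames[t + 1]
        for t in range(1, len(frames) - 1)
    ]


# Four hypothetical 2-D frames give two 6-D stacked vectors.
mfcc = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(stack_neighbors(mfcc))
```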
Speaker gender identification based on majority vote classifiers

Science.gov (United States)

Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

2017-03-01

Speaker gender identification is considered among the most important tools in several multimedia applications, namely in automatic speech recognition, interactive voice response systems and audio browsing systems. The performance of gender identification systems is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best-performing classification method or searching for the optimum tuning of one classifier's parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch, MFCCs as well as other temporal and frequency-domain descriptors. Five classification models including decision tree, discriminant analysis, naive Bayes, support vector machine and k-nearest neighbor were evaluated. The three best-performing classifiers among the five contribute by majority voting between their scores. Experiments were performed on three different datasets spoken in three languages (English, German and Arabic) in order to validate the language independence of the proposed scheme. Results confirm that the presented system has reached a satisfying accuracy rate and promising classification performance thanks to the discriminating abilities and diversity of the used features combined with mid-level statistics.
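The selection-then-voting scheme described above can be sketched in a few lines: rank the classifiers by a validation accuracy, keep the top three, and take the label most of them agree on. The accuracy figures and predictions below are invented for illustration, and voting on hard labels is an assumption; the abstract speaks of voting "between their scores" without further detail.

```python
from collections import Counter


def top_three(accuracies):
    """Return the names of the three classifiers with the highest accuracy."""
    return sorted(accuracies, key=accuracies.get, reverse=True)[:3]


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical validation accuracies and per-classifier predictions.
val_accuracy = {"tree": 0.88, "lda": 0.91, "nb": 0.85, "svm": 0.95, "knn": 0.93}
predicted = {"tree": "male", "lda": "female", "nb": "male", "svm": "female", "knn": "male"}

chosen = top_three(val_accuracy)
print(chosen, majority_vote([predicted[name] for name in chosen]))
```

With three voters and two classes a tie cannot occur, which is one practical reason to vote with an odd number of classifiers.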
Joint Single-Channel Speech Separation and Speaker Identification

DEFF Research Database (Denmark)

Mowlaee, Pejman; Saeidi, Rahim; Tan, Zheng-Hua

2010-01-01

In this paper, we propose a closed loop system to improve the performance of single-channel speech separation in a speaker independent scenario. The system is composed of two interconnected blocks: a separation block and a speaker identification block. The improvement is accomplished by incorporat...... enhances the quality of the separated output signals. To assess the improvements, the results are reported in terms of PESQ for both target and masked signals....
Optimization of multilayer neural network parameters for speaker recognition

Science.gov (United States)

Tovarek, Jaromir; Partila, Pavol; Rozhon, Jan; Voznak, Miroslav; Skapa, Jan; Uhrin, Dominik; Chmelikova, Zdenka

2016-05-01

This article discusses the impact of multilayer neural network parameters on speaker identification. The main task of speaker identification is to find a specific person in a known set of speakers. This means that the voice of an unknown speaker (wanted person) belongs to a group of reference speakers from the voice database. One of the requirements was to develop a text-independent system, which means classifying the wanted person regardless of content and language. A multilayer neural network has been used for speaker identification in this research. An artificial neural network (ANN) requires parameters to be set, such as the activation function of neurons, the steepness of activation functions, the learning rate, the maximum number of iterations and the number of neurons in the hidden and output layers. ANN accuracy and validation time are directly influenced by the parameter settings. Different roles require different settings. Identification accuracy and ANN validation time were evaluated with the same input data but different parameter settings. The goal was to find parameters for the neural network with the highest precision and shortest validation time. The input data of the neural network are Mel-frequency cepstral coefficients (MFCC). These parameters describe the properties of the vocal tract. Audio samples were recorded for all speakers in a laboratory environment. The data were split into training, testing and validation sets of 70%, 15% and 15%, respectively. The results of the research described in this article are the different parameter settings of the multilayer neural network for four speakers.
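A 70/15/15 partition like the one described above could be produced as in the following sketch. The shuffling, the fixed seed and the function name are assumptions; the abstract does not say how samples were assigned to each set.

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle items and split them into training, testing and validation sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = len(shuffled) * 70 // 100
    n_test = len(shuffled) * 15 // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # remainder, about 15 %
    return train, test, validation


train, test, validation = split_70_15_15(range(100))
print(len(train), len(test), len(validation))
```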
Influence of binary mask estimation errors on robust speaker identification

DEFF Research Database (Denmark)

May, Tobias

2017-01-01

Missing-data strategies have been developed to improve the noise-robustness of automatic speech recognition systems in adverse acoustic conditions. This is achieved by classifying time-frequency (T-F) units into reliable and unreliable components, as indicated by a so-called binary mask. Different...... approaches have been proposed to handle unreliable feature components, each with distinct advantages. The direct masking (DM) approach attenuates unreliable T-F units in the spectral domain, which allows the extraction of conventionally used mel-frequency cepstral coefficients (MFCCs). Instead of attenuating....... Since each of these approaches utilizes the knowledge about reliable and unreliable feature components in a different way, they will respond differently to estimation errors in the binary mask. The goal of this study was to identify the most effective strategy to exploit knowledge about reliable...
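The direct masking (DM) idea mentioned above, attenuating time-frequency units that the binary mask marks as unreliable, can be sketched as follows. The 20 dB attenuation, the use of magnitude values and the nested-list layout are assumptions chosen for illustration and are not taken from the study.

```python
def direct_masking(spectrogram, mask, attenuation_db=20.0):
    """Attenuate unreliable time-frequency units of a magnitude spectrogram.

    spectrogram: list of frames, each a list of magnitudes per frequency bin
    mask:        same shape, 1 for reliable units and 0 for unreliable units
    """
    gain = 10.0 ** (-attenuation_db / 20.0)  # linear gain applied to masked units
    return [
        [value if reliable else value * gain for value, reliable in zip(frame, mask_frame)]
        for frame, mask_frame in zip(spectrogram, mask)
    ]


# Hypothetical 2-frame, 3-bin example.
spec = [[1.0, 2.0, 4.0], [3.0, 1.0, 2.0]]
binary_mask = [[1, 0, 1], [0, 1, 1]]
print(direct_masking(spec, binary_mask))
```

Because the output is still an ordinary spectrogram, conventional features such as MFCCs can be computed from it afterwards, which is the advantage the abstract attributes to this approach.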
The Role of Speaker Identification in Korean University Students' Attitudes towards Five Varieties of English

Science.gov (United States)

Yook, Cheongmin; Lindemann, Stephanie

2013-01-01

This study investigates how the attitudes of 60 Korean university students towards five varieties of English are affected by the identification of the speaker's nationality and ethnicity. The study employed both a verbal guise technique and questions eliciting overt beliefs and preferences related to learning English. While the majority of the…
Hybrid Speaker Recognition Using Universal Acoustic Model

Science.gov (United States)

Nishimura, Jun; Kuroda, Tadahiro

We propose a novel speaker recognition approach using a speaker-independent universal acoustic model (UAM) for sensornet applications. In sensornet applications such as “Business Microscope”, interactions among knowledge workers in an organization can be visualized by sensing face-to-face communication using wearable sensor nodes. In conventional studies, speakers are detected by comparing the energy of input speech signals among the nodes. However, there are often synchronization errors among the nodes which degrade the speaker recognition performance. By focusing on properties of the speaker's acoustic channel, UAM can provide robustness against the synchronization error. The overall speaker recognition accuracy is improved by combining UAM with the energy-based approach. For 0.1 s speech inputs and 4 subjects, a speaker recognition accuracy of 94% is achieved at synchronization errors below 100 ms.
Identification and robust control of an experimental servo motor.

Science.gov (United States)

Adam, E J; Guestrin, E D

2002-04-01

In this work, the design of a robust controller for an experimental laboratory-scale position control system based on a dc motor drive as well as the corresponding identification and robust stability analysis are presented. In order to carry out the robust design procedure, first, a classic closed-loop identification technique is applied and then, the parametrization by internal model control is used. The model uncertainty is evaluated under both parametric and global representation. For the latter case, an interesting discussion about the conservativeness of this description is presented by means of a comparison between the uncertainty disk and the critical perturbation radius approaches. Finally, conclusions about the performance of the experimental system with the robust controller are discussed using comparative graphics of the controlled variable and the Nyquist stability margin as a robustness measurement.
Visual speaker gender affects vowel identification in Danish

DEFF Research Database (Denmark)

Larsen, Charlotte; Tøndering, John

2013-01-01

The experiment examined the effect of visual speaker gender on the vowel perception of 20 native Danish-speaking subjects. Auditory stimuli consisting of a continuum between /muːlə/ ‘muzzle’ and /moːlə/ ‘pier’ generated using TANDEM-STRAIGHT matched with video clips of a female and a male speaker...
Evaluation of a speaker identification system with and without fusion using three databases in the presence of noise and handset effects

Science.gov (United States)

S. Al-Kaltakchi, Musab T.; Woo, Wai L.; Dlay, Satnam; Chambers, Jonathon A.

2017-12-01

In this study, a speaker identification system is considered consisting of a feature extraction stage which utilizes both power normalized cepstral coefficients (PNCCs) and Mel frequency cepstral coefficients (MFCC). Normalization is applied by employing cepstral mean and variance normalization (CMVN) and feature warping (FW), together with acoustic modeling using a Gaussian mixture model-universal background model (GMM-UBM). The main contributions are comprehensive evaluations of the effect of both additive white Gaussian noise (AWGN) and non-stationary noise (NSN) (with and without a G.712 type handset) upon identification performance. In particular, three NSN types with varying signal to noise ratios (SNRs) were tested corresponding to street traffic, a bus interior, and a crowded talking environment. The performance evaluation also considered the effect of late fusion techniques based on score fusion, namely, mean, maximum, and linear weighted sum fusion. The databases employed were TIMIT, SITW, and NIST 2008; and 120 speakers were selected from each database to yield 3600 speech utterances. As recommendations from the study, mean fusion is found to yield overall best performance in terms of speaker identification accuracy (SIA) with noisy speech, whereas linear weighted sum fusion is overall best for original database recordings.
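Two of the building blocks named above, cepstral mean and variance normalization (CMVN) and late score fusion by mean, maximum or linear weighted sum, can be sketched as follows. Per-utterance normalization, the fusion of exactly two score lists (for instance one from a PNCC-based and one from an MFCC-based subsystem) and the weight value are assumptions made for illustration.

```python
import math


def cmvn(frames):
    """Normalize each coefficient to zero mean and unit variance over an utterance."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) or 1.0
        for d in range(dims)
    ]
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in frames]


def fuse(scores_a, scores_b, method="mean", weight=0.5):
    """Fuse two per-speaker score lists by mean, maximum or weighted sum."""
    if method == "mean":
        return [(a + b) / 2.0 for a, b in zip(scores_a, scores_b)]
    if method == "max":
        return [max(a, b) for a, b in zip(scores_a, scores_b)]
    if method == "weighted":
        return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]
    raise ValueError("unknown fusion method: " + method)


# Hypothetical scores for two candidate speakers from two subsystems.
pncc_scores = [1.0, 4.0]
mfcc_scores = [3.0, 2.0]
for name in ("mean", "max", "weighted"):
    print(name, fuse(pncc_scores, mfcc_scores, method=name, weight=0.75))
```

After fusion, the identified speaker is simply the index of the largest fused score; the study's finding is that which fusion rule works best depends on whether the speech is noisy or clean.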
Multimodal Speaker Diarization.

Science.gov (United States)

Noulas, A; Englebienne, G; Krose, B J A

2012-01-01

We present a novel probabilistic framework that fuses information coming from the audio and video modality to perform speaker diarization. The proposed framework is a Dynamic Bayesian Network (DBN) that is an extension of a factorial Hidden Markov Model (fHMM) and models the people appearing in an audiovisual recording as multimodal entities that generate observations in the audio stream, the video stream, and the joint audiovisual space. The framework is very robust to different contexts, makes no assumptions about the location of the recording equipment, and does not require labeled training data as it acquires the model parameters using the Expectation Maximization (EM) algorithm. We apply the proposed model to two meeting videos and a news broadcast video, all of which come from publicly available data sets. The results acquired in speaker diarization are in favor of the proposed multimodal framework, which outperforms the single modality analysis results and improves over the state-of-the-art audio-based speaker diarization.
Segmentation of the Speaker's Face Region with Audiovisual Correlation

Science.gov (United States)

Liu, Yuyu; Sato, Yoichi

The ability to find the speaker's face region in a video is useful for various applications. In this work, we develop a novel technique to find this region within different time windows, which is robust against the changes of view, scale, and background. The main thrust of our technique is to integrate audiovisual correlation analysis into a video segmentation framework. We analyze the audiovisual correlation locally by computing quadratic mutual information between our audiovisual features. The computation of quadratic mutual information is based on the probability density functions estimated by kernel density estimation with adaptive kernel bandwidth. The results of this audiovisual correlation analysis are incorporated into graph cut-based video segmentation to resolve a globally optimum extraction of the speaker's face region. The setting of any heuristic threshold in this segmentation is avoided by learning the correlation distributions of speaker and background by expectation maximization. Experimental results demonstrate that our method can detect the speaker's face region accurately and robustly for different views, scales, and backgrounds.
Feature Fusion Based Audio-Visual Speaker Identification Using Hidden Markov Model under Different Lighting Variations

Directory of Open Access Journals (Sweden)

Md. Rabiul Islam

2014-01-01

Full Text Available The aim of the paper is to propose a feature-fusion-based Audio-Visual Speaker Identification (AVSI) system that operates under varied illumination conditions. Among the different fusion strategies, feature-level fusion is used for the proposed AVSI system, with a Hidden Markov Model (HMM) used for learning and classification. Since the feature set contains richer information about the raw biometric data than any other level, integration at the feature level is expected to provide better authentication results. In this paper, Mel Frequency Cepstral Coefficients (MFCCs) and Linear Prediction Cepstral Coefficients (LPCCs) are combined to form the audio feature vectors, and Active Shape Model (ASM) based appearance and shape facial features are concatenated to form the visual feature vectors. These combined audio and visual features are used for the feature fusion. Principal Component Analysis (PCA) is used to reduce the dimension of the audio and visual feature vectors. The VALID audio-visual database is used to measure the performance of the proposed system under four different illumination levels. Experimental results focus on the significance of the proposed audio-visual speaker identification system with various combinations of audio and visual features.
Using timing information in speaker verification

CSIR Research Space (South Africa)

Van Heerden, CJ

2005-11-01

Full Text Available This paper presents an analysis of temporal information as a feature for use in speaker verification systems. The relevance of temporal information in a speaker’s utterances is investigated, both with regard to improving the robustness of modern...
A robust firearm identification algorithm of forensic ballistics specimens

Science.gov (United States)

Chuan, Z. L.; Jemain, A. A.; Liong, C.-Y.; Ghani, N. A. M.; Tan, L. K.

2017-09-01

Existing firearm identification algorithms have several inherent difficulties, including the need for physical interpretation and long processing times. Therefore, the aim of this study is to propose a robust algorithm for firearm identification based on extracting a set of informative features from the segmented region of interest (ROI) using simulated noisy center-firing pin impression images. The proposed algorithm comprises a Laplacian sharpening filter, clustering-based threshold selection, an unweighted least squares estimator, and segmentation of a square ROI from the noisy images. A total of 250 simulated noisy images collected from five different pistols of the same make, model and caliber are used to evaluate the robustness of the proposed algorithm. This study found that the proposed algorithm is able to perform the identical task on the noisy images with noise levels as high as 70%, while maintaining a firearm identification accuracy rate of over 90%.
A constrained robust least squares approach for contaminant release history identification

Science.gov (United States)

Sun, Alexander Y.; Painter, Scott L.; Wittmeyer, Gordon W.

2006-04-01

Contaminant source identification is an important type of inverse problem in groundwater modeling and is subject to both data and model uncertainty. Model uncertainty was rarely considered in the previous studies. In this work, a robust framework for solving contaminant source recovery problems is introduced. The contaminant source identification problem is first cast into one of solving uncertain linear equations, where the response matrix is constructed using a superposition technique. The formulation presented here is general and is applicable to any porous media flow and transport solvers. The robust least squares (RLS) estimator, which originated in the field of robust identification, directly accounts for errors arising from model uncertainty and has been shown to significantly reduce the sensitivity of the optimal solution to perturbations in model and data. In this work, a new variant of RLS, the constrained robust least squares (CRLS), is formulated for solving uncertain linear equations. CRLS allows for additional constraints, such as nonnegativity, to be imposed. The performance of CRLS is demonstrated through one- and two-dimensional test problems. When the system is ill-conditioned and uncertain, it is found that CRLS gave much better performance than its classical counterpart, the nonnegative least squares. The source identification framework developed in this work thus constitutes a reliable tool for recovering source release histories in real applications.
Analysis of human scream and its impact on text-independent speaker verification.

Science.gov (United States)

Hansen, John H L; Nandwana, Mahesh Kumar; Shokouhi, Navid

2017-04-01

Scream is defined as sustained, high-energy vocalizations that lack phonological structure. Lack of phonological structure is what distinguishes scream from other forms of loud vocalization, such as "yell." This study investigates the acoustic aspects of screams and addresses those that are known to prevent standard speaker identification systems from recognizing the identity of screaming speakers. It is well established that speaker variability due to changes in vocal effort and the Lombard effect contributes to degraded performance in automatic speech systems (i.e., speech recognition, speaker identification, diarization, etc.). However, previous research in the general area of speaker variability has concentrated on human speech production, whereas less is known about non-speech vocalizations. The UT-NonSpeech corpus is developed here to investigate speaker verification from scream samples. This study considers a detailed analysis in terms of fundamental frequency, spectral peak shift, frame energy distribution, and spectral tilt. It is shown that traditional speaker recognition based on the Gaussian mixture models-universal background model framework is unreliable when evaluated with screams.
Gender Identification of the Speaker Using VQ Method

Directory of Open Access Journals (Sweden)

Vasif V. Nabiyev

2009-11-01

Full Text Available Speaking is the easiest and most natural form of communication between people. Intensive studies are being carried out to support this communication via computers. Systems using voice biometric technology are attracting attention, especially in terms of cost and ease of use. Compared with other biometric systems, they are much more practical to apply. For example, by using a microphone placed in the environment, a voice recording can be obtained even without notifying the user, and the system can be applied. Moreover, remote access is another advantage of voice biometrics. This study aims to automatically determine the gender of the speaker from speech waveforms, which carry personal information. If the speaker's gender can be determined and models are built according to gender, the success of voice recognition systems can be increased considerably. In general, speaker recognition systems are composed of two parts: feature extraction and feature matching. Feature extraction is the procedure in which a minimal set of information representing the speech and the speaker is derived from the voice signal. Different features are used in voice applications, such as LPC, MFCC and PLP. In this study, MFCC is used as the feature vector. Feature matching is the procedure in which the features derived from unknown speakers are compared with those of a known speaker group. According to the text used in the comparison, systems are divided into two types: text dependent and text independent. While the same text is used in text-dependent systems, different texts are used in text-independent systems. Currently, DTW and HMM are text-dependent matching methods, whereas VQ and GMM are text-independent matching methods. In this study, the VQ approach is used because of its high success rate and simple implementation. A system that determines the speaker's gender automatically and text-independently is proposed. The proposed
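The VQ matching step described in this abstract can be sketched as follows: one codebook is trained per gender, and an utterance is assigned to the gender whose codebook quantizes its feature vectors with the least average distortion. This is a minimal illustration under stated assumptions, not the author's system: the toy 2-D vectors stand in for MFCC features, plain Lloyd (k-means) iterations stand in for whatever codebook training the study used, and all function names are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, size, iters=10):
    """Lloyd (k-means) iterations; the first `size` vectors seed the codebook."""
    codebook = [list(v) for v in vectors[:size]]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(size), key=lambda k: dist2(v, codebook[k]))
            cells[nearest].append(v)
        for k, cell in enumerate(cells):
            if cell:  # keep the old codeword if a cell is empty
                codebook[k] = [sum(c) / len(cell) for c in zip(*cell)]
    return codebook


def distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    return sum(min(dist2(v, c) for c in codebook) for v in vectors) / len(vectors)


def classify(vectors, codebooks):
    """Pick the label whose codebook gives the least average distortion."""
    return min(codebooks, key=lambda label: distortion(vectors, codebooks[label]))


# Hypothetical 2-D "features" standing in for MFCC vectors.
male_train = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0], [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
female_train = [[4.0, 4.1], [4.2, 3.9], [3.9, 4.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
codebooks = {
    "male": train_codebook(male_train, size=2),
    "female": train_codebook(female_train, size=2),
}
```

Because the decision uses only the distances between feature vectors and codewords, it does not depend on what was said, which is why VQ is listed among the text-independent methods.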
[On the use of the spectral speech characteristics for the determination of biometric parameters of the vocal tract in forensic medical identification of the speaker's personality].

Science.gov (United States)

Kaganov, A Sh

2014-01-01

The objective of the present study was to elucidate the relationship between the spectral speech characteristics and the biometric parameters of the speaker's vocal tract. The secondary objective was to consider the theoretical basis behind the medico-criminalistic personality identification from the biometric parameters of the speaker's vocal tract. The article is based on the results of real forensic medical investigations and the literature data.
Learning speaker-specific characteristics with a deep neural architecture.

Science.gov (United States)

Chen, Ke; Salman, Ahmad

2011-11-01

Speech signals convey various yet mixed information ranging from linguistic to speaker-specific information. However, most acoustic representations characterize all these kinds of information as a whole, which could hinder either a speech or a speaker recognition (SR) system from producing a better performance. In this paper, we propose a novel deep neural architecture (DNA) especially for learning speaker-specific characteristics from mel-frequency cepstral coefficients, an acoustic representation commonly used in both speech recognition and SR, which results in a speaker-specific overcomplete representation. In order to learn intrinsic speaker-specific characteristics, we come up with an objective function consisting of contrastive losses in terms of speaker similarity/dissimilarity and data reconstruction losses used as regularization to normalize the interference of non-speaker-related information. Moreover, we employ a hybrid learning strategy for learning parameters of the deep neural networks: i.e., local yet greedy layerwise unsupervised pretraining for initialization and global supervised learning for the ultimate discriminative goal. With four Linguistic Data Consortium (LDC) benchmarks and two non-English corpora, we demonstrate that our overcomplete representation is robust in characterizing various speakers, no matter whether their utterances have been used in training our DNA, and highly insensitive to text and languages spoken. Extensive comparative studies suggest that our approach yields favorable results in speaker verification and segmentation. Finally, we discuss several issues concerning our proposed approach.

Forensic speaker identification through comparative analysis of the formant frequencies of the vowels in the Macedonian language

International Nuclear Information System (INIS)

Pop-Dimitrijoska, V.; Apostolovska, G

2012-01-01

The main objective of this study is forensic speaker identification from an incriminated recording. The identification was made through a comparative analysis of the first three formants, F1, F2 and F3, of the voice samples from the questioned and suspects’ recordings. The measurements were made with the PRAAT software for each of the five vowels in the Macedonian language (a, e, i, o and u), which were isolated from the recordings. The methodology of recording examination employed in this research yielded a positive identification of the questioned voice. Forensic audio analysis still does not have its place in the legal and crime-fighting systems in Macedonia. This is sufficient reason to put greater emphasis on future research into this issue, which will contribute to solving many criminal cases that until now, because of the type of generally accepted evidence, were not resolved. (Author)
Set-Membership Identification for Robust Control Design

Science.gov (United States)

1993-04-28

(U) Set-Membership Identification for Robust Control Design. Personal Author(s): Dr. Robert L. Kosul. Final Report...Shalom, E. Tse "Caution, probing, and the value of information in the control of uncertain systems", Annals of Economic and Social Measurement, 5/3, pp...knowing a bound on the impulse response is quantitative. A similar classification can be made regarding signal characteristics. Knowing that a signal is
System Identification and Resonant Control of Thermoacoustic Engines for Robust Solar Power

Directory of Open Access Journals (Sweden)

Boe-Shong Hong

2015-05-01

Full Text Available It was found that thermoacoustic solar-power generators with resonant control are more powerful than passive ones. To continue the work, this paper focuses on the synthesis of robustly resonant controllers that guarantee single-mode resonance not only in steady states, but also in transient states when modelling uncertainties happen and working temperature temporally varies. Here the control synthesis is based on the loop shifting and the frequency-domain identification in advance thereof. Frequency-domain identification is performed to modify the mathematical modelling and to identify the most powerful mode, so that the DSP-based feedback controller can online pitch the engine to the most powerful resonant-frequency robustly and accurately. Moreover, this paper develops two control tools, the higher-order van-der-Pol oscillator and the principle of Dynamical Equilibrium, to assist in system identification and feedback synthesis, respectively.
Recognition of speaker-dependent continuous speech with KEAL

Science.gov (United States)

Mercier, G.; Bigorgne, D.; Miclet, L.; Le Guennec, L.; Querre, M.

1989-04-01

A description of the speaker-dependent continuous speech recognition system KEAL is given. An unknown utterance is recognized by means of the following procedures: acoustic analysis, phonetic segmentation and identification, word and sentence analysis. The combination of feature-based, speaker-independent coarse phonetic segmentation with speaker-dependent statistical classification techniques is one of the main design features of the acoustic-phonetic decoder. The lexical access component is essentially based on a statistical dynamic programming technique which aims at matching a phonemic lexical entry containing various phonological forms, against a phonetic lattice. Sentence recognition is achieved by use of a context-free grammar and a parsing algorithm derived from Earley's parser. A speaker adaptation module allows some of the system parameters to be adjusted by matching known utterances with their acoustical representation. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. Continuously spoken sentences extracted from a 'pseudo-Logo' language are analyzed and results are presented.
Automatic Speaker Recognition for Mobile Forensic Applications

Directory of Open Access Journals (Sweden)

Mohammed Algabri

2017-01-01

Full Text Available Presently, lawyers, law enforcement agencies, and judges in courts use speech and other biometric features to recognize suspects. In general, speaker recognition is used for discriminating people based on their voices. The process of determining whether a suspected speaker is the source of a trace is called forensic speaker recognition. In such applications, the voice samples are most probably noisy, the recording sessions might mismatch each other, the sessions might not contain sufficient recording for recognition purposes, and the suspect voices are recorded through a mobile channel. The identification of a person through his voice within a forensic quality context is challenging. In this paper, we propose a method for forensic speaker recognition for the Arabic language; the King Saud University Arabic Speech Database is used for obtaining experimental results. The advantage of this database is that each speaker’s voice is recorded in both clean and noisy environments, through a microphone and a mobile channel. This diversity facilitates its usage in forensic experimentation. Mel-Frequency Cepstral Coefficients are used for feature extraction and the Gaussian mixture model-universal background model is used for speaker modeling. Our approach has shown low equal error rates (EER) in noisy environments and with very short test samples.
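The equal error rate (EER) reported in this abstract is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to approximate it by sweeping a decision threshold over the observed scores. The function name and the score lists are hypothetical, and the paper's own scoring tools are not described in the abstract.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejections
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptances
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Hypothetical verification scores (higher means "more likely the same speaker").
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.7, 0.5, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
```

For these made-up scores, the two error rates meet at a threshold of 0.6, where one of five genuine trials is rejected and one of five impostor trials is accepted, giving an EER of 20%.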
Robust stator resistance identification of an IM drive using model reference adaptive system

International Nuclear Information System (INIS)

Madadi Kojabadi, Hossein; Abarzadeh, Mostafa; Aghaei Farouji, Said

2013-01-01

Highlights: ► We estimate the stator resistance and rotor speed of the IM. ► We proposed a new quantity to estimate the speed and stator resistance of IM. ► The proposed algorithm is robust to rotor resistance variations. ► We estimate the IM speed and stator resistance simultaneously to avoid speed error. - Abstract: Model reference adaptive system (MRAS) based robust stator resistance estimator for sensorless induction motor (IM) drive is proposed. The MRAS is formed with a semi-active power quantity. The proposed identification method can be achieved with on-line tuning of the stator resistance with robustness against rotor resistance variations. Stable and efficient estimation of IM speed at low region will be guaranteed by simultaneous identification of IM speed and stator resistance. The stability of proposed stator resistance estimator is checked through Popov’s hyperstability theorem. Simulation and experimental results are given to highlight the feasibility, the simplicity, and the robustness of the proposed method.
Limited data speaker identification

Indian Academy of Sciences (India)

recognition can be either identification or verification depending on the task objective. .... like Bayesian formalism, voting method and Dempster-Shafer (D–S) theory ..... self-organizing map (SOM) (Kohonen 1990), learning vector quantization ...
Utilising Tree-Based Ensemble Learning for Speaker Segmentation

DEFF Research Database (Denmark)

Abou-Zleikha, Mohamed; Tan, Zheng-Hua; Christensen, Mads Græsbøll

2014-01-01

In audio and speech processing, accurate detection of the changing points between multiple speakers in speech segments is an important stage for several applications such as speaker identification and tracking. Bayesian Information Criteria (BIC)-based approaches are the most traditionally used...... for a certain condition, the model becomes biased to the data used for training limiting the model’s generalisation ability. In this paper, we propose a BIC-based tuning-free approach for speaker segmentation through the use of ensemble-based learning. A forest of segmentation trees is constructed in which each...... tree is trained using a sampled version of the speech segment. During the tree construction process, a set of randomly selected points in the input sequence is examined as potential segmentation points. The point that yields the highest ΔBIC is chosen and the same process is repeated for the resultant...
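The ΔBIC criterion that the tree construction relies on can be sketched for a one-dimensional feature track. The sketch covers only the basic test (one Gaussian for the whole segment versus one Gaussian on each side of a candidate point), not the authors' ensemble of segmentation trees. The function names, the penalty weight `lam`, and the example data are assumptions.

```python
import math


def variance(xs):
    """Maximum-likelihood variance, floored to avoid log(0)."""
    m = sum(xs) / len(xs)
    return max(sum((x - m) ** 2 for x in xs) / len(xs), 1e-8)


def delta_bic(x, i, lam=1.0):
    """Delta-BIC for splitting the 1-D sequence x at index i.

    Compares one Gaussian for all of x against separate Gaussians for
    x[:i] and x[i:]; a positive value favours a change point at i.
    """
    n = len(x)
    gain = 0.5 * (n * math.log(variance(x))
                  - i * math.log(variance(x[:i]))
                  - (n - i) * math.log(variance(x[i:])))
    # Penalty for the extra mean and variance: 0.5 * 2 * log(n) when d = 1.
    penalty = lam * math.log(n)
    return gain - penalty


def best_split(x, margin=2):
    """Return (index, delta_bic) of the highest-scoring candidate point."""
    candidates = range(margin, len(x) - margin + 1)
    i = max(candidates, key=lambda k: delta_bic(x, k))
    return i, delta_bic(x, i)


# Hypothetical 1-D feature track: the "speaker" changes after 8 frames.
track = [0.1, -0.2, 0.0, 0.2, -0.1, 0.1, 0.0, -0.1,
         3.0, 3.2, 2.9, 3.1, 2.8, 3.0, 3.1, 2.9]
```

In this toy track the highest ΔBIC falls at index 8, where the statistics change, while splitting a homogeneous stretch such as `track[:8]` gives a negative value. In classical BIC segmentation `lam` is a parameter that has to be tuned, which is the kind of tuning the abstract says the proposed approach avoids.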
A Novel Approach in Text-Independent Speaker Recognition in Noisy Environment

Directory of Open Access Journals (Sweden)

Nona Heydari Esfahani

2014-10-01

Full Text Available In this paper, robust text-independent speaker recognition is considered. The proposed method operates on utterances from which silence has been removed manually and that are segmented into smaller speech units containing a few phones and at least one vowel. The segments are the basic units for long-term feature extraction. Sub-band entropy is extracted directly from each segment. A robust vowel detection method is then applied to each segment to separate a high-energy vowel that is used as the unit for pitch frequency and formant extraction. By applying a clustering technique, the extracted short-term features, namely MFCC coefficients, are combined with the long-term features. Experiments using an MLP classifier show that the average speaker recognition accuracy is 97.33% for clean speech and 61.33% in a noisy environment at -2 dB SNR, which is an improvement over other conventional methods.
A robust star identification algorithm with star shortlisting

Science.gov (United States)

Mehta, Deval Samirbhai; Chen, Shoushun; Low, Kay Soon

2018-05-01

A star tracker provides the most accurate attitude solution in terms of arc seconds compared to the other existing attitude sensors. When no prior attitude information is available, it operates in "Lost-In-Space (LIS)" mode. Star pattern recognition, also known as star identification algorithm, forms the most crucial part of a star tracker in the LIS mode. Recognition reliability and speed are the two most important parameters of a star pattern recognition technique. In this paper, a novel star identification algorithm with star ID shortlisting is proposed. Firstly, the star IDs are shortlisted based on worst-case patch mismatch, and later stars are identified in the image by an initial match confirmed with a running sequential angular match technique. The proposed idea is tested on 16,200 simulated star images having magnitude uncertainty, noise stars, positional deviation, and varying size of the field of view. The proposed idea is also benchmarked with the state-of-the-art star pattern recognition techniques. Finally, the real-time performance of the proposed technique is tested on the 3104 real star images captured by a star tracker SST-20S currently mounted on a satellite. The proposed technique can achieve an identification accuracy of 98% and takes only 8.2 ms for identification on real images. Simulation and real-time results depict that the proposed technique is highly robust and achieves a high speed of identification suitable for actual space applications.
Speaker segmentation and clustering

OpenAIRE

Kotti, M; Moschou, V; Kotropoulos, C

2008-01-01

This survey focuses on two challenging speech processing topics, namely: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker...
Robust Switching Control and Subspace Identification for Flutter of Flexible Wing

Directory of Open Access Journals (Sweden)

Yizhe Wang

2018-01-01

Full Text Available Active flutter suppression and subspace identification for a flexible wing model using micro fiber composite actuator were experimentally studied in a low speed wind tunnel. NACA0006 thin airfoil model was used for the experimental object to verify the performance of identification algorithm and designed controller. The equation of the fluid, vibration, and piezoelectric coupled motion was theoretically analyzed and experimentally identified under the open-loop and closed-loop condition by subspace method for controller design. A robust pole placement algorithm in terms of linear matrix inequality that accommodates the model uncertainty caused by identification deviation and flow speed variation was utilized to stabilize the divergent aeroelastic system. For further enlarging the flutter envelope, additional controllers were designed subject to the models beyond the flutter speed. Wind speed was measured online as the decision parameter of switching between the controllers. To ensure the stability of arbitrary switching, Common Lyapunov function method was applied to design the robust pole placement controllers for different models to ensure that the closed-loop system shared a common Lyapunov function. Wind tunnel result showed that the designed controllers could stabilize the time varying aeroelastic system over a wide range under arbitrary switching.
Integrated Robust Open-Set Speaker Identification System (IROSIS)

Science.gov (United States)

2012-05-01

the exact joint estimation, but the deviation is small enough. The references [16] and [18] introduce a “Gauss-Seidel-like iterative algorithm... iteration for a given number of times or until convergence. The Baum-Welch statistics are re-calculated in every iteration. 3.3.1.2 MAP adaptation of GMMs...have almost converged, and in the subsequent iterations it is mostly the magnitude that gets adjusted. When the initial values are too small, it would
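The excerpt mentions Baum-Welch statistics and MAP adaptation of GMMs without showing the equations. As a hedged illustration, the sketch below applies the relevance-MAP mean update commonly used in GMM-UBM systems to a one-dimensional model. Whether the report uses exactly this form is not visible in the excerpt, and the function names, the relevance factor of 16, and the data are assumptions.

```python
import math


def gauss(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def map_adapt_means(data, weights, means, variances, relevance=16.0):
    """One relevance-MAP update of the means of a 1-D GMM.

    Weights and variances are left unchanged, as is common in GMM-UBM systems.
    """
    k = len(means)
    n = [0.0] * k       # zeroth-order statistics (soft counts)
    first = [0.0] * k   # first-order statistics
    for x in data:
        lik = [w * gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(lik)
        for j in range(k):
            post = lik[j] / total  # responsibility of component j for x
            n[j] += post
            first[j] += post * x
    adapted = []
    for j in range(k):
        alpha = n[j] / (n[j] + relevance)  # data-dependent mixing factor
        ex = first[j] / n[j] if n[j] > 0 else means[j]
        adapted.append(alpha * ex + (1.0 - alpha) * means[j])
    return adapted


# Hypothetical two-component background model and a short enrolment set.
ubm_weights, ubm_means, ubm_vars = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]
enrol = [0.9, 1.1, 1.0, 0.8, 1.2]
speaker_means = map_adapt_means(enrol, ubm_weights, ubm_means, ubm_vars)
```

In this toy case the component near the enrolment data moves part of the way from 0 toward 1, while the component that sees almost no data stays close to its background value of 5. That is the intended behaviour of the relevance factor: components with few observations fall back on the background model.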
Speaker Authentication

CERN Document Server

Li, Qi (Peter)

2012-01-01

This book focuses on the use of voice as a biometric measure for personal authentication. In particular, "Speaker Recognition" covers two approaches in speaker authentication: speaker verification (SV) and verbal information verification (VIV). The SV approach attempts to verify a speaker’s identity based on his/her voice characteristics while the VIV approach validates a speaker’s identity through verification of the content of his/her utterance(s). SV and VIV can be combined for new applications. This is still a new research topic with significant potential applications. The book provides a broad overview of the recent advances in speaker authentication while giving enough attention to advanced and useful algorithms and techniques. It also provides a step by step introduction to the current state of the speaker authentication technology, from the fundamental concepts to advanced algorithms. We will also present major design methodologies and share our experience in developing real and successful speake...
Improving Speaker Recognition by Biometric Voice Deconstruction

Directory of Open Access Journals (Sweden)

Luis Miguel Mazaira-Fernández

2015-09-01

Full Text Available Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g. YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcibly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. This paper presents a new methodology to characterize speakers. This methodology benefits from the advances achieved in recent years in understanding and modelling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with the use of a new set of biometric parameters extracted from the components resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and of the methodology followed to extract gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on a mobile phone network database recorded under non-controlled acoustic conditions.
An integer optimization algorithm for robust identification of non-linear gene regulatory networks

Directory of Open Access Journals (Sweden)

Chemmangattuvalappil Nishanth

2012-09-01

Full Text Available Abstract. Background: Reverse engineering gene networks and identifying regulatory interactions are integral to understanding cellular decision making processes. Advancement in high throughput experimental techniques has initiated innovative data driven analysis of gene regulatory networks. However, inherent noise associated with biological systems requires numerous experimental replicates for reliable conclusions. Furthermore, robust algorithms that directly exploit basic biological traits are few. Such algorithms are expected to be efficient in their performance and robust in their prediction. Results: We have developed a network identification algorithm to accurately infer both the topology and strength of regulatory interactions from time series gene expression data in the presence of significant experimental noise and non-linear behavior. In this novel formalism, we have addressed data variability in biological systems by integrating network identification with the bootstrap resampling technique, hence predicting robust interactions from limited experimental replicates subjected to noise. Furthermore, we have incorporated non-linearity in gene dynamics using the S-system formulation. The basic network identification formulation exploits the trait of sparsity of biological interactions. Towards that, the identification algorithm is formulated as an integer-programming problem by introducing binary variables for each network component. The objective function is targeted to minimize the network connections subject to the constraint of maximal agreement between the experimental and predicted gene dynamics. The developed algorithm is validated using both in silico and experimental data-sets. These studies show that the algorithm can accurately predict the topology and connection strength of the in silico networks, as quantified by high precision and recall, and small discrepancy between the actual and predicted kinetic parameters
Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment

Science.gov (United States)

2015-10-01

Dallas Erik Jonsson School of Engineering & Computer Science EC32 P.O. Box 830688 Richardson, Texas 75083-0688 8. PERFORMING ORGANIZATION REPORT...87 4.3 Whisper Based Processing for ASR ………………………………………….…. 92 5.0 Task 5: SPEAKER STATE ASSESSMENT/ ENVIROMENTAL SNIFFING (SSA/ENVS...Dec. 7-10, 2014 [3] S. Amuda, H. Boril, A. Sangwan, J.H.L. Hansen, T.S. Ibiyemi, “ Engineering analysis and recognition of Nigerian English: An
Identifying the nonlinear mechanical behaviour of micro-speakers from their quasi-linear electrical response

Science.gov (United States)

Zilletti, Michele; Marker, Arthur; Elliott, Stephen John; Holland, Keith

2017-05-01

In this study model identification of the nonlinear dynamics of a micro-speaker is carried out by purely electrical measurements, avoiding any explicit vibration measurements. It is shown that a dynamic model of the micro-speaker, which takes into account the nonlinear damping characteristic of the device, can be identified by measuring the response between the voltage input and the current flowing into the coil. An analytical formulation of the quasi-linear model of the micro-speaker is first derived and an optimisation method is then used to identify a polynomial function which describes the mechanical damping behaviour of the micro-speaker. The analytical results of the quasi-linear model are compared with numerical results. This study potentially opens up the possibility of efficiently implementing nonlinear echo cancellers.
Shhh… I Need Quiet! Children's Understanding of American, British, and Japanese-accented English Speakers.

Science.gov (United States)

Bent, Tessa; Holt, Rachael Frush

2018-02-01

Children's ability to understand speakers with a wide range of dialects and accents is essential for efficient language development and communication in a global society. Here, the impact of regional dialect and foreign-accent variability on children's speech understanding was evaluated in both quiet and noisy conditions. Five- to seven-year-old children (n = 90) and adults (n = 96) repeated sentences produced by three speakers with different accents (American English, British English, and Japanese-accented English) in quiet or noisy conditions. Adults had no difficulty understanding any speaker in quiet conditions. Their performance declined for the nonnative speaker with a moderate amount of noise; their performance only substantially declined for the British English speaker (i.e., below 93% correct) when their understanding of the American English speaker was also impeded. In contrast, although children showed accurate word recognition for the American and British English speakers in quiet conditions, they had difficulty understanding the nonnative speaker even under ideal listening conditions. With a moderate amount of noise, their perception of British English speech declined substantially and their ability to understand the nonnative speaker was particularly poor. These results suggest that although school-aged children can understand unfamiliar native dialects under ideal listening conditions, their ability to recognize words in these dialects may be highly susceptible to the influence of environmental degradation. Fully adult-like word identification for speakers with unfamiliar accents and dialects may exhibit a protracted developmental trajectory.
Perception of English palatal codas by Korean speakers of English

Science.gov (United States)

Yeon, Sang-Hee

2003-04-01

This study examined the perception of English palatal codas by Korean speakers of English to determine whether perception problems are the source of production problems. First, it looked at a possible first-language effect on the perception of English palatal codas. Second, it investigated a possible perceptual source of vowel epenthesis after English palatal codas. In addition, individual factors such as length of residence, TOEFL score, gender, and academic status were compared to determine whether they affected perception accuracy. Eleven adult Korean speakers of English and three native speakers of English participated in the study. Three sets of perception tests, including identification of minimally different English pseudo- or real words, were carried out. The results showed that, first, the Korean speakers perceived the English codas significantly worse than the Americans. Second, the study supported the idea that Koreans perceived an extra /i/ after the final affricates due to final release. Finally, none of the individual factors explained the variation in perceptual accuracy. In particular, TOEFL scores and the perception test scores did not have any statistically significant association.

Incorporating Pass-Phrase Dependent Background Models for Text-Dependent Speaker verification

DEFF Research Database (Denmark)

Sarkar, Achintya Kumar; Tan, Zheng-Hua

2018-01-01

In this paper, we propose pass-phrase dependent background models (PBMs) for text-dependent (TD) speaker verification (SV) to integrate the pass-phrase identification process into the conventional TD-SV system, where a PBM is derived from a text-independent background model through adaptation using the utterances of a particular pass-phrase. During training, pass-phrase specific target speaker models are derived from the particular PBM using the training data for the respective target model. While testing, the best PBM is first selected for the test utterance in the maximum likelihood (ML) sense... -dependent. We show that the proposed method significantly reduces the error rates of text-dependent speaker verification for the non-target types target-wrong and impostor-wrong, while it maintains comparable TD-SV performance when impostors speak a correct utterance with respect to the conventional system.
Do Speakers and Listeners Observe the Gricean Maxim of Quantity?

Science.gov (United States)

Engelhardt, Paul E.; Bailey, Karl G. D.; Ferreira, Fernanda

2006-01-01

The Gricean Maxim of Quantity is believed to govern linguistic performance. Speakers are assumed to provide as much information as required for referent identification and no more, and listeners are believed to expect unambiguous but concise descriptions. In three experiments we examined the extent to which naive participants are sensitive to the…
On-orbit real-time robust cooperative target identification in complex background

Directory of Open Access Journals (Sweden)

Wen Zhuoman

2015-10-01

Full Text Available Cooperative target identification is the prerequisite for the relative position and orientation measurement between the space robot arm and the to-be-arrested object. We propose an on-orbit real-time robust algorithm for cooperative target identification against a complex background using circle and line features. It first extracts only the edges of interest in the target image using an adaptive threshold and refines them to about single-pixel width with improved non-maximum suppression. Adopting a novel tracking approach, edge segments changing smoothly in tangential directions are obtained. With a small amount of calculation, large numbers of invalid edges are removed. From the few remaining edges, valid circular arcs are extracted and reassembled to obtain circles according to a reliable criterion. Finally, the target is identified if there are certain numbers of straight lines whose positions relative to the circle match the known target pattern. Experiments demonstrate that the proposed algorithm accurately identifies the cooperative target within the range of 0.3–1.5 m against a complex background at a speed of 8 frames per second, regardless of lighting condition and target attitude. The proposed algorithm is well suited to real-time visual measurement for a space robot arm because of its robustness and small memory requirement.
Further optimisations of constant Q cepstral processing for integrated utterance and text-dependent speaker verification

DEFF Research Database (Denmark)

Delgado, Hector; Todisco, Massimiliano; Sahidullah, Md

2016-01-01

Many authentication applications involving automatic speaker verification (ASV) demand robust performance using short-duration, fixed or prompted text utterances. Text constraints not only reduce the phone-mismatch between enrollment and test utterances, which generally leads to improved performa...
Working with Speakers.

Science.gov (United States)

Pestel, Ann

1989-01-01

The author discusses working with speakers from business and industry to present career information at the secondary level. Advice for speakers is presented, as well as tips for program coordinators. (CH)
Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus

Directory of Open Access Journals (Sweden)

Patterson Eric K

2002-01-01

Full Text Available Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties, due to background noise and multiple speakers in an application environment, are significantly reduced by the additional information provided by visual features. This paper presents information on a new audio-visual database, a feature study on moving speakers, and on baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper seeks to introduce a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE) database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper. One of these goals is to allow testing of adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed. It compares stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image transform method, and a deformable template scheme are used in this comparison to obtain visual features. This paper also presents methods and results in an attempt to make these techniques more robust to speaker movement. Finally, initial baseline speaker-independent results are included using all speakers, and conclusions as well as suggested areas of research are
On the optimization of a mixed speaker array in an enclosed space using the virtual-speaker weighting method

Science.gov (United States)

Peng, Bo; Zheng, Sifa; Liao, Xiangning; Lian, Xiaomin

2018-03-01

In order to achieve sound field reproduction in a wide frequency band, multiple-type speakers are used. The reproduction accuracy is not only affected by the signals sent to the speakers, but also depends on the position and the number of each type of speaker. The method of optimizing a mixed speaker array is investigated in this paper. A virtual-speaker weighting method is proposed to optimize both the position and the number of each type of speaker. In this method, a virtual-speaker model is proposed to quantify the increment of controllability of the speaker array when the speaker number increases. While optimizing a mixed speaker array, the gain of the virtual-speaker transfer function is used to determine the priority orders of the candidate speaker positions, which optimizes the position of each type of speaker. Then the relative gain of the virtual-speaker transfer function is used to determine whether the speakers are redundant, which optimizes the number of each type of speaker. Finally the virtual-speaker weighting method is verified by reproduction experiments of the interior sound field in a passenger car. The results validate that the optimum mixed speaker array can be obtained using the proposed method.
Robust identification of noncoding RNA from transcriptomes requires phylogenetically-informed sampling.

Directory of Open Access Journals (Sweden)

Stinus Lindgreen

2014-10-01

Full Text Available Noncoding RNAs are integral to a wide range of biological processes, including translation, gene regulation, host-pathogen interactions and environmental sensing. While genomics is now a mature field, our capacity to identify noncoding RNA elements in bacterial and archaeal genomes is hampered by the difficulty of de novo identification. The emergence of new technologies for characterizing transcriptome outputs, notably RNA-seq, is improving noncoding RNA identification and expression quantification. However, a major challenge is to robustly distinguish functional outputs from transcriptional noise. To establish whether annotation of existing transcriptome data has effectively captured all functional outputs, we analysed over 400 publicly available RNA-seq datasets spanning 37 different Archaea and Bacteria. Using comparative tools, we identify close to a thousand highly-expressed candidate noncoding RNAs. However, our analyses reveal that capacity to identify noncoding RNA outputs is strongly dependent on phylogenetic sampling. Surprisingly, and in stark contrast to protein-coding genes, the phylogenetic window for effective use of comparative methods is perversely narrow: aggregating public datasets only produced one phylogenetic cluster where these tools could be used to robustly separate unannotated noncoding RNAs from a null hypothesis of transcriptional noise. Our results show that for the full potential of transcriptomics data to be realized, a change in experimental design is paramount: effective transcriptomics requires phylogeny-aware sampling.
A Parametric Learning and Identification Based Robust Iterative Learning Control for Time Varying Delay Systems

Directory of Open Access Journals (Sweden)

Lun Zhai

2014-01-01

Full Text Available A parametric learning based robust iterative learning control (ILC) scheme is applied to time varying delay multiple-input and multiple-output (MIMO) linear systems. The convergence conditions are derived by using the H∞ and linear matrix inequality (LMI) approaches, and the convergence speed is analyzed as well. A practical identification strategy is applied to optimize the learning laws and to improve the robustness and performance of the control system. Numerical simulations are presented to validate the above concepts.
Robust Recognition of Loud and Lombard speech in the Fighter Cockpit Environment

Science.gov (United States)

1988-08-01

the latter as inter-speaker variability. According to Zue [Z85], inter-speaker variabilities can be attributed to sociolinguistic background, dialect..." Journal of the Acoustical Society of America, Vol 50, 1971. [At74] B. S. Atal, "Linear prediction for speaker identification," Journal of the Acoustical...Society of America, Vol 55, 1974. [B77] B. Beek, E. P. Neuberg, and D. C. Hodge, "An Assessment of the Technology of Automatic Speech Recognition for
Supervised and Unsupervised Speaker Adaptation in the NIST 2005 Speaker Recognition Evaluation

National Research Council Canada - National Science Library

Hansen, Eric G; Slyh, Raymond E; Anderson, Timothy R

2006-01-01

Starting in 2004, the annual NIST Speaker Recognition Evaluation (SRE) has added an optional unsupervised speaker adaptation track where test files are processed sequentially and one may update the target model...
An automatic speech recognition system with speaker-independent identification support

Science.gov (United States)

Caranica, Alexandru; Burileanu, Corneliu

2015-02-01

The novelty of this work lies in the application of an open source research software toolkit (CMU Sphinx) to train, build and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to decode offline voice commands on embedded hardware, such as the Raspberry Pi, a low-cost ARMv6 SoC board. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low-cost voice automation systems.
Robust Identification of Developmentally Active Endothelial Enhancers in Zebrafish Using FANS-Assisted ATAC-Seq.

Science.gov (United States)

Quillien, Aurelie; Abdalla, Mary; Yu, Jun; Ou, Jianhong; Zhu, Lihua Julie; Lawson, Nathan D

2017-07-18

Identification of tissue-specific and developmentally active enhancers provides insights into mechanisms that control gene expression during embryogenesis. However, robust detection of these regulatory elements remains challenging, especially in vertebrate genomes. Here, we apply fluorescent-activated nuclei sorting (FANS) followed by Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq) to identify developmentally active endothelial enhancers in the zebrafish genome. ATAC-seq of nuclei from Tg(fli1a:egfp) y1 transgenic embryos revealed expected patterns of nucleosomal positioning at transcriptional start sites throughout the genome and association with active histone modifications. Comparison of ATAC-seq from GFP-positive and -negative nuclei identified more than 5,000 open elements specific to endothelial cells. These elements flanked genes functionally important for vascular development and that displayed endothelial-specific gene expression. Importantly, a majority of tested elements drove endothelial gene expression in zebrafish embryos. Thus, FANS-assisted ATAC-seq using transgenic zebrafish embryos provides a robust approach for genome-wide identification of active tissue-specific enhancer elements. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
On the Use of Complementary Spectral Features for Speaker Recognition

Directory of Open Access Journals (Sweden)

Sridhar Krishnan

2007-12-01

Full Text Available The most popular features for speaker recognition are Mel frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs). These features are used extensively because they characterize the vocal tract configuration, which is known to be highly speaker-dependent. In this work, several features are introduced that can characterize the vocal system in order to complement the traditional features and produce better speaker recognition models. The spectral centroid (SC), spectral bandwidth (SBW), spectral band energy (SBE), spectral crest factor (SCF), spectral flatness measure (SFM), Shannon entropy (SE), and Renyi entropy (RE) were utilized for this purpose. This work demonstrates that these features are robust in noisy conditions by simulating some common distortions that are found in the speakers' environment and a typical telephone channel. Babble noise, additive white Gaussian noise (AWGN), and a bandpass channel with 1 dB of ripple were used to simulate these noisy conditions. The results show significant improvements in classification performance for all noise conditions when these features were used to complement the MFCC and ΔMFCC features. In particular, the SC and SCF improved performance in almost all noise conditions within the examined SNR range (10–40 dB). For example, in cases where there was only one source of distortion, classification improvements of up to 8% and 10% were achieved under babble noise and AWGN, respectively, using the SCF feature.
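Three of the complementary features named in the abstract above (SC, SCF, SFM) can be sketched from a single frame's magnitude spectrum. This is an illustrative sketch under common textbook definitions, which vary between authors; the paper may compute these per subband or with different normalisation. The function name, the dictionary keys, and the eps guard are assumptions.

```python
import math


def spectral_features(freqs_hz, magnitudes, eps=1e-12):
    """Compute spectral centroid, crest factor, and flatness for one frame.

    freqs_hz   : centre frequency of each spectral bin
    magnitudes : non-negative magnitude of each bin
    eps        : small constant (an assumption) that keeps log() finite
                 when a bin is exactly zero
    """
    if len(freqs_hz) != len(magnitudes) or not magnitudes:
        raise ValueError("freqs_hz and magnitudes must be non-empty and equal length")

    # Centroid: magnitude-weighted mean frequency.
    total_mag = sum(magnitudes)
    centroid = (
        sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total_mag
        if total_mag > 0
        else 0.0
    )

    # Crest factor and flatness are computed on the power spectrum.
    power = [m * m + eps for m in magnitudes]
    mean_power = sum(power) / len(power)
    crest = max(power) / mean_power
    geometric_mean = math.exp(sum(math.log(p) for p in power) / len(power))
    flatness = geometric_mean / mean_power

    return {"centroid_hz": centroid, "crest_factor": crest, "flatness": flatness}
```

A flat, noise-like spectrum gives flatness near 1 and crest factor near 1, while a single dominant peak drives flatness toward 0 and raises the crest factor.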
Performance-Driven Robust Identification and Control of Uncertain Dynamical Systems

Energy Technology Data Exchange (ETDEWEB)

Basar, Tamer

2001-10-29

The grant DEFG02-97ER13939 from the Department of Energy has supported our research program on robust identification and control of uncertain dynamical systems, initially for the three-year period June 15, 1997-June 14, 2000, which was then extended on a no-cost basis for another year until June 14, 2001. This final report provides an overview of our research conducted during this period, along with a complete list of publications supported by the Grant. Within the scope of this project, we have studied fundamental issues that arise in modeling, identification, filtering, control, stabilization, control-based model reduction, decomposition and aggregation, and optimization of uncertain systems. The mathematical framework we have worked in has allowed the system dynamics to be only partially known (with uncertainties of both a parametric and a structural nature), and further the dynamics to be perturbed by unknown dynamic disturbances. Our research over these four years has generated a substantial body of new knowledge, and has led to new major developments in theory, applications, and computational algorithms. These have all been documented in various journal articles and book chapters, and have been presented at leading conferences, as described below. A brief description of the results we have obtained within the scope of this project can be found in Section 3. To set the stage for the material of that section, we first provide in the next section (Section 2) a brief description of the issues that arise in the control of uncertain systems, and introduce several criteria under which optimality will lead to robustness and stability. Section 4 contains a list of references cited in these two sections. A list of our publications supported by the DOE Grant (covering the period June 15, 1997-June 14, 2001) comprises Section 5 of the report.
When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

Science.gov (United States)

Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

2017-11-01

Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.
Student perceptions of native and non-native speaker language instructors: A comparison of ESL and Spanish

Directory of Open Access Journals (Sweden)

Laura Callahan

2006-12-01

Full Text Available The question of the native vs. non-native speaker status of second and foreign language instructors has been investigated chiefly from the perspective of the teacher. Anecdotal evidence suggests that students have strong opinions on the relative qualities of instruction by native and non-native speakers. Most research focuses on students of English as a foreign or second language. This paper reports on data gathered through a questionnaire administered to 55 university students: 31 students of Spanish as FL and 24 students of English as SL. Qualitative results show what strengths students believe each type of instructor has, and quantitative results confirm that any gap students may perceive between the abilities of native and non-native instructors is not so wide as one might expect based on popular notions of the issue. ESL students showed a stronger preference for native-speaker instructors overall, and were at variance with the SFL students' ratings of native-speaker instructors' performance on a number of aspects. There was a significant correlation in both groups between having a family member who is a native speaker of the target language and student preference for and self-identification with a native speaker as instructor. (English text
A Robust Iris Identification System Based on Wavelet Packet Decomposition and Local Comparisons of the Extracted Signatures

Directory of Open Access Journals (Sweden)

Rossant Florence

2010-01-01

Full Text Available Abstract This paper presents a complete iris identification system including three main stages: iris segmentation, signature extraction, and signature comparison. An accurate and robust pupil and iris segmentation process, taking into account eyelid occlusions, is first detailed and evaluated. Then, an original wavelet-packet-based signature extraction method and a novel identification approach, based on the fusion of local distance measures, are proposed. Performance measurements validating the proposed iris signature and demonstrating the benefit of our local-based signature comparison are provided. Moreover, an exhaustive evaluation of robustness with regard to the acquisition conditions attests to the high performance and reliability of our system. Tests have been conducted on two different databases, the well-known CASIA database (V3) and our ISEP database. Finally, a comparison of the performance of our system with published systems is given and discussed.
"Feminism Lite?" Feminist Identification, Speaker Appearance, and Perceptions of Feminist and Antifeminist Messengers

Science.gov (United States)

Bullock, Heather E.; Fernald, Julian L.

2003-01-01

Drawing on a communications model of persuasion (Hovland, Janis, & Kelley, 1953), this study examined the effect of target appearance on feminists' and nonfeminists' perceptions of a speaker delivering a feminist or an antifeminist message. One hundred three college women watched one of four videotaped speeches that varied by content (profeminist…
Noise-robust cortical tracking of attended speech in real-world acoustic scenes

DEFF Research Database (Denmark)

Fuglsang, Søren; Dau, Torsten; Hjortkjær, Jens

2017-01-01

Selectively attending to one speaker in a multi-speaker scenario is thought to synchronize low-frequency cortical activity to the attended speech signal. In recent studies, reconstruction of speech from single-trial electroencephalogram (EEG) data has been used to decode which talker a listener is attending to in a two-talker situation. It is currently unclear how this generalizes to more complex sound environments. Behaviorally, speech perception is robust to the acoustic distortions that listeners typically encounter in everyday life, but it is unknown whether this is mirrored by a noise-robust neural tracking of attended speech. Here we used advanced acoustic simulations to recreate real-world acoustic scenes in the laboratory. In virtual acoustic realities with varying amounts of reverberation and number of interfering talkers, listeners selectively attended to the speech stream...

Grammatical Planning Units during Real-Time Sentence Production in Speakers with Agrammatic Aphasia and Healthy Speakers

Science.gov (United States)

Lee, Jiyeon; Yoshida, Masaya; Thompson, Cynthia K.

2015-01-01

Purpose: Grammatical encoding (GE) is impaired in agrammatic aphasia; however, the nature of such deficits remains unclear. We examined grammatical planning units during real-time sentence production in speakers with agrammatic aphasia and control speakers, testing two competing models of GE. We queried whether speakers with agrammatic aphasia…
The 2016 NIST Speaker Recognition Evaluation

Science.gov (United States)

2017-08-20

impact on system performance. Index Terms: NIST evaluation, NIST SRE, speaker detection, speaker recognition, speaker verification 1. Introduction NIST... self-reported. Second, there were two training conditions in SRE16, namely fixed and open. In the fixed training condition, participants were only
Arctic Visiting Speakers Series (AVS)

Science.gov (United States)

Fox, S. E.; Griswold, J.

2011-12-01

The Arctic Visiting Speakers (AVS) Series funds researchers and other arctic experts to travel and share their knowledge in communities where they might not otherwise connect. Speakers cover a wide range of arctic research topics and can address a variety of audiences including K-12 students, graduate and undergraduate students, and the general public. Host applications are accepted on an ongoing basis, depending on funding availability. Applications need to be submitted at least 1 month prior to the expected tour dates. Interested hosts can choose speakers from an online Speakers Bureau or invite a speaker of their choice. Preference is given to individuals and organizations whose hosted speakers will reach a broad audience and the general public. AVS tours are encouraged to span several days, allowing ample time for interactions with faculty, students, local media, and community members. Applications for both domestic and international visits will be considered. Applications for international visits should involve participation of more than one host organization and must include either a US-based speaker or a US-based organization. This is a small but important program that educates the public about Arctic issues. There have been 27 tours since 2007 that have impacted communities across the globe including: Gatineau, Quebec, Canada; St. Petersburg, Russia; Piscataway, New Jersey; Cordova, Alaska; Nuuk, Greenland; Elizabethtown, Pennsylvania; Oslo, Norway; Inari, Finland; Borgarnes, Iceland; San Francisco, California; and Wolcott, Vermont, to name a few. Tours have included lectures to K-12 schools, college and university students, tribal organizations, Boy Scout troops, science center and museum patrons, and the general public. Approximately 300 attendees enjoy each AVS tour; roughly 4100 people have been reached since 2007. The expectations for each tour are extremely manageable. Hosts must submit a schedule of events and a tour summary to be posted online
English Language Schooling, Linguistic Realities, and the Native Speaker of English in Hong Kong

Science.gov (United States)

Hansen Edwards, Jette G.

2018-01-01

The study employs a case study approach to examine the impact of educational backgrounds on nine Hong Kong tertiary students' English and Cantonese language practices and identifications as native speakers of English and Cantonese. The study employed both survey and interview data to probe the participants' English and Cantonese language use at…
A Robust Identification of the Protein Standard Bands in Two-Dimensional Electrophoresis Gel Images

Directory of Open Access Journals (Sweden)

Serackis Artūras

2017-12-01

Full Text Available The aim of the investigation presented in this paper was to develop a software-based assistant for the protein analysis workflow. The prior characterization of the unknown protein in two-dimensional electrophoresis gel images is performed according to the molecular weight and isoelectric point of each protein spot estimated from the gel image before further sequence analysis by mass spectrometry. The paper presents a method for automatic and robust identification of the protein standard band in a two-dimensional gel image. In addition, the method introduces the identification of the positions of the markers, prepared by using pre-selected proteins with known molecular mass. The robustness of the method was achieved by using special validation rules in the proposed original algorithms. In addition, a self-organizing map-based decision support algorithm is proposed, which takes Gabor coefficients as image features and searches for the differences in preselected vertical image bars. The experimental investigation demonstrated the good performance of the new algorithms included in the proposed method. The detection of the protein standard markers works without modification of algorithm parameters on two-dimensional gel images obtained by using different staining and destaining procedures, which result in different average levels of intensity in the images.
Robust volcano plot: identification of differential metabolites in the presence of outliers.

Science.gov (United States)

Kumar, Nishith; Hoque, Md Aminul; Sugimoto, Masahiro

2018-04-11

The identification of differential metabolites in metabolomics is still a big challenge and plays a prominent role in metabolomics data analyses. Metabolomics datasets often contain outliers because of analytical, experimental, and biological ambiguity, but the currently available differential metabolite identification techniques are sensitive to outliers. We propose a kernel weight based outlier-robust volcano plot for identifying differential metabolites from noisy metabolomics datasets. Two numerical experiments are used to evaluate the performance of the proposed technique against nine existing techniques, including the t-test and the Kruskal-Wallis test. Artificially generated data with outliers reveal that the proposed method results in a lower misclassification error rate and a greater area under the receiver operating characteristic curve compared with existing methods. An experimentally measured breast cancer dataset to which outliers were artificially added reveals that our proposed method produces only two non-overlapping differential metabolites whereas the other nine methods produced between seven and 57 non-overlapping differential metabolites. Our data analyses show that the performance of the proposed differential metabolite identification technique is better than that of existing methods. Thus, the proposed method can contribute to analysis of metabolomics data with outliers. The R package and user manual of the proposed method are available at https://github.com/nishithkumarpaul/Rvolcano .
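To illustrate why conventional volcano-plot coordinates are sensitive to outliers, the sketch below computes the x-axis value (log2 fold change) with either a mean or a median as the centre, plus the usual y-axis transform of a p-value. It is not the kernel-weight method proposed in the abstract above, and it does not use the Rvolcano package; the function names and the `center` parameter are assumptions made for illustration.

```python
import math
import statistics


def log2_fold_change(case, control, center=statistics.mean):
    """x-axis of a volcano plot: log2 ratio of group centres.

    `center` is a hypothetical hook so a robust estimator such as
    statistics.median can be swapped in for the mean.
    """
    return math.log2(center(case) / center(control))


def neg_log10_p(p_value):
    """y-axis of a volcano plot: -log10 of a p-value obtained elsewhere."""
    return -math.log10(p_value)


# One extreme reading in the case group inflates the mean-based fold change.
case = [10.0, 10.0, 10.0, 10.0, 1000.0]
control = [10.0, 10.0, 10.0, 10.0, 10.0]
mean_fc = log2_fold_change(case, control)
median_fc = log2_fold_change(case, control, center=statistics.median)
```

Here a single contaminated sample moves the mean-based fold change far from zero, while the median-based value stays at zero, which is the kind of distortion an outlier-robust volcano plot aims to avoid.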
Robust identification of transcriptional regulatory networks using a Gibbs sampler on outlier sum statistic.

Science.gov (United States)

Gu, Jinghua; Xuan, Jianhua; Riggins, Rebecca B; Chen, Li; Wang, Yue; Clarke, Robert

2012-08-01

Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive 'noise' in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. xuan@vt.edu Supplementary data are available at Bioinformatics online.
English Speakers Attend More Strongly than Spanish Speakers to Manner of Motion when Classifying Novel Objects and Events

Science.gov (United States)

Kersten, Alan W.; Meissner, Christian A.; Lechuga, Julia; Schwartz, Bennett L.; Albrechtsen, Justin S.; Iglesias, Adam

2010-01-01

Three experiments provide evidence that the conceptualization of moving objects and events is influenced by one's native language, consistent with linguistic relativity theory. Monolingual English speakers and bilingual Spanish/English speakers tested in an English-speaking context performed better than monolingual Spanish speakers and bilingual…
Speaker's voice as a memory cue.

Science.gov (United States)

Campeanu, Sandra; Craik, Fergus I M; Alain, Claude

2015-02-01

Speaker's voice occupies a central role as the cornerstone of auditory social interaction. Here, we review the evidence suggesting that speaker's voice constitutes an integral context cue in auditory memory. Investigation into the nature of voice representation as a memory cue is essential to understanding auditory memory and the neural correlates which underlie it. Evidence from behavioral and electrophysiological studies suggest that while specific voice reinstatement (i.e., same speaker) often appears to facilitate word memory even without attention to voice at study, the presence of a partial benefit of similar voices between study and test is less clear. In terms of explicit memory experiments utilizing unfamiliar voices, encoding methods appear to play a pivotal role. Voice congruency effects have been found when voice is specifically attended at study (i.e., when relatively shallow, perceptual encoding takes place). These behavioral findings coincide with neural indices of memory performance such as the parietal old/new recollection effect and the late right frontal effect. The former distinguishes between correctly identified old words and correctly identified new words, and reflects voice congruency only when voice is attended at study. Characterization of the latter likely depends upon voice memory, rather than word memory. There is also evidence to suggest that voice effects can be found in implicit memory paradigms. However, the presence of voice effects appears to depend greatly on the task employed. Using a word identification task, perceptual similarity between study and test conditions is, like for explicit memory tests, crucial. In addition, the type of noise employed appears to have a differential effect. While voice effects have been observed when white noise is used at both study and test, using multi-talker babble does not confer the same results. In terms of neuroimaging research modulations, characterization of an implicit memory effect
Utterance Verification for Text-Dependent Speaker Recognition

DEFF Research Database (Denmark)

Kinnunen, Tomi; Sahidullah, Md; Kukanov, Ivan

2016-01-01

Text-dependent automatic speaker verification naturally calls for the simultaneous verification of speaker identity and spoken content. These two tasks can be achieved with automatic speaker verification (ASV) and utterance verification (UV) technologies. While both have been addressed previously...
The Speaker Gender Gap at Critical Care Conferences.

Science.gov (United States)

Mehta, Sangeeta; Rose, Louise; Cook, Deborah; Herridge, Margaret; Owais, Sawayra; Metaxa, Victoria

2018-06-01

To review women's participation as faculty at five critical care conferences over 7 years. Retrospective analysis of five scientific programs to identify the proportion of females and each speaker's profession based on conference conveners, program documents, or internet research. Three international (European Society of Intensive Care Medicine, International Symposium on Intensive Care and Emergency Medicine, Society of Critical Care Medicine) and two national (Critical Care Canada Forum, U.K. Intensive Care Society State of the Art Meeting) annual critical care conferences held between 2010 and 2016. Female faculty speakers. None. Male speakers outnumbered female speakers at all five conferences, in all 7 years. Overall, women represented 5-31% of speakers, and female physicians represented 5-26% of speakers. Nursing and allied health professional faculty represented 0-25% of speakers; in general, more than 50% of allied health professionals were women. Over the 7 years, Society of Critical Care Medicine had the highest representation of female (27% overall) and nursing/allied health professional (16-25%) speakers; notably, male physicians substantially outnumbered female physicians in all years (62-70% vs 10-19%, respectively). Women's representation on conference program committees ranged from 0% to 40%, with Society of Critical Care Medicine having the highest representation of women (26-40%). The female proportions of speakers, physician speakers, and program committee members increased significantly over time at the Society of Critical Care Medicine and U.K. Intensive Care Society State of the Art Meeting conferences (p gap at critical care conferences, with male faculty outnumbering female faculty. This gap is more marked among physician speakers than those speakers representing nursing and allied health professionals. Several organizational strategies can address this gender gap.
Brain Plasticity in Speech Training in Native English Speakers Learning Mandarin Tones

Science.gov (United States)

Heinzen, Christina Carolyn

The current study employed behavioral and event-related potential (ERP) measures to investigate brain plasticity associated with second-language (L2) phonetic learning based on an adaptive computer training program. The program utilized the acoustic characteristics of Infant-Directed Speech (IDS) to train monolingual American English-speaking listeners to perceive Mandarin lexical tones. Behavioral identification and discrimination tasks were conducted using naturally recorded speech, carefully controlled synthetic speech, and non-speech control stimuli. The ERP experiments were conducted with selected synthetic speech stimuli in a passive listening oddball paradigm. Identical pre- and post- tests were administered on nine adult listeners, who completed two-to-three hours of perceptual training. The perceptual training sessions used pair-wise lexical tone identification, and progressed through seven levels of difficulty for each tone pair. The levels of difficulty included progression in speaker variability from one to four speakers and progression through four levels of acoustic exaggeration of duration, pitch range, and pitch contour. Behavioral results for the natural speech stimuli revealed significant training-induced improvement in identification of Tones 1, 3, and 4. Improvements in identification of Tone 4 generalized to novel stimuli as well. Additionally, comparison between discrimination of across-category and within-category stimulus pairs taken from a synthetic continuum revealed a training-induced shift toward more native-like categorical perception of the Mandarin lexical tones. Analysis of the Mismatch Negativity (MMN) responses in the ERP data revealed increased amplitude and decreased latency for pre-attentive processing of across-category discrimination as a result of training. There were also laterality changes in the MMN responses to the non-speech control stimuli, which could reflect reallocation of brain resources in processing pitch patterns
Physiological responses at short distances from a parametric speaker

Directory of Open Access Journals (Sweden)

Lee Soomin

2012-06-01

Full Text Available Abstract In recent years, parametric speakers have been used in various circumstances. In our previous studies, we verified that the physiological burden of the sound of a parametric speaker set at 2.6 m from the subjects was lower than that of the general speaker. However, nothing has yet been demonstrated about the effects of the sound of a parametric speaker at shorter distances between the parametric speaker and the human body. Therefore, we studied this effect on physiological functions and task performance. Nine male subjects participated in this study. They completed three consecutive sessions: a 20-minute quiet period as a baseline, a 30-minute mental task period with general speakers or parametric speakers, and a 20-minute recovery period. We measured electrocardiogram (ECG), photoplethysmogram (PTG), electroencephalogram (EEG), and systolic and diastolic blood pressure. Four experiments, one with a speaker condition (general speaker and parametric speaker), the other with a distance condition (0.3 m and 1.0 m), were conducted respectively at the same time of day on separate days. To examine the effects of the speaker and distance, a three-way repeated measures ANOVA (speaker factor x distance factor x time factor) was conducted. In conclusion, we found that the physiological responses were not significantly different between the speaker condition and the distance condition. Meanwhile, it was shown that the physiological burdens increased with progress in time independently of speaker condition and distance condition. In summary, the effects of the parametric speaker at the 2.6 m distance were not obtained at the distance of 1 m or less.
Performance of wavelet analysis and neural networks for pathological voices identification

Science.gov (United States)

Salhi, Lotfi; Talbi, Mourad; Abid, Sabeur; Cherif, Adnane

2011-09-01

Within the medical environment, diverse techniques exist to assess the state of the voice of the patient. The inspection technique is inconvenient for a number of reasons, such as its high cost, the duration of the inspection, and above all, the fact that it is an invasive technique. This study focuses on a robust, rapid and accurate system for automatic identification of pathological voices. This system employs a non-invasive, inexpensive and fully automated method based on a hybrid approach: wavelet transform analysis and a neural network classifier. First, we present the results obtained in our previous study using classic feature parameters. These results allow visual identification of pathological voices. Second, quantified parameters derived from the wavelet analysis are proposed to characterise the speech sample. In addition, a system of multilayer neural networks (MNNs) has been developed which carries out the automatic detection of pathological voices. The developed method was evaluated using a voice database composed of recorded voice samples (continuous speech) from normophonic or dysphonic speakers. The dysphonic speakers were patients of the National Hospital 'RABTA' of Tunis, Tunisia and a University Hospital in Brussels, Belgium. Experimental results indicate a success rate ranging between 75% and 98.61% for discrimination of normal and pathological voices using the proposed parameters and neural network classifier. We also compared the average classification rate based on the MNN, Gaussian mixture model and support vector machines.
Audiovisual perceptual learning with multiple speakers.

Science.gov (United States)

Mitchel, Aaron D; Gerfen, Chip; Weiss, Daniel J

2016-05-01

One challenge for speech perception is between-speaker variability in the acoustic parameters of speech. For example, the same phoneme (e.g. the vowel in "cat") may have substantially different acoustic properties when produced by two different speakers and yet the listener must be able to interpret these disparate stimuli as equivalent. Perceptual tuning, the use of contextual information to adjust phonemic representations, may be one mechanism that helps listeners overcome obstacles they face due to this variability during speech perception. Here we test whether visual contextual cues to speaker identity may facilitate the formation and maintenance of distributional representations for individual speakers, allowing listeners to adjust phoneme boundaries in a speaker-specific manner. We familiarized participants to an audiovisual continuum between /aba/ and /ada/. During familiarization, the "b-face" mouthed /aba/ when an ambiguous token was played, while the "D-face" mouthed /ada/. At test, the same ambiguous token was more likely to be identified as /aba/ when paired with a stilled image of the "b-face" than with an image of the "D-face." This was not the case in the control condition when the two faces were paired equally with the ambiguous token. Together, these results suggest that listeners may form speaker-specific phonemic representations using facial identity cues.
Speakers' choice of frame in binary choice

Directory of Open Access Journals (Sweden)

Marc van Buiten

2009-02-01

Full Text Available A distinction is proposed between recommending for preferred choice options and recommending against non-preferred choice options. In binary choice, both recommendation modes are logically, though not psychologically, equivalent. We report empirical evidence showing that speakers recommending for preferred options predominantly select positive frames, which are less common when speakers recommend against non-preferred options. In addition, option attractiveness is shown to affect speakers' choice of frame, and adoption of recommendation mode. The results are interpreted in terms of three compatibility effects: (i) recommendation mode–valence framing compatibility: speakers' preference for positive framing is enhanced under recommending for and diminished under recommending against instructions; (ii) option attractiveness–valence framing compatibility: speakers' preference for positive framing is more pronounced for attractive than for unattractive options; and (iii) recommendation mode–option attractiveness compatibility: speakers are more likely to adopt a recommending for approach for attractive than for unattractive binary choice pairs.
Identification for Control

DEFF Research Database (Denmark)

Tøffner-Clausen, S.

1995-01-01

Identification of model error bounds for robust control design has recently achieved much attention....
Speaker-dependent Dictionary-based Speech Enhancement for Text-Dependent Speaker Verification

DEFF Research Database (Denmark)

Thomsen, Nicolai Bæk; Thomsen, Dennis Alexander Lehmann; Tan, Zheng-Hua

2016-01-01

The problem of text-dependent speaker verification under noisy conditions is becoming ever more relevant, due to increased usage for authentication in real-world applications. Classical methods for noise reduction such as spectral subtraction and Wiener filtering introduce distortion and do not perform well in this setting. In this work we compare the performance of different noise reduction methods under different noise conditions in terms of speaker verification when the text is known and the system is trained on clean data (mis-matched conditions). We furthermore propose a new approach based on dictionary-based noise reduction and compare it to the baseline methods.
Apology Strategy in English By Native Speaker

Directory of Open Access Journals (Sweden)

Mezia Kemala Sari

2016-05-01

Full Text Available This research discusses apology strategies in English used by native speakers. This descriptive study is presented within the framework of pragmatics, based on the forms of strategies in the coding manual of the CCSARP (Cross-Cultural Speech Acts Realization Project). The goals of this study were to describe the apology strategies in English used by native speakers and to identify the factors that influence them. Data were collected with a questionnaire in the form of a Discourse Completion Test, which was distributed to 30 native speakers. Data were classified by the degree of familiarity and the social distance between speaker and hearer, and then by the type of strategy in the coding manual. The results show that the apology strategies of native speakers are brief, with the most likely pattern being IFID plus Offer of repair plus Taking on responsibility, while Alerters, Explanation and Downgrading appear less frequently. The factors that influence apology utterances by native speakers are the social situation, the degree of familiarity and the degree of the offence: the more serious the mistake, the more complex the utterance the speaker tends to produce.
The invisible minority: revisiting the debate on foreign-accented speakers and upward mobility in the workplace.

Science.gov (United States)

Akomolafe, Soji

2013-01-01

Of some of the major types of discrimination, the one that gets the least attention is national origin discrimination and in particular, accent discrimination, especially when it comes to upward mobility in the workplace. Yet, unlike other forms of discrimination, accent discrimination is rarely a subject of any robust public debate. This paper is a modest attempt to help establish a framework for understanding the relative neglect to which the discourse on accent discrimination has been subjected vis-a-vis the overall national debate on diversity. Hopefully, in the process, it will stimulate a more robust conversation on the plight of foreign-accented speakers.

"Necesita una vacuna": what Spanish-speakers want in text-message immunization reminders.

Science.gov (United States)

Ahlers-Schmidt, Carolyn R; Chesser, Amy; Brannon, Jennifer; Lopez, Venessa; Shah-Haque, Sapna; Williams, Katherine; Hart, Traci

2013-08-01

Appointment reminders help parents deal with complex immunization schedules. Preferred content of text-message reminders has been identified for English-speakers. Spanish-speaking parents of children under three years old were recruited to develop Spanish text-message immunization reminders. Structured interviews included questions about demographic characteristics, use of technology, and willingness to receive text reminders. Each participant was assigned to one user-centered design (UCD) test: card sort, needs analysis or comprehension testing. Respondents (N=54) were female (70%) and averaged 27 years of age (SD=7). A card sort of 20 immunization-related statements resulted in identification of seven pieces of critical information, which were compiled into eight example texts. These texts were ranked in the needs assessment and the top two were assessed for comprehension. All participants were able to understand the content and describe intention to act. Utilizing UCD testing, Spanish-speakers identified short, specific text content that differed from preferred content of English-speaking parents.
Who spoke when? Audio-based speaker location estimation for diarization

NARCIS (Netherlands)

Dadvar, M.

2011-01-01

Speaker diarization is the process which detects active speakers and groups those speech signals which have been uttered by the same speaker. Generally we can find two main applications for speaker diarization. Automatic Speech Recognition systems make use of the speaker homogeneous clusters to adapt
Improving Speaker Recognition by Biometric Voice Deconstruction

Science.gov (United States)

Mazaira-Fernandez, Luis Miguel; Álvarez-Marquina, Agustín; Gómez-Vilda, Pedro

2015-01-01

Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcedly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved during last years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers combined with the use of a set of features derived from the components, resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description about the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a highly controlled acoustic condition database, and on a mobile phone network recorded under non-controlled acoustic conditions. PMID:26442245
A fully robust PARAFAC method for analyzing fluorescence data

DEFF Research Database (Denmark)

Engelen, Sanne; Frosch, Stina; Jørgensen, Bo

2009-01-01

...and Rayleigh scatter. Recently, a robust PARAFAC method that circumvents the harmful effects of outlying samples has been developed. For removing the scatter effects on the final PARAFAC model, different techniques exist. More recently, an automated scatter identification tool has been constructed. However, there still exists no robust method for handling fluorescence data encountering both outlying EEM landscapes and scatter. In this paper, we present an iterative algorithm where the robust PARAFAC method and the scatter identification tool are alternately performed. A fully automated robust PARAFAC method...
Speaker-specific variability of phoneme durations

CSIR Research Space (South Africa)

Van Heerden, CJ

2007-11-01

Full Text Available The durations of phonemes vary for different speakers. To this end, the correlations between phonemes across different speakers are studied and a novel approach to predict unknown phoneme durations from the values of known phoneme durations for a...
Unsupervised Speaker Change Detection for Broadcast News Segmentation

DEFF Research Database (Denmark)

Jørgensen, Kasper Winther; Mølgaard, Lasse Lohilahti; Hansen, Lars Kai

2006-01-01

This paper presents a speaker change detection system for news broadcast segmentation based on a vector quantization (VQ) approach. The system does not make any assumption about the number of speakers or speaker identity. The system uses mel frequency cepstral coefficients and change detection...
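As a rough illustration of how a VQ-based change detector can work, the sketch below scores a new window of feature vectors against a codebook built from the preceding segment and flags a change when the average quantization distortion is large. The codebook, the two-dimensional stand-in features (real systems would use MFCC vectors), and the threshold are all hypothetical; the abstract does not give the paper's actual decision rule.

```python
import math

def vq_distortion(frames, codebook):
    """Mean Euclidean distance from each frame to its nearest codeword."""
    return sum(min(math.dist(f, c) for c in codebook) for f in frames) / len(frames)

# Hypothetical codewords learned from the previous speaker segment.
codebook = [(0.0, 0.0), (1.0, 1.0)]

# Hypothetical feature windows (stand-ins for MFCC vectors).
same_speaker = [(0.1, -0.1), (0.9, 1.1)]
new_speaker = [(5.0, 5.2), (4.8, 5.1)]

threshold = 1.0  # illustrative value; a real system would tune this on data

for name, window in [("same_speaker", same_speaker), ("new_speaker", new_speaker)]:
    d = vq_distortion(window, codebook)
    print(name, round(d, 3), "change" if d > threshold else "no change")
```

A window that fits the existing codebook yields a small distortion, while a window from a different speaker falls far from every codeword and exceeds the threshold.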
A New Database for Speaker Recognition

DEFF Research Database (Denmark)

Feng, Ling; Hansen, Lars Kai

2005-01-01

In this paper we discuss properties of speech databases used for speaker recognition research and evaluation, and we characterize some popular standard databases. The paper presents a new database called ELSDSR dedicated to speaker recognition applications. The main characteristics of this database...
Speaker Segmentation and Clustering Using Gender Information

Science.gov (United States)

2006-02-01

used in the first stages of segmentation for... information in the clustering of the opposite-gender... speaker diarization of news broadcasts... files, the... AFRL-HE-WP-TP-2006-0026. Air Force Research Laboratory. Speaker Segmentation and Clustering Using Gender Information. Brian M. Ore, General Dynamics... February 2006. Proceedings.
GPU-accelerated automatic identification of robust beam setups for proton and carbon-ion radiotherapy

International Nuclear Information System (INIS)

Ammazzalorso, F; Jelen, U; Bednarz, T

2014-01-01

We demonstrate acceleration on graphic processing units (GPU) of automatic identification of robust particle therapy beam setups, minimizing negative dosimetric effects of Bragg peak displacement caused by treatment-time patient positioning errors. Our particle therapy research toolkit, RobuR, was extended with OpenCL support and used to implement calculation on GPU of the Port Homogeneity Index, a metric scoring irradiation port robustness through analysis of tissue density patterns prior to dose optimization and computation. Results were benchmarked against an independent native CPU implementation. Numerical results were in agreement between the GPU implementation and native CPU implementation. For 10 skull base cases, the GPU-accelerated implementation was employed to select beam setups for proton and carbon ion treatment plans, which proved to be dosimetrically robust, when recomputed in presence of various simulated positioning errors. From the point of view of performance, average running time on the GPU decreased by at least one order of magnitude compared to the CPU, rendering the GPU-accelerated analysis a feasible step in a clinical treatment planning interactive session. In conclusion, selection of robust particle therapy beam setups can be effectively accelerated on a GPU and become an unintrusive part of the particle therapy treatment planning workflow. Additionally, the speed gain opens new usage scenarios, like interactive analysis manipulation (e.g. constraining of some setup) and re-execution. Finally, through OpenCL portable parallelism, the new implementation is suitable also for CPU-only use, taking advantage of multiple cores, and can potentially exploit types of accelerators other than GPUs.
GPU-accelerated automatic identification of robust beam setups for proton and carbon-ion radiotherapy

Science.gov (United States)

Ammazzalorso, F.; Bednarz, T.; Jelen, U.

2014-03-01

We demonstrate acceleration on graphic processing units (GPU) of automatic identification of robust particle therapy beam setups, minimizing negative dosimetric effects of Bragg peak displacement caused by treatment-time patient positioning errors. Our particle therapy research toolkit, RobuR, was extended with OpenCL support and used to implement calculation on GPU of the Port Homogeneity Index, a metric scoring irradiation port robustness through analysis of tissue density patterns prior to dose optimization and computation. Results were benchmarked against an independent native CPU implementation. Numerical results were in agreement between the GPU implementation and native CPU implementation. For 10 skull base cases, the GPU-accelerated implementation was employed to select beam setups for proton and carbon ion treatment plans, which proved to be dosimetrically robust, when recomputed in presence of various simulated positioning errors. From the point of view of performance, average running time on the GPU decreased by at least one order of magnitude compared to the CPU, rendering the GPU-accelerated analysis a feasible step in a clinical treatment planning interactive session. In conclusion, selection of robust particle therapy beam setups can be effectively accelerated on a GPU and become an unintrusive part of the particle therapy treatment planning workflow. Additionally, the speed gain opens new usage scenarios, like interactive analysis manipulation (e.g. constraining of some setup) and re-execution. Finally, through OpenCL portable parallelism, the new implementation is suitable also for CPU-only use, taking advantage of multiple cores, and can potentially exploit types of accelerators other than GPUs.
(En)countering native-speakerism global perspectives

CERN Document Server

Holliday, Adrian; Swan, Anne

2015-01-01

The book addresses the issue of native-speakerism, an ideology based on the assumption that 'native speakers' of English have a special claim to the language itself, through critical qualitative studies of the lived experiences of practising teachers and students in a range of scenarios.
A system of automatic speaker recognition on a minicomputer

International Nuclear Information System (INIS)

El Chafei, Cherif

1978-01-01

This study describes a system for automatic speaker recognition using the pitch of the voice. Pre-processing consists of extracting the speakers' discriminating characteristics from the pitch. The recognition program first performs a preselection and then calculates the distance between the characteristics of the speaker to be recognized and those of the speakers already recorded. A recognition experiment was carried out with 15 speakers and included 566 tests spread over an intermittent period of four months. The discriminating characteristics used offer several interesting qualities. The algorithms for measuring the characteristics on the one hand, and for classifying the speakers on the other, are simple. The results obtained in real time with a minicomputer are satisfactory. Furthermore, they could probably be improved by considering other discriminating speaker characteristics, but this was unfortunately beyond our means. (author)
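The distance-based decision described above can be sketched as a nearest-reference search. The feature choice here (mean pitch and pitch standard deviation), the enrolled values, and the function name `identify` are assumptions for illustration only; the abstract does not specify the actual characteristics or distance measure, and the preselection step is omitted.

```python
import math

# Hypothetical enrolled speakers: (mean pitch in Hz, pitch standard deviation in Hz).
enrolled = {
    "speaker_a": (110.0, 12.0),
    "speaker_b": (205.0, 25.0),
    "speaker_c": (150.0, 18.0),
}

def identify(features, references):
    """Return the enrolled speaker whose reference vector is closest (Euclidean distance)."""
    return min(references, key=lambda name: math.dist(features, references[name]))

# Characteristics measured from an unknown utterance (made-up values).
best = identify((148.0, 20.0), enrolled)
print(best)
```

A plain Euclidean distance is used here for simplicity; in practice the features would usually be scaled so that no single characteristic dominates the comparison.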
Speaker diarization system using HXLPS and deep neural network

Directory of Open Access Journals (Sweden)

V. Subba Ramaiah

2018-03-01

Full Text Available In general, speaker diarization is defined as the process of segmenting the input speech signal and grouping the homogeneous regions with regard to speaker identity. The main idea behind this system is that it is able to discriminate the speaker signals by assigning a label to each speaker signal. Due to the rapid growth of broadcast and meeting recordings, speaker diarization is needed to enhance the readability of speech transcriptions. To address this issue, Holoentropy with the eXtended Linear Prediction using autocorrelation Snapshot (HXLPS) and a deep neural network (DNN) are proposed for the speaker diarization system. The HXLPS extraction method is newly developed by incorporating the Holoentropy with the XLPS. Once the features are obtained, the speech and non-speech signals are detected by the Voice Activity Detection (VAD) method. Then, an i-vector representation of every segmented signal is obtained using a Universal Background Model (UBM). Consequently, the DNN is used to assign a label to each speaker signal, and the signals are then clustered according to the speaker label. The performance is analysed using evaluation metrics such as tracking distance, false alarm rate and diarization error rate (DER). The proposed method achieves better diarization performance, with a lower DER of 1.36% when varying the lambda value and a DER of 2.23% when varying the frame length. Keywords: Speaker diarization, HXLPS feature extraction, Voice activity detection, Deep neural network, Speaker clustering, Diarization Error Rate (DER)
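For readers unfamiliar with the diarization error rate, it is conventionally computed as the sum of false-alarm, missed-speech and speaker-confusion time divided by the total reference speech time. The sketch below applies that standard definition to made-up durations; the numbers are hypothetical and are not taken from the paper.

```python
def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """DER = (false alarm + missed speech + speaker confusion) / total reference speech.

    All arguments are durations in seconds.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech

# Hypothetical error durations for a 120-second recording of speech.
der = diarization_error_rate(false_alarm=2.0, missed=3.0, confusion=1.0, total_speech=120.0)
print(f"DER = {der:.2%}")
```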
Robust synchronization of delayed neural networks based on adaptive control and parameters identification

International Nuclear Information System (INIS)

Zhou Jin; Chen Tianping; Xiang Lan

2006-01-01

This paper investigates synchronization dynamics of delayed neural networks with all the parameters unknown. By combining adaptive control and linear feedback with the update law, some simple yet generic criteria for determining robust synchronization based on the parameter identification of uncertain chaotic delayed neural networks are derived using the invariance principle of functional differential equations. It is shown that the approaches developed here further extend the ideas and techniques presented in recent literature, and that they are also simple to implement in practice. Furthermore, the theoretical results are applied to a typical chaotic delayed Hopfield neural network, and numerical simulations also demonstrate the effectiveness and feasibility of the proposed technique.
Real Time Recognition Of Speakers From Internet Audio Stream

Directory of Open Access Journals (Sweden)

Weychan Radoslaw

2015-09-01

Full Text Available In this paper we present an automatic speaker recognition technique that uses lossy (encoded) speech signal streams from Internet radio. We show the influence of the audio encoder (e.g., bitrate) on speaker model quality. The model of each speaker was calculated using the Gaussian mixture model (GMM) approach. Both the speaker recognition and the further analysis were carried out on short utterances to facilitate real-time processing. The neighborhoods of the speaker models were analyzed with the ISOMAP algorithm. The experiments were based on four 1-hour public debates with 7–8 speakers (including the moderator), acquired from the Polish radio Internet services. The presented software was developed in the MATLAB environment.
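As a rough illustration of GMM-based speaker identification in general, the sketch below scores a short utterance against per-speaker diagonal-covariance GMMs and picks the model with the highest average log-likelihood. It is written in Python rather than the MATLAB used by the authors, and every model parameter, feature value and name in it is a made-up assumption rather than anything taken from the paper.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(log_p)
        # log-sum-exp over mixture components for numerical stability
        peak = max(comp_logs)
        total += peak + math.log(sum(math.exp(c - peak) for c in comp_logs))
    return total / len(frames)


def identify(frames, speaker_models):
    """Return the name of the model that best explains the frames."""
    return max(speaker_models,
               key=lambda name: gmm_log_likelihood(frames, *speaker_models[name]))


# Hypothetical two-component models over 2-D features: (weights, means, variances).
models = {
    "speaker_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_b": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
utterance = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
best = identify(utterance, models)
```

Averaging over frames keeps scores comparable across utterances of different lengths, which matters when only short utterances are available.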
Accent Attribution in Speakers with Foreign Accent Syndrome

Science.gov (United States)

Verhoeven, Jo; De Pauw, Guy; Pettinato, Michele; Hirson, Allen; Van Borsel, John; Marien, Peter

2013-01-01

Purpose: The main aim of this experiment was to investigate the perception of Foreign Accent Syndrome in comparison to speakers with an authentic foreign accent. Method: Three groups of listeners attributed accents to conversational speech samples of 5 FAS speakers which were embedded amongst those of 5 speakers with a real foreign accent and 5…
Comparison of Diarization Tools for Building Speaker Database

Directory of Open Access Journals (Sweden)

Eva Kiktova

2015-01-01

Full Text Available This paper compares open source diarization toolkits (LIUM, DiarTK, ALIZE-Lia_Ral), which were designed for extraction of speaker identity from audio records without any prior information about the analysed data. The comparative study of the diarization tools was performed for three different types of analysed data (broadcast news - BN and TV shows). The corresponding DER values achieved are presented here. The automatic speaker diarization system developed by LIUM was able to identify speech segments belonging to speakers at a very good level. Its segmentation outputs can be used to build a speaker database.
An introduction to application-independent evaluation of speaker recognition systems

NARCIS (Netherlands)

Leeuwen, D.A. van; Brümmer, N.

2007-01-01

In the evaluation of speaker recognition systems - an important part of speaker classification [1], the trade-off between missed speakers and false alarms has always been an important diagnostic tool. NIST has defined the task of speaker detection with the associated Detection Cost Function (DCF) to
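For orientation, the detection cost function is usually written as C_det = C_miss · P_miss · P_target + C_fa · P_fa · (1 − P_target). The sketch below evaluates that expression at a fixed threshold. The cost and prior defaults are illustrative assumptions only, and the scores are invented; none of these values come from this record, whose abstract is truncated.

```python
def detection_cost(target_scores, nontarget_scores, threshold,
                   c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Detection cost at one decision threshold.

    c_miss, c_fa and p_target are illustrative defaults, not values
    taken from the abstract above.
    """
    # A target trial scoring below the threshold is a missed speaker.
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    # A non-target trial scoring at or above the threshold is a false alarm.
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


# Made-up scores for illustration.
targets = [2.0, 1.5, 0.2, 3.1]
nontargets = [-1.0, 0.5, -2.0, 0.1, -0.3]
cost = detection_cost(targets, nontargets, threshold=0.4)
```

Sweeping the threshold and recording the miss and false-alarm rates at each value traces out the trade-off the abstract refers to.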
Role of Speaker Cues in Attention Inference

Directory of Open Access Journals (Sweden)

Jin Joo Lee

2017-10-01

Full Text Available Current state-of-the-art approaches to emotion recognition primarily focus on modeling the nonverbal expressions of the sole individual without reference to contextual elements such as the co-presence of the partner. In this paper, we demonstrate that the accurate inference of listeners’ social-emotional state of attention depends on accounting for the nonverbal behaviors of their storytelling partner, namely their speaker cues. To gain a deeper understanding of the role of speaker cues in attention inference, we conduct investigations into real-world interactions of children (5–6 years old) storytelling with their peers. Through in-depth analysis of human–human interaction data, we first identify nonverbal speaker cues (i.e., backchannel-inviting cues) and listener responses (i.e., backchannel feedback). We then demonstrate how speaker cues can modify the interpretation of attention-related backchannels as well as serve as a means to regulate the responsiveness of listeners. We discuss the design implications of our findings toward our primary goal of developing attention recognition models for storytelling robots, and we argue that social robots can proactively use speaker cues to form more accurate inferences about the attentive state of their human partners.
Speaker and Observer Perceptions of Physical Tension during Stuttering.

Science.gov (United States)

Tichenor, Seth; Leslie, Paula; Shaiman, Susan; Yaruss, J Scott

2017-01-01

Speech-language pathologists routinely assess physical tension during evaluation of those who stutter. If speakers experience tension that is not visible to clinicians, then judgments of severity may be inaccurate. This study addressed this potential discrepancy by comparing judgments of tension by people who stutter and expert clinicians to determine if clinicians could accurately identify the speakers' experience of physical tension. Ten adults who stutter were audio-video recorded in two speaking samples. Two board-certified specialists in fluency evaluated the samples using the Stuttering Severity Instrument-4 and a checklist adapted for this study. Speakers rated their tension using the same forms, and then discussed their experiences in a qualitative interview so that themes related to physical tension could be identified. The degree of tension reported by speakers was higher than that observed by specialists. Tension in parts of the body that were less visible to the observer (chest, abdomen, throat) was reported more by speakers than by specialists. The thematic analysis revealed that speakers' experience of tension changes over time and that these changes may be related to speakers' acceptance of stuttering. The lack of agreement between speaker and specialist perceptions of tension suggests that using self-reports is a necessary component for supporting the accurate diagnosis of tension in stuttering. © 2018 S. Karger AG, Basel.

Forensic speaker recognition

NARCIS (Netherlands)

Meuwly, Didier

2013-01-01

The aim of forensic speaker recognition is to establish links between individuals and criminal activities, through audio speech recordings. This field is multidisciplinary, combining predominantly phonetics, linguistics, speech signal processing, and forensic statistics. On these bases, expert-based
Linear array of photodiodes to track a human speaker for video recording

International Nuclear Information System (INIS)

DeTone, D; Neal, H; Lougheed, R

2012-01-01

Communication and collaboration using stored digital media has garnered more interest by many areas of business, government and education in recent years. This is due primarily to improvements in the quality of cameras and speed of computers. An advantage of digital media is that it can serve as an effective alternative when physical interaction is not possible. Video recordings that allow for viewers to discern a presenter's facial features, lips and hand motions are more effective than videos that do not. To attain this, one must maintain a video capture in which the speaker occupies a significant portion of the captured pixels. However, camera operators are costly, and often do an imperfect job of tracking presenters in unrehearsed situations. This creates motivation for a robust, automated system that directs a video camera to follow a presenter as he or she walks anywhere in the front of a lecture hall or large conference room. Such a system is presented. The system consists of a commercial, off-the-shelf pan/tilt/zoom (PTZ) color video camera, a necklace of infrared LEDs and a linear photodiode array detector. Electronic output from the photodiode array is processed to generate the location of the LED necklace, which is worn by a human speaker. The computer controls the video camera movements to record video of the speaker. The speaker's vertical position and depth are assumed to remain relatively constant– the video camera is sent only panning (horizontal) movement commands. The LED necklace is flashed at 70Hz at a 50% duty cycle to provide noise-filtering capability. The benefit to using a photodiode array versus a standard video camera is its higher frame rate (4kHz vs. 60Hz). The higher frame rate allows for the filtering of infrared noise such as sunlight and indoor lighting–a capability absent from other tracking technologies. The system has been tested in a large lecture hall and is shown to be effective.
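One way a flashing LED sampled at a high frame rate can be separated from steady infrared sources is synchronous detection: each pixel's readings are multiplied by a +1/−1 reference at the flash rate and averaged, so constant light cancels. The sketch below illustrates that idea using the 70 Hz, 50% duty cycle and 4 kHz figures quoted in the abstract. The abstract does not say this is the filter the authors used. The 0.1 s window, the 8-pixel array, the simulated readings and the assumption that the reference is phase-aligned with the LED driver are all hypothetical.

```python
SAMPLE_RATE_HZ = 4000   # photodiode array frame rate quoted in the abstract
FLASH_RATE_HZ = 70      # LED flash rate quoted in the abstract
WINDOW = 400            # assumed window: 0.1 s, exactly 7 flash periods


def reference(n):
    """+1 while the LED is on, -1 while it is off (50% duty cycle)."""
    phase = (n * FLASH_RATE_HZ) % SAMPLE_RATE_HZ
    return 1 if phase < SAMPLE_RATE_HZ // 2 else -1


def demodulate(samples):
    """Correlate one pixel's time series with the flash reference."""
    return sum(s * reference(n) for n, s in enumerate(samples)) / len(samples)


def locate_led(frames):
    """frames[n][p] is the reading of pixel p at sample n."""
    n_pixels = len(frames[0])
    strengths = [demodulate([frame[p] for frame in frames])
                 for p in range(n_pixels)]
    return max(range(n_pixels), key=lambda p: strengths[p]), strengths


# Simulated data: steady ambient light everywhere, a bright steady source
# on pixel 2, and a weak flashing LED on pixel 5.
ambient = [100.0] * 8
ambient[2] = 500.0
frames = [[ambient[p] + (3.0 if p == 5 and reference(n) > 0 else 0.0)
           for p in range(8)]
          for n in range(WINDOW)]
pixel, strengths = locate_led(frames)
```

Because the window spans a whole number of flash periods, the reference sums to zero and the bright steady source on pixel 2 contributes nothing, while the weak LED on pixel 5 stands out. A system without a shared clock would need a phase search or a quadrature reference instead.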
Linear array of photodiodes to track a human speaker for video recording

Science.gov (United States)

DeTone, D.; Neal, H.; Lougheed, R.

2012-12-01

Communication and collaboration using stored digital media has garnered more interest by many areas of business, government and education in recent years. This is due primarily to improvements in the quality of cameras and speed of computers. An advantage of digital media is that it can serve as an effective alternative when physical interaction is not possible. Video recordings that allow for viewers to discern a presenter's facial features, lips and hand motions are more effective than videos that do not. To attain this, one must maintain a video capture in which the speaker occupies a significant portion of the captured pixels. However, camera operators are costly, and often do an imperfect job of tracking presenters in unrehearsed situations. This creates motivation for a robust, automated system that directs a video camera to follow a presenter as he or she walks anywhere in the front of a lecture hall or large conference room. Such a system is presented. The system consists of a commercial, off-the-shelf pan/tilt/zoom (PTZ) color video camera, a necklace of infrared LEDs and a linear photodiode array detector. Electronic output from the photodiode array is processed to generate the location of the LED necklace, which is worn by a human speaker. The computer controls the video camera movements to record video of the speaker. The speaker's vertical position and depth are assumed to remain relatively constant- the video camera is sent only panning (horizontal) movement commands. The LED necklace is flashed at 70Hz at a 50% duty cycle to provide noise-filtering capability. The benefit to using a photodiode array versus a standard video camera is its higher frame rate (4kHz vs. 60Hz). The higher frame rate allows for the filtering of infrared noise such as sunlight and indoor lighting-a capability absent from other tracking technologies. The system has been tested in a large lecture hall and is shown to be effective.
Data-Model Relationship in Text-Independent Speaker Recognition

Directory of Open Access Journals (Sweden)

Stapert Robert

2005-01-01

Full Text Available Text-independent speaker recognition systems such as those based on Gaussian mixture models (GMMs) do not include time sequence information (TSI) within the model itself. The level of importance of TSI in speaker recognition is an interesting question and one addressed in this paper. Recent work has shown that the utilisation of higher-level information such as idiolect, pronunciation, and prosodics can be useful in reducing speaker recognition error rates. In accordance with these developments, the aim of this paper is to show that as more data becomes available, the basic GMM can be enhanced by utilising TSI, even in a text-independent mode. This paper presents experimental work incorporating TSI into the conventional GMM. The resulting system, known as the segmental mixture model (SMM), embeds dynamic time warping (DTW) into a GMM framework. Results are presented on the 2000-speaker SpeechDat Welsh database which show improved speaker recognition performance with the SMM.
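Dynamic time warping, the component embedded in the SMM, aligns two sequences that unfold at different rates by finding the lowest-cost monotonic path through their pairwise distances. The sketch below is plain textbook DTW on scalar sequences, not the segmental mixture model itself. Real systems would compare feature vectors rather than single numbers, and the function name and example sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two scalar sequences."""
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[len(a)][len(b)]


# Illustrative sequences: `slow` is `base` with one value held longer.
base = [1.0, 2.0, 3.0, 4.0]
slow = [1.0, 2.0, 2.0, 3.0, 4.0]
reversed_base = [4.0, 3.0, 2.0, 1.0]
same_shape = dtw_distance(base, slow)
different_shape = dtw_distance(base, reversed_base)
```

The stretched copy aligns at zero cost while the reversed sequence does not, which is the kind of time sequence information an order-free GMM cannot capture on its own.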
The effect of L1 prosodic backgrounds of Cantonese and Japanese speakers on the perception of Mandarin tones after training

Science.gov (United States)

So, Connie K.

2005-04-01

The present study investigated to what extent one's L1 prosodic background affects the learning of a new tonal system. The question as to whether native speakers of a tone language perform differently from those of a pitch accent language will be addressed. Twenty native speakers of Hong Kong Cantonese (a tone language) and Japanese (a pitch accent language) were assigned to two groups. None of them had any prior knowledge of Mandarin or had received any form of musical training before participating in the study. Their performance in identifying Mandarin tones before and after short-term training was compared. Analysis of listeners' tonal confusions in the pretest, posttest, and generalization tests revealed that both Cantonese and Japanese listeners had more confusion for two contrastive tone pairs: Tone 1-Tone 4, and Tone 2-Tone 3. Moreover, Cantonese speakers consistently had greater difficulty than Japanese speakers in distinguishing the tones in each pair. These results imply that listeners' L1 prosodic backgrounds are at work during the process of learning a new tonal system. The findings will be further discussed in terms of the Perceptual Assimilation Model (Best, 1995). [Work supported by SSHRC.]
Inferring speaker attributes in adductor spasmodic dysphonia: ratings from unfamiliar listeners.

Science.gov (United States)

Isetti, Derek; Xuereb, Linnea; Eadie, Tanya L

2014-05-01

To determine whether unfamiliar listeners' perceptions of speakers with adductor spasmodic dysphonia (ADSD) differ from control speakers on the parameters of relative age, confidence, tearfulness, and vocal effort and are related to speaker-rated vocal effort or voice-specific quality of life. Twenty speakers with ADSD (including 6 speakers with ADSD plus tremor) and 20 age- and sex-matched controls provided speech recordings, completed a voice-specific quality-of-life instrument (Voice Handicap Index; Jacobson et al., 1997), and rated their own vocal effort. Twenty listeners evaluated speech samples for relative age, confidence, tearfulness, and vocal effort using rating scales. Listeners judged speakers with ADSD as sounding significantly older, less confident, more tearful, and more effortful than control speakers (p < .01). Increased vocal effort was strongly associated with decreased speaker confidence (rs = .88-.89) and sounding more tearful (rs = .83-.85). Self-rated speaker effort was moderately related (rs = .45-.52) to listener impressions. Listeners' perceptions of confidence and tearfulness were also moderately associated with higher Voice Handicap Index scores (rs = .65-.70). Unfamiliar listeners judge speakers with ADSD more negatively than control speakers, with judgments extending beyond typical clinical measures. The results have implications for counseling and understanding the psychosocial effects of ADSD.
Speaker-dependent Multipitch Tracking Using Deep Neural Networks

Science.gov (United States)

2015-01-01

sentences spoken by each of 34 speakers (18 male, 16 female). Two male and two female speakers (No. 1, 2, 18, 20, same as [30]), denoted as MA1, MA2 ...Engineering Technical Report #12, 2015 Speaker Pairs MA1- MA2 MA1-FE1 MA1-FE2 MA2 -FE1 MA2 -FE2 FE1-FE2 E T ot al 0 10 20 30 40 50 60 70 80 Jin and Wang Hu and...Pitch 1 Estimated Pitch 2 (d) Figure 6: Multipitch tracking results on a test mixture (pbbv6n and priv3n) for the MA1- MA2 speaker pair. (a) Groundtruth
Online Matchmaking: It's Not Just for Dating Sites Anymore! Connecting the Climate Voices Science Speakers Network to Educators

Science.gov (United States)

Wegner, Kristin; Herrin, Sara; Schmidt, Cynthia

2015-01-01

Scientists play an integral role in the development of climate literacy skills - for both teachers and students alike. By partnering with local scientists, teachers can gain valuable insights into the science practices highlighted by the Next Generation Science Standards (NGSS), as well as a deeper understanding of cutting-edge scientific discoveries and local impacts of climate change. For students, connecting to local scientists can provide a relevant connection to climate science and STEM skills. Over the past two years, the Climate Voices Science Speakers Network (climatevoices.org) has grown to a robust network of nearly 400 climate science speakers across the United States. Formal and informal educators, K-12 students, and community groups connect with our speakers through our interactive map-based website and invite them to meet through face-to-face and virtual presentations, such as webinars and podcasts. But creating a common language between scientists and educators requires coaching on both sides. In this presentation, we will present the "nitty-gritty" of setting up scientist-educator collaborations, as well as the challenges and opportunities that arise from these partnerships. We will share the impact of these collaborations through case studies, including anecdotal feedback and metrics.
Request Strategies in Everyday Interactions of Persian and English Speakers

Directory of Open Access Journals (Sweden)

Shiler Yazdanfar

2016-12-01

Full Text Available Cross-cultural studies of speech acts in different linguistic contexts might have interesting implications for language researchers and practitioners. Drawing on Speech Act Theory, the present study aimed at conducting a comparative study of the request speech act in Persian and English. Specifically, the study endeavored to explore the request strategies used in daily interactions of Persian and English speakers based on directness level and supportive moves. To this end, English and Persian TV series were observed and requestive utterances were transcribed. The utterances were then categorized based on Blum-Kulka and Olshtain’s Cross-Cultural Study of Speech Act Realization Pattern (CCSARP) for directness level and internal and external mitigation devices. According to the results, although speakers of both languages opted for the direct level as their most frequently used strategy in their daily interactions, the English speakers used more conventionally indirect strategies than the Persian speakers did, and the Persian speakers used more non-conventionally indirect strategies than the English speakers did. Furthermore, the analyzed data revealed that American English speakers use more mitigation devices in their daily interactions with friends and family members than Persian speakers.
Speaker Reliability Guides Children's Inductive Inferences about Novel Properties

Science.gov (United States)

Kim, Sunae; Kalish, Charles W.; Harris, Paul L.

2012-01-01

Prior work shows that children can make inductive inferences about objects based on their labels rather than their appearance (Gelman, 2003). A separate line of research shows that children's trust in a speaker's label is selective. Children accept labels from a reliable speaker over an unreliable speaker (e.g., Koenig & Harris, 2005). In the…
Guest Speakers in School-Based Sexuality Education

Science.gov (United States)

McRee, Annie-Laurie; Madsen, Nikki; Eisenberg, Marla E.

2014-01-01

This study, using data from a statewide survey (n = 332), examined teachers' practices regarding the inclusion of guest speakers to cover sexuality content. More than half of teachers (58%) included guest speakers. In multivariate analyses, teachers who taught high school, had professional preparation in health education, or who received…
The Communication of Public Speaking Anxiety: Perceptions of Asian and American Speakers.

Science.gov (United States)

Martini, Marianne; And Others

1992-01-01

Finds that U.S. audiences perceive Asian speakers to have more speech anxiety than U.S. speakers, even though Asian speakers do not self-report higher anxiety levels. Confirms that speech state anxiety is not communicated effectively between speakers and audiences for Asian or U.S. speakers. (SR)
Application of Native Speaker Models for Identifying Deviations in Rhetorical Moves in Non-Native Speaker Manuscripts

Directory of Open Access Journals (Sweden)

Assef Khalili

2016-06-01

Full Text Available Introduction: Explicit teaching of generic conventions of a text genre, usually extracted from native-speaker (NS) manuscripts, has long been emphasized in the teaching of Academic Writing in English for Specific Purposes (henceforth ESP) classes, both in theory and practice. While consciousness-raising about rhetorical structure can be instrumental to non-native speakers (NNS), it has to be admitted that most work done in the field of ESP has tended to focus almost exclusively on native-speaker (NS) productions, giving scant attention to non-native speaker (NNS) manuscripts. That is, having outlined established norms for good writing on the basis of NS productions, few have been inclined to provide a descriptive account of NNS attempts to produce a research article (RA) in English. That is what we have tried to do in the present research. Methods: We randomly selected 20 RAs in dentistry and used two well-established models for results and discussion sections to try to describe the move structure of these articles and show the points of divergence from the established norms. Results: The results pointed to significant divergences that could seriously compromise the quality of an RA. Conclusion: It is believed that the insights gained on the deviations in NNS manuscripts could prove very useful in designing syllabi for ESP classes.
Speaker Clustering for a Mixture of Singing and Reading (Preprint)

Science.gov (United States)

2012-03-01

diarization [2, 3] which answers the question of "who spoke when?" is a combination of speaker segmentation and clustering. Although it is possible to...focuses on speaker clustering, the techniques developed here can be applied to speaker diarization. For the remainder of this paper, the term "speech...and retrieval," Proceedings of the IEEE, vol. 88, 2000. [2] S. Tranter and D. Reynolds, "An overview of automatic speaker diarization systems," IEEE
Human and automatic speaker recognition over telecommunication channels

CERN Document Server

Fernández Gallardo, Laura

2016-01-01

This work addresses the evaluation of the human and the automatic speaker recognition performances under different channel distortions caused by bandwidth limitation, codecs, and electro-acoustic user interfaces, among other impairments. Its main contribution is the demonstration of the benefits of communication channels of extended bandwidth, together with an insight into how speaker-specific characteristics of speech are preserved through different transmissions. It provides sufficient motivation for considering speaker recognition as a criterion for the migration from narrowband to enhanced bandwidths, such as wideband and super-wideband.
Electrophysiology of subject-verb agreement mediated by speakers' gender.

Science.gov (United States)

Hanulíková, Adriana; Carreiras, Manuel

2015-01-01

An important property of speech is that it explicitly conveys features of a speaker's identity such as age or gender. This event-related potential (ERP) study examined the effects of social information provided by a speaker's gender, i.e., the conceptual representation of gender, on subject-verb agreement. Despite numerous studies on agreement, little is known about syntactic computations generated by speaker characteristics extracted from the acoustic signal. Slovak is well suited to investigate this issue because it is a morphologically rich language in which agreement involves features for number, case, and gender. Grammaticality of a sentence can be evaluated by checking a speaker's gender as conveyed by his/her voice. We examined how conceptual information about speaker gender, which is not syntactic but rather social and pragmatic in nature, is interpreted for the computation of agreement patterns. ERP responses to verbs disagreeing with the speaker's gender (e.g., a sentence including a masculine verbal inflection spoken by a female person 'the neighbors were upset because I (∗)stoleMASC plums') elicited a larger early posterior negativity compared to correct sentences. When the agreement was purely syntactic and did not depend on the speaker's gender, a disagreement between a formally marked subject and the verb inflection (e.g., the womanFEM (∗)stoleMASC plums) resulted in a larger P600 preceded by a larger anterior negativity compared to the control sentences. This result is in line with proposals according to which the recruitment of non-syntactic information such as the gender of the speaker results in N400-like effects, while formally marked syntactic features lead to structural integration as reflected in a LAN/P600 complex.
Speakers of different languages process the visual world differently.

Science.gov (United States)

Chabal, Sarah; Marian, Viorica

2015-06-01

Language and vision are highly interactive. Here we show that people activate language when they perceive the visual world, and that this language information impacts how speakers of different languages focus their attention. For example, when searching for an item (e.g., clock) in the same visual display, English and Spanish speakers look at different objects. Whereas English speakers searching for the clock also look at a cloud, Spanish speakers searching for the clock also look at a gift, because the Spanish names for gift (regalo) and clock (reloj) overlap phonologically. These different looking patterns emerge despite an absence of direct language input, showing that linguistic information is automatically activated by visual scene processing. We conclude that the varying linguistic information available to speakers of different languages affects visual perception, leading to differences in how the visual world is processed. (c) 2015 APA, all rights reserved.
Multimodal Speaker Diarization

NARCIS (Netherlands)

Noulas, A.; Englebienne, G.; Kröse, B.J.A.

2012-01-01

We present a novel probabilistic framework that fuses information coming from the audio and video modality to perform speaker diarization. The proposed framework is a Dynamic Bayesian Network (DBN) that is an extension of a factorial Hidden Markov Model (fHMM) and models the people appearing in an
Robust matching for voice recognition

Science.gov (United States)

Higgins, Alan; Bahler, L.; Porter, J.; Blais, P.

1994-10-01

This paper describes an automated method of comparing a voice sample of an unknown individual with samples from known speakers in order to establish or verify the individual's identity. The method is based on a statistical pattern matching approach that employs a simple training procedure, requires no human intervention (transcription, word or phonetic marking, etc.), and makes no assumptions regarding the expected form of the statistical distributions of the observations. The content of the speech material (vocabulary, grammar, etc.) is not assumed to be constrained in any way. An algorithm is described which incorporates frame pruning and channel equalization processes designed to achieve robust performance with reasonable computational resources. An experimental implementation demonstrating the feasibility of the concept is described.
Studies on inter-speaker variability in speech and its application in ...

Indian Academy of Sciences (India)

tic representation of vowel realizations by different speakers. ... in regional background, education level and gender of speaker. A more ...... formal maps such as bilinear transform and its generalizations for speaker normalization. Since.

Content-specific coordination of listeners' to speakers' EEG during communication.

Science.gov (United States)

Kuhlen, Anna K; Allefeld, Carsten; Haynes, John-Dylan

2012-01-01

Cognitive neuroscience has recently begun to extend its focus from the isolated individual mind to two or more individuals coordinating with each other. In this study we uncover a coordination of neural activity between the ongoing electroencephalogram (EEG) of two people-a person speaking and a person listening. The EEG of one set of twelve participants ("speakers") was recorded while they were narrating short stories. The EEG of another set of twelve participants ("listeners") was recorded while watching audiovisual recordings of these stories. Specifically, listeners watched the superimposed videos of two speakers simultaneously and were instructed to attend either to one or the other speaker. This allowed us to isolate neural coordination due to processing the communicated content from the effects of sensory input. We find several neural signatures of communication: First, the EEG is more similar among listeners attending to the same speaker than among listeners attending to different speakers, indicating that listeners' EEG reflects content-specific information. Secondly, listeners' EEG activity correlates with the attended speakers' EEG, peaking at a time delay of about 12.5 s. This correlation takes place not only between homologous, but also between non-homologous brain areas in speakers and listeners. A semantic analysis of the stories suggests that listeners coordinate with speakers at the level of complex semantic representations, so-called "situation models". With this study we link a coordination of neural activity between individuals directly to verbally communicated information.
Forensic Speaker Recognition Law Enforcement and Counter-Terrorism

CERN Document Server

Patil, Hemant

2012-01-01

Forensic Speaker Recognition: Law Enforcement and Counter-Terrorism is an anthology of the research findings of 35 speaker recognition experts from around the world. The volume provides a multidimensional view of the complex science involved in determining whether a suspect’s voice truly matches forensic speech samples, collected by law enforcement and counter-terrorism agencies, that are associated with the commission of a terrorist act or other crimes. While addressing such topics as the challenges of forensic case work, handling speech signal degradation, analyzing features of speaker recognition to optimize voice verification system performance, and designing voice applications that meet the practical needs of law enforcement and counter-terrorism agencies, this material all sounds a common theme: how the rigors of forensic utility are demanding new levels of excellence in all aspects of speaker recognition. The contributors are among the most eminent scientists in speech engineering and signal process...
Gricean Semantics and Vague Speaker-Meaning

OpenAIRE

Schiffer, Stephen

2017-01-01

Presentations of Gricean semantics, including Stephen Neale’s in “Silent Reference,” totally ignore vagueness, even though virtually every utterance is vague. I ask how Gricean semantics might be adjusted to accommodate vague speaker-meaning. My answer is that it can’t accommodate it: the Gricean program collapses in the face of vague speaker-meaning. The Gricean might, however, find some solace in knowing that every other extant meta-semantic and semantic program is in the same boat.
Effect of lisping on audience evaluation of male speakers.

Science.gov (United States)

Mowrer, D E; Wahl, P; Doolan, S J

1978-05-01

The social consequences of adult listeners' first impression of lisping were evaluated in two studies. Five adult speakers were rated by adult listeners with regard to speaking ability, intelligence, education, masculinity, and friendship. Results from both studies indicate that listeners rate adult speakers who demonstrate frontal lisping lower than nonlispers in all five categories investigated. Efforts to correct frontal lisping are justifiable on the basis of the poor impression lisping speakers make on the listener.
Word level language identification in online multilingual communication

NARCIS (Netherlands)

Nguyen, Dong-Phuong; Dogruoz, A. Seza

2013-01-01

Multilingual speakers switch between languages in online and spoken communication. Analyses of large scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using
Consistency between verbal and non-verbal affective cues: a clue to speaker credibility.

Science.gov (United States)

Gillis, Randall L; Nilsen, Elizabeth S

2017-06-01

Listeners are exposed to inconsistencies in communication; for example, when speakers' words (i.e. verbal) are discrepant with their demonstrated emotions (i.e. non-verbal). Such inconsistencies introduce ambiguity, which may render a speaker to be a less credible source of information. Two experiments examined whether children make credibility discriminations based on the consistency of speakers' affect cues. In Experiment 1, school-age children (7- to 8-year-olds) preferred to solicit information from consistent speakers (e.g. those who provided a negative statement with negative affect), over novel speakers, to a greater extent than they preferred to solicit information from inconsistent speakers (e.g. those who provided a negative statement with positive affect) over novel speakers. Preschoolers (4- to 5-year-olds) did not demonstrate this preference. Experiment 2 showed that school-age children's ratings of speakers were influenced by speakers' affect consistency when the attribute being judged was related to information acquisition (speakers' believability, "weird" speech), but not general characteristics (speakers' friendliness, likeability). Together, findings suggest that school-age children are sensitive to, and use, the congruency of affect cues to determine whether individuals are credible sources of information.
Young Children's Sensitivity to Speaker Gender When Learning from Others

Science.gov (United States)

Ma, Lili; Woolley, Jacqueline D.

2013-01-01

This research explores whether young children are sensitive to speaker gender when learning novel information from others. Four- and 6-year-olds (N = 144) chose between conflicting statements from a male versus a female speaker (Studies 1 and 3) or decided which speaker (male or female) they would ask (Study 2) when learning about the functions…
Fluency profile: comparison between Brazilian and European Portuguese speakers.

Science.gov (United States)

Castro, Blenda Stephanie Alves e; Martins-Reis, Vanessa de Oliveira; Baptista, Ana Catarina; Celeste, Letícia Correa

2014-01-01

The purpose of the study was to compare the speech fluency of Brazilian Portuguese speakers with that of European Portuguese speakers. The study participants were 76 individuals of any ethnicity or skin color aged 18-29 years. Of the participants, 38 lived in Brazil and 38 in Portugal. Speech samples from all participants were obtained and analyzed according to the variables of typology and frequency of speech disruptions and speech rate. Descriptive and inferential statistical analyses were performed to assess the association between the fluency profile and linguistic variant variables. We found that the speech rate of European Portuguese speakers was higher than the speech rate of Brazilian Portuguese speakers in words per minute (p=0.004). The qualitative distribution of the typology of common dysfluencies (pPortuguese speakers is not available, speech therapists in Portugal can use the same speech fluency assessment as has been used in Brazil to establish a diagnosis of stuttering, especially in regard to typical and stuttering dysfluencies, with care taken when evaluating the speech rate.
Direct Speaker Gaze Promotes Trust in Truth-Ambiguous Statements.

Science.gov (United States)

Kreysa, Helene; Kessler, Luise; Schweinberger, Stefan R

2016-01-01

A speaker's gaze behaviour can provide perceivers with a multitude of cues which are relevant for communication, thus constituting an important non-verbal interaction channel. The present study investigated whether direct eye gaze of a speaker affects the likelihood of listeners believing truth-ambiguous statements. Participants were presented with videos in which a speaker produced such statements with either direct or averted gaze. The statements were selected through a rating study to ensure that participants were unlikely to know a-priori whether they were true or not (e.g., "sniffer dogs cannot smell the difference between identical twins"). Participants indicated in a forced-choice task whether or not they believed each statement. We found that participants were more likely to believe statements by a speaker looking at them directly, compared to a speaker with averted gaze. Moreover, when participants disagreed with a statement, they were slower to do so when the statement was uttered with direct (compared to averted) gaze, suggesting that the process of rejecting a statement as untrue may be inhibited when that statement is accompanied by direct gaze.
Noise-robust speech triage.

Science.gov (United States)

Bartos, Anthony L; Cipr, Tomas; Nelson, Douglas J; Schwarz, Petr; Banowetz, John; Jerabek, Ladislav

2018-04-01

A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
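The SNR-dependent selection described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical lattice of training SNR levels and hypothetical detector labels (`voiced_envelope`, `feature_based`); none of these names come from the paper, and the sketch only mirrors the two rules stated above: use the model trained closest to the test SNR, and switch speech activity detectors at +10 dB.

```python
# Hypothetical lattice of SNR levels (dB) at which SID models were pre-trained.
SNR_LATTICE_DB = [-10, -5, 0, 5, 10, 15, 20]


def nearest_snr_level(snr_db, lattice=SNR_LATTICE_DB):
    """Return the lattice level closest to the estimated test SNR."""
    return min(lattice, key=lambda level: abs(level - snr_db))


def choose_sad(snr_db):
    """Pick a speech activity detector using the SNR ranges in the abstract."""
    if snr_db >= 10.0:
        return "feature_based"      # SNR >= +10 dB
    if snr_db >= -10.0:
        return "voiced_envelope"    # -10 dB <= SNR < +10 dB
    # The abstract does not say what happens below -10 dB.
    raise ValueError("SNR below the range covered by the abstract")


if __name__ == "__main__":
    for snr in (-7.4, 3.2, 12.0):
        print(snr, nearest_snr_level(snr), choose_sad(snr))
```

Matching the model to the nearest trained level reflects the reported finding that performance was best when training and test SNR were close or identical.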
A hybrid generative-discriminative approach to speaker diarization

NARCIS (Netherlands)

Noulas, A.K.; van Kasteren, T.; Kröse, B.J.A.

2008-01-01

In this paper we present a sound probabilistic approach to speaker diarization. We use a hybrid framework where a distribution over the number of speakers at each point of a multimodal stream is estimated with a discriminative model. The output of this process is used as input in a generative model
Understanding speaker attitudes from prosody by adults with Parkinson's disease.

Science.gov (United States)

Monetta, Laura; Cheang, Henry S; Pell, Marc D

2008-09-01

The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease, with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical 'pseudo-utterances' were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than healthy control (HC) participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).
Simultaneous Assessment of Speech Identification and Spatial Discrimination

Directory of Open Access Journals (Sweden)

Jennifer K. Bizley

2015-12-01

Full Text Available With increasing numbers of children and adults receiving bilateral cochlear implants, there is an urgent need for assessment tools that enable testing of binaural hearing abilities. Current test batteries are either limited in scope or are of an impractical duration for routine testing. Here, we report a behavioral test that enables combined testing of speech identification and spatial discrimination in noise. In this task, multitalker babble was presented from all speakers, and pairs of speech tokens were sequentially presented from two adjacent speakers. Listeners were required to identify both words from a closed set of four possibilities and to determine whether the second token was presented to the left or right of the first. In Experiment 1, normal-hearing adult listeners were tested at 15° intervals throughout the frontal hemifield. Listeners showed highest spatial discrimination performance in and around the frontal midline, with a decline at more eccentric locations. In contrast, speech identification abilities were least accurate near the midline and showed an improvement in performance at more lateral locations. In Experiment 2, normal-hearing listeners were assessed using a restricted range of speaker locations designed to match those found in clinical testing environments. Here, speakers were separated by 15° around the midline and 30° at more lateral locations. This resulted in a similar pattern of behavioral results as in Experiment 1. We conclude that this test offers the potential to assess both spatial discrimination and the ability to use spatial information for unmasking in clinical populations.
A Method to Integrate GMM, SVM and DTW for Speaker Recognition

Directory of Open Access Journals (Sweden)

Ing-Jr Ding

2014-01-01

Full Text Available This paper develops an effective and efficient scheme to integrate Gaussian mixture model (GMM), support vector machine (SVM), and dynamic time warping (DTW) for automatic speaker recognition. GMM and SVM are two popular classifiers for speaker recognition applications. DTW is a fast and simple template matching method that is frequently seen in speech recognition applications. In this work, DTW is not used to perform speech recognition; instead, it is employed as a verifier of valid speakers. The proposed combination scheme of GMM, SVM and DTW for speaker recognition, called SVMGMM-DTW, is a two-phase verification process consisting of GMM-SVM verification in the first phase and DTW verification in the second phase. Because this double check verifies the identity of a speaker twice, it will be difficult for imposters to pass the security protection; therefore, the security of speaker recognition systems will be largely increased. A series of experiments designed on door access control applications demonstrated the superiority of the developed SVMGMM-DTW in speaker recognition accuracy.
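The two-phase idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the first-phase GMM-SVM score is taken as a given number, each frame is a single scalar (real systems would compare feature vectors such as MFCCs), and the function names and thresholds are hypothetical rather than taken from the paper.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two scalar sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b frame repeated
                cost[i][j - 1],      # b advances, a frame repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def two_phase_verify(first_phase_score, template, utterance,
                     score_threshold, dtw_threshold):
    """Accept only if both the (assumed) first-phase score and DTW agree."""
    if first_phase_score < score_threshold:
        return False  # rejected in phase one
    return dtw_distance(template, utterance) <= dtw_threshold


if __name__ == "__main__":
    print(two_phase_verify(0.9, [1, 2, 3], [1, 2, 2, 3], 0.5, 1.0))
```

Requiring both checks to pass trades a higher false-rejection risk for a lower false-acceptance risk, which suits the access-control setting the abstract mentions.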
Direct Speaker Gaze Promotes Trust in Truth-Ambiguous Statements.

Directory of Open Access Journals (Sweden)

Helene Kreysa

Full Text Available A speaker's gaze behaviour can provide perceivers with a multitude of cues which are relevant for communication, thus constituting an important non-verbal interaction channel. The present study investigated whether direct eye gaze of a speaker affects the likelihood of listeners believing truth-ambiguous statements. Participants were presented with videos in which a speaker produced such statements with either direct or averted gaze. The statements were selected through a rating study to ensure that participants were unlikely to know a-priori whether they were true or not (e.g., "sniffer dogs cannot smell the difference between identical twins"). Participants indicated in a forced-choice task whether or not they believed each statement. We found that participants were more likely to believe statements by a speaker looking at them directly, compared to a speaker with averted gaze. Moreover, when participants disagreed with a statement, they were slower to do so when the statement was uttered with direct (compared to averted) gaze, suggesting that the process of rejecting a statement as untrue may be inhibited when that statement is accompanied by direct gaze.
Defining robustness protocols: a method to include and evaluate robustness in clinical plans

International Nuclear Information System (INIS)

McGowan, S E; Albertini, F; Lomax, A J; Thomas, S J

2015-01-01

We aim to define a site-specific robustness protocol to be used during the clinical plan evaluation process. Plan robustness of 16 skull base IMPT plans to systematic range and random set-up errors has been retrospectively and systematically analysed. This was determined by calculating the error-bar dose distribution (ebDD) for all the plans and by defining some metrics used to define protocols aiding the plan assessment. Additionally, an example of how to clinically use the defined robustness database is given whereby a plan with sub-optimal brainstem robustness was identified. The advantage of using different beam arrangements to improve the plan robustness was analysed. Using the ebDD it was found that range errors had a smaller effect on dose distribution than the corresponding set-up error in a single fraction, and that organs at risk were most robust to the range errors, whereas the target was more robust to set-up errors. A database was created to aid planners in terms of plan robustness aims in these volumes. This resulted in the definition of site-specific robustness protocols. The use of robustness constraints allowed for the identification of a specific patient that may have benefited from a treatment of greater individuality. A new beam arrangement proved to be preferable when balancing conformality and robustness for this case. The ebDD and error-bar volume histogram proved effective in analysing plan robustness. The process of retrospective analysis could be used to establish site-specific robustness planning protocols in proton therapy. These protocols allow the planner to determine plans that, although delivering a dosimetrically adequate dose distribution, have resulted in sub-optimal robustness to these uncertainties. For these cases the use of different beam start conditions may improve the plan robustness to set-up and range uncertainties.
Defining robustness protocols: a method to include and evaluate robustness in clinical plans

Science.gov (United States)

McGowan, S. E.; Albertini, F.; Thomas, S. J.; Lomax, A. J.

2015-04-01

We aim to define a site-specific robustness protocol to be used during the clinical plan evaluation process. Plan robustness of 16 skull base IMPT plans to systematic range and random set-up errors has been retrospectively and systematically analysed. This was determined by calculating the error-bar dose distribution (ebDD) for all the plans and by defining some metrics used to define protocols aiding the plan assessment. Additionally, an example of how to clinically use the defined robustness database is given whereby a plan with sub-optimal brainstem robustness was identified. The advantage of using different beam arrangements to improve the plan robustness was analysed. Using the ebDD it was found that range errors had a smaller effect on dose distribution than the corresponding set-up error in a single fraction, and that organs at risk were most robust to the range errors, whereas the target was more robust to set-up errors. A database was created to aid planners in terms of plan robustness aims in these volumes. This resulted in the definition of site-specific robustness protocols. The use of robustness constraints allowed for the identification of a specific patient that may have benefited from a treatment of greater individuality. A new beam arrangement proved to be preferable when balancing conformality and robustness for this case. The ebDD and error-bar volume histogram proved effective in analysing plan robustness. The process of retrospective analysis could be used to establish site-specific robustness planning protocols in proton therapy. These protocols allow the planner to determine plans that, although delivering a dosimetrically adequate dose distribution, have resulted in sub-optimal robustness to these uncertainties. For these cases the use of different beam start conditions may improve the plan robustness to set-up and range uncertainties.
Bilingual and Monolingual Children Prefer Native-Accented Speakers

Directory of Open Access Journals (Sweden)

André L. Souza

2013-12-01

Full Text Available Adults and young children prefer to affiliate with some individuals rather than others. Studies have shown that monolingual children show in-group biases for individuals who speak their native language without a foreign accent (Kinzler, Dupoux, & Spelke, 2007). Some studies have suggested that bilingual children are less influenced than monolinguals by language variety when attributing personality traits to different speakers (Anisfeld & Lambert, 1964), which could indicate that bilinguals have fewer in-group biases and perhaps greater social flexibility. However, no previous studies have compared monolingual and bilingual children’s reactions to speakers with unfamiliar foreign accents. In the present study, we investigated the social preferences of 5-year-old English and French monolinguals and English-French bilinguals. Contrary to our predictions, both monolingual and bilingual preschoolers preferred to be friends with native-accented speakers over speakers who spoke their dominant language with an unfamiliar foreign accent. This result suggests that both monolingual and bilingual children have strong preferences for in-group members who use a familiar language variety, and that bilingualism does not lead to generalized social flexibility.
Bilingual and monolingual children prefer native-accented speakers.

Science.gov (United States)

Souza, André L; Byers-Heinlein, Krista; Poulin-Dubois, Diane

2013-01-01

Adults and young children prefer to affiliate with some individuals rather than others. Studies have shown that monolingual children show in-group biases for individuals who speak their native language without a foreign accent (Kinzler et al., 2007). Some studies have suggested that bilingual children are less influenced than monolinguals by language variety when attributing personality traits to different speakers (Anisfeld and Lambert, 1964), which could indicate that bilinguals have fewer in-group biases and perhaps greater social flexibility. However, no previous studies have compared monolingual and bilingual children's reactions to speakers with unfamiliar foreign accents. In the present study, we investigated the social preferences of 5-year-old English and French monolinguals and English-French bilinguals. Contrary to our predictions, both monolingual and bilingual preschoolers preferred to be friends with native-accented speakers over speakers who spoke their dominant language with an unfamiliar foreign accent. This result suggests that both monolingual and bilingual children have strong preferences for in-group members who use a familiar language variety, and that bilingualism does not lead to generalized social flexibility.
Differences in Sickness Allowance Receipt between Swedish Speakers and Finnish Speakers in Finland

Directory of Open Access Journals (Sweden)

Kaarina S. Reini

2017-12-01

Full Text Available Previous research has documented lower disability retirement and mortality rates of Swedish speakers as compared with Finnish speakers in Finland. This paper is the first to compare the two language groups with regard to the receipt of sickness allowance, which is an objective health measure that reflects a less severe poor health condition. Register-based data covering the years 1988-2011 are used. We estimate logistic regression models with generalized estimating equations to account for repeated observations at the individual level. We find that Swedish-speaking men have approximately 30 percent lower odds of receiving sickness allowance than Finnish-speaking men, whereas the difference in women is about 15 percent. In correspondence with previous research on all-cause mortality at working ages, we find no language-group difference in sickness allowance receipt in the socially most successful subgroup of the population.

On the improvement of speaker diarization by detecting overlapped speech

OpenAIRE

Hernando Pericás, Francisco Javier

2010-01-01

Simultaneous speech in meeting environments is responsible for a certain amount of the errors made by standard speaker diarization systems. We present an overlap detection system for far-field data based on spectral and spatial features, where the spatial features obtained on different microphone pairs are fused by means of principal component analysis. Detected overlap segments are applied for speaker diarization in order to increase the purity of speaker clusters an...
Comprehending non-native speakers: theory and evidence for adjustment in manner of processing.

Science.gov (United States)

Lev-Ari, Shiri

2014-01-01

Non-native speakers have lower linguistic competence than native speakers, which renders their language less reliable in conveying their intentions. We suggest that expectations of lower competence lead listeners to adapt their manner of processing when they listen to non-native speakers. We propose that listeners use cognitive resources to adjust by increasing their reliance on top-down processes and extracting less information from the language of the non-native speaker. An eye-tracking study supports our proposal by showing that when following instructions by a non-native speaker, listeners make more contextually-induced interpretations. Those with relatively high working memory also increase their reliance on context to anticipate the speaker's upcoming reference, and are less likely to notice lexical errors in the non-native speech, indicating that they take less information from the speaker's language. These results contribute to our understanding of the flexibility in language processing and have implications for interactions between native and non-native speakers.
Role of Speaker Cues in Attention Inference

OpenAIRE

Jin Joo Lee; Cynthia Breazeal; David DeSteno

2017-01-01

Current state-of-the-art approaches to emotion recognition primarily focus on modeling the nonverbal expressions of the sole individual without reference to contextual elements such as the co-presence of the partner. In this paper, we demonstrate that the accurate inference of listeners’ social-emotional state of attention depends on accounting for the nonverbal behaviors of their storytelling partner, namely their speaker cues. To gain a deeper understanding of the role of speaker cues in at...
Speaker recognition through NLP and CWT modeling.

Energy Technology Data Exchange (ETDEWEB)

Brown-VanHoozer, A.; Kercel, S. W.; Tucker, R. W.

1999-06-23

The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time duration (<30 seconds), and with better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results, but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "huge population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (or verbal predicate cues, e.g., see, sound, feel, etc.) while the secondary modalities would be characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space, and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms as well as the formant frequency are evident in the CWT output. More significantly, the CWT reveals significant
Race in Conflict with Heritage: "Black" Heritage Language Speaker of Japanese

Science.gov (United States)

Doerr, Neriko Musha; Kumagai, Yuri

2014-01-01

"Heritage language speaker" is a relatively new term to denote minority language speakers who grew up in a household where the language was used or those who have a family, ancestral, or racial connection to the minority language. In research on heritage language speakers, overlap between these 2 definitions is often assumed--that is,…
Are Cantonese-speakers really descriptivists? Revisiting cross-cultural semantics.

Science.gov (United States)

Lam, Barry

2010-05-01

In an article in Cognition, Machery, Mallon, Nichols, and Stich (2004; Semantics, cross-cultural style. Cognition, 92, B1-B12) present data that purport to show that East Asian Cantonese-speakers tend to have descriptivist intuitions about the referents of proper names, while Western English-speakers tend to have causal-historical intuitions about proper names. Machery et al. take this finding to support the view that some intuitions, the universality of which they claim is central to philosophical theories, vary according to cultural background. Machery et al. conclude from their findings that the philosophical methodology of consulting intuitions about hypothetical cases is flawed vis-à-vis the goal of determining truths about some philosophical domains like philosophical semantics. In the following study, three new vignettes in English were given to Western native English-speakers, and Cantonese translations were given to native Cantonese-speaking immigrants from a Cantonese community in Southern California. For all three vignettes, questions were given to elicit intuitions about the referent of a proper name and the truth-value of an uttered sentence containing a proper name. The results from this study reveal that East Asian Cantonese-speakers do not differ from Western English-speakers in ways that support Machery et al.'s conclusions. These new data concerning the intuitions of Cantonese-speakers raise questions about whether cross-cultural variation in answers to questions on certain vignettes reveals genuine differences in intuitions, or whether such differences stem from non-intuitional differences, such as differences in linguistic competence. Copyright 2009 Elsevier B.V. All rights reserved.
A Study on Metadiscoursive Interaction in the MA Theses of the Native Speakers of English and the Turkish Speakers of English

Science.gov (United States)

Köroglu, Zehra; Tüm, Gülden

2017-01-01

This study has been conducted to evaluate the TM usage in the MA theses written by the native speakers (NSs) of English and the Turkish speakers (TSs) of English. The purpose is to compare the TM usage in the introduction, results and discussion, and conclusion sections by both groups' randomly selected MA theses in the field of ELT between the…
Speaker Recognition from Emotional Speech Using I-vector Approach

Directory of Open Access Journals (Sweden)

MACKOVÁ Lenka

2014-05-01

Full Text Available In recent years the concept of i-vectors has become very popular and successful in the field of speaker verification. The basic principle of i-vectors is that each utterance is represented by a fixed-length, low-dimensional feature vector. In the literature, various recordings obtained from telephones or microphones have been used for speaker verification. The aim of this experiment was to perform speaker verification on an i-vector basis using a speaker model trained with emotional recordings. The Mel Frequency Cepstral Coefficients (MFCC), log energy, and their deltas and acceleration coefficients were used in the feature extraction process. Two classification methods were used in the verification system: the Mahalanobis distance metric in combination with Eigen Factor Radial normalization and, in the second approach, the Cosine Distance Scoring (CSS) metric with Within-class Covariance Normalization as channel compensation. This verification system used emotional recordings of male subjects from the freely available German emotional database (Emo-DB).
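The cosine scoring step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the i-vectors have already been extracted and channel-compensated, so Within-class Covariance Normalization is deliberately left out; the function names and the threshold value are hypothetical and not taken from the paper.

```python
import math


def cosine_score(w_target, w_test):
    """Cosine similarity between a target-speaker and a test i-vector."""
    dot = sum(x * y for x, y in zip(w_target, w_test))
    norm = (math.sqrt(sum(x * x for x in w_target))
            * math.sqrt(sum(y * y for y in w_test)))
    if norm == 0.0:
        raise ValueError("i-vectors must be non-zero")
    return dot / norm


def verify(w_target, w_test, threshold):
    """Accept the claimed identity when the score reaches the threshold."""
    return cosine_score(w_target, w_test) >= threshold


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real i-vectors have hundreds of dimensions.
    print(verify([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], threshold=0.8))
```

Because the score depends only on the angle between the two vectors, it ignores their magnitudes, which is one reason cosine scoring is a common low-cost back-end for i-vector systems.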
The TNO speaker diarization system for NIST RT05s meeting data

NARCIS (Netherlands)

Leeuwen, D.A. van

2006-01-01

The TNO speaker diarization system is based on a standard BIC segmentation and clustering algorithm. Since correct speech detection appears to be essential for the NIST Rich Transcription speaker diarization evaluation measure, we have developed a speech activity detector (SAD) as well.
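As a rough illustration of BIC-based segmentation in general (not the TNO implementation), the sketch below computes the commonly used ΔBIC statistic for a candidate change point. It assumes one-dimensional features modelled by a single Gaussian per segment, a penalty weight `lam`, and a small variance floor; all names are hypothetical.

```python
import math


def _log_var(x, floor=1e-6):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return math.log(max(var, floor))


def delta_bic(x, split, lam=1.0):
    """Delta-BIC for splitting x at index `split`; > 0 favours a change point."""
    if not 0 < split < len(x):
        raise ValueError("split must leave two non-empty segments")
    left, right = x[:split], x[split:]
    n = len(x)
    # Likelihood gain of two Gaussians over one (1-D case).
    gain = 0.5 * (n * _log_var(x)
                  - len(left) * _log_var(left)
                  - len(right) * _log_var(right))
    # Penalty: d + d(d+1)/2 = 2 extra parameters for d = 1.
    penalty = lam * 0.5 * 2 * math.log(n)
    return gain - penalty


if __name__ == "__main__":
    quiet = [0.0, 0.1, -0.1, 0.05, -0.05] * 2
    loud = [5.0, 5.1, 4.9, 5.05, 4.95] * 2
    print(delta_bic(quiet + loud, split=10))
```

In a full system the same statistic would be evaluated over multivariate features with covariance determinants in place of the scalar variances, both for finding change points and for deciding whether two clusters should be merged.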
A fundamental residue pitch perception bias for tone language speakers

Science.gov (United States)

Petitti, Elizabeth

A complex tone composed of only higher-order harmonics typically elicits a pitch percept equivalent to the tone's missing fundamental frequency (f0). When judging the direction of residue pitch change between two such tones, however, listeners may have completely opposite perceptual experiences depending on whether they are biased to perceive changes based on the overall spectrum or the missing f0 (harmonic spacing). Individual differences in residue pitch change judgments are reliable and have been associated with musical experience and functional neuroanatomy. Tone languages put greater pitch processing demands on their speakers than non-tone languages, and we investigated whether these lifelong differences in linguistic pitch processing affect listeners' bias for residue pitch. We asked native tone language speakers and native English speakers to perform a pitch judgment task for two tones with missing fundamental frequencies. Given tone pairs with ambiguous pitch changes, listeners were asked to judge the direction of pitch change, where the direction of their response indicated whether they attended to the overall spectrum (exhibiting a spectral bias) or the missing f0 (exhibiting a fundamental bias). We found that tone language speakers are significantly more likely to perceive pitch changes based on the missing f0 than English speakers. These results suggest that tone-language speakers' privileged experience with linguistic pitch fundamentally tunes their basic auditory processing.
Internal request modification by first and second language speakers ...

African Journals Online (AJOL)

This study focuses on the question of whether Luganda English speakers would negatively transfer into their English speech the use of syntactic and lexical down graders resulting in pragmatic failure. Data were collected from Luganda and Luganda English speakers by means of a Discourse Completion Test (DCT) ...
Speaker Input Variability Does Not Explain Why Larger Populations Have Simpler Languages.

Science.gov (United States)

Atkinson, Mark; Kirby, Simon; Smith, Kenny

2015-01-01

A learner's linguistic input is more variable if it comes from a greater number of speakers. Higher speaker input variability has been shown to facilitate the acquisition of phonemic boundaries, since data drawn from multiple speakers provides more information about the distribution of phonemes in a speech community. It has also been proposed that speaker input variability may have a systematic influence on individual-level learning of morphology, which can in turn influence the group-level characteristics of a language. Languages spoken by larger groups of people have less complex morphology than those spoken in smaller communities. While a mechanism by which the number of speakers could have such an effect is yet to be convincingly identified, differences in speaker input variability, which is thought to be larger in larger groups, may provide an explanation. By hindering the acquisition, and hence faithful cross-generational transfer, of complex morphology, higher speaker input variability may result in structural simplification. We assess this claim in two experiments which investigate the effect of such variability on language learning, considering its influence on a learner's ability to segment a continuous speech stream and acquire a morphologically complex miniature language. We ultimately find no evidence to support the proposal that speaker input variability influences language learning and so cannot support the hypothesis that it explains how population size determines the structural properties of language.
Does verbatim sentence recall underestimate the language competence of near-native speakers?

Directory of Open Access Journals (Sweden)

Judith Schweppe

2015-02-01

Full Text Available Verbatim sentence recall is widely used to test the language competence of native and non-native speakers since it involves comprehension and production of connected speech. However, we assume that, to maintain surface information, sentence recall relies particularly on attentional resources, which differentially affects native and non-native speakers. Since even in near-natives language processing is less automatized than in native speakers, processing a sentence in a foreign language plus retaining its surface may result in a cognitive overload. We contrasted sentence recall performance of German native speakers with that of highly proficient non-natives. Non-natives recalled the sentences significantly poorer than the natives, but performed equally well on a cloze test. This implies that sentence recall underestimates the language competence of good non-native speakers in mixed groups with native speakers. The findings also suggest that theories of sentence recall need to consider both its linguistic and its attentional aspects.
Are Cantonese-Speakers Really Descriptivists? Revisiting Cross-Cultural Semantics

Science.gov (United States)

Lam, Barry

2010-01-01

In an article in "Cognition" [Machery, E., Mallon, R., Nichols, S., & Stich, S. (2004). "Semantics cross-cultural style." "Cognition, 92", B1-B12] present data which purports to show that East Asian Cantonese-speakers tend to have descriptivist intuitions about the referents of proper names, while Western English-speakers tend to have…
Teaching Portuguese to Spanish Speakers: A Case for Trilingualism

Science.gov (United States)

Carvalho, Ana M.; Freire, Juliana Luna; da Silva, Antonio J. B.

2010-01-01

Portuguese is the sixth-most-spoken native language in the world, with approximately 240,000,000 speakers. Within the United States, there is a growing demand for K-12 language programs to engage the community of Portuguese heritage speakers. According to the 2000 U.S. census, 85,000 school-age children speak Portuguese at home. As a result, more…
Speaker diarization system on the 2007 NIST rich transcription meeting recognition evaluation

Science.gov (United States)

Sun, Hanwu; Nwe, Tin Lay; Koh, Eugene Chin Wei; Bin, Ma; Li, Haizhou

2007-09-01

This paper presents a speaker diarization system developed at the Institute for Infocomm Research (I2R) for the NIST Rich Transcription 2007 (RT-07) evaluation task. We describe in detail our primary approaches to speaker diarization under the Multiple Distant Microphones (MDM) condition in a conference room scenario. Our proposed system consists of six modules: 1) a normalized least-mean-square (NLMS) adaptive filter for speaker direction estimation via Time Difference of Arrival (TDOA); 2) initial speaker clustering via a two-stage TDOA histogram distribution quantization approach; 3) multiple-microphone speaker data alignment via GCC-PHAT Time Delay Estimation (TDE) among all the distant microphone channel signals; 4) a speaker clustering algorithm based on a GMM modeling approach; 5) non-speech removal via a speech/non-speech verification mechanism; and 6) silence removal via the "Double-Layer Windowing" (DLW) method. We achieve an error rate of 31.02% on the 2006 Spring (RT-06s) MDM evaluation task and a competitive overall error rate of 15.32% on the NIST Rich Transcription 2007 (RT-07) MDM evaluation task.
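The GCC-PHAT time delay estimate named in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the I2R system: it uses a naive O(n²) DFT so that it runs with the standard library alone, and the names `dft`, `gcc_phat_delay` and `max_lag` are hypothetical. The idea is to whiten the cross-spectrum of two microphone channels so that only phase remains, then pick the lag at which the resulting correlation peaks.

```python
import cmath
import random

def dft(x, inverse=False):
    """Naive discrete Fourier transform (fine for short illustrative signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def gcc_phat_delay(sig, ref, max_lag):
    """Estimate by how many samples `sig` lags `ref` (negative if it leads)."""
    n = len(sig) + len(ref)  # zero-pad so circular correlation equals linear correlation
    a = dft(list(sig) + [0.0] * (n - len(sig)))
    b = dft(list(ref) + [0.0] * (n - len(ref)))
    cross = [x * y.conjugate() for x, y in zip(a, b)]
    # PHAT weighting: discard magnitude, keep only the phase of each bin.
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0.0 for c in cross]
    cc = [v.real for v in dft(whitened, inverse=True)]
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cc[lag % n])

# Synthetic example: the same noise burst reaches the second microphone 3 samples later.
rng = random.Random(0)
burst = [rng.gauss(0.0, 1.0) for _ in range(16)]
ref = [0.0] * 8 + burst + [0.0] * 8
sig = [0.0] * 11 + burst + [0.0] * 5
delay = gcc_phat_delay(sig, ref, max_lag=5)
```

A practical system would use an FFT library and convert the sample delay to seconds by dividing by the sampling rate; both are omitted here to keep the sketch self-contained.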
Iambic-Trochaic Law Effects among Native Speakers of Spanish and English

Directory of Open Access Journals (Sweden)

Megan Crowhurst

2016-10-01

Full Text Available The Iambic-Trochaic Law (Bolton, 1894; Hayes, 1995; Woodrow, 1909) asserts that listeners associate greater intensity with group beginnings (a loud-first preference) and greater duration with group endings (a long-last preference). Hayes (1987; 1995) posits a natural connection between the prominences referred to in the ITL and the locations of stressed syllables in feet. However, not all lengthening in final positions originates with stressed syllables, and greater duration may also be associated with stress in nonfinal (trochaic) positions. The research described here challenged the notion that presumptive long-last effects necessarily reflect stress-related duration patterns, and investigated the general hypothesis that the robustness of long-last effects should vary depending on the strength of the association between final positions and increased duration, whatever its source. Two ITL studies were conducted in which native speakers of Spanish and of English grouped streams of rhythmically alternating syllables in which vowel intensity and/or duration levels were varied. These languages were chosen because while they are prosodically similar, increased duration on constituent-final syllables is both more common and more salient in English than Spanish. Outcomes revealed robust loud-first effects in both language groups. Long-last effects were significantly weaker in the Spanish group when vowel duration was varied singly. However, long-last effects were present and comparable in both language groups when intensity and duration were covaried. Intensity was a more robust predictor of responses than duration. A primary conclusion was that whether or not humans’ rhythmic grouping preferences have an innate component, duration-based grouping preferences, at least, and the magnitude of intensity-based effects are shaped by listeners’ backgrounds.
Speaker Introductions at Internal Medicine Grand Rounds: Forms of Address Reveal Gender Bias.

Science.gov (United States)

Files, Julia A; Mayer, Anita P; Ko, Marcia G; Friedrich, Patricia; Jenkins, Marjorie; Bryan, Michael J; Vegunta, Suneela; Wittich, Christopher M; Lyle, Melissa A; Melikian, Ryan; Duston, Trevor; Chang, Yu-Hui H; Hayes, Sharonne N

2017-05-01

Gender bias has been identified as one of the drivers of gender disparity in academic medicine. Bias may be reinforced by gender-subordinating language or differential use of formality in forms of address. Professional titles may influence the perceived expertise and authority of the referenced individual. The objective of this study is to examine how professional titles were used in same- and mixed-gender speaker introductions at Internal Medicine Grand Rounds (IMGR). A retrospective observational study of video-archived speaker introductions at consecutive IMGR was conducted at two different locations (Arizona, Minnesota) of an academic medical center. Introducers and speakers at IMGR were physician and scientist peers holding MD, PhD, or MD/PhD degrees. The primary outcome was whether or not a speaker's professional title was used during the first form of address during speaker introductions at IMGR. As secondary outcomes, we evaluated whether or not the speaker's professional title was used in any form of address during the introduction. Three hundred twenty-one forms of address were analyzed. Female introducers were more likely to use professional titles when introducing any speaker during the first form of address compared with male introducers (96.2% [102/106] vs. 65.6% [141/215]; p [...] form of address 97.8% (45/46) compared with male dyads who utilized a formal title 72.4% (110/152) of the time (p = 0.007). In mixed-gender dyads, where the introducer was female and speaker male, formal titles were used 95.0% (57/60) of the time. Male introducers of female speakers utilized professional titles 49.2% (31/63) of the time (p [...] addressed by professional title than were men introduced by men. Differential formality in speaker introductions may amplify isolation, marginalization, and professional discomfiture expressed by women faculty in academic medicine.
Google Home: smart speaker as environmental control unit.

Science.gov (United States)

Noda, Kenichiro

2017-08-23

Environmental Control Units (ECU) are devices or systems that allow a person to control appliances in their home or work environment. Such a system can be utilized by clients with physical and/or functional disability to enhance their ability to control their environment, to promote independence and to improve their quality of life. Over the last several years, several inexpensive, commercially available, voice-activated smart speakers, such as Google Home and Amazon Echo, have emerged on the market. These smart speakers are equipped with far-field microphones that support voice recognition and allow for completely hands-free operation for various purposes, including playing music, information retrieval, and, most importantly, environmental control. Clients with disability could utilize these features to turn the unit into a simple ECU that is completely voice activated and wirelessly connected to appliances. Smart speakers, with their ease of setup, low cost and versatility, may be a more affordable and accessible alternative to the traditional ECU. Implications for Rehabilitation: Environmental Control Units (ECU) enable independence for physically and functionally disabled clients, and reduce burden and frequency of demands on carers. Traditional ECU can be costly and may require clients to learn specialized skills to use. Smart speakers have the potential to be used as a new-age ECU by overcoming these barriers, and can be used by a wider range of clients.
A general auditory bias for handling speaker variability in speech? Evidence in humans and songbirds

Directory of Open Access Journals (Sweden)

Buddhamas Kriengwatana

2015-08-01

Full Text Available Different speakers produce the same speech sound differently, yet listeners are still able to reliably identify the speech sound. How listeners can adjust their perception to compensate for speaker differences in speech, and whether these compensatory processes are unique to humans, is still not fully understood. In this study we compare the ability of humans and zebra finches to categorize vowels despite speaker variation in speech in order to test the hypothesis that accommodating speaker and gender differences in isolated vowels can be achieved without prior experience with speaker-related variability. Using a behavioural Go/No-go task and identical stimuli, we compared Australian English adults’ (naïve to Dutch) and zebra finches’ (naïve to human speech) ability to categorize /ɪ/ and /ɛ/ vowels of a novel Dutch speaker after learning to discriminate those vowels from only one other speaker. Experiments 1 and 2 presented vowels of two speakers interspersed or blocked, respectively. Results demonstrate that categorization of vowels is possible without prior exposure to speaker-related variability in speech for zebra finches, and in non-native vowel categories for humans. Therefore, this study is the first to provide evidence for what might be a species-shared auditory bias that may supersede speaker-related information during vowel categorization. It additionally provides behavioural evidence contradicting a prior hypothesis that accommodation of speaker differences is achieved via the use of formant ratios. Therefore, investigations of alternative accounts of vowel normalization that incorporate the possibility of an auditory bias for disregarding inter-speaker variability are warranted.

The mechanism of speech processing in congenital amusia: evidence from Mandarin speakers.

Directory of Open Access Journals (Sweden)

Fang Liu

Full Text Available Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results.
The mechanism of speech processing in congenital amusia: evidence from Mandarin speakers.

Science.gov (United States)

Liu, Fang; Jiang, Cunmei; Thompson, William Forde; Xu, Yi; Yang, Yufang; Stewart, Lauren

2012-01-01

Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results.
Presenting and processing information in background noise: A combined speaker-listener perspective.

Science.gov (United States)

Bockstael, Annelies; Samyn, Laurie; Corthals, Paul; Botteldooren, Dick

2018-01-01

Transferring information orally in background noise is challenging, for both speaker and listener. Successful transfer depends on complex interaction between characteristics related to listener, speaker, task, background noise, and context. To fully assess the underlying real-life mechanisms, experimental design has to mimic this complex reality. In the current study, the effects of different types of background noise have been studied in an ecologically valid test design. Documentary-style information had to be presented by the speaker and simultaneously acquired by the listener in four conditions: quiet, unintelligible multitalker babble, fluctuating city street noise, and little varying highway noise. For both speaker and listener, the primary task was to focus on the content that had to be transferred. In addition, for the speakers, the occurrence of hesitation phenomena was assessed. The listener had to perform an additional secondary task to address listening effort. For the listener the condition with the most eventful background noise, i.e., fluctuating city street noise, appeared to be the most difficult with markedly longer duration of the secondary task. In the same fluctuating background noise, speech appeared to be less disfluent, suggesting a higher level of concentration from the speaker's side.
Key-note speaker: Predictors of weight loss after preventive Health consultations

DEFF Research Database (Denmark)

Lous, Jørgen; Freund, Kirsten S

2018-01-01

Invited keynote speaker at the conference: Preventive Medicine and Public Health Conference 2018, July 16-17, London.
Automaticity and stability of adaptation to a foreign-accented speaker

NARCIS (Netherlands)

Witteman, M.J.; Bardhan, N.P.; Weber, A.C.; McQueen, J.M.

2015-01-01

In three cross-modal priming experiments we asked whether adaptation to a foreign-accented speaker is automatic, and whether adaptation can be seen after a long delay between initial exposure and test. Dutch listeners were exposed to a Hebrew-accented Dutch speaker with two types of Dutch words:
Dysprosody and Stimulus Effects in Cantonese Speakers with Parkinson's Disease

Science.gov (United States)

Ma, Joan K.-Y.; Whitehill, Tara; Cheung, Katherine S.-K.

2010-01-01

Background: Dysprosody is a common feature in speakers with hypokinetic dysarthria. However, speech prosody varies across different types of speech materials. This raises the question of what is the most appropriate speech material for the evaluation of dysprosody. Aims: To characterize the prosodic impairment in Cantonese speakers with…
Profiles of an Acquisition Generation: Nontraditional Heritage Speakers of Spanish

Science.gov (United States)

DeFeo, Dayna Jean

2018-01-01

Though definitions vary, the literature on heritage speakers of Spanish identifies two primary attributes: a linguistic and cultural connection to the language. This article profiles four Anglo college students who grew up in bilingual or Spanish-dominant communities in the Southwest who self-identified as Spanish heritage speakers, citing…
THE HUMOROUS SPEAKER: THE CONSTRUCTION OF ETHOS IN COMEDY

Directory of Open Access Journals (Sweden)

Maria Flávia Figueiredo

2016-07-01

Full Text Available Rhetoric is guided by three dimensions: logos, pathos and ethos. Logos is the speech itself, pathos is the passions that the speaker, through logos, awakens in his audience, and ethos is the image that the speaker creates of himself, also through logos, in front of an audience. There are three rhetorical genres: deliberative (which drives the audience or the judge to think about future events, characterizing them as convenient or harmful), judiciary (the audience thinks about past events in order to classify them as fair or unfair) and epidictic (the audience will judge any fact that occurred, or even the character of a person, as beautiful or not). According to Figueiredo (2014) and based on Eggs (2005), we advocate that ethos is not a mark left by the speaker only in rhetorical genres, but in any textual genre, since, as the result of human production, even the simplest choices in textual construction are able to reproduce something that is closely linked to the speaker, thus demarcating his/her ethos. To verify this assumption, we selected a video of the comedian Danilo Gentili, which will be examined in the light of Rhetoric and Textual Linguistics. So, our objective is to find, in the stand-up comedy genre, marks left by the speaker in the speech that characterize his/her ethos. The analysis results show that ethos, discursive genre and communicational purpose amalgamate in an indissoluble complex in which the success of one of them depends on how the others were built.
Cattle identification based in biometric features of the muzzle

OpenAIRE

Monteiro, Marta; Cadavez, Vasco; Monteiro, Fernando C.

2015-01-01

Cattle identification has been a serious problem for breeding associations. The muzzle pattern, or nose print, shares characteristics with the human fingerprint, which is the most popular biometric marker. The identification accuracy and the processing time are two key challenges of any cattle identification methodology. This paper presents a robust and fast cattle identification scheme from muzzle images using Speeded-Up Robust Features (SURF) matching. The matching refinement technique based on the mat...
Quantile Acoustic Vectors vs. MFCC Applied to Speaker Verification

Directory of Open Access Journals (Sweden)

Mayorga-Ortiz Pedro

2014-02-01

Full Text Available In this paper we describe speaker and command recognition experiments using quantile vectors and Gaussian Mixture Modelling (GMM). Over the past several years GMM and MFCC have become two of the dominant approaches for modelling speaker and speech recognition applications. However, memory and computational costs are important drawbacks, because autonomous systems suffer processing and power consumption constraints; thus, having a good trade-off between accuracy and computational requirements is mandatory. We decided to explore another approach (quantile vectors) in several tasks, and a comparison with MFCC was made. Quantile acoustic vectors are proposed for speaker verification and command recognition tasks, and the results showed very good recognition efficiency. This method offered a good trade-off between computation time, characteristic vector complexity and overall achieved efficiency.
Robust structural identification via polyhedral template matching

DEFF Research Database (Denmark)

Larsen, Peter Mahler; Schmidt, Søren; Schiøtz, Jakob

2016-01-01

Successful scientific applications of large-scale molecular dynamics often rely on automated methods for identifying the local crystalline structure of condensed phases. Many existing methods for structural identification, such as common neighbour analysis, rely on interatomic distances (or thres...... is made available under a Free and Open Source Software license....
Defining "Native Speaker" in Multilingual Settings: English as a Native Language in Asia

Science.gov (United States)

Hansen Edwards, Jette G.

2017-01-01

The current study examines how and why speakers of English from multilingual contexts in Asia are identifying as native speakers of English. Eighteen participants from different contexts in Asia, including Singapore, Malaysia, India, Taiwan, and The Philippines, who self-identified as native speakers of English participated in hour-long interviews…
Sensitivity to phonological context in L2 spelling: evidence from Russian ESL speakers

DEFF Research Database (Denmark)

Dich, Nadya

2010-01-01

The study attempts to investigate factors underlying the development of spellers’ sensitivity to phonological context in English. Native English speakers and Russian speakers of English as a second language (ESL) were tested on their ability to use information about the coda to predict the spelling...... on the information about the coda when spelling vowels in nonwords. In both native and non-native speakers, context sensitivity was predicted by English word spelling; in Russian ESL speakers this relationship was mediated by English proficiency. L1 spelling proficiency did not facilitate L2 context sensitivity...
An analysis of topics and vocabulary in Chinese oral narratives by normal speakers and speakers with fluent aphasia.

Science.gov (United States)

Law, Sam-Po; Kong, Anthony Pak-Hin; Lai, Christy

2018-01-01

This study analysed the topic and vocabulary of Chinese speakers based on language samples of personal recounts in a large spoken Chinese database recently made available in the public domain, i.e. Cantonese AphasiaBank ( http://www.speech.hku.hk/caphbank/search/ ). The goal of the analysis is to offer clinicians a rich source for selecting ecologically valid training materials for rehabilitating Chinese-speaking people with aphasia (PWA) in the design and planning of culturally and linguistically appropriate treatments. Discourse production of 65 Chinese-speaking PWA of fluent types (henceforth, PWFA) and their non-aphasic controls narrating an important event in their life were extracted from Cantonese AphasiaBank. Analyses of topics and vocabularies in terms of part-of-speech, word frequency, lexical semantics, and diversity were conducted. There was significant overlap in topics between the two groups. While the vocabulary was larger for controls than that of PWFA as expected, they were similar in distribution across parts-of-speech, frequency of occurrence, and the ratio of concrete to abstract items in major open word classes. Moreover, proportionately more different verbs than nouns were employed at the individual level for both speaker groups. The findings provide important implications for guiding directions of aphasia rehabilitation not only of fluent but also non-fluent Chinese aphasic speakers.
Revisiting vocal perception in non-human animals: a review of vowel discrimination, speaker voice recognition, and speaker normalization

Directory of Open Access Journals (Sweden)

Buddhamas Kriengwatana

2015-01-01

Full Text Available The extent to which human speech perception evolved by taking advantage of predispositions and pre-existing features of vertebrate auditory and cognitive systems remains a central question in the evolution of speech. This paper reviews asymmetries in vowel perception, speaker voice recognition, and speaker normalization in non-human animals – topics that have not been thoroughly discussed in relation to the abilities of non-human animals, but are nonetheless important aspects of vocal perception. Throughout this paper we demonstrate that addressing these issues in non-human animals is relevant and worthwhile because many non-human animals must deal with similar issues in their natural environment. That is, they must also discriminate between similar-sounding vocalizations, determine signaler identity from vocalizations, and resolve signaler-dependent variation in vocalizations from conspecifics. Overall, we find that, although plausible, the current evidence is insufficiently strong to conclude that directional asymmetries in vowel perception are specific to humans, or that non-human animals can use voice characteristics to recognize human individuals. However, we do find some indication that non-human animals can normalize speaker differences. Accordingly, we identify avenues for future research that would greatly improve and advance our understanding of these topics.
The Status of Native Speaker Intuitions in a Polylectal Grammar.

Science.gov (United States)

Debose, Charles E.

A study of one speaker's intuitions about and performance in Black English is presented with relation to Saussure's "langue-parole" dichotomy. Native speakers of a language have intuitions about the static synchronic entities although the data of their speaking is variable and panchronic. These entities are in a diglossic relationship to each…
Progress in the AMIDA speaker diarization system for meeting data

NARCIS (Netherlands)

Leeuwen, D.A. van; Konečný, M.

2008-01-01

In this paper we describe the AMIDA speaker diarization system as it was submitted to the NIST Rich Transcription evaluation 2007 for conference room data. This is done in the context of the history of this system and other speaker diarization systems. One of the goals of our system is to have as
Speaker and Accent Variation Are Handled Differently: Evidence in Native and Non-Native Listeners

Science.gov (United States)

Kriengwatana, Buddhamas; Terry, Josephine; Chládková, Kateřina; Escudero, Paola

2016-01-01

Listeners are able to cope with between-speaker variability in speech that stems from anatomical sources (i.e. individual and sex differences in vocal tract size) and sociolinguistic sources (i.e. accents). We hypothesized that listeners adapt to these two types of variation differently because prior work indicates that adapting to speaker/sex variability may occur pre-lexically while adapting to accent variability may require learning from attention to explicit cues (i.e. feedback). In Experiment 1, we tested our hypothesis by training native Dutch listeners and Australian-English (AusE) listeners without any experience with Dutch or Flemish to discriminate between the Dutch vowels /I/ and /ε/ from a single speaker. We then tested their ability to classify /I/ and /ε/ vowels of a novel Dutch speaker (i.e. speaker or sex change only), or vowels of a novel Flemish speaker (i.e. speaker or sex change plus accent change). We found that both Dutch and AusE listeners could successfully categorize vowels if the change involved a speaker/sex change, but not if the change involved an accent change. When AusE listeners were given feedback on their categorization responses to the novel speaker in Experiment 2, they were able to successfully categorize vowels involving an accent change. These results suggest that adapting to accents may be a two-step process, whereby the first step involves adapting to speaker differences at a pre-lexical level, and the second step involves adapting to accent differences at a contextual level, where listeners have access to word meaning or are given feedback that allows them to appropriately adjust their perceptual category boundaries. PMID:27309889
Improved autonomous star identification algorithm

International Nuclear Information System (INIS)

Luo Li-Yan; Xu Lu-Ping; Zhang Hua; Sun Jing-Rong

2015-01-01

The log–polar transform (LPT) is introduced into star identification because of its rotation invariance. An improved autonomous star identification algorithm is proposed in this paper to avoid the circular shift of the feature vector and to reduce the time consumed by the star identification algorithm using LPT. In the proposed algorithm, the star pattern of the same navigation star remains unchanged when the stellar image is rotated, which reduces the star identification time. The logarithmic values of the plane distances between the navigation star and its neighbor stars are adopted to structure the feature vector of the navigation star, which enhances the robustness of star identification. In addition, some effort is made to find the identification result with fewer comparisons, instead of searching the whole feature database. The simulation results demonstrate that the proposed algorithm can effectively accelerate star identification. Moreover, the recognition rate and robustness of the proposed algorithm are better than those of the LPT algorithm and the modified grid algorithm.
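The rotation invariance described in this abstract can be illustrated with a small sketch. It is an assumption-laden toy, not the authors' algorithm: the feature here is simply the sorted list of log distances from the navigation star to each neighbor within a radius, and the names `log_distance_feature`, `rotate_about` and `radius` are hypothetical. Because distances do not change under rotation, no circular shift of the feature vector is needed to compare two views.

```python
import math

def log_distance_feature(nav_star, neighbors, radius):
    """Sorted log distances from the navigation star to neighbors within `radius`."""
    feature = []
    for x, y in neighbors:
        d = math.hypot(x - nav_star[0], y - nav_star[1])
        if 0.0 < d <= radius:
            feature.append(math.log(d))
    return sorted(feature)  # sorting is an assumption made here for a canonical order

def rotate_about(point, center, angle):
    """Rotate `point` around `center` by `angle` radians in the image plane."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    c, s = math.cos(angle), math.sin(angle)
    return (center[0] + c * dx - s * dy, center[1] + s * dx + c * dy)

# Toy star field: three neighbors inside the radius, one far outside it.
nav = (10.0, 10.0)
stars = [(12.0, 10.0), (10.0, 15.0), (7.0, 6.0), (40.0, 40.0)]
rotated = [rotate_about(p, nav, 0.7) for p in stars]

original_feature = log_distance_feature(nav, stars, radius=10.0)
rotated_feature = log_distance_feature(nav, rotated, radius=10.0)
```

The two feature vectors agree up to floating-point error, so a database lookup could compare them element by element with a small tolerance.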
Children's Understanding That Utterances Emanate from Minds: Using Speaker Belief To Aid Interpretation.

Science.gov (United States)

Mitchell, Peter; Robinson, Elizabeth J.; Thompson, Doreen E.

1999-01-01

Three experiments examined 3- to 6-year olds' ability to use a speaker's utterance based on false belief to identify which of several referents was intended. Found that many 4- to 5-year olds performed correctly only when it was unnecessary to consider the speaker's belief. When the speaker gave an ambiguous utterance, many 3- to 6-year olds…

Comparative Analysis of Speech Parameters for the Design of Speaker Verification Systems

National Research Council Canada - National Science Library

Souza, A

2001-01-01

Speaker verification systems are basically composed of three stages: feature extraction, feature processing and comparison of the modified features from speaker voice and from the voice that should be...
Wavelet packet transform-based robust video watermarking technique

Indian Academy of Sciences (India)

If any conflict happens to the copyright identification and authentication, ... the present work is concentrated on the robust digital video watermarking. .... the wavelet decomposition, resulting in a new family of orthonormal bases for function ...
The (TNO) Speaker Diarization System for NIST Rich Transcription Evaluation 2005 for meeting data

NARCIS (Netherlands)

Leeuwen, D.A. van

2005-01-01

The TNO speaker diarization system is based on a standard BIC segmentation and clustering algorithm. Since for the NIST Rich Transcription speaker diarization evaluation measure correct speech detection appears to be essential, we have developed a speech activity detector (SAD) as
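As a rough illustration of the BIC criterion commonly used for speaker segmentation, the sketch below scores a candidate change point between two adjacent segments. It is a simplification under stated assumptions: features are one-dimensional, each segment is modeled by a single Gaussian, and the function names and the penalty weight `lam` are hypothetical. It is not the TNO system.

```python
import math


def _ml_variance(xs):
    """Maximum-likelihood variance with a small floor to keep log() finite."""
    m = sum(xs) / len(xs)
    return max(sum((x - m) ** 2 for x in xs) / len(xs), 1e-12)


def delta_bic(left, right, lam=1.0):
    """Delta-BIC for splitting left + right into two Gaussians instead of one.

    A positive value favors a change point between the two segments.
    For one-dimensional features each Gaussian has 2 free parameters
    (mean and variance), so the penalty is lam * 0.5 * 2 * log(N).
    """
    n1, n2 = len(left), len(right)
    n = n1 + n2
    gain = 0.5 * (
        n * math.log(_ml_variance(left + right))
        - n1 * math.log(_ml_variance(left))
        - n2 * math.log(_ml_variance(right))
    )
    penalty = lam * 0.5 * 2 * math.log(n)
    return gain - penalty


# Hypothetical feature values for two segments.
seg_a = [0.0, 0.1, -0.1, 0.05, -0.05] * 4
seg_b = [5.0, 5.1, 4.9, 5.05, 4.95] * 4

different = delta_bic(seg_a, seg_b)   # segments with clearly different means
same = delta_bic(seg_a, list(seg_a))  # two copies of the same segment
```

In a clustering pass the same score can be read the other way round: two clusters whose merge gives a negative delta-BIC are candidates for merging.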
Popular Public Discourse at Speakers' Corner: Negotiating Cultural Identities in Interaction

DEFF Research Database (Denmark)

McIlvenny, Paul

1996-01-01

, religious and general topical 'soap-box' oration. However, audiences are not passive receivers of rhetorical messages. They are active negotiators of interpretations and alignments that may conflict with the speaker's and other audience members' orientations to prior talk. Speakers' Corner is a space...
Impact of Cyrillic on Native English Speakers' Phono-lexical Acquisition of Russian.

Science.gov (United States)

Showalter, Catherine E

2018-03-01

We investigated the influence of grapheme familiarity and native language grapheme-phoneme correspondences during second language lexical learning. Native English speakers learned Russian-like words via auditory presentations containing only familiar first language phones, pictured meanings, and exposure to either Cyrillic orthographic forms (Orthography condition) or the sequence (No Orthography condition). Orthography participants saw three types of written forms: familiar-congruent (e.g., -[kom]), familiar-incongruent (e.g., -[rɑt]), and unfamiliar (e.g., -[fil]). At test, participants determined whether pictures and words matched according to what they saw during word learning. All participants performed near ceiling in all stimulus conditions, except for Orthography participants on words containing incongruent grapheme-phoneme correspondences. These results suggest that first language grapheme-phoneme correspondences can cause interference during second language phono-lexical acquisition. In addition, these results suggest that orthographic input effects are robust enough to interfere even when the input does not contain novel phones.
Speaker Linking and Applications using Non-Parametric Hashing Methods

Science.gov (United States)

2016-09-08

nonparametric estimate of a multivariate density function,” The Annals of Mathematical Statistics, vol. 36, no. 3, pp. 1049–1051, 1965. [9] E. A. Patrick...Speaker Linking and Applications using Non-Parametric Hashing Methods† Douglas Sturim and William M. Campbell MIT Lincoln Laboratory, Lexington, MA...with many approaches [1, 2]. For this paper, we focus on using i-vectors [2], but the methods apply to any embedding. For the task of speaker QBE and
A simple optical method for measuring the vibration amplitude of a speaker

OpenAIRE

UEDA, Masahiro; YAMAGUCHI, Toshihiko; KAKIUCHI, Hiroki; SUGA, Hiroshi

1999-01-01

A simple optical method has been proposed for measuring the vibration amplitude of a speaker vibrating with a frequency of approximately 10 kHz. The method is based on a multiple reflection between a vibrating speaker plane and a mirror parallel to that speaker plane. The multiple reflection can magnify a dispersion of the laser beam caused by the vibration, and easily make a measurement of the amplitude. The measuring sensitivity ranges between sub-microns and 1 mm. A preliminary experim...
Coronal View Ultrasound Imaging of Movement in Different Segments of the Tongue during Paced Recital: Findings from Four Normal Speakers and a Speaker with Partial Glossectomy

Science.gov (United States)

Bressmann, Tim; Flowers, Heather; Wong, Willy; Irish, Jonathan C.

2010-01-01

The goal of this study was to quantitatively describe aspects of coronal tongue movement in different anatomical regions of the tongue. Four normal speakers and a speaker with partial glossectomy read four repetitions of a metronome-paced poem. Their tongue movement was recorded in four coronal planes using two-dimensional B-mode ultrasound…
Intelligibility of Standard German and Low German to Speakers of Dutch

NARCIS (Netherlands)

Gooskens, C.S.; Kürschner, Sebastian; van Bezooijen, R.

2011-01-01

This paper reports on the intelligibility of spoken Low German and Standard German for speakers of Dutch. Two aspects are considered. First, the relative potential for intelligibility of the Low German variety of Bremen and the High German variety of Modern Standard German for speakers of Dutch is
Speaker detection for conversational robots using synchrony between audio and video

NARCIS (Netherlands)

Noulas, A.; Englebienne, G.; Terwijn, B.; Kröse, B.; Hanheide, M.; Zender, H.

2010-01-01

This paper compares different methods for detecting the speaking person when multiple persons are interacting with a robot. We evaluate the state-of-the-art speaker detection methods on the iCat robot. These methods use the synchrony between audio and video to locate the most probable speaker. We
Evaluating acoustic speaker normalization algorithms: evidence from longitudinal child data.

Science.gov (United States)

Kohn, Mary Elizabeth; Farrington, Charlie

2012-03-01

Speaker vowel formant normalization, a technique that controls for variation introduced by physical differences between speakers, is necessary in variationist studies to compare speakers of different ages, genders, and physiological makeup in order to understand non-physiological variation patterns within populations. Many algorithms have been established to reduce variation introduced into vocalic data from physiological sources. The lack of real-time studies tracking the effectiveness of these normalization algorithms from childhood through adolescence inhibits exploration of child participation in vowel shifts. This analysis compares normalization techniques applied to data collected from ten African American children across five time points. Linear regressions compare the reduction in variation attributable to age and gender for each speaker for the vowels BEET, BAT, BOT, BUT, and BOAR. A normalization technique is successful if it maintains variation attributable to a reference sociolinguistic variable, while reducing variation attributable to age. Results indicate that normalization techniques which rely on both a measure of central tendency and range of the vowel space perform best at reducing variation attributable to age, although some variation attributable to age persists after normalization for some sections of the vowel space. © 2012 Acoustical Society of America
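To make the idea of speaker-intrinsic normalization concrete, here is a minimal sketch of Lobanov-style z-scoring, a widely used technique that rescales each formant by the speaker's own mean and standard deviation. The abstract does not say which algorithms were compared, so treating this one as representative is an assumption, and the formant values below are invented for illustration.

```python
from statistics import mean, pstdev


def lobanov(formants):
    """Z-score each formant column using one speaker's own mean and spread.

    formants: list of (F1, F2) tuples in Hz for a single speaker.
    Returns a list of (z1, z2) tuples. Each column must contain at
    least two distinct values, otherwise the spread is zero.
    """
    columns = list(zip(*formants))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        tuple((value - m) / s for value, (m, s) in zip(row, stats))
        for row in formants
    ]


# Hypothetical adult vowel measurements (F1, F2) in Hz.
adult = [(300.0, 2300.0), (700.0, 1800.0), (750.0, 1100.0)]
# A hypothetical child with the same vowel pattern, scaled and shifted upward.
child = [(1.2 * f1 + 50.0, 1.2 * f2 + 50.0) for f1, f2 in adult]

adult_norm = lobanov(adult)
child_norm = lobanov(child)
```

Under this idealized scaling the two speakers receive identical normalized values. Real developmental change is not a pure linear rescaling, which is consistent with the abstract's finding that some age-related variation persists after normalization.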
Do children go for the nice guys? The influence of speaker benevolence and certainty on selective word learning.

Science.gov (United States)

Bergstra, Myrthe; DE Mulder, Hannah N M; Coopmans, Peter

2018-04-06

This study investigated how speaker certainty (a rational cue) and speaker benevolence (an emotional cue) influence children's willingness to learn words in a selective learning paradigm. In two experiments four- to six-year-olds learnt novel labels from two speakers and, after a week, their memory for these labels was reassessed. Results demonstrated that children retained the label-object pairings for at least a week. Furthermore, children preferred to learn from certain over uncertain speakers, but they had no significant preference for nice over nasty speakers. When the cues were combined, children followed certain speakers, even if they were nasty. However, children did prefer to learn from nice and certain speakers over nasty and certain speakers. These results suggest that rational cues regarding a speaker's linguistic competence trump emotional cues regarding a speaker's affective status in word learning. However, emotional cues were found to have a subtle influence on this process.
Effects of Language Background on Gaze Behavior: A Crosslinguistic Comparison Between Korean and German Speakers

Science.gov (United States)

Goller, Florian; Lee, Donghoon; Ansorge, Ulrich; Choi, Soonja

2017-01-01

Languages differ in how they categorize spatial relations: While German differentiates between containment (in) and support (auf) with distinct spatial words, as in (a) den Kuli IN die Kappe stecken ("put pen in cap") and (b) die Kappe AUF den Kuli stecken ("put cap on pen"), Korean uses a single spatial word (kkita) collapsing (a) and (b) into one semantic category, particularly when the spatial enclosure is tight-fit. Korean uses a different word (i.e., netha) for loose-fits (e.g., apple in bowl). We tested whether these differences influence the attention of the speaker. In a crosslinguistic study, we compared native German speakers with native Korean speakers. Participants rated the similarity of two successive video clips of several scenes where two objects were joined or nested (either in a tight or loose manner). The rating data show that Korean speakers base their rating of similarity more on tight- versus loose-fit, whereas German speakers base their rating more on containment versus support (in vs. auf). Throughout the experiment, we also measured the participants’ eye movements. Korean speakers looked equally long at the moving Figure object and at the stationary Ground object, whereas German speakers were more biased to look at the Ground object. Additionally, Korean speakers also looked more at the region where the two objects touched than did German speakers. We discuss our data in the light of crosslinguistic semantics and the extent of their influence on spatial cognition and perception. PMID:29362644
The Sound of Voice: Voice-Based Categorization of Speakers' Sexual Orientation within and across Languages.

Directory of Open Access Journals (Sweden)

Simone Sulpizio

Full Text Available Empirical research had initially shown that English listeners are able to identify the speakers' sexual orientation based on voice cues alone. However, the accuracy of this voice-based categorization, as well as its generalizability to other languages (language-dependency) and to non-native speakers (language-specificity), has been questioned recently. Consequently, we address these open issues in 5 experiments: First, we tested whether Italian and German listeners are able to correctly identify sexual orientation of same-language male speakers. Then, participants of both nationalities listened to voice samples and rated the sexual orientation of both Italian and German male speakers. We found that listeners were unable to identify the speakers' sexual orientation correctly. However, speakers were consistently categorized as either heterosexual or gay on the basis of how they sounded. Moreover, a similar pattern of results emerged when listeners judged the sexual orientation of speakers of their own and of the foreign language. Overall, this research suggests that voice-based categorization of sexual orientation reflects the listeners' expectations of how gay voices sound rather than being an accurate detector of the speakers' actual sexual identity. Results are discussed with regard to accuracy, acoustic features of voices, language dependency and language specificity.
Characteristic Model-Based Robust Model Predictive Control for Hypersonic Vehicles with Constraints

Directory of Open Access Journals (Sweden)

Jun Zhang

2017-06-01

Full Text Available Designing robust control for hypersonic vehicles in reentry is difficult because of the vehicles' strong coupling, non-linearity, and multiple constraints. This paper proposes a characteristic model-based robust model predictive control (MPC) for hypersonic vehicles with reentry constraints. First, the hypersonic vehicle is modeled by a characteristic model composed of a linear time-varying system and a lumped disturbance. Then, the identification data are regenerated by the accumulative sum idea from gray theory, which weakens the effects of random noise and strengthens the regularity of the identification data. Based on the regenerated data, the time-varying parameters and the disturbance are estimated online by gray identification. Finally, a mixed H2/H∞ robust predictive control law is proposed based on linear matrix inequalities (LMIs) and receding horizon optimization techniques. Because MPC handles system constraints actively, the input and state constraints are satisfied in the closed-loop control system. The validity of the proposed control is verified theoretically according to Lyapunov theory and illustrated by simulation results.
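The "accumulative sum idea" from gray theory can be sketched as a running sum of the raw series (often called an accumulated generating operation), with first differences as its inverse. The sketch below assumes that plain first-order form; the function names and sample data are hypothetical and the paper's exact regeneration step is not given in the abstract.

```python
from itertools import accumulate


def accumulate_series(series):
    """Running sum: out[k] = series[0] + ... + series[k]."""
    return list(accumulate(series))


def restore_series(accumulated):
    """Inverse of accumulate_series: recover the raw values by differencing."""
    if not accumulated:
        return []
    return [accumulated[0]] + [
        later - earlier for earlier, later in zip(accumulated, accumulated[1:])
    ]


# Hypothetical positive measurements with irregular fluctuations.
raw = [2.0, 3.5, 1.5, 3.0, 2.5, 3.5]
regenerated = accumulate_series(raw)
recovered = restore_series(regenerated)
```

For a positive raw series the accumulated series is monotonically increasing even though the raw values fluctuate, which is the sense in which the regenerated data are more regular. Parameters estimated on the accumulated data are mapped back to the original scale by the differencing step.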
Popular Public Discourse at Speakers' Corner: Negotiating Cultural Identities in Interaction

DEFF Research Database (Denmark)

McIlvenny, Paul

1996-01-01

In this paper I examine how cultural identities are actively negotiated in popular debate at a multicultural public setting in London. Speakers at Speakers' Corner manage the local construction of group affiliation, audience response and argument in and through talk, within the context of ethnic...... in which participant 'citizens' in the public sphere can actively struggle over cultural representation and identities. Using transcribed examples of video data recorded at Speakers' Corner my paper will examine how cultural identity is invoked in the management of active participation. Audiences...... and their affiliations are regulated and made accountable through the routines of membership categorisation and the policing of cultural identities and their imaginary borders....
Proficiency in English sentence stress production by Cantonese speakers who speak English as a second language (ESL).

Science.gov (United States)

Ng, Manwa L; Chen, Yang

2011-12-01

The present study examined English sentence stress produced by native Cantonese speakers who were speaking English as a second language (ESL). Cantonese ESL speakers' proficiency in English stress production as perceived by English-speaking listeners was also studied. Acoustical parameters associated with sentence stress including fundamental frequency (F0), vowel duration, and intensity were measured from the English sentences produced by 40 Cantonese ESL speakers. Data were compared with those obtained from 40 native speakers of American English. The speech samples were also judged by eight listeners who were native speakers of American English for placement, degree, and naturalness of stress. Results showed that Cantonese ESL speakers were able to use F0, vowel duration, and intensity to differentiate sentence stress patterns. Yet, both female and male Cantonese ESL speakers exhibited consistently higher F0 in stressed words than English speakers. Overall, Cantonese ESL speakers were found to be proficient in using duration and intensity to signal sentence stress, in a way comparable with English speakers. In addition, F0 and intensity were found to correlate closely with perceptual judgement and the degree of stress with the naturalness of stress.
Articulatory Movements during Vowels in Speakers with Dysarthria and Healthy Controls

Science.gov (United States)

Yunusova, Yana; Weismer, Gary; Westbury, John R.; Lindstrom, Mary J.

2008-01-01

Purpose: This study compared movement characteristics of markers attached to the jaw, lower lip, tongue blade, and dorsum during production of selected English vowels by normal speakers and speakers with dysarthria due to amyotrophic lateral sclerosis (ALS) or Parkinson disease (PD). The study asked the following questions: (a) Are movement…
A Comparison of Coverbal Gesture Use in Oral Discourse Among Speakers With Fluent and Nonfluent Aphasia

Science.gov (United States)

Law, Sam-Po; Chak, Gigi Wan-Chi

2017-01-01

Purpose: Coverbal gesture use, which is affected by the presence and degree of aphasia, can be culturally specific. The purpose of this study was to compare gesture use among Cantonese-speaking individuals: 23 neurologically healthy speakers, 23 speakers with fluent aphasia, and 21 speakers with nonfluent aphasia. Method: Multimedia data of discourse samples from these speakers were extracted from the Cantonese AphasiaBank. Gestures were independently annotated on their forms and functions to determine how gesturing rate and distribution of gestures differed across speaker groups. A multiple regression was conducted to determine the most predictive variable(s) for gesture-to-word ratio. Results: Although speakers with nonfluent aphasia gestured most frequently, the rate of gesture use in counterparts with fluent aphasia did not differ significantly from controls. Different patterns of gesture functions in the 3 speaker groups revealed that gesture plays a minor role in lexical retrieval whereas its role in enhancing communication dominates among the speakers with aphasia. The percentages of complete sentences and dysfluency strongly predicted the gesturing rate in aphasia. Conclusions: The current results supported the sketch model of language–gesture association. The relationship between gesture production and linguistic abilities and clinical implications for gesture-based language intervention for speakers with aphasia are also discussed. PMID:28609510
Use of the BAT with a Cantonese-Putonghua Speaker with Aphasia

Science.gov (United States)

Kong, Anthony Pak-Hin; Weekes, Brendan Stuart

2011-01-01

The aim of this article is to illustrate the use of the Bilingual Aphasia Test (BAT) with a Cantonese-Putonghua speaker. We describe G, who is a relatively young Chinese bilingual speaker with aphasia. G's communication abilities in his L2, Putonghua, were impaired following brain damage. This impairment caused specific difficulties in…

Processing advantage for emotional words in bilingual speakers.

Science.gov (United States)

Ponari, Marta; Rodríguez-Cuadrado, Sara; Vinson, David; Fox, Neil; Costa, Albert; Vigliocco, Gabriella

2015-10-01

Effects of emotion on word processing are well established in monolingual speakers. However, studies that have assessed whether affective features of words undergo the same processing in a native and nonnative language have provided mixed results: Studies that have found differences between native language (L1) and second language (L2) processing attributed the difference to the fact that L2 learned late in life would not be processed affectively, because affective associations are established during childhood. Other studies suggest that adult learners show similar effects of emotional features in L1 and L2. Differences in affective processing of L2 words can be linked to age and context of learning, proficiency, language dominance, and degree of similarity between L2 and L1. Here, in a lexical decision task on tightly matched negative, positive, and neutral words, highly proficient English speakers from typologically different L1s showed the same facilitation in processing emotionally valenced words as native English speakers, regardless of their L1, the age of English acquisition, or the frequency and context of English use. (c) 2015 APA, all rights reserved.
Methods of Speakers' Effects on the Audience

Directory of Open Access Journals (Sweden)

فریبا حسینی

2010-09-01

Full Text Available This article focuses on four issues. The first is the speaker's external appearance, including the beauty of the face, the power of the voice, hand movements and signals, the use of the stick and the eyebrows, and height. Such characteristics can have an important effect on the audience. The second is the speaker's internal features: the ethics of the preacher, his or her piety and intention, personality, habits and emotions, knowledge and culture, and speed of learning. The third concerns the form of the lecture: words and phrases should be clear and correct, and mixed with Quranic verses, poetry, proverbs and parables. The fourth concerns the content: the subject of the talk should match the listeners' level of understanding and suit the occasion, and it affects the audience more when it is new and interesting to them. Key words: Oratory, Preacher, Audience, Influence of speech. Authors: Nasrollah Shameli (Associate Professor, Department of Arabic Language and Literature, University of Isfahan) and Fariba Hosayni (M.A. in Arabic Language and Literature, University of Isfahan).
Gender parity trends for invited speakers at four prominent virology conference series.

Science.gov (United States)

Kalejta, Robert F; Palmenberg, Ann C

2017-06-07

Scientific conferences are most beneficial to participants when they showcase significant new experimental developments, accurately summarize the current state of the field, and provide strong opportunities for collaborative networking. A top-notch slate of invited speakers, assembled by conference organizers or committees, is key to achieving these goals. The perceived underrepresentation of female speakers at prominent scientific meetings is currently a popular topic for discussion, but one that often lacks supportive data. We compiled the full rosters of invited speakers over the last 35 years for four prominent international virology conferences, the American Society for Virology Annual Meeting (ASV), the International Herpesvirus Workshop (IHW), the Positive-Strand RNA Virus Symposium (PSR), and the Gordon Research Conference on Viruses & Cells (GRC). The rosters were cross-indexed by unique names, gender, year, and repeat invitations. When plotted as gender-dependent trends over time, all four conferences showed a clear proclivity for male-dominated invited speaker lists. Encouragingly, shifts toward parity are emerging within all units, but at different rates. Not surprisingly, both selection of a larger percentage of first time participants and the presence of a woman on the speaker selection committee correlated with improved parity. Session chair information was also collected for the IHW and GRC. These visible positions also displayed a strong male dominance over time that is eroding slowly. We offer our personal interpretation of these data to aid future organizers achieve improved equity among the limited number of available positions for session moderators and invited speakers. IMPORTANCE Politicians and media members have a tendency to cite anecdotes as conclusions without any supporting data. This happens so frequently now, that a name for it has emerged: fake news. Good science proceeds otherwise. The under representation of women as invited
Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

Directory of Open Access Journals (Sweden)

Umit H. Yapanel

2008-08-01

Full Text Available A proven method for achieving effective automatic speech recognition (ASR) in the presence of speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed which require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on-the-fly within a newly proposed PMVDR acoustic front end. The novel aspect of the algorithm is that conventional front-end processing with PMVDR and VTLN needs two separate warping phases, while the proposed BISN method uses a single speaker-dependent warp to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER) by 24%, and (ii) a diverse noisy speech task (SPINE 2), where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.
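The idea that two successive frequency warps can be replaced by one can be illustrated with first-order all-pass (bilinear) warps, a family often used for perceptual-style frequency warping. Whether BISN uses exactly this construction is an assumption of this sketch, and the warp factors below are hypothetical values chosen for illustration.

```python
import math


def allpass_warp(omega, alpha):
    """Warp a normalized frequency omega (radians, 0..pi) with a
    first-order all-pass map of warp factor alpha, where |alpha| < 1.
    alpha = 0 leaves the frequency axis unchanged."""
    return omega + 2.0 * math.atan2(
        alpha * math.sin(omega), 1.0 - alpha * math.cos(omega)
    )


def combine(alpha_a, alpha_b):
    """Single warp factor equivalent to applying alpha_a and then alpha_b."""
    return (alpha_a + alpha_b) / (1.0 + alpha_a * alpha_b)


perceptual_alpha = 0.4  # hypothetical perceptual warp factor
speaker_alpha = 0.1     # hypothetical speaker-dependent adjustment

grid = [k * math.pi / 16 for k in range(17)]
two_step = [
    allpass_warp(allpass_warp(w, perceptual_alpha), speaker_alpha) for w in grid
]
one_step = [
    allpass_warp(w, combine(perceptual_alpha, speaker_alpha)) for w in grid
]
```

The two lists agree to floating-point precision because warps of this family compose into another warp of the same family, so a front end that needs both a perceptual warp and a speaker warp can apply one combined factor instead of two passes.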
Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

Directory of Open Access Journals (Sweden)

Yapanel UmitH

2008-01-01

Full Text Available A proven method for achieving effective automatic speech recognition (ASR) in the presence of speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed which require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on-the-fly within a newly proposed PMVDR acoustic front end. The novel aspect of the algorithm is that conventional front-end processing with PMVDR and VTLN needs two separate warping phases, while the proposed BISN method uses a single speaker-dependent warp to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER) by 24%, and (ii) a diverse noisy speech task (SPINE 2), where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.
Outlier robustness for wind turbine extrapolated extreme loads

DEFF Research Database (Denmark)

Natarajan, Anand; Verelst, David Robert

2012-01-01

. Stochastic identification of numerical artifacts in simulated loads is demonstrated using the method of principal component analysis. The extrapolation methodology is made robust to outliers through a weighted loads approach, whereby the eigenvalues of the correlation matrix obtained using the loads with its...
Language control in different contexts: the behavioural ecology of bilingual speakers

Directory of Open Access Journals (Sweden)

David William Green

2011-05-01

Full Text Available This paper proposes that different experimental contexts (single or dual language contexts permit different neural loci at which words in the target language can be selected. However, in order to develop a fuller understanding of the neural circuit mediating language control we need to consider the community context in which bilingual speakers typically use their two languages (the behavioural ecology of bilingual speakers. The contrast between speakers from code-switching and non-code switching communities offers a way to increase our understanding of the cortical, subcortical and, in particular, cerebellar structures involved in language control. It will also help us identify the non-verbal behavioural correlates associated with these control processes.
Artificially intelligent recognition of Arabic speaker using voice print-based local features

Science.gov (United States)

Mahmood, Awais; Alsulaiman, Mansour; Muhammad, Ghulam; Akram, Sheeraz

2016-11-01

Local features for any pattern recognition system are based on the information extracted locally. In this paper, a local feature extraction technique was developed. This feature was extracted in the time-frequency plane by taking the moving average along the diagonal directions of the time-frequency plane. This feature captured the time-frequency events producing a unique pattern for each speaker that can be viewed as a voice print of the speaker. Hence, we referred to this technique as voice print-based local feature. The proposed feature was compared to other features including mel-frequency cepstral coefficient (MFCC) for speaker recognition using two different databases. One of the databases used in the comparison is a subset of an LDC database that consisted of two short sentences uttered by 182 speakers. The proposed feature attained 98.35% recognition rate compared to 96.7% for MFCC using the LDC subset.
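A moving average along the diagonals of a time-frequency matrix can be sketched as follows. The window length, the handling of edges (averaging only the cells that exist), and the function name are assumptions of this sketch, not details taken from the paper.

```python
def diagonal_moving_average(spec, window=3, direction=1):
    """Average each cell with the next cells along one diagonal.

    spec: list of rows, spec[t][f] = energy at time frame t, frequency bin f.
    direction: +1 follows the rising diagonal (time and frequency both
    increase), -1 follows the falling diagonal (frequency decreases).
    Near the edges only the cells inside the matrix are averaged.
    """
    rows, cols = len(spec), len(spec[0])
    out = [[0.0] * cols for _ in range(rows)]
    for t in range(rows):
        for f in range(cols):
            values = []
            for i in range(window):
                tt, ff = t + i, f + direction * i
                if 0 <= tt < rows and 0 <= ff < cols:
                    values.append(spec[tt][ff])
            # values always holds at least the cell itself (i = 0)
            out[t][f] = sum(values) / len(values)
    return out


# Toy 3x3 "spectrogram" with energy moving upward in frequency over time.
toy = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
rising = diagonal_moving_average(toy, window=3, direction=1)
falling = diagonal_moving_average(toy, window=3, direction=-1)
```

On the toy input the rising-diagonal average keeps the full energy at the start of the sweep, while the falling-diagonal average dilutes it, which shows how the two directions respond differently to rising and falling time-frequency events.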
GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data.

Science.gov (United States)

Rue-Albrecht, Kévin; McGettigan, Paul A; Hernández, Belinda; Nalpas, Nicolas C; Magee, David A; Parnell, Andrew C; Gordon, Stephen V; MacHugh, David E

2016-03-11

Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
Objective eye-gaze behaviour during face-to-face communication with proficient alaryngeal speakers: a preliminary study.

Science.gov (United States)

Evitts, Paul; Gallop, Robert

2011-01-01

There is a large body of research demonstrating the impact of visual information on speaker intelligibility in both normal and disordered speaker populations. However, there is minimal information on which specific visual features listeners find salient during conversational discourse. This study investigated listeners' eye-gaze behaviour during face-to-face conversation with normal, laryngeal and proficient alaryngeal speakers. Sixty participants individually participated in a 10-min conversation with one of four speakers (typical laryngeal, tracheoesophageal, oesophageal, electrolaryngeal; 15 participants randomly assigned to one mode of speech). All speakers were > 85% intelligible and were judged to be 'proficient' by two certified speech-language pathologists. Participants were fitted with a head-mounted eye-gaze tracking device (Mobile Eye, ASL) that calculated the region of interest and mean duration of eye-gaze. Self-reported gaze behaviour was also obtained following the conversation using a 10 cm visual analogue scale. While listening, participants viewed the lower facial region of the oesophageal speaker more than that of the normal or tracheoesophageal speaker. Results of non-hierarchical cluster analyses showed that while listening, the pattern of eye-gaze was predominantly directed at the lower face of the oesophageal and electrolaryngeal speakers and more evenly dispersed among the background, lower face, and eyes of the normal and tracheoesophageal speakers. Finally, results show a low correlation between self-reported eye-gaze behaviour and objective region-of-interest data. Overall, results suggest similar eye-gaze behaviour when healthy controls converse with normal and tracheoesophageal speakers, and that participants had significantly different eye-gaze patterns when conversing with an oesophageal speaker. Results are discussed in terms of existing eye-gaze data and their potential implications for auditory-visual speech perception. © 2011 Royal College of Speech
Speaker Prediction based on Head Orientations

NARCIS (Netherlands)

Rienks, R.J.; Poppe, Ronald Walter; van Otterlo, M.; Poel, Mannes; Poel, M.; Nijholt, A.; Nijholt, Antinus

2005-01-01

To gain insight into gaze behavior in meetings, this paper compares the results from a Naive Bayes classifier, Neural Networks and humans on speaker prediction in four-person meetings given solely the azimuth head angles. The Naive Bayes classifier scored 69.4% correctly, Neural Networks 62.3% and
Robust model identification applied to type 1 diabetes

DEFF Research Database (Denmark)

Finan, Daniel Aaron; Jørgensen, John Bagterp; Poulsen, Niels Kjølstad

2010-01-01

In many realistic applications, process noise is known to be neither white nor normally distributed. When identifying models in these cases, it may be more effective to minimize a different penalty function than the standard sum of squared errors (as in a least-squares identification method). Thi...
The AMI speaker diarization system for NIST RT06s meeting data

NARCIS (Netherlands)

Leeuwen, D.A. van; Huijbregts, Marijn

2006-01-01

We describe the systems submitted to the NIST RT06s evaluation for the Speech Activity Detection (SAD) and Speaker Diarization (SPKR) tasks. For speech activity detection, a new analysis methodology is presented that generalizes the Detection Error Tradeoff analysis commonly used in speaker
The AMI speaker diarization system for NIST RT06s meeting data

NARCIS (Netherlands)

van Leeuwen, David A.; Huijbregts, M.A.H.

2007-01-01

We describe the systems submitted to the NIST RT06s evaluation for the Speech Activity Detection (SAD) and Speaker Diarization (SPKR) tasks. For speech activity detection, a new analysis methodology is presented that generalizes the Detection Error Tradeoff analysis commonly used in speaker detection
An acoustic analysis of English vowels produced by speakers of seven different native-language backgrounds

NARCIS (Netherlands)

Heuven, van V.J.J.P.; Gooskens, C.

2017-01-01

We measured F1, F2 and duration of ten English monophthongs produced by American native speakers and by Danish, Norwegian, Swedish, Dutch, Hungarian and Chinese L2 speakers. We hypothesized that (i) L2 speakers would approximate the English vowels more closely as the phonological distance between
A Text-Independent Speaker Authentication System for Mobile Devices

Directory of Open Access Journals (Sweden)

Florentin Thullier

2017-09-01

Full Text Available This paper presents a text independent speaker authentication method adapted to mobile devices. Special attention was placed on delivering a fully operational application, which admits a sufficient reliability level and an efficient functioning. To this end, we have excluded the need for any network communication. Hence, we opted for the completion of both the training and the identification processes directly on the mobile device through the extraction of linear prediction cepstral coefficients and the naive Bayes algorithm as the classifier. Furthermore, the authentication decision is enhanced to overcome misidentification through access privileges that the user should attribute to each application beforehand. To evaluate the proposed authentication system, eleven participants were involved in the experiment, conducted in quiet and noisy environments. Public speech corpora were also employed to compare this implementation to existing methods. Results were efficient regarding mobile resources’ consumption. The overall classification performance obtained was accurate with a small number of samples. Then, it appeared that our authentication system might be used as a first security layer, but also as part of a multilayer authentication, or as a fall-back mechanism.
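The classification step described above (cepstral feature vectors scored by a naive Bayes classifier) can be sketched roughly as follows. This is only an illustrative Gaussian naive Bayes in plain Python, not the authors' implementation: the function names, the two-dimensional toy vectors and the speaker labels are all hypothetical, and real LPCC vectors would have more dimensions and many frames per speaker.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(features, labels):
    """Estimate a log-prior plus per-dimension mean and variance for each speaker."""
    grouped = defaultdict(list)
    for vec, label in zip(features, labels):
        grouped[label].append(vec)
    total = len(features)
    model = {}
    for label, vecs in grouped.items():
        dims = list(zip(*vecs))  # one tuple of values per feature dimension
        means = [sum(d) / len(d) for d in dims]
        # A small floor keeps variances strictly positive (assumption of this sketch).
        variances = [
            max(sum((x - m) ** 2 for x in d) / len(d), 1e-6)
            for d, m in zip(dims, means)
        ]
        model[label] = (math.log(len(vecs) / total), means, variances)
    return model


def predict(model, vec):
    """Return the speaker whose class-conditional Gaussian log-likelihood is highest."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for x, m, v in zip(vec, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy "cepstral" vectors for two enrolled speakers.
train_x = [[1.0, 0.2], [1.2, 0.1], [0.8, 0.3], [-1.0, 0.8], [-0.9, 0.9], [-1.1, 0.7]]
train_y = ["alice", "alice", "alice", "bob", "bob", "bob"]
model = fit_gaussian_nb(train_x, train_y)
```

Because both fitting and scoring need only per-class means and variances, a model of this kind is cheap enough to train and evaluate on the device itself, which fits the stated goal of avoiding network communication.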
The Acquisition of English Focus Marking by Non-Native Speakers

Science.gov (United States)

Baker, Rachel Elizabeth

This dissertation examines Mandarin and Korean speakers' acquisition of English focus marking, which is realized by accenting particular words within a focused constituent. It is important for non-native speakers to learn how accent placement relates to focus in English because appropriate accent placement and realization makes a learner's English more native-like and easier to understand. Such knowledge may also improve their English comprehension skills. In this study, 20 native English speakers, 20 native Mandarin speakers, and 20 native Korean speakers participated in four experiments: (1) a production experiment, in which they were recorded reading the answers to questions, (2) a perception experiment, in which they were asked to determine which word in a recording was the last prominent word, (3) an understanding experiment, in which they were asked whether the answers in recorded question-answer pairs had context-appropriate prosody, and (4) an accent placement experiment, in which they were asked which word they would make prominent in a particular context. Finally, a new group of native English speakers listened to utterances produced in the production experiment, and determined whether the prosody of each utterance was appropriate for its context. The results of the five experiments support a novel predictive model for second language prosodic focus marking acquisition. This model holds that both transfer of linguistic features from a learner's native language (L1) and features of their second language (L2) affect learners' acquisition of prosodic focus marking. As a result, the model includes two complementary components: the Transfer Component and the L2 Challenge Component. The Transfer Component predicts that prosodic structures in the L2 will be more easily acquired by language learners that have similar structures in their L1 than those who do not, even if there are differences between the L1 and L2 in how the structures are realized. The L2
Speaker transfer in children's peer conversation: completing communication-aid-mediated contributions.

Science.gov (United States)

Clarke, Michael; Bloch, Steven; Wilkinson, Ray

2013-03-01

Managing the exchange of speakers from one person to another effectively is a key issue for participants in everyday conversational interaction. Speakers use a range of resources to indicate, in advance, when their turn will come to an end, and listeners attend to such signals in order to know when they might legitimately speak. Using the principles and findings from conversation analysis, this paper examines features of speaker transfer in a conversation between a boy with cerebral palsy who has been provided with a voice-output communication aid (VOCA), and a peer without physical or communication difficulties. Specifically, the analysis focuses on turn exchange, where a VOCA-mediated contribution approaches completion, and the child without communication needs is due to speak next.
Comparing headphone and speaker effects on simulated driving.

Science.gov (United States)

Nelson, T M; Nilsson, T H

1990-12-01

Twelve persons drove for three hours in an automobile simulator while listening to music at sound level 63dB over stereo headphones during one session and from a dashboard speaker during another session. They were required to steer a mountain highway, maintain a certain indicated speed, shift gears, and respond to occasional hazards. Steering and speed control were dependent on visual cues. The need to shift and the hazards were indicated by sound and vibration effects. With the headphones, the driver's average reaction time for the most complex task presented--shifting gears--was about one-third second longer than with the speaker. The use of headphones did not delay the development of subjective fatigue.
Speaker information affects false recognition of unstudied lexical-semantic associates.

Science.gov (United States)

Luthra, Sahil; Fox, Neal P; Blumstein, Sheila E

2018-05-01

Recognition of and memory for a spoken word can be facilitated by a prior presentation of that word spoken by the same talker. However, it is less clear whether this speaker congruency advantage generalizes to facilitate recognition of unheard related words. The present investigation employed a false memory paradigm to examine whether information about a speaker's identity in items heard by listeners could influence the recognition of novel items (critical intruders) phonologically or semantically related to the studied items. In Experiment 1, false recognition of semantically associated critical intruders was sensitive to speaker information, though only when subjects attended to talker identity during encoding. Results from Experiment 2 also provide some evidence that talker information affects the false recognition of critical intruders. Taken together, the present findings indicate that indexical information is able to contact the lexical-semantic network to affect the processing of unheard words.

Using Reversed MFCC and IT-EM for Automatic Speaker Verification

Directory of Open Access Journals (Sweden)

Sheeraz Memon

2012-01-01

Full Text Available This paper proposes a text-independent automatic speaker verification system using IMFCC (Inverse/Reverse Mel Frequency Coefficients) and IT-EM (Information Theoretic Expectation Maximization). For speaker verification, feature extraction on the Mel scale has been widely applied and has produced good results. The IMFCC is based on the inverse Mel scale and effectively captures information at the high-frequency formants that the MFCC ignores. In this paper, the fusion of MFCC and IMFCC at the input level is proposed. GMMs (Gaussian Mixture Models) trained with EM (Expectation Maximization) have been widely used for classification in text-independent verification. However, EM can suffer from convergence problems. In this paper we use our proposed IT-EM, which converges faster, to train speaker models. IT-EM uses information theory principles such as PDE (Parzen Density Estimation) and the KL (Kullback-Leibler) divergence measure. Like EM, IT-EM adapts the weights, means and covariances. However, the IT-EM process is performed not on feature vector sets but on a set of centroids obtained using an IT (Information Theoretic) metric. The IT-EM process simultaneously reduces the divergence between PDE estimates of the feature distribution within a given class and the centroid distribution within the same class. Feature-level fusion and IT-EM are tested on the speaker verification task using NIST2001 and NIST2004. The experimental evaluation shows that MFCC/IMFCC gives better results than the conventional delta/MFCC feature set. The MFCC/IMFCC feature vector is also much smaller than the delta MFCC vector, which reduces the computational burden. The IT-EM method also converged faster than the conventional EM method, and thus leads to higher speaker recognition scores.
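To make the MFCC/IMFCC contrast concrete, the sketch below compares filter centre frequencies on the mel scale with a mirrored ("inverted") placement. The mel formula is the standard 2595·log10(1 + f/700) mapping; mirroring the centres about half the maximum frequency is one common way to build an inverted bank and is an assumption here, since the abstract does not give the authors' exact definition. All function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Standard mel-scale mapping."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_centers(n_filters, f_max):
    """Centre frequencies (Hz) equally spaced on the mel scale between 0 and f_max."""
    step = hz_to_mel(f_max) / (n_filters + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(n_filters)]


def inverted_mel_centers(n_filters, f_max):
    """Mirror the mel centres about f_max / 2 (assumed construction).

    The result is dense at high frequencies and sparse at low ones.
    """
    return sorted(f_max - c for c in mel_centers(n_filters, f_max))


def fuse(mfcc_vec, imfcc_vec):
    """Input-level fusion: concatenate the two coefficient vectors."""
    return list(mfcc_vec) + list(imfcc_vec)


mel = mel_centers(10, 4000.0)
inv = inverted_mel_centers(10, 4000.0)
```

With this placement, the spacing between neighbouring mel centres grows with frequency, while the spacing between inverted centres shrinks, which is why an inverted bank resolves high-frequency detail that a mel bank smooths over.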
Study of audio speakers containing ferrofluid

Energy Technology Data Exchange (ETDEWEB)

Rosensweig, R E [34 Gloucester Road, Summit, NJ 07901 (United States); Hirota, Y; Tsuda, S [Ferrotec, 1-4-14 Kyobashi, chuo-Ku, Tokyo 104-0031 (Japan); Raj, K [Ferrotec, 33 Constitution Drive, Bedford, NH 03110 (United States)

2008-05-21

This work validates a method for increasing the radial restoring force on the voice coil in audio speakers containing ferrofluid. In addition, a study is made of factors influencing splash loss of the ferrofluid due to shock. Ferrohydrodynamic analysis is employed throughout to model behavior, and predictions are compared to experimental data.
Designing, Modeling, Constructing, and Testing a Flat Panel Speaker and Sound Diffuser for a Simulator

Science.gov (United States)

Dillon, Christina

2013-01-01

The goal of this project was to design, model, build, and test a flat panel speaker and frame for a spherical dome structure being made into a simulator. The simulator will be a test bed for evaluating an immersive environment for human interfaces. This project focused on the loud speakers and a sound diffuser for the dome. The rest of the team worked on an Ambisonics 3D sound system, video projection system, and multi-direction treadmill to create the most realistic scene possible. The main programs utilized in this project were Pro-E and COMSOL. Pro-E was used for creating detailed figures for the fabrication of a frame that held a flat panel loud speaker. The loud speaker was made from a thin sheet of Plexiglas and 4 acoustic exciters. COMSOL, a multiphysics finite element analysis simulator, was used to model and evaluate all stages of the loud speaker, frame, and sound diffuser. Acoustical testing measurements were used to create polar plots from the working prototype, which were then compared to the COMSOL simulations to select the optimal design for the dome. The final goal of the project was to install the flat panel loud speaker design, in addition to a sound diffuser, onto the wall of the dome. After running tests in COMSOL on various speaker configurations, including a warped Plexiglas version, the optimal speaker design included a flat piece of Plexiglas with a rounded frame to match the curvature of the dome. Eight of these loud speakers will be mounted into an inch and a half of high performance acoustic insulation, or Thinsulate, that will cover the inside of the dome. The following technical paper discusses these projects and explains the engineering processes used, knowledge gained, and the projected future goals of this project.
Data requirements for speaker independent acoustic models

CSIR Research Space (South Africa)

Badenhorst, JAC

2008-11-01

Full Text Available When developing speech recognition systems in resource-constrained environments, careful design of the training corpus can play an important role in compensating for data scarcity. One of the factors to consider relates to the speaker composition...
During Threaded Discussions Are Non-Native English Speakers Always at a Disadvantage?

Science.gov (United States)

Shafer Willner, Lynn

2014-01-01

When participating in threaded discussions, under what conditions might non-native speakers of English (NNSE) be at a comparative disadvantage to their classmates who are native speakers of English (NSE)? This study compares the threaded discussion perspectives of closely-matched NNSE and NSE adult students having different levels of threaded…
Analysis of Acoustic Features in Speakers with Cognitive Disorders and Speech Impairments

Science.gov (United States)

Saz, Oscar; Simón, Javier; Rodríguez, W. Ricardo; Lleida, Eduardo; Vaquero, Carlos

2009-12-01

This work presents the results of an analysis of the acoustic features (formants and the three suprasegmental features: tone, intensity and duration) of vowel production in a group of 14 young speakers with different kinds of speech impairments due to physical and cognitive disorders. A corpus of unimpaired children's speech is used to determine the reference values for these features in speakers without any kind of speech impairment within the same domain as the impaired speakers, that is, 57 isolated words. The signal processing used to extract the formant and pitch values is based on a Linear Prediction Coefficients (LPC) analysis of the segments labelled as vowels by a Hidden Markov Model (HMM) based Viterbi forced alignment. Intensity and duration are also based on the outcome of the automated segmentation. The main conclusion of the work is that the intelligibility of vowel production is lower in impaired speakers even when the vowel is perceived as correct by human labelers. The decrease in intelligibility is due to a 30% increase in confusability in the formant map, a 50% reduction in the discriminative power of energy between stressed and unstressed vowels, and a 50% increase in the standard deviation of vowel length. On the other hand, impaired speakers keep good control of tone in the production of stressed and unstressed vowels.
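The LPC analysis mentioned above is usually computed with the autocorrelation method and the Levinson-Durbin recursion. The sketch below shows that generic recursion in plain Python; it is not the authors' code, the function names are hypothetical, and the abstract does not state the prediction order or windowing they used. Formant estimates would then come from the angles of the complex roots of the resulting polynomial, a step left out here.

```python
def autocorrelation(signal, max_lag):
    """Biased autocorrelation r[0..max_lag] of a real-valued frame."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns ([1, a1, ..., ap], residual_energy) for the prediction-error
    filter A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
    """
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k
    return a, error


# Toy check: the impulse response of a one-pole filter x[n] = 0.9 * x[n-1].
frame = [0.9 ** n for n in range(200)]
coeffs, residual = levinson_durbin(autocorrelation(frame, 1), 1)
```

For the toy frame, the first-order solution recovers the pole at 0.9 (so the coefficient a1 is close to -0.9), which is a quick sanity check before applying the recursion to real vowel segments.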
Evaluation of Speakers at a National Continuing Medical Education (CME) Course

Directory of Open Access Journals (Sweden)

Jannette Collins, MD, MEd, FCCP

2002-12-01

Full Text Available Purpose: Evaluations of a national radiology continuing medical education (CME) course in thoracic imaging were analyzed to determine what constitutes effective and ineffective lecturing. Methods and Materials: Evaluations of sessions and individual speakers participating in a five-day course jointly sponsored by the Society of Thoracic Radiology (STR) and the Radiological Society of North America (RSNA) were tallied by the RSNA Department of Data Management and three members of the STR Training Committee. Comments were collated and analyzed to determine the number of positive and negative comments and common themes related to ineffective lecturing. Results: Twenty-two sessions were evaluated by 234 (75.7%) of 309 professional registrants. Eighty-one speakers were evaluated by an average of 153 registrants (range, 2-313). Mean ratings for 10 items evaluating sessions ranged from 1.28 to 2.05 (1 = most positive, 4 = least positive; SD 0.451-0.902). The average speaker rating was 5.7 (1 = very poor, 7 = outstanding; SD 0.94; range 4.3-6.4). The total number of comments analyzed was 862, with 505 (58.6%) considered positive and 404 (46.9%) considered negative (the total exceeds 862 because a comment could contain both positive and negative statements). Poor content was mentioned most frequently, making up 107 (26.5%) of 404 negative comments, and applied to 51 (63%) of 81 speakers. Other negative comments, in order of decreasing frequency, were related to delivery, image slides, command of the English language, text slides, and handouts. Conclusions: Individual evaluations of speakers at a national CME course provided information regarding the quality of lectures that was not provided by evaluations of grouped presentations. Systematic review of speaker evaluations provided specific information related to the types and frequency of features related to ineffective lecturing. This information can be used to design CME course evaluations, design future CME
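The percentages quoted in the Results can be re-derived from the raw counts given in the abstract. The short check below does only that arithmetic; the helper name `pct` is hypothetical and nothing beyond the reported counts is assumed.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


response_rate = pct(234, 309)         # registrants who evaluated sessions
positive_share = pct(505, 862)        # comments judged positive
negative_share = pct(404, 862)        # comments judged negative
poor_content_share = pct(107, 404)    # negative comments about poor content
speakers_affected = pct(51, 81)       # speakers who received such a comment

# Positive and negative tallies overlap: at least this many comments
# must have contained both a positive and a negative statement.
min_mixed_comments = 505 + 404 - 862
```

The counts reproduce the reported 75.7%, 58.6%, 46.9%, 26.5% and 63%, and the overlap of the two tallies implies that at least 47 comments were mixed.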
Complimenting Functions by Native English Speakers and Iranian EFL Learners: A Divergence or Convergence

Directory of Open Access Journals (Sweden)

Ali Akbar Ansarin

2016-01-01

Full Text Available The compliment speech act has been investigated on many occasions in recent years. In this study, an attempt is made to explore appraisals performed by native English speakers and Iranian EFL learners to find out how these two groups diverge from or converge with each other with regard to complimenting patterns and norms. The participants of the study were 60 advanced Iranian EFL learners who spoke Persian as their first language and 60 native English speakers. Through a written Discourse Completion Task comprising eight different scenarios, compliments were analyzed with regard to topics (performance, personality, possession, and skill), functions (explicit, implicit, and opt-out), gender differences, and the common positive adjectives used by the two groups of native and nonnative participants. The findings suggested that native English speakers praised individuals more implicitly than Iranian EFL learners did and provided opt-outs more frequently. The Chi-square analysis showed that gender and macro functions are independent of each other in Iranian EFL learners' compliments, while for native speakers gender played a significant role in the distribution of appraisals. Iranian EFL learners' complimenting patterns converge towards those of native English speakers. Moreover, both groups favored explicit compliments, although Iranian EFL learners were more inclined to provide them. It can be concluded that there were more similarities than differences between Iranian EFL learners and native English speakers regarding the compliment speech act. The results of this study can benefit researchers, teachers, material developers, and EFL learners.
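A Chi-square test of independence between gender and compliment function, of the kind reported above, compares observed counts with the counts expected if the two factors were unrelated. The sketch below computes the statistic in plain Python. The table values are invented purely for illustration (the abstract reports no counts), and the function name is hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = gender, columns = explicit / implicit / opt-out.
made_up_counts = [
    [30, 10, 5],
    [20, 20, 5],
]
stat, dof = chi_square_statistic(made_up_counts)
```

The statistic is then compared with the chi-square critical value for the given degrees of freedom (5.991 for 2 degrees of freedom at the 0.05 level); a value below the threshold is consistent with gender and function being independent, as the study reports for the Iranian EFL learners.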
7 CFR 247.13 - Provisions for non-English or limited-English speakers.

Science.gov (United States)

2010-01-01

... 7 Agriculture 4 2010-01-01 2010-01-01 false Provisions for non-English or limited-English speakers... § 247.13 Provisions for non-English or limited-English speakers. (a) What must State and local agencies do to ensure that non-English or limited-English speaking persons are aware of their rights and...
Analysis of Feature Extraction Methods for Speaker Dependent Speech Recognition

Directory of Open Access Journals (Sweden)

Gurpreet Kaur

2017-02-01

Full Text Available Speech recognition is concerned with what is being said, irrespective of who is speaking. It is a growing field, and major progress is being made in automatic speech recognition (ASR) technology. Still, many barriers remain in terms of recognition rate, background noise, speaker variability, speaking rate, accent, etc. The recognition rate depends mainly on the selection of features and feature extraction methods. This paper outlines feature extraction techniques for speaker-dependent recognition of isolated words. A brief survey of feature extraction techniques such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding Coefficients (LPCC), Perceptual Linear Prediction (PLP), and Relative Spectra Perceptual Linear Prediction (RASTA-PLP) analysis is presented, and an evaluation is carried out. Speech recognition has applications ranging from daily use to commercial use. We have built a speaker-dependent system that can be useful in many areas, such as controlling a patient vehicle using simple commands.
Communication Interface for Mexican Spanish Dysarthric Speakers

Directory of Open Access Journals (Sweden)

Gladys Bonilla-Enriquez

2012-03-01

Full Text Available Dysarthria is a motor speech disorder caused by weakness or poor coordination of the speech muscles. This condition can be caused by a stroke, cerebral palsy, or a traumatic brain injury. For Mexican people with this condition there are few, if any, assistive technologies to improve their social interaction skills. In this paper we present our advances towards the development of a communication interface for dysarthric speakers whose native language is Mexican Spanish. We propose a methodology that relies on (1) special design of a normal-speech training corpus with limited resources, (2) standard speaker adaptation, and (3) control of language model perplexity, to achieve high Automatic Speech Recognition (ASR) accuracy. The interface allows the user and therapist to perform tasks such as dynamic speaker adaptation, vocabulary adaptation, and text-to-speech synthesis. Live tests were performed with a user with mild dysarthria, achieving accuracies of 93%-95% for spontaneous speech.
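Language model perplexity, which the methodology above seeks to control, measures how many equally likely words the recognizer is effectively choosing between at each step. The sketch below uses the standard definition (the exponential of the average negative log-probability per word); the function name and the probability values are hypothetical, and the abstract does not say how the authors computed or constrained perplexity.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence from the model probability of each word."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Hypothetical per-word probabilities under two language models.
open_vocabulary = [0.05, 0.02, 0.10, 0.04]   # large, unconstrained vocabulary
restricted = [0.50, 0.25, 0.50, 0.25]        # small, user-specific vocabulary

ppl_open = perplexity(open_vocabulary)
ppl_restricted = perplexity(restricted)
```

A model restricted to a small, user-specific vocabulary assigns higher probability to each expected word and therefore has lower perplexity, which narrows the recognizer's search and is one plausible reason why controlling perplexity helps with dysarthric speech.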
Design and implementation of robust controllers for a gait trainer.

Science.gov (United States)

Wang, F C; Yu, C H; Chou, T Y

2009-08-01

This paper applies robust algorithms to control an active gait trainer for children with walking disabilities. Compared with traditional rehabilitation procedures, in which two or three trainers are required to assist the patient, a motor-driven mechanism was constructed to improve the efficiency of the procedures. First, a six-bar mechanism was designed and constructed to mimic the trajectory of children's ankles in walking. Second, system identification techniques were applied to obtain system transfer functions at different operating points by experiments. Third, robust control algorithms were used to design H∞ robust controllers for the system. Finally, the designed controllers were implemented to verify experimentally the system performance. From the results, the proposed robust control strategies are shown to be effective.
Robust finger vein ROI localization based on flexible segmentation.

Science.gov (United States)

Lu, Yu; Xie, Shan Juan; Yoon, Sook; Yang, Jucheng; Park, Dong Sun

2013-10-24

Finger veins have been proved to be an effective biometric for personal identification in the recent years. However, finger vein images are easily affected by influences such as image translation, orientation, scale, scattering, finger structure, complicated background, uneven illumination, and collection posture. All these factors may contribute to inaccurate region of interest (ROI) definition, and so degrade the performance of finger vein identification system. To improve this problem, in this paper, we propose a finger vein ROI localization method that has high effectiveness and robustness against the above factors. The proposed method consists of a set of steps to localize ROIs accurately, namely segmentation, orientation correction, and ROI detection. Accurate finger region segmentation and correct calculated orientation can support each other to produce higher accuracy in localizing ROIs. Extensive experiments have been performed on the finger vein image database, MMCBNU_6000, to verify the robustness of the proposed method. The proposed method shows the segmentation accuracy of 100%. Furthermore, the average processing time of the proposed method is 22 ms for an acquired image, which satisfies the criterion of a real-time finger vein identification system.
Robust Finger Vein ROI Localization Based on Flexible Segmentation

Directory of Open Access Journals (Sweden)

Dong Sun Park

2013-10-01

Full Text Available Finger veins have been proved to be an effective biometric for personal identification in the recent years. However, finger vein images are easily affected by influences such as image translation, orientation, scale, scattering, finger structure, complicated background, uneven illumination, and collection posture. All these factors may contribute to inaccurate region of interest (ROI) definition, and so degrade the performance of finger vein identification system. To improve this problem, in this paper, we propose a finger vein ROI localization method that has high effectiveness and robustness against the above factors. The proposed method consists of a set of steps to localize ROIs accurately, namely segmentation, orientation correction, and ROI detection. Accurate finger region segmentation and correct calculated orientation can support each other to produce higher accuracy in localizing ROIs. Extensive experiments have been performed on the finger vein image database, MMCBNU_6000, to verify the robustness of the proposed method. The proposed method shows the segmentation accuracy of 100%. Furthermore, the average processing time of the proposed method is 22 ms for an acquired image, which satisfies the criterion of a real-time finger vein identification system.
Robust Finger Vein ROI Localization Based on Flexible Segmentation

Science.gov (United States)

Lu, Yu; Xie, Shan Juan; Yoon, Sook; Yang, Jucheng; Park, Dong Sun

2013-01-01

Finger veins have been proved to be an effective biometric for personal identification in the recent years. However, finger vein images are easily affected by influences such as image translation, orientation, scale, scattering, finger structure, complicated background, uneven illumination, and collection posture. All these factors may contribute to inaccurate region of interest (ROI) definition, and so degrade the performance of finger vein identification system. To improve this problem, in this paper, we propose a finger vein ROI localization method that has high effectiveness and robustness against the above factors. The proposed method consists of a set of steps to localize ROIs accurately, namely segmentation, orientation correction, and ROI detection. Accurate finger region segmentation and correct calculated orientation can support each other to produce higher accuracy in localizing ROIs. Extensive experiments have been performed on the finger vein image database, MMCBNU_6000, to verify the robustness of the proposed method. The proposed method shows the segmentation accuracy of 100%. Furthermore, the average processing time of the proposed method is 22 ms for an acquired image, which satisfies the criterion of a real-time finger vein identification system. PMID:24284769
Vocal caricatures reveal signatures of speaker identity

Science.gov (United States)

López, Sabrina; Riera, Pablo; Assaneo, María Florencia; Eguía, Manuel; Sigman, Mariano; Trevisan, Marcos A.

2013-12-01

What are the features that impersonators select to elicit a speaker's identity? We built a voice database of public figures (targets) and imitations produced by professional impersonators. They produced one imitation based on their memory of the target (caricature) and another one after listening to the target audio (replica). A set of naive participants then judged identity and similarity of pairs of voices. Identity was better evoked by the caricatures, and replicas were perceived to be closer to the targets in terms of voice similarity. We used these data to map relevant acoustic dimensions for each task. Our results indicate that speaker identity is mainly associated with vocal tract features, while perception of voice similarity is related to vocal fold parameters. We therefore show the way in which acoustic caricatures emphasize identity features at the cost of losing similarity, which allows drawing an analogy with caricatures in the visual space.
Within-category variance and lexical tone discrimination in native and non-native speakers

NARCIS (Netherlands)

Hoffmann, C.W.G.; Sadakata, M.; Chen, A.; Desain, P.W.M.; McQueen, J.M.; Gussenhove, C.; Chen, Y.; Dediu, D.

2014-01-01

In this paper, we show how acoustic variance within lexical tones in disyllabic Mandarin Chinese pseudowords affects discrimination abilities in both native and non-native speakers of Mandarin Chinese. Within-category acoustic variance did not hinder native speakers in discriminating between lexical
The Acquisition of Clitic Pronouns in the Spanish Interlanguage of Peruvian Quechua Speakers.

Science.gov (United States)

Klee, Carol A.

1989-01-01

Analysis of four adult Quechua speakers' acquisition of clitic pronouns in Spanish revealed that educational attainment and amount of contact with monolingual Spanish speakers were positively related to native-like norms of competence in the use of object pronouns in Spanish. (CB)
"I May Be a Native Speaker but I'm Not Monolingual": Reimagining "All" Teachers' Linguistic Identities in TESOL

Science.gov (United States)

Ellis, Elizabeth M.

2016-01-01

Teacher linguistic identity has so far mainly been researched in terms of whether a teacher identifies (or is identified by others) as a native speaker (NEST) or nonnative speaker (NNEST) (Moussu & Llurda, 2008; Reis, 2011). Native speakers are presumed to be monolingual, and nonnative speakers, although by definition bilingual, tend to be…
Robust uncertainty evaluation for system identification on distributed wireless platforms

Science.gov (United States)

Crinière, Antoine; Döhler, Michael; Le Cam, Vincent; Mevel, Laurent

2016-04-01

Health monitoring of civil structures by system identification procedures from automatic control is now accepted as a valid approach. These methods provide frequencies and mode shapes of the structure over time. For continuous monitoring, the excitation of a structure is usually ambient, thus unknown and assumed to be noise. Hence, all estimates from the vibration measurements are realizations of random variables with inherent uncertainty due to (unknown) process and measurement noise and finite data length. The underlying algorithms usually run under Matlab, on the assumption of a large memory pool and considerable computational power. Even under these premises, computational and memory usage are heavy and not realistic for embedding in on-site sensor platforms such as the PEGASE platform. Moreover, the current push for distributed wireless systems calls for algorithmic adaptation to lower data exchanges and maximize local processing. Finally, the recent breakthrough in system identification allows us to process both frequency information and its related uncertainty together from one and only one data sequence, at the expense of a computational and memory explosion that requires even more careful attention than before. The current approach will focus on presenting a system identification procedure called multi-setup subspace identification that allows processing both frequencies and their related variances from a set of interconnected wireless systems, with all computation running locally within the limited memory pool of each system before being merged on a host supervisor. Careful attention will be given to data exchanges and I/O satisfying OGC standards, as well as to minimizing memory footprints and maximizing computational efficiency. Those systems are built for autonomous operation in the field and could later be included in a wide distributed architecture such as the Cloud2SM project. The usefulness of these strategies is illustrated on

Robust identification and localization of intramedullary nail holes for distal locking using CBCT: a simulation study.

Science.gov (United States)

Kamarianakis, Z; Buliev, I; Pallikarakis, N

2011-05-01

Closed intramedullary nailing is a common technique for treatment of femur and tibia fractures. The most challenging step in this procedure is the precise placement of the lateral screws that stabilize the fragmented bone. The present work concerns the development and the evaluation of a method to accurately identify in 3D space the axes of the nail hole canals. A limited number of projection images are acquired around the leg with the help of a C-arm. On two of them, the locking hole entries are interactively selected and a rough localization of the hole axes is performed. Perpendicularly to one of them, cone-beam computed tomography (CBCT) reconstructions are produced. The accurate identification and localization of the hole axes are done by identifying the centers of the nail holes on the tomograms and then fitting a 3D linear regression through principal component analysis (PCA). Various feature-based approaches (RANSAC, least-squares fitting, Hough transform) have been compared for best matching the contours and the centers of the holes on the tomograms. The robustness of the suggested method was investigated using simulations. Programming is done in Matlab and C++. Results obtained on synthetic data confirm very good localization accuracy: a mean translational error of 0.14 mm (std=0.08 mm) and a mean angular error of 0.84° (std=0.35°), with no excess radiation. Successful localization can be further used to guide a surgeon or a robot in correctly drilling the bone along the nail openings. Copyright © 2010 IPEM. Published by Elsevier Ltd. All rights reserved.
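The PCA-based 3D line regression mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' Matlab/C++ implementation: the function names `fit_line_3d` and `angular_error_deg` are hypothetical, the hole centers are assumed to be already extracted from the tomograms, and power iteration is swapped in here as a dependency-free way to obtain the first principal component.

```python
import math

def fit_line_3d(points, iterations=200):
    """Fit a 3D line to points; returns (centroid, unit direction vector)."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    # 3x3 scatter matrix (unnormalised covariance) of the centered points
    cov = [[sum(c[i] * c[j] for c in centered) for j in range(3)]
           for i in range(3)]
    # Start from the coordinate axis with the largest variance, so that the
    # start vector is not orthogonal to the line for exactly collinear points
    start = max(range(3), key=lambda i: cov[i][i])
    v = [1.0 if i == start else 0.0 for i in range(3)]
    # Power iteration: converges to the eigenvector of the largest
    # eigenvalue, i.e. the first principal component
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("points are coincident; no direction is defined")
        v = [x / norm for x in w]
    return centroid, v

def angular_error_deg(u, v):
    """Angle in degrees between two unit direction vectors, ignoring sign."""
    dot = abs(sum(a * b for a, b in zip(u, v)))
    return math.degrees(math.acos(min(1.0, dot)))
```

The fitted axis passes through the centroid of the hole centers along the returned direction; comparing that direction with a known ground-truth axis through `angular_error_deg` gives the kind of angular error figure the abstract reports for synthetic data.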
Bridging Gaps in Common Ground: Speakers Design Their Gestures for Their Listeners

Science.gov (United States)

Hilliard, Caitlin; Cook, Susan Wagner

2016-01-01

Communication is shaped both by what we are trying to say and by whom we are saying it to. We examined whether and how shared information influences the gestures speakers produce along with their speech. Unlike prior work examining effects of common ground on speech and gesture, we examined a situation in which some speakers have the same amount…
The Crane Robust Control

Directory of Open Access Journals (Sweden)

Marek Hicar

2004-01-01

Full Text Available The article is about a control design for the complete structure of the crane: crab, bridge and crane uplift. The most important unknown parameters for simulations are burden weight and length of hanging rope. We will use robust control for crab and bridge control to ensure adaptivity for burden weight and rope length. Robust control will be designed for current control of the crab and bridge; it is necessary to know the range of the unknown parameters. The whole range will be split into subintervals, and after correct identification of the unknown parameters the most suitable robust controllers will be chosen. The most important condition for the crab and bridge motion is avoiding burden swinging in the final position. The crab and bridge drive is designed with an asynchronous motor fed from a frequency converter. We will use crane uplift with a burden weight observer in combination for uplift, crab and bridge drive with cooperation of their parameters: burden weight, rope length and crab and bridge position. Controllers are designed by the state control method. We will preferably use a disturbance observer which will identify burden weight as a disturbance. The system will work in both modes, with an empty hook as well as at maximum load: burden uplifting and dropping down.
Encoding, rehearsal, and recall in signers and speakers: shared network but differential engagement.

Science.gov (United States)

Bavelier, D; Newman, A J; Mukherjee, M; Hauser, P; Kemeny, S; Braun, A; Boutla, M

2008-10-01

Short-term memory (STM), or the ability to hold verbal information in mind for a few seconds, is known to rely on the integrity of a frontoparietal network of areas. Here, we used functional magnetic resonance imaging to ask whether a similar network is engaged when verbal information is conveyed through a visuospatial language, American Sign Language, rather than speech. Deaf native signers and hearing native English speakers performed a verbal recall task, where they had to first encode a list of letters in memory, maintain it for a few seconds, and finally recall it in the order presented. The frontoparietal network described to mediate STM in speakers was also observed in signers, with its recruitment appearing independent of the modality of the language. This finding supports the view that signed and spoken STM rely on similar mechanisms. However, deaf signers and hearing speakers differentially engaged key structures of the frontoparietal network as the stages of STM unfold. In particular, deaf signers relied to a greater extent than hearing speakers on passive memory storage areas during encoding and maintenance, but on executive process areas during recall. This work opens new avenues for understanding similarities and differences in STM performance in signers and speakers.
Speaker-Sex Discrimination for Voiced and Whispered Vowels at Short Durations

OpenAIRE

Smith, David R. R.

2016-01-01

Whispered vowels, produced with no vocal fold vibration, lack the periodic temporal fine structure which in voiced vowels underlies the perceptual attribute of pitch (a salient auditory cue to speaker sex). Voiced vowels possess no temporal fine structure at very short durations (below two glottal cycles). The prediction was that speaker-sex discrimination performance for whispered and voiced vowels would be similar for very short durations but, as stimulus duration increases, voiced vowel pe...
Promoting Communities of Practice among Non-Native Speakers of English in Online Discussions

Science.gov (United States)

Kim, Hoe Kyeung

2011-01-01

An online discussion involving text-based computer-mediated communication has great potential for promoting equal participation among non-native speakers of English. Several studies claimed that online discussions could enhance the academic participation of non-native speakers of English. However, there is little research around participation…
Learning foreign labels from a foreign speaker: the role of (limited) exposure to a second language.

Science.gov (United States)

Akhtar, Nameera; Menjivar, Jennifer; Hoicka, Elena; Sabbagh, Mark A

2012-11-01

Three- and four-year-olds (N = 144) were introduced to novel labels by an English speaker and a foreign speaker (of Nordish, a made-up language), and were asked to endorse one of the speaker's labels. Monolingual English-speaking children were compared to bilingual children and English-speaking children who were regularly exposed to a language other than English. All children tended to endorse the English speaker's labels when asked 'What do you call this?', but when asked 'What do you call this in Nordish?', children with exposure to a second language were more likely to endorse the foreign label than monolingual and bilingual children. The findings suggest that, at this age, exposure to, but not necessarily immersion in, more than one language may promote the ability to learn foreign words from a foreign speaker.
Is the superior verbal memory span of Mandarin speakers due to faster rehearsal?

Science.gov (United States)

Mattys, Sven L; Baddeley, Alan; Trenkic, Danijela

2018-04-01

It is well established that digit span in native Chinese speakers is atypically high. This is commonly attributed to a capacity for more rapid subvocal rehearsal for that group. We explored this hypothesis by testing a group of English-speaking native Mandarin speakers on digit span and word span in both Mandarin and English, together with a measure of speed of articulation for each. When compared to the performance of native English speakers, the Mandarin group proved to be superior on both digit and word spans while predictably having lower spans in English. This suggests that the Mandarin advantage is not limited to digits. Speed of rehearsal correlated with span performance across materials. However, this correlation was more pronounced for English speakers than for any of the Chinese measures. Further analysis suggested that speed of rehearsal did not provide an adequate account of differences between Mandarin and English spans or for the advantage of digits over words. Possible alternative explanations are discussed.
Variation among heritage speakers: Sequential vs. simultaneous bilinguals

Directory of Open Access Journals (Sweden)

Teresa Lee

2013-08-01

Full Text Available This study examines the differences in the grammatical knowledge of two types of heritage speakers of Korean. Early simultaneous bilinguals are exposed to both English and the heritage language from birth, whereas early sequential bilinguals are exposed to the heritage language first and then to English upon schooling. A listening comprehension task involving relative clauses was conducted with 51 beginning-level Korean heritage speakers. The results showed that the early sequential bilinguals exhibited much more accurate knowledge than the early simultaneous bilinguals, who lacked rudimentary knowledge of Korean relative clauses. Drawing on the findings of adult and child Korean L1 data on the acquisition of relative clauses, the performance of each group is discussed with respect to attrition and incomplete acquisition of the heritage language.
Robustness Evaluation of Timber Structures

DEFF Research Database (Denmark)

Kirkegaard, Poul Henning; Sørensen, John Dalsgaard; Čizmar, D.

2010-01-01

The present paper outlines results from working group 3 (WG3) in the EU COST Action E55 – ‘Modelling of the performance of timber structures’. The objectives of the project are related to the three main research activities: the identification and modelling of relevant load and environmental...... exposure scenarios, the improvement of knowledge concerning the behaviour of timber structural elements and the development of a generic framework for the assessment of the life-cycle vulnerability and robustness of timber structures....
The native-speaker fever in English language teaching (ELT): Pitting pedagogical competence against historical origin

Directory of Open Access Journals (Sweden)

Anchimbe, Eric A.

2006-01-01

Full Text Available This paper discusses English language teaching (ELT) around the world, and argues that as a profession, it should emphasise pedagogical competence rather than native-speaker requirement in the recruitment of teachers in English as a foreign language (EFL) and English as a second language (ESL) contexts. It establishes that being a native speaker does not automatically make one a competent speaker or, for that matter, a competent teacher of the language. It observes that on many grounds, including physical, sociocultural, technological and economic changes in the world as well as the status of English as official and national language in many post-colonial regions, the distinction between native and non-native speakers is no longer valid.
Psychophysical Boundary for Categorization of Voiced-Voiceless Stop Consonants in Native Japanese Speakers

Science.gov (United States)

Tamura, Shunsuke; Ito, Kazuhito; Hirose, Nobuyuki; Mori, Shuji

2018-01-01

Purpose: The purpose of this study was to investigate the psychophysical boundary used for categorization of voiced-voiceless stop consonants in native Japanese speakers. Method: Twelve native Japanese speakers participated in the experiment. The stimuli were synthetic stop consonant-vowel stimuli varying in voice onset time (VOT) with…
Physiological Indices of Bilingualism: Oral–Motor Coordination and Speech Rate in Bengali–English Speakers

Science.gov (United States)

Chakraborty, Rahul; Goffman, Lisa; Smith, Anne

2009-01-01

Purpose: To examine how age of immersion and proficiency in a 2nd language influence speech movement variability and speaking rate in both a 1st language and a 2nd language. Method: A group of 21 Bengali–English bilingual speakers participated. Lip and jaw movements were recorded. For all 21 speakers, lip movement variability was assessed based on productions of Bengali (L1; 1st language) and English (L2; 2nd language) sentences. For analyses related to the influence of L2 proficiency on speech production processes, participants were sorted into low- (n = 7) and high-proficiency (n = 7) groups. Lip movement variability and speech rate were evaluated for both of these groups across L1 and L2 sentences. Results: Surprisingly, adult bilingual speakers produced equally consistent speech movement patterns in their production of L1 and L2. When groups were sorted according to proficiency, highly proficient speakers were marginally more variable in their L1. In addition, there were some phoneme-specific effects, most markedly that segments not shared by both languages were treated differently in production. Consistent with previous studies, movement durations were longer for less proficient speakers in both L1 and L2. Interpretation: In contrast to those of child learners, the speech motor systems of adult L2 speakers show a high degree of consistency. Such lack of variability presumably contributes to protracted difficulties with acquiring nativelike pronunciation in L2. The proficiency results suggest bidirectional interactions across L1 and L2, which is consistent with hypotheses regarding interference and the sharing of phonological space. A slower speech rate in less proficient speakers implies that there are increased task demands on speech production processes. PMID:18367680
Does training make French speakers more able to identify lexical stress?

OpenAIRE

Schwab, Sandra; Llisterri, Joaquim

2013-01-01

This research takes the stress deafness hypothesis as a starting point (e.g. Dupoux et al., 2008), and, more specifically, the fact that French speakers present difficulties in perceiving lexical stress in a free-stress language. In this framework, we aim at determining whether a prosodic training could improve the ability of French speakers to identify the stressed syllable in Spanish words. Three groups of participants took part in this experiment. The Native group was composed of 16 speake...
A Sociophonetic Study of Young Nigerian English Speakers

African Journals Online (AJOL)

Oladipupo

between male and female speakers in boundary consonant deletion, (F(1, .... speech perception (Foulkes 2006, Clopper & Pisoni, 2005, Thomas 2002). ... in Nigeria, and had had the privilege of travelling to Europe and the Americas for the.
Classifications of Vocalic Segments from Articulatory Kinematics: Healthy Controls and Speakers with Dysarthria

Science.gov (United States)

Yunusova, Yana; Weismer, Gary G.; Lindstrom, Mary J.

2011-01-01

Purpose: In this study, the authors classified vocalic segments produced by control speakers (C) and speakers with dysarthria due to amyotrophic lateral sclerosis (ALS) or Parkinson's disease (PD); classification was based on movement measures. The researchers asked the following questions: (a) Can vowels be classified on the basis of selected…
Robust input design for nonlinear dynamic modeling of AUV.

Science.gov (United States)

Nouri, Nowrouz Mohammad; Valadi, Mehrdad

2017-09-01

Input design has a dominant role in developing the dynamic model of autonomous underwater vehicles (AUVs) through system identification. Optimal input design is the process of generating informative inputs that can be used to obtain a good-quality dynamic model of an AUV. In an optimal input design problem, the desired input signal depends on the unknown system that is to be identified. In this paper, an input design approach that is robust to uncertainties in the model parameters is used. The Bayesian robust design strategy is applied to design input signals for dynamic modeling of AUVs. The employed approach can design multiple inputs and apply constraints on an AUV system's inputs and outputs. Particle swarm optimization (PSO) is employed to solve the constrained robust optimization problem. The presented algorithm is used for designing the input signals for an AUV, and the estimate obtained by robust input design is compared with that of the optimal input design. According to the results, the proposed input design can satisfy both robustness of constraints and optimality. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
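As a rough illustration of how particle swarm optimization handles a box-constrained search of the kind described in the abstract above, the following is a minimal generic sketch. It is not the authors' algorithm: the function name `pso_minimize`, the swarm parameters, and the clamping of positions to the bounds are all assumptions, and the objective passed in is left to the caller (for a robust design it could be a cost averaged over sampled model parameters).

```python
import random

def pso_minimize(objective, bounds, n_particles=30, n_iters=200, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5):
    """Minimise objective(x) over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(p) for p in pos]       # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum with pulls toward both bests
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp to the box so the bound constraints always hold
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val
```

In an input design setting, each position vector would hold the parameters of a candidate input signal and the bounds would encode amplitude limits; output constraints, which the abstract also mentions, are not handled by this sketch and would need a penalty term or a feasibility check in the objective.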
Wavelet Filtering to Reduce Conservatism in Aeroservoelastic Robust Stability Margins

Science.gov (United States)

Brenner, Marty; Lind, Rick

1998-01-01

Wavelet analysis for filtering and system identification was used to improve the estimation of aeroservoelastic stability margins. The conservatism of the robust stability margins was reduced with parametric and nonparametric time-frequency analysis of flight data in the model validation process. Nonparametric wavelet processing of data was used to reduce the effects of external disturbances and unmodeled dynamics. Parametric estimates of modal stability were also extracted using the wavelet transform. Computation of robust stability margins for stability boundary prediction depends on uncertainty descriptions derived from the data for model validation. F-18 High Alpha Research Vehicle aeroservoelastic flight test data demonstrated improved robust stability prediction by extension of the stability boundary beyond the flight regime.
The effect on recognition memory of noise cancelling headphones in a noisy environment with native and nonnative speakers

Directory of Open Access Journals (Sweden)

Brett R C Molesworth

2014-01-01

Full Text Available Noise has the potential to impair cognitive performance. For nonnative speakers, the effect of noise on performance is more severe than for their native counterparts. What remains unknown is the effectiveness of countermeasures such as noise attenuating devices in such circumstances. Therefore, the main aim of the present research was to examine the effectiveness of active noise attenuating countermeasures in the presence of simulated aircraft noise for both native and nonnative English speakers. Thirty-two participants, half native English speakers and half native German speakers, completed four recognition (cued recall) tasks presented in English under four different audio conditions, all in the presence of simulated aircraft noise. The results of the research indicated that in simulated aircraft noise at 65 dB(A), performance of nonnative English speakers was poorer than for native English speakers. The beneficial effects of noise cancelling headphones in improving the signal to noise ratio led to an improved performance for nonnative speakers. These results have particular importance for organizations operating in a safety-critical environment such as aviation.
Continuing Medical Education Speakers with High Evaluation Scores Use more Image-based Slides

Directory of Open Access Journals (Sweden)

Ferguson, Ian

2017-01-01

Full Text Available Although continuing medical education (CME) presentations are common across health professions, it is unknown whether slide design is independently associated with audience evaluations of the speaker. Based on the conceptual framework of Mayer’s theory of multimedia learning, this study aimed to determine whether image use and text density in presentation slides are associated with overall speaker evaluations. This retrospective analysis of six sequential CME conferences (two annual emergency medicine conferences over a three-year period) used a mixed linear regression model to assess whether postconference speaker evaluations were associated with image fraction (percentage of image-based slides per presentation) and text density (number of words per slide). A total of 105 unique lectures were given by 49 faculty members, and 1,222 evaluations (70.1% response rate) were available for analysis. On average, 47.4% (SD=25.36) of slides had at least one educationally-relevant image (image fraction). Image fraction significantly predicted overall higher evaluation scores [F(1, 100.676)=6.158, p=0.015] in the mixed linear regression model. The mean (SD) text density was 25.61 (8.14) words/slide but was not a significant predictor [F(1, 86.293)=0.55, p=0.815]. Of note, the individual speaker [χ²(1)=2.952, p=0.003] and speaker seniority [F(3, 59.713)=4.083, p=0.011] significantly predicted higher scores. This is the first published study to date assessing the linkage between slide design and CME speaker evaluations by an audience of practicing clinicians. The incorporation of images was associated with higher evaluation scores, in alignment with Mayer’s theory of multimedia learning. Contrary to this theory, however, text density showed no significant association, suggesting that these scores may be multifactorial. Professional development efforts should focus on teaching best practices in both slide design and presentation skills.

Continuing Medical Education Speakers with High Evaluation Scores Use more Image-based Slides.

Science.gov (United States)

Ferguson, Ian; Phillips, Andrew W; Lin, Michelle

2017-01-01

Although continuing medical education (CME) presentations are common across health professions, it is unknown whether slide design is independently associated with audience evaluations of the speaker. Based on the conceptual framework of Mayer's theory of multimedia learning, this study aimed to determine whether image use and text density in presentation slides are associated with overall speaker evaluations. This retrospective analysis of six sequential CME conferences (two annual emergency medicine conferences over a three-year period) used a mixed linear regression model to assess whether post-conference speaker evaluations were associated with image fraction (percentage of image-based slides per presentation) and text density (number of words per slide). A total of 105 unique lectures were given by 49 faculty members, and 1,222 evaluations (70.1% response rate) were available for analysis. On average, 47.4% (SD=25.36) of slides had at least one educationally-relevant image (image fraction). Image fraction significantly predicted overall higher evaluation scores [F(1, 100.676)=6.158, p=0.015] in the mixed linear regression model. The mean (SD) text density was 25.61 (8.14) words/slide but was not a significant predictor [F(1, 86.293)=0.55, p=0.815]. Of note, the individual speaker [χ²(1)=2.952, p=0.003] and speaker seniority [F(3, 59.713)=4.083, p=0.011] significantly predicted higher scores. This is the first published study to date assessing the linkage between slide design and CME speaker evaluations by an audience of practicing clinicians. The incorporation of images was associated with higher evaluation scores, in alignment with Mayer's theory of multimedia learning. Contrary to this theory, however, text density showed no significant association, suggesting that these scores may be multifactorial. Professional development efforts should focus on teaching best practices in both slide design and presentation skills.
Speaker-Sex Discrimination for Voiced and Whispered Vowels at Short Durations.

Science.gov (United States)

Smith, David R R

2016-01-01

Whispered vowels, produced with no vocal fold vibration, lack the periodic temporal fine structure which in voiced vowels underlies the perceptual attribute of pitch (a salient auditory cue to speaker sex). Voiced vowels possess no temporal fine structure at very short durations (below two glottal cycles). The prediction was that speaker-sex discrimination performance for whispered and voiced vowels would be similar for very short durations but, as stimulus duration increases, voiced vowel performance would improve relative to whispered vowel performance as pitch information becomes available. This pattern of results was shown for women's but not for men's voices. A whispered vowel needs to have a duration three times longer than a voiced vowel before listeners can reliably tell whether it's spoken by a man or woman (∼30 ms vs. ∼10 ms). Listeners were half as sensitive to information about speaker-sex when it is carried by whispered compared with voiced vowels.
Infant sensitivity to speaker and language in learning a second label.

Science.gov (United States)

Bhagwat, Jui; Casasola, Marianella

2014-02-01

Two experiments examined when monolingual, English-learning 19-month-old infants learn a second object label. Two experimenters sat together. One labeled a novel object with one novel label, whereas the other labeled the same object with a different label in either the same or a different language. Infants were tested on their comprehension of each label immediately following its presentation. Infants mapped the first label at above chance levels, but they did so with the second label only when requested by the speaker who provided it (Experiment 1) or when the second experimenter labeled the object in a different language (Experiment 2). These results show that 19-month-olds learn second object labels but do not readily generalize them across speakers of the same language. The results highlight how speaker and language spoken guide infants' acceptance of second labels, supporting sociopragmatic views of word learning. Copyright © 2013 Elsevier Inc. All rights reserved.
B Anand | Speakers | Indian Academy of Sciences

Indian Academy of Sciences (India)

However, the mechanism by which this protospacer fragment gets integrated in a directional fashion into the leader proximal end is elusive. The speaker's group identified that the leader region abutting the first CRISPR repeat localizes Integration Host Factor (IHF) and Cas1-2 complex in Escherichia coli. IHF binding to the ...
L2 speakers decompose morphologically complex verbs: fMRI evidence from priming of transparent derived verbs

Directory of Open Access Journals (Sweden)

Sophie De Grauwe

2014-10-01

Full Text Available In this fMRI long-lag priming study, we investigated the processing of Dutch semantically transparent, derived prefix verbs. In such words, the meaning of the word as a whole can be deduced from the meanings of its parts, e.g. wegleggen ‘put aside’. Many behavioral and some fMRI studies suggest that native (L1) speakers decompose transparent derived words. The brain region usually implicated in morphological decomposition is the left inferior frontal gyrus (LIFG). In non-native (L2) speakers, the processing of transparent derived words has hardly been investigated, especially in fMRI studies, and results are contradictory: Some studies find more reliance on holistic (i.e. non-decompositional) processing by L2 speakers; some find no difference between L1 and L2 speakers. In this study, we wanted to find out whether Dutch transparent derived prefix verbs are decomposed or processed holistically by German L2 speakers of Dutch. Half of the derived verbs (e.g. omvallen ‘fall down’) were preceded by their stem (e.g. vallen ‘fall’) with a lag of 4 to 6 words (‘primed’); the other half (e.g. inslapen ‘fall asleep’) were not (‘unprimed’). L1 and L2 speakers of Dutch made lexical decisions on these visually presented verbs. Both ROI analyses and whole-brain analyses showed that there was a significant repetition suppression effect for primed compared to unprimed derived verbs in the LIFG. This was true both for the analyses over L2 speakers only and for the analyses over the two language groups together. The latter did not reveal any interaction with language group (L1 vs. L2) in the LIFG. Thus, L2 speakers show a clear priming effect in the LIFG, an area that has been associated with morphological decomposition. Our findings are consistent with the idea that L2 speakers engage in decomposition of transparent derived verbs rather than processing them holistically.
Congenital Amusia in Speakers of a Tone Language: Association with Lexical Tone Agnosia

Science.gov (United States)

Nan, Yun; Sun, Yanan; Peretz, Isabelle

2010-01-01

Congenital amusia is a neurogenetic disorder that affects the processing of musical pitch in speakers of non-tonal languages like English and French. We assessed whether this musical disorder exists among speakers of Mandarin Chinese who use pitch to alter the meaning of words. Using the Montreal Battery of Evaluation of Amusia, we tested 117…
Identification system by eye retinal pattern

International Nuclear Information System (INIS)

Sunagawa, Takahisa; Shibata, Susumu

1987-01-01

Identification system by eye retinal pattern is introduced from the viewpoint of history of R&D, measurement, apparatus, evaluation tests, safety and application. According to our evaluation tests, enrolling time is less than about 1 min, verification time is a few seconds and false accept rate is 0 %. Evaluation tests at Sandia National Laboratories in the USA show comparison data of false accept rates: 0 % for eye retinal pattern, 10.5 % for fingerprint, 5.8 % for signature dynamics and 17.7 % for speaker voice. The identification system by eye retinal pattern has only three applications in Japan, but there is considerable experience with it in the USA. This fact suggests that the system will become an important means of physical protection not only in the nuclear field but also in other industrial fields in Japan. (author)
Phoneme Error Pattern by Heritage Speakers of Spanish on an English Word Recognition Test.

Science.gov (United States)

Shi, Lu-Feng

2017-04-01

Heritage speakers acquire their native language from home use in their early childhood. As the native language is typically a minority language in the society, these individuals receive their formal education in the majority language and eventually develop greater competency with the majority than their native language. To date, there have not been specific research attempts to understand word recognition by heritage speakers. It is not clear if and to what degree we may infer from evidence based on bilingual listeners in general. This preliminary study investigated how heritage speakers of Spanish perform on an English word recognition test and analyzed their phoneme errors. A prospective, cross-sectional, observational design was employed. Twelve normal-hearing adult Spanish heritage speakers (four men, eight women, 20-38 yr old) participated in the study. Their language background was obtained through the Language Experience and Proficiency Questionnaire. Nine English monolingual listeners (three men, six women, 20-41 yr old) were also included for comparison purposes. Listeners were presented with 200 Northwestern University Auditory Test No. 6 words in quiet. They repeated each word orally and in writing. Their responses were scored by word, word-initial consonant, vowel, and word-final consonant. Performance was compared between groups with Student's t test or analysis of variance. Group-specific error patterns were primarily descriptive, but intergroup comparisons were made using 95% or 99% confidence intervals for proportional data. The two groups of listeners yielded comparable scores when their responses were examined by word, vowel, and final consonant. However, heritage speakers of Spanish misidentified significantly more word-initial consonants and had significantly more difficulty with initial /p, b, h/ than their monolingual peers. The two groups yielded similar patterns for vowel and word-final consonants, but heritage speakers made significantly
Combining Behavioral and ERP Methodologies to Investigate the Differences Between McGurk Effects Demonstrated by Cantonese and Mandarin Speakers

Directory of Open Access Journals (Sweden)

Juan Zhang

2018-05-01

Full Text Available The present study investigated the impact of Chinese dialects on the McGurk effect using behavioral and event-related potential (ERP) methodologies. Specifically, intra-language comparison of the McGurk effect was conducted between Mandarin and Cantonese speakers. The behavioral results showed that Cantonese speakers exhibited a stronger McGurk effect in audiovisual speech perception compared to Mandarin speakers, although both groups performed equally in the auditory and visual conditions. ERP results revealed that Cantonese speakers were more sensitive to visual cues than Mandarin speakers, though this was not the case for the auditory cues. Taken together, the current findings suggest that the McGurk effect generated by Chinese speakers is mainly influenced by segmental phonology during audiovisual speech integration.
Combining Behavioral and ERP Methodologies to Investigate the Differences Between McGurk Effects Demonstrated by Cantonese and Mandarin Speakers

Science.gov (United States)

Zhang, Juan; Meng, Yaxuan; McBride, Catherine; Fan, Xitao; Yuan, Zhen

2018-01-01

The present study investigated the impact of Chinese dialects on McGurk effect using behavioral and event-related potential (ERP) methodologies. Specifically, intra-language comparison of McGurk effect was conducted between Mandarin and Cantonese speakers. The behavioral results showed that Cantonese speakers exhibited a stronger McGurk effect in audiovisual speech perception compared to Mandarin speakers, although both groups performed equally in the auditory and visual conditions. ERP results revealed that Cantonese speakers were more sensitive to visual cues than Mandarin speakers, though this was not the case for the auditory cues. Taken together, the current findings suggest that the McGurk effect generated by Chinese speakers is mainly influenced by segmental phonology during audiovisual speech integration. PMID:29780312
Robustness of Visual Place Cells in Dynamic Indoor and Outdoor Environment

Directory of Open Access Journals (Sweden)

C. Giovannangeli

2006-06-01

Full Text Available In this paper, a model of visual place cells (PCs) based on precise neurobiological data is presented. The robustness of the model in real indoor and outdoor environments is tested. Results show that the interplay between neurobiological modelling and robotic experiments can promote the understanding of the neural structures and the achievement of robust robot navigation algorithms. Short Term Memory (STM), soft competition and sparse coding are important for both landmark identification and computation of PC activities. The extension of the paradigm to outdoor environments has confirmed the robustness of the vision-based model and pointed to improvements in order to further foster its performance.
Communication‐related affective, behavioral, and cognitive reactions in speakers with spasmodic dysphonia

Science.gov (United States)

Vanryckeghem, Martine

2017-01-01

Objectives: To investigate the self‐perceived affective, behavioral, and cognitive reactions associated with communication of speakers with spasmodic dysphonia as a function of employment status. Study Design: Prospective cross‐sectional investigation. Methods: 148 participants with spasmodic dysphonia (SD) completed an adapted version of the Behavior Assessment Battery (BAB‐Voice), a multidimensional assessment of self‐perceived reactions to communication. The BAB‐Voice consisted of four subtests: the Speech Situation Checklist for A) Emotional Reaction (SSC‐ER) and B) Speech Disruption (SSC‐SD), C) the Behavior Checklist (BCL), and D) the Communication Attitude Test for Adults (BigCAT). Participants were assigned to groups based on employment status (working versus retired). Results: Descriptive comparison of the BAB‐Voice in speakers with SD to previously published non‐dysphonic speaker data revealed substantially higher scores associated with SD across all four subtests. Multivariate Analysis of Variance (MANOVA) revealed no significantly different BAB‐Voice subtest scores as a function of SD group status (working vs. retired). Conclusions: BAB‐Voice scores revealed that speakers with SD experienced substantial impact of their voice disorder on communication attitude, coping behaviors, and affective reactions in speaking situations as reflected in their high BAB scores. These impacts do not appear to be influenced by work status, as speakers with SD who were employed or retired experienced similar levels of affective and behavioral reactions in various speaking situations and cognitive responses. These findings are consistent with previously published pilot data. The specificity of items assessed by means of the BAB‐Voice may inform the clinician of valid patient‐centered treatment goals which target the impairment extended beyond the physiological dimension. Level of Evidence: 2b. PMID:29299525
Communication-related affective, behavioral, and cognitive reactions in speakers with spasmodic dysphonia.

Science.gov (United States)

Watts, Christopher R; Vanryckeghem, Martine

2017-12-01

To investigate the self-perceived affective, behavioral, and cognitive reactions associated with communication of speakers with spasmodic dysphonia as a function of employment status. Prospective cross-sectional investigation. 148 Participants with spasmodic dysphonia (SD) completed an adapted version of the Behavior Assessment Battery (BAB-Voice), a multidimensional assessment of self-perceived reactions to communication. The BAB-Voice consisted of four subtests: the Speech Situation Checklist for A) Emotional Reaction (SSC-ER) and B) Speech Disruption (SSC-SD), C) the Behavior Checklist (BCL), and D) the Communication Attitude Test for Adults (BigCAT). Participants were assigned to groups based on employment status (working versus retired). Descriptive comparison of the BAB-Voice in speakers with SD to previously published non-dysphonic speaker data revealed substantially higher scores associated with SD across all four subtests. Multivariate Analysis of Variance (MANOVA) revealed no significantly different BAB-Voice subtest scores as a function of SD group status (working vs. retired). BAB-Voice scores revealed that speakers with SD experienced substantial impact of their voice disorder on communication attitude, coping behaviors, and affective reactions in speaking situations as reflected in their high BAB scores. These impacts do not appear to be influenced by work status, as speakers with SD who were employed or retired experienced similar levels of affective and behavioral reactions in various speaking situations and cognitive responses. These findings are consistent with previously published pilot data. The specificity of items assessed by means of the BAB-Voice may inform the clinician of valid patient-centered treatment goals which target the impairment extended beyond the physiological dimension. 2b.
Schizophrenia among Sesotho speakers in South Africa | Mosotho ...

African Journals Online (AJOL)

Results: Core symptoms of schizophrenia among Sesotho speakers do not differ significantly from other cultures. However, the content of psychological symptoms such as delusions and hallucinations is strongly affected by cultural variables. Somatic symptoms such as headaches, palpitations, dizziness and excessive ...
Sentence comprehension in Swahili-English bilingual agrammatic speakers

NARCIS (Netherlands)

Abuom, Tom O.; Shah, Emmah; Bastiaanse, Roelien

For this study, sentence comprehension was tested in Swahili-English bilingual agrammatic speakers. The sentences were controlled for four factors: (1) order of the arguments (base vs. derived); (2) embedding (declarative vs. relative sentences); (3) overt use of the relative pronoun "who"; (4)
An evidence-based rehabilitation program for tracheoesophageal speakers

NARCIS (Netherlands)

Jongmans, P.; Rossum, M.; As-Brooks, C.; Hilgers, F.; Pols, L.; Hilgers, F.J.M.; Pols, L.C.W.; van Rossum, M.; van den Brekel, M.W.M.

2008-01-01

Objectives: to develop an evidence-based therapy program aimed at improving tracheoesophageal speech intelligibility. The therapy program is based on particular problems found for TE speakers in a previous study as performed by the authors. Patients/Materials and Methods: 9 male laryngectomized
The Effects of the Literal Meaning of Emotional Phrases on the Identification of Vocal Emotions.

Science.gov (United States)

Shigeno, Sumi

2018-02-01

This study investigates the discrepancy between the literal emotional content of speech and emotional tone in the identification of speakers' vocal emotions in both the listeners' native language (Japanese), and in an unfamiliar language (random-spliced Japanese). Both experiments involve a "congruent condition," in which the emotion contained in the literal meaning of speech (words and phrases) was compatible with vocal emotion, and an "incongruent condition," in which these forms of emotional information were discordant. Results for Japanese indicated that performance in identifying emotions did not differ significantly between the congruent and incongruent conditions. However, the results for random-spliced Japanese indicated that vocal emotion was correctly identified more often in the congruent than in the incongruent condition. The different results for Japanese and random-spliced Japanese suggested that the literal meaning of emotional phrases influences the listener's perception of the speaker's emotion, and that Japanese participants could infer speakers' intended emotions in the incongruent condition.
On the same wavelength: predictable language enhances speaker-listener brain-to-brain synchrony in posterior superior temporal gyrus.

Science.gov (United States)

Dikker, Suzanne; Silbert, Lauren J; Hasson, Uri; Zevin, Jason D

2014-04-30

Recent research has shown that the degree to which speakers and listeners exhibit similar brain activity patterns during human linguistic interaction is correlated with communicative success. Here, we used an intersubject correlation approach in fMRI to test the hypothesis that a listener's ability to predict a speaker's utterance increases such neural coupling between speakers and listeners. Nine subjects listened to recordings of a speaker describing visual scenes that varied in the degree to which they permitted specific linguistic predictions. In line with our hypothesis, the temporal profile of listeners' brain activity was significantly more synchronous with the speaker's brain activity for highly predictive contexts in left posterior superior temporal gyrus (pSTG), an area previously associated with predictive auditory language processing. In this region, predictability differentially affected the temporal profiles of brain responses in the speaker and listeners respectively, in turn affecting correlated activity between the two: whereas pSTG activation increased with predictability in the speaker, listeners' pSTG activity instead decreased for more predictable sentences. Listeners additionally showed stronger BOLD responses for predictive images before sentence onset, suggesting that highly predictable contexts lead comprehenders to preactivate predicted words.
Thermal Stresses Analysis and Optimized TTP Processes to Achieved CNT-Based Diaphragm for Thin Panel Speakers

Directory of Open Access Journals (Sweden)

Feng-Min Lai

2016-01-01

Full Text Available Industrial companies commonly use powder coating, classing, and thermal transfer printing (TTP) techniques to prevent oxidation of metallic surfaces and to stiffen speaker diaphragms. This study developed a TTP technique to fabricate a carbon nanotube (CNT) stiffened speaker diaphragm for thin panel speakers. The self-developed TTP stiffening technique did not require the high curing temperature that degrades the mechanical properties of CNTs. In addition to increasing the stiffness of the diaphragm substrate, this technique alleviated the middle and high frequency attenuation associated with the smoothness of the sound pressure curve of the thin panel speaker. The TTP technique has the advantage of being less harmful to the environment, but it causes thermal residual stresses and some unstable connections between printed plates. Thus, this study used numerical analysis software (ANSYS) to analyze the stress and thermal behavior of workpieces that showed no delamination problems at the transfer interface. The Taguchi quality engineering method was applied to identify the optimal manufacturing parameters. Finally, the optimal manufacturing parameters were employed to fabricate a CNT-based diaphragm, which was then assembled onto a speaker. The result indicated that the CNT-based diaphragm improved the sound pressure curve smoothness of the speaker, which produced a minimum high frequency dip difference (ΔdB) value.
The Space-Time Topography of English Speakers

Science.gov (United States)

Duman, Steve

2016-01-01

English speakers talk and think about Time in terms of physical space. The past is behind us, and the future is in front of us. In this way, we "map" space onto Time. This dissertation addresses the specificity of this physical space, or its topography. Inspired by languages like Yupno (Nunez, et al., 2012) and Bamileke-Dschang (Hyman,…

Does dynamic information about the speaker's face contribute to semantic speech processing? ERP evidence.

Science.gov (United States)

Hernández-Gutiérrez, David; Abdel Rahman, Rasha; Martín-Loeches, Manuel; Muñoz, Francisco; Schacht, Annekathrin; Sommer, Werner

2018-07-01

Face-to-face interactions characterize communication in social contexts. These situations are typically multimodal, requiring the integration of linguistic auditory input with facial information from the speaker. In particular, eye gaze and visual speech provide the listener with social and linguistic information, respectively. Despite the importance of this context for an ecological study of language, research on audiovisual integration has mainly focused on the phonological level, leaving aside effects on semantic comprehension. Here we used event-related potentials (ERPs) to investigate the influence of facial dynamic information on semantic processing of connected speech. Participants were presented with either a video or a still picture of the speaker, concomitant to auditory sentences. Across three experiments, we manipulated the presence or absence of the speaker's dynamic facial features (mouth and eyes) and compared the amplitudes of the semantic N400 elicited by unexpected words. Contrary to our predictions, the N400 was not modulated by dynamic facial information; therefore, semantic processing seems to be unaffected by the speaker's gaze and visual speech. Nevertheless, during the processing of expected words, dynamic faces elicited a long-lasting late posterior positivity compared to the static condition. This effect was significantly reduced when the mouth of the speaker was covered. Our findings may indicate an increase of attentional processing to richer communicative contexts. The present findings also demonstrate that in natural communicative face-to-face encounters, perceiving the face of a speaker in motion provides supplementary information that is taken into account by the listener, especially when auditory comprehension is non-demanding.
Infants' Selectively Pay Attention to the Information They Receive from a Native Speaker of Their Language.

Science.gov (United States)

Marno, Hanna; Guellai, Bahia; Vidal, Yamil; Franzoi, Julia; Nespor, Marina; Mehler, Jacques

2016-01-01

From the first moments of their life, infants show a preference for their native language, as well as toward speakers with whom they share the same language. This preference appears to have broad consequences in various domains later on, supporting group affiliations and collaborative actions in children. Here, we propose that infants' preference for native speakers of their language also serves a further purpose, specifically allowing them to efficiently acquire culture-specific knowledge via social learning. By selectively attending to informants who are native speakers of their language and who probably also share the same cultural background with the infant, young learners can maximize the possibility to acquire cultural knowledge. To test whether infants would preferentially attend to the information they receive from a speaker of their native language, we familiarized 12-month-old infants with a native and a foreign speaker, and then presented them with movies where each of the speakers silently gazed toward unfamiliar objects. At test, infants' looking behavior to the two objects alone was measured. Results revealed that infants preferred to look longer at the object presented by the native speaker. Strikingly, the effect was replicated also with 5-month-old infants, indicating an early development of such preference. These findings provide evidence that young infants pay more attention to the information presented by a person with whom they share the same language. This selectivity can serve as a basis for efficient social learning by influencing how infants allocate attention between potential sources of information in their environment.
Towards PLDA-RBM based speaker recognition in mobile environment: Designing stacked/deep PLDA-RBM systems

DEFF Research Database (Denmark)

Nautsch, Andreas; Hao, Hong; Stafylakis, Themos

2016-01-01

recognition: two deep architectures are presented and examined, which aim at suppressing channel effects and recovering speaker-discriminative information on back-ends trained on a small dataset. Experiments are carried out on the MOBIO SRE'13 database, which is a challenging and publicly available dataset...... for mobile speaker recognition with limited amounts of training data. The experiments show that the proposed system outperforms the baseline i-vector/PLDA approach by relative gains of 31% on female and 9% on male speakers in terms of half total error rate....
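The half total error rate (HTER) used above is conventionally the mean of the false acceptance rate (FAR) and the false rejection rate (FRR) at a chosen decision threshold, and a "relative gain" is the error reduction divided by the baseline error. The following is a minimal sketch of both quantities; the scores, the threshold, and the function names are illustrative assumptions and are not taken from the MOBIO experiments.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR, HTER); a trial is accepted when score >= threshold."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr, (far + frr) / 2.0


def relative_gain(baseline_error, system_error):
    """Fraction by which the system reduces the baseline error."""
    return (baseline_error - system_error) / baseline_error


# Hypothetical verification scores, for illustration only.
genuine = [0.9, 0.8, 0.4, 0.7]
impostor = [0.1, 0.5, 0.3, 0.2]
far, frr, hter = error_rates(genuine, impostor, threshold=0.45)
print(far, frr, hter)
```

Under this definition, a system whose HTER falls from 10% to 6.9% shows a relative gain of about 31%.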
Neural bases of congenital amusia in tonal language speakers.

Science.gov (United States)

Zhang, Caicai; Peng, Gang; Shao, Jing; Wang, William S-Y

2017-03-01

Congenital amusia is a lifelong neurodevelopmental disorder of fine-grained pitch processing. In this fMRI study, we examined the neural bases of congenital amusia in speakers of a tonal language - Cantonese. Previous studies on non-tonal language speakers suggest that the neural deficits of congenital amusia lie in the music-selective neural circuitry in the right inferior frontal gyrus (IFG). However, it is unclear whether this finding can generalize to congenital amusics in tonal languages. Tonal language experience has been reported to shape the neural processing of pitch, which raises the question of how tonal language experience affects the neural bases of congenital amusia. To investigate this question, we examined the neural circuitries sub-serving the processing of relative pitch interval in pitch-matched Cantonese level tone and musical stimuli in 11 Cantonese-speaking amusics and 11 musically intact controls. Cantonese-speaking amusics exhibited abnormal brain activities in a widely distributed neural network during the processing of lexical tone and musical stimuli. Whereas the controls exhibited significant activation in the right superior temporal gyrus (STG) in the lexical tone condition and in the cerebellum regardless of the lexical tone and music conditions, no activation was found in the amusics in those regions, which likely reflects a dysfunctional neural mechanism of relative pitch processing in the amusics. Furthermore, the amusics showed abnormally strong activation of the right middle frontal gyrus and precuneus when the pitch stimuli were repeated, which presumably reflect deficits of attending to repeated pitch stimuli or encoding them into working memory. No significant group difference was found in the right IFG in either the whole-brain analysis or region-of-interest analysis. These findings imply that the neural deficits in tonal language speakers might differ from those in non-tonal language speakers, and overlap partly with the
Time-Contrastive Learning Based DNN Bottleneck Features for Text-Dependent Speaker Verification

DEFF Research Database (Denmark)

Sarkar, Achintya Kumar; Tan, Zheng-Hua

2017-01-01

In this paper, we present a time-contrastive learning (TCL) based bottleneck (BN) feature extraction method for speech signals with an application to text-dependent (TD) speaker verification (SV). It is well-known that speech signals exhibit quasi-stationary behavior in and only in a short interval......, and the TCL method aims to exploit this temporal structure. More specifically, it trains deep neural networks (DNNs) to discriminate temporal events obtained by uniformly segmenting speech signals, in contrast to existing DNN based BN feature extraction methods that train DNNs using labeled data...... to discriminate speakers or pass-phrases or phones or a combination of them. In the context of speaker verification, speech data of fixed pass-phrases are used for TCL-BN training, while the pass-phrases used for TCL-BN training are excluded from being used for SV, so that the learned features can be considered...
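The uniform segmentation described above can be read as a labelling step: the frames of an utterance are split into a fixed number of equal-length segments, and each frame takes its segment index as the class the DNN learns to discriminate. The sketch below illustrates only that labelling step, under the assumption that segments are contiguous and near-equal in length; the function name and the frame and segment counts are hypothetical and do not come from the paper.

```python
def tcl_frame_labels(num_frames, num_segments):
    """Assign each frame the index of the uniform segment it falls into."""
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    # Integer arithmetic keeps segments contiguous and nearly equal in length.
    return [i * num_segments // num_frames for i in range(num_frames)]


# Ten feature frames split into four temporal classes (illustrative sizes).
labels = tcl_frame_labels(10, 4)
print(labels)
```

No speaker, phrase, or phone annotation is needed to produce these targets, which is the contrast the abstract draws with DNN bottleneck methods trained on labeled data.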
Perceptual and acoustic analysis of lexical stress in Greek speakers with dysarthria.

Science.gov (United States)

Papakyritsis, Ioannis; Müller, Nicole

2014-01-01

The study reported in this paper investigated the abilities of Greek speakers with dysarthria to signal lexical stress at the single word level. Three speakers with dysarthria and two unimpaired control participants were recorded completing a repetition task of a list of words consisting of minimal pairs of Greek disyllabic words contrasted by lexical stress location only. Fourteen listeners were asked to determine the attempted stress location for each word pair. Acoustic analyses of duration and intensity ratios, both within and across words, were undertaken to identify possible acoustic correlates of the listeners' judgments concerning stress location. Acoustic and perceptual data indicate that while each participant with dysarthria in this study had some difficulty in signaling stress unambiguously, the pattern of difficulty was different for each speaker. Further, it was found that the relationship between the listeners' judgments of stress location and the acoustic data was not conclusive.
Switches to English during French Service Encounters: Relationships with L2 French Speakers' Willingness to Communicate and Motivation

Science.gov (United States)

McNaughton, Stephanie; McDonough, Kim

2015-01-01

This exploratory study investigated second language (L2) French speakers' service encounters in the multilingual setting of Montreal, specifically whether switches to English during French service encounters were related to L2 speakers' willingness to communicate or motivation. Over a two-week period, 17 French L2 speakers in Montreal submitted…
Musical Sophistication and the Effect of Complexity on Auditory Discrimination in Finnish Speakers

Science.gov (United States)

Dawson, Caitlin; Aalto, Daniel; Šimko, Juraj; Vainio, Martti; Tervaniemi, Mari

2017-01-01

Musical experiences and native language are both known to affect auditory processing. The present work aims to disentangle the influences of native language phonology and musicality on behavioral and subcortical sound feature processing in a population of musically diverse Finnish speakers as well as to investigate the specificity of enhancement from musical training. Finnish speakers are highly sensitive to duration cues since in Finnish, vowel and consonant duration determine word meaning. Using a correlational approach with a set of behavioral sound feature discrimination tasks, brainstem recordings, and a musical sophistication questionnaire, we find no evidence for an association between musical sophistication and more precise duration processing in Finnish speakers either in the auditory brainstem response or in behavioral tasks, but they do show an enhanced pitch discrimination compared to Finnish speakers with less musical experience and show greater duration modulation in a complex task. These results are consistent with a ceiling effect set for certain sound features which corresponds to the phonology of the native language, leaving an opportunity for music experience-based enhancement of sound features not explicitly encoded in the language (such as pitch, which is not explicitly encoded in Finnish). Finally, the pattern of duration modulation in more musically sophisticated Finnish speakers suggests integrated feature processing for greater efficiency in a real world musical situation. These results have implications for research into the specificity of plasticity in the auditory system as well as to the effects of interaction of specific language features with musical experiences. PMID:28450829
Willing Learners yet Unwilling Speakers in ESL Classrooms

Directory of Open Access Journals (Sweden)

Zuraidah Ali

2007-12-01

Full Text Available To some of us, speech production in ESL has become so natural and integral that we seem to take it for granted. We often do not even remember how we struggled through the initial process of mastering English. Unfortunately, students who are still learning English seem to face myriad problems that make them appear unwilling or reluctant ESL speakers. This study will investigate this phenomenon, which is very common in the ESL classroom. Setting its background on related research findings on this matter, a qualitative study was conducted among foreign students enrolled in the Intensive English Programme (IEP) at the Institute of Liberal Studies (IKAL), University Tenaga Nasional (UNITEN). The results will show and discuss an extent of truth behind this perplexing phenomenon: willing learners, yet unwilling speakers of ESL, in our effort to provide supportive learning cultures in second language acquisition (SLA) to this group of students.
English exposed common mistakes made by Chinese speakers

CERN Document Server

Hart, Steve

2017-01-01

Having analysed the most common English errors made in over 600 academic papers written by Chinese undergraduates, postgraduates, and researchers, Steve Hart has written an essential, practical guide specifically for the native Chinese speaker on how to write good academic English. English Exposed: Common Mistakes Made by Chinese Speakers is divided into three main sections. The first section examines errors made with verbs, nouns, prepositions, and other grammatical classes of words. The second section focuses on problems of word choice. In addition to helping the reader find the right word, it provides instruction for selecting the right style too. The third section covers a variety of other areas essential for the academic writer, such as using punctuation, adding appropriate references, referring to tables and figures, and selecting among various English date and time phrases. Using English Exposed will allow a writer to produce material where content and ideas, not language mistakes, speak the loudest.
Neural decoding of attentional selection in multi-speaker environments without access to clean sources

Science.gov (United States)

O'Sullivan, James; Chen, Zhuo; Herrero, Jose; McKhann, Guy M.; Sheth, Sameer A.; Mehta, Ashesh D.; Mesgarani, Nima

2017-10-01

Objective. People who suffer from hearing impairments can find it difficult to follow a conversation in a multi-speaker environment. Current hearing aids can suppress background noise; however, there is little that can be done to help a user attend to a single conversation amongst many without knowing which speaker the user is attending to. Cognitively controlled hearing aids that use auditory attention decoding (AAD) methods are the next step in offering help. Translating the successes in AAD research to real-world applications poses a number of challenges, including the lack of access to the clean sound sources in the environment with which to compare with the neural signals. We propose a novel framework that combines single-channel speech separation algorithms with AAD. Approach. We present an end-to-end system that (1) receives a single audio channel containing a mixture of speakers that is heard by a listener along with the listener’s neural signals, (2) automatically separates the individual speakers in the mixture, (3) determines the attended speaker, and (4) amplifies the attended speaker’s voice to assist the listener. Main results. Using invasive electrophysiology recordings, we identified the regions of the auditory cortex that contribute to AAD. Given appropriate electrode locations, our system is able to decode the attention of subjects and amplify the attended speaker using only the mixed audio. Our quality assessment of the modified audio demonstrates a significant improvement in both subjective and objective speech quality measures. Significance. Our novel framework for AAD bridges the gap between the most recent advancements in speech processing technologies and speech prosthesis research and moves us closer to the development of cognitively controlled hearable devices for the hearing impaired.
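Steps (3) and (4) of the system described above can be illustrated with a minimal sketch. It assumes, as is common in AAD work but not stated in this abstract, that a speech envelope has already been reconstructed from the neural signals and that the attended speaker is the separated source whose envelope correlates best with it. The function names `decode_attention` and `remix`, the 9 dB gain, and the toy envelopes are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def decode_attention(neural_envelope, separated_envelopes):
    """Pick the separated speaker whose envelope best matches the neural estimate."""
    scores = [pearson(neural_envelope, env) for env in separated_envelopes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

def remix(separated_signals, attended, gain_db=9.0):
    """Boost the attended speaker by gain_db and sum all speakers back together."""
    gain = 10 ** (gain_db / 20)
    length = len(separated_signals[0])
    return [
        sum((gain if i == attended else 1.0) * sig[t]
            for i, sig in enumerate(separated_signals))
        for t in range(length)
    ]

# Toy envelopes (assumed data): the neural estimate follows speaker 0.
speaker_a = [0, 1, 0, 1, 0, 1, 0, 1]
speaker_b = [0, 0, 1, 1, 0, 0, 1, 1]
neural = [0.1, 0.9, 0.2, 0.8, 0.0, 1.1, 0.1, 0.9]
attended, scores = decode_attention(neural, [speaker_a, speaker_b])
output = remix([speaker_a, speaker_b], attended)
```

In a real system the separated signals would come from the speech separation stage and the correlation would be computed over a sliding window; the sketch only shows the selection and amplification logic.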
Insight into the Attitudes of Speakers of Urban Meccan Hijazi Arabic towards their Dialect

Directory of Open Access Journals (Sweden)

Sameeha D. Alahmadi

2016-04-01

Full Text Available The current study mainly aims to examine the attitudes of speakers of Urban Meccan Hijazi Arabic (UMHA) towards their dialect, which is spoken in Mecca, Saudi Arabia. It also investigates whether the participants’ age, sex and educational level have any impact on their perception of their dialect. To this end, I designed a 5-point-Likert-scale questionnaire, requiring participants to rate their attitudes towards their dialect. I asked 80 participants, whose first language is UMHA, to fill out the questionnaire. On the basis of the three independent variables, namely, age, sex and educational level, the participants were divided into three groups: old and young speakers, male and female speakers and educated and uneducated speakers. The results reveal that in general, all the groups (young and old, male and female, and educated and uneducated participants) have a sense of responsibility towards their dialect, making their attitudes towards their dialect positive. However, differences exist between the three groups. For instance, old speakers tend to express their pride in their dialect more than young speakers. The same pattern is observed in male and female groups. The results show that females may feel embarrassed to provide answers that may imply that they are not proud of their own dialect, since the majority of women in the Arab world, in general, are under more pressure to conform to the overt norms of the society than males. Therefore, I argue that most Arab women may not have the same freedom to express their opinions and feelings about various issues. Based on the results, the study concludes with some recommendations for further research. Keywords: sociolinguistics, language attitudes, dialectology, social variables, Urban Meccan Hijazi Arabic
Accuracy of MFCC-Based Speaker Recognition in Series 60 Device

Directory of Open Access Journals (Sweden)

Pasi Fränti

2005-10-01

Full Text Available A fixed point implementation of speaker recognition based on MFCC signal processing is considered. We analyze the numerical error of the MFCC and its effect on the recognition accuracy. Techniques to reduce the information loss in a converted fixed point implementation are introduced. We increase the signal processing accuracy by adjusting the ratio of presentation accuracy of the operators and the signal. The signal processing error is found out to be more important to the speaker recognition accuracy than the error in the classification algorithm. The results are verified by applying the alternative technique to speech data. We also discuss the specific programming requirements set up by the Symbian and Series 60.
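The kind of numerical error analysed above can be illustrated with a generic sketch. It is not the authors' method: it simply quantizes the operands of a dot product (the core operation in filterbank and DCT stages of MFCC) to a signed fixed-point format and compares the result with floating point. The helper names and the test values are assumptions.

```python
def to_fixed(x, frac_bits):
    """Quantize a float to a signed fixed-point integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))

def from_fixed(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point values and rescale with a truncating right shift."""
    return (a * b) >> frac_bits

def dot_error(xs, ws, frac_bits):
    """Absolute error of a fixed-point dot product against the float result."""
    exact = sum(x * w for x, w in zip(xs, ws))
    acc = 0
    for x, w in zip(xs, ws):
        acc += fixed_mul(to_fixed(x, frac_bits), to_fixed(w, frac_bits), frac_bits)
    return abs(exact - from_fixed(acc, frac_bits))

# Assumed signal samples and filter weights, all within (-1, 1).
signal = [0.3, -0.7, 0.55, 0.1]
weights = [0.25, 0.5, -0.125, 0.9]
err_q7 = dot_error(signal, weights, 7)    # 7 fractional bits
err_q15 = dot_error(signal, weights, 15)  # 15 fractional bits
```

Comparing `err_q7` with `err_q15` shows how the error shrinks as more fractional bits are allocated, which is the trade-off a fixed-point port has to balance against overflow headroom.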
Lip-Synching Using Speaker-Specific Articulation, Shape and Appearance Models

Directory of Open Access Journals (Sweden)

Gaspard Breton

2009-01-01

Full Text Available We describe here the control, shape and appearance models that are built using an original photogrammetric method to capture characteristics of speaker-specific facial articulation, anatomy, and texture. Two original contributions are put forward here: the trainable trajectory formation model that predicts articulatory trajectories of a talking face from phonetic input and the texture model that computes a texture for each 3D facial shape according to articulation. Using motion capture data from different speakers and module-specific evaluation procedures, we show here that this cloning system restores detailed idiosyncrasies and the global coherence of visible articulation. Results of a subjective evaluation of the global system with competing trajectory formation models are further presented and commented.
Umesh V Waghmare | Speakers | Indian Academy of Sciences

Indian Academy of Sciences (India)

Umesh V Waghmare. Theoretical Sciences Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur P.O., Bangalore 560 064, ... These ideas apply quite well to dynamical structure of a crystal, as described by the dispersion of its phonons or vibrational waves. The speakers group has shown an interesting ...
A general auditory bias for handling speaker variability in speech? Evidence in humans and songbirds

NARCIS (Netherlands)

Kriengwatana, B.; Escudero, P.; Kerkhoven, A.H.; ten Cate, C.

2015-01-01

Different speakers produce the same speech sound differently, yet listeners are still able to reliably identify the speech sound. How listeners can adjust their perception to compensate for speaker differences in speech, and whether these compensatory processes are unique only to humans, is still
Age differences in vocal emotion perception: on the role of speaker age and listener sex.

Science.gov (United States)

Sen, Antarika; Isaacowitz, Derek; Schirmer, Annett

2017-10-24

Older adults have greater difficulty than younger adults perceiving vocal emotions. To better characterise this effect, we explored its relation to age differences in sensory, cognitive and emotional functioning. Additionally, we examined the role of speaker age and listener sex. Participants (N = 163) aged 19-34 years and 60-85 years categorised neutral sentences spoken by ten younger and ten older speakers with a happy, neutral, sad, or angry voice. Acoustic analyses indicated that expressions from younger and older speakers denoted the intended emotion with similar accuracy. As expected, younger participants outperformed older participants and this effect was statistically mediated by an age-related decline in both optimism and working-memory. Additionally, age differences in emotion perception were larger for younger as compared to older speakers and a better perception of younger as compared to older speakers was greater in younger as compared to older participants. Last, a female perception benefit was less pervasive in the older than the younger group. Together, these findings suggest that the role of age for emotion perception is multi-faceted. It is linked to emotional and cognitive change, to processing biases that benefit young and own-age expressions, and to the different aptitudes of women and men.
What makes a charismatic speaker?

DEFF Research Database (Denmark)

Niebuhr, Oliver; Voße, Jana; Brem, Alexander

2016-01-01

The former Apple CEO Steve Jobs was one of the most charismatic speakers of the past decades. However, there is, as yet, no detailed quantitative profile of his way of speaking. We used state-of-the-art computer techniques to acoustically analyze his speech behavior and relate it to reference...... samples. Our paper provides the first-ever acoustic profile of Steve Jobs, based on about 4000 syllables and 12,000 individual speech sounds from his two most outstanding and well-known product presentations: the introductions of the iPhone 4 and the iPad 2. Our results show that Steve Jobs stands out...
Advanced neural network-based computational schemes for robust fault diagnosis

CERN Document Server

Mrugalski, Marcin

2014-01-01

The present book is devoted to problems of adaptation of artificial neural networks to robust fault diagnosis schemes. It presents neural networks-based modelling and estimation techniques used for designing robust fault diagnosis schemes for non-linear dynamic systems. A part of the book focuses on fundamental issues such as architectures of dynamic neural networks, methods for designing of neural networks and fault diagnosis schemes as well as the importance of robustness. The book is of a tutorial value and can be perceived as a good starting point for the new-comers to this field. The book is also devoted to advanced schemes of description of neural model uncertainty. In particular, the methods of computation of neural networks uncertainty with robust parameter estimation are presented. Moreover, a novel approach for system identification with the state-space GMDH neural network is delivered. All the concepts described in this book are illustrated by both simple academic illustrative examples and practica...

Processing ser and estar to locate objects and events: An ERP study with L2 speakers of Spanish.

Science.gov (United States)

Dussias, Paola E; Contemori, Carla; Román, Patricia

2014-01-01

In Spanish locative constructions, a different form of the copula is selected in relation to the semantic properties of the grammatical subject: sentences that locate objects require estar while those that locate events require ser (both translated in English as 'to be'). In an ERP study, we examined whether second language (L2) speakers of Spanish are sensitive to the selectional restrictions that the different types of subjects impose on the choice of the two copulas. Twenty-four native speakers of Spanish and two groups of L2 Spanish speakers (24 beginners and 18 advanced speakers) were recruited to investigate the processing of 'object/event + estar/ser' permutations. Participants provided grammaticality judgments on correct (object + estar; event + ser) and incorrect (object + ser; event + estar) sentences while their brain activity was recorded. In line with previous studies (Leone-Fernández, Molinaro, Carreiras, & Barber, 2012; Sera, Gathje, & Pintado, 1999), the results of the grammaticality judgment for the native speakers showed that participants correctly accepted object + estar and event + ser constructions. In addition, while 'object + ser' constructions were considered grossly ungrammatical, 'event + estar' combinations were perceived as unacceptable to a lesser degree. For these same participants, ERP recording time-locked to the onset of the critical word 'en' showed a larger P600 for the ser predicates when the subject was an object than when it was an event (*La silla es en la cocina vs. La fiesta es en la cocina). This P600 effect is consistent with syntactic repair of the defining predicate when it does not fit with the adequate semantic properties of the subject. For estar predicates (La silla está en la cocina vs. *La fiesta está en la cocina), the findings showed a central-frontal negativity between 500-700 ms. Grammaticality judgment data for the L2 speakers of Spanish showed that beginners were significantly less accurate than
The Effect of Noise on Relationships Between Speech Intelligibility and Self-Reported Communication Measures in Tracheoesophageal Speakers.

Science.gov (United States)

Eadie, Tanya L; Otero, Devon Sawin; Bolt, Susan; Kapsner-Smith, Mara; Sullivan, Jessica R

2016-08-01

The purpose of this study was to examine how sentence intelligibility relates to self-reported communication in tracheoesophageal speakers when speech intelligibility is measured in quiet and noise. Twenty-four tracheoesophageal speakers who were at least 1 year postlaryngectomy provided audio recordings of 5 sentences from the Sentence Intelligibility Test. Speakers also completed self-reported measures of communication: the Voice Handicap Index-10 and the Communicative Participation Item Bank short form. Speech recordings were presented to 2 groups of inexperienced listeners who heard sentences in quiet or noise. Listeners transcribed the sentences to yield speech intelligibility scores. Very weak relationships were found between intelligibility in quiet and measures of voice handicap and communicative participation. Slightly stronger, but still weak and nonsignificant, relationships were observed between measures of intelligibility in noise and both self-reported measures. However, 12 speakers who were more than 65% intelligible in noise showed strong and statistically significant relationships with both self-reported measures (R² = .76-.79). Speech intelligibility in quiet is a weak predictor of self-reported communication measures in tracheoesophageal speakers. Speech intelligibility in noise may be a better metric of self-reported communicative function for speakers who demonstrate higher speech intelligibility in noise.
Native Speakers' Perception of Non-Native English Speech

Science.gov (United States)

Jaber, Maysa; Hussein, Riyad F.

2011-01-01

This study is aimed at investigating the rating and intelligibility of different non-native varieties of English, namely French English, Japanese English and Jordanian English by native English speakers and their attitudes towards these foreign accents. To achieve the goals of this study, the researchers used a web-based questionnaire which…
Limited data speaker identification

Indian Academy of Sciences (India)

This work demonstrates the following: multiple frame size and rate (MFSR) analysis provides improvement in the analysis stage, combination of mel frequency cepstral coefficients (MFCC), its temporal derivatives (Δ, ΔΔ), linear prediction residual (LPR) and linear prediction residual phase (LPRP) features provides ...
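The temporal derivatives (Δ, ΔΔ) mentioned above are usually computed with a regression over neighbouring frames. The sketch below assumes that widely used formula, Δ_t = Σ n (c_{t+n} − c_{t−n}) / (2 Σ n²) with edge frames repeated, and a window half-width of 2; the abstract does not say which variant the authors used, and the function names and toy data are hypothetical.

```python
def delta(frames, width=2):
    """Regression-based temporal derivative of per-frame feature vectors.

    frames is a list of equal-length lists (one cepstral vector per frame).
    Frames beyond the edges are replaced by the first or last frame.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            num = sum(
                n * (frames[min(t + n, last)][k] - frames[max(t - n, 0)][k])
                for n in range(1, width + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out

def stack_with_deltas(frames, width=2):
    """Concatenate static, delta and delta-delta coefficients frame by frame."""
    d1 = delta(frames, width)
    d2 = delta(d1, width)
    return [c + a + b for c, a, b in zip(frames, d1, d2)]

# Toy input: one coefficient per frame, rising by 1 each frame.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
ramp_delta = delta(ramp)
stacked = stack_with_deltas(ramp)
```

For a linear ramp the interior delta values equal the slope, which is a quick sanity check on the regression weights.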
Facial Expression Generation from Speaker's Emotional States in Daily Conversation

Science.gov (United States)

Mori, Hiroki; Ohshima, Koh

A framework for generating facial expressions from emotional states in daily conversation is described. It provides a mapping between emotional states and facial expressions, where the former is represented by vectors with psychologically-defined abstract dimensions, and the latter is coded by the Facial Action Coding System. In order to obtain the mapping, parallel data with rated emotional states and facial expressions were collected for utterances of a female speaker, and a neural network was trained with the data. The effectiveness of the proposed method is verified by a subjective evaluation test. As a result, the Mean Opinion Score with respect to the suitability of the generated facial expressions was 3.86 for the speaker, which was close to that of hand-made facial expressions.
White Native English Speakers Needed: The Rhetorical Construction of Privilege in Online Teacher Recruitment Spaces

Science.gov (United States)

Ruecker, Todd; Ives, Lindsey

2015-01-01

Over the past few decades, scholars have paid increasing attention to the role of native speakerism in the field of TESOL. Several recent studies have exposed instances of native speakerism in TESOL recruitment discourses published through a variety of media, but none have focused specifically on professional websites advertising programs in…
Plug and Play Robust Distributed Control with Ellipsoidal Parametric Uncertainty System

Directory of Open Access Journals (Sweden)

Hong Wang-jian

2016-01-01

Full Text Available We consider a continuous linear time invariant system with ellipsoidal parametric uncertainty structured into subsystems. Since the design of a local controller uses only information on a subsystem and its neighbours, we combine the plug and play idea and robust distributed control to propose one distributed control strategy for linear system with ellipsoidal parametric uncertainty. Firstly for linear system with ellipsoidal parametric uncertainty, a necessary and sufficient condition for robust state feedback control is proposed by means of linear matrix inequality. If this necessary and sufficient condition is satisfied, this robust state feedback gain matrix can be easily derived to guarantee robust stability and prescribed closed loop performance. Secondly the plug and play idea is introduced in the design process. Finally by one example of aircraft flutter model parameter identification, the efficiency of the proposed control strategy can be easily realized.
Extending Situated Language Comprehension (Accounts) with Speaker and Comprehender Characteristics: Toward Socially Situated Interpretation.

Science.gov (United States)

Münster, Katja; Knoeferle, Pia

2017-01-01

More and more findings suggest a tight temporal coupling between (non-linguistic) socially interpreted context and language processing. Still, real-time language processing accounts remain largely elusive with respect to the influence of biological (e.g., age) and experiential (e.g., world and moral knowledge) comprehender characteristics and the influence of the 'socially interpreted' context, as for instance provided by the speaker. This context could include actions, facial expressions, a speaker's voice or gaze, and gestures among others. We review findings from social psychology, sociolinguistics and psycholinguistics to highlight the relevance of (the interplay between) the socially interpreted context and comprehender characteristics for language processing. The review informs the extension of an extant real-time processing account (already featuring a coordinated interplay between language comprehension and the non-linguistic visual context) with a variable ('ProCom') that captures characteristics of the language user and with a first approximation of the comprehender's speaker representation. Extending the CIA to the sCIA (social Coordinated Interplay Account) is the first step toward a real-time language comprehension account which might eventually accommodate the socially situated communicative interplay between comprehenders and speakers.
Omission of definite and indefinite articles in the spontaneous speech of agrammatic speakers with Broca's aphasia

NARCIS (Netherlands)

Havik, E.; Bastiaanse, Y.R.M.

2004-01-01

Background: Cross-linguistic investigation of agrammatic speech in speakers of different languages allows us to test theoretical accounts of the nature of agrammatism. A significant feature of the speech of many agrammatic speakers is a problem with article production. Mansson and Ahlsen (2001)
Left hemisphere lateralization for lexical and acoustic pitch processing in Cantonese speakers as revealed by mismatch negativity.

Science.gov (United States)

Gu, Feng; Zhang, Caicai; Hu, Axu; Zhao, Guoping

2013-12-01

For nontonal language speakers, speech processing is lateralized to the left hemisphere and musical processing is lateralized to the right hemisphere (i.e., function-dependent brain asymmetry). On the other hand, acoustic temporal processing is lateralized to the left hemisphere and spectral/pitch processing is lateralized to the right hemisphere (i.e., acoustic-dependent brain asymmetry). In this study, we examine whether the hemispheric lateralization of lexical pitch and acoustic pitch processing in tonal language speakers is consistent with the patterns of function- and acoustic-dependent brain asymmetry in nontonal language speakers. Pitch contrast in both speech stimuli (syllable /ji/ in Experiment 1) and nonspeech stimuli (harmonic tone in Experiment 1; pure tone in Experiment 2) was presented to native Cantonese speakers in passive oddball paradigms. We found that the mismatch negativity (MMN) elicited by lexical pitch contrast was lateralized to the left hemisphere, which is consistent with the pattern of function-dependent brain asymmetry (i.e., left hemisphere lateralization for speech processing) in nontonal language speakers. However, the MMN elicited by acoustic pitch contrast was also left hemisphere lateralized (harmonic tone in Experiment 1) or showed a tendency for left hemisphere lateralization (pure tone in Experiment 2), which is inconsistent with the pattern of acoustic-dependent brain asymmetry (i.e., right hemisphere lateralization for acoustic pitch processing) in nontonal language speakers. The consistent pattern of function-dependent brain asymmetry and the inconsistent pattern of acoustic-dependent brain asymmetry between tonal and nontonal language speakers can be explained by the hypothesis that the acoustic-dependent brain asymmetry is the consequence of a carryover effect from function-dependent brain asymmetry. Potential evolutionary implication of this hypothesis is discussed. © 2013.
Credibility of native and non-native speakers of English revisited: Do non-native listeners feel the same?

OpenAIRE

Hanzlíková, Dagmar; Skarnitzl, Radek

2017-01-01

This study reports on research stimulated by Lev-Ari and Keysar (2010) who showed that native listeners find statements delivered by foreign-accented speakers to be less true than those read by native speakers. Our objective was to replicate the study with non-native listeners to see whether this effect is also relevant in international communication contexts. The same set of statements from the original study was recorded by 6 native and 6 nonnative speakers of English. 121 non-native listen...
Disrupted behaviour in grammatical morphology in French speakers with autism spectrum disorders.

Science.gov (United States)

Le Normand, Marie-Thérèse; Blanc, Romuald; Caldani, Simona; Bonnet-Brilhault, Frédérique

2018-01-18

Mixed and inconsistent findings have been reported across languages concerning grammatical morphology in speakers with Autism Spectrum Disorders (ASD). Some researchers argue for a selective sparing of grammar whereas others claim to have identified grammatical deficits. The present study aimed to investigate this issue in 26 participants with ASD speaking European French who were matched on age, gender and SES to 26 participants with typical development (TD). The groups were compared regarding their productivity and accuracy of syntactic and agreement categories using the French MOR part-of-speech tagger available from the CHILDES. The groups significantly differed in productivity with respect to nouns, adjectives, determiners, prepositions and gender markers. Error analysis revealed that ASD speakers exhibited a disrupted behaviour in grammatical morphology. They made gender, tense and preposition errors and they omitted determiners and pronouns in nominal and verbal contexts. ASD speakers may have a reduced sensitivity to perceiving and processing the distributional structure of syntactic categories when producing grammatical morphemes and agreement categories. The theoretical and cross-linguistic implications of these findings are discussed.
Identification and robust water level control of horizontal steam generators using quantitative feedback theory

International Nuclear Information System (INIS)

Safarzadeh, O.; Khaki-Sedigh, A.; Shirani, A.S.

2011-01-01

Highlights: → A robust water level controller for steam generators (SGs) is designed based on the Quantitative Feedback Theory. → To design the controller, fairly accurate linear models are identified for the SG. → The designed controller is verified using a developed novel global locally linear neuro-fuzzy model of the SG. → Both of the linear and nonlinear models are based on the SG mathematical thermal-hydraulic model developed using the simulation computer code. → The proposed method is easy to apply and guarantees desired closed loop performance. - Abstract: In this paper, a robust water level control system for the horizontal steam generator (SG) using the quantitative feedback theory (QFT) method is presented. To design a robust QFT controller for the nonlinear uncertain SG, control oriented linear models are identified. Then, the nonlinear system is modeled as an uncertain linear time invariant (LTI) system. The robust designed controller is applied to the nonlinear plant model. This nonlinear model is based on a locally linear neuro-fuzzy (LLNF) model. This model is trained using the locally linear model tree (LOLIMOT) algorithm. Finally, simulation results are employed to show the effectiveness of the designed QFT level controller. It is shown that it will ensure the entire designer's water level closed loop specifications.
Identification and robust water level control of horizontal steam generators using quantitative feedback theory

Energy Technology Data Exchange (ETDEWEB)

Safarzadeh, O., E-mail: O_Safarzadeh@sbu.ac.ir [Shahid Beheshti University, P.O. Box: 19839-63113, Tehran (Iran, Islamic Republic of); Khaki-Sedigh, A. [K. N. Toosi University of Technology, Tehran (Iran, Islamic Republic of); Shirani, A.S. [Shahid Beheshti University, P.O. Box: 19839-63113, Tehran (Iran, Islamic Republic of)

2011-09-15

Highlights: → A robust water level controller for steam generators (SGs) is designed based on the Quantitative Feedback Theory. → To design the controller, fairly accurate linear models are identified for the SG. → The designed controller is verified using a developed novel global locally linear neuro-fuzzy model of the SG. → Both of the linear and nonlinear models are based on the SG mathematical thermal-hydraulic model developed using the simulation computer code. → The proposed method is easy to apply and guarantees desired closed loop performance. - Abstract: In this paper, a robust water level control system for the horizontal steam generator (SG) using the quantitative feedback theory (QFT) method is presented. To design a robust QFT controller for the nonlinear uncertain SG, control oriented linear models are identified. Then, the nonlinear system is modeled as an uncertain linear time invariant (LTI) system. The robust designed controller is applied to the nonlinear plant model. This nonlinear model is based on a locally linear neuro-fuzzy (LLNF) model. This model is trained using the locally linear model tree (LOLIMOT) algorithm. Finally, simulation results are employed to show the effectiveness of the designed QFT level controller. It is shown that it will ensure the entire designer's water level closed loop specifications.
Pitch perception and production in congenital amusia: Evidence from Cantonese speakers.

Science.gov (United States)

Liu, Fang; Chan, Alice H D; Ciocca, Valter; Roquet, Catherine; Peretz, Isabelle; Wong, Patrick C M

2016-07-01

This study investigated pitch perception and production in speech and music in individuals with congenital amusia (a disorder of musical pitch processing) who are native speakers of Cantonese, a tone language with a highly complex tonal system. Sixteen Cantonese-speaking congenital amusics and 16 controls performed a set of lexical tone perception, production, singing, and psychophysical pitch threshold tasks. Their tone production accuracy and singing proficiency were subsequently judged by independent listeners, and subjected to acoustic analyses. Relative to controls, amusics showed impaired discrimination of lexical tones in both speech and non-speech conditions. They also received lower ratings for singing proficiency, producing larger pitch interval deviations and making more pitch interval errors compared to controls. Demonstrating higher pitch direction identification thresholds than controls for both speech syllables and piano tones, amusics nevertheless produced native lexical tones with comparable pitch trajectories and intelligibility as controls. Significant correlations were found between pitch threshold and lexical tone perception, music perception and production, but not between lexical tone perception and production for amusics. These findings provide further evidence that congenital amusia is a domain-general language-independent pitch-processing deficit that is associated with severely impaired music perception and production, mildly impaired speech perception, and largely intact speech production.
Pragmatic Instruction May Not Be Necessary among Heritage Speakers of Spanish: A Study on Requests

Science.gov (United States)

Barros García, María J.; Bachelor, Jeremy W.

2018-01-01

This paper studies the pragmatic competence of U.S. heritage speakers of Spanish in an attempt to determine (a) the degree of pragmatic transfer from English to Spanish experienced by heritage speakers when producing different types of requests in Spanish; and (b) how to best teach pragmatics to students of Spanish as a Heritage Language (SHL).…
The effects of L2 proficiency level on the processing of wh-questions among Dutch second language speakers of English

NARCIS (Netherlands)

Jackson, C.N.; Hell, J.G. van

2011-01-01

Using a self-paced reading task, the present study explores how Dutch-English L2 speakers parse English wh-subject-extractions and wh-object-extractions. Results suggest that English native speakers and highly-proficient Dutch–English L2 speakers do not always exhibit measurable signs of on-line
Within the School and the Community--A Speaker's Bureau.

Science.gov (United States)

McClintock, Joy H.

Student interest prompted the formation of a Speaker's Bureau in Seminole Senior High School, Seminole, Florida. First, students compiled a list of community contacts, including civic clubs, churches, retirement villages, newspaper offices, and the County School Administration media center. A letter of introduction was composed and speaking…
Robust and efficient walking with spring-like legs

Energy Technology Data Exchange (ETDEWEB)

Rummel, J; Blum, Y; Seyfarth, A, E-mail: juergen.rummel@uni-jena.d, E-mail: andre.seyfarth@uni-jena.d [Lauflabor Locomotion Laboratory, University of Jena, Dornburger Strasse 23, 07743 Jena (Germany)

2010-12-15

The development of bipedal walking robots is inspired by human walking. A way of implementing walking could be performed by mimicking human leg dynamics. A fundamental model, representing human leg dynamics during walking and running, is the bipedal spring-mass model which is the basis for this paper. The aim of this study is the identification of leg parameters leading to a compromise between robustness and energy efficiency in walking. It is found that, compared to asymmetric walking, symmetric walking with flatter angles of attack reveals such a compromise. With increasing leg stiffness, energy efficiency increases continuously. However, robustness is the maximum at moderate leg stiffness and decreases slightly with increasing stiffness. Hence, an adjustable leg compliance would be preferred, which is adaptable to the environment. If the ground is even, a high leg stiffness leads to energy efficient walking. However, if external perturbations are expected, e.g. when the robot walks on uneven terrain, the leg should be softer and the angle of attack flatter. In the case of underactuated robots with constant physical springs, the leg stiffness should be larger than k-tilde = 14 in order to use the most robust gait. Soft legs, however, lack in both robustness and efficiency.
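The threshold k-tilde = 14 quoted above is a dimensionless leg stiffness. The abstract does not define it; the sketch below assumes the normalisation commonly used for the spring-mass model, k-tilde = k · L0 / (m · g), where k is the spring stiffness, L0 the leg rest length, m the body mass and g gravitational acceleration. The function names and the 80 kg, 1 m example values are hypothetical.

```python
def dimensionless_stiffness(k, leg_length, mass, g=9.81):
    """Assumed normalisation: k_tilde = k * L0 / (m * g)."""
    return k * leg_length / (mass * g)

def physical_stiffness(k_tilde, leg_length, mass, g=9.81):
    """Invert the assumed normalisation to get stiffness in N/m."""
    return k_tilde * mass * g / leg_length

# Hypothetical robot: 80 kg body mass, 1 m leg rest length.
k_min = physical_stiffness(14, leg_length=1.0, mass=80.0)
k_tilde_check = dimensionless_stiffness(k_min, leg_length=1.0, mass=80.0)
```

Under that assumption, a design target stated in dimensionless form can be converted to a spring constant for a specific robot, and back again for comparison with the paper's values.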
Non-English speakers attend gastroenterology clinic appointments at higher rates than English speakers in a vulnerable patient population

Science.gov (United States)

Sewell, Justin L.; Kushel, Margot B.; Inadomi, John M.; Yee, Hal F.

2009-01-01

Goals: We sought to identify factors associated with gastroenterology clinic attendance in an urban safety net healthcare system. Background: Missed clinic appointments reduce the efficiency and availability of healthcare, but subspecialty clinic attendance among patients with established healthcare access has not been studied. Study: We performed an observational study using secondary data from administrative sources to study patients referred to, and scheduled for an appointment in, the adult gastroenterology clinic serving the safety net healthcare system of San Francisco, California. Our dependent variable was whether subjects attended or missed a scheduled appointment. Analysis included multivariable logistic regression and classification tree analysis. 1,833 patients were referred and scheduled for an appointment between 05/2005 and 08/2006. Prisoners were excluded. All patients had a primary care provider. Results: 683 patients (37.3%) missed their appointment; 1,150 (62.7%) attended. Language was highly associated with attendance in the logistic regression; non-English speakers were less likely than English speakers to miss an appointment (adjusted odds ratio 0.42 [0.28,0.63] for Spanish, 0.56 [0.38,0.82] for Asian language, p … gastroenterology clinic appointment, not speaking English was most strongly associated with higher attendance rates. Patient-related factors associated with not speaking English likely influence subspecialty clinic attendance rates, and these factors may differ from those affecting general healthcare access. PMID:19169147
The impact of musical training and tone language experience on talker identification.

Science.gov (United States)

Xie, Xin; Myers, Emily

2015-01-01

Listeners can use pitch changes in speech to identify talkers. Individuals exhibit large variability in sensitivity to pitch and in accuracy perceiving talker identity. In particular, people who have musical training or long-term tone language use are found to have enhanced pitch perception. In the present study, the influence of pitch experience on talker identification was investigated as listeners identified talkers in native language as well as non-native languages. Experiment 1 was designed to explore the influence of pitch experience on talker identification in two groups of individuals with potential advantages for pitch processing: musicians and tone language speakers. Experiment 2 further investigated individual differences in pitch processing and the contribution to talker identification by testing a mediation model. Cumulatively, the results suggested that (a) musical training confers an advantage for talker identification, supporting a shared resources hypothesis regarding music and language and (b) linguistic use of lexical tones also increases accuracy in hearing talker identity. Importantly, these two types of hearing experience enhance talker identification by sharpening pitch perception skills in a domain-general manner.
Forensic Automatic Speaker Recognition Based on Likelihood Ratio Using Acoustic-phonetic Features Measured Automatically

Directory of Open Access Journals (Sweden)

Huapeng Wang

2015-01-01

Full Text Available Forensic speaker recognition is experiencing a remarkable paradigm shift in terms of the evaluation framework and presentation of voice evidence. This paper proposes a new method of forensic automatic speaker recognition using the likelihood ratio framework to quantify the strength of voice evidence. The proposed method uses a reference database to calculate the within- and between-speaker variability. Some acoustic-phonetic features are extracted automatically using the software VoiceSauce. The effectiveness of the approach was tested using two Mandarin databases: a mobile telephone database and a landline database. The experimental results indicate that these acoustic-phonetic features do have some discriminating potential and are worth trying in discrimination. The automatic acoustic-phonetic features have acceptable discriminative performance and can provide more reliable results in evidence analysis when fused with other kinds of voice features.
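As a rough illustration of the likelihood ratio framework mentioned in this abstract, the sketch below scores a single acoustic-phonetic feature value against a suspect model and a reference population. The Gaussian model, the function names and every numeric value are assumptions made for illustration only; they are not the paper's method or data.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def likelihood_ratio(evidence, suspect_mean, within_std,
                     population_mean, between_std):
    """LR = p(evidence | same speaker) / p(evidence | different speaker).

    Numerator: how typical the evidence is of the suspect, using the
    within-speaker spread.
    Denominator: how typical it is of the reference population, whose total
    spread combines between-speaker and within-speaker variability.
    """
    p_same = gaussian_pdf(evidence, suspect_mean, within_std)
    population_std = math.sqrt(between_std ** 2 + within_std ** 2)
    p_diff = gaussian_pdf(evidence, population_mean, population_std)
    return p_same / p_diff


# Hypothetical feature values (for example, a pitch-like measurement in Hz).
lr = likelihood_ratio(evidence=120.0, suspect_mean=118.0, within_std=5.0,
                      population_mean=140.0, between_std=20.0)
print(f"LR = {lr:.2f}")
```

An LR above 1 supports the same-speaker hypothesis and an LR below 1 supports the different-speaker hypothesis; a real system would estimate the distributions from a reference database rather than hard-coding them.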
Enhanced echolocation via robust statistics and super-resolution of sonar images

Science.gov (United States)

Kim, Kio

Echolocation is a process in which an animal uses acoustic signals to exchange information with environments. In a recent study, Neretti et al. have shown that the use of robust statistics can significantly improve the resiliency of echolocation against noise and enhance its accuracy by suppressing the development of sidelobes in the processing of an echo signal. In this research, the use of robust statistics is extended to problems in underwater explorations. The dissertation consists of two parts. Part I describes how robust statistics can enhance the identification of target objects, which in this case are cylindrical containers filled with four different liquids. Particularly, this work employs a variation of an existing robust estimator called an L-estimator, which was first suggested by Koenker and Bassett. As pointed out by Au et al.; a 'highlight interval' is an important feature, and it is closely related with many other important features that are known to be crucial for dolphin echolocation. A varied L-estimator described in this text is used to enhance the detection of highlight intervals, which eventually leads to a successful classification of echo signals. Part II extends the problem into 2 dimensions. Thanks to the advances in material and computer technology, various sonar imaging modalities are available on the market. By registering acoustic images from such video sequences, one can extract more information on the region of interest. Computer vision and image processing allowed application of robust statistics to the acoustic images produced by forward looking sonar systems, such as Dual-frequency Identification Sonar and ProViewer. The first use of robust statistics for sonar image enhancement in this text is in image registration. Random Sampling Consensus (RANSAC) is widely used for image registration. The registration algorithm using RANSAC is optimized for sonar image registration, and the performance is studied. The second use of robust
Individual differences in selective attention predict speech identification at a cocktail party.

Science.gov (United States)

Oberfeld, Daniel; Klöckner-Nowotny, Felicitas

2016-08-31

Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance are individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, the performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a similar proportion of variance as the binaural sensitivity for the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise.
A virtual speaker in noisy classroom conditions: supporting or disrupting children's listening comprehension?

Science.gov (United States)

Nirme, Jens; Haake, Magnus; Lyberg Åhlander, Viveka; Brännström, Jonas; Sahlén, Birgitta

2018-04-05

Seeing a speaker's face facilitates speech recognition, particularly under noisy conditions. Evidence for how it might affect comprehension of the content of the speech is more sparse. We investigated how children's listening comprehension is affected by multi-talker babble noise, with or without presentation of a digitally animated virtual speaker, and whether successful comprehension is related to performance on a test of executive functioning. We performed a mixed-design experiment with 55 (34 female) participants (8- to 9-year-olds), recruited from Swedish elementary schools. The children were presented with four different narratives, each in one of four conditions: audio-only presentation in a quiet setting, audio-only presentation in noisy setting, audio-visual presentation in a quiet setting, and audio-visual presentation in a noisy setting. After each narrative, the children answered questions on the content and rated their perceived listening effort. Finally, they performed a test of executive functioning. We found significantly fewer correct answers to explicit content questions after listening in noise. This negative effect was only mitigated to a marginally significant degree by audio-visual presentation. Strong executive function only predicted more correct answers in quiet settings. Altogether, our results are inconclusive regarding how seeing a virtual speaker affects listening comprehension. We discuss how methodological adjustments, including modifications to our virtual speaker, can be used to discriminate between possible explanations to our results and contribute to understanding the listening conditions children face in a typical classroom.
Gesturing by Speakers with Aphasia: How Does It Compare?

Science.gov (United States)

Mol, Lisette; Krahmer, Emiel; van de Sandt-Koenderman, Mieke

2013-01-01

Purpose: To study the independence of gesture and verbal language production. The authors assessed whether gesture can be semantically compensatory in cases of verbal language impairment and whether speakers with aphasia and control participants use similar depiction techniques in gesture. Method: The informativeness of gesture was assessed in 3…
A comparison of three speaker-intrinsic vowel formant frequency normalization algorithms for sociophonetics

DEFF Research Database (Denmark)

Fabricius, Anne; Watt, Dominic; Johnson, Daniel Ezra

2009-01-01

This paper evaluates a speaker-intrinsic vowel formant frequency normalization algorithm initially proposed in Watt & Fabricius (2002). We compare how well this routine, known as the S-centroid procedure, performs as a sociophonetic research tool in three ways: reducing variance in area ratios...... from RP and Aberdeen English (northeast Scotland). We conclude that, for the data examined here, the S-centroid W&F procedure performs at least as well as the two most recognized speaker-intrinsic, vowel-extrinsic, formant-intrinsic normalization methods, Lobanov's (1971) z-score procedure and Nearey...
Robustness Metrics: Consolidating the multiple approaches to quantify Robustness

DEFF Research Database (Denmark)

Göhler, Simon Moritz; Eifler, Tobias; Howard, Thomas J.

2016-01-01

robustness metrics; 3) Functional expectancy and dispersion robustness metrics; and 4) Probability of conformance robustness metrics. The goal was to give a comprehensive overview of robustness metrics and guidance to scholars and practitioners to understand the different types of robustness metrics...
Testing Template and Testing Concept of Operations for Speaker Authentication Technology

National Research Council Canada - National Science Library

Sipko, Marek M

2006-01-01

This thesis documents the findings of developing a generic testing template and supporting concept of operations for speaker verification technology as part of the Iraqi Enrollment via Voice Authentication Project (IEVAP...
Accent, Intelligibility, and the Role of the Listener: Perceptions of English-Accented German by Native German Speakers

Science.gov (United States)

Hayes-Harb, Rachel; Watzinger-Tharp, Johanna

2012-01-01

We explore the relationship between accentedness and intelligibility, and investigate how listeners' beliefs about nonnative speech interact with their accentedness and intelligibility judgments. Native German speakers and native English learners of German produced German sentences, which were presented to 12 native German speakers in accentedness…
Robust Adaptive Speed Control of Induction Motor Drives

DEFF Research Database (Denmark)

Bidstrup, N.

This thesis concerns speed control of current vector controlled induction motor drives (CVC drives). The CVC drive is an existing prototype drive developed by Danfoss A/S, Transmission Division. Practical tests have revealed that the open loop dynamical properties of the CVC drive are highly......, (LS) identification and generalized predictive control (GPC) has been implemented and tested on the CVC drive. Although GPC is a robust control method, it was not possible to maintain specified controller performance in the entire operating range. This was the main reason for investigating truly...... adaptive speed control of the CVC drive. A direct truly adaptive speed controller has been implemented. The adaptive controller is a Moving Average Self-Tuning Regulator, which is abbreviated MASTR throughout the thesis. Two practical implementations of this controller were proposed. They were denoted MASTR...... and measurement noise in general, were the major reasons for the drifting parameters. Two approaches were proposed to robustify MASTR2 against the output noise. The first approach consists of filtering the output. Output filtering had a significant effect in simulations, but the robustness against the output noise...
Beyond the language given: the neural correlates of inferring speaker meaning.

Science.gov (United States)

Bašnáková, Jana; Weber, Kirsten; Petersson, Karl Magnus; van Berkum, Jos; Hagoort, Peter

2014-10-01

Even though language allows us to say exactly what we mean, we often use language to say things indirectly, in a way that depends on the specific communicative context. For example, we can use an apparently straightforward sentence like "It is hard to give a good presentation" to convey deeper meanings, like "Your talk was a mess!" One of the big puzzles in language science is how listeners work out what speakers really mean, which is a skill absolutely central to communication. However, most neuroimaging studies of language comprehension have focused on the arguably much simpler, context-independent process of understanding direct utterances. To examine the neural systems involved in getting at contextually constrained indirect meaning, we used functional magnetic resonance imaging as people listened to indirect replies in spoken dialog. Relative to direct control utterances, indirect replies engaged dorsomedial prefrontal cortex, right temporo-parietal junction and insula, as well as bilateral inferior frontal gyrus and right medial temporal gyrus. This suggests that listeners take the speaker's perspective on both cognitive (theory of mind) and affective (empathy-like) levels. In line with classic pragmatic theories, our results also indicate that currently popular "simulationist" accounts of language comprehension fail to explain how listeners understand the speaker's intended message. © The Author 2013. Published by Oxford University Press.
Revisiting the role of language in spatial cognition: Categorical perception of spatial relations in English and Korean speakers.

Science.gov (United States)

Holmes, Kevin J; Moty, Kelsey; Regier, Terry

2017-12-01

The spatial relation of support has been regarded as universally privileged in nonlinguistic cognition and immune to the influence of language. English, but not Korean, obligatorily distinguishes support from nonsupport via basic spatial terms. Despite this linguistic difference, previous research suggests that English and Korean speakers show comparable nonlinguistic sensitivity to the support/nonsupport distinction. Here, using a paradigm previously found to elicit cross-language differences in color discrimination, we provide evidence for a difference in sensitivity to support/nonsupport between native English speakers and native Korean speakers who were late English learners and tested in a context that privileged Korean. Whereas the former group showed categorical perception (CP) when discriminating spatial scenes capturing the support/nonsupport distinction, the latter did not. An additional group of native Korean speakers (relatively early English learners tested in an English-salient context) patterned with the native English speakers in showing CP for support/nonsupport. These findings suggest that obligatory marking of support/nonsupport in one's native language can affect nonlinguistic sensitivity to this distinction, contra earlier findings, but that such sensitivity may also depend on aspects of language background and the immediate linguistic context.
Age of acquisition and naming performance in Frisian-Dutch bilingual speakers with dementia.

Science.gov (United States)

Veenstra, Wencke S; Huisman, Mark; Miller, Nick

2014-01-01

Age of acquisition (AoA) of words is a recognised variable affecting language processing in speakers with and without language disorders. For bi- and multilingual speakers their languages can be differentially affected in neurological illness. Study of language loss in bilingual speakers with dementia has been relatively neglected. We investigated whether AoA of words was associated with level of naming impairment in bilingual speakers with probable Alzheimer's dementia within and across their languages. Twenty-six Frisian-Dutch bilinguals with mild to moderate dementia named 90 pictures in each language, employing items with rated AoA and other word variable measures matched across languages. Quantitative (totals correct) and qualitative (error types and (in)appropriate switching) aspects were measured. Impaired retrieval occurred in Frisian (Language 1) and Dutch (Language 2), with a significant effect of AoA on naming in both languages. Earlier acquired words were better preserved and retrieved. Performance was identical across languages, but better in Dutch when controlling for covariates. However, participants demonstrated more inappropriate code switching within the Frisian test setting. On qualitative analysis, no differences in overall error distribution were found between languages for early or late acquired words. There existed a significantly higher percentage of semantically than visually-related errors. These findings have implications for understanding problems in lexical retrieval among bilingual individuals with dementia and its relation to decline in other cognitive functions which may play a role in inappropriate code switching. We discuss the findings in the light of the close relationship between Frisian and Dutch and the pattern of usage across the life-span.
Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech

Science.gov (United States)

Cao, Houwei; Verma, Ragini; Nenkova, Ani

2015-01-01

We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotion and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion.
Hardware-efficient robust biometric identification from 0.58 second template and 12 features of limb (Lead I) ECG signal using logistic regression classifier.

Science.gov (United States)

Sahadat, Md Nazmus; Jacobs, Eddie L; Morshed, Bashir I

2014-01-01

The electrocardiogram (ECG), widely known as a cardiac diagnostic signal, has recently been proposed for biometric identification of individuals; however reliability and reproducibility are of research interest. In this paper, we propose a template matching technique with 12 features using logistic regression classifier that achieved high reliability and identification accuracy. Non-invasive ECG signals were captured using our custom-built ambulatory EEG/ECG embedded device (NeuroMonitor). ECG data were collected from healthy subjects (10), between 25-35 years, for 10 seconds per trial. The number of trials from each subject was 10. From each trial, only 0.58 seconds of Lead I ECG data were used as template. Hardware-efficient fiducial point detection technique was implemented for feature extraction. To obtain repeated random sub-sampling validation, data were randomly separated into training and testing sets at a ratio of 80:20. Test data were used to find the classification accuracy. ECG template data with 12 extracted features provided the best performance in terms of accuracy (up to 100%) and processing complexity (computation time of 1.2ms). This work shows that a single limb (Lead I) ECG can robustly identify an individual quickly and reliably with minimal contact and data processing using the proposed algorithm.
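The repeated random sub-sampling validation mentioned in this abstract can be sketched as follows. Only the 80:20 ratio and the 10 subjects with 10 trials each come from the abstract; the function name, the number of repeats and the fixed seed are assumptions, and the classifier itself is left out.

```python
import random


def repeated_random_subsampling(samples, n_repeats=5, train_fraction=0.8, seed=0):
    """Return a list of (train, test) splits, each drawn by a fresh shuffle."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_train = int(round(train_fraction * len(samples)))
    splits = []
    for _ in range(n_repeats):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


# 10 subjects x 10 trials, each trial identified by (subject, trial index).
trials = [(subject, trial) for subject in range(10) for trial in range(10)]
splits = repeated_random_subsampling(trials)

for train, test in splits:
    # A classifier would be fitted on `train` and scored on `test` here;
    # the reported accuracy would then be averaged over the repeats.
    print(len(train), len(test))
```

Unlike k-fold cross-validation, the test sets of different repeats may overlap, so some trials can be tested several times and others never.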
Individual differences in selective attention predict speech identification at a cocktail party

Science.gov (United States)

Oberfeld, Daniel; Klöckner-Nowotny, Felicitas

2016-01-01

Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance are individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, the performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a similar proportion of variance as the binaural sensitivity for the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise. DOI: http://dx.doi.org/10.7554/eLife.16747.001 PMID:27580272
Why reference to the past is difficult for agrammatic speakers

NARCIS (Netherlands)

Bastiaanse, Roelien

Many studies have shown that verb inflections are difficult to produce for agrammatic aphasic speakers: they are frequently omitted and substituted. The present article gives an overview of our search to understanding why this is the case. The hypothesis is that grammatical morphology referring to

Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data

Science.gov (United States)

2017-08-20

M speakers. We seek a probabilistic solution to domain adaptation, and so we encode knowledge of the out-of-domain data in prior distributions...the VB solution from (16)-(21) becomes: $\mu = \alpha\bar{y} + (1-\alpha)\mu^{\text{out}}$ (24), $\Sigma_a = \alpha\left(\frac{1}{N_T}\sum_{n=1}^{N_T}\langle y_n y_n^T\rangle - \bar{y}\bar{y}^T\right) + (1-\alpha)\Sigma_a^{\text{out}} + \alpha(1-\alpha)(\bar{y} - \mu^{\text{out}}$ (25)...non-English languages and from unseen channels. An inadequate in-domain set was provided, which consisted of 2272 samples from 1164 speakers, and
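Equation (24) in the excerpt above interpolates between the in-domain sample mean and the out-of-domain mean with a weight α. A minimal sketch of that single equation is given below; the function name, the argument names and the example vectors are hypothetical, and the covariance update (25) is not reproduced because it is truncated in the excerpt.

```python
def interpolate_mean(in_domain_mean, out_domain_mean, alpha):
    """Compute mu = alpha * y_bar + (1 - alpha) * mu_out, element by element."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(in_domain_mean) != len(out_domain_mean):
        raise ValueError("mean vectors must have the same length")
    return [alpha * y + (1.0 - alpha) * m
            for y, m in zip(in_domain_mean, out_domain_mean)]


# Hypothetical two-dimensional means, for illustration only.
y_bar = [1.0, 2.0]    # in-domain sample mean
mu_out = [3.0, 6.0]   # out-of-domain mean

adapted = interpolate_mean(y_bar, mu_out, alpha=0.5)
print(adapted)
```

With α = 1 the adapted mean relies entirely on the in-domain data, and with α = 0 it falls back to the out-of-domain mean, which is the behaviour one would want when in-domain data are scarce.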
Infants' Understanding of False Labeling Events: The Referential Roles of Words and the Speakers Who Use Them.

Science.gov (United States)

Koenig, Melissa A.; Echols, Catharine H.

2003-01-01

Four studies examined whether 16-month-olds' responses to true/false utterances interacted with their knowledge of human agents. Findings suggested that infants are developing a critical conception of human speakers as truthful communicators and that infants understand that human speakers may provide uniquely useful information when a word fails…
Robust pattern decoding in shape-coded structured light

Science.gov (United States)

Tang, Suming; Zhang, Xu; Song, Zhan; Song, Lifang; Zeng, Hai

2017-09-01

Decoding is a challenging and complex problem in a coded structured light system. In this paper, a robust pattern decoding method is proposed for shape-coded structured light in which the pattern is designed as a grid with embedded geometrical shapes. In our decoding method, advancements are made in three steps. First, a multi-template feature detection algorithm is introduced to detect the feature points, which are the intersections of each pair of orthogonal grid lines. Second, pattern element identification is modelled as a supervised classification problem and a deep neural network is applied for the accurate classification of pattern elements. Before that, a training dataset is established, which contains a large number of pattern elements with various blurring and distortions. Third, an error correction mechanism based on the epipolar constraint, coplanarity constraint and topological constraint is presented to reduce false matches. In the experiments, several complex objects including a human hand are chosen to test the accuracy and robustness of the proposed method. The experimental results show that our decoding method not only has high decoding accuracy, but is also strongly robust to surface color and complex textures.
Robust facial landmark detection based on initializing multiple poses

Directory of Open Access Journals (Sweden)

Xin Chai

2016-10-01

Full Text Available For robot systems, robust facial landmark detection is the first and critical step for face-based human identification and facial expression recognition. In recent years, the cascaded-regression-based method has achieved excellent performance in facial landmark detection. Nevertheless, it still has certain weakness, such as high sensitivity to the initialization. To address this problem, regression based on multiple initializations is established in a unified model; face shapes are then estimated independently according to these initializations. With a ranking strategy, the best estimate is selected as the final output. Moreover, a face shape model based on restricted Boltzmann machines is built as a constraint to improve the robustness of ranking. Experiments on three challenging datasets demonstrate the effectiveness of the proposed facial landmark detection method against state-of-the-art methods.
Teaching the Native English Speaker How to Teach English

Science.gov (United States)

Odhuu, Kelli

2014-01-01

This article speaks to teachers who have been paired with native speakers (NSs) who have never taught before, and the feelings of frustration, discouragement, and nervousness on the teacher's behalf that can occur as a result. In order to effectively tackle this situation, teachers need to work together with the NSs. Teachers in this scenario…
On-line signal trend identification

International Nuclear Information System (INIS)

Tambouratzis, T.; Antonopoulos-Domis, M.

2004-01-01

An artificial neural network, based on the self-organizing map, is proposed for on-line signal trend identification. Trends are categorized at each incoming signal as steady-state, increasing and decreasing, while they are further classified according to characteristics such as signal shape and rate of change. Tests with model-generated signals illustrate the ability of the self-organizing map to accurately and reliably perform on-line trend identification in terms of both detection and classification. The proposed methodology has been found robust to the presence of white noise.
Identification of selective inhibitors of RET and comparison with current clinical candidates through development and validation of a robust screening cascade [version 1; referees: 2 approved]

Directory of Open Access Journals (Sweden)

Amanda J. Watson

2016-05-01

Full Text Available RET (REarranged during Transfection) is a receptor tyrosine kinase, which plays pivotal roles in regulating cell survival, differentiation, proliferation, migration and chemotaxis. Activation of RET is a mechanism of oncogenesis in medullary thyroid carcinomas where both germline and sporadic activating somatic mutations are prevalent. At present, there are no known specific RET inhibitors in clinical development, although many potent inhibitors of RET have been opportunistically identified through selectivity profiling of compounds initially designed to target other tyrosine kinases. Vandetanib and cabozantinib, both multi-kinase inhibitors with RET activity, are approved for use in medullary thyroid carcinoma, but additional pharmacological activities, most notably inhibition of vascular endothelial growth factor - VEGFR2 (KDR), lead to dose-limiting toxicity. The recent identification of RET fusions present in ~1% of lung adenocarcinoma patients has renewed interest in the identification and development of more selective RET inhibitors lacking the toxicities associated with the current treatments. In an earlier publication [Newton et al, 2016; 1] we reported the discovery of a series of 2-substituted phenol quinazolines as potent and selective RET kinase inhibitors. Here we describe the development of the robust screening cascade which allowed the identification and advancement of this chemical series. Furthermore we have profiled a panel of RET-active clinical compounds both to validate the cascade and to confirm that none display a RET-selective target profile.
Phraseology and Frequency of Occurrence on the Web: Native Speakers' Perceptions of Google-Informed Second Language Writing

Science.gov (United States)

Geluso, Joe

2013-01-01

Usage-based theories of language learning suggest that native speakers of a language are acutely aware of formulaic language due in large part to frequency effects. Corpora and data-driven learning can offer useful insights into frequent patterns of naturally occurring language to second/foreign language learners who, unlike native speakers, are…
A Robust Color Image Watermarking Scheme Using Entropy and QR Decomposition

Directory of Open Access Journals (Sweden)

L. Laur

2015-12-01

Full Text Available The Internet has drastically affected everyday life. Large volumes of information are exchanged over the Internet continuously, which raises numerous security concerns. Issues such as content identification, document and image security, audience measurement, ownership, and copyright can be addressed by digital watermarking. In this work, a robust and imperceptible non-blind color image watermarking algorithm is proposed. It benefits from the fact that the watermark can be hidden in different color channels, which makes the proposed technique more robust to attacks. The method uses entropy, the discrete wavelet transform, the chirp z-transform, orthogonal-triangular (QR) decomposition, and singular value decomposition to embed the watermark in a color image. Many experiments are performed using well-known signal processing attacks such as histogram equalization, added noise, and compression. Experimental results show that the proposed scheme is imperceptible and robust against common signal processing attacks.
A Cross-Cultural Comparative Study of Apology Strategies Employed by Iranian EFL Learners and English Native Speakers

Directory of Open Access Journals (Sweden)

Elham Abedi

2016-10-01

Full Text Available The development of speech-act theory has provided the hearers with a better understanding of what speakers intend to perform in the act of communication. One type of speech act is apologizing. When an action or utterance has resulted in an offense, the offender needs to apologize. In the present study, an attempt was made to compare the apology strategies employed by Iranian EFL learners and those of English native speakers in order to find out the possible differences and similarities. To this end, a discourse completion test (DCT) was given to 100 male and female Iranian EFL learners and English native speakers. The respondents were supposed to complete the DCTs based on nine situations, which varied in terms of power between the interlocutors and level of imposition. This study employed Cohen and Olshtain's (1981) model to classify various types of apology strategies. The obtained results revealed some similarities along with some (statistically insignificant) differences between EFL learners and American English speakers in terms of their use of apology strategies. Furthermore, it was found that the illocutionary force indicating devices (IFIDs), such as request for forgiveness and an offer of apology, were the strategies mostly employed by the Iranian EFL learners, while taking on responsibility, such as explicit self-blame and expression of self-deficiency, were found to be the strategies mostly used by English native speakers. In terms of gender, the male and female respondents more or less used the same apology strategies in response to the situations. The findings of the present research can be used by language teachers as well as sociolinguists. Keywords: Speech act theory, Speech act of apology, Apology strategies, Iranian EFL learners, English Native speakers, Gender
A voting-based star identification algorithm utilizing local and global distribution

Science.gov (United States)

Fan, Qiaoyun; Zhong, Xuyang; Sun, Junhua

2018-03-01

A novel star identification algorithm based on voting scheme is presented in this paper. In the proposed algorithm, the global distribution and local distribution of sensor stars are fully utilized, and the stratified voting scheme is adopted to obtain the candidates for sensor stars. The database optimization is employed to reduce its memory requirement and improve the robustness of the proposed algorithm. The simulation shows that the proposed algorithm exhibits 99.81% identification rate with 2-pixel standard deviations of positional noises and 0.322-Mv magnitude noises. Compared with two similar algorithms, the proposed algorithm is more robust towards noise, and the average identification time and required memory is less. Furthermore, the real sky test shows that the proposed algorithm performs well on the real star images.
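The voting idea mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic pairwise angular-distance voting scheme, not the stratified scheme of the paper; the function names, the brute-force pair database, and the tolerance `tol` are all assumptions made for illustration.

```python
import itertools
import math


def angular_distance(u, v):
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))


def vote_identify(sensor_vecs, catalog_vecs, tol=1e-3):
    """Return, for each sensor star, the catalog index with the most votes.

    Hypothetical brute-force version: every catalog pair is compared
    against every sensor pair, with no database optimization.
    """
    cat_pairs = [
        (i, j, angular_distance(catalog_vecs[i], catalog_vecs[j]))
        for i, j in itertools.combinations(range(len(catalog_vecs)), 2)
    ]
    votes = [dict() for _ in sensor_vecs]
    for a, b in itertools.combinations(range(len(sensor_vecs)), 2):
        d = angular_distance(sensor_vecs[a], sensor_vecs[b])
        for i, j, dc in cat_pairs:
            if abs(d - dc) <= tol:
                # Either catalog star could correspond to either sensor
                # star, so both candidates receive a vote from both.
                for s in (a, b):
                    for c in (i, j):
                        votes[s][c] = votes[s].get(c, 0) + 1
    # A sensor star with no votes stays unidentified (None).
    return [max(v, key=v.get) if v else None for v in votes]
```

Because angular distances between stars do not change when the sensor rotates, the correct catalog star collects one vote from every pair it takes part in, while wrong candidates collect fewer.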
Evidential Uses in the Spanish of Quechua Speakers in Peru.

Science.gov (United States)

Escobar, Anna Maria

1994-01-01

Analysis of recordings of spontaneous speech of native speakers of Quechua speaking Spanish as a second language reveals that, using verbal morphological resources of Spanish, they have grammaticalized an epistemic marking system resembling that of Quechua. Sources of this process in both Quechua and Spanish are analyzed. (MSE)
Openings and Closings in Telephone Conversations between Native Spanish Speakers.

Science.gov (United States)

Coronel-Molina, Serafin M.

1998-01-01

A study analyzed the opening and closing sequences of 11 dyads of native Spanish-speakers in natural telephone conversations conducted in Spanish. The objective was to determine how closely Hispanic cultural patterns of conduct for telephone conversations follow the sequences outlined in previous research. It is concluded that Spanish…
Age of acquisition and naming performance in Frisian-Dutch bilingual speakers with dementia

Directory of Open Access Journals (Sweden)

Wencke S. Veenstra

Full Text Available Age of acquisition (AoA) of words is a recognised variable affecting language processing in speakers with and without language disorders. For bi- and multilingual speakers their languages can be differentially affected in neurological illness. Study of language loss in bilingual speakers with dementia has been relatively neglected. OBJECTIVE: We investigated whether AoA of words was associated with level of naming impairment in bilingual speakers with probable Alzheimer's dementia within and across their languages. METHODS: Twenty-six Frisian-Dutch bilinguals with mild to moderate dementia named 90 pictures in each language, employing items with rated AoA and other word variable measures matched across languages. Quantitative (totals correct) and qualitative (error types and (inappropriate) switching) aspects were measured. RESULTS: Impaired retrieval occurred in Frisian (Language 1) and Dutch (Language 2), with a significant effect of AoA on naming in both languages. Earlier acquired words were better preserved and retrieved. Performance was identical across languages, but better in Dutch when controlling for covariates. However, participants demonstrated more inappropriate code switching within the Frisian test setting. On qualitative analysis, no differences in overall error distribution were found between languages for early or late acquired words. There existed a significantly higher percentage of semantically than visually-related errors. CONCLUSION: These findings have implications for understanding problems in lexical retrieval among bilingual individuals with dementia and its relation to decline in other cognitive functions which may play a role in inappropriate code switching. We discuss the findings in the light of the close relationship between Frisian and Dutch and the pattern of usage across the life-span.
Effects of a metronome on the filled pauses of fluent speakers.

Science.gov (United States)

Christenfeld, N

1996-12-01

Filled pauses (the "ums" and "uhs" that litter spontaneous speech) seem to be a product of the speaker paying deliberate attention to the normally automatic act of talking. This is the same sort of explanation that has been offered for stuttering. In this paper we explore whether a manipulation that has long been known to decrease stuttering, synchronizing speech to the beats of a metronome, will then also decrease filled pauses. Two experiments indicate that a metronome has a dramatic effect on the production of filled pauses. This effect is not due to any simplification or slowing of the speech and supports the view that a metronome causes speakers to attend more to how they are talking and less to what they are saying. It also lends support to the connection between stutters and filled pauses.
An Autonomous Star Identification Algorithm Based on One-Dimensional Vector Pattern for Star Sensors.

Science.gov (United States)

Luo, Liyan; Xu, Luping; Zhang, Hua

2015-07-07

In order to enhance the robustness and accelerate the recognition speed of star identification, an autonomous star identification algorithm for star sensors is proposed based on the one-dimensional vector pattern (one_DVP). In the proposed algorithm, the space geometry information of the observed stars is used to form the one-dimensional vector pattern of the observed star. The one-dimensional vector pattern of the same observed star remains unchanged when the stellar image rotates, so the problem of star identification is simplified as the comparison of the two feature vectors. The one-dimensional vector pattern is adopted to build the feature vector of the star pattern, which makes it possible to identify the observed stars robustly. The characteristics of the feature vector and the proposed search strategy for the matching pattern make it possible to achieve the recognition result as quickly as possible. The simulation results demonstrate that the proposed algorithm can effectively accelerate the star identification. Moreover, the recognition accuracy and robustness by the proposed algorithm are better than those by the pyramid algorithm, the modified grid algorithm, and the LPT algorithm. The theoretical analysis and experimental results show that the proposed algorithm outperforms the other three star identification algorithms.
Source apportionment of soil heavy metals using robust absolute principal component scores-robust geographically weighted regression (RAPCS-RGWR) receptor model.

Science.gov (United States)

Qu, Mingkai; Wang, Yan; Huang, Biao; Zhao, Yongcun

2018-06-01

The traditional source apportionment models, such as absolute principal component scores-multiple linear regression (APCS-MLR), are usually susceptible to outliers, which may be widely present in regional geochemical datasets. Furthermore, the models are built only on variable space instead of geographical space and thus cannot effectively capture the local spatial characteristics of each source's contribution. To overcome these limitations, a new receptor model, robust absolute principal component scores-robust geographically weighted regression (RAPCS-RGWR), was proposed based on the traditional APCS-MLR model. Then, the new method was applied to the source apportionment of soil metal elements in a region of Wuhan City, China as a case study. Evaluations revealed that: (i) the RAPCS-RGWR model performed better than the APCS-MLR model in identifying the major sources of soil metal elements, and (ii) source contributions estimated by the RAPCS-RGWR model were closer to the true soil metal concentrations than those estimated by the APCS-MLR model. It is shown that the proposed RAPCS-RGWR model is a more effective source apportionment method than APCS-MLR (i.e., a non-robust and global model) in dealing with regional geochemical datasets. Copyright © 2018 Elsevier B.V. All rights reserved.
The Blame Game: Performance Analysis of Speaker Diarization System Components

NARCIS (Netherlands)

Huijbregts, M.A.H.; Wooters, Chuck

2007-01-01

In this paper we discuss the performance analysis of a speaker diarization system similar to the system that was submitted by ICSI at the NIST RT06s evaluation benchmark. The analysis, which is based on a series of oracle experiments, provides a good understanding of the performance of each system
An optimization methodology for identifying robust process integration investments under uncertainty

Energy Technology Data Exchange (ETDEWEB)

Svensson, Elin; Berntsson, Thore [Department of Energy and Environment, Division of Heat and Power Technology, Chalmers University of Technology, SE-412 96 Goeteborg (Sweden); Stroemberg, Ann-Brith [Fraunhofer-Chalmers Research Centre for Industrial Mathematics, Chalmers Science Park, SE-412 88 Gothenburg (Sweden); Patriksson, Michael [Department of Mathematical Sciences, Chalmers University of Technology and Department of Mathematical Sciences, University of Gothenburg, SE-412 96 Goeteborg (Sweden)

2009-02-15

Uncertainties in future energy prices and policies strongly affect decisions on investments in process integration measures in industry. In this paper, we present a five-step methodology for the identification of robust investment alternatives incorporating explicitly such uncertainties in the optimization model. Methods for optimization under uncertainty (or, stochastic programming) are thus combined with a deep understanding of process integration and process technology in order to achieve a framework for decision-making concerning the investment planning of process integration measures under uncertainty. The proposed methodology enables the optimization of investments in energy efficiency with respect to their net present value or an environmental objective. In particular, as a result of the optimization approach, complex investment alternatives, allowing for combinations of energy efficiency measures, can be analyzed. Uncertainties as well as time-dependent parameters, such as energy prices and policies, are modelled using a scenario-based approach, enabling the identification of robust investment solutions. The methodology is primarily an aid for decision-makers in industry, but it will also provide insight for policy-makers into how uncertainties regarding future price levels and policy instruments affect the decisions on investments in energy efficiency measures. (author)
An optimization methodology for identifying robust process integration investments under uncertainty

International Nuclear Information System (INIS)

Svensson, Elin; Berntsson, Thore; Stroemberg, Ann-Brith; Patriksson, Michael

2009-01-01

Uncertainties in future energy prices and policies strongly affect decisions on investments in process integration measures in industry. In this paper, we present a five-step methodology for the identification of robust investment alternatives incorporating explicitly such uncertainties in the optimization model. Methods for optimization under uncertainty (or, stochastic programming) are thus combined with a deep understanding of process integration and process technology in order to achieve a framework for decision-making concerning the investment planning of process integration measures under uncertainty. The proposed methodology enables the optimization of investments in energy efficiency with respect to their net present value or an environmental objective. In particular, as a result of the optimization approach, complex investment alternatives, allowing for combinations of energy efficiency measures, can be analyzed. Uncertainties as well as time-dependent parameters, such as energy prices and policies, are modelled using a scenario-based approach, enabling the identification of robust investment solutions. The methodology is primarily an aid for decision-makers in industry, but it will also provide insight for policy-makers into how uncertainties regarding future price levels and policy instruments affect the decisions on investments in energy efficiency measures. (author)

Adaptive Communication: Languages with More Non-Native Speakers Tend to Have Fewer Word Forms.

Directory of Open Access Journals (Sweden)

Christian Bentz

Full Text Available Explaining the diversity of languages across the world is one of the central aims of typological, historical, and evolutionary linguistics. We consider the effect of language contact (the number of non-native speakers a language has) on the way languages change and evolve. By analysing hundreds of languages within and across language families, regions, and text types, we show that languages with greater levels of contact typically employ fewer word forms to encode the same information content (a property we refer to as lexical diversity). Based on three types of statistical analyses, we demonstrate that this variance can in part be explained by the impact of non-native speakers on information encoding strategies. Finally, we argue that languages are information encoding systems shaped by the varying needs of their speakers. Language evolution and change should be modeled as the co-evolution of multiple intertwined adaptive systems: On one hand, the structure of human societies and human learning capabilities, and on the other, the structure of language.
Adaptive Communication: Languages with More Non-Native Speakers Tend to Have Fewer Word Forms

Science.gov (United States)

Bentz, Christian; Verkerk, Annemarie; Kiela, Douwe; Hill, Felix; Buttery, Paula

2015-01-01

Explaining the diversity of languages across the world is one of the central aims of typological, historical, and evolutionary linguistics. We consider the effect of language contact (the number of non-native speakers a language has) on the way languages change and evolve. By analysing hundreds of languages within and across language families, regions, and text types, we show that languages with greater levels of contact typically employ fewer word forms to encode the same information content (a property we refer to as lexical diversity). Based on three types of statistical analyses, we demonstrate that this variance can in part be explained by the impact of non-native speakers on information encoding strategies. Finally, we argue that languages are information encoding systems shaped by the varying needs of their speakers. Language evolution and change should be modeled as the co-evolution of multiple intertwined adaptive systems: On one hand, the structure of human societies and human learning capabilities, and on the other, the structure of language. PMID: 26083380
Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

Science.gov (United States)

Kayasith, Prakasith; Theeramunkong, Thanaruk

Measuring the severity of dysarthria by manually evaluating a speaker's speech with the available standard assessment methods, which rely on human perception, is a tedious and subjective task. This paper presents an automated approach to assess the speech quality of a dysarthric speaker with cerebral palsy. Considering two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a given word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate for an individual dysarthric speaker before the actual exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
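The three evaluation measures named in this abstract can be sketched as follows. The abstract does not define them, so the formulas here are assumptions: Pearson correlation, root-mean-square difference, and rank-order inconsistency taken as the fraction of speaker pairs whose ordering differs between predicted and measured rates. The function names are illustrative only.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rms_difference(x, y):
    """Root-mean-square of the element-wise differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def rank_order_inconsistency(x, y):
    """Fraction of pairs ranked in opposite order by x and y (assumed definition)."""
    pairs = list(combinations(range(len(x)), 2))
    discordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return discordant / len(pairs)
```

A good predictor of recognition rate would show a correlation near 1, a small RMS difference, and a rank-order inconsistency near 0 when its predictions are compared with measured rates.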
Speech overlap detection in a two-pass speaker diarization system

NARCIS (Netherlands)

Huijbregts, M.A.H.; Leeuwen, D.A. van; Jong, F. M. G de

2009-01-01

In this paper we present the two-pass speaker diarization system that we developed for the NIST RT09s evaluation. In the first pass of our system a model for speech overlap detection is generated automatically. This model is used in two ways to reduce the diarization errors due to overlapping
Speech overlap detection in a two-pass speaker diarization system

NARCIS (Netherlands)

Huijbregts, M.; Leeuwen, D.A. van; Jong, F.M.G. de

2009-01-01

In this paper we present the two-pass speaker diarization system that we developed for the NIST RT09s evaluation. In the first pass of our system a model for speech overlap detection is generated automatically. This model is used in two ways to reduce the diarization errors due to overlapping
Reading and Vocabulary Recommendations for Spanish for Native Speakers Materials.

Science.gov (United States)

Spencer, Laura Gutierrez

1995-01-01

Focuses on the need for appropriate materials to address the needs of native speakers of Spanish who study Spanish in American universities and high schools. The most important factors influencing the selection of readings should include the practical nature of themes for reading and vocabulary development, level of difficulty, and variety in…
Robust design optimization using the price of robustness, robust least squares and regularization methods

Science.gov (United States)

Bukhari, Hassan J.

2017-12-01

In this paper a framework for robust optimization of mechanical design problems and process systems that have parametric uncertainty is presented using three different approaches. Robust optimization problems are formulated so that the optimal solution is robust, which means it is minimally sensitive to any perturbations in parameters. The first method uses the price of robustness approach, which assumes the uncertain parameters to be symmetric and bounded. The robustness of the design can be controlled by limiting the parameters that can perturb. The second method uses the robust least squares method to determine the optimal parameters when the data itself is subject to perturbations instead of the parameters. The last method manages uncertainty by restricting the perturbation on parameters to improve sensitivity, similar to Tikhonov regularization. The methods are implemented on two sets of problems, one linear and the other non-linear. This methodology is compared with a prior method using multiple Monte Carlo simulation runs, which shows that the approach presented in this paper results in better performance.
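Since the abstract refers to Tikhonov regularization, a minimal sketch of the standard Tikhonov-regularized least-squares problem may help: minimise ||Ax - b||² + λ||x||², solved through the normal equations (AᵀA + λI)x = Aᵀb. This is the textbook formulation, not the paper's own method; the helper names and the small dense solver are assumptions chosen to keep the example dependency-free.

```python
def solve_linear(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def tikhonov(A, b, lam):
    """Minimise ||A x - b||^2 + lam * ||x||^2 via the normal equations."""
    m, n = len(A), len(A[0])
    # Build A^T A + lam * I and A^T b explicitly.
    AtA = [
        [sum(A[k][i] * A[k][j] for k in range(m)) + (lam if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    return solve_linear(AtA, Atb)
```

With `lam = 0` this reduces to ordinary least squares; increasing `lam` shrinks the solution toward zero, which is what makes it less sensitive to perturbations in the data.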
Hemispheric lateralization of linguistic prosody recognition in comparison to speech and speaker recognition.

Science.gov (United States)

Kreitewolf, Jens; Friederici, Angela D; von Kriegstein, Katharina

2014-11-15

Hemispheric specialization for linguistic prosody is a controversial issue. While it is commonly assumed that linguistic prosody and emotional prosody are preferentially processed in the right hemisphere, neuropsychological work directly comparing processes of linguistic prosody and emotional prosody suggests a predominant role of the left hemisphere for linguistic prosody processing. Here, we used two functional magnetic resonance imaging (fMRI) experiments to clarify the role of left and right hemispheres in the neural processing of linguistic prosody. In the first experiment, we sought to confirm previous findings showing that linguistic prosody processing compared to other speech-related processes predominantly involves the right hemisphere. Unlike previous studies, we controlled for stimulus influences by employing a prosody and speech task using the same speech material. The second experiment was designed to investigate whether a left-hemispheric involvement in linguistic prosody processing is specific to contrasts between linguistic prosody and emotional prosody or whether it also occurs when linguistic prosody is contrasted against other non-linguistic processes (i.e., speaker recognition). Prosody and speaker tasks were performed on the same stimulus material. In both experiments, linguistic prosody processing was associated with activity in temporal, frontal, parietal and cerebellar regions. Activation in temporo-frontal regions showed differential lateralization depending on whether the control task required recognition of speech or speaker: recognition of linguistic prosody predominantly involved right temporo-frontal areas when it was contrasted against speech recognition; when contrasted against speaker recognition, recognition of linguistic prosody predominantly involved left temporo-frontal areas. The results show that linguistic prosody processing involves functions of both hemispheres and suggest that recognition of linguistic prosody is based on
Ordered short-term memory differs in signers and speakers: Implications for models of short-term memory

OpenAIRE

Bavelier, Daphne; Newport, Elissa L.; Hall, Matt; Supalla, Ted; Boutla, Mrim

2008-01-01

Capacity limits in linguistic short-term memory (STM) are typically measured with forward span tasks in which participants are asked to recall lists of words in the order presented. Using such tasks, native signers of American Sign Language (ASL) exhibit smaller spans than native speakers (Boutla, Supalla, Newport, & Bavelier, 2004). Here, we test the hypothesis that this population difference reflects differences in the way speakers and signers maintain temporal order information in short-te...
Native Speakers as Teachers in Turkey: Non-Native Pre-Service English Teachers' Reactions to a Nation-Wide Project

Science.gov (United States)

Coskun, Abdullah

2013-01-01

Although English is now a recognized international language and the concept of native speaker is becoming more doubtful every day, the empowerment of the native speakers of English as language teaching professionals is still continuing (McKay, 2002), especially in Asian countries like China and Japan. One of the latest examples showing the…
Lexical access in a bilingual speaker with dementia: Changes over time.

Science.gov (United States)

Lind, Marianne; Simonsen, Hanne Gram; Ribu, Ingeborg Sophie Bjønness; Svendsen, Bente Ailin; Svennevig, Jan; de Bot, Kees

2018-01-01

In this article, we explore the naming skills of a bilingual English-Norwegian speaker diagnosed with Primary Progressive Aphasia, in each of his languages across three different speech contexts: confrontation naming, semi-spontaneous narrative (picture description), and conversation, and at two points in time: 12 and 30 months post diagnosis, respectively. The results are discussed in light of two main theories of lexical retrieval in healthy, elderly speakers: the Transmission Deficit Hypothesis and the Inhibitory Deficit Theory. Our data show that, consistent with the participant's premorbid use of and proficiency in the two languages, his performance in his L2 is lower than in his L1, but this difference diminishes as the disease progresses. This is the case across the three speech contexts; however, the difference is smaller in the narrative task, where his performance is very low in both languages already at the first measurement point. Despite his word finding problems, he is able to take active part in conversation, particularly in his L1 and more so at the first measurement point. In addition to the task effect, we find effects of word class, frequency, and cognateness on his naming skills. His performance seems to support the Transmission Deficit Hypothesis. By combining different tools and methods of analysis, we get a more comprehensive picture of the impact of the dementia on the speaker's languages from an intra-individual as well as an inter-individual perspective, which may be useful in research as well as in clinical practice.
THE ROLE OF NON-NATIVE ENGLISH SPEAKER TEACHERS IN ENGLISH LANGUAGE LEARNING

Directory of Open Access Journals (Sweden)

Lutfi Ashar Mauludin

2017-04-01

Full Text Available Native-English Speaker Teachers (NESTs) and Non-Native English Speaker Teachers (NNESTs) have their own advantages and disadvantages. However, for English Language Learners (ELLs), NNESTs have more advantages in helping students to acquire English skills. There are at least three factors that can only be performed by NNESTs in English Language Learning. The factors are knowledge of the subject, effective communication, and understanding students' difficulties/needs. The NNESTs can effectively provide a clear explanation of knowledge of the language because they are supported by the same background and culture. NNESTs can also communicate effectively with students at all levels. The use of L1 is effective in helping students build their knowledge. Finally, NNESTs can provide objectives and materials that are suited to the needs of the students.
Automated robust registration of grossly misregistered whole-slide images with varying stains

Science.gov (United States)

Litjens, G.; Safferling, K.; Grabe, N.

2016-03-01

Cancer diagnosis and pharmaceutical research increasingly depend on the accurate quantification of cancer biomarkers. Identification of biomarkers is usually performed through immunohistochemical staining of cancer sections on glass slides. However, combination of multiple biomarkers from a wide variety of immunohistochemically stained slides is a tedious process in traditional histopathology due to the switching of glass slides and re-identification of regions of interest by pathologists. Digital pathology now allows us to apply image registration algorithms to digitized whole-slides to align the differing immunohistochemical stains automatically. However, registration algorithms need to be robust to changes in color due to differing stains and severe changes in tissue content between slides. In this work we developed a robust registration methodology to allow for fast coarse alignment of multiple immunohistochemical stains to the base hematoxylin and eosin stained image. We applied HSD color model conversion to obtain a less stain color dependent representation of the whole-slide images. Subsequently, optical density thresholding and connected component analysis were used to identify the relevant regions for registration. Template matching using normalized mutual information was applied to provide initial translation and rotation parameters, after which a cost function-driven affine registration was performed. The algorithm was validated using 40 slides from 10 prostate cancer patients, with landmark registration error as a metric. Median landmark registration error was around 180 microns, which indicates performance is adequate for practical application. None of the registrations failed, indicating the robustness of the algorithm.
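The template-matching step described above relies on normalized mutual information (NMI). A minimal sketch, assuming the common definition NMI = (H(A) + H(B)) / H(A, B) and a one-dimensional integer shift search in place of the paper's translation and rotation search, could look like the following. The function names and the 1-D simplification are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (natural log) of a histogram given as raw counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def normalized_mutual_information(a, b):
    """NMI = (H(A) + H(B)) / H(A, B) for two equal-length label sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    n = len(a)
    h_a = entropy(Counter(a).values(), n)
    h_b = entropy(Counter(b).values(), n)
    h_ab = entropy(Counter(zip(a, b)).values(), n)
    if h_ab == 0.0:
        # Both inputs are constant: no information, use the minimum value.
        return 1.0
    return (h_a + h_b) / h_ab


def best_shift(fixed, moving, max_shift):
    """Integer shift s maximising NMI between fixed[i] and moving[i + s]."""
    best = None
    for s in range(-max_shift, max_shift + 1):
        pairs = [(fixed[i], moving[i + s])
                 for i in range(len(fixed)) if 0 <= i + s < len(moving)]
        if not pairs:
            continue
        fa, mb = zip(*pairs)
        score = normalized_mutual_information(fa, mb)
        if best is None or score > best[1]:
            best = (s, score)
    return best[0]
```

NMI depends only on how consistently intensity values co-occur, not on the values themselves, which is why it suits images of the same tissue under different stains.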
Politics of Participation in Benoît Maubrey’s Speaker Sculptures

DEFF Research Database (Denmark)

Keylin, Vadim

a designated number, or using Bluetooth or WiFi technologies, and express themselves freely through the sculpture. In my paper, I investigate the strategies of audience engagement that Maubrey employs and their applicability to the acoustic design of urban spaces. Through their numerous loudspeakers, Speaker...
Visual and auditory digit-span performance in native and nonnative speakers

NARCIS (Netherlands)

Olsthoorn, N.M.; Andringa, S.; Hulstijn, J.H.

2014-01-01

We compared 121 native and 114 non-native speakers of Dutch (with 35 different first languages) on four digit-span tasks, varying modality (visual/auditory) and direction (forward/backward). An interaction was observed between nativeness and modality, such that, while natives performed better than
Dialocalization: Acoustic speaker diarization and visual localization as joint optimization problem

NARCIS (Netherlands)

Friedland, G.; Yeo, C.; Hung, H.

2010-01-01

The following article presents a novel audio-visual approach for unsupervised speaker localization in both time and space and systematically analyzes its unique properties. Using recordings from a single, low-resolution room overview camera and a single far-field microphone, a state-of-the-art
The native-language benefit for talker identification is robust in 7.5-month-old infants.

Science.gov (United States)

Fecher, Natalie; Johnson, Elizabeth K

2018-04-26

Adults recognize talkers better when the talkers speak a familiar language than when they speak an unfamiliar language. This language familiarity effect (LFE) demonstrates the inseparable nature of linguistic and indexical information in adult spoken language processing. Relatively little is known about children's integration of linguistic and indexical information in speech. For example, to date, only one study has explored the LFE in infants. Here, we sought to better understand the maturation of speech processing abilities in infants by replicating this earlier study using a more stringent experimental design (eliminating a potential voice-language confound), a different test population (English- rather than Dutch-learning infants), and a new language pairing (English vs. Polish rather than Dutch vs. Italian or Japanese). Furthermore, we explored the language exposure conditions required for infants to develop an LFE for a formerly unfamiliar language. We hypothesized based on previous studies (including the perceptual narrowing literature) that infants might develop an LFE more readily than would adults. Although our findings replicate those of the earlier study, demonstrating that the LFE is robust in 7.5-month-olds, we found no evidence that infants need less language exposure than do adults to develop an LFE. We concluded that both infants and adults need extensive (potentially live) exposure to an unfamiliar language before talker identification in that language improves. Moreover, our study suggests that the LFE is likely rooted in early emerging phonology rather than shared lexical knowledge and that infants already closely resemble adults in their processing of linguistic and indexical information. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Speech rate normalization used to improve speaker verification

CSIR Research Space (South Africa)

Van Heerden, CJ

2006-11-01

Full Text Available the normalized durations is then compared with the EER using unnormalized durations, and also with the EER when duration information is not employed. 2. Proposed phoneme duration modeling 2.1. Choosing parametric models Since the duration of a phoneme... the known transcription and the speaker-specific acoustic model described above. Only one pronunciation per word was allowed, thus resulting in 49 triphones. To decide which parametric model to use for the duration density functions of the triphones...
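The excerpt above compares systems by their equal error rate (EER), the operating point at which the false-acceptance and false-rejection rates coincide. As a reminder of what that metric measures, here is a minimal sketch; it assumes similarity scores where higher means "more likely the claimed speaker", and the function name and the threshold sweep are illustrative choices, not taken from the paper.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    FRR = fraction of genuine trials rejected,
    FAR = fraction of impostor trials accepted.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With finitely many trials FAR and FRR rarely cross exactly, so averaging the two at the closest point is one common convention; interpolating along the ROC curve is another.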
Does the speaker's voice quality influence children's performance on a language comprehension test?

Science.gov (United States)

Lyberg-Åhlander, Viveka; Haake, Magnus; Brännström, Jonas; Schötz, Susanne; Sahlén, Birgitta

2015-02-01

A small number of studies have explored children's perception of speakers' voice quality and its possible influence on language comprehension. The aim of this explorative study was to investigate the relationship between the examiner's voice quality, the child's performance on a digital version of a language comprehension test, the Test for Reception of Grammar (TROG-2), and two measures of cognitive functioning. The participants were (n = 86) mainstreamed 8-year-old children with typical language development. Two groups of children (n = 41/45) were presented with the TROG-2 through recordings of one female speaker: one group was presented with a typical voice and the other with a simulated dysphonic voice. Significant associations were found between executive functioning and language comprehension. The results also showed that children listening to the dysphonic voice achieved significantly lower scores for more difficult sentences ("the man but not the horse jumps") and used more self-corrections on simpler sentences ("the girl is sitting"). Findings suggest that a dysphonic speaker's voice may force the child to allocate capacity to the processing of the voice signal at the expense of comprehension. The findings have implications for clinical and research settings where standardized language tests are used.
Studies of stability and robustness for artificial neural networks and boosted decision trees

International Nuclear Information System (INIS)

Yang, H.-J.; Roe, Byron P.; Zhu Ji

2007-01-01

In this paper, we compare the performance, stability and robustness of Artificial Neural Networks (ANN) and Boosted Decision Trees (BDT) using MiniBooNE Monte Carlo samples. These methods attempt to classify events given a number of identification variables. The BDT algorithm has been discussed by us in previous publications. Testing is done in this paper by smearing and shifting the input variables of testing samples. Based on these studies, BDT has better particle identification performance than ANN. The degradation of the classifications obtained by shifting or smearing variables of testing results is smaller for BDT than for ANN.

A Unifying Mathematical Framework for Genetic Robustness, Environmental Robustness, Network Robustness and their Trade-offs on Phenotype Robustness in Biological Networks. Part III: Synthetic Gene Networks in Synthetic Biology

Science.gov (United States)

Chen, Bor-Sen; Lin, Ying-Po

2013-01-01

Robust stabilization and environmental disturbance attenuation are ubiquitous systematic properties that are observed in biological systems at many different levels. The underlying principles for robust stabilization and environmental disturbance attenuation are universal to both complex biological systems and sophisticated engineering systems. In many biological networks, network robustness should be large enough to confer: intrinsic robustness for tolerating intrinsic parameter fluctuations; genetic robustness for buffering genetic variations; and environmental robustness for resisting environmental disturbances. Network robustness is needed so phenotype stability of biological network can be maintained, guaranteeing phenotype robustness. Synthetic biology is foreseen to have important applications in biotechnology and medicine; it is expected to contribute significantly to a better understanding of functioning of complex biological systems. This paper presents a unifying mathematical framework for investigating the principles of both robust stabilization and environmental disturbance attenuation for synthetic gene networks in synthetic biology. Further, from the unifying mathematical framework, we found that the phenotype robustness criterion for synthetic gene networks is the following: if intrinsic robustness + genetic robustness + environmental robustness ≦ network robustness, then the phenotype robustness can be maintained in spite of intrinsic parameter fluctuations, genetic variations, and environmental disturbances. Therefore, the trade-offs between intrinsic robustness, genetic robustness, environmental robustness, and network robustness in synthetic biology can also be investigated through corresponding phenotype robustness criteria from the systematic point of view. Finally, a robust synthetic design that involves network evolution algorithms with desired behavior under intrinsic parameter fluctuations, genetic variations, and environmental
Two-component network model in voice identification technologies

Directory of Open Access Journals (Sweden)

Edita K. Kuular

2018-03-01

Full Text Available Among the most important parameters that determine the effectiveness of biometric systems with voice modalities, along with reliability and noise immunity, the speed of identification and verification of a person has been emphasized. This parameter is especially sensitive when processing large-scale voice databases in real time. Many research studies in this area are aimed at developing new algorithms, and improving existing ones, for the representation and processing of voice records to ensure high performance of voice biometric systems. Here, it seems promising to apply a modern approach based on a complex network platform for solving complex massive problems with a large number of elements while taking into account their interrelationships. Thus, some known works, when solving problems of analysis and recognition of faces from photographs, transform images into complex networks for subsequent processing by standard techniques. One of the first applications of complex networks to the analysis of sound series (musical and speech) is the description of frequency characteristics by constructing network models, that is, converting the series into networks. On the network ontology platform, a previously proposed technique of audio information representation aimed at automatic analysis and speaker recognition has been developed. This implies converting information into the form of an associative semantic (cognitive) network structure with both amplitude and frequency components. Two speaker exemplars have been recorded and transformed into the corresponding networks, and their topological metrics were then compared. The set of topological metrics for each network model (amplitude and frequency) is a vector, and together these form a matrix that serves as a digital "network" voiceprint. The proposed network approach, with its sensitivity to personal conditions (physiological, psychological, emotional), might be useful not only for person identification
A Novel Approach to Speaker Weight Estimation Using a Fusion of the i-vector and NFA Frameworks

DEFF Research Database (Denmark)

Poorjam, Amir Hossein; Bahari, Mohamad Hasan; Van hamme, Hogo

2017-01-01

This paper proposes a novel approach for automatic speaker weight estimation from spontaneous telephone speech signals. In this method, each utterance is modeled using the i-vector framework which is based on the factor analysis on Gaussian Mixture Model (GMM) mean supervectors, and the Non-negative Factor Analysis (NFA) framework which is based on a constrained factor analysis on GMM weight supervectors. Then, the available information in both Gaussian means and Gaussian weights is exploited through a feature-level fusion of the i-vectors and the NFA vectors. Finally, a least-squares support vector regression is employed to estimate the weight of speakers from the given utterances. The proposed approach is evaluated on spontaneous telephone speech signals of National Institute of Standards and Technology 2008 and 2010 Speaker Recognition Evaluation corpora. To investigate the effectiveness...
Robust audio-visual speech recognition under noisy audio-video conditions.

Science.gov (United States)

Stewart, Darryl; Seymour, Rowan; Pass, Adrian; Ming, Ji

2014-02-01

This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments, where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added in either/both the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach in both clean conditions or when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
Robust Multimodal Dictionary Learning

Science.gov (United States)

Cao, Tian; Jojic, Vladimir; Modla, Shannon; Powell, Debbie; Czymmek, Kirk; Niethammer, Marc

2014-01-01

We propose a robust multimodal dictionary learning method for multimodal images. Joint dictionary learning for both modalities may be impaired by lack of correspondence between image modalities in training data, for example due to areas of low quality in one of the modalities. Dictionaries learned with such non-corresponding data will induce uncertainty about image representation. In this paper, we propose a probabilistic model that accounts for image areas that are poorly corresponding between the image modalities. We cast the problem of learning a dictionary in presence of problematic image patches as a likelihood maximization problem and solve it with a variant of the EM algorithm. Our algorithm iterates identification of poorly corresponding patches and refinements of the dictionary. We tested our method on synthetic and real data. We show improvements in image prediction quality and alignment accuracy when using the method for multimodal image registration. PMID:24505674
MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks

OpenAIRE

Ding, Wenhao; He, Liang

2018-01-01

In this paper, we propose an enhanced triplet method that improves the encoding process of embeddings by jointly utilizing generative adversarial mechanism and multitasking optimization. We extend our triplet encoder with Generative Adversarial Networks (GANs) and softmax loss function. GAN is introduced for increasing the generality and diversity of samples, while softmax is for reinforcing features about speakers. For simplification, we term our method Multitasking Triplet Generative Advers...
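The abstract above combines a triplet objective with a softmax loss. The two standard loss terms can be sketched as follows; this is a generic illustration, not the MTGAN formulation: the GAN component is omitted, the margin value is arbitrary, unsquared Euclidean distance is assumed (some systems use squared distance or cosine similarity), and all function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    `anchor` and `positive` come from the same speaker, `negative` from a
    different one. The loss is zero once the negative is at least `margin`
    farther from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against the true speaker index `label`."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]
```

The triplet term shapes relative distances between embeddings, while the softmax term ties each embedding to a speaker identity; how the two are weighted against each other in the paper is not stated in the abstract.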
Diversity in the lexical and syntactic abilities of fluent aphasic speakers

NARCIS (Netherlands)

Bastiaanse, Y.R.M.; Edwards, S.

In an earlier study by the authors, it was suggested that some fluent aphasic speakers exhibit subtle grammatical deficits. In this paper, how far lexical accessing problems might account for these deficits is considered. For this study, spontaneous speech data collected from two groups of aphasic
Procedure for inscription in the list of speakers at meetings of the Board of Governors

International Nuclear Information System (INIS)

2002-01-01

Full text: 1. By Rule 23 (d) of the Provisional Rules of Procedure of the Board of Governors: 'No Governor may address the Board without having previously obtained the permission of the presiding officer. The presiding officer shall call upon speakers in the order in which they signify their desire to speak. The presiding officer may call a speaker to order if his remarks are not relevant to the subject under discussion.' The following procedures are applied generally concerning the implementation of Rule 23. 2. Governors or other members of delegations who wish to speak on an item notify the Secretary of the Board of their intention to speak. A Secretariat staff member will also be present on the podium in the Boardroom each day from 9:30 a.m. until the Board meeting commences, to receive requests from delegations to be added to the list of speakers. After the meeting commences delegates wishing to speak should raise their flag to be recognized by the Secretary of the Board. The names of delegations are inscribed on a single list of speakers, maintained by the Secretary of the Board, in the order in which they have signified their wish to speak. 3. It may be noted that, in the course of debate on a particular item, delegations may signify an intention to speak more than once, including speaking in response to issues raised by other delegations during the debate. When two or more delegations simultaneously indicate from the floor an intention to speak, their names are inscribed in the order in which they are brought to the attention of the Secretary. 4. The established practice is to first call on delegations wishing to speak on behalf of regional groups on any specific item, followed by individual delegations, in the order in which their names are inscribed on the list of speakers. 5. Member States will appreciate that these guidelines cannot cover all contingencies and that special cases may arise from time to time. The Chairman will exercise flexibility where
Increased vocal intensity due to the Lombard effect in speakers with Parkinson's disease: simultaneous laryngeal and respiratory strategies.

Science.gov (United States)

Stathopoulos, Elaine T; Huber, Jessica E; Richardson, Kelly; Kamphaus, Jennifer; DeCicco, Devan; Darling, Meghan; Fulcher, Katrina; Sussman, Joan E

2014-01-01

The objective of the present study was to investigate whether speakers with hypophonia, secondary to Parkinson's disease (PD), would increase their vocal intensity when speaking in a noisy environment (Lombard effect). The other objective was to examine the underlying laryngeal and respiratory strategies used to increase vocal intensity. Thirty-three participants with PD were included for study. Each participant was fitted with the SpeechVive™ device that played multi-talker babble noise into one ear during speech. Using acoustic, aerodynamic and respiratory kinematic techniques, the simultaneous laryngeal and respiratory mechanisms used to regulate vocal intensity were examined. Significant group results showed that most speakers with PD (26/33) were successful at increasing their vocal intensity when speaking in the condition of multi-talker babble noise. They were able to support their increased vocal intensity and subglottal pressure with combined strategies from both the laryngeal and respiratory mechanisms. Individual speaker analysis indicated that the particular laryngeal and respiratory interactions differed among speakers. The SpeechVive™ device elicited higher vocal intensities from patients with PD. Speakers used different combinations of laryngeal and respiratory physiologic mechanisms to increase vocal intensity, thus suggesting that disease process does not uniformly affect the speech subsystems. Readers will be able to: (1) identify speech characteristics of people with Parkinson's disease (PD), (2) identify typical respiratory strategies for increasing sound pressure level (SPL), (3) identify typical laryngeal strategies for increasing SPL, (4) define the Lombard effect. Copyright © 2014 Elsevier Inc. All rights reserved.
Extending Situated Language Comprehension (Accounts) with Speaker and Comprehender Characteristics: Toward Socially Situated Interpretation

Directory of Open Access Journals (Sweden)

Katja Münster

2018-01-01

Full Text Available More and more findings suggest a tight temporal coupling between (non-)linguistic socially interpreted context and language processing. Still, real-time language processing accounts remain largely elusive with respect to the influence of biological (e.g., age) and experiential (e.g., world and moral knowledge) comprehender characteristics and the influence of the ‘socially interpreted’ context, as for instance provided by the speaker. This context could include actions, facial expressions, a speaker’s voice or gaze, and gestures among others. We review findings from social psychology, sociolinguistics and psycholinguistics to highlight the relevance of (the interplay between) the socially interpreted context and comprehender characteristics for language processing. The review informs the extension of an extant real-time processing account (already featuring a coordinated interplay between language comprehension and the non-linguistic visual context) with a variable (‘ProCom’) that captures characteristics of the language user and with a first approximation of the comprehender’s speaker representation. Extending the CIA to the sCIA (social Coordinated Interplay Account) is the first step toward a real-time language comprehension account which might eventually accommodate the socially situated communicative interplay between comprehenders and speakers.
Language matters: thirteen-month-olds understand that the language a speaker uses constrains conventionality.

Science.gov (United States)

Scott, Jessica C; Henderson, Annette M E

2013-11-01

Object labels are valuable communicative tools because their meanings are shared among the members of a particular linguistic community. The current research was conducted to investigate whether 13-month-old infants appreciate that object labels should not be generalized across individuals who have been shown to speak different languages. Using a visual habituation paradigm, Experiment 1 tested whether infants would generalize a new object label that was taught to them by a speaker of a foreign language to a speaker from the infant's own linguistic group. The results suggest that infants do not expect 2 individuals who have been shown to speak different languages to use the same label to refer to the same object. The results of Experiment 2 reveal that infants do not generalize a new object label that was taught to them by a speaker of their native language to an individual who had been shown to speak a foreign language. These findings offer the first evidence that by the end of the 1st year of life, infants are sensitive to the fact that the conventional nature of language is constrained by the language that a person has been shown to speak.
Speakers' acceptance of real-time speech exchange indicates that we use auditory feedback to specify the meaning of what we say.

Science.gov (United States)

Lind, Andreas; Hall, Lars; Breidegard, Björn; Balkenius, Christian; Johansson, Petter

2014-06-01

Speech is usually assumed to start with a clearly defined preverbal message, which provides a benchmark for self-monitoring and a robust sense of agency for one's utterances. However, an alternative hypothesis states that speakers often have no detailed preview of what they are about to say, and that they instead use auditory feedback to infer the meaning of their words. In the experiment reported here, participants performed a Stroop color-naming task while we covertly manipulated their auditory feedback in real time so that they said one thing but heard themselves saying something else. Under ideal timing conditions, two thirds of these semantic exchanges went undetected by the participants, and in 85% of all nondetected exchanges, the inserted words were experienced as self-produced. These findings indicate that the sense of agency for speech has a strong inferential component, and that auditory feedback of one's own voice acts as a pathway for semantic monitoring, potentially overriding other feedback loops. © The Author(s) 2014.
Achieving Speaker Gender Equity at the American Society for Microbiology General Meeting.

Science.gov (United States)

Casadevall, Arturo

2015-08-04

In 2015, the American Society for Microbiology (ASM) General Meeting essentially achieved gender equity, with 48.5% of the oral presentations being given by women. The mechanisms associated with increased female participation were (i) making the Program Committee aware of gender statistics, (ii) increasing female representation among session convener teams, and (iii) direct instruction to try to avoid all-male sessions. The experience with the ASM General Meeting shows that it is possible to increase the participation of female speakers in a relatively short time and suggests concrete steps that may be taken to achieve this at other meetings. Public speaking is very important for academic advancement in science. Historically women have been underrepresented as speakers in many scientific meetings. This article describes concrete steps that were associated with achieving gender equity at a major meeting. Copyright © 2015 Casadevall.
Infants' preferences for native speakers are associated with an expectation of information

DEFF Research Database (Denmark)

Begus, Katarina; Gliga, Teodora; Southgate, Victoria

2016-01-01

Humans' preference for others who share our group membership is well documented, and this heightened valuation of in-group members seems to be rooted in early development. Before 12 mo of age, infants already show behavioral preferences for others who evidence cues to same-group membership...... such as race or native language, yet the function of this selectivity remains unclear. We examine one of these social biases, the preference for native speakers, and propose that this preference may result from infants' motivation to obtain information and the expectation that interactions with native speakers...... in situations when they can expect to receive information. We then used this neural measure of anticipatory theta activity to explore the expectations of 11-mo-olds when facing social partners who either speak the infants' native language or a foreign tongue (study 2). A larger increase in theta oscillations...
Pharmaceutical speakers' bureaus, academic freedom, and the management of promotional speaking at academic medical centers.

Science.gov (United States)

Boumil, Marcia M; Cutrell, Emily S; Lowney, Kathleen E; Berman, Harris A

2012-01-01

Pharmaceutical companies routinely engage physicians, particularly those with prestigious academic credentials, to deliver "educational" talks to groups of physicians in the community to help market the company's brand-name drugs. Although presented as educational, and even though they provide educational content, these events are intended to influence decisions about drug selection in ways that are not based on the suitability and effectiveness of the product, but on the prestige and persuasiveness of the speaker. A number of state legislatures and most academic medical centers have attempted to restrict physician participation in pharmaceutical marketing activities, though most restrictions are not absolute and have proven difficult to enforce. This article reviews the literature on why Speakers' Bureaus have become a lightning rod for academic/industry conflicts of interest and examines the arguments of those who defend physician participation. It considers whether the restrictions on Speakers' Bureaus are consistent with principles of academic freedom and concludes with the legal and institutional efforts to manage industry speaking. © 2012 American Society of Law, Medicine & Ethics, Inc.
Perceptual Robust Design

DEFF Research Database (Denmark)

Pedersen, Søren Nygaard

The research presented in this PhD thesis has focused on a perceptual approach to robust design. The results of the research and the original contribution to knowledge is a preliminary framework for understanding, positioning, and applying perceptual robust design. Product quality is a topic...... been presented. Therefore, this study set out to contribute to the understanding and application of perceptual robust design. To achieve this, a state-of-the-art and current practice review was performed. From the review two main research problems were identified. Firstly, a lack of tools...... for perceptual robustness was found to overlap with the optimum for functional robustness and at most approximately 2.2% out of the 14.74% could be ascribed solely to the perceptual robustness optimisation. In conclusion, the thesis has offered a new perspective on robust design by merging robust design...
Production of lexical stress in non-native speakers of American English: kinematic correlates of stress and transfer.

Science.gov (United States)

Chakraborty, Rahul; Goffman, Lisa

2011-06-01

To assess the influence of second language (L2) proficiency on production characteristics of rhythmic sequences in the L1 (Bengali) and L2 (English), with emphasis on linguistic transfer. One goal was to examine, using kinematic evidence, how L2 proficiency influences the production of iambic and trochaic words, focusing on temporal and spatial aspects of prosody. A second goal was to assess whether prosodic structure influences judgment of foreign accent. Twenty Bengali-English bilingual individuals, 10 with low proficiency in English and 10 with high proficiency in English, and 10 monolingual English speakers, participated. Lip and jaw movements were recorded while the bilingual participants produced Bengali and English words embedded in sentences. Lower lip movement amplitude and duration were measured in trochaic and iambic words. Six native English listeners judged the nativeness of the bilingual speakers. Evidence of L1-L2 transfer was observed through duration but not amplitude cues. More proficient L2 speakers varied duration to mark iambic stress. Perceptually, the high-proficiency group received relatively higher native-like accent ratings. Trochees were judged as more native than iambs. Even in the face of L1-L2 lexical stress transfer, nonnative speakers demonstrated knowledge of prosodic contrasts. Movement duration appears to be more amenable than amplitude to modifications.
Speaker Recognition for Mobile User Authentication: An Android Solution

OpenAIRE

Brunet , Kevin; Taam , Karim; Cherrier , Estelle; Faye , Ndiaga; Rosenberger , Christophe

2013-01-01

National audience; This paper deals with a biometric solution for authentication on mobile devices. Among the possible biometric modalities, speaker recognition seems the most natural choice for a mobile phone. This work lies in the continuation of our previous work \\cite{Biosig2012}, where we evaluated a candidate algorithm in terms of performance and time processing. The proposed solution is implemented here as an Android application. Its performances are evaluated both on a public database...
Multistage Data Selection-based Unsupervised Speaker Adaptation for Personalized Speech Emotion Recognition

NARCIS (Netherlands)

Kim, Jaebok; Park, Jeong-Sik

This paper proposes an efficient speech emotion recognition (SER) approach that utilizes personal voice data accumulated on personal devices. A representative weakness of conventional SER systems is the user-dependent performance induced by the speaker independent (SI) acoustic model framework. But,
Robust Control with Enlarged Interval of Uncertain Parameters

Directory of Open Access Journals (Sweden)

Marek Keresturi

2002-01-01

Full Text Available Robust control is advantageous for systems with defined interval of uncertain parameters. This can be substantially enlarged dividing it into a few sub-intervals. Corresponding controllers for each of them may be set after approximate identification of some uncertain plant parameters. The paper deals with application of the pole region assignment method for position control of the crane crab. The same track form is required for uncertain burden mass and approximate value of rope length. Measurement of crab position and speed is supposed, burden deviation angle is observed. Simulation results have verified feasibility of this design procedure.

The speaker's formant.

Science.gov (United States)

Bele, Irene Velsvik

2006-12-01

The current study concerns speaking voice quality in two groups of professional voice users, teachers (n = 35) and actors (n = 36), representing trained and untrained voices. The voice quality of text reading at two intensity levels was acoustically analyzed. The central concept was the speaker's formant (SPF), related to the perceptual characteristics "better normal voice quality" (BNQ) and "worse normal voice quality" (WNQ). The purpose of the current study was to get closer to the origin of the phenomenon of the SPF, and to discover the differences in spectral and formant characteristics between the two professional groups and the two voice quality groups. The acoustic analyses were long-term average spectrum (LTAS) and spectrographical measurements of formant frequencies. At very high intensities, the spectral slope was rather quadrangular without a clear SPF peak. The trained voices had a higher energy level in the SPF region compared with the untrained, significantly so in loud phonation. The SPF seemed to be related to both sufficiently strong overtones and a glottal setting, allowing for a lowering of F4 and a closeness of F3 and F4. However, the existence of SPF also in LTAS of the WNQ voices implies that more research is warranted concerning the formation of SPF, and concerning the acoustic correlates of the BNQ voices.
Identifying Core Vocabulary for Urdu Language Speakers Using Augmentative Alternative Communication

Science.gov (United States)

Mukati, Abdul Samad

2013-01-01

The purpose of this research is to identify a core set of vocabulary used by native Urdu language (UL) speakers during dyadic conversation for social interaction and relationship building. This study was conducted in Karachi, Pakistan at an institution of higher education. This research seeks to distinguish between general (nonspecific…
Compliment Responses of Thai and Punjabi Speakers of English in Thailand

Science.gov (United States)

Sachathep, Sukchai

2014-01-01

This variational pragmatics (VP) study investigates the similarities and differences of compliment responses (CR) between Thai and Punjabi speakers of English in Thailand, focusing on the strategies used in CR when the microsociolinguistic variables are integrated into the Discourse Completion Task (DCT). The participants were 20 Thai and 20…
Decoding speech perception by native and non-native speakers using single-trial electrophysiological data.

Directory of Open Access Journals (Sweden)

Alex Brandmeyer

Full Text Available Brain-computer interfaces (BCIs) are systems that use real-time analysis of neuroimaging data to determine the mental state of their user for purposes such as providing neurofeedback. Here, we investigate the feasibility of a BCI based on speech perception. Multivariate pattern classification methods were applied to single-trial EEG data collected during speech perception by native and non-native speakers. Two principal questions were asked: (1) Can differences in the perceived categories of pairs of phonemes be decoded at the single-trial level? (2) Can these same categorical differences be decoded across participants, within or between native-language groups? Results indicated that classification performance progressively increased with respect to the categorical status (within, boundary or across) of the stimulus contrast, and was also influenced by the native language of individual participants. Classifier performance showed strong relationships with traditional event-related potential measures and behavioral responses. The results of the cross-participant analysis indicated an overall increase in average classifier performance when trained on data from all participants (native and non-native). A second cross-participant classifier trained only on data from native speakers led to an overall improvement in performance for native speakers, but a reduction in performance for non-native speakers. We also found that the native language of a given participant could be decoded on the basis of EEG data with accuracy above 80%. These results indicate that electrophysiological responses underlying speech perception can be decoded at the single-trial level, and that decoding performance systematically reflects graded changes in the responses related to the phonological status of the stimuli. This approach could be used in extensions of the BCI paradigm to support perceptual learning during second language acquisition.
The beneficial effect of a speaker's gestures on the listener's memory for action phrases: The pivotal role of the listener's premotor cortex.

Science.gov (United States)

Ianì, Francesco; Burin, Dalila; Salatino, Adriana; Pia, Lorenzo; Ricci, Raffaella; Bucciarelli, Monica

2018-04-10

Memory for action phrases improves in the listeners when the speaker accompanies them with gestures compared to when the speaker stays still. Since behavioral studies revealed a pivotal role of the listeners' motor system, we aimed to disentangle the role of primary motor and premotor cortices. Participants had to recall phrases uttered by a speaker in two conditions: in the gesture condition, the speaker performed gestures congruent with the action; in the no-gesture condition, the speaker stayed still. In Experiment 1, half of the participants underwent inhibitory rTMS over the hand/arm region of the left premotor cortex (PMC) and the other half over the hand/arm region of the left primary motor cortex (M1). The enactment effect disappeared only following rTMS over PMC. In Experiment 2, we detected the usual enactment effect after rTMS over vertex, thereby excluding possible nonspecific rTMS effects. These findings suggest that the information encoded in the premotor cortex is a crucial part of the memory trace. Copyright © 2018 Elsevier Inc. All rights reserved.
The N400 effect during speaker-switch – Towards a conversational approach of measuring neural correlates of language

Directory of Open Access Journals (Sweden)

Tatiana Goregliad Fjaellingsdal

2016-11-01

Full Text Available Language occurs naturally in conversations. However, the study of the neural underpinnings of language has mainly taken place in single individuals using controlled language material. The interactive elements of a conversation (e.g., turn-taking) are often not part of neurolinguistic setups. The prime reason is the difficulty of combining open unrestricted conversations with the requirements of neuroimaging. It is necessary to find a trade-off between the naturalness of a conversation and the restrictions imposed by neuroscientific methods to allow for ecologically more valid studies. Here we make an attempt to study the effects of a conversational element, namely turn-taking, on linguistic neural correlates, specifically the N400 effect. We focus on the physiological aspect of turn-taking, the speaker-switch, and its effect on the detectability of the N400 effect. The N400 event-related potential reflects expectation violations in a semantic context; the N400 effect describes the difference of the N400 amplitude between semantically expected and unexpected items. Sentences with semantically congruent and incongruent final words were presented in two turn-taking modes: (1) reading aloud the first part of the sentence and listening to speaker-switch for the final word, and (2) listening to the first part of the sentence and speaker-switch for the final word. A significant N400 effect was found for both turn-taking modes, which was not influenced by the mode itself. However, the mode significantly affected the P200, which was increased for the reading aloud mode compared to the listening mode. Our results show that an N400 effect can be detected during a speaker-switch. Speech articulation (reading aloud) before the analyzed sentence fragment also did not impede the N400 effect detection for the final word. The speaker-switch, however, seems to influence earlier components of the electroencephalogram, related to processing of salient stimuli. We conclude that the N
Robust hashing for 3D models

Science.gov (United States)

Berchtold, Waldemar; Schäfer, Marcel; Rettig, Michael; Steinebach, Martin

2014-02-01

3D models and applications are of utmost interest in both science and industry. As their usage grows, so do their number and the challenge of identifying them correctly. Content identification is commonly done by cryptographic hashes. However, they fail as a solution in application scenarios such as computer aided design (CAD), scientific visualization or video games, because even the smallest alteration of the 3D model, e.g. conversion or compression operations, massively changes the cryptographic hash as well. Therefore, this work presents a robust hashing algorithm for 3D mesh data. The algorithm applies several different bit extraction methods. They are built to resist desired alterations of the model as well as malicious attacks intending to prevent correct allocation. The different bit extraction methods are tested against each other and, as far as possible, the hashing algorithm is compared to the state of the art. The parameters tested are robustness, security and runtime performance as well as False Acceptance Rate (FAR) and False Rejection Rate (FRR); a calculation of the hash collision probability is also included. The introduced hashing algorithm is kept adaptive, e.g. in hash length, to serve as a proper tool for all applications in practice.
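The contrast between cryptographic and robust hashes described above can be illustrated with a minimal Python sketch. The function `toy_robust_hash` is a hypothetical stand-in (a coarse histogram of normalized vertex-to-centroid distances) chosen only for illustration; it is not one of the bit extraction methods of the paper, and the six-vertex `mesh` is made-up data.

```python
import hashlib
import math
import struct


def crypto_hash(vertices):
    """SHA-256 over the raw vertex coordinates, packed as little-endian doubles."""
    data = b"".join(struct.pack("<3d", *map(float, v)) for v in vertices)
    return hashlib.sha256(data).hexdigest()


def toy_robust_hash(vertices, n_bins=4):
    """Hypothetical illustration: histogram of normalized centroid distances."""
    n = len(vertices)
    centroid = [sum(v[i] for v in vertices) / n for i in range(3)]
    radii = [math.dist(v, centroid) for v in vertices]
    r_max = max(radii)
    counts = [0] * n_bins
    for r in radii:
        # Clamp the index so the farthest vertex falls into the last bin.
        counts[min(int(r / r_max * n_bins), n_bins - 1)] += 1
    return tuple(counts)


# Made-up point set standing in for a mesh.
mesh = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1), (2, 2, 2)]
# A tiny alteration of one coordinate, as a conversion or compression step might cause.
altered = [(1e-6, 0, 0)] + mesh[1:]

print("cryptographic hashes equal:", crypto_hash(mesh) == crypto_hash(altered))
print("toy robust hashes equal:   ", toy_robust_hash(mesh) == toy_robust_hash(altered))
```

The coarse binning is what makes the toy hash insensitive to small perturbations. A real robust hash would also have to address security and collision rates, which this sketch does not attempt.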
Does Grammatical Gender Influence Perception? A Study of Polish and French Speakers

Directory of Open Access Journals (Sweden)

Haertlé Izabella

2017-12-01

Full Text Available Can the perception of a word be influenced by its grammatical gender? Can it happen that speakers of one language perceive an object to have masculine features, while speakers of another language perceive the same object to have feminine features? Previous studies suggest that this is the case, and also that there is some supra-language gender categorisation of objects as natural/feminine and artefact/masculine. This study was an attempt to replicate these findings on another population of subjects. This is the first Polish study of this kind, comparing the perceptions of objects by Polish- and French-speaking individuals. The results of this study show that grammatical gender may cue people to assess objects as masculine or feminine. However, the findings of some previous studies, that feminine features are more often ascribed to natural objects than artifacts, were not replicated.
Non-Native English Speakers and Nonstandard English: An In-Depth Investigation

Science.gov (United States)

Polat, Brittany

2012-01-01

Given the rising prominence of nonstandard varieties of English around the world (Jenkins 2007), learners of English as a second language are increasingly called on to communicate with speakers of both native and non-native nonstandard English varieties. In many classrooms around the world, however, learners continue to be exposed only to…
Speaker's presentations. Energy supply security

International Nuclear Information System (INIS)

Pierret, Ch.

2000-01-01

This document is a collection of most of the papers used by the speakers at the European Seminar on Energy Supply Security, organised in Paris (at the French Ministry of Economy, Finance and Industry) on 24 November 2000 by the General Direction of Energy and Raw Materials, in co-operation with the European Commission and the French Planning Office. About 250 attendees were present, including many high-level civil servants from the 15 European member states, and their questions gave rise to a rich debate. The seminar took place five days before the publication, on 29 November 2000, by the European Commission, of the Green Paper 'Towards a European Strategy for the Security of Energy Supply'. This French initiative, which took place within the framework of the European Presidency of the European Union during the second half of 2000, will bring a first impetus to the brainstorming launched by the Commission. (author)
Accent modulates access to word meaning: Evidence for a speaker-model account of spoken word recognition.

Science.gov (United States)

Cai, Zhenguang G; Gilbert, Rebecca A; Davis, Matthew H; Gaskell, M Gareth; Farrar, Lauren; Adler, Sarah; Rodd, Jennifer M

2017-11-01

Speech carries accent information relevant to determining the speaker's linguistic and social background. A series of web-based experiments demonstrate that accent cues can modulate access to word meaning. In Experiments 1-3, British participants were more likely to retrieve the American dominant meaning (e.g., hat meaning of "bonnet") in a word association task if they heard the words in an American than a British accent. In addition, results from a speeded semantic decision task (Experiment 4) and sentence comprehension task (Experiment 5) confirm that accent modulates on-line meaning retrieval such that comprehension of ambiguous words is easier when the relevant word meaning is dominant in the speaker's dialect. Critically, neutral-accent speech items, created by morphing British- and American-accented recordings, were interpreted in a similar way to accented words when embedded in a context of accented words (Experiment 2). This finding indicates that listeners do not use accent to guide meaning retrieval on a word-by-word basis; instead they use accent information to determine the dialectic identity of a speaker and then use their experience of that dialect to guide meaning access for all words spoken by that person. These results motivate a speaker-model account of spoken word recognition in which comprehenders determine key characteristics of their interlocutor and use this knowledge to guide word meaning access. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Adding More Fuel to the Fire: An Eye-Tracking Study of Idiom Processing by Native and Non-Native Speakers

Science.gov (United States)

Siyanova-Chanturia, Anna; Conklin, Kathy; Schmitt, Norbert

2011-01-01

Using eye-tracking, we investigate on-line processing of idioms in a biasing story context by native and non-native speakers of English. The stimuli are idioms used figuratively ("at the end of the day"--"eventually"), literally ("at the end of the day"--"in the evening"), and novel phrases ("at the end of the war"). Native speaker results…
Correlation between low-proficiency in English and negative perceptions of what it means to be an English speaker

Directory of Open Access Journals (Sweden)

Kavarljit Kaur Gill

2013-01-01

Full Text Available Learning another language is strongly affected by the positive or negative connotations that the language learner attaches to the new language. Many students enter Malaysian public universities with a low proficiency in English, despite spending eleven years studying English in schools. Could the lack of progress among these students be attributed to a negative view of what it means to be a speaker of English? This study investigated the perceptions of students at a public university, to determine whether there is a correlation between low proficiency and negative perceptions of what it means to be an English speaker. Analysis of the results showed that Malaysian students have a very positive perception of what it means to be an English speaker.
A Study of the Effect of Emotional State upon the Variation of the Fundamental Frequency of a Speaker

Directory of Open Access Journals (Sweden)

Marius Vasile GHIURCAU

2010-01-01

Full Text Available Telephone banking or brokering, building access systems or forensics are some of the areas in which speaker recognition is continuously developing. Fundamental frequency represents an important speech feature used in these applications. In this paper we present a study of the effect of the emotional state of a speaker upon the variation of the fundamental frequency of the speech signal. Human beings are quite frequently overwhelmed by various emotions and most of the time one cannot really control these emotional states. For the purpose of our work we have used the Berlin emotional speech database which contains utterances of 10 speakers in different emotional situations: happy, angry, fearful, bored and neutral. The mean fundamental frequency and also the standard deviation for every speaker in all the emotional states were computed. The results show a very strong influence of the emotional state upon frequency variation.
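The per-speaker, per-emotion statistics mentioned above (mean fundamental frequency and standard deviation) could be computed along the following lines. This is a minimal sketch that assumes F0 values in Hz have already been extracted; the `records` layout, the speaker label and the numbers are invented for illustration and are not taken from the Berlin database.

```python
from collections import defaultdict
from statistics import mean, stdev


def f0_stats(records):
    """Group F0 values (Hz) by (speaker, emotion); return mean and sample std."""
    groups = defaultdict(list)
    for speaker, emotion, f0_hz in records:
        groups[(speaker, emotion)].append(f0_hz)
    return {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}


# Made-up values for illustration only.
records = [
    ("spk01", "neutral", 100.0), ("spk01", "neutral", 110.0), ("spk01", "neutral", 120.0),
    ("spk01", "angry", 180.0), ("spk01", "angry", 200.0), ("spk01", "angry", 220.0),
]

for (speaker, emotion), (mu, sigma) in f0_stats(records).items():
    print(f"{speaker} {emotion}: mean={mu:.1f} Hz, std={sigma:.1f} Hz")
```

The sketch uses the sample standard deviation (`statistics.stdev`). Whether the study used the sample or the population form is not stated in the abstract.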
Identification of GMS friction model without friction force measurement

International Nuclear Information System (INIS)

Grami, Said; Aissaoui, Hicham

2011-01-01

This paper deals with an online identification of the Generalized Maxwell Slip (GMS) friction model for both presliding and sliding regime at the same time. This identification is based on robust adaptive observer without friction force measurement. To apply the observer, a new approach of calculating the filtered friction force from the measurable signals is introduced. Moreover, two approximations are proposed to get the friction model linear over the unknown parameters and an approach of suitable filtering is introduced to guarantee the continuity of the model. Simulation results are presented to prove the efficiency of the approach of identification.
Determining robust impacts of land-use induced land-cover changes on surface climate over North America and Eurasia; Results from the first set of LUCID experiments

NARCIS (Netherlands)

Noblet-Ducoudré, de N.; Boisier, J.P.; Pitman, A.; Bonan, G.B.; Brovkin, V.; Cruz, F.; Delire, C.; Gayler, V.; Hurk, van den B.J.J.M.; Lawrence, P.J.; Molen, van der M.K.; Müller, C.; Reick, C.H.; Strengers, B.J.; Voldoire, A.

2012-01-01

The project Land-Use and Climate, Identification of Robust Impacts (LUCID) was conceived to address the robustness of biogeophysical impacts of historical land use–land cover change (LULCC). LUCID used seven atmosphere–land models with a common experimental design to explore those impacts of LULCC
Tormenta Espacial: Engaging Spanish Speakers in the Planetarium and K-12 Classroom

Science.gov (United States)

Salas, F.; Duncan, D.; Traub-Metlay, S.

2008-06-01

Reaching out to Spanish speakers is increasingly vital to workforce development and public support of space science projects. Building on a successful partnership with NASA's TIMED mission, LASP and Space Science Institute, Fiske Planetarium has translated its original planetarium show - ``Space Storm'' - into ``Tormenta Espacial.''
Muchas Caras: Engaging Spanish Speakers in the Planetarium and K--12 Classroom

Science.gov (United States)

Traub-Metlay, S.; Salas, F.; Duncan, D.

2008-11-01

Reaching out to Spanish speakers is increasingly vital to workforce development and public support of space science projects. Fiske Planetarium offers Spanish translations of our newest planetarium shows, such as ``Las Personas del Telescopio Hubble'' (``The Many Faces of Hubble'') and ``Tormenta Espacial'' (``Space Storm'').
Acoustic cues to perception of word stress by English, Mandarin, and Russian speakers.

Science.gov (United States)

Chrabaszcz, Anna; Winn, Matthew; Lin, Candise Y; Idsardi, William J

2014-08-01

This study investigated how listeners' native language affects their weighting of acoustic cues (such as vowel quality, pitch, duration, and intensity) in the perception of contrastive word stress. Native speakers (N = 45) of typologically diverse languages (English, Russian, and Mandarin) performed a stress identification task on nonce disyllabic words with fully crossed combinations of each of the 4 cues in both syllables. The results revealed that although the vowel quality cue was the strongest cue for all groups of listeners, pitch was the second strongest cue for the English and the Mandarin listeners but was virtually disregarded by the Russian listeners. Duration and intensity cues were used by the Russian listeners to a significantly greater extent compared with the English and Mandarin participants. Compared with when cues were noncontrastive across syllables, cues were stronger when they were in the iambic contour than when they were in the trochaic contour. Although both English and Russian are stress languages and Mandarin is a tonal language, stress perception performance of the Mandarin listeners but not of the Russian listeners is more similar to that of the native English listeners, both in terms of weighting of the acoustic cues and the cues' relative strength in different word positions. The findings suggest that tuning of second-language prosodic perceptions is not entirely predictable by prosodic similarities across languages.
An Improved Generalized Predictive Control in a Robust Dynamic Partial Least Square Framework

Directory of Open Access Journals (Sweden)

Jin Xin

2015-01-01

Full Text Available To tackle the sensitivity to outliers in system identification, a new robust dynamic partial least squares (PLS) model based on an outlier detection method is proposed in this paper. An improved radial basis function network (RBFN) is adopted to construct the predictive model from the input and output dataset, and a hidden Markov model (HMM) is applied to detect the outliers. After the outliers are removed, a more robust dynamic PLS model is obtained. In addition, an improved generalized predictive control (GPC) with tuning weights under the dynamic PLS framework is proposed to deal with the interaction caused by the model mismatch. The results of two simulations demonstrate the effectiveness of the proposed method.

Affective processing in bilingual speakers: disembodied cognition?

Science.gov (United States)

Pavlenko, Aneta

2012-01-01

A recent study by Keysar, Hayakawa, and An (2012) suggests that "thinking in a foreign language" may reduce decision biases because a foreign language provides a greater emotional distance than a native tongue. The possibility of such "disembodied" cognition is of great interest for theories of affect and cognition and for many other areas of psychological theory and practice, from clinical and forensic psychology to marketing, but first this claim needs to be properly evaluated. The purpose of this review is to examine the findings of clinical, introspective, cognitive, psychophysiological, and neuroimaging studies of affective processing in bilingual speakers in order to identify converging patterns of results, to evaluate the claim about "disembodied cognition," and to outline directions for future inquiry. The findings to date reveal two interrelated processing effects. First-language (L1) advantage refers to increased automaticity of affective processing in the L1 and heightened electrodermal reactivity to L1 emotion-laden words. Second-language (L2) advantage refers to decreased automaticity of affective processing in the L2, which reduces interference effects and lowers electrodermal reactivity to negative emotional stimuli. The differences in L1 and L2 affective processing suggest that in some bilingual speakers, in particular late bilinguals and foreign language users, respective languages may be differentially embodied, with the later learned language processed semantically but not affectively. This difference accounts for the reduction of framing biases in L2 processing in the study by Keysar et al. (2012). The follow-up discussion identifies the limits of the findings to date in terms of participant populations, levels of processing, and types of stimuli, puts forth alternative explanations of the documented effects, and articulates predictions to be tested in future research.
Transient identification system with noising data and 'don't know' response

International Nuclear Information System (INIS)

Mol, Antonio C. de A.; Martinez, Aquilino S.; Schirru, Roberto

2002-01-01

In the last years, many different approaches based on neural networks (NN) have been proposed for transient identification in nuclear power plants (NPP). Some of them focus on dynamic identification using recurrent neural networks; however, they are not able to deal with unrecognized transients. Another kind of solution uses competitive learning in order to allow the 'don't know' response. In this case, dynamic features are not well represented. This work presents a new approach for neural network based transient identification which allows both dynamic identification and a 'don't know' response. The approach uses two multilayer neural networks trained with the backpropagation algorithm. The first one is responsible for the dynamic identification. This NN uses a short set (in a movable time window) of recent measurements of each variable, avoiding the need for starting events. The other one is used to validate the instantaneous identification (from the first net) through the validation of each variable. This net is responsible for allowing the system to provide the 'don't know' response. In order to validate the method, an NPP transient identification problem comprising 15 postulated accidents, simulated for a pressurized water reactor, was proposed; noisy data were considered in the validation process in order to evaluate the robustness of the method. The results obtained reveal the ability of the method to deal with both dynamic identification of transients and correct 'don't know' response. (author)
Identification of Influential Points in a Linear Regression Model

Directory of Open Access Journals (Sweden)

Jan Grosz

2011-03-01

Full Text Available The article deals with the detection and identification of influential points in the linear regression model. Three methods of detection of outliers and leverage points are described. These procedures can also be used for one-sample (independent) datasets. This paper briefly describes theoretical aspects of several robust methods as well. Robust statistics is a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. A simulation model of the simple linear regression is presented.
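The abstract does not name the three detection methods, so the sketch below assumes two standard diagnostics for simple linear regression: the leverage (hat value) h_i = 1/n + (x_i - x̄)² / S_xx and Cook's distance D_i = e_i² / (p s²) · h_i / (1 - h_i)², with p = 2 fitted parameters. The function name `influence_measures` and the data are hypothetical.

```python
def influence_measures(x, y):
    """Leverage and Cook's distance for the least-squares fit y = a + b*x."""
    n, p = len(x), 2  # p = number of fitted parameters (intercept and slope)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    cooks = [e * e / (p * s2) * h / (1 - h) ** 2
             for e, h in zip(residuals, leverage)]
    return leverage, cooks


# Made-up data: the last point lies far from the others in x and off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]

leverage, cooks = influence_measures(x, y)
for i, (h, d) in enumerate(zip(leverage, cooks)):
    print(f"point {i}: leverage={h:.3f}, Cook's D={d:.3f}")
```

The leverages sum to p, which is a quick sanity check on the computation. Points whose leverage or Cook's distance stands out from the rest are candidates for closer inspection, not automatic removal.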
Gender and Number Agreement in the Oral Production of Arabic Heritage Speakers

Science.gov (United States)

Albirini, Abdulkafi; Benmamoun, Elabbas; Chakrani, Brahim

2013-01-01

Heritage language acquisition has been characterized by various asymmetries, including the differential acquisition rates of various linguistic areas and the unbalanced acquisition of different categories within a single area. This paper examines Arabic heritage speakers' knowledge of subject-verb agreement versus noun-adjective agreement with the…
Robust multivariate analysis

CERN Document Server

Olive, David J

2017-01-01

This text presents methods that are robust to the assumption of a multivariate normal distribution or methods that are robust to certain types of outliers. Instead of using exact theory based on the multivariate normal distribution, the simpler and more applicable large sample theory is given. The text develops among the first practical robust regression and robust multivariate location and dispersion estimators backed by theory. The robust techniques are illustrated for methods such as principal component analysis, canonical correlation analysis, and factor analysis. A simple way to bootstrap confidence regions is also provided. Much of the research on robust multivariate analysis in this book is being published for the first time. The text is suitable for a first course in Multivariate Statistical Analysis or a first course in Robust Statistics. This graduate text is also useful for people who are familiar with the traditional multivariate topics, but want to know more about handling data sets with...
Evaluation of speech errors in Putonghua speakers with cleft palate: a critical review of methodology issues.

Science.gov (United States)

Jiang, Chenghui; Whitehill, Tara L

2014-04-01

Speech errors associated with cleft palate are well established for English and several other Indo-European languages. Few articles describing the speech of Putonghua (standard Mandarin Chinese) speakers with cleft palate have been published in English language journals. Although methodological guidelines have been published for the perceptual speech evaluation of individuals with cleft palate, there has been no critical review of methodological issues in studies of Putonghua speakers with cleft palate. A literature search was conducted to identify relevant studies published over the past 30 years in Chinese language journals. Only studies incorporating perceptual analysis of speech were included. Thirty-seven articles which met inclusion criteria were analyzed and coded on a number of methodological variables. Reliability was established by having all variables recoded for all studies. This critical review identified many methodological issues. These design flaws make it difficult to draw reliable conclusions about characteristic speech errors in this group of speakers. Specific recommendations are made to improve the reliability and validity of future studies, as well to facilitate cross-center comparisons.
Teaching English to speakers of other languages an introduction

CERN Document Server

Nunan, David

2015-01-01

David Nunan's dynamic learner-centered teaching style has informed and inspired countless TESOL educators around the world. In this fresh, straightforward introduction to teaching English to speakers of other languages he presents teaching techniques and procedures along with the underlying theory and principles. Complex theories and research studies are explained in a clear and comprehensible manner without trivializing them. Practical examples of how to develop teaching materials and tasks from sound principles provide rich illustrations of theoretical constructs.
Clear Speech - Mere Speech? How segmental and prosodic speech reduction shape the impression that speakers create on listeners

DEFF Research Database (Denmark)

Niebuhr, Oliver

2017-01-01

... whether variation in the degree of reduction also has a systematic effect on the attributes we ascribe to the speaker who produces the speech signal. A perception experiment was carried out for German in which 46 listeners judged whether or not speakers showing 3 different combinations of segmental and prosodic reduction levels (unreduced, moderately reduced, strongly reduced) are appropriately described by 13 physical, social, and cognitive attributes. The experiment shows that clear speech is not mere speech, and less clear speech is not just reduced either. Rather, results revealed a complex interplay of reduction levels and perceived speaker attributes in which moderate reduction can make a better impression on listeners than no reduction. In addition to its relevance in reduction models and theories, this interplay is instructive for various fields of speech application from social robotics to charisma...
On the status of the phoneme /b/ in heritage speakers of Spanish

Directory of Open Access Journals (Sweden)

Rajiv Rao

2014-12-01

Full Text Available This study examined intervocalic productions of /b/ in heritage speakers of Spanish residing in the United States. Eleven speakers were divided into two groups based on at-home exposure to Spanish, and subsequently completed reading and picture description tasks eliciting productions of intervocalic /b/ showing variation in word position, syllable stress, and orthography. The mixed-effects results revealed that while both groups manifested three clear phonetic categories, the group with more at-home experience followed a phonological rule of spirantization to a pure approximant to a higher degree across the data. The less-target-like stop and tense approximant allophones appeared more in the reading task, in stressed syllables, and in the less experienced group. Word boundary position interacted with group and task to induce less-target-like forms as well. The findings emphasize the influence of language background, linguistic context, orthography, and cognitive demands of tasks in accounting for heritage phonetics and phonology.
Identification of a robust subpathway-based signature for acute myeloid leukemia prognosis using an miRNA integrated strategy.

Science.gov (United States)

Chang, Huijuan; Gao, Qiuying; Ding, Wei; Qing, Xueqin

2018-01-01

Acute myeloid leukemia (AML) is a heterogeneous disease, and survival signatures are urgently needed to better monitor treatment. MiRNAs play vital regulatory roles on target genes and are necessarily involved in this complex disease. We therefore examined the expression levels of miRNAs and genes to identify robust signatures for survival benefit analyses. First, we reconstructed subpathway graphs by embedding miRNA components that were derived from low-throughput miRNA-gene interactions. Then, we randomly divided the data sets from The Cancer Genome Atlas (TCGA) into training and testing sets, and further formed 100 subsets based on the training set. Using each subset, we identified survival-related miRNAs and genes, and identified survival subpathways based on the reconstructed subpathway graphs. After statistical analyses of these survival subpathways, the most robust subpathways with the top three ranks were identified, and risk scores were calculated based on these robust subpathways for AML patient prognoses. Among these robust subpathways, three representative subpathways, path: 05200_10 from Pathways in cancer, path: 04110_20 from Cell cycle, and path: 04510_8 from Focal adhesion, were significantly associated with patient survival in the TCGA training and testing sets based on subpathway risk scores. In conclusion, we performed integrated analyses of miRNAs and genes to identify robust prognostic subpathways, and calculated subpathway risk scores to characterize AML patient survival.
Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech.

Science.gov (United States)

Shahin, Antoine J; Shen, Stanley; Kerlin, Jess R

2017-01-01

We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that only varied in the temporal order of sensory modality, visual speech leading (exp1) or lagging (exp2) acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise vocoded with 4-, 8-, 16-, and 32-channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync. Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels), and when visual speech was intact rather than blurred (exp1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration promoting the fusion of temporally distant AV percepts.
Nuclear Magnetic Resonance Spectroscopy-Based Identification of Yeast.

Science.gov (United States)

Himmelreich, Uwe; Sorrell, Tania C; Daniel, Heide-Marie

2017-01-01

Rapid and robust high-throughput identification of environmental, industrial, or clinical yeast isolates is important whenever relatively large numbers of samples need to be processed in a cost-efficient way. Nuclear magnetic resonance (NMR) spectroscopy generates complex data based on metabolite profiles, chemical composition and possibly on medium consumption, which can not only be used for the assessment of metabolic pathways but also for accurate identification of yeast down to the subspecies level. Initial results on NMR-based yeast identification were comparable with conventional and DNA-based identification. Potential advantages of NMR spectroscopy in mycological laboratories include not only accurate identification but also the potential of automated sample delivery, automated analysis using computer-based methods, rapid turnaround time, high throughput, and low running costs. We describe here the sample preparation, data acquisition and analysis for NMR-based yeast identification. In addition, a roadmap for the development of classification strategies is given that will result in the acquisition of a database and analysis algorithms for yeast identification in different environments.
Robustness of Structures

DEFF Research Database (Denmark)

Faber, Michael Havbro; Vrouwenvelder, A.C.W.M.; Sørensen, John Dalsgaard

2011-01-01

In 2005, the Joint Committee on Structural Safety (JCSS) together with Working Commission (WC) 1 of the International Association of Bridge and Structural Engineering (IABSE) organized a workshop on robustness of structures. Two important decisions resulted from this workshop, namely the development of a joint European project on structural robustness under the COST (European Cooperation in Science and Technology) programme and the decision to develop a more elaborate document on structural robustness in collaboration between experts from the JCSS and the IABSE. Accordingly, a project titled ‘COST TU0601: Robustness of Structures’ was initiated in February 2007, aiming to provide a platform for exchanging and promoting research in the area of structural robustness and to provide a basic framework, together with methods, strategies and guidelines enhancing robustness of structures...
Language contact phenomena in the language use of speakers of German descent and the significance of their language attitudes

Directory of Open Access Journals (Sweden)

Ries, Veronika

2014-03-01

Full Text Available Within the scope of my investigation on language use and language attitudes of People of German Descent from the USSR, I regularly find different language contact phenomena, such as viel bliny habn=wir gbackt (Engl.: 'we cooked lots of pancakes') (cf. Ries 2011). The aim of analysis is to examine both language use with regard to different forms of language contact and the language attitudes of the observed speakers. To be able to analyse both of these aspects and synthesize them, different types of data are required. The research is based on the following two data types: everyday conversations and interviews. In addition, the individual speakers' biography is a key part of the analysis, because it allows one to draw conclusions about language attitudes and use. This qualitative research is based on morpho-syntactic and interactional linguistic analysis of authentic spoken data. The data arise from a corpus compiled and edited by myself. My being a member of the examined group allowed me to build up an authentic corpus. The natural language use is analysed from the perspective of different language contact phenomena and potential functions of language alternations. One central issue is: How do speakers use the languages available to them, German and Russian? Structural characteristics such as code switching and discursive motives for these phenomena are discussed as results, together with the socio-cultural background of the individual speaker. Within the scope of this article I present, by way of example, the data and results of one speaker.
Effect of an 8-week practice of externally triggered speech on basal ganglia activity of stuttering and fluent speakers.

Science.gov (United States)

Toyomura, Akira; Fujii, Tetsunoshin; Kuriki, Shinya

2015-04-01

The neural mechanisms underlying stuttering are not well understood. It is known that stuttering appears when persons who stutter speak in a self-paced manner, but speech fluency is temporarily increased when they speak in unison with an external trigger such as a metronome. This phenomenon is very similar to the behavioral improvement by external pacing in patients with Parkinson's disease. Recent imaging studies have also suggested that the basal ganglia are involved in the etiology of stuttering. In addition, previous studies have shown that the basal ganglia are involved in self-paced movement. The present study therefore focused on the basal ganglia and explored whether long-term speech practice using external triggers can induce modification of the basal ganglia activity of stuttering speakers. Our functional magnetic resonance imaging study revealed that stuttering speakers possessed significantly lower activity in the basal ganglia than fluent speakers before practice, especially when their speech was self-paced. After an 8-week speech practice of externally triggered speech using a metronome, the significant difference in activity between the two groups disappeared. The cerebellar vermis of stuttering speakers showed significantly decreased activity during the self-paced speech in the second compared to the first experiment. The speech fluency and naturalness of the stuttering speakers were also improved. These results suggest that stuttering is associated with defective motor control during self-paced speech, and that the basal ganglia and the cerebellum are involved in an improvement of speech fluency of stuttering by the use of an external trigger. Copyright © 2015 Elsevier Inc. All rights reserved.
Intonation Contrast in Cantonese Speakers with Hypokinetic Dysarthria Associated with Parkinson's Disease

Science.gov (United States)

Ma, Joan K.-Y.; Whitehill, Tara L.; So, Susanne Y.-S.

2010-01-01

Purpose: Speech produced by individuals with hypokinetic dysarthria associated with Parkinson's disease (PD) is characterized by a number of features including impaired speech prosody. The purpose of this study was to investigate intonation contrasts produced by this group of speakers. Method: Speech materials with a question-statement contrast…
Self-Disclosure in Initial Interactions amongst Speakers of American and Australian English

Science.gov (United States)

Haugh, Michael; Carbaugh, Donal

2015-01-01

Getting acquainted with others is one of the most basic interpersonal communication events. Yet there has only been a limited number of studies that have examined variation in the interactional practices through which unacquainted persons become acquainted and establish relationships across speakers of the same language. The current study focuses…
The cognitive neuroscience of person identification.

Science.gov (United States)

Biederman, Irving; Shilowich, Bryan E; Herald, Sarah B; Margalit, Eshed; Maarek, Rafael; Meschke, Emily X; Hacker, Catrina M

2018-02-14

We compare and contrast five differences between person identification by voice and face. 1. There is little or no cost when a familiar face is to be recognized from an unrestricted set of possible faces, even at Rapid Serial Visual Presentation (RSVP) rates, but the accuracy of familiar voice recognition declines precipitously when the set of possible speakers is increased from one to a mere handful. 2. Whereas deficits in face recognition are typically perceptual in origin, those with normal perception of voices can manifest severe deficits in their identification. 3. Congenital prosopagnosics (CPros) and congenital phonagnosics (CPhon) are generally unable to imagine familiar faces and voices, respectively. Only in CPros, however, is this deficit a manifestation of a general inability to form visual images of any kind. CPhons report no deficit in imaging non-voice sounds. 4. The prevalence of CPhons of 3.2% is somewhat higher than the reported prevalence of approximately 2.0% for CPros in the population. There is evidence that CPhon represents a distinct condition statistically and not just normal variation. 5. Face and voice recognition proficiency are uncorrelated rather than reflecting limitations of a general capacity for person individuation. Copyright © 2018 Elsevier Ltd. All rights reserved.
Are subject-specific musculoskeletal models robust to the uncertainties in parameter identification?

Directory of Open Access Journals (Sweden)

Giordano Valente

Full Text Available Subject-specific musculoskeletal modeling can be applied to study musculoskeletal disorders, allowing inclusion of personalized anatomy and properties. Independent of the tools used for model creation, there are unavoidable uncertainties associated with parameter identification, whose effect on model predictions is still not fully understood. The aim of the present study was to analyze the sensitivity of subject-specific model predictions (i.e., joint angles, joint moments, muscle and joint contact forces during walking) to the uncertainties in the identification of body landmark positions, maximum muscle tension and musculotendon geometry. To this aim, we created an MRI-based musculoskeletal model of the lower limbs, defined as a 7-segment, 10-degree-of-freedom articulated linkage, actuated by 84 musculotendon units. We then performed a Monte-Carlo probabilistic analysis perturbing model parameters according to their uncertainty, and solving a typical inverse dynamics and static optimization problem using 500 models that included the different sets of perturbed variable values. Model creation and gait simulations were performed by using freely available software that we developed to standardize the process of model creation, integrate with OpenSim and create probabilistic simulations of movement. The uncertainties in input variables had a moderate effect on model predictions, as muscle and joint contact forces showed maximum standard deviation of 0.3 times body-weight and maximum range of 2.1 times body-weight. In addition, the output variables significantly correlated with few input variables (up to 7 out of 312) across the gait cycle, including the geometry definition of larger muscles and the maximum muscle tension in limited gait portions. Although we found subject-specific models not markedly sensitive to parameter identification, researchers should be aware of the model precision in relation to the intended application. In fact, force
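The Monte-Carlo procedure described in this abstract (perturb the inputs according to their uncertainty, rerun the model, summarize the spread of the outputs) can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_model`, the parameter names, and the relative standard deviations are hypothetical stand-ins, not the authors' musculoskeletal model or their software, and Gaussian perturbation is an assumed choice.

```python
import random
import statistics


def toy_model(params):
    # Hypothetical stand-in for one inverse-dynamics / static-optimization run:
    # returns a single "force" expressed in multiples of body weight.
    return params["max_tension"] * params["moment_arm"] / params["body_weight"]


def monte_carlo(nominal, rel_sd, n_models=500, seed=0):
    """Perturb each parameter with Gaussian noise and summarize the outputs.

    nominal: dict of nominal parameter values (hypothetical names)
    rel_sd:  dict of relative standard deviations, one per parameter
    Returns (mean, standard deviation, range) of the model output.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    outputs = []
    for _ in range(n_models):
        perturbed = {k: rng.gauss(v, rel_sd[k] * v) for k, v in nominal.items()}
        outputs.append(toy_model(perturbed))
    return statistics.mean(outputs), statistics.stdev(outputs), max(outputs) - min(outputs)


if __name__ == "__main__":
    nominal = {"max_tension": 60.0, "moment_arm": 0.05, "body_weight": 700.0}
    rel_sd = {"max_tension": 0.10, "moment_arm": 0.05, "body_weight": 0.0}
    mean, sd, spread = monte_carlo(nominal, rel_sd)
    print(f"mean={mean:.5f} sd={sd:.5f} range={spread:.5f}")
```

The standard deviation and range returned here correspond to the two summary statistics the abstract reports for muscle and joint contact forces.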
Robust technology and system for management of sucker rod pumping units in oil wells

Science.gov (United States)

Aliev, T. A.; Rzayev, A. H.; Guluyev, G. A.; Alizada, T. A.; Rzayeva, N. E.

2018-01-01

We propose a technology for calculating the robust, normalized correlation functions of the signal from the force sensor on the rod string attached to the hanger of the sucker rod pumping unit. The robust normalized correlation functions are used to form sets of informative attribute combinations, each of which corresponds to a technical condition of the sucker rod pumping unit. We demonstrate how these sets can be used to solve identification and management problems in the oil production process in real time using inexpensive controllers. The results obtained from using the system on real objects are also presented in this paper. It was determined that the energy saved and prolonged overhaul period substantially increased the cost-effectiveness.
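For orientation, a plain normalized autocorrelation estimate of a sampled force signal can be computed as below. This is a minimal sketch of the ordinary (biased) estimator only; the noise-robust variant proposed by the authors is not specified in the abstract and is not reproduced here. The function name and the synthetic test signal are assumptions for illustration.

```python
import math


def normalized_autocorrelation(signal, max_lag):
    """Return r[0..max_lag], the biased autocovariance divided by the variance.

    r[0] is 1 by construction; a periodic signal gives peaks at its period.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        raise ValueError("signal is constant; correlation is undefined")
    result = []
    for lag in range(max_lag + 1):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        result.append(cov / variance)
    return result


if __name__ == "__main__":
    # Synthetic stand-in for a pumping-cycle force signal: period of 20 samples.
    force = [math.sin(2 * math.pi * i / 20) for i in range(200)]
    r = normalized_autocorrelation(force, 20)
    print([round(v, 2) for v in r[::5]])
```

Values of the function at selected lags could then serve as the kind of informative attributes the abstract mentions, although which lags and combinations the authors use is not stated.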

The Halo surrounding native English speaker teachers in Indonesia

Directory of Open Access Journals (Sweden)

Angga Kramadibrata

2016-01-01

Full Text Available The Native Speaker Fallacy, a commonly held belief that Native English Speaker Teachers (NESTs) are inherently better than Non-NESTs, has long been questioned by ELT researchers. However, this belief still stands strong in the general public. This research looks to understand how much a teacher’s nativeness affects a student’s attitude towards them, as well as the underlying reasons for their attitudes. Sixty-seven respondents in two groups were asked to watch an animated teaching video, after which they completed a questionnaire that used Likert-scales to assess comprehensibility, clarity of explanation, engagement, and preference. The videos for both groups were identical apart from the narrator; one spoke in British English, while the other, Indian English. In addition, they were also visually identified as Caucasian and Asian, respectively. The video was controlled for speed of delivery. The quantitative data were then triangulated using qualitative data collected through open questions in the questionnaire as well as from a semi-structured interview conducted with 10 respondents. The data show that there is a significant implicit preference for NEST teachers in the video, as well as in respondents’ actual classes. However, when asked explicitly, respondents didn’t rank nativeness as a very important quality in English teachers. This discrepancy between implicit and explicit attitudes might be due to a subconscious cognitive bias, namely the Halo Effect, in which humans tend to make unjustified presumptions about a person based on known but irrelevant information.
Spanish Is Foreign: Heritage Speakers' Interpretations of the Introductory Spanish Language Curriculum

Science.gov (United States)

DeFeo, Dayna Jean

2015-01-01

This article presents a case study of the perceptions of Spanish heritage speakers enrolled in introductory-level Spanish foreign language courses. Despite their own identities that were linked to the United States and Spanish of the Borderlands, the participants felt that the curriculum acknowledged the Spanish of Spain and foreign countries but…
Lexical and grammatical development in trilingual speakers of isiXhosa, English and Afrikaans

Directory of Open Access Journals (Sweden)

Anneke P. Potgieter

2016-05-01

Full Text Available Background: There is a dearth of normative data on linguistic development among child speakers of Southern African languages, especially in the case of the multilingual children who constitute the largest part of this population. This inevitably impacts on the accuracy of developmental assessments of such speakers. Already negative lay opinion on the effect of early multilingualism on language development rates could be exacerbated by the lack of developmental data, ultimately affecting choices regarding home and school language policies. Objectives: To establish whether trilinguals necessarily exhibit developmental delay when compared to monolinguals and, if so, whether this delay (1) occurs in terms of both lexical and grammatical development; and (2) in all three the trilinguals’ languages, regardless of input quantity. Method: Focusing on isiXhosa, South African English and Afrikaans, the study involved a comparison of 11 four-year-old developing trilinguals’ acquisition of vocabulary and passive constructions with that of 10 age-matched monolingual speakers of each language. Results: The trilinguals proved to be monolingual-like in their lexical development in the language to which, on average, they had been exposed most over time, that is, isiXhosa. No developmental delay was found in the trilinguals’ acquisition of passive constructions, regardless of the language of testing. Conclusion: As previously found for bilingual development, necessarily reduced quantity of exposure does not hinder lexical development in the trilinguals’ input dominant language. The overall lack of delay in their acquisition of the passive is interpreted as possible evidence of cross-linguistic bootstrapping and support for early multilingual exposure.
Lexical and grammatical development in trilingual speakers of isiXhosa, English and Afrikaans

Science.gov (United States)

2016-01-01

Background: There is a dearth of normative data on linguistic development among child speakers of Southern African languages, especially in the case of the multilingual children who constitute the largest part of this population. This inevitably impacts on the accuracy of developmental assessments of such speakers. Already negative lay opinion on the effect of early multilingualism on language development rates could be exacerbated by the lack of developmental data, ultimately affecting choices regarding home and school language policies. Objectives: To establish whether trilinguals necessarily exhibit developmental delay when compared to monolinguals and, if so, whether this delay (1) occurs in terms of both lexical and grammatical development; and (2) in all three the trilinguals’ languages, regardless of input quantity. Method: Focusing on isiXhosa, South African English and Afrikaans, the study involved a comparison of 11 four-year-old developing trilinguals’ acquisition of vocabulary and passive constructions with that of 10 age-matched monolingual speakers of each language. Results: The trilinguals proved to be monolingual-like in their lexical development in the language to which, on average, they had been exposed most over time, that is, isiXhosa. No developmental delay was found in the trilinguals’ acquisition of passive constructions, regardless of the language of testing. Conclusion: As previously found for bilingual development, necessarily reduced quantity of exposure does not hinder lexical development in the trilinguals’ input dominant language. The overall lack of delay in their acquisition of the passive is interpreted as possible evidence of cross-linguistic bootstrapping and support for early multilingual exposure. PMID:27245133
Lexical and grammatical development in trilingual speakers of isiXhosa, English and Afrikaans.

Science.gov (United States)

Potgieter, Anneke P

2016-05-20

There is a dearth of normative data on linguistic development among child speakers of Southern African languages, especially in the case of the multilingual children who constitute the largest part of this population. This inevitably impacts on the accuracy of developmental assessments of such speakers. Already negative lay opinion on the effect of early multilingualism on language development rates could be exacerbated by the lack of developmental data, ultimately affecting choices regarding home and school language policies. To establish whether trilinguals necessarily exhibit developmental delay when compared to monolinguals and, if so, whether this delay (1) occurs in terms of both lexical and grammatical development; and (2) in all three the trilinguals' languages, regardless of input quantity. Focusing on isiXhosa, South African English and Afrikaans, the study involved a comparison of 11 four-year-old developing trilinguals' acquisition of vocabulary and passive constructions with that of 10 age-matched monolingual speakers of each language. The trilinguals proved to be monolingual-like in their lexical development in the language to which, on average, they had been exposed most over time, that is, isiXhosa. No developmental delay was found in the trilinguals' acquisition of passive constructions, regardless of the language of testing. As previously found for bilingual development, necessarily reduced quantity of exposure does not hinder lexical development in the trilinguals' input dominant language. The overall lack of delay in their acquisition of the passive is interpreted as possible evidence of cross-linguistic bootstrapping and support for early multilingual exposure.
Characterizing opto-electret based paper speakers by using a real-time projection Moiré metrology system

Science.gov (United States)

Chang, Ya-Ling; Hsu, Kuan-Yu; Lee, Chih-Kung

2016-03-01

Advancement of distributed piezo-electret sensors and actuators facilitates various smart systems development, which include paper speakers, opto-piezo/electret bio-chips, etc. The array-based loudspeaker system possesses several advantages over conventional coil speakers, such as light weight, flexibility, low power consumption, directivity, etc. With the understanding that the performance of the large-area piezo-electret loudspeakers or even the microfluidic biochip transport behavior could be tailored by changing their dynamic behaviors, a full-field real-time high-resolution non-contact metrology system was developed. In this paper, the influence of the resonance modes and the transient vibrations of an array-based loudspeaker system on the acoustic effect was measured by using a real-time projection moiré metrology system and microphones. To make the paper speaker even more versatile, we combine the photosensitive material TiOPc into the original electret loudspeaker. The vibration of this newly developed opto-electret loudspeaker could be manipulated by illuminating different light-intensity patterns. To facilitate the tailoring process of the opto-electret loudspeaker, projection moiré was adopted to measure its vibration. By recording the projected fringes which are modulated by the contours of the testing sample, the phase unwrapping algorithm can give us a continuous phase distribution which is proportional to the object height variations. With the aid of the projection moiré metrology system, the vibrations associated with each distinctive light pattern could be characterized. Therefore, we expect that the overall acoustic performance could be improved by finding the suitable illuminating patterns. In this manuscript, the system performance of the projection moiré and the opto-electret paper speakers were cross-examined and verified by the experimental results obtained.
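The phase-unwrapping step mentioned in this abstract can be illustrated in one dimension. The sketch below uses the standard neighbor-difference method (wrap each successive difference into [-π, π) and accumulate); the abstract does not say which unwrapping algorithm the authors used, and the `height_per_radian` calibration constant is a hypothetical parameter standing in for the system-specific phase-to-height factor.

```python
import math


def unwrap_phase(wrapped):
    """Remove 2*pi jumps from a 1-D sequence of wrapped phase values."""
    if not wrapped:
        return []
    unwrapped = [wrapped[0]]
    for prev, cur in zip(wrapped, wrapped[1:]):
        delta = cur - prev
        # Map the step into [-pi, pi) so that a 2*pi jump becomes a small step.
        delta = (delta + math.pi) % (2 * math.pi) - math.pi
        unwrapped.append(unwrapped[-1] + delta)
    return unwrapped


def phase_to_height(phase, height_per_radian):
    # height_per_radian is a hypothetical calibration constant.
    return [height_per_radian * p for p in phase]


if __name__ == "__main__":
    true_phase = [0.3 * i for i in range(40)]                  # smooth ramp
    wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]
    recovered = unwrap_phase(wrapped)
    print(max(abs(a - b) for a, b in zip(true_phase, recovered)))
```

The method only recovers the true phase when neighboring samples differ by less than π, which is why fringe density and sampling resolution matter in practice.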
Measures of speech rhythm and the role of corpus-based word frequency: a multifactorial comparison of Spanish(-English) speakers

Directory of Open Access Journals (Sweden)

Michael J. Harris

2011-12-01

Full Text Available In this study, we address various measures that have been employed to distinguish between syllable and stress-timed languages. This study differs from all previous ones by (i) exploring and comparing multiple metrics within a quantitative and multifactorial perspective and by (ii) also documenting the impact of corpus-based word frequency. We begin with the basic distinctions of speech rhythms, dealing with the differences between syllable-timed languages and stress-timed languages and several methods that have been used to attempt to distinguish between the two. We then describe how these metrics were used in the current study comparing the speech rhythms of Mexican Spanish speakers and bilingual English/Spanish speakers (speakers born to Mexican parents in California). More specifically, we evaluate how well various metrics of vowel duration variability, as well as the so far understudied factor of corpus-based frequency, allow speakers to be classified as monolingual or bilingual. A binary logistic regression identifies several main effects and interactions. Most importantly, our results call the utility of a particular rhythm metric, the PVI, into question and indicate that corpus data in the form of lemma frequencies interact with two metrics of durational variability, suggesting that durational variability metrics should ideally be studied in conjunction with corpus-based frequency data.
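For readers unfamiliar with the PVI (Pairwise Variability Index), the commonly used normalized form averages the difference between successive interval durations, each scaled by the mean of the pair. The sketch below implements that general formula; the abstract does not state which PVI variant or which vowel-duration measurements the study used, so the function name and the sample durations are assumptions for illustration.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index of successive durations.

    nPVI = 100 * mean(|d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2))
    Higher values mean more alternation between long and short intervals.
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)


if __name__ == "__main__":
    even = [100, 100, 100, 100]          # hypothetical vowel durations in ms
    alternating = [50, 150, 50, 150]
    print(npvi(even), npvi(alternating))
```

Evenly timed vowels give a value of 0, while strongly alternating durations give a high value, which is the contrast the metric was designed to capture between syllable-timed and stress-timed speech.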
Using perturbed handwriting to support writer identification in the presence of severe data constraints

Science.gov (United States)

Chen, Jin; Cheng, Wen; Lopresti, Daniel

2011-01-01

Since real data is time-consuming and expensive to collect and label, researchers have proposed approaches using synthetic variations for the tasks of signature verification, speaker authentication, handwriting recognition, keyword spotting, etc. However, the limitation of real data is particularly critical in the field of writer identification in that in forensics, adversaries cannot be expected to provide sufficient data to train a classifier. Therefore, it is unrealistic to always assume sufficient real data to train classifiers extensively for writer identification. In addition, this field differs from many others in that we strive to preserve as much inter-writer variation as possible, but model-perturbed handwriting might break such discriminability among writers. Building on work described in another paper where human subjects were involved in calibrating realistic-looking transformation, we then measured the effects of incorporating perturbed handwriting into the training dataset. Experimental results justified our hypothesis that with limited real data, model-perturbed handwriting improved the performance of writer identification. Particularly, if only one single sample for each writer was available, incorporating perturbed data achieved a 36x performance gain.
Topic Continuity in Informal Conversations between Native and Non-Native Speakers of English

Science.gov (United States)

Morris-Adams, Muna

2013-01-01

Topic management by non-native speakers (NNSs) during informal conversations has received comparatively little attention from researchers, and receives surprisingly little attention in second language learning and teaching. This article reports on one of the topic management strategies employed by international students during informal, social…
Robust multi-model predictive control of multi-zone thermal plate system

Directory of Open Access Journals (Sweden)

Poom Jatunitanon

2018-02-01

Full Text Available A modern controller was designed by using the mathematical model of a multi–zone thermal plate system. An important requirement for this type of controller is that it must be able to keep the temperature set-point of each thermal zone. The mathematical model used in the design was determined through a system identification process. The results showed that when the operating condition is changed, the performance of the controller may be reduced as a result of the system parameter uncertainties. This paper proposes a weighting technique of combining the robust model predictive controller for each operating condition into a single robust multi-model predictive control. Simulation and experimental results showed that the proposed method performed better than the conventional multi-model predictive control in rise time of transient response, when used in a system designed to work over a wide range of operating conditions.
Ambiguity attacks on robust blind image watermarking scheme based on redundant discrete wavelet transform and singular value decomposition

Directory of Open Access Journals (Sweden)

Khaled Loukhaoukha

2017-12-01

Full Text Available Among emergent applications of digital watermarking are copyright protection and proof of ownership. Recently, Makbol and Khoo (2013) have proposed for these applications a new robust blind image watermarking scheme based on the redundant discrete wavelet transform (RDWT) and the singular value decomposition (SVD). In this paper, we present two ambiguity attacks on this algorithm which show that it fails when used for applications such as owner identification, proof of ownership, and transaction tracking. Keywords: Ambiguity attack, Image watermarking, Singular value decomposition, Redundant discrete wavelet transform
An adaptive deep learning approach for PPG-based identification.

Science.gov (United States)

Jindal, V; Birjandtalab, J; Pouyan, M Baran; Nourani, M

2016-08-01

Wearable biosensors have become increasingly popular in healthcare due to their capabilities for low cost and long term biosignal monitoring. This paper presents a novel two-stage technique to offer biometric identification using these biosensors through Deep Belief Networks and Restricted Boltzmann Machines. Our identification approach improves robustness in current monitoring procedures within clinical, e-health and fitness environments using Photoplethysmography (PPG) signals through deep learning classification models. The approach is tested on TROIKA dataset using 10-fold cross validation and achieved an accuracy of 96.1%.
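The 10-fold cross validation mentioned here follows a generic recipe: shuffle the samples, cut them into ten folds, and let each fold serve once as the test set while the rest form the training set. The sketch below shows only that index bookkeeping; the function name, the shuffling, and the seed are assumptions, and nothing about the authors' actual data split or classifier is implied.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # fold sizes differ by at most 1
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

The reported accuracy in such a protocol is typically the average of the per-fold test accuracies.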
Methods for robustness programming

NARCIS (Netherlands)

Olieman, N.J.

2008-01-01

Robustness of an object is defined as the probability that an object will have properties as required. Robustness Programming (RP) is a mathematical approach for Robustness estimation and Robustness optimisation. An example in the context of designing a food product, is finding the best composition
Wavelet Packet Entropy in Speaker-Independent Emotional State Detection from Speech Signal

Directory of Open Access Journals (Sweden)

Mina Kadkhodaei Elyaderani

2015-01-01

Full Text Available In this paper, wavelet packet entropy is proposed for speaker-independent emotion detection from speech. After pre-processing, wavelet packet decomposition using wavelet type db3 at level 4 is calculated and Shannon entropy in its nodes is calculated to be used as feature. In addition, prosodic features such as first four formants, jitter or pitch deviation amplitude, and shimmer or energy variation amplitude besides MFCC features are applied to complete the feature vector. Then, Support Vector Machine (SVM) is used to classify the vectors in multi-class (all emotions) or two-class (each emotion versus normal state) format. 46 different utterances of a single sentence from Berlin Emotional Speech Dataset are selected. These are uttered by 10 speakers in sadness, happiness, fear, boredom, anger, and normal emotional state. Experimental results show that the proposed features can improve emotional state detection accuracy in the multi-class situation. Furthermore, adding wavelet entropy coefficients to the other features increases the accuracy of two-class detection for anger, fear, and happiness.
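The feature described here (a Shannon entropy per wavelet packet node at level 4) can be sketched with the standard library alone. Two substitutions are made and should be read as assumptions: the Haar wavelet replaces the db3 wavelet used in the paper to keep the example dependency-free, and the entropy is computed over normalized coefficient energies, which is one common definition; the paper's exact normalization is not given in the abstract.

```python
import math


def haar_split(x):
    """One Haar analysis step: return (approximation, detail) coefficients."""
    s = math.sqrt(2)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_nodes(x, level):
    """Full packet tree: split every node at every level (2**level leaves)."""
    nodes = [list(x)]
    for _ in range(level):
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes


def shannon_entropy(coeffs):
    """Entropy of the normalized energy distribution of one node."""
    energies = [c * c for c in coeffs]
    total = sum(energies)
    if total == 0:
        return 0.0
    return -sum((e / total) * math.log(e / total) for e in energies if e > 0)


if __name__ == "__main__":
    frame = [math.sin(0.2 * i) + 0.5 * math.sin(1.3 * i) for i in range(64)]
    features = [shannon_entropy(n) for n in wavelet_packet_nodes(frame, 4)]
    print(len(features))
```

At level 4 the full packet tree has 16 leaf nodes, so this yields 16 entropy values per frame that could be appended to the prosodic and MFCC features before classification.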
Communicating with the crowd: speakers use abstract messages when addressing larger audiences.

Science.gov (United States)

Joshi, Priyanka D; Wakslak, Cheryl J

2014-02-01

Audience characteristics often shape communicators' message framing. Drawing from construal level theory, we suggest that when speaking to many individuals, communicators frame messages in terms of superordinate characteristics that focus attention on the essence of the message. On the other hand, when communicating with a single individual, communicators increasingly describe events and actions in terms of their concrete details. Using different communication tasks and measures of construal, we show that speakers communicating with many individuals, compared with 1 person, describe events more abstractly (Study 1), describe themselves as more trait-like (Study 2), and use more desirability-related persuasive messages (Study 3). Furthermore, speakers' motivation to communicate with their audience moderates their tendency to frame messages based on audience size (Studies 3 and 4). This audience-size abstraction effect is eliminated when a large audience is described as homogeneous, suggesting that people use abstract construal strategically in order to connect across a disparate group of individuals (Study 5). Finally, we show that participants' experienced fluency in communication is influenced by the match between message abstraction and audience size (Study 6).
Sequential blind identification of underdetermined mixtures using a novel deflation scheme.

Science.gov (United States)

Zhang, Mingjian; Yu, Simin; Wei, Gang

2013-09-01

In this brief, we consider the problem of blind identification in underdetermined instantaneous mixture cases, where there are more sources than sensors. A new blind identification algorithm, which estimates the mixing matrix in a sequential fashion, is proposed. By using the rank-1 detecting device, blind identification is reformulated as a constrained optimization problem. The identification of one column of the mixing matrix hence reduces to an optimization task for which an efficient iterative algorithm is proposed. The identification of the other columns of the mixing matrix is then carried out by a generalized eigenvalue decomposition-based deflation method. The key merit of the proposed deflation method is that it does not suffer from error accumulation. The proposed sequential blind identification algorithm provides more flexibility and better robustness than its simultaneous counterpart. Comparative simulation results demonstrate the superior performance of the proposed algorithm over the simultaneous blind identification algorithm.
An Evaluation of Native-speaker Judgements of Foreign-accented British and American English

NARCIS (Netherlands)

Doel, W.Z. van den

2006-01-01

This study is the first ever to employ a large-scale Internet survey to investigate priorities in English pronunciation training. Well over 500 native speakers from throughout the English-speaking world, including North America, the British Isles, Australia and New Zealand, were asked to detect and
Robustness of Structural Systems

DEFF Research Database (Denmark)

Canisius, T.D.G.; Sørensen, John Dalsgaard; Baker, J.W.

2007-01-01

The importance of robustness as a property of structural systems has been recognised following several structural failures, such as that at Ronan Point in 1968, where the consequences were deemed unacceptable relative to the initiating damage. A variety of research efforts in the past decades have...... attempted to quantify aspects of robustness such as redundancy and identify design principles that can improve robustness. This paper outlines the progress of recent work by the Joint Committee on Structural Safety (JCSS) to develop comprehensive guidance on assessing and providing robustness in structural...... systems. Guidance is provided regarding the assessment of robustness in a framework that considers potential hazards to the system, vulnerability of system components, and failure consequences. Several proposed methods for quantifying robustness are reviewed, and guidelines for robust design...
Participation of Second Language and Second Dialect Speakers in the Legal System.

Science.gov (United States)

Eades, Diana

2003-01-01

Overviews current theory and practice and research on second language and second dialect speakers and the language of the law. Suggests most of the studies on the topic have analyzed language in courtrooms, where access to data is much easier than in other legal settings, such as police interviews, mediation sessions, or lawyer-client interviews.…
Neural Control of Rising and Falling Tones in Mandarin Speakers Who Stutter

Science.gov (United States)

Howell, Peter; Jiang, Jing; Peng, Danling; Lu, Chunming

2012-01-01

Neural control of rising and falling tones in Mandarin people who stutter (PWS) was examined by comparing with that which occurs in fluent speakers [Howell, Jiang, Peng, and Lu (2012). Neural control of fundamental frequency rise and fall in Mandarin tones. "Brain and Language, 121"(1), 35-46]. Nine PWS and nine controls were scanned. Functional…

Robustness in laying hens

NARCIS (Netherlands)

Star, L.

2008-01-01

The aim of the project ‘The genetics of robustness in laying hens’ was to investigate nature and regulation of robustness in laying hens under sub-optimal conditions and the possibility to increase robustness by using animal breeding without loss of production. At the start of the project, a robust
Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points

Directory of Open Access Journals (Sweden)

Ettore Marubini

2014-01-01

Full Text Available This paper presents a robust two-stage procedure for identification of outlying observations in regression analysis. The exploratory stage identifies leverage points and vertical outliers through a robust distance estimator based on Minimum Covariance Determinant (MCD). After deletion of these points, the confirmatory stage carries out an Ordinary Least Squares (OLS) analysis on the remaining subset of data and investigates the effect of adding back in the previously deleted observations. Cut-off points pertinent to different diagnostics are generated by bootstrapping and the cases are definitely labelled as good-leverage, bad-leverage, vertical outliers and typical cases. The procedure is applied to four examples.
Migration and interaction in a contact zone: mtDNA variation among Bantu-speakers in Southern Africa.

Directory of Open Access Journals (Sweden)

Chiara Barbieri

Full Text Available Bantu speech communities expanded over large parts of sub-Saharan Africa within the last 4000-5000 years, reaching different parts of southern Africa 1200-2000 years ago. The Bantu languages subdivide in several major branches, with languages belonging to the Eastern and Western Bantu branches spreading over large parts of Central, Eastern, and Southern Africa. There is still debate whether this linguistic divide is correlated with a genetic distinction between Eastern and Western Bantu speakers. During their expansion, Bantu speakers would have come into contact with diverse local populations, such as the Khoisan hunter-gatherers and pastoralists of southern Africa, with whom they may have intermarried. In this study, we analyze complete mtDNA genome sequences from over 900 Bantu-speaking individuals from Angola, Zambia, Namibia, and Botswana to investigate the demographic processes at play during the last stages of the Bantu expansion. Our results show that most of these Bantu-speaking populations are genetically very homogenous, with no genetic division between speakers of Eastern and Western Bantu languages. Most of the mtDNA diversity in our dataset is due to different degrees of admixture with autochthonous populations. Only the pastoralist Himba and Herero stand out due to high frequencies of particular L3f and L3d lineages; the latter are also found in the neighboring Damara, who speak a Khoisan language and were foragers and small-stock herders. In contrast, the close cultural and linguistic relatives of the Herero and Himba, the Kuvale, are genetically similar to other Bantu-speakers. Nevertheless, as demonstrated by resampling tests, the genetic divergence of Herero, Himba, and Kuvale is compatible with a common shared ancestry with high levels of drift, while the similarity of the Herero, Himba, and Damara probably reflects admixture, as also suggested by linguistic analyses.
Impact of Industry Guest Speakers on Business Students' Perceptions of Employability Skills Development

Science.gov (United States)

Riebe, L.; Sibson, R.; Roepen, D.; Meakins, K.

2013-01-01

This study provides insights into the perceptions and expectations of Australian undergraduate business students (n=150) regarding the incorporation of guest speakers into the curriculum of a leadership unit focused on employability skills development. The authors adopted a mixed methods approach. A survey was conducted, with quantitative results…
Non-Native Speakers of the Language of Instruction: Self-Perceptions of Teaching Ability

Science.gov (United States)

Samuel, Carolyn

2017-01-01

Given the linguistically diverse instructor and student populations at Canadian universities, mutually comprehensible oral language may not be a given. Indeed, both instructors who are non-native speakers of the language of instruction (NNSLIs) and students have acknowledged oral communication challenges. Little is known, though, about how the…
Detection of heart beats in multimodal data: a robust beat-to-beat interval estimation approach.

Science.gov (United States)

Antink, Christoph Hoog; Brüser, Christoph; Leonhardt, Steffen

2015-08-01

The heart rate and its variability play a vital role in the continuous monitoring of patients, especially in the critical care unit. They are commonly derived automatically from the electrocardiogram as the interval between consecutive heart beats. While their identification by QRS-complexes is straightforward under ideal conditions, the exact localization can be a challenging task if the signal is severely contaminated with noise and artifacts. At the same time, other signals directly related to cardiac activity are often available. In this multi-sensor scenario, methods of multimodal sensor-fusion allow the exploitation of redundancies to increase the accuracy and robustness of beat detection. In this paper, an algorithm for the robust detection of heart beats in multimodal data is presented. Classic peak-detection is augmented by robust multi-channel, multimodal interval estimation to eliminate false detections and insert missing beats. This approach yielded a score of 90.70 and was thus ranked third place in the PhysioNet/Computing in Cardiology Challenge 2014: Robust Detection of Heart Beats in Multimodal Data follow-up analysis. In the future, the robust beat-to-beat interval estimator may directly be used for the automated processing of multimodal patient data for applications such as diagnosis support and intelligent alarming.
Pitch perception and production in congenital amusia: Evidence from Cantonese speakers

OpenAIRE

Liu, Fang; Chan, Alice H. D.; Ciocca, Valter; Roquet, Catherine; Peretz, Isabelle; Wong, Patrick C. M.

2016-01-01

This study investigated pitch perception and production in speech and music in individuals with congenital amusia (a disorder of musical pitch processing) who are native speakers of Cantonese, a tone language with a highly complex tonal system. Sixteen Cantonese-speaking congenital amusics and 16 controls performed a set of lexical tone perception, production, singing, and psychophysical pitch threshold tasks. Their tone production accuracy and singing proficiency were subsequently judged by ...
Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.

Science.gov (United States)

Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu

2018-05-01

Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes, in a principled way, speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset that contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the-art diarization algorithms.
The immediate and chronic influence of spatio-temporal metaphors on the mental representations of time in English, Mandarin, and Mandarin-English speakers

Directory of Open Access Journals (Sweden)

Vicky T. Lai

2013-04-01

Full Text Available In this paper we examine whether experience with spatial metaphors for time has an influence on people’s representation of time. In particular we ask whether spatiotemporal metaphors can have both chronic and immediate effects on temporal thinking. In Study 1, we examine the prevalence of ego-moving representations for time in Mandarin speakers, English speakers, and Mandarin-English (ME) bilinguals. As predicted by observations in linguistic analyses, we find that Mandarin speakers are less likely to take an ego-moving perspective than are English speakers. Further, we find that ME bilinguals tested in English are less likely to take an ego-moving perspective than are English monolinguals (an effect of L1 on meaning-making in L2), and also that ME bilinguals tested in Mandarin are more likely to take an ego-moving perspective than are Mandarin monolinguals (an effect of L2 on meaning-making in L1). These findings demonstrate that habits of metaphor use in one language can influence temporal reasoning in another language, suggesting the metaphors can have a chronic effect on patterns in thought. In Study 2 we test Mandarin speakers using either horizontal or vertical metaphors in the immediate context of the task. We find that Mandarin speakers are more likely to construct front-back representations of time when understanding front-back metaphors, and more likely to construct up-down representations of time when understanding up-down metaphors. These findings demonstrate that spatiotemporal metaphors can also have an immediate influence on temporal reasoning. Taken together, these findings demonstrate that the metaphors we use to talk about time have both immediate and long-term consequences for how we conceptualize and reason about this fundamental domain of experience.
Be My Guest: A Survey of Mass Communication Students' Perception of Guest Speakers

Science.gov (United States)

Merle, Patrick F.; Craig, Clay

2017-01-01

The use of guest speakers as a pedagogical technique across disciplines at the college level is hardly novel. However, empirical assessment of journalism and mass communication students' perceptions of this practice has not previously been conducted. To fill this gap, this article presents results from an online survey specifically administered to…
A Nonword Repetition Task for Speakers with Misarticulations: The Syllable Repetition Task (SRT)

Science.gov (United States)

Shriberg, Lawrence D.; Lohmeier, Heather L.; Campbell, Thomas F.; Dollaghan, Christine A.; Green, Jordan R.; Moore, Christopher A.

2009-01-01

Purpose: Conceptual and methodological confounds occur when non(sense) word repetition tasks are administered to speakers who do not have the target speech sounds in their phonetic inventories or who habitually misarticulate targeted speech sounds. In this article, the authors (a) describe a nonword repetition task, the Syllable Repetition Task…
STUDENTS WRITING EMAILS TO FACULTY: AN EXAMINATION OF E-POLITENESS AMONG NATIVE AND NON-NATIVE SPEAKERS OF ENGLISH

Directory of Open Access Journals (Sweden)

Sigrun Biesenbach-Lucas

2007-02-01

Full Text Available This study combines interlanguage pragmatics and speech act research with computer-mediated communication and examines how native and non-native speakers of English formulate low- and high-imposition requests to faculty. While some research claims that email, due to absence of non-verbal cues, encourages informal language, other research has claimed the opposite. However, email technology also allows writers to plan and revise messages before sending them, thus affording the opportunity to edit not only for grammar and mechanics, but also for pragmatic clarity and politeness. The study examines email requests sent by native and non-native English speaking graduate students to faculty at a major American university over a period of several semesters and applies Blum-Kulka, House, and Kasper’s (1989) speech act analysis framework – quantitatively to distinguish levels of directness, i.e. pragmatic clarity; and qualitatively to compare syntactic and lexical politeness devices, the request perspectives, and the specific linguistic request realization patterns preferred by native and non-native speakers. Results show that far more requests are realized through direct strategies as well as hints than conventionally indirect strategies typically found in comparative speech act studies. Politeness conventions in email, a text-only medium with little guidance in the academic institutional hierarchy, appear to be a work in progress, and native speakers demonstrate greater resources in creating e-polite messages to their professors than non-native speakers. A possible avenue for pedagogical intervention with regard to instruction in and acquisition of politeness routines in hierarchically upward email communication is presented.
Robust Growth Determinants

OpenAIRE

Doppelhofer, Gernot; Weeks, Melvyn

2011-01-01

This paper investigates the robustness of determinants of economic growth in the presence of model uncertainty, parameter heterogeneity and outliers. The robust model averaging approach introduced in the paper uses a flexible and parsimonious mixture modeling that allows for fat-tailed errors compared to the normal benchmark case. Applying robust model averaging to growth determinants, the paper finds that eight out of eighteen variables found to be significantly related to economic growth ...
A Robust and Self-Paced BCI System Based on a Four Class SSVEP Paradigm: Algorithms and Protocols for a High-Transfer-Rate Direct Brain Communication

Directory of Open Access Journals (Sweden)

Sergio Parini

2009-01-01

Full Text Available In this paper, we present, with particular focus on the adopted processing and identification chain and protocol-related solutions, a whole self-paced brain-computer interface system based on a 4-class steady-state visual evoked potentials (SSVEPs) paradigm. The proposed system incorporates an automated spatial filtering technique centred on the common spatial patterns (CSPs) method, an autoscaled and effective signal features extraction which is used for providing an unsupervised biofeedback, and a robust self-paced classifier based on the discriminant analysis theory. The adopted operating protocol is structured in a screening, training, and testing phase aimed at collecting user-specific information regarding best stimulation frequencies, optimal sources identification, and overall system processing chain calibration in only a few minutes. The system, validated on 11 healthy/pathologic subjects, has proven to be reliable in terms of achievable communication speed (up to 70 bit/min) and very robust to false positive identifications.
A fast iterative recursive least squares algorithm for Wiener model identification of highly nonlinear systems.

Science.gov (United States)

Kazemi, Mahdi; Arefi, Mohammad Mehdi

2017-03-01

In this paper, an online identification algorithm is presented for nonlinear systems in the presence of output colored noise. The proposed method is based on the extended recursive least squares (ERLS) algorithm, where the identified system is in polynomial Wiener form. To this end, an unknown intermediate signal is estimated by using an inner iterative algorithm. The iterative recursive algorithm adaptively modifies the vector of parameters of the presented Wiener model when the system parameters vary. In addition, to increase the robustness of the proposed method against variations, a robust RLS algorithm is applied to the model. Simulation results are provided to show the effectiveness of the proposed approach. Results confirm that the proposed method has a fast convergence rate with robust characteristics, which increases the efficiency of the proposed model and identification approach. For instance, a FIT criterion of 92% is achieved for a CSTR process when about 400 data points are used. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
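The abstract reports a FIT value without defining it. A minimal sketch follows, assuming FIT means the normalized-error fit percentage commonly used in system identification, FIT = 100 * (1 - ||y - y_hat|| / ||y - mean(y)||). That definition and the function name `fit_percent` are assumptions, not taken from the paper.

```python
import math

def fit_percent(measured, predicted):
    """Fit percentage: 100 * (1 - ||y - y_hat|| / ||y - mean(y)||).
    Assumption: this common definition matches the paper's FIT criterion."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(measured) / len(measured)
    # Euclidean norm of the prediction error
    err = math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)))
    # Euclidean norm of the measured output around its mean
    spread = math.sqrt(sum((y - mean_y) ** 2 for y in measured))
    if spread == 0.0:
        raise ValueError("measured output is constant; FIT is undefined")
    return 100.0 * (1.0 - err / spread)
```

Under this definition a perfect model scores 100, a model that only predicts the mean of the measured output scores 0, and worse models score below 0.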
SU-F-R-31: Identification of Robust Normal Lung CT Texture Features for the Prediction of Radiation-Induced Lung Disease

Energy Technology Data Exchange (ETDEWEB)

Choi, W; Riyahi, S; Lu, W [University of Maryland School of Medicine, Baltimore, MD (United States)

2016-06-15

Purpose: Normal lung CT texture features have been used for the prediction of radiation-induced lung disease (radiation pneumonitis and radiation fibrosis). For these features to be clinically useful, they need to be relatively invariant (robust) to tumor size and not correlated with normal lung volume. Methods: The free-breathing CTs of 14 lung SBRT patients were studied. Different sizes of GTVs were simulated with spheres placed at the upper lobe and lower lobe respectively in the normal lung (contralateral to tumor). 27 texture features (9 from intensity histogram, 8 from grey-level co-occurrence matrix [GLCM] and 10 from grey-level run-length matrix [GLRM]) were extracted from [normal lung-GTV]. To measure the variability of a feature F, the relative difference D = |Fref - Fsim| / Fref * 100% was calculated, where Fref was for the entire normal lung and Fsim was for [normal lung-GTV]. A feature was considered robust if the largest non-outlier (Q3+1.5*IQR) D was less than 5%, and considered not correlated with normal lung volume when their Pearson correlation was lower than 0.50. Results: Only 11 features were robust. All first-order intensity-histogram features (mean, max, etc.) were robust, while most higher-order features (skewness, kurtosis, etc.) were unrobust. Only two of the GLCM and four of the GLRM features were robust. Larger GTVs resulted in greater feature variation; this was particularly true for unrobust features. All robust features were not correlated with normal lung volume while three unrobust features showed high correlation. Excessive variations were observed in two low grey-level run features and were later identified to be from one patient with local lung diseases (atelectasis) in the normal lung. There was no dependence on GTV location. Conclusion: We identified 11 robust normal lung CT texture features that can be further examined for the prediction of radiation-induced lung disease. Interestingly, low grey-level run features identified normal
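The robustness criterion in the Methods can be sketched as follows. The 5% threshold and the Q3 + 1.5*IQR fence come from the abstract. The quartile method ("inclusive"), the use of |Fref| in the denominator to guard against negative reference values, and all function names are assumptions made for illustration.

```python
import statistics

def relative_difference(f_ref, f_sim):
    """D = |Fref - Fsim| / Fref * 100 (percent).
    Assumption: |Fref| is used so that D stays non-negative for negative features."""
    return abs(f_ref - f_sim) / abs(f_ref) * 100.0

def largest_non_outlier(values):
    """Largest value not exceeding the upper fence Q3 + 1.5 * IQR.
    Assumption: quartiles use the 'inclusive' method; the abstract names none."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    return max(v for v in values if v <= fence)

def is_robust(f_ref, f_sims, threshold=5.0):
    """A feature counts as robust if its largest non-outlier D is below the threshold."""
    diffs = [relative_difference(f_ref, f_sim) for f_sim in f_sims]
    return largest_non_outlier(diffs) < threshold
```

In this sketch, `f_sims` would hold one feature value per simulated GTV. A single extreme value, such as the one patient with atelectasis mentioned in the Results, lands above the fence and does not by itself make a feature unrobust.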
Analysis of error type and frequency in apraxia of speech among Portuguese speakers

Directory of Open Access Journals (Sweden)

Maysa Luchesi Cera

Full Text Available Abstract Most studies characterizing errors in the speech of patients with apraxia involve the English language. Objectives: To analyze the types and frequency of errors produced by patients with apraxia of speech whose mother tongue was Brazilian Portuguese. Methods: 20 adults with apraxia of speech caused by stroke were assessed. The types of error committed by patients were analyzed both quantitatively and qualitatively, and frequencies compared. Results: We observed the presence of substitution, omission, trial-and-error, repetition, self-correction, anticipation, addition, reiteration and metathesis, in descending order of frequency, respectively. Omission type errors were one of the most commonly occurring whereas addition errors were infrequent. These findings differed from those reported in English speaking patients, probably owing to differences in the methodologies used for classifying error types; the inclusion of speakers with apraxia secondary to aphasia; and the difference in the structure of the Portuguese language compared with English in terms of syllable onset complexity and effect on motor control. Conclusions: The frequency of omission and addition errors observed differed from the frequency reported for speakers of English.
Numerical investigation on vibration characteristics of a micro-speaker diaphragm considering thermoforming effects

Energy Technology Data Exchange (ETDEWEB)

Kim, Kyeong Min; Park, Ke Un [Seoul National University of Science and Technology, Seoul (Korea, Republic of)

2013-10-15

Micro-speaker diaphragms play an important role in generating desired sound responses, and are designed to have thin membrane shapes for flexibility in the axial direction. The micro-speaker diaphragms are formed from thin polymer film through the thermoforming process, in which local thickness reductions occur due to strain localization. This thickness reduction results in a change in vibration characteristics of the diaphragm and different sound responses from that of the original design. In this study, the effect of this thickness change in the diaphragm on its vibration characteristics is numerically investigated by coupling thermoforming simulation, structural analysis and modal analysis. Thus, the thickness change in the diaphragm is calculated from the thermoforming simulation, and reflected in the further structural and modal analyses in order to estimate the relevant stiffness and vibration modes. Comparing these simulation results with those from a diaphragm with the uniform thickness, it is found that a local thickness reduction results in the stiffness reduction and the relevant change in the natural frequencies and the corresponding vibration modes.
Speaker box made of composite particle board based on mushroom growing media waste

Science.gov (United States)

Tjahjanti, P. H.; Sutarman, Widodo, E.; Kurniawan, A. R.; Winarno, A. T.; Yani, A.

2017-06-01

This research aimed to use mushroom growing media waste (MGMW), mixed with urea, starch and polyvinyl chloride (PVC) glue, as a composite particle board for the manufacture of speaker boxes. Physical and mechanical testing of the particle board, including density, moisture content, thickness swelling after immersion in water, strength in water absorption, internal bonding, modulus of elasticity, modulus of rupture and screw holding power, was carried out in accordance with the Standar Nasional Indonesia (SNI) 03-2105-2006 and Japanese Industrial Standard (JIS) A 5908-2003. The optimum composition of the composite particle boards was 60% MGMW + 39% (50% urea + 50% starch) + 1% PVC glue. Furthermore, the optimum composition was used to create a speaker box with a hardness value of 14.9 Brinell Hardness Number, and the vibration test gave Z-axis amplitude values with a minimum of 0.032007 and a maximum of 0.151575. For the acoustic test, results showed good sound absorption coefficients at frequencies of 500 Hz and better damping absorption.
