WorldWideScience

Sample records for automatic speaker recognition

  1. Automatic Speaker Recognition for Mobile Forensic Applications

    Directory of Open Access Journals (Sweden)

    Mohammed Algabri

    2017-01-01

    Presently, lawyers, law enforcement agencies, and judges in courts use speech, among other biometric features, to recognize suspects. In general, speaker recognition is used to discriminate people based on their voices. The process of determining whether a suspected speaker is the source of a trace is called forensic speaker recognition. In such applications, the voice samples are most probably noisy, the recording sessions might mismatch each other, the sessions might not contain sufficient material for recognition purposes, and the suspect's voice is recorded through a mobile channel. Identifying a person by voice under forensic-quality conditions is challenging. In this paper, we propose a method for forensic speaker recognition for the Arabic language; the King Saud University Arabic Speech Database is used for obtaining experimental results. The advantage of this database is that each speaker's voice is recorded in both clean and noisy environments, through a microphone and a mobile channel. This diversity facilitates its use in forensic experimentation. Mel-frequency cepstral coefficients (MFCCs) are used for feature extraction, and the Gaussian mixture model-universal background model (GMM-UBM) is used for speaker modeling. Our approach achieves low equal error rates (EER) in noisy environments and with very short test samples.
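    The abstract reports equal error rates (EER), the operating point at which the false-acceptance rate equals the false-rejection rate. A minimal, hypothetical sketch of how an EER can be estimated from lists of genuine (same-speaker) and impostor scores; the threshold sweep and scores below are illustrative, not the paper's actual protocol:

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping a decision threshold over all scores.

    genuine  -- similarity scores for same-speaker trials (higher = closer)
    impostor -- similarity scores for different-speaker trials
    Returns (eer, threshold) at the point where the false-acceptance and
    false-rejection rates are closest to each other.
    """
    best_gap, eer, thr = 2.0, 1.0, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptances
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejections
        if abs(far - frr) < best_gap:
            best_gap, eer, thr = abs(far - frr), (far + frr) / 2, t
    return eer, thr
```

    With perfectly separated scores the EER is 0; overlapping score distributions push it up, which is why noisy, short test samples make the metric a meaningful stress test.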

  2. A system of automatic speaker recognition on a minicomputer

    International Nuclear Information System (INIS)

    El Chafei, Cherif

    1978-01-01

    This study describes a system for automatic speaker recognition based on the pitch of the voice. Pre-processing consists of extracting the speakers' discriminating characteristics from the pitch. The recognition program first performs a preselection and then computes the distance between the characteristics of the speaker to be recognized and those of the speakers already enrolled. A recognition experiment was carried out with 15 speakers and comprised 566 tests spread over an intermittent period of four months. The discriminating characteristics used offer several interesting qualities. The algorithms, both for measuring the characteristics and for classifying the speakers, are simple. The results obtained in real time on a minicomputer are satisfactory; furthermore, they could probably be improved by considering other discriminating speaker characteristics, but this was unfortunately beyond our possibilities. (author)
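    The preselection-then-distance scheme described above can be sketched as follows. This is an illustrative reconstruction, not the author's 1978 program: the speaker references, the pitch-derived feature layout (mean pitch first), and the preselection tolerance are all assumptions:

```python
import math

def identify_speaker(test_vec, enrolled, presel_tol=20.0):
    """Nearest-speaker search with a coarse preselection on mean pitch.

    test_vec -- pitch-derived feature vector of the unknown speaker
    enrolled -- dict mapping speaker name -> reference feature vector
    Preselection keeps only speakers whose first feature (mean pitch, Hz)
    lies within presel_tol of the test value; the full Euclidean distance
    then ranks the remaining candidates.
    """
    candidates = {n: v for n, v in enrolled.items()
                  if abs(v[0] - test_vec[0]) <= presel_tol} or enrolled

    def dist(v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(test_vec, v)))

    return min(candidates, key=lambda n: dist(candidates[n]))
```

    The preselection step narrows the search before the full distance computation, which is what made real-time operation on a 1970s minicomputer plausible.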

  3. Speaker Recognition

    DEFF Research Database (Denmark)

    Mølgaard, Lasse Lohilahti; Jørgensen, Kasper Winther

    2005-01-01

    Speaker recognition is basically divided into speaker identification and speaker verification. Verification is the task of automatically determining if a person really is the person he or she claims to be. This technology can be used as a biometric feature for verifying the identity of a person...

  4. Human and automatic speaker recognition over telecommunication channels

    CERN Document Server

    Fernández Gallardo, Laura

    2016-01-01

    This work addresses the evaluation of human and automatic speaker recognition performance under different channel distortions caused by bandwidth limitation, codecs, and electro-acoustic user interfaces, among other impairments. Its main contribution is demonstrating the benefits of communication channels of extended bandwidth, together with an insight into how speaker-specific characteristics of speech are preserved through different transmissions. It provides sufficient motivation for considering speaker recognition as a criterion in the migration from narrowband to enhanced bandwidths, such as wideband and super-wideband.

  5. Forensic Automatic Speaker Recognition Based on Likelihood Ratio Using Acoustic-phonetic Features Measured Automatically

    Directory of Open Access Journals (Sweden)

    Huapeng Wang

    2015-01-01

    Forensic speaker recognition is experiencing a remarkable paradigm shift in terms of the evaluation framework and the presentation of voice evidence. This paper proposes a new method of forensic automatic speaker recognition using the likelihood ratio framework to quantify the strength of voice evidence. The proposed method uses a reference database to calculate the within- and between-speaker variability. Some acoustic-phonetic features are extracted automatically using the software VoiceSauce. The effectiveness of the approach was tested using two Mandarin databases: a mobile telephone database and a landline database. The experimental results indicate that these acoustic-phonetic features do have some discriminating potential and are worth exploring for discrimination. The automatic acoustic-phonetic features have acceptable discriminative performance and can provide more reliable results in evidence analysis when fused with other kinds of voice features.
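    The likelihood-ratio framework mentioned above can be illustrated with a toy score model. Assuming, purely for illustration, that same-speaker and different-speaker comparison scores each follow a Gaussian (real forensic systems model these distributions from the reference database), the strength of evidence is the ratio of the two densities at the observed score:

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def likelihood_ratio(score, same_mean, same_std, diff_mean, diff_std):
    """LR = p(score | same speaker) / p(score | different speakers).

    LR > 1 supports the same-speaker hypothesis; LR < 1 supports the
    different-speaker hypothesis. The Gaussian score model is an assumption
    of this sketch, not the paper's calibration method.
    """
    return gaussian_pdf(score, same_mean, same_std) / gaussian_pdf(score, diff_mean, diff_std)
```

    A score near the same-speaker distribution yields a large LR; a score near the different-speaker distribution yields an LR below 1.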

  6. Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

    Directory of Open Access Journals (Sweden)

    Umit H. Yapanel

    2008-08-01

    A proven method for achieving effective automatic speech recognition (ASR) in the presence of speaker differences is acoustic feature speaker normalization. More effective speaker normalization methods are needed that require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on the fly within a newly proposed PMVDR acoustic front end. The novel aspect of the algorithm is that conventional front-end processing with PMVDR and VTLN requires two separate warping stages, whereas the proposed BISN method uses a single speaker-dependent warp to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed on (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER) by 24%, and (ii) a diverse noisy-speech task (SPINE 2), where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.
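    A common textbook form of the VTLN frequency warp discussed above is a piecewise-linear mapping of the frequency axis; the breakpoint and Nyquist values below are illustrative assumptions, not parameters taken from the paper:

```python
def vtln_warp(freq, alpha, f_nyquist=8000.0, f0=7000.0):
    """Piecewise-linear VTLN frequency warp (a common textbook form).

    Below the breakpoint f0 the frequency axis is scaled by the warp
    factor alpha; above it, a second linear segment maps the remainder
    so that the Nyquist frequency maps onto itself.
    """
    if freq <= f0:
        return alpha * freq
    # Slope chosen so that f_nyquist -> f_nyquist, keeping the warp invertible.
    slope = (f_nyquist - alpha * f0) / (f_nyquist - f0)
    return alpha * f0 + slope * (freq - f0)
```

    BISN's contribution, per the abstract, is folding a warp of this kind into the PMVDR perceptual warp so only one speaker-dependent warping pass is needed.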

  7. Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

    Directory of Open Access Journals (Sweden)

    Yapanel UmitH

    2008-01-01

    A proven method for achieving effective automatic speech recognition (ASR) in the presence of speaker differences is acoustic feature speaker normalization. More effective speaker normalization methods are needed that require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on the fly within a newly proposed PMVDR acoustic front end. The novel aspect of the algorithm is that conventional front-end processing with PMVDR and VTLN requires two separate warping stages, whereas the proposed BISN method uses a single speaker-dependent warp to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed on (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER) by 24%, and (ii) a diverse noisy-speech task (SPINE 2), where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.

  8. Forensic speaker recognition

    NARCIS (Netherlands)

    Meuwly, Didier

    2013-01-01

    The aim of forensic speaker recognition is to establish links between individuals and criminal activities, through audio speech recordings. This field is multidisciplinary, combining predominantly phonetics, linguistics, speech signal processing, and forensic statistics. On these bases, expert-based

  9. Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?

    Directory of Open Access Journals (Sweden)

    Héctor Delgado

    2015-12-01

    Full Text Available This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.

  10. Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?

    Directory of Open Access Journals (Sweden)

    Héctor Delgado

    2015-06-01

    This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.

  11. An automatic speech recognition system with speaker-independent identification support

    Science.gov (United States)

    Caranica, Alexandru; Burileanu, Corneliu

    2015-02-01

    The novelty of this work lies in the application of an open-source research software toolkit (CMU Sphinx) to train, build, and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to decode offline voice commands on embedded hardware, such as the Raspberry Pi, a low-cost ARMv6 SoC. This type of single-board computer, mainly used for educational and research activities, can serve as a proof of concept for the software and hardware stack of low-cost voice automation systems.

  12. Assessing the Performance of Automatic Speech Recognition Systems When Used by Native and Non-Native Speakers of Three Major Languages in Dictation Workflows

    DEFF Research Database (Denmark)

    Zapata, Julián; Kirkedal, Andreas Søeborg

    2015-01-01

    In this paper, we report on a two-part experiment aiming to assess and compare the performance of two types of automatic speech recognition (ASR) systems on two different computational platforms when used to augment dictation workflows. The experiment was performed with a sample of speakers...

  13. Hybrid Speaker Recognition Using Universal Acoustic Model

    Science.gov (United States)

    Nishimura, Jun; Kuroda, Tadahiro

    We propose a novel speaker recognition approach using a speaker-independent universal acoustic model (UAM) for sensornet applications. In sensornet applications such as “Business Microscope”, interactions among knowledge workers in an organization can be visualized by sensing face-to-face communication using wearable sensor nodes. In conventional studies, speakers are detected by comparing the energy of the input speech signals among the nodes. However, there are often synchronization errors among the nodes, which degrade speaker recognition performance. By focusing on properties of the speaker's acoustic channel, the UAM can provide robustness against such synchronization errors. The overall speaker recognition accuracy is improved by combining the UAM with the energy-based approach. For 0.1 s speech inputs and 4 subjects, a speaker recognition accuracy of 94% is achieved for synchronization errors of less than 100 ms.
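    The energy-based node comparison described above can be sketched in a few lines; the framing and node layout are hypothetical, and a real deployment would compare energies over synchronized windows:

```python
def frame_energy(samples):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in samples) / len(samples)

def active_node(node_signals):
    """Pick the sensor node with the highest short-time energy.

    node_signals -- dict mapping node id -> list of audio samples covering
    the same time window. The speaker is assumed to be closest to the
    loudest node, which is exactly the assumption that synchronization
    errors between nodes can violate.
    """
    return max(node_signals, key=lambda n: frame_energy(node_signals[n]))
```

    The UAM approach in the paper complements this by scoring the acoustic channel itself, which does not depend on the nodes being perfectly time-aligned.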

  14. Analysis of Feature Extraction Methods for Speaker Dependent Speech Recognition

    Directory of Open Access Journals (Sweden)

    Gurpreet Kaur

    2017-02-01

    Speech recognition is about what is being said, irrespective of who is saying it. Speech recognition is a growing field, and major progress is taking place in the technology of automatic speech recognition (ASR). Still, there are many barriers in this field in terms of recognition rate, background noise, speaker variability, speaking rate, accent, etc. The speech recognition rate mainly depends on the selection of features and feature extraction methods. This paper outlines feature extraction techniques for speaker-dependent speech recognition of isolated words. A brief survey of different feature extraction techniques, such as Mel-frequency cepstral coefficients (MFCC), linear predictive coding coefficients (LPCC), perceptual linear prediction (PLP), and relative spectra perceptual linear prediction (RASTA-PLP), is presented and evaluated. Speech recognition has various applications, from daily use to commercial use. We have built a speaker-dependent system, which can be useful in many areas, such as controlling a patient's vehicle using simple commands.
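    Two standard first steps shared by the MFCC/LPCC/PLP front ends surveyed above, pre-emphasis and framing, can be sketched in a few lines; the coefficient and frame parameters are typical values, not prescriptions from the paper:

```python
def preemphasize(samples, coeff=0.97):
    """First-order pre-emphasis filter, a usual first step of speech
    front ends: y[n] = x[n] - coeff * x[n-1], boosting high frequencies."""
    return [samples[0]] + [samples[n] - coeff * samples[n - 1]
                           for n in range(1, len(samples))]

def frame_signal(samples, frame_len, hop):
    """Split a signal into overlapping frames for short-time analysis
    (e.g. 25 ms frames with a 10 ms hop at a given sample rate)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]
```

    Each frame would then be windowed and transformed (filterbank + DCT for MFCC, LP analysis for LPCC) to produce the per-frame feature vectors the survey compares.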

  15. Real Time Recognition Of Speakers From Internet Audio Stream

    Directory of Open Access Journals (Sweden)

    Weychan Radoslaw

    2015-09-01

    In this paper we present an automatic speaker recognition technique using lossy (encoded) speech signal streams from Internet radio. We show the influence of the audio encoder (e.g., its bitrate) on speaker model quality. The model of each speaker was calculated using the Gaussian mixture model (GMM) approach. Both the speaker recognition and the further analysis were performed on short utterances to facilitate real-time processing. The neighborhoods of the speaker models were analyzed using the ISOMAP algorithm. The experiments were based on four 1-hour public debates with 7–8 speakers (including the moderator), acquired from Polish Internet radio services. The presented software was developed in the MATLAB environment.
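    GMM-based speaker scoring as used above can be sketched with a diagonal-covariance mixture. The mixture parameters in a real system are trained (e.g. by EM); this toy version shows only the scoring step, where the enrolled speaker whose model yields the highest average log-likelihood wins:

```python
import math

def gmm_log_likelihood(frames, weights, means, variances):
    """Average log-likelihood of feature frames under a diagonal GMM.

    weights, means, variances describe the mixture: one entry per
    component, with means/variances given per feature dimension.
    """
    total = 0.0
    for x in frames:
        comp = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp.append(ll)
        m = max(comp)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(c - m) for c in comp))
    return total / len(frames)
```

    Averaging per frame makes scores from short utterances comparable, which matters for the real-time, short-utterance setting the paper describes.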

  16. The 2016 NIST Speaker Recognition Evaluation

    Science.gov (United States)

    2017-08-20

    ...impact on system performance. Index Terms: NIST evaluation, NIST SRE, speaker detection, speaker recognition, speaker verification. 1. Introduction NIST... self-reported. Second, there were two training conditions in SRE16, namely fixed and open. In the fixed training condition, participants were only...

  17. A New Database for Speaker Recognition

    DEFF Research Database (Denmark)

    Feng, Ling; Hansen, Lars Kai

    2005-01-01

    In this paper we discuss properties of speech databases used for speaker recognition research and evaluation, and we characterize some popular standard databases. The paper presents a new database called ELSDSR dedicated to speaker recognition applications. The main characteristics of this database...

  18. Robust speaker recognition in noisy environments

    CERN Document Server

    Rao, K Sreenivasa

    2014-01-01

    This book discusses speaker recognition methods that deal with realistic, variable noisy environments. The text covers authentication systems that are robust to noisy background environments, operate in real time, and can be incorporated into mobile devices. The book focuses on different approaches to enhancing the accuracy of speaker recognition in the presence of varying background environments. The authors examine: (a) feature compensation using multiple background models, (b) feature mapping using data-driven stochastic models, (c) design of a supervector-based GMM-SVM framework for robust speaker recognition, (d) total variability modeling (i-vectors) in a discriminative framework, and (e) a boosting method to fuse evidence from multiple SVM models.

  19. Robustness-related issues in speaker recognition

    CERN Document Server

    Zheng, Thomas Fang

    2017-01-01

    This book presents an overview of speaker recognition technologies with an emphasis on dealing with robustness issues. Firstly, the book gives an overview of speaker recognition, covering the basic system framework, categories under different criteria, performance evaluation, and the field's development history. Secondly, with regard to robustness, the book presents three categories of issues: environment-related, speaker-related, and application-oriented. For each category, the book describes the current hot topics, existing technologies, and potential future research directions. The book is a useful reference and self-learning guide for early-career researchers working in the field of robust speech recognition.

  20. Utterance Verification for Text-Dependent Speaker Recognition

    DEFF Research Database (Denmark)

    Kinnunen, Tomi; Sahidullah, Md; Kukanov, Ivan

    2016-01-01

    Text-dependent automatic speaker verification naturally calls for the simultaneous verification of speaker identity and spoken content. These two tasks can be achieved with automatic speaker verification (ASV) and utterance verification (UV) technologies. While both have been addressed previously...

  1. Supervised and Unsupervised Speaker Adaptation in the NIST 2005 Speaker Recognition Evaluation

    National Research Council Canada - National Science Library

    Hansen, Eric G; Slyh, Raymond E; Anderson, Timothy R

    2006-01-01

    Starting in 2004, the annual NIST Speaker Recognition Evaluation (SRE) has added an optional unsupervised speaker adaptation track where test files are processed sequentially and one may update the target model...

  2. Speaker recognition through NLP and CWT modeling.

    Energy Technology Data Exchange (ETDEWEB)

    Brown-VanHoozer, A.; Kercel, S. W.; Tucker, R. W.

    1999-06-23

    The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time duration (<30 seconds), and with better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results, but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "huge population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (or verbal predicate cues, e.g., see, sound, feel, etc.) while the secondary modalities would be characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space, and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms as well as the formant frequency are evident in the CWT output. More significantly, the CWT reveals significant

  3. Speaker emotion recognition: from classical classifiers to deep neural networks

    Science.gov (United States)

    Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

    2018-04-01

    Speaker emotion recognition has been considered among the most challenging tasks in recent years. In fact, automatic systems for security, medicine, or education can be improved by considering the affective state of the speech. In this paper, a twofold approach for speech emotion classification is proposed: first, a relevant set of features is adopted; second, numerous supervised training techniques, involving classic methods as well as deep learning, are experimented with. Experimental results indicate that deep architectures can improve classification performance on two affective databases, the Berlin Database of Emotional Speech and the SAVEE (Surrey Audio-Visual Expressed Emotion) dataset.

  4. Improving Speaker Recognition by Biometric Voice Deconstruction

    Directory of Open Access Journals (Sweden)

    Luis Miguel Mazaira-Fernández

    2015-09-01

    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcibly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. This paper presents a new methodology to characterize speakers, benefiting from the advances achieved during the last years in understanding and modelling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with the use of a new set of biometric parameters extracted from the components resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on a mobile phone network database recorded under non-controlled acoustic conditions.

  5. A Method to Integrate GMM, SVM and DTW for Speaker Recognition

    Directory of Open Access Journals (Sweden)

    Ing-Jr Ding

    2014-01-01

    This paper develops an effective and efficient scheme to integrate the Gaussian mixture model (GMM), support vector machine (SVM), and dynamic time warping (DTW) for automatic speaker recognition. GMM and SVM are two popular classifiers for speaker recognition applications. DTW is a fast and simple template matching method frequently seen in speech recognition applications. In this work, DTW is not used to perform speech recognition; instead, it is employed as a verifier for validating claimed speakers. The proposed combination scheme of GMM, SVM, and DTW, called SVMGMM-DTW, is a two-phase verification process consisting of GMM-SVM verification in the first phase and DTW verification in the second phase. By providing a double check of a speaker's identity, it becomes difficult for impostors to pass the security protection; therefore, the safety of speaker recognition systems is largely increased. A series of experiments designed around door access control applications demonstrated the superiority of the developed SVMGMM-DTW in speaker recognition accuracy.
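    The DTW verifier used in the second phase can be sketched with the classic dynamic-programming recurrence. The 1-D feature sequences and the idea of thresholding the resulting distance are simplifications for illustration; a real system would align multidimensional frame features against an enrolled template:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.

    d[i][j] holds the minimum cumulative cost of aligning a[:i] with b[:j];
    each cell extends the best of the three predecessor alignments
    (insertion, deletion, match), so timing differences are absorbed.
    """
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

    In a two-phase scheme like SVMGMM-DTW, a claimant who passed the first phase would be accepted only if this distance to the enrolled template falls below a threshold.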

  6. Improving Speaker Recognition by Biometric Voice Deconstruction

    Science.gov (United States)

    Mazaira-Fernandez, Luis Miguel; Álvarez-Marquina, Agustín; Gómez-Vilda, Pedro

    2015-01-01

    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcibly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers combined with the use of a set of features derived from the components, resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on a mobile phone network database recorded under non-controlled acoustic conditions. PMID:26442245

  7. Similar speaker recognition using nonlinear analysis

    International Nuclear Information System (INIS)

    Seo, J.P.; Kim, M.S.; Baek, I.C.; Kwon, Y.H.; Lee, K.S.; Chang, S.W.; Yang, S.I.

    2004-01-01

    Speech features in conventional speaker identification systems are usually obtained by linear methods in spectral space. However, these methods have the drawback that speakers with similar voices cannot be distinguished, because the characteristics of their voices are also similar in spectral space. To overcome this difficulty with linear methods, we propose to use the correlation exponent in a nonlinear space as a new feature vector for speaker identification among persons with similar voices. We show that our proposed method surprisingly reduces the error rate of the speaker identification system for speakers with similar voices.
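    The correlation exponent mentioned above is commonly estimated via the Grassberger-Procaccia correlation sum; here is a toy sketch. How the speech signal is embedded into points, and the choice of radii, are assumptions of this illustration rather than details from the paper:

```python
import math

def correlation_sum(points, r):
    """Fraction of point pairs closer than r: the correlation sum C(r)."""
    n = len(points)
    close = sum(1
                for i in range(n)
                for j in range(i + 1, n)
                if math.dist(points[i], points[j]) < r)
    return 2.0 * close / (n * (n - 1))

def correlation_exponent(points, r1, r2):
    """Slope of log C(r) versus log r between two radii.

    For points on a D-dimensional attractor, C(r) scales roughly as r**D,
    so this slope estimates the correlation dimension used here as a
    nonlinear speaker feature.
    """
    c1, c2 = correlation_sum(points, r1), correlation_sum(points, r2)
    return (math.log(c2) - math.log(c1)) / (math.log(r2) - math.log(r1))
```

    Points sampled from a line give an exponent near 1, from a plane near 2; speakers with near-identical spectra can still differ in this nonlinear statistic.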

  8. Data-Model Relationship in Text-Independent Speaker Recognition

    Directory of Open Access Journals (Sweden)

    Stapert Robert

    2005-01-01

    Text-independent speaker recognition systems, such as those based on Gaussian mixture models (GMMs), do not include time sequence information (TSI) within the model itself. The level of importance of TSI in speaker recognition is an interesting question and one addressed in this paper. Recent work has shown that the utilisation of higher-level information such as idiolect, pronunciation, and prosodics can be useful in reducing speaker recognition error rates. In accordance with these developments, the aim of this paper is to show that as more data becomes available, the basic GMM can be enhanced by utilising TSI, even in a text-independent mode. This paper presents experimental work incorporating TSI into the conventional GMM. The resulting system, known as the segmental mixture model (SMM), embeds dynamic time warping (DTW) into a GMM framework. Results presented on the 2000-speaker SpeechDat Welsh database show improved speaker recognition performance with the SMM.

  9. Cost-Sensitive Learning for Emotion Robust Speaker Recognition

    Directory of Open Access Journals (Sweden)

    Dongdong Li

    2014-01-01

    In the field of information security, voice is one of the most important elements in biometrics. In particular, with the development of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password for the user to prove his or her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of speaker recognition systems. This paper deals with this problem by introducing a cost-sensitive learning technology to reweight the probability of test affective utterances at the pitch envelope level, which can effectively enhance the robustness of emotion-dependent speaker recognition. Based on this technology, a new recognition system architecture and its components are proposed. An experiment conducted on the Mandarin Affective Speech Corpus shows an 8% improvement in identification rate over traditional speaker recognition.
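    The cost-sensitive reweighting idea can be illustrated with a deliberately simplified stand-in for the paper's pitch-envelope-level scheme: per-speaker scores are scaled by a misclassification cost before the final decision, so classes the model handles poorly (e.g. emotional utterances) are compensated. The cost values and score layout are invented for illustration:

```python
def reweight_scores(scores, costs):
    """Cost-sensitive reweighting of per-speaker recognition scores.

    scores -- dict mapping speaker -> raw score from the recognizer
    costs  -- dict mapping speaker -> reweighting factor (default 1.0),
              e.g. larger for speakers whose emotional speech the
              baseline model tends to under-score
    Returns the arg-max decision over the reweighted scores plus the
    reweighted score table.
    """
    weighted = {s: scores[s] * costs.get(s, 1.0) for s in scores}
    return max(weighted, key=weighted.get), weighted
```

    With uniform costs this reduces to the ordinary maximum-score decision; non-uniform costs shift the decision boundary toward under-scored classes.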

  10. Cost-sensitive learning for emotion robust speaker recognition.

    Science.gov (United States)

    Li, Dongdong; Yang, Yingchun; Dai, Weihui

    2014-01-01

    In the field of information security, voice is one of the most important elements in biometrics. In particular, with the development of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password for the user to prove his or her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of speaker recognition systems. This paper deals with this problem by introducing a cost-sensitive learning technology to reweight the probability of test affective utterances at the pitch envelope level, which can effectively enhance the robustness of emotion-dependent speaker recognition. Based on this technology, a new recognition system architecture and its components are proposed. An experiment conducted on the Mandarin Affective Speech Corpus shows an 8% improvement in identification rate over traditional speaker recognition.

  11. Forensic Speaker Recognition Law Enforcement and Counter-Terrorism

    CERN Document Server

    Patil, Hemant

    2012-01-01

    Forensic Speaker Recognition: Law Enforcement and Counter-Terrorism is an anthology of the research findings of 35 speaker recognition experts from around the world. The volume provides a multidimensional view of the complex science involved in determining whether a suspect’s voice truly matches forensic speech samples, collected by law enforcement and counter-terrorism agencies, that are associated with the commission of a terrorist act or other crimes. While addressing such topics as the challenges of forensic case work, handling speech signal degradation, analyzing features of speaker recognition to optimize voice verification system performance, and designing voice applications that meet the practical needs of law enforcement and counter-terrorism agencies, this material all sounds a common theme: how the rigors of forensic utility are demanding new levels of excellence in all aspects of speaker recognition. The contributors are among the most eminent scientists in speech engineering and signal process...

  12. Experiments on Automatic Recognition of Nonnative Arabic Speech

    Directory of Open Access Journals (Sweden)

    Douglas O'Shaughnessy

    2008-05-01

    Full Text Available The automatic recognition of foreign-accented Arabic speech is a challenging task since it involves a large number of nonnative accents. As well, the nonnative speech data available for training are generally insufficient. Moreover, as compared to other languages, the Arabic language has sparked a relatively small number of research efforts. In this paper, we are concerned with the problem of nonnative speech in a speaker-independent, large-vocabulary speech recognition system for Modern Standard Arabic (MSA). We analyze some major differences at the phonetic level in order to determine which phonemes play a significant part in recognition performance for both native and nonnative speakers. Special attention is given to specific Arabic phonemes. The performance of an HMM-based Arabic speech recognition system is analyzed with respect to speaker gender and native origin. The WestPoint Modern Standard Arabic database from the Linguistic Data Consortium (LDC) and the Hidden Markov Model Toolkit (HTK) are used throughout all experiments. Our study shows that the best performance in overall phoneme recognition is obtained when nonnative speakers are involved in both training and testing phases. This is not the case when a language model and phonetic lattice networks are incorporated in the system. At the phonetic level, the results show that female nonnative speakers perform better than male nonnative speakers, and that emphatic phonemes yield a significant decrease in performance when uttered by both male and female nonnative speakers.

  13. Experiments on Automatic Recognition of Nonnative Arabic Speech

    Directory of Open Access Journals (Sweden)

    Selouani Sid-Ahmed

    2008-01-01

    Full Text Available The automatic recognition of foreign-accented Arabic speech is a challenging task since it involves a large number of nonnative accents. As well, the nonnative speech data available for training are generally insufficient. Moreover, as compared to other languages, the Arabic language has sparked a relatively small number of research efforts. In this paper, we are concerned with the problem of nonnative speech in a speaker-independent, large-vocabulary speech recognition system for Modern Standard Arabic (MSA). We analyze some major differences at the phonetic level in order to determine which phonemes play a significant part in recognition performance for both native and nonnative speakers. Special attention is given to specific Arabic phonemes. The performance of an HMM-based Arabic speech recognition system is analyzed with respect to speaker gender and native origin. The WestPoint Modern Standard Arabic database from the Linguistic Data Consortium (LDC) and the Hidden Markov Model Toolkit (HTK) are used throughout all experiments. Our study shows that the best performance in overall phoneme recognition is obtained when nonnative speakers are involved in both training and testing phases. This is not the case when a language model and phonetic lattice networks are incorporated in the system. At the phonetic level, the results show that female nonnative speakers perform better than male nonnative speakers, and that emphatic phonemes yield a significant decrease in performance when uttered by both male and female nonnative speakers.

  14. Thai Automatic Speech Recognition

    National Research Council Canada - National Science Library

    Suebvisai, Sinaporn; Charoenpornsawat, Paisarn; Black, Alan; Woszczyna, Monika; Schultz, Tanja

    2005-01-01

    .... We focus on the discussion of the rapid deployment of ASR for Thai under limited time and data resources, including rapid data collection issues, acoustic model bootstrap, and automatic generation of pronunciations...

  15. AUTOMATIC ARCHITECTURAL STYLE RECOGNITION

    Directory of Open Access Journals (Sweden)

    M. Mathias

    2012-09-01

    Full Text Available Procedural modeling has proven to be a very valuable tool in the field of architecture. In the last few years, research has surged on automatically creating procedural models from images. However, current algorithms for this process of inverse procedural modeling rely on the assumption that the building style is known. So far, the determination of the building style has remained a manual task. In this paper, we propose an algorithm which automates this process through classification of architectural styles from facade images. Our classifier first identifies the images containing buildings, then separates individual facades within an image and determines the building style. This information can then be used to initialize the building reconstruction process. We have trained our classifier to distinguish between several distinct architectural styles, namely Flemish Renaissance, Haussmannian and Neoclassical. Finally, we demonstrate our approach on various street-side images.

  16. Scalable Learning for Geostatistics and Speaker Recognition

    Science.gov (United States)

    2011-01-01

    Device Architecture (CUDA)[63], a parallel programming model that leverages the parallel compute engine in NVIDIA GPUs to solve general purpose...validation. 3.1 Geospatial data reconstruction Sensors deployed on satellites are often used to collect environmental data where a direct measurement is...same decision as training a model on B and testing on A, which is desirable in many recognition engines. We shall address this in the next chapter. The

  17. Physics of Automatic Target Recognition

    CERN Document Server

    Sadjadi, Firooz

    2007-01-01

    Physics of Automatic Target Recognition addresses the fundamental physical bases of sensing and information extraction in the state-of-the-art automatic target recognition field. It explores both passive and active multispectral sensing, polarimetric diversity, complex signature exploitation, sensor and processing adaptation, transformation of electromagnetic and acoustic waves in their interactions with targets, background clutter, transmission media, and sensing elements. The general inverse scattering, and advanced signal processing techniques and scientific evaluation methodologies being used in this multidisciplinary field will be part of this exposition. The issues of modeling of target signatures in various spectral modalities, LADAR, IR, SAR, high resolution radar, acoustic, seismic, visible, hyperspectral, in diverse geometric aspects will be addressed. The methods for signal processing and classification will cover concepts such as sensor adaptive and artificial neural networks, time reversal filt...

  18. Parametric Representation of the Speaker's Lips for Multimodal Sign Language and Speech Recognition

    Science.gov (United States)

    Ryumin, D.; Karpov, A. A.

    2017-05-01

    In this article, we propose a new method for the parametric representation of the human lip region. The functional diagram of the method is described, and implementation details are given with an explanation of its key stages and features. The results of automatic detection of the regions of interest are illustrated, and the speed of the method on several computers of differing performance is reported. This universal method allows the parametric representation of the speaker's lips to be applied to tasks in biometrics, computer vision, machine learning, and automatic recognition of faces, sign language elements, and audio-visual speech, including lip-reading.

  19. Automatic recognition of printed Oriya script

    Indian Academy of Sciences (India)


    Some studies have been reported on Tamil, Telugu and. Gurmukhi scripts ..... leaf node, giving rise to recognition errors. The contribution of these ... As a native speaker of Oriya, Anil Chand gave us useful advice about the script. P. Sashank.

  20. Robust Digital Speech Watermarking For Online Speaker Recognition

    Directory of Open Access Journals (Sweden)

    Mohammad Ali Nematollahi

    2015-01-01

    Full Text Available A robust and blind digital speech watermarking technique has been proposed for online speaker recognition systems, based on the Discrete Wavelet Packet Transform (DWPT) and multiplication to embed the watermark in the amplitudes of the wavelet subbands. In order to minimize the degradation effect of the watermark, subbands are selected where less speaker-specific information is available (500 Hz–3500 Hz and 6000 Hz–7000 Hz). Experimental results on the Texas Instruments/Massachusetts Institute of Technology (TIMIT), Massachusetts Institute of Technology (MIT), and Mobile Biometry (MOBIO) corpora show that the degradation for speaker verification and identification is 1.16% and 2.52%, respectively. Furthermore, the proposed watermark technique provides sufficient robustness against different signal processing attacks.
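The multiplicative embedding step can be illustrated with a single-level Haar transform standing in for the paper's wavelet packet decomposition and band selection; the strength parameter and bit layout are illustrative assumptions:

```python
import numpy as np

def haar_dwt(x):
    # Single-level Haar transform: approximation and detail subbands
    # (even-length signals assumed).
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_idwt(a, d):
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def embed(x, bits, alpha=0.01):
    """Multiplicative watermark embedding in the detail subband: each bit
    scales a group of coefficients by (1 +/- alpha). The paper uses a wavelet
    *packet* transform and selects specific frequency bands; this single-level
    Haar version only illustrates the multiplication step."""
    a, d = haar_dwt(np.asarray(x, dtype=float))
    gains = np.where(np.asarray(bits) > 0, 1 + alpha, 1 - alpha)
    w = np.resize(np.repeat(gains, max(len(d) // len(bits), 1)), len(d))
    return haar_idwt(a, d * w)
```

Setting `alpha=0` recovers the host signal exactly, which makes the perfect-reconstruction property of the transform easy to verify.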

  1. Recognition of speaker-dependent continuous speech with KEAL

    Science.gov (United States)

    Mercier, G.; Bigorgne, D.; Miclet, L.; Le Guennec, L.; Querre, M.

    1989-04-01

    A description of the speaker-dependent continuous speech recognition system KEAL is given. An unknown utterance is recognized by means of the following procedures: acoustic analysis, phonetic segmentation and identification, word and sentence analysis. The combination of feature-based, speaker-independent coarse phonetic segmentation with speaker-dependent statistical classification techniques is one of the main design features of the acoustic-phonetic decoder. The lexical access component is essentially based on a statistical dynamic programming technique which aims at matching a phonemic lexical entry containing various phonological forms against a phonetic lattice. Sentence recognition is achieved by use of a context-free grammar and a parsing algorithm derived from Earley's parser. A speaker adaptation module allows some of the system parameters to be adjusted by matching known utterances with their acoustical representation. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. Continuously spoken sentences extracted from a 'pseudo-Logo' language are analyzed and results are presented.

  2. On the Use of Complementary Spectral Features for Speaker Recognition

    Directory of Open Access Journals (Sweden)

    Sridhar Krishnan

    2007-12-01

    Full Text Available The most popular features for speaker recognition are Mel frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs). These features are used extensively because they characterize the vocal tract configuration, which is known to be highly speaker-dependent. In this work, several features are introduced that can characterize the vocal system in order to complement the traditional features and produce better speaker recognition models. The spectral centroid (SC), spectral bandwidth (SBW), spectral band energy (SBE), spectral crest factor (SCF), spectral flatness measure (SFM), Shannon entropy (SE), and Renyi entropy (RE) were utilized for this purpose. This work demonstrates that these features are robust in noisy conditions by simulating some common distortions found in the speakers' environment and a typical telephone channel. Babble noise, additive white Gaussian noise (AWGN), and a bandpass channel with 1 dB of ripple were used to simulate these noisy conditions. The results show significant improvements in classification performance for all noise conditions when these features were used to complement the MFCC and ΔMFCC features. In particular, the SC and SCF improved performance in almost all noise conditions within the examined SNR range (10–40 dB). For example, in cases where there was only one source of distortion, classification improvements of up to 8% and 10% were achieved under babble noise and AWGN, respectively, using the SCF feature.
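Most of these descriptors are short computations over a frame's power spectrum. A minimal sketch (frame length, window, and sampling rate are assumptions) of the spectral centroid, bandwidth, crest factor, flatness measure, and Shannon entropy:

```python
import numpy as np

def spectral_features(frame, sr=8000):
    """Complementary spectral descriptors for one speech frame (a minimal
    sketch; exact definitions vary slightly across papers)."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    p = spec / spec.sum()                              # normalized power spectrum
    sc = (freqs * p).sum()                             # spectral centroid (Hz)
    sbw = np.sqrt(((freqs - sc) ** 2 * p).sum())       # spectral bandwidth (Hz)
    scf = spec.max() / spec.mean()                     # spectral crest factor
    sfm = np.exp(np.mean(np.log(spec + 1e-12))) / spec.mean()  # flatness
    se = -(p * np.log2(p + 1e-12)).sum()               # Shannon entropy (bits)
    return sc, sbw, scf, sfm, se
```

For a pure tone the centroid sits at the tone frequency, the crest factor is large, and the flatness is near zero; flat noise pushes the flatness toward one.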

  3. Speaker Recognition for Mobile User Authentication: An Android Solution

    OpenAIRE

    Brunet , Kevin; Taam , Karim; Cherrier , Estelle; Faye , Ndiaga; Rosenberger , Christophe

    2013-01-01

    This paper deals with a biometric solution for authentication on mobile devices. Among the possible biometric modalities, speaker recognition seems the most natural choice for a mobile phone. This work lies in the continuation of our previous work \\cite{Biosig2012}, where we evaluated a candidate algorithm in terms of performance and processing time. The proposed solution is implemented here as an Android application. Its performance is evaluated both on a public database...

  4. An introduction to application-independent evaluation of speaker recognition systems

    NARCIS (Netherlands)

    Leeuwen, D.A. van; Brümmer, N.

    2007-01-01

    In the evaluation of speaker recognition systems - an important part of speaker classification [1], the trade-off between missed speakers and false alarms has always been an important diagnostic tool. NIST has defined the task of speaker detection with the associated Detection Cost Function (DCF) to

  5. Using Reversed MFCC and IT-EM for Automatic Speaker Verification

    Directory of Open Access Journals (Sweden)

    Sheeraz Memon

    2012-01-01

    Full Text Available This paper proposes a text-independent automatic speaker verification system using IMFCC (Inverse/Reverse Mel Frequency Cepstral Coefficients) and IT-EM (Information Theoretic Expectation Maximization). To perform speaker verification, feature extraction using the Mel scale has been widely applied and has established better results. The IMFCC is based on an inverse Mel scale; it effectively captures information available in the high-frequency formants which is ignored by the MFCC. In this paper the fusion of MFCC and IMFCC at the input level is proposed. GMMs (Gaussian Mixture Models) based on EM (Expectation Maximization) have been widely used for classification in text-independent verification; however, EM suffers from convergence issues. In this paper we use our proposed IT-EM, which has faster convergence, to train speaker models. IT-EM uses information-theoretic principles such as PDE (Parzen Density Estimation) and the KL (Kullback-Leibler) divergence measure. IT-EM adapts the weights, means and covariances, like EM; however, the IT-EM process is performed not on feature vector sets but on a set of centroids obtained using an IT (Information Theoretic) metric. The IT-EM process simultaneously diminishes the divergence measure between the PDE estimates of the feature distribution within a given class and the centroid distribution within the same class. The feature-level fusion and IT-EM are tested on the task of speaker verification using NIST2001 and NIST2004. The experimental evaluation validates that MFCC/IMFCC gives better results than the conventional delta/MFCC feature set. The MFCC/IMFCC feature vector is also much smaller than the delta MFCC vector, reducing the computational burden as well. The IT-EM method also showed faster convergence than the conventional EM method, and thus leads to higher speaker recognition scores.
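The relation between the MFCC and IMFCC filterbanks can be sketched as mirrored center-frequency spacings; the filter count and sampling rate below are illustrative assumptions:

```python
import numpy as np

def mel(f):            # Hz -> mel
    return 2595 * np.log10(1 + f / 700)

def inv_mel(m):        # mel -> Hz
    return 700 * (10 ** (m / 2595) - 1)

def filter_centers(n_filters, sr, inverted=False):
    """Center frequencies of a (inverted-)mel filterbank. The inverted bank
    (the IMFCC idea) is the mirror image of the mel bank, so it is densest
    at high frequencies where the mel bank is sparsest."""
    edges = inv_mel(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    centers = edges[1:-1]
    if inverted:
        centers = sr / 2 - centers[::-1]   # mirror the mel spacing
    return centers
```

The test below checks the defining property: mel spacing widens with frequency, inverted-mel spacing narrows.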

  6. Hemispheric lateralization of linguistic prosody recognition in comparison to speech and speaker recognition.

    Science.gov (United States)

    Kreitewolf, Jens; Friederici, Angela D; von Kriegstein, Katharina

    2014-11-15

    Hemispheric specialization for linguistic prosody is a controversial issue. While it is commonly assumed that linguistic prosody and emotional prosody are preferentially processed in the right hemisphere, neuropsychological work directly comparing processes of linguistic prosody and emotional prosody suggests a predominant role of the left hemisphere for linguistic prosody processing. Here, we used two functional magnetic resonance imaging (fMRI) experiments to clarify the role of left and right hemispheres in the neural processing of linguistic prosody. In the first experiment, we sought to confirm previous findings showing that linguistic prosody processing compared to other speech-related processes predominantly involves the right hemisphere. Unlike previous studies, we controlled for stimulus influences by employing a prosody and speech task using the same speech material. The second experiment was designed to investigate whether a left-hemispheric involvement in linguistic prosody processing is specific to contrasts between linguistic prosody and emotional prosody or whether it also occurs when linguistic prosody is contrasted against other non-linguistic processes (i.e., speaker recognition). Prosody and speaker tasks were performed on the same stimulus material. In both experiments, linguistic prosody processing was associated with activity in temporal, frontal, parietal and cerebellar regions. Activation in temporo-frontal regions showed differential lateralization depending on whether the control task required recognition of speech or speaker: recognition of linguistic prosody predominantly involved right temporo-frontal areas when it was contrasted against speech recognition; when contrasted against speaker recognition, recognition of linguistic prosody predominantly involved left temporo-frontal areas. 
The results show that linguistic prosody processing involves functions of both hemispheres and suggest that recognition of linguistic prosody is based on

  7. LEARNING VECTOR QUANTIZATION FOR ADAPTED GAUSSIAN MIXTURE MODELS IN AUTOMATIC SPEAKER IDENTIFICATION

    Directory of Open Access Journals (Sweden)

    IMEN TRABELSI

    2017-05-01

    Full Text Available Speaker Identification (SI) aims at automatically identifying an individual by extracting and processing information from his/her voice. Speaker voice is a robust biometric modality that has a strong impact in several application areas. In this study, a new combination learning scheme has been proposed based on the Gaussian mixture model-universal background model (GMM-UBM) and Learning Vector Quantization (LVQ) for automatic text-independent speaker identification. Feature vectors constituted by the Mel Frequency Cepstral Coefficients (MFCC) extracted from the speech signal are used to train models on the New England subset of the TIMIT database. The best results obtained on test data using 36 MFCC features are 90% for gender-independent speaker identification, 97% for male speakers, and 93% for female speakers.
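The GMM-UBM half of this scheme rests on MAP adaptation of the universal background model toward each speaker's data. A mean-only adaptation step, the standard recipe (relevance factor and array shapes are illustrative), might look like:

```python
import numpy as np

def map_adapt_means(ubm_means, frame_posteriors, feats, r=16.0):
    """Mean-only MAP adaptation of UBM components to one speaker's features.
    ubm_means: (C, D) component means of the UBM.
    frame_posteriors: (T, C) component responsibilities per frame.
    feats: (T, D) feature vectors (e.g. MFCCs). r: relevance factor."""
    n = frame_posteriors.sum(axis=0)                      # soft count per component
    ex = frame_posteriors.T @ feats / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + r))[:, None]                        # adaptation coefficient
    # interpolate each component toward the speaker's data mean
    return alpha * ex + (1 - alpha) * np.asarray(ubm_means, dtype=float)
```

Components that see many frames move almost fully to the speaker's data mean; unseen components keep the UBM mean, which is what makes the adapted model robust with little enrollment data.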

  8. Automaticity and stability of adaptation to a foreign-accented speaker

    NARCIS (Netherlands)

    Witteman, M.J.; Bardhan, N.P.; Weber, A.C.; McQueen, J.M.

    2015-01-01

    In three cross-modal priming experiments we asked whether adaptation to a foreign-accented speaker is automatic, and whether adaptation can be seen after a long delay between initial exposure and test. Dutch listeners were exposed to a Hebrew-accented Dutch speaker with two types of Dutch words:

  9. Support vector machine for automatic pain recognition

    Science.gov (United States)

    Monwar, Md Maruf; Rezaei, Siamak

    2009-02-01

    Facial expressions are a key index of emotion, and the interpretation of such expressions of emotion is critical to everyday social functioning. In this paper, we present an efficient video analysis technique for recognition of a specific expression, pain, from human faces. We employ an automatic face detector which detects faces in the stored video frames using a skin-color modeling technique. For pain recognition, location and shape features of the detected faces are computed. These features are then used as inputs to a support vector machine (SVM) for classification. We compare the results with neural-network-based and eigenimage-based automatic pain recognition systems. The experimental results indicate that using a support vector machine as the classifier can certainly improve the performance of an automatic pain recognition system.

  10. Automatic speech recognition (zero crossing method). Automatic recognition of isolated vowels

    International Nuclear Information System (INIS)

    Dupeyrat, Benoit

    1975-01-01

    This note describes a method for recognizing isolated vowels that uses preprocessing of the vocal signal. The preprocessing extracts the extrema of the vocal signal and the time intervals separating them (zero-crossing distances of the first derivative of the signal). The recognition of vowels uses normalized histograms of these interval values. The program determines a distance between the histogram of the sound to be recognized and model histograms built during a learning phase. The results, processed in real time by a minicomputer, are relatively independent of the speaker, provided the fundamental frequency is not allowed to vary too much (i.e. speakers of the same sex). (author) [fr
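The preprocessing described here is straightforward to sketch; the bin count and interval range are illustrative assumptions:

```python
import numpy as np

def extrema_interval_histogram(x, bins=16, max_gap=64):
    """Find the local extrema of the signal (zero crossings of its first
    derivative) and histogram the sample intervals between consecutive
    extrema, normalized to sum to 1."""
    d = np.diff(x)
    # a local extremum sits where the derivative changes sign
    idx = np.where(np.diff(np.sign(d)) != 0)[0] + 1
    gaps = np.diff(idx)
    hist, _ = np.histogram(gaps, bins=bins, range=(1, max_gap))
    return hist / max(hist.sum(), 1)

def histogram_distance(h1, h2):
    # L1 distance between normalized histograms, as a simple stand-in for
    # whatever distance the note's recognition program computes
    return np.abs(h1 - h2).sum()
```

A pure tone produces one dominant interval (half its period in samples), so its histogram collapses to a single bin.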

  11. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    Science.gov (United States)

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assessing the speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signals for a certain word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate for an individual dysarthric speaker before actual exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, called the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
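The two factors behind Ψ can be illustrated with a simple distance-based ratio; this is not the paper's exact formula, and the Euclidean distance over fixed-length vectors is an illustrative stand-in for whatever distance the authors use:

```python
import numpy as np

def clarity_index(words):
    """Sketch of a distance-based clarity measure in the spirit of Psi:
    consistency = mean distance between repetitions of the same word (lower
    is better); distinction = mean distance between different words (higher
    is better). `words` maps a word label to an array of fixed-length
    feature vectors, one row per repetition."""
    labels = list(words)
    intra, inter = [], []
    for w in labels:
        v = words[w]
        for i in range(len(v)):
            for j in range(i + 1, len(v)):
                intra.append(np.linalg.norm(v[i] - v[j]))
    for a in range(len(labels)):
        for b in range(a + 1, len(labels)):
            for x in words[labels[a]]:
                for y in words[labels[b]]:
                    inter.append(np.linalg.norm(x - y))
    # high ratio: tight same-word clusters that are far apart -> clear speech
    return np.mean(inter) / (np.mean(intra) + 1e-12)
```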

  12. Towards automatic forensic face recognition

    NARCIS (Netherlands)

    Ali, Tauseef; Spreeuwers, Lieuwe Jan; Veldhuis, Raymond N.J.

    2011-01-01

    In this paper we present a methodology and experimental results for evidence evaluation in the context of forensic face recognition. In forensic applications, the matching score (hereafter referred to as similarity score) from a biometric system must be represented as a Likelihood Ratio (LR). In our

  13. Automatic modulation recognition of communication signals

    CERN Document Server

    Azzouz, Elsayed Elsayed

    1996-01-01

    Automatic modulation recognition is a rapidly evolving area of signal analysis. In recent years, interest from the academic and military research institutes has focused around the research and development of modulation recognition algorithms. Any communication intelligence (COMINT) system comprises three main blocks: receiver front-end, modulation recogniser and output stage. Considerable work has been done in the area of receiver front-ends. The work at the output stage is concerned with information extraction, recording and exploitation and begins with signal demodulation, which requires accurate knowledge of the signal modulation type. There are, however, two main reasons for knowing the current modulation type of a signal: to preserve the signal information content and to decide upon the suitable counter action, such as jamming. Automatic Modulation Recognition of Communications Signals describes in depth this modulation recognition process. Drawing on several years of research, the authors provide a cr...

  14. The automaticity of emotion recognition.

    Science.gov (United States)

    Tracy, Jessica L; Robins, Richard W

    2008-02-01

    Evolutionary accounts of emotion typically assume that humans evolved to quickly and efficiently recognize emotion expressions because these expressions convey fitness-enhancing messages. The present research tested this assumption in 2 studies. Specifically, the authors examined (a) how quickly perceivers could recognize expressions of anger, contempt, disgust, embarrassment, fear, happiness, pride, sadness, shame, and surprise; (b) whether accuracy is improved when perceivers deliberate about each expression's meaning (vs. respond as quickly as possible); and (c) whether accurate recognition can occur under cognitive load. Across both studies, perceivers quickly and efficiently (i.e., under cognitive load) recognized most emotion expressions, including the self-conscious emotions of pride, embarrassment, and shame. Deliberation improved accuracy in some cases, but these improvements were relatively small. Discussion focuses on the implications of these findings for the cognitive processes underlying emotion recognition.

  15. Optimization of multilayer neural network parameters for speaker recognition

    Science.gov (United States)

    Tovarek, Jaromir; Partila, Pavol; Rozhon, Jan; Voznak, Miroslav; Skapa, Jan; Uhrin, Dominik; Chmelikova, Zdenka

    2016-05-01

    This article discusses the impact of multilayer neural network parameters on speaker identification. The main task of speaker identification is to find a specific person within a known set of speakers: the voice of an unknown (wanted) speaker is matched against a group of reference speakers from the voice database. One requirement was to develop a text-independent system, i.e., to classify the wanted person regardless of content and language. A multilayer neural network was used for speaker identification in this research. An artificial neural network (ANN) requires setting parameters such as the neuron activation function, the steepness of the activation functions, the learning rate, the maximum number of iterations, and the number of neurons in the hidden and output layers. ANN accuracy and validation time are directly influenced by these parameter settings, and different roles require different settings. Identification accuracy and ANN validation time were evaluated on the same input data but with different parameter settings. The goal was to find the parameters giving the neural network the highest precision and shortest validation time. The input data are Mel-frequency cepstral coefficients (MFCCs), which describe the properties of the vocal tract. Audio samples were recorded for all speakers in a laboratory environment. The data were split into training, testing, and validation sets of 70%, 15%, and 15%. The result of the research described in this article is a distinct parameter setting of the multilayer neural network for four speakers.
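Two of the ingredients mentioned, the 70/15/15 split and an activation function with tunable steepness, can be sketched as follows; all names and shapes are illustrative assumptions, and the weights would come from training, which is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def split_70_15_15(X, y):
    """The 70/15/15 training/testing/validation split mentioned above."""
    idx = rng.permutation(len(X))
    a, b = int(0.70 * len(X)), int(0.85 * len(X))
    return ((X[idx[:a]], y[idx[:a]]),
            (X[idx[a:b]], y[idx[a:b]]),
            (X[idx[b:]], y[idx[b:]]))

def mlp_forward(x, W_hidden, W_out, steepness=1.0):
    """Forward pass of a one-hidden-layer network whose sigmoid steepness is
    one of the tuned parameters discussed in the article."""
    h = 1.0 / (1.0 + np.exp(-steepness * (x @ W_hidden)))   # sigmoid layer
    return h @ W_out
```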

  16. System for automatic crate recognition

    Directory of Open Access Journals (Sweden)

    Radovan Kukla

    2012-01-01

    Full Text Available This contribution describes the use of computer vision and artificial intelligence methods in an application that addresses abuse of reverse vending machines. The topic was solved as an innovation voucher for the South Moravian Region. It was developed by Mendel University in Brno (Department of Informatics, Faculty of Business and Economics, and Department of Agricultural, Food and Environmental Engineering, Faculty of Agronomy) together with the Czech subsidiary of Tomra. The project focuses on the possibility of integrating industrial cameras and computers to recognize crates in the reverse vending machine. The aim was an effective security system able to prevent financial losses in the hundreds of thousands. ControlWeb and VisionLab, developed by Moravian Instruments Inc., were chosen as the development and runtime platform.

  17. Image-based automatic recognition of larvae

    Science.gov (United States)

    Sang, Ru; Yu, Guiying; Fan, Weijun; Guo, Tiantai

    2010-08-01

    To date, imagoes (adult insects) have been the main objects researched in quarantine pest recognition. However, pests in their larval stage are latent, and larvae spread abroad easily with the circulation of agricultural and forest products. This paper presents larvae as new research objects, to be recognized by means of machine vision, image processing and pattern recognition. More visual information is preserved and the recognition rate is improved when color image segmentation is applied to images of larvae. Owing to its affine invariance, perspective invariance and brightness invariance, the scale-invariant feature transform (SIFT) is adopted for feature extraction. A neural network algorithm is utilized for pattern recognition, and the automatic identification of larva images is successfully achieved with satisfactory results.

  18. Speaker information affects false recognition of unstudied lexical-semantic associates.

    Science.gov (United States)

    Luthra, Sahil; Fox, Neal P; Blumstein, Sheila E

    2018-05-01

    Recognition of and memory for a spoken word can be facilitated by a prior presentation of that word spoken by the same talker. However, it is less clear whether this speaker congruency advantage generalizes to facilitate recognition of unheard related words. The present investigation employed a false memory paradigm to examine whether information about a speaker's identity in items heard by listeners could influence the recognition of novel items (critical intruders) phonologically or semantically related to the studied items. In Experiment 1, false recognition of semantically associated critical intruders was sensitive to speaker information, though only when subjects attended to talker identity during encoding. Results from Experiment 2 also provide some evidence that talker information affects the false recognition of critical intruders. Taken together, the present findings indicate that indexical information is able to contact the lexical-semantic network to affect the processing of unheard words.

  19. Artificially intelligent recognition of Arabic speaker using voice print-based local features

    Science.gov (United States)

    Mahmood, Awais; Alsulaiman, Mansour; Muhammad, Ghulam; Akram, Sheeraz

    2016-11-01

    Local features for any pattern recognition system are based on information extracted locally. In this paper, a local feature extraction technique was developed. The feature is extracted in the time-frequency plane by taking a moving average along the diagonal directions of the time-frequency plane. This feature captures the time-frequency events, producing a unique pattern for each speaker that can be viewed as a voice print of the speaker; hence, we refer to this technique as a voice print-based local feature. The proposed feature was compared to other features, including the mel-frequency cepstral coefficient (MFCC), for speaker recognition using two different databases. One of the databases used in the comparison is a subset of an LDC database consisting of two short sentences uttered by 182 speakers. The proposed feature attained a 98.35% recognition rate, compared to 96.7% for MFCC, on the LDC subset.
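The diagonal moving average over the time-frequency plane can be sketched directly; the window size is an illustrative assumption, and only one of the two diagonal directions the paper averages over is shown:

```python
import numpy as np

def diagonal_moving_average(tf, k=3):
    """Average a time-frequency matrix (rows = time frames, columns =
    frequency bins) along one diagonal direction with a window of k cells,
    clipping the window at the matrix edges."""
    T, F = tf.shape
    out = np.zeros_like(tf, dtype=float)
    for t in range(T):
        for f in range(F):
            vals = [tf[t + i, f + i] for i in range(-(k // 2), k // 2 + 1)
                    if 0 <= t + i < T and 0 <= f + i < F]
            out[t, f] = np.mean(vals)
    return out
```

The averaging smears energy along diagonal time-frequency trajectories, which is what lets the feature respond to joint time-frequency events rather than to time or frequency alone.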

  20. Speaker Recognition from Emotional Speech Using I-vector Approach

    Directory of Open Access Journals (Sweden)

    MACKOVÁ Lenka

    2014-05-01

    Full Text Available In recent years the concept of i-vectors has become very popular and successful in the field of speaker verification. The basic principle of i-vectors is that each utterance is represented by a fixed-length feature vector of low dimension. In the literature, various recordings obtained from telephones or microphones have been used for speaker verification. The aim of this experiment was to perform speaker verification using a speaker model trained on emotional recordings on an i-vector basis. Mel Frequency Cepstral Coefficients (MFCC), log energy, and their delta and acceleration coefficients were used in the feature extraction process. As classification methods of the verification system, the Mahalanobis distance metric in combination with Eigen Factor Radial normalization was used in one approach, and in the second approach the Cosine Distance Scoring (CSS) metric with Within-Class Covariance Normalization as channel compensation was employed. The verification system used emotional recordings of male subjects from the freely available German emotional database (Emo-DB).
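The second classification approach, cosine distance scoring with within-class covariance normalization, can be sketched as follows (vector dimensions and the small numerical guard are illustrative):

```python
import numpy as np

def cosine_score(w_enroll, w_test):
    """Cosine distance scoring between an enrollment and a test i-vector;
    channel compensation such as WCCN is applied to both beforehand."""
    return float(w_enroll @ w_test /
                 (np.linalg.norm(w_enroll) * np.linalg.norm(w_test) + 1e-12))

def wccn_projection(ivectors_by_class):
    """Within-class covariance normalization: return a matrix B such that
    B.T @ W @ B = I, where W is the average within-class covariance of the
    i-vectors. Scoring is then done on B.T @ w."""
    W = np.mean([np.cov(np.asarray(c).T, bias=True) for c in ivectors_by_class],
                axis=0)
    return np.linalg.cholesky(np.linalg.inv(W))
```

WCCN whitens the directions along which same-speaker i-vectors vary (mostly channel and, here, emotion variability), so that the cosine score is dominated by between-speaker differences.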

  1. Revisiting vocal perception in non-human animals: a review of vowel discrimination, speaker voice recognition, and speaker normalization

    Directory of Open Access Journals (Sweden)

    Buddhamas Kriengwatana

    2015-01-01

    Full Text Available The extent to which human speech perception evolved by taking advantage of predispositions and pre-existing features of vertebrate auditory and cognitive systems remains a central question in the evolution of speech. This paper reviews asymmetries in vowel perception, speaker voice recognition, and speaker normalization in non-human animals – topics that have not been thoroughly discussed in relation to the abilities of non-human animals, but are nonetheless important aspects of vocal perception. Throughout this paper we demonstrate that addressing these issues in non-human animals is relevant and worthwhile because many non-human animals must deal with similar issues in their natural environment. That is, they must also discriminate between similar-sounding vocalizations, determine signaler identity from vocalizations, and resolve signaler-dependent variation in vocalizations from conspecifics. Overall, we find that, although plausible, the current evidence is insufficiently strong to conclude that directional asymmetries in vowel perception are specific to humans, or that non-human animals can use voice characteristics to recognize human individuals. However, we do find some indication that non-human animals can normalize speaker differences. Accordingly, we identify avenues for future research that would greatly improve and advance our understanding of these topics.

  2. Multilingual speaker age recognition: regression analyses on the Lwazi corpus

    CSIR Research Space (South Africa)

    Feld, M

    2009-12-01

    Full Text Available Multilinguality represents an area of significant opportunities for automatic speech-processing systems: whereas multilingual societies are commonplace, the majority of speech-processing systems are developed with a single language in mind. As a step...

  3. Who spoke when? Audio-based speaker location estimation for diarization

    NARCIS (Netherlands)

    Dadvar, M.

    2011-01-01

    Speaker diarization is the process that detects active speakers and groups speech signals that have been uttered by the same speaker. There are two main applications for speaker diarization. Automatic Speech Recognition systems make use of the speaker-homogeneous clusters to adapt

  4. Authentication: From Passwords to Biometrics: An implementation of a speaker recognition system on Android

    OpenAIRE

    Heimark, Erlend

    2012-01-01

    We implement a biometric authentication system on the Android platform, which is based on text-dependent speaker recognition. The Android version used in the application is Android 4.0. The application makes use of the Modular Audio Recognition Framework, from which many of the algorithms are adapted in the processes of preprocessing and feature extraction. In addition, we employ the Dynamic Time Warping (DTW) algorithm for the comparison of different voice features. A training procedure is i...

  5. Multistage Data Selection-based Unsupervised Speaker Adaptation for Personalized Speech Emotion Recognition

    NARCIS (Netherlands)

    Kim, Jaebok; Park, Jeong-Sik

    This paper proposes an efficient speech emotion recognition (SER) approach that utilizes personal voice data accumulated on personal devices. A representative weakness of conventional SER systems is the user-dependent performance induced by the speaker independent (SI) acoustic model framework. But,

  6. Towards automatic musical instrument timbre recognition

    Science.gov (United States)

    Park, Tae Hong

    This dissertation comprises two parts: issues concerning research and development of an artificial system for automatic musical instrument timbre recognition, and musical compositions. The technical part of the essay includes a detailed record of developed and implemented algorithms for feature extraction and pattern recognition. A review of existing literature is also included, introducing historical aspects surrounding timbre research, problems associated with a number of timbre definitions, and highlights of selected research activities that have had significant impact in this field. The developed timbre recognition system follows a bottom-up, data-driven model that includes a pre-processing module, a feature extraction module, and an RBF/EBF (Radial/Elliptical Basis Function) neural network-based pattern recognition module. 829 monophonic samples from 12 instruments were chosen from the Peter Siedlaczek library (Best Service) and other samples from the Internet and personal collections. Significant emphasis was put on feature extraction development and testing to achieve robust and consistent feature vectors that are eventually passed to the neural network module. In order to avoid a garbage-in-garbage-out (GIGO) trap and improve generality, extra care was taken in designing and testing the developed algorithms, using various dynamics, different playing techniques, and a variety of pitches for each instrument, with inclusion of the attack and steady-state portions of a signal. Most of the research and development was conducted in Matlab. The compositional part of the essay includes brief introductions to "A d'Ess Are," "Aboji," "48 13 N, 16 20 O," and "pH-SQ." A general outline of the ideas and concepts behind the architectural designs of the pieces, including formal structures, time structures, orchestration methods, and pitch structures, is also presented.

  7. Robust Speaker Authentication Based on Combined Speech and Voiceprint Recognition

    Science.gov (United States)

    Malcangi, Mario

    2009-08-01

    Personal authentication is becoming increasingly important in many applications that have to protect proprietary data. Passwords and personal identification numbers (PINs) prove not to be robust enough to ensure that unauthorized people do not use them. Biometric authentication technology may offer a secure, convenient, accurate solution but sometimes fails due to its intrinsically fuzzy nature. This research aims to demonstrate that combining two basic speech processing methods, voiceprint identification and speech recognition, can provide a very high degree of robustness, especially if fuzzy decision logic is used.
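The combination of voiceprint identification and speech recognition described above can be approximated by late score fusion with fuzzy-style decision bands (a sketch only; the weight, thresholds, and "retry" band are invented for illustration, not taken from the paper):

```python
def fuse_scores(voiceprint_score, phrase_score, w=0.6):
    """Late fusion of two authentication scores (both in [0, 1]).

    w weighs the voiceprint match against the spoken-phrase match; the
    fused score is mapped to accept/retry/reject bands rather than a
    single hard threshold, mimicking a fuzzy decision stage.
    """
    fused = w * voiceprint_score + (1 - w) * phrase_score
    if fused >= 0.8:
        return fused, "accept"
    if fused >= 0.5:
        return fused, "retry"   # borderline: ask for another utterance
    return fused, "reject"

print(fuse_scores(0.9, 0.85))   # strong match on both modalities -> accept
```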

  8. Accuracy of MFCC-Based Speaker Recognition in Series 60 Device

    Directory of Open Access Journals (Sweden)

    Pasi Fränti

    2005-10-01

    Full Text Available A fixed-point implementation of speaker recognition based on MFCC signal processing is considered. We analyze the numerical error of the MFCC and its effect on the recognition accuracy. Techniques to reduce the information loss in a converted fixed-point implementation are introduced. We increase the signal processing accuracy by adjusting the ratio of the representation accuracy of the operators and the signal. The signal processing error is found to be more important to the speaker recognition accuracy than the error in the classification algorithm. The results are verified by applying the alternative technique to speech data. We also discuss the specific programming requirements set by Symbian and the Series 60 platform.
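The information loss of a fixed-point port can be illustrated with Q15 quantization, a common 16-bit signed fixed-point format (an illustrative sketch of the error source, not the paper's implementation):

```python
def to_q15(x):
    """Quantize a float in [-1, 1) to Q15 fixed point (16-bit signed)."""
    q = int(round(x * 32768))
    return max(-32768, min(32767, q))  # saturate at the 16-bit limits

def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / 32768.0

samples = [0.5, -0.25, 0.123456]
quantized = [to_q15(s) for s in samples]
max_err = max(abs(s - from_q15(q)) for s, q in zip(samples, quantized))
print(max_err < 2 ** -15)  # True: rounding error stays below one LSB
```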

  9. Hidden Markov models in automatic speech recognition

    Science.gov (United States)

    Wrzoskowicz, Adam

    1993-11-01

    This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMMs designed to operate in a noisy environment. The author also describes a language for human-robot communications which was defined as a complex multilevel network from an HMM model of speech sounds geared towards Polish inflections. The author also added mandatory lexical and syntactic rules to the system for its communications vocabulary.
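The likelihood computation at the core of any HMM-based recognizer is the forward algorithm, sketched here on a toy two-state model (all probabilities below are invented for illustration; a real ASR system would use Gaussian emission densities over MFCC frames):

```python
def forward(obs, pi, A, B):
    """Forward algorithm: likelihood of an observation sequence under an HMM.

    pi: initial state probabilities; A[s'][s]: transition probabilities;
    B[s][o]: probability of emitting discrete symbol o in state s.
    """
    n = len(pi)
    # Initialization with the first observation.
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    # Induction: propagate forward probabilities through time.
    for o in obs[1:]:
        alpha = [sum(alpha[sp] * A[sp][s] for sp in range(n)) * B[s][o]
                 for s in range(n)]
    return sum(alpha)

# Toy 2-state model with 2 observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
p = forward([0, 1, 0], pi, A, B)
```

In recognition, this likelihood is computed per word (or phone) model and the model with the highest score wins.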

  10. Image simulation for automatic license plate recognition

    Science.gov (United States)

    Bala, Raja; Zhao, Yonghui; Burry, Aaron; Kozitsky, Vladimir; Fillion, Claude; Saunders, Craig; Rodríguez-Serrano, José

    2012-01-01

    Automatic license plate recognition (ALPR) is an important capability for traffic surveillance applications, including toll monitoring and detection of different types of traffic violations. ALPR is a multi-stage process comprising plate localization, character segmentation, optical character recognition (OCR), and identification of originating jurisdiction (i.e. state or province). Training of an ALPR system for a new jurisdiction typically involves gathering vast amounts of license plate images and associated ground truth data, followed by iterative tuning and optimization of the ALPR algorithms. The substantial time and effort required to train and optimize the ALPR system can result in excessive operational cost and overhead. In this paper we propose a framework to create an artificial set of license plate images for accelerated training and optimization of ALPR algorithms. The framework comprises two steps: the synthesis of license plate images according to the design and layout for a jurisdiction of interest; and the modeling of imaging transformations and distortions typically encountered in the image capture process. Distortion parameters are estimated by measurements of real plate images. The simulation methodology is successfully demonstrated for training of OCR.
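The second step of the framework, modeling capture distortions, can be sketched as a simple per-pixel camera model with illumination gain/offset and additive Gaussian noise (the parameter values are hypothetical; the real system also models geometric distortions estimated from measured plates):

```python
import random

def distort_pixels(pixels, noise_sigma=10.0, gain=0.9, offset=5.0, seed=42):
    """Apply a toy capture model to synthetic plate pixels: illumination
    gain/offset plus additive Gaussian sensor noise, clipped to 8 bits."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    out = []
    for p in pixels:
        v = gain * p + offset + rng.gauss(0.0, noise_sigma)
        out.append(min(255, max(0, int(round(v)))))
    return out

clean = [0, 64, 128, 255] * 4   # toy strip of synthetic plate pixels
noisy = distort_pixels(clean)
```

Feeding many such distorted renderings to the OCR trainer stands in for the expensive collection of real plate images.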

  11. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment

    Science.gov (United States)

    2015-10-01

    [Report front matter; only fragments of the abstract survive.] Erik Jonsson School of Engineering & Computer Science, The University of Texas at Dallas, Richardson, Texas 75083-0688. Listed sections include "Whisper Based Processing for ASR" and "Task 5: Speaker State Assessment / Environmental Sniffing (SSA/ENVS)".

  12. Phoneme Error Pattern by Heritage Speakers of Spanish on an English Word Recognition Test.

    Science.gov (United States)

    Shi, Lu-Feng

    2017-04-01

    Heritage speakers acquire their native language from home use in their early childhood. As the native language is typically a minority language in the society, these individuals receive their formal education in the majority language and eventually develop greater competency with the majority than their native language. To date, there have not been specific research attempts to understand word recognition by heritage speakers. It is not clear if and to what degree we may infer from evidence based on bilingual listeners in general. This preliminary study investigated how heritage speakers of Spanish perform on an English word recognition test and analyzed their phoneme errors. A prospective, cross-sectional, observational design was employed. Twelve normal-hearing adult Spanish heritage speakers (four men, eight women, 20-38 yr old) participated in the study. Their language background was obtained through the Language Experience and Proficiency Questionnaire. Nine English monolingual listeners (three men, six women, 20-41 yr old) were also included for comparison purposes. Listeners were presented with 200 Northwestern University Auditory Test No. 6 words in quiet. They repeated each word orally and in writing. Their responses were scored by word, word-initial consonant, vowel, and word-final consonant. Performance was compared between groups with Student's t test or analysis of variance. Group-specific error patterns were primarily descriptive, but intergroup comparisons were made using 95% or 99% confidence intervals for proportional data. The two groups of listeners yielded comparable scores when their responses were examined by word, vowel, and final consonant. However, heritage speakers of Spanish misidentified significantly more word-initial consonants and had significantly more difficulty with initial /p, b, h/ than their monolingual peers. The two groups yielded similar patterns for vowel and word-final consonants, but heritage speakers made significantly
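Confidence intervals for proportional data, as used in the error-pattern comparisons above, can be computed with the normal approximation (a standard textbook sketch; the counts below are invented for illustration, not the study's data):

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    z = 1.96 gives a 95% interval, z = 2.576 a 99% interval.
    """
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

# Hypothetical example: 170 of 200 word-initial consonants identified correctly.
lo, hi = wald_interval(170, 200)
```

Non-overlapping intervals between listener groups then indicate a significant difference in the proportion of correct responses.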

  13. Development of a System for Automatic Recognition of Speech

    Directory of Open Access Journals (Sweden)

    Roman Jarina

    2003-01-01

    Full Text Available The article gives a review of research on processing and automatic recognition of speech signals (ASR) at the Department of Telecommunications of the Faculty of Electrical Engineering, University of Žilina. On-going research is oriented to speech parametrization using 2-dimensional cepstral analysis, and to the application of HMMs and neural networks for speech recognition in the Slovak language. The article summarizes achieved results and outlines the future orientation of our research in automatic speech recognition.

  14. A Novel Approach in Text-Independent Speaker Recognition in Noisy Environment

    Directory of Open Access Journals (Sweden)

    Nona Heydari Esfahani

    2014-10-01

    Full Text Available In this paper, robust text-independent speaker recognition is considered. The proposed method operates on manually silence-removed utterances that are segmented into smaller speech units containing a few phones and at least one vowel. The segments are the basic units for long-term feature extraction. Sub-band entropy is extracted directly in each segment. A robust vowel detection method is then applied to each segment to separate a high-energy vowel that is used as the unit for pitch frequency and formant extraction. By applying a clustering technique, the extracted short-term features, namely MFCC coefficients, are combined with the long-term features. Experiments using an MLP classifier show that the average speaker recognition accuracy is 97.33% for clean speech and 61.33% in a noisy environment at -2 dB SNR, which shows an improvement compared to other conventional methods.
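The sub-band entropy feature can be sketched as the Shannon entropy of the normalized power within each spectral band (a simplified illustration; the band count and toy spectrum are hypothetical):

```python
import math

def subband_entropy(power_spectrum, n_bands=4):
    """Spectral entropy per sub-band.

    Each band's power is normalized to a probability distribution and its
    Shannon entropy is taken: flat (noise-like) bands give high entropy,
    peaky (voiced) bands give low entropy.
    """
    band_len = len(power_spectrum) // n_bands
    entropies = []
    for b in range(n_bands):
        band = power_spectrum[b * band_len:(b + 1) * band_len]
        total = sum(band)
        probs = [x / total for x in band]
        h = -sum(p * math.log2(p) for p in probs if p > 0)
        entropies.append(h)
    return entropies

# A peaky band (one dominant bin) versus a flat band.
spec = [100, 1, 1, 1, 25, 25, 25, 25]
h = subband_entropy(spec, n_bands=2)
print(h[0] < h[1])  # True: the peaky band has lower entropy
```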

  15. Speed and automaticity of word recognition - inseparable twins?

    DEFF Research Database (Denmark)

    Poulsen, Mads; Asmussen, Vibeke; Elbro, Carsten

    'Speed and automaticity' of word recognition is a standard collocation. However, it is not clear whether speed and automaticity (i.e., effortlessness) make independent contributions to reading comprehension. In theory, both speed and automaticity may save cognitive resources for comprehension processes. Hence, the aim of the present study was to assess the unique contributions of word recognition speed and automaticity to reading comprehension while controlling for decoding speed and accuracy. Method: 139 Grade 5 students completed tests of reading comprehension and computer-based tests of speed of decoding and word recognition, together with a test of effortlessness (automaticity) of word recognition. Effortlessness was measured in a dual task in which participants were presented with a word enclosed in an unrelated figure. The task was to read the word and decide whether the figure was a triangle...

  16. Towards PLDA-RBM based speaker recognition in mobile environment: Designing stacked/deep PLDA-RBM systems

    DEFF Research Database (Denmark)

    Nautsch, Andreas; Hao, Hong; Stafylakis, Themos

    2016-01-01

    recognition: two deep architectures are presented and examined, which aim at suppressing channel effects and recovering speaker-discriminative information on back-ends trained on a small dataset. Experiments are carried out on the MOBIO SRE'13 database, which is a challenging and publicly available dataset for mobile speaker recognition with limited amounts of training data. The experiments show that the proposed system outperforms the baseline i-vector/PLDA approach by relative gains of 31% on female and 9% on male speakers in terms of half total error rate.

  17. Automatic sign language recognition inspired by human sign perception

    NARCIS (Netherlands)

    Ten Holt, G.A.

    2010-01-01

    Automatic sign language recognition is a relatively new field of research (since ca. 1990). Its objectives are to automatically analyze sign language utterances. There are several issues within the research area that merit investigation: how to capture the utterances (cameras, magnetic sensors,

  18. A model based method for automatic facial expression recognition

    NARCIS (Netherlands)

    Kuilenburg, H. van; Wiering, M.A.; Uyl, M. den

    2006-01-01

    Automatic facial expression recognition is a research topic with interesting applications in the field of human-computer interaction, psychology and product marketing. The classification accuracy for an automatic system which uses static images as input is however largely limited by the image

  19. Statistical pattern recognition for automatic writer identification and verification

    NARCIS (Netherlands)

    Bulacu, Marius Lucian

    2007-01-01

    The thesis addresses the problem of automatic person identification using scanned images of handwriting. Identifying the author of a handwritten sample using automatic image-based methods is an interesting pattern recognition problem with direct applicability in the forensic and historic document

  20. An HMM-Like Dynamic Time Warping Scheme for Automatic Speech Recognition

    Directory of Open Access Journals (Sweden)

    Ing-Jr Ding

    2014-01-01

    Full Text Available In the past, the kernel of automatic speech recognition (ASR) was dynamic time warping (DTW), a feature-based template matching technique that belongs to the category of dynamic programming (DP). Although DTW is an early ASR technique, it remains popular in many applications and now plays an important role in Kinect-based gesture recognition. This paper proposes an intelligent speech recognition system using an improved DTW approach for multimedia and home automation services. The improved DTW presented in this work, called HMM-like DTW, is essentially a hidden Markov model (HMM)-like method in which the concept of the typical HMM statistical model is brought into the design of DTW. The developed HMM-like DTW method transforms feature-based DTW recognition into model-based DTW recognition, so it can behave like the HMM recognition technique and is therefore able to perform model adaptation (also known as speaker adaptation). A series of experimental results in home automation-based multimedia access service environments demonstrated the superiority and effectiveness of the developed smart speech recognition system based on HMM-like DTW.
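Classic feature-based DTW, the starting point of the HMM-like variant above, can be sketched as follows (1-D features for brevity; real ASR would compare sequences of MFCC vectors per frame):

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping distance between two feature sequences."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # Allowed steps: diagonal match, insertion, deletion.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

template = [1, 2, 3, 4, 3, 2]
utterance = [1, 1, 2, 3, 4, 4, 3, 2]   # same shape, warped in time
print(dtw_distance(template, utterance))  # 0.0
```

The recognizer stores one template per word and picks the template with the smallest warped distance; the HMM-like variant replaces the fixed template with a statistical model that can be adapted per speaker.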

  1. Sign language perception research for improving automatic sign language recognition

    NARCIS (Netherlands)

    Ten Holt, G.A.; Arendsen, J.; De Ridder, H.; Van Doorn, A.J.; Reinders, M.J.T.; Hendriks, E.A.

    2009-01-01

    Current automatic sign language recognition (ASLR) seldom uses perceptual knowledge about the recognition of sign language. Using such knowledge can improve ASLR because it can give an indication which elements or phases of a sign are important for its meaning. Also, the current generation of

  2. Contribution to automatic speech recognition. Analysis of the direct acoustical signal. Recognition of isolated words and phoneme identification

    International Nuclear Information System (INIS)

    Dupeyrat, Benoit

    1981-01-01

    This report deals with the acoustical-phonetic step of the automatic recognition of the speech. The parameters used are the extrema of the acoustical signal (coded in amplitude and duration). This coding method, the properties of which are described, is simple and well adapted to a digital processing. The quality and the intelligibility of the coded signal after reconstruction are particularly satisfactory. An experiment for the automatic recognition of isolated words has been carried using this coding system. We have designed a filtering algorithm operating on the parameters of the coding. Thus the characteristics of the formants can be derived under certain conditions which are discussed. Using these characteristics the identification of a large part of the phonemes for a given speaker was achieved. Carrying on the studies has required the development of a particular methodology of real time processing which allowed immediate evaluation of the improvement of the programs. Such processing on temporal coding of the acoustical signal is extremely powerful and could represent, used in connection with other methods an efficient tool for the automatic processing of the speech.(author) [fr

  3. Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech

    Science.gov (United States)

    Cao, Houwei; Verma, Ragini; Nenkova, Ani

    2015-01-01

    We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker-specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotions and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore, on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion.

  4. Automatic Number Plate Recognition System for iPhone Devices

    Directory of Open Access Journals (Sweden)

    Călin Enăchescu

    2013-06-01

    Full Text Available This paper presents a system for automatic number plate recognition, implemented for devices running the iOS operating system. The methods used for number plate recognition are based on existing methods, but optimized for devices with low hardware resources. To solve the task of automatic number plate recognition we have divided it into the following subtasks: image acquisition, localization of the number plate position on the image and character detection. The first subtask is performed by the camera of an iPhone, the second one is done using image pre-processing methods and template matching. For the character recognition we are using a feed-forward artificial neural network. Each of these methods is presented along with its results.
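The template-matching step for plate localization can be sketched with normalized cross-correlation between an image window and a stored template (a generic sketch, not the paper's implementation; flattened pixel lists stand in for image patches):

```python
import math

def match_score(window, template):
    """Normalized cross-correlation between an image window and a template.

    Both are flattened to equal-length pixel lists. Scores range from -1
    (inverted pattern) to +1 (identical up to brightness/contrast shifts).
    """
    mw = sum(window) / len(window)
    mt = sum(template) / len(template)
    num = sum((w - mw) * (t - mt) for w, t in zip(window, template))
    den = math.sqrt(sum((w - mw) ** 2 for w in window) *
                    sum((t - mt) ** 2 for t in template))
    return num / den if den else 0.0

# Toy 2x2 patches flattened to lists: a perfect match scores 1.0.
template = [0.0, 255.0, 0.0, 255.0]
print(match_score(template, template))  # 1.0
```

Sliding the template across the image and thresholding this score localizes plate-like regions before the characters are passed to the neural network.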

  5. Quality Assessment of Compressed Video for Automatic License Plate Recognition

    DEFF Research Database (Denmark)

    Ukhanova, Ann; Støttrup-Andersen, Jesper; Forchhammer, Søren

    2014-01-01

    Definition of video quality requirements for video surveillance poses new questions in the area of quality assessment. This paper presents a quality assessment experiment for an automatic license plate recognition scenario. We explore the influence of compression by the H.264/AVC and H.265/HEVC standards on the recognition performance. We compare logarithmic and logistic functions for quality modeling. Our results show that a logistic function can better describe the dependence of recognition performance on quality for both compression standards. We observe that automatic license plate recognition in our study has a behavior similar to human recognition, allowing the use of the same mathematical models. We furthermore propose an application of one of the models for video surveillance systems.
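The logistic quality model favored by the paper has the general form below (a sketch only; the steepness and midpoint parameters are invented for illustration, not the fitted values from the experiment):

```python
import math

def logistic_model(q, a, b):
    """Logistic mapping from a quality score q to a recognition rate in [0, 1].

    a controls the steepness; b is the midpoint, i.e. the quality level at
    which the predicted recognition rate drops to 50%.
    """
    return 1.0 / (1.0 + math.exp(-a * (q - b)))

# Hypothetical fitted parameters.
a, b = 0.5, 30.0
rates = [logistic_model(q, a, b) for q in (10, 30, 50)]
```

Fitting (a, b) per codec lets the same curve predict recognition rates at unseen compression levels.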

  6. Two Systems for Automatic Music Genre Recognition

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2012-01-01

    We re-implement and test two state-of-the-art systems for automatic music genre classification; but unlike past works in this area, we look closer than ever before at their behavior. First, we look at specific instances where each system consistently applies the same wrong label across multiple trials of cross-validation. Second, we test the robustness of each system to spectral equalization. Finally, we test how well human subjects recognize the genres of music excerpts composed by each system to be highly genre representative. Our results suggest that neither high-performing system has a capacity to recognize music genre.

  7. A Context Dependent Automatic Target Recognition System

    Science.gov (United States)

    Kim, J. H.; Payton, D. W.; Olin, K. E.; Tseng, D. Y.

    1984-06-01

    This paper describes a new approach to automatic target recognizer (ATR) development utilizing artificial intelligence techniques. The ATR system exploits contextual information in its detection and classification processes to provide a high degree of robustness and adaptability. In the system, knowledge about domain objects and their contextual relationships is encoded in frames, separating it from low-level image processing algorithms. This knowledge-based system demonstrates an improvement over the conventional statistical approach through the exploitation of diverse forms of knowledge in its decision-making process.

  8. Radar automatic target recognition (ATR) and non-cooperative target recognition (NCTR)

    CERN Document Server

    Blacknell, David

    2013-01-01

    The ability to detect and locate targets by day or night, over wide areas, regardless of weather conditions has long made radar a key sensor in many military and civil applications. However, the ability to automatically and reliably distinguish different targets represents a difficult challenge. Radar Automatic Target Recognition (ATR) and Non-Cooperative Target Recognition (NCTR) captures material presented in the NATO SET-172 lecture series to provide an overview of the state-of-the-art and continuing challenges of radar target recognition. Topics covered include the problem as applied to th

  9. Automatization and Orthographic Development in Second Language Visual Word Recognition

    Science.gov (United States)

    Kida, Shusaku

    2016-01-01

    The present study investigated second language (L2) learners' acquisition of automatic word recognition and the development of L2 orthographic representation in the mental lexicon. Participants in the study were Japanese university students enrolled in a compulsory course involving a weekly 30-minute sustained silent reading (SSR) activity with…

  10. Laser gated viewing : An enabler for Automatic Target Recognition?

    NARCIS (Netherlands)

    Bovenkamp, E.G.P.; Schutte, K.

    2010-01-01

    For many decades attempts to accomplish Automatic Target Recognition have been made using both visual and FLIR camera systems. A recurring problem in these approaches is the segmentation problem, which is the separation between the target and its background. This paper describes an approach to

  11. Auditory signal design for automatic number plate recognition system

    NARCIS (Netherlands)

    Heydra, C.G.; Jansen, R.J.; Van Egmond, R.

    2014-01-01

    This paper focuses on the design of an auditory signal for the Automatic Number Plate Recognition system of Dutch national police. The auditory signal is designed to alert police officers of suspicious cars in their proximity, communicating priority level and location of the suspicious car and

  12. Automatic Emotion Recognition in Speech: Possibilities and Significance

    Directory of Open Access Journals (Sweden)

    Milana Bojanić

    2009-12-01

    Full Text Available Automatic speech recognition and spoken language understanding are crucial steps towards natural human-machine interaction. The main task of the speech communication process is the recognition of the word sequence, but the recognition of prosody, emotion, and stress tags may be of particular importance as well. This paper discusses the possibilities of recognizing emotion from the speech signal in order to improve ASR, and also provides an analysis of acoustic features that can be used for the detection of the speaker's emotion and stress. The paper also provides a short overview of emotion and stress classification techniques. The importance and place of emotional speech recognition is shown in the domain of human-computer interactive systems and the transaction communication model. Directions for future work are given at the end of this work.

  13. Automatic gang graffiti recognition and interpretation

    Science.gov (United States)

    Parra, Albert; Boutin, Mireille; Delp, Edward J.

    2017-09-01

    One of the roles of emergency first responders (e.g., police and fire departments) is to prevent and protect against events that can jeopardize the safety and well-being of a community. In the case of criminal gang activity, tools are needed for finding, documenting, and taking the necessary actions to mitigate the problem or issue. We describe an integrated mobile-based system capable of using location-based services, combined with image analysis, to track and analyze gang activity through the acquisition, indexing, and recognition of gang graffiti images. This approach uses image analysis methods for color recognition, image segmentation, and image retrieval and classification. A database of gang graffiti images is described that includes not only the images but also metadata related to the images, such as date and time, geoposition, gang, gang member, colors, and symbols. The user can then query the data in a useful manner. We have implemented these features both as applications for Android and iOS hand-held devices and as a web-based interface.

  14. Automatic Phonetic Transcription for Danish Speech Recognition

    DEFF Research Database (Denmark)

    Kirkedal, Andreas Søeborg

    , like Danish, the graphemic and phonetic representations are very dissimilar and more complex rewriting rules must be applied to create the correct phonetic representation. Automatic phonetic transcribers use different strategies, from deep analysis to shallow rewriting rules, to produce phonetic......, syllabication, stød and several other suprasegmental features (Kirkedal, 2013). Simplifying the transcriptions by filtering out the symbols for suprasegmental features in a post-processing step produces a format that is suitable for ASR purposes. eSpeak is an open source speech synthesizer originally created...... for particular words and word classes in addition. In comparison, English has 5,852 spelling-to-phoneme rules and 4,133 additional rules and 8,278 rules and 3,829 additional rules. Phonix applies deep morphological analysis as a preprocessing step. Should the analysis fail, several fallback strategies...

  15. Automatic Recognition of Object Names in Literature

    Science.gov (United States)

    Bonnin, C.; Lesteven, S.; Derriere, S.; Oberto, A.

    2008-08-01

    SIMBAD is a database of astronomical objects that provides (among other things) their bibliographic references in a large number of journals. Currently, these references have to be entered manually by librarians who read each paper. To cope with the increasing number of papers, CDS is developing a tool to assist the librarians in their work, taking advantage of the Dictionary of Nomenclature of Celestial Objects, which keeps track of object acronyms and of their origin. The program searches for object names directly in PDF documents by comparing the words with all the formats stored in the Dictionary of Nomenclature. It also searches for variable star names based on constellation names and for a large list of usual names such as Aldebaran or the Crab. Object names found in the documents often correspond to several astronomical objects. The system retrieves all possible matches, displays them with their object type given by SIMBAD, and lets the librarian make the final choice. The bibliographic reference can then be automatically added to the object identifiers in the database. Moreover, the systematic usage of the Dictionary of Nomenclature, which is updated manually, made it possible to check it automatically and to detect errors and inconsistencies. Last but not least, the program collects additional information such as the position of the object names in the document (in the title, subtitle, abstract, table, figure caption...) and their number of occurrences. In the future, this will make it possible to calculate the 'weight' of an object in a reference and to provide SIMBAD users with important new information that will help them find the most relevant papers in the object reference list.
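The format-based matching described above can be illustrated with a toy regex dictionary. The patterns below are invented stand-ins; the real Dictionary of Nomenclature stores format strings with a richer syntax that would be compiled into patterns of this kind:

```python
import re

# Toy subset: each acronym maps to a regex for its identifier part.
# These two patterns are illustrative, not the actual stored formats.
patterns = {
    "NGC": re.compile(r"\bNGC\s?\d{1,4}\b"),
    "HD": re.compile(r"\bHD\s?\d{1,6}\b"),
}

text = "We observed NGC 4321 and the transiting planet host HD 209458."
# Scan the document text with every known format and collect candidates
hits = {name: pat.findall(text) for name, pat in patterns.items()}
print(hits)  # {'NGC': ['NGC 4321'], 'HD': ['HD 209458']}
```

Each candidate would then be looked up in SIMBAD to resolve ambiguous matches, with the librarian making the final choice.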

  16. Automatic Modulation Recognition by Support Vector Machines Using Wavelet Kernel

    Energy Technology Data Exchange (ETDEWEB)

    Feng, X Z; Yang, J; Luo, F L; Chen, J Y; Zhong, X P [College of Mechatronic Engineering and Automation, National University of Defense Technology, Changsha (China)

    2006-10-15

    Automatic modulation identification plays a significant role in electronic warfare, electronic surveillance systems, and electronic countermeasures. The task of modulation recognition of communication signals is to determine the modulation type and signal parameters. In fact, automatic modulation identification can be regarded as an application of pattern recognition in the communications field. The support vector machine (SVM) is a universal learning machine that is widely used in the fields of pattern recognition, regression estimation, and probability density estimation. In this paper, a new method using a wavelet kernel function is proposed, which maps the input vector xi into a high-dimensional feature space F. In this feature space F, the optimal hyperplane that realizes the maximal margin can be constructed. That is to say, SVM can be used to classify the communication signals into two groups, namely analogue modulated signals and digitally modulated signals. Finally, computer simulation results are given, which show the good performance of the method.
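The wavelet-kernel SVM idea can be sketched with scikit-learn's support for callable kernels. The Morlet-style mother wavelet h(u) = cos(1.75u)·exp(-u²/2) is the form commonly used in wavelet-kernel papers, and the features below are synthetic stand-ins, not real modulation features:

```python
import numpy as np
from sklearn.svm import SVC

def wavelet_kernel(X, Y, a=1.0):
    """Wavelet kernel: K(x, y) = prod_i h((x_i - y_i) / a),
    with mother wavelet h(u) = cos(1.75 u) * exp(-u^2 / 2)."""
    # Pairwise differences, shape (n_X, n_Y, n_features)
    diff = (X[:, None, :] - Y[None, :, :]) / a
    return np.prod(np.cos(1.75 * diff) * np.exp(-diff**2 / 2.0), axis=2)

# Toy stand-in features for two classes (e.g. analogue vs. digital)
rng = np.random.default_rng(0)
analog = rng.normal(0.0, 0.3, size=(40, 4))   # class 0
digital = rng.normal(1.5, 0.3, size=(40, 4))  # class 1
X = np.vstack([analog, digital])
y = np.array([0] * 40 + [1] * 40)

clf = SVC(kernel=wavelet_kernel)  # SVC accepts a callable Gram-matrix kernel
clf.fit(X, y)
print(clf.score(X, y))
```

Because the kernel factorizes over feature dimensions, it behaves like a product of localized oscillatory similarities, which is what gives the wavelet kernel its multi-resolution character.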

  17. Automatic Modulation Recognition by Support Vector Machines Using Wavelet Kernel

    International Nuclear Information System (INIS)

    Feng, X Z; Yang, J; Luo, F L; Chen, J Y; Zhong, X P

    2006-01-01

    Automatic modulation identification plays a significant role in electronic warfare, electronic surveillance systems, and electronic countermeasures. The task of modulation recognition of communication signals is to determine the modulation type and signal parameters. In fact, automatic modulation identification can be regarded as an application of pattern recognition in the communications field. The support vector machine (SVM) is a universal learning machine that is widely used in the fields of pattern recognition, regression estimation, and probability density estimation. In this paper, a new method using a wavelet kernel function is proposed, which maps the input vector xi into a high-dimensional feature space F. In this feature space F, the optimal hyperplane that realizes the maximal margin can be constructed. That is to say, SVM can be used to classify the communication signals into two groups, namely analogue modulated signals and digitally modulated signals. Finally, computer simulation results are given, which show the good performance of the method

  18. Automatic speech recognition for radiological reporting

    International Nuclear Information System (INIS)

    Vidal, B.

    1991-01-01

    Large-vocabulary speech recognition, its techniques and its software and hardware technology, are being developed with the aim of providing the office user with a tool that could significantly improve both the quantity and quality of his work: the dictation machine, which allows memos and documents to be input using voice and a microphone instead of fingers and a keyboard. The IBM Rome Science Center, together with the IBM Research Division, has built a prototype recognizer that accepts sentences in natural language from a 20,000-word Italian vocabulary. The unit runs on a personal computer equipped with special hardware capable of providing all the necessary computing power. The first laboratory experiments yielded very interesting results and revealed system characteristics that make its use possible in operational environments. For this purpose, the dictation of medical reports was considered a suitable application. In cooperation with the 2nd Radiology Department of S. Maria della Misericordia Hospital (Udine, Italy), the system was tested by radiology department doctors during their everyday work. The doctors were able to dictate their reports directly to the unit. The text appeared immediately on the screen, and any errors could be corrected either by voice or by using the keyboard. At the end of report dictation, the doctors could both print and archive the text. The report could also be forwarded to the hospital information system, where available. Our results have been very encouraging: the system proved to be robust, simple to use, and accurate (over 95% average recognition rate). The experiment yielded valuable suggestions and comments, and its results are useful for system evolution towards improved system management and efficiency.

  19. The effect on recognition memory of noise cancelling headphones in a noisy environment with native and nonnative speakers

    Directory of Open Access Journals (Sweden)

    Brett R C Molesworth

    2014-01-01

    Full Text Available Noise has the potential to impair cognitive performance. For nonnative speakers, the effect of noise on performance is more severe than for their native counterparts. What remains unknown is the effectiveness of countermeasures such as noise-attenuating devices in such circumstances. Therefore, the main aim of the present research was to examine the effectiveness of active noise-attenuating countermeasures in the presence of simulated aircraft noise for both native and nonnative English speakers. Thirty-two participants, half native English speakers and half native German speakers, completed four recognition (cued recall) tasks presented in English under four different audio conditions, all in the presence of simulated aircraft noise. The results indicated that in simulated aircraft noise at 65 dB(A), the performance of nonnative English speakers was poorer than that of native English speakers. The beneficial effect of noise-cancelling headphones in improving the signal-to-noise ratio led to improved performance for nonnative speakers. These results have particular importance for organizations operating in safety-critical environments such as aviation.

  20. Spike Pattern Recognition for Automatic Collimation Alignment

    CERN Document Server

    Azzopardi, Gabriella; Salvachua Ferrando, Belen Maria; Mereghetti, Alessio; Redaelli, Stefano; CERN. Geneva. ATS Department

    2017-01-01

    The LHC makes use of a collimation system to protect its sensitive equipment by intercepting potentially dangerous beam halo particles. The appropriate collimator settings to protect the machine against beam losses rely on a very precise alignment of all the collimators with respect to the beam. The beam centre at each collimator is found by touching the beam halo using an alignment procedure. Until now, in order to determine whether a collimator is aligned with the beam or not, a user has been required to follow the collimator's BLM loss data and detect spikes. A machine learning (ML) model was trained to automatically recognize spikes when a collimator is aligned. The model was loosely integrated with the alignment implementation to determine its classification performance and reliability, without affecting the alignment process itself. The model was tested on a number of collimators during this MD, and it was able to output the classifications in real time.
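The record does not specify the classifier's features or architecture. As a purely illustrative baseline, a loss spike relative to the recent BLM baseline could be flagged as follows (the window size and threshold factor are invented, and the MD used a trained ML model rather than this rule):

```python
import numpy as np

def is_alignment_spike(losses, window=20, factor=5.0):
    """Toy spike detector: flag a spike when the latest loss reading
    exceeds `factor` times the median of the preceding window.
    (Illustrative baseline only; not the trained classifier from the MD.)"""
    baseline = np.median(losses[-window - 1:-1])
    return losses[-1] > factor * max(baseline, 1e-12)

rng = np.random.default_rng(0)
steady = rng.uniform(0.9, 1.1, 21)   # steady background losses
spiky = steady.copy()
spiky[-1] = 12.0                     # collimator jaw touches the beam halo
print(is_alignment_spike(steady), is_alignment_spike(spiky))  # False True
```

A learned classifier replaces hand-tuned constants like `factor` with boundaries fitted to labelled alignment spikes, which is what makes it robust across collimators and beam conditions.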

  1. Automatic Speech Recognition from Neural Signals: A Focused Review

    Directory of Open Access Journals (Sweden)

    Christian Herff

    2016-09-01

    Full Text Available Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the need not to disturb bystanders, or the inability to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons, it would be highly desirable not to speak but simply to imagine saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional near-infrared spectroscopy and functional magnetic resonance imaging, are less suited for Automatic Speech Recognition from neural signals due to their low temporal resolution, but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiological activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data, with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques applied to neural signals, we discuss the Brain-to-text system.

  2. Automatic anatomy recognition via multiobject oriented active shape models.

    Science.gov (United States)

    Chen, Xinjian; Udupa, Jayaram K; Alavi, Abass; Torigian, Drew A

    2010-12-01

    This paper studies the feasibility of developing an automatic anatomy recognition (AAR) system in clinical radiology and demonstrates its operation on clinical 2D images. The anatomy recognition method described here consists of two main components: (a) a multiobject generalization of OASM and (b) object recognition strategies. The OASM algorithm is generalized to multiple objects by including a model for each object and assigning a cost structure specific to each object in the spirit of live wire. The delineation of multiobject boundaries is done in MOASM via a three-level dynamic programming algorithm: the first level operates at the pixel level and aims to find optimal oriented boundary segments between successive landmarks; the second level operates at the landmark level and aims to find optimal locations for the landmarks; and the third level operates at the object level and aims to find an optimal arrangement of object boundaries over all objects. The object recognition strategy attempts to find the pose vector (consisting of translation, rotation, and scale components) for the multiobject model that yields the smallest total boundary cost over all objects. The delineation and recognition accuracies were evaluated separately utilizing routine clinical chest CT, abdominal CT, and foot MRI data sets. The delineation accuracy was evaluated in terms of true and false positive volume fractions (TPVF and FPVF). The recognition accuracy was assessed (1) in terms of the size of the space of pose vectors for the model assembly that yielded high delineation accuracy, (2) as a function of the number of objects and the objects' distribution and size in the model, (3) in terms of the interdependence between delineation and recognition, and (4) in terms of the closeness of the optimum recognition result to the global optimum. When multiple objects are included in the model, the delineation accuracy in terms of TPVF can be improved to 97%-98% with a low FPVF of 0.1%-0.2%. Typically, a

  3. Automatic Facial Expression Recognition and Operator Functional State

    Science.gov (United States)

    Blanson, Nina

    2012-01-01

    The prevalence of human error in safety-critical occupations remains a major challenge to mission success despite increasing automation in control processes. Although various methods have been proposed to prevent incidences of human error, none of these have been developed to employ the detection and regulation of Operator Functional State (OFS), or the optimal condition of the operator while performing a task, in work environments due to drawbacks such as obtrusiveness and impracticality. A video-based system with the ability to infer an individual's emotional state from facial feature patterning mitigates some of the problems associated with other methods of detecting OFS, like obtrusiveness and impracticality in integration with the mission environment. This paper explores the utility of facial expression recognition as a technology for inferring OFS by first expounding on the intricacies of OFS and the scientific background behind emotion and its relationship with an individual's state. Then, descriptions of the feedback loop and the emotion protocols proposed for the facial recognition program are explained. A basic version of the facial expression recognition program uses Haar classifiers and OpenCV libraries to automatically locate key facial landmarks during a live video stream. Various methods of creating facial expression recognition software are reviewed to guide future extensions of the program. The paper concludes with an examination of the steps necessary in the research of emotion and recommendations for the creation of an automatic facial expression recognition program for use in real-time, safety-critical missions.

  4. Automatic Facial Expression Recognition and Operator Functional State

    Science.gov (United States)

    Blanson, Nina

    2011-01-01

    The prevalence of human error in safety-critical occupations remains a major challenge to mission success despite increasing automation in control processes. Although various methods have been proposed to prevent incidences of human error, none of these have been developed to employ the detection and regulation of Operator Functional State (OFS), or the optimal condition of the operator while performing a task, in work environments due to drawbacks such as obtrusiveness and impracticality. A video-based system with the ability to infer an individual's emotional state from facial feature patterning mitigates some of the problems associated with other methods of detecting OFS, like obtrusiveness and impracticality in integration with the mission environment. This paper explores the utility of facial expression recognition as a technology for inferring OFS by first expounding on the intricacies of OFS and the scientific background behind emotion and its relationship with an individual's state. Then, descriptions of the feedback loop and the emotion protocols proposed for the facial recognition program are explained. A basic version of the facial expression recognition program uses Haar classifiers and OpenCV libraries to automatically locate key facial landmarks during a live video stream. Various methods of creating facial expression recognition software are reviewed to guide future extensions of the program. The paper concludes with an examination of the steps necessary in the research of emotion and recommendations for the creation of an automatic facial expression recognition program for use in real-time, safety-critical missions.

  5. Automatic speech recognition for report generation in computed tomography

    International Nuclear Information System (INIS)

    Teichgraeber, U.K.M.; Ehrenstein, T.; Lemke, M.; Liebig, T.; Stobbe, H.; Hosten, N.; Keske, U.; Felix, R.

    1999-01-01

    Purpose: A study was performed to compare the performance of automatic speech recognition (ASR) with conventional transcription. Materials and Methods: 100 CT reports were generated using ASR and 100 CT reports were dictated and written by medical transcriptionists. The time for dictation and correction of errors by the radiologist was assessed and the types of mistakes were analysed. The text recognition rate was calculated in both groups, and the average time between completion of the imaging study by the technologist and generation of the written report was assessed. A commercially available speech recognition technology (ASKA Software, IBM Via Voice) running on a personal computer was used. Results: The time for dictation using digital voice recognition was 9.4±2.3 min compared to 4.5±3.6 min with an ordinary Dictaphone. The text recognition rate was 97% with digital voice recognition and 99% with medical transcriptionists. The average time from imaging completion to written report finalisation was reduced from 47.3 hours with medical transcriptionists to 12.7 hours with ASR. The analysis of misspellings demonstrated (ASR vs. medical transcriptionists): 3 vs. 4 syntax errors, 0 vs. 37 orthographic mistakes, 16 vs. 22 mistakes in substance and 47 vs. erroneously applied terms. Conclusions: The use of digital voice recognition as a replacement for medical transcription is recommended when immediate availability of written reports is necessary. (orig.) [de

  6. Measuring the accuracy of automatic shoeprint recognition methods.

    Science.gov (United States)

    Luostarinen, Tapio; Lehmussola, Antti

    2014-11-01

    Shoeprints are an important source of information for criminal investigation. Accordingly, an increasing number of automatic shoeprint recognition methods have been proposed for detecting the corresponding shoe models. However, comprehensive comparisons among the methods have not previously been made. In this study, an extensive set of methods proposed in the literature was implemented, and their performance was studied under varying conditions. Three datasets of shoeprints of different quality were used, and the methods were also evaluated with partial and rotated prints. The results show clear differences between the algorithms: while the best-performing method, based on local image descriptors and RANSAC, provides rather good results in most of the experiments, some methods are almost completely non-robust against any imperfections in the images. Finally, the results demonstrate that there is still a need for extensive research to improve the accuracy of automatic recognition of crime scene prints. © 2014 American Academy of Forensic Sciences.
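The best-performing approach combines local image descriptors with RANSAC geometric verification. As a minimal sketch of the RANSAC stage only, assuming keypoint correspondences have already been extracted (the translation-only motion model and the synthetic matches are simplifications of what a real descriptor pipeline produces):

```python
import numpy as np

def ransac_translation(src, dst, n_iter=200, tol=2.0, seed=0):
    """Estimate a 2D translation between matched keypoints with RANSAC.
    src, dst: (N, 2) arrays of corresponding coordinates (some matches wrong).
    Returns the best translation and the count of inlier matches."""
    rng = np.random.default_rng(seed)
    best_t, best_inliers = None, 0
    for _ in range(n_iter):
        i = rng.integers(len(src))   # minimal sample: one correspondence
        t = dst[i] - src[i]          # candidate translation hypothesis
        inliers = np.sum(np.linalg.norm(src + t - dst, axis=1) < tol)
        if inliers > best_inliers:
            best_t, best_inliers = t, inliers
    return best_t, best_inliers

# Synthetic demo: 30 true matches shifted by (12, -7), plus 10 bad matches
rng = np.random.default_rng(1)
src = rng.uniform(0, 100, size=(40, 2))
dst = src + np.array([12.0, -7.0])
dst[30:] = rng.uniform(0, 100, size=(10, 2))  # corrupt the last 10 matches

t, inliers = ransac_translation(src, dst)
print(t, inliers)
```

The inlier count from the verified model is what makes descriptor matching robust to the partial and rotated prints mentioned above; a full implementation would fit a similarity or affine model instead of a pure translation.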

  7. Analysis of Phonetic Transcriptions for Danish Automatic Speech Recognition

    DEFF Research Database (Denmark)

    Kirkedal, Andreas Søeborg

    2013-01-01

    Automatic speech recognition (ASR) relies on three resources: audio, orthographic transcriptions and a pronunciation dictionary. The dictionary or lexicon maps orthographic words to sequences of phones or phonemes that represent the pronunciation of the corresponding word. The quality of a speech...... The analysis indicates that transcribing e.g. stress or vowel duration has a negative impact on performance. The best performance is obtained with coarse phonetic annotation, which improves performance by 1% word error rate and 3.8% sentence error rate....

  8. Recent advances in Automatic Speech Recognition for Vietnamese

    OpenAIRE

    Le , Viet-Bac; Besacier , Laurent; Seng , Sopheap; Bigi , Brigitte; Do , Thi-Ngoc-Diep

    2008-01-01

    International audience; This paper presents our recent activities for automatic speech recognition for Vietnamese. First, our text data collection and processing methods and tools are described. For language modeling, we investigate word, sub-word and also hybrid word/sub-word models. For acoustic modeling, when only limited speech data are available for Vietnamese, we propose some crosslingual acoustic modeling techniques. Furthermore, since the use of sub-word units can reduce the high out-...

  9. Automatic TLI recognition system. Part 1: System description

    Energy Technology Data Exchange (ETDEWEB)

    Partin, J.K.; Lassahn, G.D.; Davidson, J.R.

    1994-05-01

    This report describes an automatic target recognition system for fast screening of large amounts of multi-sensor image data, based on low-cost parallel processors. This system uses image data fusion and gives uncertainty estimates. It is relatively low cost, compact, and transportable. The software is easily enhanced to expand the system's capabilities, and the hardware is easily expandable to increase the system's speed. This volume gives a general description of the ATR system.

  10. Automatic Blastomere Recognition from a Single Embryo Image

    Directory of Open Access Journals (Sweden)

    Yun Tian

    2014-01-01

    Full Text Available The number of blastomeres in human day-3 embryos is one of the most important criteria for evaluating embryo viability. However, due to the transparency and overlap of blastomeres, it is a challenge to recognize blastomeres automatically using a single embryo image. This study proposes an approach based on least-square curve fitting (LSCF) for automatic blastomere recognition from a single image. First, combining edge detection, deletion of multiple connected points, and dilation and erosion, an effective preprocessing method was designed to obtain the subset of blastomere edges that were singly connected. Next, an automatic recognition method for blastomeres was proposed using least-square circle fitting. This algorithm was tested on 381 embryo microscopic images obtained from the eight-cell period, and the results were compared with those provided by experts. Embryos were recognized with zero errors in 21.59% of cases, and the proportion of embryos in which the number of false recognitions was less than or equal to 2 was 83.16%. This experiment demonstrated that our method can efficiently and rapidly recognize the number of blastomeres from a single embryo image, without the need to first reconstruct a three-dimensional model of the blastomeres; the method is simple and efficient.

  11. Automatic anatomy recognition on CT images with pathology

    Science.gov (United States)

    Huang, Lidong; Udupa, Jayaram K.; Tong, Yubing; Odhner, Dewey; Torigian, Drew A.

    2016-03-01

    Body-wide anatomy recognition on CT images with pathology becomes crucial for quantifying body-wide disease burden. This, however, is a challenging problem because various diseases result in various abnormalities of objects such as shape and intensity patterns. We previously developed an automatic anatomy recognition (AAR) system [1] whose applicability was demonstrated on near normal diagnostic CT images in different body regions on 35 organs. The aim of this paper is to investigate strategies for adapting the previous AAR system to diagnostic CT images of patients with various pathologies as a first step toward automated body-wide disease quantification. The AAR approach consists of three main steps - model building, object recognition, and object delineation. In this paper, within the broader AAR framework, we describe a new strategy for object recognition to handle abnormal images. In the model building stage an optimal threshold interval is learned from near-normal training images for each object. This threshold is optimally tuned to the pathological manifestation of the object in the test image. Recognition is performed following a hierarchical representation of the objects. Experimental results for the abdominal body region based on 50 near-normal images used for model building and 20 abnormal images used for object recognition show that object localization accuracy within 2 voxels for liver and spleen and 3 voxels for kidney can be achieved with the new strategy.

  12. One-against-all weighted dynamic time warping for language-independent and speaker-dependent speech recognition in adverse conditions.

    Directory of Open Access Journals (Sweden)

    Xianglilan Zhang

    Full Text Available Considering personal privacy and the difficulty of obtaining training material for many seldom-used English words and (often non-English) names, language-independent (LI) with lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a promising option to solve the problem. The dynamic time warping (DTW) algorithm is the state-of-the-art algorithm for small-footprint SD ASR applications with limited storage space and a small vocabulary, such as voice dialing on mobile devices, menu-driven recognition, and voice control in vehicles and robotics. Even though we have successfully developed two fast and accurate DTW variations for clean speech data, speech recognition in adverse conditions is still a big challenge. In order to improve recognition accuracy in noisy environments and bad recording conditions, such as excessively high or low volume, we introduce a novel one-against-all weighted DTW (OAWDTW). This method defines a one-against-all index (OAI) for each time frame of training data and applies the OAIs to the core DTW process. Given two speech signals, OAWDTW tunes their final alignment score by using the OAI in the DTW process. Our method achieves better accuracies than DTW and merge-weighted DTW (MWDTW): a 6.97% relative reduction of error rate (RRER) compared with DTW and a 15.91% RRER compared with MWDTW are observed in our extensive experiments on one representative SD dataset of four speakers' recordings. To the best of our knowledge, the OAWDTW approach is the first weighted DTW specially designed for speech data in adverse conditions.
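The core DTW recurrence that OAWDTW builds on can be sketched as follows. This is plain, unweighted DTW on 1-D toy signals; the one-against-all weighting itself is not reproduced here:

```python
import numpy as np

def dtw_distance(s, t):
    """Classic dynamic time warping distance between two 1-D sequences,
    using the standard match / insert / delete recurrence."""
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # Best of: shrink s, shrink t, or advance both together
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A time-stretched copy of a template aligns far better than a different word
template = np.sin(np.linspace(0, 2 * np.pi, 30))
stretched = np.sin(np.linspace(0, 2 * np.pi, 45))  # same shape, longer
other = np.cos(np.linspace(0, 2 * np.pi, 30))
print(dtw_distance(template, stretched) < dtw_distance(template, other))  # True
```

OAWDTW modifies the `cost` term by a per-frame one-against-all weight, so frames that best discriminate a template from all competing templates dominate the alignment score.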

  13. Evaluation of automatic face recognition for automatic border control on actual data recorded of travellers at Schiphol Airport

    NARCIS (Netherlands)

    Spreeuwers, Lieuwe Jan; Hendrikse, A.J.; Gerritsen, K.J.; Brömme, A.; Busch, C.

    2012-01-01

    Automatic border control at airports using automated facial recognition for passport checking is becoming more and more common. A problem is that it is not clear how reliable these automatic gates are. Very few independent studies exist that assess the reliability of automated facial recognition

  14. Demographic Recommendation by means of Group Profile Elicitation Using Speaker Age and Gender Recognition

    DEFF Research Database (Denmark)

    Shepstone, Sven Ewan; Tan, Zheng-Hua; Jensen, Søren Holdt

    2013-01-01

    In this paper we show a new method of using automatic age and gender recognition to recommend a sequence of multimedia items to a home TV audience comprising multiple viewers. Instead of relying on explicitly provided demographic data for each user, we define an audio-based demographic group, which itself is the input to a recommender system. The recommender system finds the content items whose demographics best match the group profile. We tested the effectiveness of the system for several typical home audience configurations. In a survey, users were given a configuration and asked to rate a set of advertisements on how well each advertisement matched the configuration. Unbeknown to the subjects, half of the adverts were recommended using the derived audio demographics and the other half were randomly chosen. The recommended adverts received a significantly higher median rating of 7...

  15. Indonesian Automatic Speech Recognition For Command Speech Controller Multimedia Player

    Directory of Open Access Journals (Sweden)

    Vivien Arief Wardhany

    2014-12-01

    Full Text Available The purpose of this multimedia device development is control through voice. Nowadays, voice commands can typically be recognized only in English. To overcome this issue, recognition was implemented using an Indonesian language model, acoustic model, and dictionary. The automatic speech recognizer is built using the CMU Sphinx engine with the English language database replaced by an Indonesian one, and XBMC is used as the multimedia player. The experiment used 10 volunteers testing items based on 7 commands. The volunteers were classified by gender: 5 male and 5 female. Ten samples were taken for each command, with each volunteer performing 10 test utterances per command; each volunteer also had to try all 7 commands provided. Based on the percentage classification table, the word "Kanan" was recognized most often, at 83%, while "Pilih" was the lowest. The word with the most incorrect classifications was "Kembali", at 67%, while "Kanan" had the fewest. The recognition rate (RR) for male voices was low for several commands such as "Kembali", "Utama", "Atas", and "Bawah". In particular, "Kembali" could not be recognized at all in the female voices, and in the male voices it reached only 4% RR; this is because the command has no similar-sounding English word near "kembali", so the system failed to recognize it. Also, the command "Pilih" reached 80% RR with female voices but only 4% with male voices. This is mostly due to the different voice characteristics of adult males and females: males have lower voice frequencies (85 to 180 Hz) than females (165 to 255 Hz). The results of the experiment showed that each speaker had a different recognition rate, caused by differences in tone, pronunciation, and speed of speech. Further work is needed to improve the accuracy of the Indonesian automatic speech recognition system.

  16. Automatic radar target recognition of objects falling on railway tracks

    International Nuclear Information System (INIS)

    Mroué, A; Heddebaut, M; Elbahhar, F; Rivenq, A; Rouvaen, J-M

    2012-01-01

This paper presents an automatic radar target recognition procedure based on complex resonances, using signals provided by ultra-wideband radar. The procedure is dedicated to the detection and identification of objects lying on railway tracks. For efficient complex resonance extraction, a comparison between several pole extraction methods is illustrated. In addition, preprocessing methods are presented that aim to remove most of the erroneous poles interfering with the discrimination scheme. Once the physical poles are determined, a specific discrimination technique based on Euclidean distances is introduced. Both simulation and experimental results are presented, showing efficient discrimination of different targets including guided transport passengers
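
The discrimination step described above compares an extracted pole set against stored target signatures by Euclidean distance in the complex plane. A minimal sketch of that idea; the pole values and target names below are invented for illustration, not taken from the paper:

```python
import numpy as np

def pole_distance(poles_a, poles_b):
    """Mean nearest-neighbour Euclidean distance between two sets of
    complex resonance poles (each pole is a point in the complex plane)."""
    a = np.asarray(poles_a)
    b = np.asarray(poles_b)
    # distance from every pole in `a` to its closest pole in `b`
    d = np.abs(a[:, None] - b[None, :])
    return d.min(axis=1).mean()

def classify(measured, library):
    """Assign the measured pole set to the library target with minimal distance."""
    return min(library, key=lambda name: pole_distance(measured, library[name]))

# Hypothetical pole libraries for two targets:
library = {
    "rail":   np.array([-0.1 + 2.0j, -0.2 + 4.1j]),
    "person": np.array([-0.3 + 1.2j, -0.5 + 2.7j]),
}
measured = np.array([-0.11 + 2.02j, -0.19 + 4.05j])
print(classify(measured, library))  # rail
```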

  17. Speaker Authentication

    CERN Document Server

    Li, Qi (Peter)

    2012-01-01

This book focuses on the use of voice as a biometric measure for personal authentication. In particular, "Speaker Recognition" covers two approaches in speaker authentication: speaker verification (SV) and verbal information verification (VIV). The SV approach attempts to verify a speaker’s identity based on his/her voice characteristics, while the VIV approach validates a speaker’s identity through verification of the content of his/her utterance(s). SV and VIV can be combined for new applications. This is still a new research topic with significant potential applications. The book provides a broad overview of recent advances in speaker authentication while giving due attention to advanced and useful algorithms and techniques. It also provides a step-by-step introduction to the current state of speaker authentication technology, from fundamental concepts to advanced algorithms. We will also present major design methodologies and share our experience in developing real and successful speake...

  18. Entrance C - New Automatic Number Plate Recognition System

    CERN Multimedia

    2013-01-01

Entrance C (Satigny) is now equipped with a latest-generation Automatic Number Plate Recognition (ANPR) system and a fast-action road gate.   During the month of August, Entrance C will be continuously open from 7.00 a.m. to 7.00 p.m. (working days only). The security guards will open the gate as usual from 7.00 a.m. to 9.00 a.m. and from 5.00 p.m. to 7.00 p.m. For the rest of the working day (9.00 a.m. to 5.00 p.m.) the gate will operate automatically. Please observe the following points: stop at the STOP sign on the ground; position yourself next to the card reader for optimal recognition; motorcyclists must use their CERN card; cyclists may not activate the gate and should use the bicycle turnstile; keep a safe distance from the vehicle in front of you. If access is denied, please check that your vehicle regist...

  19. Accent modulates access to word meaning: Evidence for a speaker-model account of spoken word recognition.

    Science.gov (United States)

    Cai, Zhenguang G; Gilbert, Rebecca A; Davis, Matthew H; Gaskell, M Gareth; Farrar, Lauren; Adler, Sarah; Rodd, Jennifer M

    2017-11-01

    Speech carries accent information relevant to determining the speaker's linguistic and social background. A series of web-based experiments demonstrate that accent cues can modulate access to word meaning. In Experiments 1-3, British participants were more likely to retrieve the American dominant meaning (e.g., hat meaning of "bonnet") in a word association task if they heard the words in an American than a British accent. In addition, results from a speeded semantic decision task (Experiment 4) and sentence comprehension task (Experiment 5) confirm that accent modulates on-line meaning retrieval such that comprehension of ambiguous words is easier when the relevant word meaning is dominant in the speaker's dialect. Critically, neutral-accent speech items, created by morphing British- and American-accented recordings, were interpreted in a similar way to accented words when embedded in a context of accented words (Experiment 2). This finding indicates that listeners do not use accent to guide meaning retrieval on a word-by-word basis; instead they use accent information to determine the dialectic identity of a speaker and then use their experience of that dialect to guide meaning access for all words spoken by that person. These results motivate a speaker-model account of spoken word recognition in which comprehenders determine key characteristics of their interlocutor and use this knowledge to guide word meaning access. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  20. Automatic speech recognition (ASR) based approach for speech therapy of aphasic patients: A review

    Science.gov (United States)

    Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH

    2017-09-01

This paper reviews state-of-the-art automatic speech recognition (ASR) based approaches for the speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from a speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that converts human speech into text by matching it against the system's library. This is particularly useful in speech rehabilitation therapies, as it provides accurate, real-time evaluation of speech input from an individual with a speech disorder. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback on their mistakes. However, the accuracy of ASR depends on many factors, such as phoneme recognition, speech continuity, speaker and environmental differences, as well as our depth of knowledge of human language understanding. Hence, the review examines recent developments in ASR technologies and their performance for individuals with speech and language disorders.

  1. Speaker diarization system on the 2007 NIST rich transcription meeting recognition evaluation

    Science.gov (United States)

    Sun, Hanwu; Nwe, Tin Lay; Koh, Eugene Chin Wei; Bin, Ma; Li, Haizhou

    2007-09-01

This paper presents a speaker diarization system developed at the Institute for Infocomm Research (I2R) for the NIST Rich Transcription 2007 (RT-07) evaluation task. We describe in detail our primary approaches to speaker diarization under the Multiple Distant Microphones (MDM) condition in the conference room scenario. Our proposed system consists of six modules: 1) a normalized least-mean-square (NLMS) adaptive filter for speaker direction estimation via Time Difference of Arrival (TDOA); 2) initial speaker clustering via a two-stage TDOA histogram distribution quantization approach; 3) multiple-microphone speaker data alignment via GCC-PHAT Time Delay Estimation (TDE) among all the distant microphone channel signals; 4) a speaker clustering algorithm based on a GMM modeling approach; 5) non-speech removal via a speech/non-speech verification mechanism; and 6) silence removal via a "Double-Layer Windowing" (DLW) method. We achieve an error rate of 31.02% on the 2006 Spring (RT-06s) MDM evaluation task and a competitive overall error rate of 15.32% on the NIST Rich Transcription 2007 (RT-07) MDM evaluation task.
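
Module 3 relies on GCC-PHAT, a standard time-delay estimator that whitens the cross-spectrum so that only phase information remains. A self-contained sketch of a generic GCC-PHAT delay estimate (not the authors' implementation); the signals are synthetic:

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the time delay of `sig` relative to `ref` via GCC-PHAT."""
    n = len(sig) + len(ref)                  # zero-pad to avoid circular wrap
    R = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    R /= np.abs(R) + 1e-12                   # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    # rearrange so negative lags come first, then pick the peak
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs   # delay in seconds

fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(2048)              # broadband reference channel
sig = np.roll(ref, 8)                        # simulate an 8-sample later arrival
print(round(gcc_phat(sig, ref, fs) * fs))    # 8
```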

  2. Automatic Pavement Crack Recognition Based on BP Neural Network

    Directory of Open Access Journals (Sweden)

    Li Li

    2014-02-01

Full Text Available A feasible pavement crack detection system plays an important role in evaluating road condition and providing the necessary road maintenance. In this paper, a back-propagation neural network (BPNN) is used to recognize pavement cracks from images. To improve the recognition accuracy of the BPNN, a complete image processing framework is proposed, including image preprocessing and crack information extraction. In this framework, redundant image information is reduced as much as possible and two sets of feature parameters are constructed to classify the crack images. A BPNN is then adopted to distinguish pavement images with linear cracks from those with alligator cracks, achieving high recognition accuracy. In addition, the linear cracks can be further classified into transversal and longitudinal cracks according to their direction angle. Finally, the proposed method is evaluated on 400 pavement images obtained by the Automatic Road Analyzer (ARAN) in Northern China, and the results show that the proposed method seems to be a powerful tool for pavement crack recognition. The rates of correct classification for alligator, transversal and longitudinal cracks are 97.5%, 100% and 88.0%, respectively. Compared to some previous studies, the method proposed in this paper is effective for all three kinds of cracks, and the results are also acceptable for engineering application.
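
The classification stage can be illustrated with a generic back-propagation network trained on two hypothetical crack features; the feature definitions and values below are invented for illustration and are not the paper's actual parameters:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical 2-D feature vectors, e.g. (crack pixel density, direction dispersion):
# linear cracks low on both, alligator cracks high on both.
linear = rng.normal(loc=[0.1, 0.2], scale=0.05, size=(100, 2))
alligator = rng.normal(loc=[0.6, 0.8], scale=0.05, size=(100, 2))
X = np.vstack([linear, alligator])
y = np.array([0] * 100 + [1] * 100)        # 0 = linear, 1 = alligator

# A small back-propagation network (one hidden layer), in the spirit of a BPNN.
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.predict([[0.12, 0.18], [0.58, 0.83]]))  # [0 1]
```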

  3. Predicting automatic speech recognition performance over communication channels from instrumental speech quality and intelligibility scores

    NARCIS (Netherlands)

    Gallardo, L.F.; Möller, S.; Beerends, J.

    2017-01-01

    The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility

  4. AUTOMATIC RECOGNITION OF INDOOR NAVIGATION ELEMENTS FROM KINECT POINT CLOUDS

    Directory of Open Access Journals (Sweden)

    L. Zeng

    2017-09-01

Full Text Available This paper addresses the automatic recognition of the navigation elements defined by the IndoorGML data standard: doors, stairways and walls. The data used are indoor 3D point clouds collected with a Kinect v2 by means of ORB-SLAM. Compared with lidar, this sensor is cheaper and more convenient, but its point clouds suffer from noise, registration error and large data volume. Hence, we adopt a shape descriptor, the histogram of distances between two randomly chosen points proposed by Osada, merged with other descriptors and used in conjunction with a random forest classifier, to recognize the navigation elements (doors, stairways and walls) from the Kinect point clouds. The approach acquires the navigation elements and their 3D locations from each single data frame through point cloud segmentation, boundary extraction, feature calculation and classification. Finally, the acquired navigation elements and their information are used to generate the state data of the indoor navigation module automatically. The experimental results demonstrate the high recognition accuracy of the proposed method.
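
The Osada D2 descriptor mentioned above is simply a normalized histogram of distances between randomly sampled point pairs. A minimal sketch on a synthetic point cloud (the cloud and bin settings are illustrative assumptions):

```python
import numpy as np

def d2_descriptor(points, n_pairs=10000, n_bins=32, seed=0):
    """Osada-style D2 shape descriptor: normalized histogram of Euclidean
    distances between randomly chosen point pairs of a point cloud."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points)
    i = rng.integers(0, len(pts), n_pairs)
    j = rng.integers(0, len(pts), n_pairs)
    d = np.linalg.norm(pts[i] - pts[j], axis=1)
    hist, _ = np.histogram(d, bins=n_bins, range=(0, d.max()))
    return hist / hist.sum()

# Hypothetical cloud: a thin, wall-like rectangular patch (2 m x 2.5 m).
rng = np.random.default_rng(1)
wall = rng.uniform([0, 0, 0], [2.0, 0.01, 2.5], size=(500, 3))
desc = d2_descriptor(wall)
print(len(desc), round(desc.sum(), 6))  # 32 1.0
```

Such descriptors would then be concatenated with others and fed to a random forest classifier.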

  5. Automatic Recognition of Indoor Navigation Elements from Kinect Point Clouds

    Science.gov (United States)

    Zeng, L.; Kang, Z.

    2017-09-01

This paper addresses the automatic recognition of the navigation elements defined by the IndoorGML data standard: doors, stairways and walls. The data used are indoor 3D point clouds collected with a Kinect v2 by means of ORB-SLAM. Compared with lidar, this sensor is cheaper and more convenient, but its point clouds suffer from noise, registration error and large data volume. Hence, we adopt a shape descriptor, the histogram of distances between two randomly chosen points proposed by Osada, merged with other descriptors and used in conjunction with a random forest classifier, to recognize the navigation elements (doors, stairways and walls) from the Kinect point clouds. The approach acquires the navigation elements and their 3D locations from each single data frame through point cloud segmentation, boundary extraction, feature calculation and classification. Finally, the acquired navigation elements and their information are used to generate the state data of the indoor navigation module automatically. The experimental results demonstrate the high recognition accuracy of the proposed method.

  6. Automatic recognition of offensive team formation in american football plays

    KAUST Repository

    Atmosukarto, Indriyati

    2013-06-01

Compared to security surveillance and military applications, where automated action analysis is prevalent, the sports domain is extremely under-served. Most existing software packages for sports video analysis require manual annotation of important events in the video. American football is the most popular sport in the United States, yet most game analysis is still done manually. Line of scrimmage and offensive team formation recognition are two statistics that must be tagged by American football coaches when watching and evaluating past play video clips, a process that takes many man-hours per week. These two statistics are also the building blocks for higher-level analysis such as play strategy inference and automatic statistic generation. In this paper, we propose a novel framework in which, given an American football play clip, we automatically identify the video frame in which the offensive team lines up in formation (the formation frame), the line of scrimmage for that play, and the type of formation the offensive team takes on. The proposed framework achieves 95% accuracy in detecting the formation frame, 98% accuracy in detecting the line of scrimmage, and up to 67% accuracy in classifying the offensive team's formation. To validate our framework, we compiled a large dataset comprising more than 800 play clips of standard and high definition resolution from real-world football games. This dataset will be made publicly available for future comparison. © 2013 IEEE.

  7. I Hear You Eat and Speak: Automatic Recognition of Eating Condition and Food Type, Use-Cases, and Impact on ASR Performance.

    Science.gov (United States)

    Hantke, Simone; Weninger, Felix; Kurle, Richard; Ringeval, Fabien; Batliner, Anton; Mousa, Amr El-Desoky; Schuller, Björn

    2016-01-01

We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i.e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6 k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech, which is made publicly available for research purposes. We start by demonstrating that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We also propose automatic classification both by brute-forcing low-level acoustic features and by using higher-level features related to intelligibility, obtained from an automatic speech recognizer. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier employed in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i.e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, reaching up to 62.3% average recall for multi-way classification of the eating condition, i.e., discriminating the six types of food as well as not eating. The early fusion of features related to intelligibility with the brute-forced acoustic feature set improves the performance on read speech, reaching a 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with up to a 56.2% coefficient of determination.
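
The leave-one-speaker-out protocol used above ensures that no speaker appears in both the training and the test data of any fold. A generic sketch with scikit-learn, using synthetic features and a handful of invented speaker ids in place of the iHEARu-EAT data:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical acoustic features for a binary eating / not-eating task,
# with a speaker id per utterance (stand-in for the 30 iHEARu-EAT subjects).
X = np.vstack([rng.normal(0.0, 1.0, (60, 10)),    # not eating
               rng.normal(2.0, 1.0, (60, 10))])   # eating
y = np.array([0] * 60 + [1] * 60)
speakers = np.tile(np.arange(6), 20)              # 6 speakers, interleaved

# Leave-one-speaker-out: every fold holds out all utterances of one speaker.
logo = LeaveOneGroupOut()
scores = cross_val_score(SVC(kernel="linear"), X, y, groups=speakers, cv=logo)
print(len(scores), scores.mean() > 0.9)  # 6 True
```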

  8. Using Face Recognition in the Automatic Door Access Control in a Secured Room

    Directory of Open Access Journals (Sweden)

    Gheorghe Gilca

    2017-06-01

Full Text Available The aim of this paper is to help users improve door security at sensitive locations by using face detection and recognition. The system comprises three main subsystems: face detection, face recognition and automatic door access control. The door opens automatically for a known person upon command from the microcontroller.

  9. Mathematical algorithm for the automatic recognition of intestinal parasites.

    Directory of Open Access Journals (Sweden)

    Alicia Alva

Full Text Available Parasitic infections are generally diagnosed by professionals trained to recognize the morphological characteristics of the eggs in microscopic images of fecal smears. However, this laboratory diagnosis requires medical specialists, who are lacking in many of the areas where these infections are most prevalent. In response to this public health issue, we developed software based on pattern recognition analysis of microscopic digital images of fecal smears, capable of automatically recognizing and diagnosing common human intestinal parasites. To this end, we selected 229, 124, 217, and 229 objects from microscopic images of fecal smears positive for Taenia sp., Trichuris trichiura, Diphyllobothrium latum, and Fasciola hepatica, respectively. Representative photographs were selected by a parasitologist. We then implemented our algorithm in the open source program SCILAB. The algorithm processes the image by first converting it to gray-scale, then applies a fourteen-step filtering process, and produces a skeletonized and tri-colored image. The features extracted fall into two general categories: geometric characteristics and brightness descriptions. Individual characteristics were quantified and evaluated with a logistic regression to model their ability to correctly identify each parasite separately. Subsequently, all algorithms were evaluated for false-positive cross-reactivity with the other parasites studied, excepting Taenia sp., which shares very few morphological characteristics with the others. The principal result showed that our algorithm reached sensitivities between 99.10%-100% and specificities between 98.13%-98.38% for detecting each parasite separately. We did not find any cross-positivity in the algorithms for the three parasites evaluated. In conclusion, the results demonstrated the capacity of our computer algorithm to automatically recognize and diagnose Taenia sp., Trichuris trichiura, Diphyllobothrium latum, and Fasciola hepatica
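
The per-parasite classifiers described above are logistic regressions over geometric and brightness features. A generic sketch of one such one-vs-rest model; the feature choices (elongation, solidity, mean brightness) and their values are invented for illustration, not the paper's measured descriptors:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical per-object features: (elongation, solidity, mean brightness).
taenia = rng.normal([1.05, 0.95, 0.40], [0.03, 0.02, 0.05], size=(120, 3))
other = rng.normal([1.80, 0.60, 0.65], [0.20, 0.08, 0.05], size=(120, 3))
X = np.vstack([taenia, other])
y = np.array([1] * 120 + [0] * 120)        # 1 = Taenia sp., 0 = other object

# One logistic model per parasite, each trained one-vs-rest.
model = LogisticRegression(max_iter=5000).fit(X, y)
probs = model.predict_proba([[1.06, 0.94, 0.41]])[0]
print(model.predict([[1.06, 0.94, 0.41]])[0], probs[1] > 0.5)  # 1 True
```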

  10. Mathematical algorithm for the automatic recognition of intestinal parasites.

    Science.gov (United States)

    Alva, Alicia; Cangalaya, Carla; Quiliano, Miguel; Krebs, Casey; Gilman, Robert H; Sheen, Patricia; Zimic, Mirko

    2017-01-01

Parasitic infections are generally diagnosed by professionals trained to recognize the morphological characteristics of the eggs in microscopic images of fecal smears. However, this laboratory diagnosis requires medical specialists, who are lacking in many of the areas where these infections are most prevalent. In response to this public health issue, we developed software based on pattern recognition analysis of microscopic digital images of fecal smears, capable of automatically recognizing and diagnosing common human intestinal parasites. To this end, we selected 229, 124, 217, and 229 objects from microscopic images of fecal smears positive for Taenia sp., Trichuris trichiura, Diphyllobothrium latum, and Fasciola hepatica, respectively. Representative photographs were selected by a parasitologist. We then implemented our algorithm in the open source program SCILAB. The algorithm processes the image by first converting it to gray-scale, then applies a fourteen-step filtering process, and produces a skeletonized and tri-colored image. The features extracted fall into two general categories: geometric characteristics and brightness descriptions. Individual characteristics were quantified and evaluated with a logistic regression to model their ability to correctly identify each parasite separately. Subsequently, all algorithms were evaluated for false-positive cross-reactivity with the other parasites studied, excepting Taenia sp., which shares very few morphological characteristics with the others. The principal result showed that our algorithm reached sensitivities between 99.10%-100% and specificities between 98.13%-98.38% for detecting each parasite separately. We did not find any cross-positivity in the algorithms for the three parasites evaluated. In conclusion, the results demonstrated the capacity of our computer algorithm to automatically recognize and diagnose Taenia sp., Trichuris trichiura, Diphyllobothrium latum, and Fasciola hepatica with a high

  11. Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

    Science.gov (United States)

    Ye, Sherry

    2015-01-01

    NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.
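
One of the listed components, feature transformation and normalization, is commonly realized as cepstral mean and variance normalization (CMVN). A generic sketch of per-utterance CMVN (a standard technique, not WeVoice's actual implementation):

```python
import numpy as np

def cmvn(features):
    """Cepstral mean and variance normalization over an utterance:
    each feature dimension is shifted to zero mean and scaled to unit
    variance, reducing stationary channel and noise effects before
    ASR decoding."""
    feats = np.asarray(features, dtype=float)
    mean = feats.mean(axis=0)
    std = feats.std(axis=0) + 1e-10         # guard against zero variance
    return (feats - mean) / std

# Hypothetical 13-dimensional cepstral features for a 200-frame utterance:
rng = np.random.default_rng(0)
utt = rng.normal(5.0, 2.0, size=(200, 13))
norm = cmvn(utt)
print(np.allclose(norm.mean(axis=0), 0.0), np.allclose(norm.std(axis=0), 1.0, atol=1e-6))
# True True
```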

  12. Automatic initial and final segmentation in cleft palate speech of Mandarin speakers.

    Directory of Open Access Journals (Sweden)

    Ling He

Full Text Available Speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, one syllable is composed of two parts: an initial and a final. In cleft palate speech, resonance disorders occur at the finals and the voiced initials, while articulation disorders occur at the unvoiced initials. Thus, the initials and finals are the minimum speech units that can reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed. It is an important preprocessing step in cleft palate speech signal processing. The tested cleft palate speech utterances were collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which treats the largest number of cleft palate patients in China. The cleft palate speech data include 824 speech segments, and the control samples contain 228 speech segments. The syllables are first extracted from the speech utterances. The proposed syllable extraction method avoids a training stage and achieves good performance for both voiced and unvoiced speech. The syllables are then classified as having "quasi-unvoiced" or "quasi-voiced" initials, and respective initial/final segmentation methods are proposed for these two types of syllables. Moreover, a two-step segmentation method is proposed: the rough locations of syllable and initial/final boundaries are refined in the second segmentation step, in order to improve the robustness of the segmentation accuracy. The experiments show that the initial/final segmentation accuracies for syllables with quasi-unvoiced initials are higher than for those with quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all the syllables is 91.69%. For the control samples, P30 for all the
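
Separating voiced from unvoiced speech, as required to tell quasi-voiced from quasi-unvoiced initials apart, typically relies on cues such as short-time energy and zero-crossing rate. A generic sketch of those two frame-level features (the synthetic signals stand in for real utterances; this is not the authors' algorithm):

```python
import numpy as np

def frame_features(x, frame=256, hop=128):
    """Short-time energy and zero-crossing rate per frame: classic cues for
    separating unvoiced speech (low energy, high ZCR) from voiced speech
    (high energy, low ZCR)."""
    feats = []
    for start in range(0, len(x) - frame + 1, hop):
        w = x[start:start + frame]
        energy = float(np.mean(w ** 2))
        zcr = float(np.mean(np.abs(np.diff(np.sign(w))) > 0))
        feats.append((energy, zcr))
    return np.array(feats)

fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
voiced = 0.8 * np.sin(2 * np.pi * 150 * t)     # periodic: high energy, low ZCR
unvoiced = 0.05 * rng.standard_normal(fs)      # noise-like: low energy, high ZCR
fv = frame_features(voiced)
fu = frame_features(unvoiced)
print(fv[:, 0].mean() > fu[:, 0].mean(), fv[:, 1].mean() < fu[:, 1].mean())  # True True
```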

  13. Automatic Type Recognition and Mapping of Global Tropical Cyclone Disaster Chains (TDC

    Directory of Open Access Journals (Sweden)

    Ran Wang

    2016-10-01

Full Text Available Catastrophic events caused by meteorological disasters are becoming more severe in the context of global warming. The disaster chains triggered by tropical cyclones cause serious human and economic losses. Effective regional type recognition of the Tropical Cyclone Disaster Chain (TDC) is therefore necessary for targeted prevention. This study explores a method for the automatic recognition and mapping of TDCs and designs a software system for it. We constructed an automatic recognition system based on the characteristics of the hazard-formative environment, following the theory of natural disaster systems. ArcEngine components enable the software system to present results through an automatic mapping approach. The study data come from global datasets such as the Digital Elevation Model (DEM), terrain slope, population density and Gross Domestic Product (GDP). The results show that: (1) according to geomorphological type, we establish a type recognition system for global TDCs; (2) based on the recognition principle, we design a software system with automatic recognition and mapping functions; and (3) we validate the type distribution against real TDC cases. The results show that the automatic recognition function is reliable. The study can provide the basis for targeted regional disaster prevention strategies, as well as regional sustainable development.

  14. Optimizing Automatic Speech Recognition for Low-Proficient Non-Native Speakers

    Directory of Open Access Journals (Sweden)

    Catia Cucchiarini

    2010-01-01

Full Text Available Computer-Assisted Language Learning (CALL) applications for improving the oral skills of low-proficient learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-native ASR is still problematic, a possible solution is to elicit constrained responses from the learners. In this paper, we describe experiments aimed at selecting utterances from lists of responses. The first experiment, on utterance selection, indicates that the decoding process can be improved by optimizing the language model and the acoustic models, thus reducing the utterance error rate from 29–26% to 10–8%. Since giving feedback on incorrectly recognized utterances is confusing, we verify the correctness of the utterance before providing feedback. The results of the second experiment, on utterance verification, indicate that combining duration-related features with a likelihood ratio (LR) yields an equal error rate (EER) of 10.3%, which is significantly better than the EER of the other measures in isolation.
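
The equal error rate (EER) quoted above is the operating point at which the false-rejection and false-acceptance rates are equal. A minimal sketch of estimating it from verification scores (the score distributions below are synthetic, not the paper's data):

```python
import numpy as np

def equal_error_rate(scores_true, scores_false):
    """Sweep a decision threshold over verification scores and return the
    point where the false-rejection rate (FRR) and false-acceptance rate
    (FAR) are closest to equal."""
    thresholds = np.sort(np.concatenate([scores_true, scores_false]))
    best_gap, best_eer = np.inf, None
    for th in thresholds:
        frr = np.mean(scores_true < th)       # correct utterances rejected
        far = np.mean(scores_false >= th)     # incorrect utterances accepted
        if abs(frr - far) < best_gap:
            best_gap, best_eer = abs(frr - far), (frr + far) / 2.0
    return best_eer

rng = np.random.default_rng(0)
correct = rng.normal(1.0, 0.5, 1000)          # e.g. likelihood-ratio scores
incorrect = rng.normal(-1.0, 0.5, 1000)
print(round(100 * equal_error_rate(correct, incorrect), 1), "% EER")
```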

  15. Towards social touch intelligence: developing a robust system for automatic touch recognition

    NARCIS (Netherlands)

    Jung, Merel Madeleine

    2014-01-01

    Touch behavior is of great importance during social interaction. Automatic recognition of social touch is necessary to transfer the touch modality from interpersonal interaction to other areas such as Human-Robot Interaction (HRI). This paper describes a PhD research program on the automatic

  16. AUTOMATIC SPEECH RECOGNITION SYSTEM CONCERNING THE MOROCCAN DIALECTE (Darija and Tamazight)

    OpenAIRE

    A. EL GHAZI; C. DAOUI; N. IDRISSI

    2012-01-01

In this work we present an automatic speech recognition system for Moroccan dialects, mainly Darija (an Arabic dialect) and Tamazight. Many approaches have been used to model Arabic and Tamazight phonetic units. In this paper, we propose to use hidden Markov models (HMM) for modeling these phonetic units. Experimental results show that the proposed approach further improves recognition.
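
Scoring an utterance against HMM-modeled phonetic units comes down to evaluating the forward likelihood of the observation sequence under each model and picking the best. A toy discrete-HMM sketch of that comparison (the model parameters are invented, not from the paper):

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM.
    pi: initial state probabilities, A: transition matrix,
    B[state, symbol]: emission probabilities."""
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_p = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]        # propagate, then emit
        scale = alpha.sum()                  # rescale to avoid underflow
        log_p += np.log(scale)
        alpha = alpha / scale
    return log_p

# Hypothetical 2-state left-to-right models for two phonetic units, scored on
# one observation sequence; a decoder picks the unit with higher likelihood.
pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.0, 1.0]])
B1 = np.array([[0.9, 0.1], [0.2, 0.8]])      # unit 1 favours symbol 0 then 1
B2 = np.array([[0.1, 0.9], [0.8, 0.2]])      # unit 2 is the reverse
obs = [0, 0, 1, 1]
print(forward_log_likelihood(obs, pi, A, B1) >
      forward_log_likelihood(obs, pi, A, B2))  # True
```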

  17. Leveraging Automatic Speech Recognition Errors to Detect Challenging Speech Segments in TED Talks

    Science.gov (United States)

    Mirzaei, Maryam Sadat; Meshgi, Kourosh; Kawahara, Tatsuya

    2016-01-01

    This study investigates the use of Automatic Speech Recognition (ASR) systems to epitomize second language (L2) listeners' problems in perception of TED talks. ASR-generated transcripts of videos often involve recognition errors, which may indicate difficult segments for L2 listeners. This paper aims to discover the root-causes of the ASR errors…

  18. Automatic landmark detection and face recognition for side-view face images

    NARCIS (Netherlands)

    Santemiz, P.; Spreeuwers, Lieuwe Jan; Veldhuis, Raymond N.J.; Broemme, Arslan; Busch, Christoph

    2013-01-01

    In real-life scenarios where pose variation is up to side-view positions, face recognition becomes a challenging task. In this paper we propose an automatic side-view face recognition system designed for home-safety applications. Our goal is to recognize people as they pass through doors in order to

  19. Determination of ocular torsion by means of automatic pattern recognition

    NARCIS (Netherlands)

    Groen, E.L.; Bos, J.E.; Nacken, P.F.M.; Graaf, B. de

    1996-01-01

A new, automatic method for determination of human ocular torsion (OT) was developed based on the tracking of iris patterns in digitized video images. Instead of quantifying OT by means of cross-correlation of circular iris samples, a procedure commonly applied, this new method automatically

  20. Determination of ocular torsion by means of automatic pattern recognition

    NARCIS (Netherlands)

    Groen, Eric; Bos, Jelte E.; Nacken, Peter F M; De Graaf, Bernd

    A new, automatic method for determination of human ocular torsion (OT) was developed based on the tracking of iris patterns in digitized video images. Instead of quantifying OT by means of cross-correlation of circular iris samples, a procedure commonly applied, this new method automatically selects

  1. Effects of pose and image resolution on automatic face recognition

    NARCIS (Netherlands)

    Mahmood, Zahid; Ali, Tauseef; Khan, Samee U.

The popularity of face recognition systems has increased due to their use in widespread applications. Driven by the enormous number of potential application domains, several algorithms have been proposed for face recognition. Face pose and image resolution are two important factors that

  2. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    Energy Technology Data Exchange (ETDEWEB)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C., E-mail: calexandre@ien.gov.b, E-mail: mol@ien.gov.b, E-mail: cmnap@ien.gov.b, E-mail: mag@ien.gov.b [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil); Nomiya, Diogo V., E-mail: diogonomiya@gmail.co [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil)

    2009-07-01

This work reports the implementation of a man-system interface based on automatic speech recognition and its integration into a virtual nuclear power plant control desk. The latter aims to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, replacing the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis, and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)
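
Cepstral analysis, the technique named above, derives coefficients from the inverse transform of the log magnitude spectrum of a speech frame. A minimal sketch of a real cepstrum (a generic illustration, not the authors' exact feature pipeline):

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of a windowed speech frame: inverse FFT of the log
    magnitude spectrum. Low-quefrency coefficients describe the vocal-tract
    envelope and are the kind of features fed to a neural network classifier."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-10)   # guard against log(0)
    return np.fft.irfft(log_mag)

# Hypothetical frame: a two-harmonic voiced-speech-like signal at 8 kHz.
fs = 8000
t = np.arange(512) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
ceps = real_cepstrum(frame)
print(ceps.shape)  # (512,)
```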

  3. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    International Nuclear Information System (INIS)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C.; Nomiya, Diogo V.

    2009-01-01

    This work reports the implementation of a man-system interface based on automatic speech recognition, and its integration into a virtual nuclear power plant control desk. The latter aims to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, replacing the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis, and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)

  4. Automatic speech recognition (zero crossing method): automatic recognition of isolated vowels

    Energy Technology Data Exchange (ETDEWEB)

    Dupeyrat, Benoit

    1975-06-10

    This note describes a recognition method for isolated vowels based on a particular preprocessing of the vocal signal. The preprocessing extracts the extrema of the vocal signal and the time intervals separating them (zero-crossing distances of the first derivative of the signal). The recognition of vowels uses normalized histograms of the values of these intervals. The program computes a distance between the histogram of the sound to be recognized and model histograms built during a learning phase. The results, obtained in real time on a minicomputer, are relatively independent of the speaker, provided the fundamental frequency of the voice does not vary too much (i.e. speakers of the same sex). (author)
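
    The pipeline the record describes (extrema extraction, interval histograms, histogram distance) can be sketched as follows. The bin range and the city-block distance are illustrative assumptions; the note does not specify either.

```python
import math

def extrema_intervals(signal, dt=1.0):
    """Time intervals between successive extrema of the signal,
    i.e. zero crossings of its first derivative."""
    extrema = []
    for i in range(1, len(signal) - 1):
        left = signal[i] - signal[i - 1]
        right = signal[i + 1] - signal[i]
        if left * right < 0:  # slope changes sign -> extremum
            extrema.append(i * dt)
    return [b - a for a, b in zip(extrema, extrema[1:])]

def normalized_histogram(intervals, bins, lo, hi):
    """Histogram of interval values, normalized to sum to 1."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in intervals:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def histogram_distance(h1, h2):
    """City-block distance between two normalized histograms
    (one possible choice of distance, used here for illustration)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

    Two tones of different fundamental frequency produce extrema spaced differently, so their interval histograms fall into disjoint bins and the distance between them is large, while a signal is at distance zero from itself.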

  5. Automatic recognition of smoke-plume signatures in lidar signal

    Science.gov (United States)

    Utkin, Andrei B.; Lavrov, Alexander; Vilar, Rui

    2008-10-01

    A simple and robust algorithm for lidar-signal classification based on the fast extraction of sufficiently pronounced peaks and their recognition with a perceptron, whose efficiency is enhanced by a fast nonlinear preprocessing that increases the signal dimension, is reported. The method allows smoke-plume recognition with an error rate as small as 0.31% (19 misdetections and 4 false alarms in analyzing a test set of 7409 peaks).
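
    As a rough illustration of the classifier described above, here is a minimal perceptron with a toy nonlinear feature expansion standing in for the paper's (unspecified) dimension-increasing preprocessing. The peak features (height, width) and the training data are invented for the example.

```python
def expand(x):
    """Toy nonlinear preprocessing: augment the feature vector with
    pairwise products, increasing the signal dimension."""
    return list(x) + [a * b for a in x for b in x]

def train_perceptron(samples, labels, epochs=500, lr=0.1):
    """Train a single-layer perceptron; labels are +1 (plume) / -1 (other)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        updated = False
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified -> update weights
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                updated = True
        if not updated:  # a full pass with no mistakes: converged
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

    On linearly separable toy peaks (tall-narrow "plumes" vs. short-wide clutter) the perceptron converges in a few epochs.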

  6. Automatic system for localization and recognition of vehicle plate numbers

    OpenAIRE

    Vázquez, N.; Nakano, M.; Pérez-Meana, H.

    2003-01-01

    This paper proposes a vehicle number plate identification system, which extracts the character features of a plate from an image captured by a digital camera and then identifies the symbols of the number plate using a multilayer neural network. The proposed recognition system consists of two processes: the training process and the recognition process. During the training process, a database is created using 310 vehicular plate images. Then, using this database, a multilayer neural network is trained...

  7. Automatic SIMD parallelization of embedded applications based on pattern recognition

    NARCIS (Netherlands)

    Manniesing, R.; Karkowski, I.P.; Corporaal, H.

    2000-01-01

    This paper investigates the potential for automatic mapping of typical embedded applications to architectures with multimedia instruction set extensions. For this purpose a (pattern matching based) code transformation engine is used, which involves a three-step process of matching, condition

  8. Improved Techniques for Automatic Chord Recognition from Music Audio Signals

    Science.gov (United States)

    Cho, Taemin

    2014-01-01

    This thesis is concerned with the development of techniques that facilitate effective automatic chord transcription from music audio signals. Since chord transcriptions can capture many important aspects of music, they are useful for a wide variety of music applications and also useful for people who learn and perform…

  9. Fully Automatic Recognition of the Temporal Phases of Facial Actions

    NARCIS (Netherlands)

    Valstar, M.F.; Pantic, Maja

    Past work on automatic analysis of facial expressions has focused mostly on detecting prototypic expressions of basic emotions like happiness and anger. The method proposed here enables the detection of a much larger range of facial behavior by recognizing facial muscle actions [action units (AUs)

  10. Preliminary Analysis of Automatic Speech Recognition and Synthesis Technology.

    Science.gov (United States)

    1983-05-01

    …speech. Private industry, which sees a major market for improved speech recognition systems, is attempting to solve the problems involved in… manufacturer is able to market such a recognition system. A second requirement for the spotting of keywords in distress signals concerns the need for a

  11. Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; Ordelman, Roeland J.F.; de Jong, Franciska M.G.

    2007-01-01

    This paper reports on the setup and evaluation of robust speech recognition system parts, geared towards transcript generation for heterogeneous, real-life media collections. The system is deployed for generating speech transcripts for the NIST/TRECVID-2007 test collection, part of a Dutch real-life

  12. Narrowing the gap between automatic and human word recognition

    NARCIS (Netherlands)

    Scharenborg, O.E.

    2005-01-01

    In everyday life, speech is all around us, on the radio, television, and in human-human interaction. We are continually confronted with novel utterances, and usually we have no problem recognising and understanding them. Several research fields investigate the speech recognition process. This thesis

  13. Automatic recognition of the unconscious reactions from physiological signals

    NARCIS (Netherlands)

    Ivonin, L.; Chang, H.M.; Chen, W.; Rauterberg, G.W.M.; Holzinger, A.; Ziefle, M.; Hitz, M.; Debevc, M.

    2013-01-01

    While the research in affective computing has been exclusively dealing with the recognition of explicit affective and cognitive states, carefully designed psychological and neuroimaging studies indicated that a considerable part of human experiences is tied to a deeper level of a psyche and not

  14. Automatic recognition of context and stress to support knowledge workers

    NARCIS (Netherlands)

    Koldijk, S.J.

    2012-01-01

    Motivation – Developing a computer tool that improves well-being at work. Research approach – We collect unobtrusive sensor data and apply pattern recognition approaches to infer the context and stress level of the user. We will develop a coaching tool based upon this information and evaluate its

  15. Four-Channel Biosignal Analysis and Feature Extraction for Automatic Emotion Recognition

    Science.gov (United States)

    Kim, Jonghwa; André, Elisabeth

    This paper investigates the potential of physiological signals as a reliable channel for automatic recognition of a user's emotional state. Compared to audio-visual emotion channels such as facial expression or speech, little attention has been paid so far to physiological signals for emotion recognition. All essential stages of an automatic recognition system using biosignals are discussed, from recording the physiological dataset up to feature-based multiclass classification. Four-channel biosensors are used to measure electromyogram, electrocardiogram, skin conductivity and respiration changes. A wide range of physiological features from various analysis domains, including time/frequency, entropy, geometric analysis, subband spectra and multiscale entropy, is proposed in order to search for the best emotion-relevant features and to correlate them with emotional states. The best extracted features are specified in detail and their effectiveness is proven by emotion recognition results.
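
    A few of the time-domain statistics typically used in this kind of biosignal feature extraction can be sketched as follows. The feature names and the selection are illustrative; the paper's actual feature set is far larger and spans several analysis domains.

```python
import math

def time_domain_features(x):
    """A handful of standard time-domain biosignal statistics."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    # mean absolute first difference, a common EMG / skin-conductance feature
    mad1 = sum(abs(b - a) for a, b in zip(x, x[1:])) / (n - 1)
    return {"mean": mean, "std": math.sqrt(var), "rms": rms, "diff1": mad1}
```

    Each physiological channel yields such a vector per recording window; the vectors are then concatenated and fed to a multiclass classifier.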

  16. Fusing Eye-gaze and Speech Recognition for Tracking in an Automatic Reading Tutor

    DEFF Research Database (Denmark)

    Rasmussen, Morten Højfeldt; Tan, Zheng-Hua

    2013-01-01

    In this paper we present a novel approach for automatically tracking the reading progress using a combination of eye-gaze tracking and speech recognition. The two are fused by first generating word probabilities based on eye-gaze information and then using these probabilities to augment the language…
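
    One generic way to fuse gaze-derived word probabilities with language-model probabilities, as the abstract describes, is a weighted log-linear combination renormalized over the vocabulary. This is a sketch of the general idea; the paper's exact fusion scheme may differ.

```python
def fuse_probabilities(lm_probs, gaze_probs, weight=0.5):
    """Log-linear fusion of language-model and gaze-based word
    probabilities, renormalized over the vocabulary."""
    fused = {w: (lm_probs[w] ** (1 - weight)) * (gaze_probs.get(w, 1e-9) ** weight)
             for w in lm_probs}
    total = sum(fused.values())
    return {w: p / total for w, p in fused.items()}
```

    With `weight=0.5` and a gaze channel that strongly favors one word, the fused distribution shifts toward that word while still summing to one.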

  17. Automatic identification of otological drilling faults: an intelligent recognition algorithm.

    Science.gov (United States)

    Cao, Tianyang; Li, Xisheng; Gao, Zhiqiang; Feng, Guodong; Shen, Peng

    2010-06-01

    This article presents an intelligent recognition algorithm that can recognize milling states of the otological drill by fusing multi-sensor information. An otological drill was modified by the addition of sensors. The algorithm was designed according to features of the milling process and is composed of a characteristic curve, an adaptive filter and a rule base. The characteristic curve can weaken the impact of the unstable normal milling process and reserve the features of drilling faults. The adaptive filter is capable of suppressing interference in the characteristic curve by fusing multi-sensor information. The rule base can identify drilling faults through the filtering result data. The experiments were repeated on fresh porcine scapulas, including normal milling and two drilling faults. The algorithm has high rates of identification. This study shows that the intelligent recognition algorithm can identify drilling faults under interference conditions. (c) 2010 John Wiley & Sons, Ltd.

  18. Bi-channel Sensor Fusion for Automatic Sign Language Recognition

    DEFF Research Database (Denmark)

    Kim, Jonghwa; Wagner, Johannes; Rehm, Matthias

    2008-01-01

    In this paper, we investigate the mutual-complementary functionality of accelerometer (ACC) and electromyogram (EMG) for recognizing seven word-level sign vocabularies in German sign language (GSL). Results are discussed for the single channels and for feature-level fusion for the bichannel sensor… …-independent condition, where subjective differences do not allow for high recognition rates. Finally we discuss a problem of feature-level fusion caused by high disparity between the accuracies of each single-channel classification.

  19. Application of image recognition-based automatic hyphae detection in fungal keratitis.

    Science.gov (United States)

    Wu, Xuelian; Tao, Yuan; Qiu, Qingchen; Wu, Xinyi

    2018-03-01

    The purpose of this study is to evaluate the accuracy of two methods in the diagnosis of fungal keratitis: automatic hyphae detection based on image recognition, and corneal smear examination. We evaluate the sensitivity and specificity of automatic hyphae detection based on image recognition in the diagnosis of fungal keratitis. We analyze the consistency between clinical symptoms and the density of hyphae, quantified using automatic hyphae detection based on image recognition. In our study, 56 cases with fungal keratitis (a single eye each) and 23 cases with bacterial keratitis were included. All cases underwent the routine inspection of slit lamp biomicroscopy, corneal smear examination, microorganism culture and the assessment of in vivo confocal microscopy images before starting medical treatment. We then recognize the hyphae images from in vivo confocal microscopy using automatic hyphae detection based on image recognition, evaluate its sensitivity and specificity, and compare it with corneal smear examination. The next step is to use the density index to assess the severity of infection, find the correlation with the patients' clinical symptoms and evaluate the consistency between them. The accuracy of this technology was superior to corneal smear examination (p < …). The sensitivity of automatic hyphae detection based on image recognition was 89.29%, and the specificity was 95.65%. The area under the ROC curve was 0.946. The correlation coefficient between the grading of severity in fungal keratitis by automatic hyphae detection based on image recognition and the clinical grading is 0.87. The technology of automatic hyphae detection based on image recognition, with its high sensitivity and specificity, is able to identify fungal keratitis better than corneal smear examination. This technology has advantages compared with conventional manual identification of confocal
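
    The reported rates are consistent with 50 of 56 fungal cases detected and 22 of 23 bacterial cases correctly rejected. The abstract does not give the raw confusion counts; the counts below are the ones implied by the percentages and cohort sizes.

```python
def sensitivity(tp, fn):
    """Fraction of diseased cases correctly detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of non-diseased cases correctly rejected."""
    return tn / (tn + fp)

# 50 of 56 fungal cases detected, 22 of 23 bacterial cases correctly rejected
sens = sensitivity(50, 6)   # -> 89.29%
spec = specificity(22, 1)   # -> 95.65%
```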

  20. Developing a broadband automatic speech recognition system for Afrikaans

    CSIR Research Space (South Africa)

    De Wet, Febe

    2011-08-01

    Full Text Available … baseline transcription for the news data. The match between a baseline transcription and its corresponding audio can be evaluated automatically using an ASR system in forced alignment mode. Only those bulletins for which a bad match is indicated… Other text corpora that are currently under construction include daily downloads of the scripts of news bulletins that are read on an Afrikaans radio station as well as transcripts of par…

  1. Automatic Recognition of Chinese Personal Name Using Conditional Random Fields and Knowledge Base

    Directory of Open Access Journals (Sweden)

    Chuan Gu

    2015-01-01

    Full Text Available According to the features of Chinese personal names, we present an approach for Chinese personal name recognition based on conditional random fields (CRF) and a knowledge base. The method builds multiple features of the CRF model by adopting the Chinese character as the processing unit, selects useful features based on a selection algorithm over the knowledge base and an incremental feature template, and finally implements the automatic recognition of Chinese personal names in Chinese documents. The experimental results on an open real-world corpus demonstrate the effectiveness of our method, which achieves high accuracy and recall rates.

  2. Sparse coding of the modulation spectrum for noise-robust automatic speech recognition

    NARCIS (Netherlands)

    Ahmadi, S.; Ahadi, S.M.; Cranen, B.; Boves, L.W.J.

    2014-01-01

    The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equivalent of a sequence of short-time power spectra, mainly to simplify the computation of the

  3. Automatic stimulation of experiments and learning based on prediction failure recognition

    NARCIS (Netherlands)

    Juarez Cordova, A.G.; Kahl, B.; Henne, T.; Prassler, E.

    2009-01-01

    In this paper we focus on the task of automatically and autonomously initiating experimentation and learning based on the recognition of prediction failure. We present a mechanism that utilizes conceptual knowledge to predict the outcome of robot actions, observes their execution and indicates when

  4. Modular Algorithm Testbed Suite (MATS): A Software Framework for Automatic Target Recognition

    Science.gov (United States)

    2017-01-01

    NAVAL SURFACE WARFARE CENTER PANAMA CITY DIVISION, PANAMA CITY, FL 32407-7001. Technical Report NSWC PCD TR-2017-004, 31-01-2017: Modular Algorithm Testbed Suite (MATS): A Software Framework for Automatic Target Recognition. … flexible platform to facilitate the development and testing of ATR algorithms. To that end, NSWC PCD has created the Modular Algorithm Testbed Suite

  5. Evaluation of missing data techniques for in-car automatic speech recognition

    OpenAIRE

    Wang, Y.; Vuerinckx, R.; Gemmeke, J.F.; Cranen, B.; Hamme, H. Van

    2009-01-01

    Wang Y., Vuerinckx R., Gemmeke J., Cranen B., Van hamme H., ''Evaluation of missing data techniques for in-car automatic speech recognition'', Proceedings NAG/DAGA 2009 - international conference on acoustics, 4 pp., March 23-26, 2009, Rotterdam, The Netherlands.

  6. Automatic Recognition of Improperly Pronounced Initial 'r' Consonant in Romanian

    Directory of Open Access Journals (Sweden)

    VELICAN, V.

    2012-08-01

    Full Text Available Correctly assessing the degree of mispronunciation and deciding upon the necessary treatment are fundamental activities for all speech disorder specialists. Obviously, the experience and availability of the specialists are essential to assure efficient therapy for the speech impaired. To overcome this deficiency, a more objective approach would include a tool that, independent of the specialist's abilities, could be used to establish the diagnosis. A completely automated system based on speech processing algorithms capable of performing the recognition task is therefore thoroughly justified and can be viewed as a goal that will bring many benefits to the field of speech pronunciation correction. This paper presents further results of the authors' work on developing speech processing algorithms able to identify mispronunciations in the Romanian language; more exactly, we propose the use of the Walsh-Hadamard Transform (WHT) as a feature selection tool for identifying rhotacism. The results are encouraging, with a best recognition rate of 92.55%.

  7. Automatic music genres classification as a pattern recognition problem

    Science.gov (United States)

    Ul Haq, Ihtisham; Khan, Fauzia; Sharif, Sana; Shaukat, Arsalan

    2013-12-01

    Music genres are the simplest and most effective descriptors for searching music libraries, stores or catalogues. This paper compares the results of two automatic music genre classification systems implemented using two different yet simple classifiers (K-Nearest Neighbor and Naïve Bayes). First a 10-12 second sample is selected and features are extracted from it; then, based on those features, the results of both classifiers are presented in the form of an accuracy table and confusion matrix. An experiment carried out on test samples taken from the middle of a song shows that they represent the true essence of its genre better than samples taken from the beginning and ending of a song. The techniques achieved accuracies of 91% and 78% using the Naïve Bayes and KNN classifiers respectively.
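
    The K-Nearest Neighbor half of such a comparison reduces to a majority vote among the closest training samples in feature space. A minimal sketch follows; the feature vectors and genre labels are placeholders, not the paper's data.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """k-nearest-neighbour vote; train is a list of (feature_vector, genre)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

    With well-separated genre clusters the vote is unambiguous for queries near either cluster.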

  8. Lip motion recognition of speaker based on SIFT

    Institute of Scientific and Technical Information of China (English)

    马新军; 吴晨晨; 仲乾元; 李园园

    2017-01-01

    Aiming at the problem that the lip feature dimension is too high and sensitive to scale space, a technique based on the Scale-Invariant Feature Transform (SIFT) algorithm was proposed to carry out speaker authentication. Firstly, a simple video frame alignment algorithm was proposed to adjust lip videos to the same length and to extract representative lip motion pictures. Then, a new algorithm based on SIFT key points was proposed to extract texture and motion features; after integration by Principal Component Analysis (PCA), typical lip motion features were obtained for authentication. Finally, a simple classification algorithm was presented according to the obtained features. The experimental results show that, compared with the common Local Binary Pattern (LBP) and Histogram of Oriented Gradient (HOG) features, the proposed feature extraction algorithm achieves a better False Acceptance Rate (FAR) and False Rejection Rate (FRR), which proves that the whole speaker lip motion recognition algorithm is effective and can obtain the desired results.

  9. Visual object recognition for automatic micropropagation of plants

    Science.gov (United States)

    Brendel, Thorsten; Schwanke, Joerg; Jensch, Peter F.

    1994-11-01

    Micropropagation of plants is done by cutting juvenile plants and placing them into special container boxes with nutrient solution, where the pieces can grow and be cut again several times. To produce high amounts of biomass it is necessary to perform plant micropropagation with a robotic system. In this paper we describe parts of the vision system that recognizes plants and their particular cutting points. For this, it is necessary to extract elements of the plants and relations between these elements (for example root, stem, leaf). Different species vary in their morphological appearance, and variation is also inherent in plants of the same species. Therefore, we introduce several morphological classes of plants for which we expect the same recognition methods to apply.

  10. UAV Visual Autolocalization Based on Automatic Landmark Recognition

    Science.gov (United States)

    Silva Filho, P.; Shiguemori, E. H.; Saotome, O.

    2017-08-01

    Deploying an autonomous unmanned aerial vehicle in GPS-denied areas is a highly discussed problem in the scientific community. There are several approaches being developed, but the main strategies considered so far are computer-vision-based navigation systems. This work presents a new real-time computer-vision position estimator for UAV navigation. The estimator uses images captured during flight to recognize specific, well-known landmarks in order to estimate the latitude and longitude of the aircraft. The method was tested in a simulated environment, using a dataset of real aerial images obtained in previous flights, with synchronized images, GPS and IMU data. The estimated position in each landmark recognition was compatible with the GPS data, indicating that the developed method can be used as an alternative navigation system.

  11. UAV VISUAL AUTOLOCALIZATION BASED ON AUTOMATIC LANDMARK RECOGNITION

    Directory of Open Access Journals (Sweden)

    P. Silva Filho

    2017-08-01

    Full Text Available Deploying an autonomous unmanned aerial vehicle in GPS-denied areas is a highly discussed problem in the scientific community. There are several approaches being developed, but the main strategies considered so far are computer-vision-based navigation systems. This work presents a new real-time computer-vision position estimator for UAV navigation. The estimator uses images captured during flight to recognize specific, well-known landmarks in order to estimate the latitude and longitude of the aircraft. The method was tested in a simulated environment, using a dataset of real aerial images obtained in previous flights, with synchronized images, GPS and IMU data. The estimated position in each landmark recognition was compatible with the GPS data, indicating that the developed method can be used as an alternative navigation system.

  12. Automatic shape recognition of a fast transient signal

    International Nuclear Information System (INIS)

    Charles, Gilbert.

    1976-01-01

    A system was developed to recognize whether the shape of a signal x(t) is similar (or identical) to that of an element yi(t) of an ensemble S composed of N known, memorized signals. x(t) is a time-limited (T < …) signal; the quantities ρ² give the similarity measure of two signals. To solve the problem of the digital recording of the signals x(t), two devices were realized: an analog-to-digital converter which permits the recording of fast transient signals (bandwidth > 1 GHz, sampling frequency approximately 100 GHz, resolution: 9 bits, 576 samples); and an automatic attenuator which scales the signal x(t) before digitization (the bandwidth is 70 MHz at -1 dB). A theoretical analysis permits determining the required resolution of the analog-to-digital converter as a function of the signal characteristics and of the desired precision for the calculation of ρ² [fr
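
    A similarity measure of the ρ² kind mentioned in the record can be sketched as a squared normalized cross-correlation, which is 1 for signals of identical shape (up to scale) and near 0 for dissimilar ones. This is one standard definition; the thesis may define ρ² differently.

```python
import math

def rho_squared(x, y):
    """Squared normalized cross-correlation of two equal-length signals."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return (num / den) ** 2 if den else 0.0

def recognize(x, models):
    """Return the name of the memorized model most similar to x."""
    return max(models, key=lambda name: rho_squared(x, models[name]))
```

    A scaled copy of a memorized shape scores exactly 1 and is therefore recognized regardless of amplitude, which is why the automatic attenuator only needs to keep x(t) within the converter's range.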

  13. Automatic target recognition performance losses in the presence of atmospheric and camera effects

    Science.gov (United States)

    Chen, Xiaohan; Schmid, Natalia A.

    2010-04-01

    The importance of networked automatic target recognition systems for surveillance applications is continuously increasing. Because of the requirement of a low cost and limited payload, these networks are traditionally equipped with lightweight, low-cost sensors such as electro-optical (EO) or infrared sensors. The quality of imagery acquired by these sensors critically depends on the environmental conditions, type and characteristics of sensors, and absence of occluding or concealing objects. In the past, a large number of efficient detection, tracking, and recognition algorithms have been designed to operate on imagery of good quality. However, detection and recognition limits under nonideal environmental and/or sensor-based distortions have not been carefully evaluated. We introduce a fully automatic target recognition system that involves a Haar-based detector to select potential regions of interest within images, performs adjustment of detected regions, segments potential targets using a region-based approach, identifies targets using Bessel K form-based encoding, and performs clutter rejection. We investigate the effects of environmental and camera conditions on target detection and recognition performance. Two databases are involved. One is a simulated database generated using a 3-D tool. The other database is formed by imaging 10 die-cast models of military vehicles from different elevation and orientation angles. The database contains imagery acquired both indoors and outdoors. The indoors data set is composed of clear and distorted images. The distortions include defocus blur, sided illumination, low contrast, shadows, and occlusions. All images in this database, however, have a uniform (blue) background. The indoors database is applied to evaluate the degradations of recognition performance due to camera and illumination effects. The database collected outdoors includes a real background and is much more complex to process. The numerical results

  14. Contribution to automatic image recognition applied to robot technology

    International Nuclear Information System (INIS)

    Juvin, Didier

    1983-01-01

    This paper describes a method for the analysis and interpretation of the images of objects located in a plain scene which is the environment of a robot. The first part covers the recovery of the contour of objects present in the image, and discusses a novel contour-following technique based on the line arborescence concept in combination with a 'cost function' giving a quantitative assessment of contour quality. We present heuristics for moderate-cost, minimum-time arborescence coverage, which is equivalent to following probable contour lines in the image. A contour segmentation technique, invariant in the translational and rotational modes, is presented next. The second part describes a recognition method based on the above invariant encoding: the algorithm performs a preliminary screening based on coarse data derived from segmentation, followed by a comparison of forms with probable identity through application of a distance specified in terms of the invariant encoding. The last part covers the outcome of the above investigations, which have found an industrial application in the vision system of a range of robots. The system is set up in a 16-bit microprocessor and operates in real time. (author) [fr

  15. Automatic Recognition Method for Optical Measuring Instruments Based on Machine Vision

    Institute of Scientific and Technical Information of China (English)

    SONG Le; LIN Yuchi; HAO Liguo

    2008-01-01

    Based on a comprehensive study of various algorithms, the automatic recognition of traditional ocular optical measuring instruments is realized. Taking a universal tools microscope (UTM) lens view image as an example, a 2-layer automatic recognition model for data reading is established after adopting a series of pre-processing algorithms. This model is an optimal combination of the correlation-based template matching method and a concurrent back propagation (BP) neural network. Multiple complementary feature extraction is used in generating the eigenvectors of the concurrent network. In order to improve fault-tolerance capacity, rotation invariant features based on Zernike moments are extracted from digit characters and a 4-dimensional group of the outline features is also obtained. Moreover, the operating time and reading accuracy can be adjusted dynamically by setting the threshold value. The experimental result indicates that the newly developed algorithm has optimal recognition precision and working speed. The average reading ratio can achieve 97.23%. The recognition method can automatically obtain the results of optical measuring instruments rapidly and stably without modifying their original structure, which meets the application requirements.

  16. Multi-Stage System for Automatic Target Recognition

    Science.gov (United States)

    Chao, Tien-Hsin; Lu, Thomas T.; Ye, David; Edens, Weston; Johnson, Oliver

    2010-01-01

    A multi-stage automated target recognition (ATR) system has been designed to perform computer vision tasks with adequate proficiency in mimicking human vision. The system is able to detect, identify, and track targets of interest. Potential regions of interest (ROIs) are first identified by the detection stage using an Optimum Trade-off Maximum Average Correlation Height (OT-MACH) filter combined with a wavelet transform. False positives are then eliminated by the verification stage using feature extraction methods in conjunction with neural networks. Feature extraction transforms the ROIs using filtering and binning algorithms to create feature vectors. A feedforward back-propagation neural network (NN) is then trained to classify each feature vector and to remove false positives. A system parameter optimization process has been developed to adapt to various targets and datasets. The objective was to design an efficient computer vision system that can learn to detect multiple targets in large images with unknown backgrounds. Because the target size is small relative to the image size in this problem, there are many regions of the image that could potentially contain the target. A cursory analysis of every region can be computationally efficient, but may yield too many false positives. On the other hand, a detailed analysis of every region can yield better results, but may be computationally inefficient. The multi-stage ATR system was designed to achieve an optimal balance between accuracy and computational efficiency by incorporating both models. The detection stage first identifies potential ROIs where the target may be present by performing a fast Fourier domain OT-MACH filter-based correlation. Because the threshold for this stage is chosen with the goal of detecting all true positives, a number of false positives are also detected as ROIs. The verification stage then transforms the regions of interest into feature space, and eliminates false positives using an
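    The Fourier-domain correlation step in the detection stage can be illustrated with a short sketch (a toy example using a plain matched template rather than an OT-MACH filter; all names and thresholds are illustrative):

```python
import numpy as np

def correlate_fft(image, template):
    """Circular cross-correlation of image with template via the FFT."""
    F_img = np.fft.fft2(image)
    F_tpl = np.fft.fft2(template, s=image.shape)
    return np.fft.ifft2(F_img * np.conj(F_tpl)).real

def detect_rois(image, template, threshold):
    """Return the correlation surface and (row, col) peaks above a
    fraction of the maximum -- the candidate ROIs."""
    corr = correlate_fft(image, template)
    peaks = np.argwhere(corr > threshold * corr.max())
    return corr, peaks

# Toy example: a bright 3x3 blob hidden in Gaussian noise.
rng = np.random.default_rng(0)
img = rng.normal(0.0, 0.1, (64, 64))
img[20:23, 40:43] += 1.0          # target at (20, 40)
tpl = np.ones((3, 3))             # stand-in for the correlation filter
corr, peaks = detect_rois(img, tpl, 0.9)
```

In the real system the template spectrum would be replaced by the precomputed OT-MACH filter, and the threshold tuned so that no true positives are lost, accepting extra false positives for the verification stage to remove.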

  17. A Full-Body Layered Deformable Model for Automatic Model-Based Gait Recognition

    Science.gov (United States)

    Lu, Haiping; Plataniotis, Konstantinos N.; Venetsanopoulos, Anastasios N.

    2007-12-01

    This paper proposes a full-body layered deformable model (LDM) inspired by manually labeled silhouettes for automatic model-based gait recognition from part-level gait dynamics in monocular video sequences. The LDM is defined for the fronto-parallel gait with 22 parameters describing the human body part shapes (widths and lengths) and dynamics (positions and orientations). There are four layers in the LDM and the limbs are deformable. Algorithms for LDM-based human body pose recovery are then developed to estimate the LDM parameters from both manually labeled and automatically extracted silhouettes, where the automatic silhouette extraction is through a coarse-to-fine localization and extraction procedure. The estimated LDM parameters are used for model-based gait recognition by employing the dynamic time warping for matching and adopting the combination scheme in AdaBoost.M2. While the existing model-based gait recognition approaches focus primarily on the lower limbs, the estimated LDM parameters enable us to study full-body model-based gait recognition by utilizing the dynamics of the upper limbs, the shoulders and the head as well. In the experiments, the LDM-based gait recognition is tested on gait sequences with differences in shoe-type, surface, carrying condition and time. The results demonstrate that the recognition performance benefits from not only the lower limb dynamics, but also the dynamics of the upper limbs, the shoulders and the head. In addition, the LDM can serve as an analysis tool for studying factors affecting the gait under various conditions.

  18. Morphological self-organizing feature map neural network with applications to automatic target recognition

    Science.gov (United States)

    Zhang, Shijun; Jing, Zhongliang; Li, Jianxun

    2005-01-01

    The rotation-invariant feature of the target is obtained using the multi-direction feature extraction property of the steerable filter. Combining the morphological top-hat transform with the self-organizing feature map neural network, the adaptive topological region is selected. Using the erosion operation, topological region shrinkage is achieved. The steerable-filter-based morphological self-organizing feature map neural network is applied to automatic target recognition of binary standard patterns and real-world infrared sequence images. Compared with the Hamming network and morphological shared-weight networks, the proposed method achieves a higher recognition rate, robust adaptability, quick training, and better generalization.
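    As a minimal illustration of the top-hat transform mentioned above, here is a plain-NumPy version with a flat square structuring element (a toy sketch, not the steerable-filter pipeline of the paper):

```python
import numpy as np

def grey_erode(img, k):
    """Grayscale erosion with a flat k x k structuring element."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = p[i:i + k, j:j + k].min()
    return out

def grey_dilate(img, k):
    """Grayscale dilation with a flat k x k structuring element."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = p[i:i + k, j:j + k].max()
    return out

def white_tophat(img, k=3):
    """Top-hat transform: image minus its morphological opening.
    Keeps bright details smaller than the structuring element."""
    opening = grey_dilate(grey_erode(img, k), k)
    return img - opening
```

A bright spot smaller than the structuring element survives the transform, while the smooth background is suppressed.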

  19. Neural Network Based Recognition of Signal Patterns in Application to Automatic Testing of Rails

    Directory of Open Access Journals (Sweden)

    Tomasz Ciszewski

    2006-01-01

    Full Text Available The paper describes the application of a neural network to the recognition of signal patterns in measurement data gathered by a railroad ultrasound testing car. Digital conversion of the measurement signal makes it possible to store and process large quantities of data. The elaboration of smart, effective and automatic procedures for recognizing the obtained patterns on the basis of measured signal amplitude is presented. The tests cover only two pattern classes. In the authors’ opinion, given a sufficiently large quantity of training data, the presented method is applicable to a system that recognizes many classes.

  20. When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

    Science.gov (United States)

    Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

    2017-11-01

    Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. An automatic system for Turkish word recognition using Discrete Wavelet Neural Network based on adaptive entropy

    International Nuclear Information System (INIS)

    Avci, E.

    2007-01-01

    In this paper, an automatic system is presented for word recognition using real Turkish word signals. This paper especially deals with the combination of feature extraction and classification from real Turkish word signals. A Discrete Wavelet Neural Network (DWNN) model is used, which consists of two layers: a discrete wavelet layer and a multi-layer perceptron. The discrete wavelet layer is used for adaptive feature extraction in the time-frequency domain and is composed of the Discrete Wavelet Transform (DWT) and wavelet entropy. The multi-layer perceptron used for classification is a feed-forward neural network. The performance of the system is evaluated using noisy Turkish word signals. Test results showing the effectiveness of the proposed automatic system are presented in this paper. The rate of correct recognition is about 92.5% for the sample speech signals. (author)
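    The wavelet-entropy feature computed in the discrete wavelet layer can be sketched as follows (a toy Haar-DWT version; the paper's choice of wavelet and entropy measure may differ):

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def wavelet_entropy(x, levels=3):
    """Shannon entropy of the relative energy across DWT sub-bands.
    Assumes len(x) is divisible by 2**levels."""
    energies = []
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        a, d = haar_dwt(a)
        energies.append(np.sum(d ** 2))   # detail energy per level
    energies.append(np.sum(a ** 2))       # final approximation energy
    p = np.array(energies) / np.sum(energies)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
```

A constant signal concentrates all energy in the approximation band and yields zero entropy; a broadband signal spreads energy across sub-bands and yields a higher value, which is what makes the measure useful as an adaptive feature.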

  2. Integrating Automatic Speech Recognition and Machine Translation for Better Translation Outputs

    DEFF Research Database (Denmark)

    Liyanapathirana, Jeevanthi

    translations, combining machine translation with computer assisted translation has drawn attention in current research. This combines two prospects: the opportunity of ensuring high quality translation along with a significant performance gain. Automatic Speech Recognition (ASR) is another important area......, which caters important functionalities in language processing and natural language understanding tasks. In this work we integrate automatic speech recognition and machine translation in parallel. We aim to avoid manual typing of possible translations as dictating the translation would take less time...... to the n-best list rescoring, we also use word graphs with the expectation of arriving at a tighter integration of ASR and MT models. Integration methods include constraining ASR models using language and translation models of MT, and vice versa. We currently develop and experiment different methods...

  3. Practising verbal maritime communication with computer dialogue systems using automatic speech recognition (My Practice session)

    OpenAIRE

    John, Peter; Wellmann, J.; Appell, J.E.

    2016-01-01

    This My Practice session presents a novel online tool for practising verbal communication in a maritime setting. It is based on low-fi ChatBot simulation exercises which employ computer-based dialogue systems. The ChatBot exercises are equipped with an automatic speech recognition engine specifically designed for maritime communication. The speech input and output functionality enables learners to communicate with the computer freely and spontaneously. The exercises replicate real communicati...

  4. Health smart home for elders - a tool for automatic recognition of activities of daily living.

    Science.gov (United States)

    Le, Xuan Hoa Binh; Di Mascolo, Maria; Gouin, Alexia; Noury, Norbert

    2008-01-01

    Elders prefer to live in their own homes, but with aging comes the loss of autonomy and its associated risks. In order to help them live longer in safe conditions, we need a tool to automatically detect their loss of autonomy by assessing the degree of performance of activities of daily living. This article presents an approach enabling the recognition of the activities of an elder living alone in a home equipped with noninvasive sensors.

  5. Comparison of HMM and DTW methods in automatic recognition of pathological phoneme pronunciation

    OpenAIRE

    Wielgat, Robert; Zielinski, Tomasz P.; Swietojanski, Pawel; Zoladz, Piotr; Król, Daniel; Wozniak, Tomasz; Grabias, Stanislaw

    2007-01-01

    In the paper, the recently proposed Human Factor Cepstral Coefficients (HFCC) are used for automatic recognition of pathological phoneme pronunciation in the speech of impaired children, and the efficiency of this approach is compared with the application of the standard Mel-Frequency Cepstral Coefficients (MFCC) as a feature vector. Both dynamic time warping (DTW), working on whole words or embedded phoneme patterns, and hidden Markov models (HMM) are used as classifiers in the presented research. Obtained resul...

  6. Face Prediction Model for an Automatic Age-invariant Face Recognition System

    OpenAIRE

    Yadav, Poonam

    2015-01-01

    Automated face recognition and identification software is becoming part of our daily life; it finds its abode not only in Facebook's auto photo tagging, Apple's iPhoto, Google's Picasa and Microsoft's Kinect, but also in the Homeland Security Department's dedicated biometric face detection systems. Most of these automatic face identification systems fail where the effects of aging come into...

  7. AUTOMATIC RECOGNITION OF FALLS IN GAIT-SLIP: A HARNESS LOAD CELL BASED CRITERION

    OpenAIRE

    Yang, Feng; Pai, Yi-Chung

    2011-01-01

    Overhead harness systems, equipped with load cell sensors, are essential to participants’ safety and to outcome assessment in perturbation training. The purpose of this study was first to develop an automatic outcome recognition criterion among young adults for gait-slip training and then to verify this criterion among older adults. Each of 39 young and 71 older subjects, all protected by a safety harness, experienced 8 unannounced, repeated slips while walking on a 7-m walkway. Each tri...

  8. Contribution to automatic handwritten characters recognition. Application to optical moving characters recognition

    International Nuclear Information System (INIS)

    Gokana, Denis

    1986-01-01

    This paper describes research work on computer-aided vision relating to the design of a vision system that can recognize isolated handwritten characters written on a mobile support. We use a technique which consists in analyzing the information contained in the contours of the polygon circumscribed to the character's shape. These contours are segmented and labelled to give a new set of features constituted by: - right and left 'profiles', - topological and algebraic invariant properties. A new character recognition method induced from this representation, based on a multilevel hierarchical technique, is then described. At the primary level, we use a fuzzy classification with a dynamic programming technique using 'profiles'. The other levels refine the recognition by using topological and algebraic invariant properties. Several results are presented, and an accuracy of 99% was reached for handwritten numeral characters, attesting to the robustness of our algorithm. (author) [fr

  9. Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes.

    Science.gov (United States)

    Meyer, Bernd T; Brand, Thomas; Kollmeier, Birger

    2011-01-01

    The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system with special focus on intrinsic variations of speech, such as speaking rate and effort, altered pitch, and the presence of dialect and accent. Second, it is investigated if the most common ASR features contain all information required to recognize speech in noisy environments by using resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system achieved the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which is an estimate for the human-machine gap in terms of the SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variabilities result in strong increases of error rates, both in human speech recognition (HSR) and ASR (with a relative increase of up to 120%). An analysis of phoneme duration and recognition rates indicates that human listeners are better able to identify temporal cues than the machine at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems.

  10. The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system

    NARCIS (Netherlands)

    Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.

    2009-01-01

    Objective: The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that

  11. Automatic recognition of ship types from infrared images using superstructure moment invariants

    Science.gov (United States)

    Li, Heng; Wang, Xinyu

    2007-11-01

    Automatic object recognition is an active area of interest for military and commercial applications. In this paper, a system addressing autonomous recognition of ship types in infrared images is proposed. Firstly, an approach to segmentation based on detection of salient features of the target with subsequent shadow removal is proposed, as the basis of the subsequent object recognition. Considering that the differences between the shapes of various ships mainly lie in their superstructures, we then use superstructure moment functions invariant to translation, rotation and scale differences in input patterns and develop a robust algorithm for obtaining the ship superstructure. Subsequently a back-propagation neural network is used as a classifier in the recognition stage, and projection images of simulated three-dimensional ship models are used as the training sets. Our recognition model was implemented and experimentally validated using both simulated three-dimensional ship model images and real images derived from video of an AN/AAS-44V Forward-Looking Infrared (FLIR) sensor.
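    Moment invariants of the kind applied to the superstructure shapes can be computed from normalized central moments. A minimal sketch of the first two Hu invariants follows (illustrative, not the authors' superstructure moment functions):

```python
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq of a 2-D intensity image."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc = (x * img).sum() / m00
    yc = (y * img).sum() / m00
    return (((x - xc) ** p) * ((y - yc) ** q) * img).sum()

def hu_invariants(img):
    """First two Hu moments, invariant to translation, scale and rotation."""
    m00 = img.sum()

    def eta(p, q):
        # Normalized central moment: eta_pq = mu_pq / mu_00^(1 + (p+q)/2)
        return central_moment(img, p, q) / m00 ** (1 + (p + q) / 2.0)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4.0 * eta(1, 1) ** 2
    return h1, h2
```

Because the moments are centered and normalized, the same shape placed anywhere in the image (or rescaled) yields the same invariant values, which is what makes them usable as position-independent features for the classifier.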

  12. iFER: facial expression recognition using automatically selected geometric eye and eyebrow features

    Science.gov (United States)

    Oztel, Ismail; Yolcu, Gozde; Oz, Cemil; Kazan, Serap; Bunyak, Filiz

    2018-03-01

    Facial expressions have an important role in interpersonal communications and in the estimation of emotional states or intentions. Automatic recognition of facial expressions has led to many practical applications and has become one of the important topics in computer vision. We present a facial expression recognition system that relies on geometry-based features extracted from the eye and eyebrow regions of the face. The proposed system detects keypoints on frontal face images and forms a feature set using geometric relationships among groups of detected keypoints. The obtained feature set is refined and reduced using the sequential forward selection (SFS) algorithm and fed to a support vector machine classifier to recognize five facial expression classes. The proposed system, iFER (eye-eyebrow only facial expression recognition), is robust to lower-face occlusions that may be caused by beards, mustaches, scarves, etc. and to lower-face motion during speech production. Preliminary experiments on benchmark datasets produced promising results, outperforming previous facial expression recognition studies using partial face features, and comparable to studies using whole-face information, only ~2.5% lower than the best whole-face facial expression recognition system while using only ~1/3 of the facial region.
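    The sequential forward selection (SFS) step can be sketched generically as a greedy loop (the score callback stands in for classifier cross-validation accuracy; the toy additive score below is purely illustrative):

```python
def sfs(features, score, k):
    """Greedy sequential forward selection.

    features : list of candidate feature names/indices
    score    : callable mapping a feature subset (tuple) to a number,
               higher is better (e.g. cross-validated accuracy)
    k        : number of features to select
    """
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Add the single feature that most improves the subset score.
        best = max(remaining, key=lambda f: score(tuple(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy score: each feature contributes a fixed weight.
weights = {"a": 3.0, "b": 0.5, "c": 2.0, "d": 0.1}
score = lambda subset: sum(weights[f] for f in subset)
print(sfs(["a", "b", "c", "d"], score, 2))  # ['a', 'c']
```

In practice the score function would retrain and evaluate the SVM on each candidate subset, which is why SFS is run once offline to refine the feature set.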

  13. Automatic Speech Recognition Systems for the Evaluation of Voice and Speech Disorders in Head and Neck Cancer

    Directory of Open Access Journals (Sweden)

    Andreas Maier

    2010-01-01

    Full Text Available In patients suffering from head and neck cancer, speech intelligibility is often restricted. For assessment and outcome measurements, automatic speech recognition systems have previously been shown to be appropriate for objective and quick evaluation of intelligibility. In this study we investigate the applicability of the method to speech disorders caused by head and neck cancer. Intelligibility was quantified by speech recognition on recordings of a standard text read by 41 German laryngectomized patients with cancer of the larynx or hypopharynx and 49 German patients who had suffered from oral cancer. The speech recognition provides the percentage of correctly recognized words of a sequence, that is, the word recognition rate. Automatic evaluation was compared to perceptual ratings by a panel of experts and to an age-matched control group. Both patient groups showed significantly lower word recognition rates than the control group. Automatic speech recognition yielded word recognition rates that agreed with the experts' evaluation of intelligibility at a significant level. Automatic speech recognition thus serves as a low-effort means of objectifying and quantifying the most important aspect of pathologic speech: intelligibility. The system was successfully applied to voice and speech disorders.
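    The word recognition rate is the percentage of reference words correctly recognized. A simplified sketch (multiset matching rather than the word-level alignment used by real ASR evaluation tools):

```python
from collections import Counter

def word_recognition_rate(reference, recognized):
    """Percentage of reference words present in the recognizer output.

    Simplification: words are matched as a multiset, ignoring order;
    aligned ASR scoring would use an edit-distance alignment instead.
    """
    ref = Counter(reference.lower().split())
    hyp = Counter(recognized.lower().split())
    correct = sum(min(n, hyp[w]) for w, n in ref.items())
    total = sum(ref.values())
    return 100.0 * correct / total

print(word_recognition_rate("the north wind and the sun",
                            "the north wind on the sun"))  # ≈ 83.3
```

Here 5 of the 6 reference words are recovered ("and" is missed), so the rate is about 83.3%; lower intelligibility produces more recognition errors and a lower rate.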

  14. Thoracic lymph node station recognition on CT images based on automatic anatomy recognition with an optimal parent strategy

    Science.gov (United States)

    Xu, Guoping; Udupa, Jayaram K.; Tong, Yubing; Cao, Hanqiang; Odhner, Dewey; Torigian, Drew A.; Wu, Xingyu

    2018-03-01

    Currently, many papers have been published on the detection and segmentation of lymph nodes from medical images. However, it is still a challenging problem owing to low contrast with surrounding soft tissues and the variations of lymph node size and shape on computed tomography (CT) images. This is particularly difficult on the low-dose CT of PET/CT acquisitions. In this study, we utilize our previous automatic anatomy recognition (AAR) framework to recognize the thoracic lymph node stations defined by the International Association for the Study of Lung Cancer (IASLC) lymph node map. The lymph node stations themselves are viewed as anatomic objects and are localized by using a one-shot method in the AAR framework. Two strategies are taken in this paper for integration into the AAR framework. The first is to combine some lymph node stations into composite lymph node stations according to their geometrical nearness. The other is to find the optimal parent (organ or union of organs) as an anchor for each lymph node station based on the recognition error, and thereby find an overall optimal hierarchy to arrange anchor organs and lymph node stations. Based on 28 contrast-enhanced thoracic CT image data sets for model building and 12 independent data sets for testing, our results show that thoracic lymph node stations can be localized within 2-3 voxels compared to the ground truth.

  15. Automatic modulation format recognition for the next generation optical communication networks using artificial neural networks

    Science.gov (United States)

    Guesmi, Latifa; Hraghi, Abir; Menif, Mourad

    2015-03-01

    A new technique for Automatic Modulation Format Recognition (AMFR) in next generation optical communication networks is presented. This technique uses an Artificial Neural Network (ANN) in conjunction with features from Linear Optical Sampling (LOS) of the detected signal at high bit rates, using direct or coherent detection. The use of the LOS method for this purpose is mainly driven by the increase in bit rates; LOS enables the measurement of eye diagrams at such rates. The efficiency of this technique is demonstrated under different transmission impairments such as chromatic dispersion (CD) in the range of -500 to 500 ps/nm, differential group delay (DGD) in the range of 0-15 ps and optical signal-to-noise ratio (OSNR) in the range of 10-30 dB. The results of numerical simulation for various modulation formats demonstrate successful recognition at known bit rates with high estimation accuracy, exceeding 99.8%.

  16. Monitoring caustic injuries from emergency department databases using automatic keyword recognition software.

    Science.gov (United States)

    Vignally, P; Fondi, G; Taggi, F; Pitidis, A

    2011-03-31

    In Italy the European Union Injury Database reports the involvement of chemical products in 0.9% of home and leisure accidents. The Emergency Department registry on domestic accidents in Italy and the Poison Control Centres record that 90% of cases of exposure to toxic substances occur in the home. It is not rare for the effects of chemical agents to be observed in hospitals, with a high potential risk of damage - the rate of this cause of hospital admission is double the domestic injury average. The aim of this study was to monitor the effects of injuries caused by caustic agents in Italy using automatic free-text recognition in Emergency Department medical databases. We created a Stata software program to automatically identify caustic or corrosive injury cases using an agent-specific list of keywords. We focused attention on the procedure's sensitivity and specificity. Ten hospitals in six regions of Italy participated in the study. The program identified 112 cases of injury by caustic or corrosive agents. Checking the cases by quality controls (based on manual reading of ED reports), we assessed 99 cases as true positive, i.e. 88.4% of the patients were automatically recognized by the software as being affected by caustic substances (99% CI: 80.6%-96.2%), that is to say 0.59% (99% CI: 0.45%-0.76%) of the whole sample of home injuries, a value almost three times as high as that expected (p < 0.0001) from European codified information. False positives were 11.6% of the recognized cases (99% CI: 5.1%-21.5%). Our automatic procedure for caustic agent identification proved to have excellent product recognition capacity with an acceptable level of excess sensitivity. Contrary to our a priori hypothesis, the automatic recognition system provided a level of identification of agents possessing caustic effects that was significantly greater than was predictable on the basis of the values from current codifications reported in the European Database.
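    The keyword-based flagging and its sensitivity/specificity evaluation can be sketched as follows (the keyword list and toy reports are illustrative, not those of the study):

```python
import re

# Illustrative agent-specific keyword list (not the study's actual list).
CAUSTIC_KEYWORDS = ["caustic", "corrosive", "lye", "soda", "bleach"]

def flag_report(text, keywords=CAUSTIC_KEYWORDS):
    """True if any agent keyword occurs as a whole word in the free text."""
    pattern = r"\b(" + "|".join(map(re.escape, keywords)) + r")\b"
    return re.search(pattern, text.lower()) is not None

def sensitivity_specificity(reports, labels, keywords=CAUSTIC_KEYWORDS):
    """labels[i] is True when report i is a genuine caustic-injury case."""
    flags = [flag_report(t, keywords) for t in reports]
    tp = sum(f and l for f, l in zip(flags, labels))
    fn = sum((not f) and l for f, l in zip(flags, labels))
    tn = sum((not f) and (not l) for f, l in zip(flags, labels))
    fp = sum(f and (not l) for f, l in zip(flags, labels))
    return tp / (tp + fn), tn / (tn + fp)
```

As in the study, false positives (e.g. a report merely mentioning a product without exposure) reduce specificity, which is why the flagged cases were checked against manual reading of the ED reports.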

  17. Multimodal Approach for Automatic Emotion Recognition Applied to the Tension Levels Study in TV Newscasts

    Directory of Open Access Journals (Sweden)

    Moisés Henrique Ramos Pereira

    2015-12-01

    Full Text Available This article addresses a multimodal approach to automatic emotion recognition in participants of TV newscasts (presenters, reporters, commentators and others), which can assist the study of tension levels in narratives of events in this television genre. The methodology applies state-of-the-art computational methods to process and analyze facial expressions as well as speech signals. The proposed approach contributes to the semiodiscoursive study of TV newscasts and their enunciative praxis, assisting, for example, the identification of the communication strategy of these programs. To evaluate its effectiveness, the proposed approach was applied to a video of a report broadcast on a Brazilian TV newscast of great popularity in the state of Minas Gerais. The experimental results on the recognition of emotions in the facial expressions of telejournalists are promising and are in accordance with the distribution of audiovisual indicators extracted over the TV newscast, demonstrating the potential of the approach to support TV journalistic discourse analysis.

  18. Automatic recognition of conceptualization zones in scientific articles and two life science applications.

    Science.gov (United States)

    Liakata, Maria; Saha, Shyamasree; Dobnik, Simon; Batchelor, Colin; Rebholz-Schuhmann, Dietrich

    2012-04-01

    Scholarly biomedical publications report on the findings of a research investigation. Scientists use a well-established discourse structure to relate their work to the state of the art, express their own motivation and hypotheses and report on their methods, results and conclusions. In previous work, we have proposed ways to explicitly annotate the structure of scientific investigations in scholarly publications. Here we present the means to facilitate automatic access to the scientific discourse of articles by automating the recognition of 11 categories at the sentence level, which we call Core Scientific Concepts (CoreSCs). These include: Hypothesis, Motivation, Goal, Object, Background, Method, Experiment, Model, Observation, Result and Conclusion. CoreSCs provide the structure and context to all statements and relations within an article and their automatic recognition can greatly facilitate biomedical information extraction by characterizing the different types of facts, hypotheses and evidence available in a scientific publication. We have trained and compared machine learning classifiers (support vector machines and conditional random fields) on a corpus of 265 full articles in biochemistry and chemistry to automatically recognize CoreSCs. We have evaluated our automatic classifications against a manually annotated gold standard, and have achieved promising accuracies with 'Experiment', 'Background' and 'Model' being the categories with the highest F1-scores (76%, 62% and 53%, respectively). We have analysed the task of CoreSC annotation both from a sentence classification as well as sequence labelling perspective and we present a detailed feature evaluation. The most discriminative features are local sentence features such as unigrams, bigrams and grammatical dependencies while features encoding the document structure, such as section headings, also play an important role for some of the categories. We discuss the usefulness of automatically generated Core
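    The unigram/bigram features reported as most discriminative can be extracted with a few lines of code (a generic sketch, not the authors' feature pipeline):

```python
def ngram_features(sentence, n_values=(1, 2)):
    """Unigram/bigram count features for sentence classification."""
    tokens = sentence.lower().split()
    feats = {}
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            key = " ".join(tokens[i:i + n])
            feats[key] = feats.get(key, 0) + 1
    return feats

print(ngram_features("we report the result"))
# {'we': 1, 'report': 1, 'the': 1, 'result': 1,
#  'we report': 1, 'report the': 1, 'the result': 1}
```

Each sentence's feature dictionary would then be vectorized and passed to the SVM or CRF classifier, possibly alongside document-structure features such as the enclosing section heading.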

  19. A Joint Approach for Single-Channel Speaker Identification and Speech Separation

    DEFF Research Database (Denmark)

    Mowlaee, Pejman; Saeidi, Rahim; Christensen, Mads Græsbøll

    2012-01-01

    ) accuracy, here, we report the objective and subjective results as well. The results show that the proposed system performs as well as the best of the state-of-the-art in terms of perceived quality while its performance in terms of speaker identification and automatic speech recognition results......In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification a single-channel speaker identification algorithm is proposed which provides an estimate of signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose...... a sinusoidal model-based algorithm. The speech separation algorithm consists of a double-talk/single-talk detector followed by a minimum mean square error estimator of sinusoidal parameters for finding optimal codevectors from pre-trained speaker codebooks. In evaluating the proposed system, we start from...

  20. Neural-network classifiers for automatic real-world aerial image recognition

    Science.gov (United States)

    Greenberg, Shlomo; Guterman, Hugo

    1996-08-01

    We describe the application of the multilayer perceptron (MLP) network and a version of the adaptive resonance theory version 2-A (ART 2-A) network to the problem of automatic aerial image recognition (AAIR). The classification of aerial images, independent of their positions and orientations, is required for automatic tracking and target recognition. Invariance is achieved by the use of different invariant feature spaces in combination with supervised and unsupervised neural networks. The performance of neural-network-based classifiers in conjunction with several types of invariant AAIR global features, such as the Fourier-transform space, Zernike moments, central moments, and polar transforms, are examined. The advantages of this approach are discussed. The performance of the MLP network is compared with that of a classical correlator. The MLP neural-network correlator outperformed the binary phase-only filter (BPOF) correlator. It was found that the ART 2-A distinguished itself with its speed and its low number of required training vectors. However, only the MLP classifier was able to deal with a combination of shift and rotation geometric distortions.

  1. A pattern recognition approach based on DTW for automatic transient identification in nuclear power plants

    International Nuclear Information System (INIS)

    Galbally, Javier; Galbally, David

    2015-01-01

    Highlights: • Novel transient identification method for NPPs. • Low-complexity. • Low training data requirements. • High accuracy. • Fully reproducible protocol carried out on a real benchmark. - Abstract: Automatic identification of transients in nuclear power plants (NPPs) allows monitoring the fatigue damage accumulated by critical components during plant operation, and is therefore of great importance for ensuring that usage factors remain within the original design bases postulated by the plant designer. Although several schemes to address this important issue have been explored in the literature, there is still no definitive solution available. In the present work, a new method for automatic transient identification is proposed, based on the Dynamic Time Warping (DTW) algorithm, largely used in other related areas such as signature or speech recognition. The novel transient identification system is evaluated on real operational data following a rigorous pattern recognition protocol. Results show the high accuracy of the proposed approach, which is combined with other interesting features such as its low complexity and its very limited requirements of training data
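    The core of the DTW algorithm referenced above, together with nearest-template classification, can be sketched as follows (illustrative; the paper's cost function and transient signatures are not reproduced here):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of insertion, deletion, or match from the predecessors.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify(query, templates):
    """Nearest-template transient identification by DTW distance."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))
```

Because DTW warps the time axis, a transient recorded at a different speed still matches its template closely, which is why so little training data is needed: one representative signal per transient class can suffice.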

  2. A Knowledge Base for Automatic Feature Recognition from Point Clouds in an Urban Scene

    Directory of Open Access Journals (Sweden)

    Xu-Feng Xing

    2018-01-01

    Full Text Available LiDAR technology can provide very detailed and highly accurate geospatial information on an urban scene for the creation of Virtual Geographic Environments (VGEs) for different applications. However, automatic 3D modeling and feature recognition from LiDAR point clouds are very complex tasks. This becomes even more complex when the data are incomplete (occlusion problem) or uncertain. In this paper, we propose to build a knowledge base comprising an ontology and semantic rules aimed at automatic feature recognition from point clouds in support of 3D modeling. First, several ontology modules are defined from different perspectives to describe an urban scene. For instance, the spatial relations module allows the formalized representation of possible topological relations extracted from point clouds. Next, a knowledge base is proposed that contains different concepts, their properties and their relations, together with constraints and semantic rules. Instances and their specific relations then form an urban scene and are added to the knowledge base as facts. Based on the knowledge and semantic rules, a reasoning process is carried out to extract semantic features of the objects and their components in the urban scene. Finally, several experiments are presented to show the validity of our approach in recognizing different semantic features of buildings from LiDAR point clouds.
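The fact-plus-rule reasoning loop described above can be sketched as simple forward chaining. All predicates and rules below are toy stand-ins of our own invention, not the paper's ontology:

```python
# Facts are (concept, instance) pairs extracted from the point cloud.
facts = {("PlanarPatch", "p1"), ("Vertical", "p1"), ("AdjacentToGround", "p1"),
         ("PlanarPatch", "p2"), ("Horizontal", "p2"), ("Elevated", "p2")}

# Semantic rules: if all premises hold for an instance, assert the conclusion.
rules = [
    ({"PlanarPatch", "Vertical", "AdjacentToGround"}, "Wall"),
    ({"PlanarPatch", "Horizontal", "Elevated"}, "Roof"),
]

def forward_chain(facts, rules):
    """Apply rules repeatedly until no new fact can be derived (fixed point)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        instances = {inst for _, inst in derived}
        for premises, conclusion in rules:
            for inst in instances:
                if all((p, inst) in derived for p in premises) \
                        and (conclusion, inst) not in derived:
                    derived.add((conclusion, inst))
                    changed = True
    return derived

kb = forward_chain(facts, rules)
```

Real systems express this with OWL ontologies and SWRL-style rules rather than Python sets, but the derivation of semantic labels from asserted facts follows the same pattern.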

  3. Automatic recognition of cardiac arrhythmias based on the geometric patterns of Poincaré plots

    International Nuclear Information System (INIS)

    Zhang, Lijuan; Guo, Tianci; Xi, Bin; Fan, Yang; Wang, Kun; Bi, Jiacheng; Wang, Ying

    2015-01-01

    The Poincaré plot has emerged as an effective tool for assessing cardiovascular autonomic regulation. It displays nonlinear characteristics of heart rate variability (HRV) from electrocardiographic (ECG) recordings and gives a global view of long-range ECG signals. In a telemedicine or computer-aided diagnosis system, automatic classification of Poincaré plot patterns would offer significant auxiliary information for diagnosis. We therefore developed an automatic classification system to distinguish five geometric patterns of Poincaré plots arising from four types of cardiac arrhythmias. Statistical features are designed on the plot measurements, and an ensemble classifier combining three types of neural networks is proposed. To address the difficulty of setting a proper threshold when classifying multiple categories, the threshold selection strategy is analyzed. 24 h ECG monitoring recordings from 674 patients with four types of cardiac arrhythmias are adopted for recognition. For comparison, Support Vector Machine (SVM) classifiers with linear and Gaussian kernels are also applied. The experimental results demonstrate the effectiveness of the extracted features and the better performance of the designed classifier. Our study can be applied to diagnose the corresponding sinus rhythm and arrhythmia substrate diseases automatically in telemedicine and computer-aided diagnosis systems. (paper)
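The geometry of a Poincaré plot is conventionally summarized by the SD1/SD2 descriptors, computed from successive RR-interval pairs. The sketch below is illustrative (the RR series are synthetic, and the paper's own feature set may differ):

```python
import math
import statistics

def poincare_sd(rr):
    """SD1/SD2 of the (RR_n, RR_n+1) scatter.

    SD1: spread perpendicular to the identity line (short-term variability).
    SD2: spread along the identity line (long-term variability).
    """
    x, y = rr[:-1], rr[1:]
    diff = [(a - b) / math.sqrt(2) for a, b in zip(x, y)]
    summ = [(a + b) / math.sqrt(2) for a, b in zip(x, y)]
    return statistics.pstdev(diff), statistics.pstdev(summ)

# A metronomic rhythm collapses to a point (SD1 = SD2 = 0), while an
# alternating long-short rhythm spreads across the identity line (large SD1).
sd1_const, sd2_const = poincare_sd([800] * 10)          # constant RR (ms)
sd1_alt, sd2_alt = poincare_sd([700, 900] * 5)          # alternans-like RR
```

Different arrhythmias produce characteristically different (SD1, SD2) combinations and plot shapes, which is what a pattern classifier can exploit.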

  4. Automatic recognition of falls in gait-slip training: Harness load cell based criteria.

    Science.gov (United States)

    Yang, Feng; Pai, Yi-Chung

    2011-08-11

    Over-head-harness systems, equipped with load cell sensors, are essential to the participants' safety and to the outcome assessment in perturbation training. The purpose of this study was first to develop an automatic outcome recognition criterion among young adults for gait-slip training and then to verify this criterion among older adults. Each of 39 young and 71 older subjects, all protected by a safety harness, experienced 8 unannounced, repeated slips while walking on a 7 m walkway. Each trial was monitored with a motion capture system, bilateral ground reaction force (GRF), harness force, and video recording. The fall trials were first unambiguously identified through careful visual inspection of all video records. The recoveries without balance loss (in which subjects' trailing foot landed anteriorly to the slipping foot) were likewise fully recognized from motion and GRF analyses. These analyses then set the gold standard for outcome recognition with load cell measurements. Logistic regression analyses based on the young subjects' data revealed that the peak load cell force was the best predictor of falls (with 100% accuracy) at a threshold of 30% body weight. The peak moving-average load cell force over a 1 s window, in turn, was the best predictor (with 100% accuracy) separating recoveries with backward balance loss (in which the recovery step landed posterior to the slipping foot) from harness assistance at a threshold of 4.5% body weight. These threshold values were fully verified using the data from older adults (100% accuracy in recognizing falls). Given the increasing popularity of perturbation training coupled with protective over-head-harness systems, this new criterion could have far-reaching implications for automatic outcome recognition during movement therapy. Copyright © 2011 Elsevier Ltd. All rights reserved.
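The two thresholds reported in the abstract (30% body weight on peak force for falls; 4.5% body weight on the peak 1 s moving average for harness assistance) can be expressed as a small decision rule. The force traces, body weight, and helper names below are illustrative, not the study's data:

```python
def moving_average_peak(force, window):
    """Largest mean of any `window`-sample run (window = 1 s at the sampling rate)."""
    return max(sum(force[i:i + window]) / window
               for i in range(len(force) - window + 1))

def classify_trial(force, body_weight, window):
    """Apply the abstract's two load-cell thresholds in order."""
    if max(force) > 0.30 * body_weight:                      # rule 1: fall
        return "fall"
    if moving_average_peak(force, window) > 0.045 * body_weight:
        return "harness-assisted recovery"                   # rule 2
    return "recovery"

bw = 700.0                                    # N, illustrative subject weight
fall_trace = [0, 5, 40, 260, 300, 120, 10]    # peak 300 N > 30% of 700 N
assisted   = [0, 30, 40, 45, 40, 35, 30]      # sustained ~5% body weight
clean      = [0, 5, 8, 6, 4, 2, 0]            # negligible harness load

fall_label = classify_trial(fall_trace, bw, window=4)
assisted_label = classify_trial(assisted, bw, window=4)
clean_label = classify_trial(clean, bw, window=4)
```

Ordering matters: the fall rule is checked first, so the moving-average rule only separates the remaining non-fall trials.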

  5. AUTOMATIC RECOGNITION OF FALLS IN GAIT-SLIP: A HARNESS LOAD CELL BASED CRITERION

    Science.gov (United States)

    Yang, Feng; Pai, Yi-Chung

    2012-01-01

    Over-head-harness systems, equipped with load cell sensors, are essential to the participants’ safety and to the outcome assessment in perturbation training. The purpose of this study was first to develop an automatic outcome recognition criterion among young adults for gait-slip training and then to verify this criterion among older adults. Each of 39 young and 71 older subjects, all protected by a safety harness, experienced 8 unannounced, repeated slips while walking on a 7-m walkway. Each trial was monitored with a motion capture system, bilateral ground reaction force (GRF), harness force and video recording. The fall trials were first unambiguously identified through careful visual inspection of all video records. The recoveries without balance loss (in which subjects’ trailing foot landed anteriorly to the slipping foot) were likewise fully recognized from motion and GRF analyses. These analyses then set the gold standard for outcome recognition with load cell measurements. Logistic regression analyses based on the young subjects’ data revealed that the peak load cell force was the best predictor of falls (with 100% accuracy) at a threshold of 30% body weight. The peak moving-average load cell force over a 1-s window, in turn, was the best predictor (with 100% accuracy) separating recoveries with backward balance loss (in which the recovery step landed posterior to the slipping foot) from harness assistance at a threshold of 4.5% body weight. These threshold values were fully verified using the data from older adults (100% accuracy in recognizing falls). Given the increasing popularity of perturbation training coupled with protective over-head-harness systems, this new criterion could have far-reaching implications for automatic outcome recognition during movement therapy. PMID:21696744

  6. Morphological characterization of Mycobacterium tuberculosis in a MODS culture for an automatic diagnostics through pattern recognition.

    Directory of Open Access Journals (Sweden)

    Alicia Alva

    Full Text Available Tuberculosis control efforts are hampered by a mismatch in diagnostic technology: modern optimal diagnostic tests are least available in the poor areas where they are needed most. Lack of adequate early diagnostics and MDR detection is a critical problem in control efforts. The Microscopic Observation Drug Susceptibility (MODS) assay uses visual recognition of cording patterns of Mycobacterium tuberculosis (MTB) to diagnose tuberculosis infection and drug susceptibility directly from a sputum sample in 7-10 days at low cost. An important limitation that laboratories in the developing world face in MODS implementation is the need for permanent technical staff with expertise in reading MODS. We developed a pattern recognition algorithm to automatically interpret MODS results from digital images. Using image processing, feature extraction and pattern recognition, the algorithm determines geometrical and illumination features that are used in an object model and a photo model to classify TB-positive images. 765 MODS digital photos were processed. The single-object model identified MTB (96.9% sensitivity and 96.3% specificity) and was able to discriminate non-tuberculous mycobacteria with high specificity (97.1% M. avium, 99.1% M. chelonae, and 93.8% M. kansasii). The photo model identified TB-positive samples with 99.1% sensitivity and 99.7% specificity. This algorithm is a valuable tool that will enable automatic remote diagnosis over the Internet or cellular telephony. Its further implementation in a telediagnostics platform will contribute to both faster TB detection and MDR TB determination, leading to earlier initiation of appropriate treatment.

  7. Automatic anatomy recognition in whole-body PET/CT images

    International Nuclear Information System (INIS)

    Wang, Huiqian; Udupa, Jayaram K.; Odhner, Dewey; Tong, Yubing; Torigian, Drew A.; Zhao, Liming

    2016-01-01

    Purpose: Whole-body positron emission tomography/computed tomography (PET/CT) has become a standard method of imaging patients with various disease conditions, especially cancer. Body-wide accurate quantification of disease burden in PET/CT images is important for characterizing lesions, staging disease, prognosticating patient outcome, planning treatment, and evaluating disease response to therapeutic interventions. Body-wide anatomy recognition in PET/CT is a critical first step for accurately and automatically quantifying disease body-wide, body-region-wise, and organwise. This process, however, has remained a challenge due to the lower quality of the anatomic information portrayed in the CT component of this imaging modality and the paucity of anatomic details in the PET component. In this paper, the authors demonstrate the adaptation of a recently developed automatic anatomy recognition (AAR) methodology [Udupa et al., “Body-wide hierarchical fuzzy modeling, recognition, and delineation of anatomy in medical images,” Med. Image Anal. 18, 752–771 (2014)] to PET/CT images. Their goal was to test what level of object localization accuracy can be achieved on PET/CT compared to that achieved on diagnostic CT images. Methods: The authors advance the AAR approach in this work on three fronts: (i) from body-region-wise treatment in the work of Udupa et al. to the whole body; (ii) from the use of image intensity in optimal object recognition in the work of Udupa et al. to intensity plus object-specific texture properties; and (iii) from an intramodality model-building-recognition strategy to an intermodality approach. The whole-body approach allows consideration of relationships among objects in different body regions, which was previously not possible. Consideration of object texture allows generalizing the previous optimal threshold-based fuzzy model recognition method from intensity images to any derived fuzzy membership image, and in the process

  8. Automatic anatomy recognition in whole-body PET/CT images

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Huiqian [College of Optoelectronic Engineering, Chongqing University, Chongqing 400044, China and Medical Image Processing Group Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States); Udupa, Jayaram K., E-mail: jay@mail.med.upenn.edu; Odhner, Dewey; Tong, Yubing; Torigian, Drew A. [Medical Image Processing Group Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States); Zhao, Liming [Medical Image Processing Group Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 and Research Center of Intelligent System and Robotics, Chongqing University of Posts and Telecommunications, Chongqing 400065 (China)

    2016-01-15

    Purpose: Whole-body positron emission tomography/computed tomography (PET/CT) has become a standard method of imaging patients with various disease conditions, especially cancer. Body-wide accurate quantification of disease burden in PET/CT images is important for characterizing lesions, staging disease, prognosticating patient outcome, planning treatment, and evaluating disease response to therapeutic interventions. Body-wide anatomy recognition in PET/CT is a critical first step for accurately and automatically quantifying disease body-wide, body-region-wise, and organwise. This process, however, has remained a challenge due to the lower quality of the anatomic information portrayed in the CT component of this imaging modality and the paucity of anatomic details in the PET component. In this paper, the authors demonstrate the adaptation of a recently developed automatic anatomy recognition (AAR) methodology [Udupa et al., “Body-wide hierarchical fuzzy modeling, recognition, and delineation of anatomy in medical images,” Med. Image Anal. 18, 752–771 (2014)] to PET/CT images. Their goal was to test what level of object localization accuracy can be achieved on PET/CT compared to that achieved on diagnostic CT images. Methods: The authors advance the AAR approach in this work on three fronts: (i) from body-region-wise treatment in the work of Udupa et al. to the whole body; (ii) from the use of image intensity in optimal object recognition in the work of Udupa et al. to intensity plus object-specific texture properties; and (iii) from an intramodality model-building-recognition strategy to an intermodality approach. The whole-body approach allows consideration of relationships among objects in different body regions, which was previously not possible. Consideration of object texture allows generalizing the previous optimal threshold-based fuzzy model recognition method from intensity images to any derived fuzzy membership image, and in the process

  9. Automatic Human Facial Expression Recognition Based on Integrated Classifier From Monocular Video with Uncalibrated Camera

    Directory of Open Access Journals (Sweden)

    Yu Tao

    2017-01-01

    Full Text Available An automatic recognition framework for human facial expressions from a monocular video with an uncalibrated camera is proposed. The expression characteristics are first acquired from a deformable template resembling the facial muscle distribution. After regularization, the time sequences of the trait changes in space-time over a complete expression production are arranged line by line in a matrix. Next, the matrix dimensionality is reduced by neighborhood-preserving embedding, a manifold learning method. Finally, the refined matrix containing the expression trait information is recognized by a classifier that integrates a hidden conditional random field (HCRF) and a support vector machine (SVM). In an experiment on the Cohn–Kanade database, the proposed method showed a higher recognition rate than the individual HCRF or SVM methods in direct recognition from two-dimensional face traits. Moreover, the proposed method was more robust than the typical Kotsia method because it retains more of the structural characteristics of the data to be classified in space-time.

  10. Robust Recognition of Loud and Lombard speech in the Fighter Cockpit Environment

    Science.gov (United States)

    1988-08-01

    the latter as inter-speaker variability. According to Zue [Z85], inter-speaker variabilities can be attributed to sociolinguistic background, dialect... Journal of the Acoustical Society of America, Vol. 50, 1971. [At74] B. S. Atal, "Linear prediction for speaker identification," Journal of the Acoustical Society of America, Vol. 55, 1974. [B77] B. Beek, E. P. Neuberg, and D. C. Hodge, "An Assessment of the Technology of Automatic Speech Recognition for

  11. Automatic Smoker Detection from Telephone Speech Signals

    DEFF Research Database (Denmark)

    Poorjam, Amir Hossein; Hesaraki, Soheila; Safavi, Saeid

    2017-01-01

    This paper proposes an automatic smoking habit detection from spontaneous telephone speech signals. In this method, each utterance is modeled using i-vector and non-negative factor analysis (NFA) frameworks, which yield low-dimensional representation of utterances by applying factor analysis...... method is evaluated on telephone speech signals of speakers whose smoking habits are known drawn from the National Institute of Standards and Technology (NIST) 2008 and 2010 Speaker Recognition Evaluation databases. Experimental results over 1194 utterances show the effectiveness of the proposed approach...... for the automatic smoking habit detection task....

  12. Development of Portable Automatic Number Plate Recognition System on Android Mobile Phone

    Science.gov (United States)

    Mutholib, Abdul; Gunawan, Teddy S.; Chebil, Jalel; Kartiwi, Mira

    2013-12-01

    The Automatic Number Plate Recognition (ANPR) system plays a central role in various access control and security applications, such as tracking of stolen vehicles, traffic violation enforcement (speed traps) and parking management. In this paper, a portable ANPR implemented on an Android mobile phone is presented. The main challenges in a mobile application include higher coding efficiency, reduced computational complexity, and improved flexibility. Significant efforts are being made to find a suitable and adaptive algorithm for implementing ANPR on a mobile phone. An ANPR system for a mobile phone needs to be optimized for its limited CPU and memory resources, its ability to geo-tag captured images using GPS coordinates, and its ability to access an online database to store vehicle information. The design of the portable ANPR on an Android mobile phone is described as follows. First, a graphical user interface (GUI) for capturing images with the built-in camera was developed to acquire Malaysian vehicle plate numbers. Second, the raw image was preprocessed using contrast enhancement. Next, character segmentation using a fixed pitch and optical character recognition (OCR) using a neural network were applied to extract the text and numbers; both used the Tesseract library from Google Inc. The proposed portable ANPR algorithm was implemented and simulated using the Android SDK on a computer. Based on the experimental results, the proposed system can effectively recognize license plate numbers with 90.86% accuracy, and the required processing time to recognize a license plate is only 2 seconds on average. This result compares well with previous desktop-PC systems, which achieved recognition rates from 91.59% to 98% and recognition times from 0.284 to 1.5 seconds.
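The fixed-pitch segmentation step mentioned above assumes all characters on a plate occupy equally wide cells, so the binarized plate strip can be cut at regular column intervals before OCR. A minimal sketch (plate width and character count are illustrative):

```python
def fixed_pitch_segments(width, n_chars):
    """Return (start, end) column ranges for n equally wide character cells."""
    pitch = width / n_chars
    return [(round(i * pitch), round((i + 1) * pitch)) for i in range(n_chars)]

# A 140-pixel-wide plate strip holding 7 characters -> 20-pixel cells.
cells = fixed_pitch_segments(width=140, n_chars=7)
```

Each cell would then be cropped and passed to the OCR engine; fixed-pitch cutting avoids connected-component analysis but relies on the plate having been rectified and tightly cropped first.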

  13. Automatic Speech Recognition Systems for the Evaluation of Voice and Speech Disorders in Head and Neck Cancer

    OpenAIRE

    Andreas Maier; Tino Haderlein; Florian Stelzle; Elmar Nöth; Emeka Nkenke; Frank Rosanowski; Anne Schützenberger; Maria Schuster

    2010-01-01

    In patients suffering from head and neck cancer, speech intelligibility is often restricted. For assessment and outcome measurements, automatic speech recognition systems have previously been shown to be appropriate for objective and quick evaluation of intelligibility. In this study we investigate the applicability of the method to speech disorders caused by head and neck cancer. Intelligibility was quantified by speech recognition on recordings of a standard text read by 41 German laryngect...

  14. Semi-automatic parking slot marking recognition for intelligent parking assist systems

    Directory of Open Access Journals (Sweden)

    Ho Gi Jung

    2014-01-01

    Full Text Available This paper proposes a semi-automatic parking slot marking-based target position designation method for parking assist systems in cases where the parking slot markings are of a rectangular type, and its efficient implementation for real-time operation. After the driver observes a rearview image captured by a rearward camera installed at the rear of the vehicle through a touchscreen-based human machine interface, a target parking position is designated by touching the inside of a parking slot. To ensure the proposed method operates in real-time in an embedded environment, access of the bird's-eye view image is made efficient: image-wise batch transformation is replaced with pixel-wise instantaneous transformation. The proposed method showed a 95.5% recognition rate in 378 test cases with 63 test images. Additionally, experiments confirmed that the pixel-wise instantaneous transformation reduced execution time by 92%.
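The pixel-wise instantaneous transformation above means that, instead of warping the whole rear-view image into a bird's-eye view up front, each pixel of interest is mapped on demand through a plane-to-plane homography. The 3×3 matrix below is an illustrative placeholder, not a calibrated camera model:

```python
def apply_homography(H, x, y):
    """Map image point (x, y) through 3x3 matrix H in homogeneous coordinates."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w                      # divide out the projective scale

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# A toy projective H whose bottom row depends on y, mimicking how a
# ground-plane rectification compresses distant rows of the image.
H_TOY = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.002, 1.0]]

px, py = apply_homography(H_TOY, 10, 20)
```

Transforming only the touched pixels (e.g. the driver's designated point and the local slot-marking search region) is what makes the 92% reduction in execution time plausible on embedded hardware.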

  15. Model-based vision system for automatic recognition of structures in dental radiographs

    Science.gov (United States)

    Acharya, Raj S.; Samarabandu, Jagath K.; Hausmann, E.; Allen, K. A.

    1991-07-01

    X-ray diagnosis of destructive periodontal disease requires assessing serial radiographs by an expert to determine the change in the distance between cemento-enamel junction (CEJ) and the bone crest. To achieve this without the subjectivity of a human expert, a knowledge based system is proposed to automatically locate the two landmarks which are the CEJ and the level of alveolar crest at its junction with the periodontal ligament space. This work is a part of an ongoing project to automatically measure the distance between CEJ and the bone crest along a line parallel to the axis of the tooth. The approach presented in this paper is based on identifying a prominent feature such as the tooth boundary using local edge detection and edge thresholding to establish a reference and then using model knowledge to process sub-regions in locating the landmarks. Segmentation techniques invoked around these regions consists of a neural-network like hierarchical refinement scheme together with local gradient extraction, multilevel thresholding and ridge tracking. Recognition accuracy is further improved by first locating the easily identifiable parts of the bone surface and the interface between the enamel and the dentine and then extending these boundaries towards the periodontal ligament space and the tooth boundary respectively. The system is realized as a collection of tools (or knowledge sources) for pre-processing, segmentation, primary and secondary feature detection and a control structure based on the blackboard model to coordinate the activities of these tools.

  16. Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients

    Directory of Open Access Journals (Sweden)

    TjongWan Sen

    2009-11-01

    Full Text Available To improve the performance of phoneme-based Automatic Speech Recognition (ASR) in noisy environments, we developed a new technique that adds robustness to clean phoneme features. These robust features are obtained from Complex Wavelet Packet Transform (CWPT) coefficients. Since the CWPT coefficients represent all the frequency bands of the input signal, decomposing the input signal into a complete CWPT tree covers all frequencies involved in the recognition process. For time-overlapping signals with different frequency contents, e.g. a phoneme signal with noise, the CWPT coefficients are the combination of the CWPT coefficients of the phoneme signal and those of the noise, and the phoneme coefficients change according to the frequency components contained in the noise. Since the number of phonemes in any language is relatively small and well known, one can easily derive principal component vectors from a clean training dataset using Principal Component Analysis (PCA). These principal component vectors can then be used to add robustness and minimize noise effects in the testing phase. Simulation results, using Alpha Numeric 4 (AN4) from Carnegie Mellon University and NOISEX-92 examples from Rice University, showed that this new technique can serve as a feature extractor that improves the robustness of phoneme-based ASR systems in various adverse noisy conditions while preserving performance in clean environments.
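The robustness step amounts to projecting a noisy feature vector onto the principal subspace learned from clean phonemes, discarding the off-subspace noise component. In the sketch below the orthonormal basis is hand-picked for illustration rather than learned by PCA, and the vectors stand in for CWPT-derived features:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(vec, basis):
    """Orthogonal projection of `vec` onto span(basis); basis is orthonormal."""
    out = [0.0] * len(vec)
    for b in basis:
        c = dot(vec, b)                 # coordinate along this principal axis
        for i in range(len(out)):
            out[i] += c * b[i]
    return out

basis = [[1.0, 0.0, 0.0, 0.0],          # two orthonormal principal directions
         [0.0, 1.0, 0.0, 0.0]]
clean = [3.0, -1.0, 0.0, 0.0]           # lies entirely in the principal subspace
noisy = [3.0, -1.0, 0.4, -0.2]          # same phoneme plus off-subspace noise
```

Projecting `noisy` recovers `clean` exactly here because the added noise is orthogonal to the subspace; real noise is only partly removed, which is why the method improves rather than perfects recognition in adverse conditions.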

  17. A rapid automatic analyzer and its methodology for effective bentonite content based on image recognition technology

    Directory of Open Access Journals (Sweden)

    Wei Long

    2016-09-01

    Full Text Available Fast and accurate determination of the effective bentonite content in used clay-bonded sand is very important for selecting the correct mixing ratio and mixing process to obtain high-performance molding sand. Currently, the effective bentonite content is determined by testing the methylene blue absorbed in used clay-bonded sand, which is usually a manual operation with several disadvantages, including a complicated process, long testing time and low accuracy. A rapid automatic analyzer of the effective bentonite content in used clay-bonded sand was developed based on image recognition technology. The instrument consists of auto-stirring, auto liquid removal, auto titration, step-rotation and image acquisition components, and a processor. The image recognition method first decomposes the color image into three single-channel gray images, exploiting the difference in sensitivity to light blue and dark blue across the red, green and blue channels; it then performs gray-value subtraction and gray-level transformation on the gray images, and finally extracts the light blue halo of the outer circle and the blue spot of the inner circle and calculates their area ratio. The titration process is judged to have reached the end-point when the area ratio exceeds the set value.
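The end-point rule reduces to classifying pixels as halo or spot from their channel values and thresholding the resulting area ratio. The pixel values, channel thresholds, and set value below are made up for illustration, not taken from the instrument:

```python
def area_ratio(pixels, set_value):
    """pixels: iterable of (r, g, b) tuples. Returns (ratio, end_point_reached)."""
    light = dark = 0
    for r, g, b in pixels:
        if b > 150 and r > 100:      # pale blue halo: blue high, red also high
            light += 1
        elif b > 150:                # deep blue spot: blue high, red low
            dark += 1
    ratio = light / dark if dark else float("inf")
    return ratio, ratio > set_value

halo = [(180, 190, 230)] * 9         # nine light-blue halo pixels
spot = [(40, 60, 200)] * 3           # three dark-blue spot pixels
ratio, done = area_ratio(halo + spot, set_value=2.0)
```

As titration proceeds the halo grows relative to the spot, so the ratio rises monotonically toward the end-point threshold.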

  18. Robust Automatic Target Recognition via HRRP Sequence Based on Scatterer Matching

    Directory of Open Access Journals (Sweden)

    Yuan Jiang

    2018-02-01

    Full Text Available High resolution range profile (HRRP) plays an important role in wideband radar automatic target recognition (ATR). To alleviate sensitivity to clutter and target aspect, employing a sequence of HRRPs is a promising approach to enhancing ATR performance. In this paper, a novel HRRP sequence-matching method based on singular value decomposition (SVD) is proposed. First, the HRRP sequence is decoupled into the angle space and the range space via SVD, which correspond to the spans of the left and right singular vectors, respectively. Second, atomic norm minimization (ANM) is utilized to estimate dominant scatterers in the range space, and the Hausdorff distance is employed to measure the scatterer similarity between the test and training data. Next, the angle space similarity between the test and training data is evaluated based on the correlations of the left singular vectors. Finally, the range space matching result and the angle space correlation are fused, with the singular values as weights. Simulation and field experimental results demonstrate that the proposed matching metric is a robust similarity measure for HRRP sequence recognition.
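Of the stages above, the scatterer-matching step is easy to sketch: the Hausdorff distance between two sets of estimated scatterer range positions. Scatterer estimation itself (ANM) and the SVD decoupling are outside this sketch, and the range values are illustrative:

```python
def hausdorff(A, B):
    """Symmetric Hausdorff distance: worst-case nearest-neighbour gap, both ways."""
    def directed(P, Q):
        return max(min(abs(p - q) for q in Q) for p in P)
    return max(directed(A, B), directed(B, A))

test_scatterers  = [3.0, 7.5, 12.0]      # estimated scatterer ranges (m)
train_same_class = [3.1, 7.4, 12.2]      # small per-scatterer perturbations
train_other      = [1.0, 20.0]           # a differently structured target
```

Because the Hausdorff distance penalizes the single worst mismatch, it stays small under small per-scatterer jitter but grows sharply when a dominant scatterer has no counterpart, which is the robustness property the matching metric needs.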

  19. A Compact Methodology to Understand, Evaluate, and Predict the Performance of Automatic Target Recognition

    Science.gov (United States)

    Li, Yanpeng; Li, Xiang; Wang, Hongqiang; Chen, Yiping; Zhuang, Zhaowen; Cheng, Yongqiang; Deng, Bin; Wang, Liandong; Zeng, Yonghu; Gao, Lei

    2014-01-01

    This paper offers a compact methodology for evaluating the performance of an automatic target recognition (ATR) system: (a) a standard description of the ATR system's output is suggested, a quantity indicating the operating condition is presented based on the principle of feature extraction in pattern recognition, and a series of indexes to assess the output in different respects are developed with the application of statistics; (b) the performance of the ATR system is interpreted by a quality factor based on engineering mathematics; (c) through a novel utility called “context-probability” estimation, performance prediction for an ATR system is realized. The simulation results show that the performance of an ATR system can be accounted for and forecasted by the above-mentioned measures. Compared to existing technologies, the novel method can offer more objective performance conclusions for an ATR system, which may be helpful in knowing the practical capability of the tested system. At the same time, the generalization performance of the proposed method is good. PMID:24967605

  20. Automatic recognition of severity level for diagnosis of diabetic retinopathy using deep visual features.

    Science.gov (United States)

    Abbas, Qaisar; Fondon, Irene; Sarmiento, Auxiliadora; Jiménez, Soledad; Alemany, Pedro

    2017-11-01

    Diabetic retinopathy (DR) is a leading cause of blindness among diabetic patients. Recognition of the severity level is required by ophthalmologists to detect and diagnose DR early. However, it is a challenging task for both medical experts and computer-aided diagnosis systems because it requires extensive domain expertise. In this article, a novel automatic recognition system for the five severity levels of diabetic retinopathy (SLDR) is developed, without any pre- or post-processing steps on retinal fundus images, through learning of deep visual features (DVFs). These DVF features are extracted from each image by using color dense scale-invariant and gradient location-orientation histogram techniques. To learn these DVF features, a semi-supervised multilayer deep-learning algorithm is utilized along with a new compressed layer and fine-tuning steps. The SLDR system was evaluated and compared with state-of-the-art techniques using the measures of sensitivity (SE), specificity (SP) and area under the receiver operating curve (AUC). On 750 fundus images (150 per category), average values of 92.18% SE, 94.50% SP and 0.924 AUC were obtained. These results demonstrate that the SLDR system is appropriate for early detection of DR and provides effective support for treatment planning.

  1. Automatic data-driven real-time segmentation and recognition of surgical workflow.

    Science.gov (United States)

    Dergachyova, Olga; Bouget, David; Huaulmé, Arnaud; Morandi, Xavier; Jannin, Pierre

    2016-06-01

    With the intention of extending the perception and action of surgical staff inside the operating room, the medical community has expressed a growing interest towards context-aware systems. Requiring an accurate identification of the surgical workflow, such systems make use of data from a diverse set of available sensors. In this paper, we propose a fully data-driven and real-time method for segmentation and recognition of surgical phases using a combination of video data and instrument usage signals, exploiting no prior knowledge. We also introduce new validation metrics for assessment of workflow detection. The segmentation and recognition are based on a four-stage process. Firstly, during the learning time, a Surgical Process Model is automatically constructed from data annotations to guide the following process. Secondly, data samples are described using a combination of low-level visual cues and instrument information. Then, in the third stage, these descriptions are employed to train a set of AdaBoost classifiers capable of distinguishing one surgical phase from others. Finally, AdaBoost responses are used as input to a Hidden semi-Markov Model in order to obtain a final decision. On the MICCAI EndoVis challenge laparoscopic dataset we achieved a precision and a recall of 91 % in classification of 7 phases. Compared to the analysis based on one data type only, a combination of visual features and instrument signals allows better segmentation, reduction of the detection delay and discovery of the correct phase order.

  2. Analysis of human scream and its impact on text-independent speaker verification.

    Science.gov (United States)

    Hansen, John H L; Nandwana, Mahesh Kumar; Shokouhi, Navid

    2017-04-01

    A scream is a sustained, high-energy vocalization that lacks phonological structure, and it is this lack of phonological structure that distinguishes screams from other forms of loud vocalization, such as the "yell." This study investigates the acoustic aspects of screams and addresses those that are known to prevent standard speaker identification systems from recognizing the identity of screaming speakers. It is well established that speaker variability due to changes in vocal effort and the Lombard effect contributes to degraded performance in automatic speech systems (i.e., speech recognition, speaker identification, diarization, etc.). However, previous research in the general area of speaker variability has concentrated on human speech production, whereas less is known about non-speech vocalizations. The UT-NonSpeech corpus is developed here to investigate speaker verification from scream samples. This study presents a detailed analysis in terms of fundamental frequency, spectral peak shift, frame energy distribution, and spectral tilt. It is shown that traditional speaker recognition based on the Gaussian mixture model-universal background model framework is unreliable when evaluated with screams.
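Spectral tilt, one of the measures analyzed, is commonly estimated as the slope of a straight line fit to the log-magnitude spectrum of a frame; a small illustrative sketch (not the authors' exact implementation):

```python
import numpy as np

def spectral_tilt(frame, sample_rate):
    """Spectral tilt: slope (dB/Hz) of a line fit to the log-magnitude
    spectrum of a Hann-windowed frame. High-effort vocalizations such as
    screams tend to show a flatter (less negative) tilt than modal speech."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    log_mag = 20.0 * np.log10(spectrum + 1e-12)
    slope, _intercept = np.polyfit(freqs[1:], log_mag[1:], 1)  # skip DC bin
    return slope
```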

  3. An Efficient Multimodal 2D + 3D Feature-based Approach to Automatic Facial Expression Recognition

    KAUST Repository

    Li, Huibin

    2015-07-29

    We present a fully automatic multimodal 2D + 3D feature-based facial expression recognition approach and demonstrate its performance on the BU-3DFE database. Our approach combines multi-order gradient-based local texture and shape descriptors in order to achieve efficiency and robustness. First, a large set of fiducial facial landmarks of 2D face images along with their 3D face scans are localized using a novel algorithm, namely the incremental Parallel Cascade of Linear Regression (iPar-CLR). Then, a novel Histogram of Second Order Gradients (HSOG) based local image descriptor, in conjunction with the widely used first-order gradient based SIFT descriptor, is used to describe the local texture around each 2D landmark. Similarly, the local geometry around each 3D landmark is described by two novel local shape descriptors constructed using the first-order and the second-order surface differential geometry quantities, i.e., Histogram of mesh Gradients (meshHOG) and Histogram of mesh Shape index (curvature quantization, meshHOS). Finally, the Support Vector Machine (SVM) based recognition results of all 2D and 3D descriptors are fused at both feature-level and score-level to further improve the accuracy. Comprehensive experimental results demonstrate that impressive complementary characteristics exist between the 2D and 3D descriptors. We use the BU-3DFE benchmark to compare our approach to the state-of-the-art ones. Our multimodal feature-based approach outperforms the others by achieving an average recognition accuracy of 86.32%. Moreover, a good generalization ability is shown on the Bosphorus database.
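Score-level fusion of the per-descriptor SVM outputs can be as simple as a weighted average of class scores followed by an arg-max; a hypothetical sketch of that step:

```python
import numpy as np

def fuse_scores(score_mats, weights=None):
    """Score-level fusion: combine per-descriptor class-score vectors by a
    (weighted) average and return the winning class index and fused scores."""
    stack = np.stack([np.asarray(s, dtype=float) for s in score_mats])
    w = np.ones(len(score_mats)) if weights is None else np.asarray(weights, dtype=float)
    fused = (w / w.sum()) @ stack   # normalized weighted average over descriptors
    return int(np.argmax(fused)), fused
```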

  4. An Efficient Multimodal 2D + 3D Feature-based Approach to Automatic Facial Expression Recognition

    KAUST Repository

    Li, Huibin; Ding, Huaxiong; Huang, Di; Wang, Yunhong; Zhao, Xi; Morvan, Jean-Marie; Chen, Liming

    2015-01-01

    We present a fully automatic multimodal 2D + 3D feature-based facial expression recognition approach and demonstrate its performance on the BU-3DFE database. Our approach combines multi-order gradient-based local texture and shape descriptors in order to achieve efficiency and robustness. First, a large set of fiducial facial landmarks of 2D face images along with their 3D face scans are localized using a novel algorithm, namely the incremental Parallel Cascade of Linear Regression (iPar-CLR). Then, a novel Histogram of Second Order Gradients (HSOG) based local image descriptor, in conjunction with the widely used first-order gradient based SIFT descriptor, is used to describe the local texture around each 2D landmark. Similarly, the local geometry around each 3D landmark is described by two novel local shape descriptors constructed using the first-order and the second-order surface differential geometry quantities, i.e., Histogram of mesh Gradients (meshHOG) and Histogram of mesh Shape index (curvature quantization, meshHOS). Finally, the Support Vector Machine (SVM) based recognition results of all 2D and 3D descriptors are fused at both feature-level and score-level to further improve the accuracy. Comprehensive experimental results demonstrate that impressive complementary characteristics exist between the 2D and 3D descriptors. We use the BU-3DFE benchmark to compare our approach to the state-of-the-art ones. Our multimodal feature-based approach outperforms the others by achieving an average recognition accuracy of 86.32%. Moreover, a good generalization ability is shown on the Bosphorus database.

  5. Intelligent Automatic Right-Left Sign Lamp Based on Brain Signal Recognition System

    Science.gov (United States)

    Winda, A.; Sofyan; Sthevany; Vincent, R. S.

    2017-12-01

    Comfort, as part of the human factor, plays an important role in today's advanced automotive technology. Many current technologies move in the direction of driver assistance features. However, many driver assistance features still require a physical movement by the driver to be enabled. The method proposed in this work makes such a feature function without any physical movement: the driver only needs to think about it. Brain signals are recorded and processed to serve as input to the recognition system. A Right-Left sign lamp based on brain signal recognition can potentially replace the button or switch that activates the lamp. The system then decides whether the signal means 'Right' or 'Left'. The Right-Left decision of the brain signal recognition is sent to a processing board in order to activate the automotive relay, which in turn activates the sign lamp. Furthermore, an intelligent system approach is used to develop an authorized model based on the brain signal. In particular, a Support Vector Machine (SVM) based classification system is used to recognize the Left-Right direction of the brain signal. Experimental results confirm the effectiveness of the proposed intelligent automatic brain-signal-based Right-Left sign lamp access control system. The signal is processed by Linear Prediction Coefficients (LPC) and SVMs, and the resulting experiments show a training and testing accuracy of 100% and 80%, respectively.
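The LPC front end used here can be computed with the standard autocorrelation method and the Levinson-Durbin recursion; a self-contained sketch (illustrative, not the authors' code):

```python
import numpy as np

def lpc(signal, order):
    """LPC via the autocorrelation method and Levinson-Durbin recursion.
    Returns the error-filter coefficients a = [1, a1, ..., ap], so the
    prediction residual is x[n] + a1*x[n-1] + ... + ap*x[n-p]."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    r = np.array([x[:n - k] @ x[k:] for k in range(order + 1)])  # autocorrelation
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[1:i][::-1]   # sum_j a[j] * r[i - j]
        k = -acc / err                       # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a
```

On a long AR(2) signal the recursion recovers the generating coefficients (with the error-filter sign convention).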

  6. Language modeling for automatic speech recognition of inflective languages: an applications-oriented approach using lexical data

    CERN Document Server

    Donaj, Gregor

    2017-01-01

    This book covers language modeling and automatic speech recognition for inflective languages (e.g. Slavic languages), which represent roughly half of the languages spoken in Europe. These languages do not perform as well as English in speech recognition systems and it is therefore harder to develop an application with sufficient quality for the end user. The authors describe the most important language features for the development of a speech recognition system. This is then presented through the analysis of errors in the system and the development of language models and their inclusion in speech recognition systems, which specifically address the errors that are relevant for targeted applications. The error analysis is done with regard to morphological characteristics of the word in the recognized sentences. The book is oriented towards speech recognition with large vocabularies and continuous and even spontaneous speech. Today such applications work with a rather small number of languages compared to the nu...

  7. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    Science.gov (United States)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exist in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It offers fast data/text entry, a small overall size, and low weight. In addition, this design frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks under computational resource constraints.
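The beamforming front end can be illustrated with the classic delay-and-sum approach: align each microphone channel toward the look direction, then average, so target speech adds coherently while diffuse noise does not. A minimal integer-delay sketch (illustrative only, not the flight implementation):

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer with integer sample delays: shift each
    channel to compensate its steering delay, then average the channels."""
    out = np.zeros(len(channels[0]))
    for ch, d in zip(channels, delays):
        out += np.roll(np.asarray(ch, dtype=float), -d)
    return out / len(channels)
```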

  8. Advances in image compression and automatic target recognition; Proceedings of the Meeting, Orlando, FL, Mar. 30, 31, 1989

    Science.gov (United States)

    Tescher, Andrew G. (Editor)

    1989-01-01

    Various papers on image compression and automatic target recognition are presented. Individual topics addressed include: target cluster detection in cluttered SAR imagery, model-based target recognition using laser radar imagery, Smart Sensor front-end processor for feature extraction of images, object attitude estimation and tracking from a single video sensor, symmetry detection in human vision, analysis of high resolution aerial images for object detection, obscured object recognition for an ATR application, neural networks for adaptive shape tracking, statistical mechanics and pattern recognition, detection of cylinders in aerial range images, moving object tracking using local windows, new transform method for image data compression, quad-tree product vector quantization of images, predictive trellis encoding of imagery, reduced generalized chain code for contour description, compact architecture for a real-time vision system, use of human visibility functions in segmentation coding, color texture analysis and synthesis using Gibbs random fields.

  9. The Usefulness of Automatic Speech Recognition (ASR) Eyespeak Software in Improving Iraqi EFL Students’ Pronunciation

    Directory of Open Access Journals (Sweden)

    Lina Fathi Sidig Sidgi

    2017-02-01

    Full Text Available The present study focuses on determining whether automatic speech recognition (ASR) technology is reliable for improving the English pronunciation of Iraqi EFL students. Non-native learners of English are generally concerned about improving their pronunciation skills, and Iraqi students face difficulties in pronouncing English sounds that are not found in their native language (Arabic). This study is concerned with ASR and its effectiveness in overcoming this difficulty. The data were obtained from twenty participants randomly selected from first-year college students in the Department of English at Al-Turath University College in Baghdad, Iraq. The students participated in a two-month pronunciation instruction course using the ASR Eyespeak software. At the end of the course, the students completed a questionnaire to give their opinions about the usefulness of ASR Eyespeak in improving their pronunciation. The findings of the study revealed that the students found the ASR Eyespeak software very useful in improving their pronunciation and helping them realise their pronunciation mistakes. They also reported that learning pronunciation with ASR Eyespeak was enjoyable.

  10. Terminology of the public relations field: corpus — automatic term recognition — terminology database

    Directory of Open Access Journals (Sweden)

    Nataša Logar Berginc

    2013-12-01

    Full Text Available The article describes an analysis of automatic term recognition results performed for single- and multi-word terms with the LUIZ term extraction system. The target application of the results is a terminology database of Public Relations and the main resource the KoRP Public Relations Corpus. Our analysis is focused on two segments: (a) single-word noun term candidates, which we compare with the frequency list of nouns from KoRP and whose termhood we evaluate on the basis of the judgements of two domain experts, and (b) multi-word term candidates with verb and noun as headword. In order to better assess the performance of the system and the soundness of our approach we also performed an analysis of recall. Our results show that the terminological relevance of extracted nouns is indeed higher than that of merely frequent nouns, and that verbal phrases only rarely count as proper terms. The most productive patterns of multi-word terms with noun as a headword have the following structure: [adjective + noun], [adjective + and + adjective + noun] and [adjective + adjective + noun]. The analysis of recall shows low inter-annotator agreement, but nevertheless very satisfactory recall levels.
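The most productive pattern, [adjective + noun], can be matched directly on POS-tagged text; a toy sketch (the tag names 'ADJ'/'NOUN' are assumptions, not LUIZ's actual tag set):

```python
def adj_noun_terms(tagged_tokens):
    """Extract [adjective + noun] multi-word term candidates from a list of
    (word, pos_tag) pairs; tag set assumed to use 'ADJ' and 'NOUN'."""
    terms = []
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if t1 == 'ADJ' and t2 == 'NOUN':
            terms.append(f"{w1} {w2}")
    return terms
```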

  11. Automatic recognition of alertness and drowsiness from EEG by an artificial neural network.

    Science.gov (United States)

    Vuckovic, Aleksandra; Radivojevic, Vlada; Chen, Andrew C N; Popovic, Dejan

    2002-06-01

    We present a novel method for classifying alert vs drowsy states from 1 s long sequences of full-spectrum EEG recordings in an arbitrary subject. This novel method uses time series of interhemispheric and intrahemispheric cross spectral densities of full-spectrum EEG as the input to an artificial neural network (ANN) with two discrete outputs: drowsy and alert. The experimental data were collected from 17 subjects. Two experts in EEG interpretation visually inspected the data and provided the necessary expertise for the training of an ANN. We selected the following three ANNs as potential candidates: (1) the linear network with the Widrow-Hoff (WH) algorithm; (2) the non-linear ANN with the Levenberg-Marquardt (LM) rule; and (3) the Learning Vector Quantization (LVQ) neural network. We showed that the LVQ neural network gives the best classification compared with the linear network using the WH algorithm (the worst) and the non-linear network trained with the LM rule. The classification properties of LVQ were validated using data recorded from 12 healthy volunteer subjects whose EEG recordings had not been used for the training of the ANN. Statistics were used as a measure of the potential applicability of the LVQ: the t-distribution showed that matching between the human assessment and the network output was 94.37 ± 1.95%. This result suggests that the automatic recognition algorithm is applicable for distinguishing between the alert and drowsy states in recordings that have not been used for training.
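The cross spectral density features can be estimated Welch-style by averaging windowed segment cross-periodograms; a compact NumPy sketch (parameters illustrative, not the paper's settings):

```python
import numpy as np

def cross_spectral_density(x, y, fs, nperseg=256):
    """Welch-style cross spectral density between two channels: average of
    X(f) * conj(Y(f)) over Hann-windowed, half-overlapping segments."""
    step = nperseg // 2
    win = np.hanning(nperseg)
    acc = np.zeros(nperseg // 2 + 1, dtype=complex)
    count = 0
    for s in range(0, len(x) - nperseg + 1, step):
        X = np.fft.rfft(win * x[s:s + nperseg])
        Y = np.fft.rfft(win * y[s:s + nperseg])
        acc += X * np.conj(Y)
        count += 1
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, acc / count
```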

  12. Application of an automatic pattern recognition for aleatory signals for the surveillance of nuclear reactor and rotating machinery

    International Nuclear Information System (INIS)

    Nascimento, J.A. do.

    1982-02-01

    An automatic pattern recognition program, PSDREC, developed for the surveillance of nuclear reactors and rotating machinery, is described and the relevant theory is outlined. Pattern recognition analysis of noise signals is a powerful technique for assessing 'system normality' in dynamic systems. The program, which applies 8 statistical tests to the calculated power spectral density (PSD) distribution, was earlier installed on a PDP-11/45 computer at IPEN. The technique was used to analyse recorded signals from three systems, namely an operational BWR power reactor (neutron signals), a water pump and a diesel engine (vibration signals). Results of the tests are considered satisfactory. (Author) [pt
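The 8 statistical tests of PSDREC are not reproduced here, but the flavor of such surveillance checks can be shown with a simple relative band-power deviation against a reference 'normal' PSD (entirely illustrative):

```python
import numpy as np

def band_deviation(psd, psd_ref, freqs, bands):
    """Relative deviation of band power against a reference PSD; values far
    from zero in any band suggest a departure from 'system normality'."""
    out = {}
    for name, (lo, hi) in bands.items():
        m = (freqs >= lo) & (freqs < hi)
        p, p_ref = psd[m].sum(), psd_ref[m].sum()
        out[name] = (p - p_ref) / p_ref
    return out
```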

  13. Automatic Classification of Station Quality by Image Based Pattern Recognition of PPSD Plots

    Science.gov (United States)

    Weber, B.; Herrnkind, S.

    2017-12-01

    The number of seismic stations is growing, and it has become common practice to share station waveform data in real time with the main data centers such as IRIS, GEOFON, ORFEUS and RESIF. This has made analyzing station performance increasingly important for automatic real-time processing and station selection. The value of a station depends on different factors: the quality and quantity of the data, the location of the site, the general station density in the surrounding area and, finally, the type of application it can be used for. The approach described by McNamara and Boaz (2006) became standard in the last decade. It incorporates a probability density function (PDF) to display the distribution of seismic power spectral density (PSD). The low noise model (LNM) and high noise model (HNM) introduced by Peterson (1993) are also displayed in the PPSD plots introduced by McNamara and Boaz, allowing an estimation of station quality. Here we describe how we established an automatic station quality classification module using image based pattern recognition on PPSD plots. The plots were split into 4 bands: short-period characteristics (0.1-0.8 s), body wave characteristics (0.8-5 s), microseismic characteristics (5-12 s) and long-period characteristics (12-100 s). The module sqeval connects to a SeedLink server, checks available stations, and requests PPSD plots through the Mustang service from IRIS, from PQLX/SQLX, or from GIS (gempa Image Server), a module that generates different kinds of images such as trace plots, map plots, helicorder plots or PPSD plots. It compares the image based quality patterns for the different period bands with the retrieved PPSD plot. The quality of a station is divided into 5 classes for each of the 4 bands. Classes A, B, C, D define regular quality between LNM and HNM, while the fifth class represents out-of-order stations with gain problems, missing data etc. Over all period bands about 100 different patterns are required to classify most of the stations available on the
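The four period bands used for classification form a simple lookup; a sketch with the band edges taken from the text:

```python
def period_band(period_s):
    """Map a spectral period in seconds to one of the four PPSD analysis
    bands described above; returns None outside the 0.1-100 s range."""
    if 0.1 <= period_s < 0.8:
        return 'short-period'
    if 0.8 <= period_s < 5:
        return 'body-wave'
    if 5 <= period_s < 12:
        return 'microseismic'
    if 12 <= period_s <= 100:
        return 'long-period'
    return None
```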

  14. Speaker segmentation and clustering

    OpenAIRE

    Kotti, M; Moschou, V; Kotropoulos, C

    2008-01-01

    This survey focuses on two challenging speech processing topics, namely: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker...

  15. Connected digit speech recognition system for Malayalam language

    Indian Academy of Sciences (India)

    Connected digit speech recognition is important in many applications such as automated banking, catalogue dialing, automatic data entry, etc. This paper presents an optimum speaker-independent connected digit recognizer for Malayalam language. The system employs Perceptual ...

  16. Data requirements for speaker independent acoustic models

    CSIR Research Space (South Africa)

    Badenhorst, JAC

    2008-11-01

    Full Text Available When developing speech recognition systems in resource-constrained environments, careful design of the training corpus can play an important role in compensating for data scarcity. One of the factors to consider relates to the speaker composition...

  17. Development of Filtered Bispectrum for EEG Signal Feature Extraction in Automatic Emotion Recognition Using Artificial Neural Networks

    Directory of Open Access Journals (Sweden)

    Prima Dewi Purnamasari

    2017-05-01

    Full Text Available The development of automatic emotion detection systems has recently gained significant attention due to the growing possibility of their implementation in several applications, including affective computing and various fields within biomedical engineering. Use of the electroencephalograph (EEG) signal is preferred over facial expression, as people cannot control the EEG signal generated by their brain, which makes the EEG a more reliable psychological signal. However, because of its uniqueness between individuals and its vulnerability to noise, use of EEG signals can be rather complicated. In this paper, we propose a methodology for EEG-based emotion recognition using a filtered bispectrum as the feature extraction subsystem and an artificial neural network (ANN) as the classifier. The bispectrum is theoretically superior to the power spectrum because it can identify phase coupling between the nonlinear process components of the EEG signal. In the feature extraction process, to extract the information contained in the bispectrum matrices, a 3D pyramid filter is used for sampling and quantifying the bispectrum value. Experimental results show that the mean percentage of the bispectrum value from 5 × 5 non-overlapped 3D pyramid filters produces the highest recognition rate. We found that reducing the number of EEG channels down to only eight in the frontal area of the brain does not significantly affect the recognition rate, and the number of data samples used in the training process was then increased to improve the recognition rate of the system. We have also utilized a probabilistic neural network (PNN) as another classifier and compared its recognition rate with that of the back-propagation neural network (BPNN); the results show that the PNN produces a comparable recognition rate at lower computational cost. Our research shows that the extracted bispectrum values of an EEG signal using 3D filtering as a feature extraction
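The bispectrum underlying these features can be estimated with the direct FFT-based method, B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)]; a compact sketch using segment averaging and no smoothing (illustrative, not the paper's filtered version):

```python
import numpy as np

def bispectrum(x, nfft=64):
    """Direct bispectrum estimate: average X(f1) * X(f2) * conj(X(f1 + f2))
    over non-overlapping segments. A peak at (f1, f2) indicates quadratic
    phase coupling between components at f1, f2 and f1 + f2."""
    nseg = len(x) // nfft
    B = np.zeros((nfft // 2, nfft // 2), dtype=complex)
    for s in range(nseg):
        X = np.fft.fft(x[s * nfft:(s + 1) * nfft])
        for f1 in range(nfft // 2):
            for f2 in range(nfft // 2):
                B[f1, f2] += X[f1] * X[f2] * np.conj(X[f1 + f2])
    return B / nseg
```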

  18. Novel Techniques for Dialectal Arabic Speech Recognition

    CERN Document Server

    Elmahdy, Mohamed; Minker, Wolfgang

    2012-01-01

    Novel Techniques for Dialectal Arabic Speech Recognition describes approaches to improve automatic speech recognition for dialectal Arabic. Since speech resources for dialectal Arabic speech recognition are very sparse, the authors describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic speech recognition, while assuming that MSA is always a second language for all Arabic speakers. In this book, Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect. ECA is the first-ranked Arabic dialect in terms of number of speakers, and a high-quality ECA speech corpus with accurate phonetic transcription has been collected. MSA acoustic models were trained using news broadcast speech. In order to cross-lingually use MSA in dialectal Arabic speech recognition, the authors have normalized the phoneme sets for MSA and ECA. After this normalization, they have applied state-of-the-art acoustic model adaptation techniques like Maximum Likelihood Linear Regression (MLLR) and M...

  19. Context and Quasi-Invariants in Automatic Target Recognition (ATR) with Synthetic Aperture Radar (SAR) Imagery

    National Research Council Canada - National Science Library

    Binford, Thomas

    2000-01-01

    .... Experiments based on conventional recognition techniques were conducted for comparisons. Study of persistent scattering confirms the feasibility of implementing a SAR ATR system using physical image features...

  20. An exploration of the potential of Automatic Speech Recognition to assist and enable receptive communication in higher education

    Directory of Open Access Journals (Sweden)

    Mike Wald

    2006-12-01

    Full Text Available The potential use of Automatic Speech Recognition to assist receptive communication is explored. The opportunities and challenges that this technology presents to students and staff are discussed and evaluated: providing captioning of speech online or in classrooms for deaf or hard of hearing students, and helping blind, visually impaired or dyslexic learners to read and search learning material more readily by augmenting synthetic speech with natural recorded real speech. The automatic provision of online lecture notes, synchronised with speech, enables staff and students to focus on learning and teaching issues, while also benefiting learners unable to attend the lecture or who find it difficult or impossible to take notes at the same time as listening, watching and thinking.

  1. Recognition and automatic tracking of weld line in fringe welding by autonomous mobile robot with visual sensor

    International Nuclear Information System (INIS)

    Suga, Yasuo; Saito, Keishin; Ishii, Hideaki.

    1994-01-01

    An autonomous mobile robot with a visual sensor and four driving axes for the welding of pipe and fringe was constructed. The robot can move along a pipe and detect the weld line to be welded with its visual sensor. Moreover, in order to perform welding automatically, the tip of the welding torch can track the weld line of the joint by rotating the robot head. In the case of welding of pipe and fringe, the robot can detect the contact angle between the two base metals to be welded, and the torch angle changes according to the contact angle. Tracking tests with the robot system made clear that recognition of the joint geometry by the laser lighting method and automatic tracking of the weld line are possible. The average tracking error was approximately ±0.3 mm, and the torch angle could always be kept at the optimum angle. (author)

  2. Automatic micropropagation of plants--the vision-system: graph rewriting as pattern recognition

    Science.gov (United States)

    Schwanke, Joerg; Megnet, Roland; Jensch, Peter F.

    1993-03-01

    The automation of plant micropropagation is necessary to produce large amounts of biomass. Plants have to be dissected at particular cutting points, so a vision system is needed to recognize the cutting points on the plants. Against this background, this contribution addresses the underlying formalism used to determine cutting points on abstract plant models. We show the usefulness of pattern recognition by graph rewriting, along with some examples in this context.

  3. The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system.

    Science.gov (United States)

    Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo

    2009-04-01

    The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained

  4. Automatic recognition of holistic functional brain networks using iteratively optimized convolutional neural networks (IO-CNN) with weak label initialization.

    Science.gov (United States)

    Zhao, Yu; Ge, Fangfei; Liu, Tianming

    2018-07-01

    fMRI data decomposition techniques have advanced significantly, from shallow models such as Independent Component Analysis (ICA) and Sparse Coding and Dictionary Learning (SCDL) to deep learning models such as Deep Belief Networks (DBN) and the Deep Convolutional Autoencoder (DCAE). However, the interpretation of the decomposed networks remains an open question due to the lack of functional brain atlases, the absence of correspondence between decomposed or reconstructed networks across subjects, and significant individual variability. Recent studies showed that deep learning, especially deep convolutional neural networks (CNN), has an extraordinary ability to accommodate spatial object patterns; e.g., our recent work using 3D CNNs for fMRI-derived network classification achieved high accuracy with a remarkable tolerance for mistakenly labelled training brain networks. However, training data preparation is one of the biggest obstacles in these supervised deep learning models for functional brain network map recognition, since manual labelling requires tedious and time-consuming labour and sometimes even introduces label mistakes. For mapping functional networks in large-scale datasets, such as the hundreds of thousands of brain networks used in this paper, manual labelling becomes almost infeasible. In response, in this work, we tackled both the network recognition and training data labelling tasks by proposing a new iteratively optimized deep learning CNN (IO-CNN) framework with automatic weak label initialization, which turns functional brain network recognition into a fully automatic large-scale classification procedure. Our extensive experiments based on ABIDE-II 1099 brains' fMRI data showed the great promise of our IO-CNN framework. Copyright © 2018 Elsevier B.V. All rights reserved.

  5. Reliability of automatic biometric iris recognition after phacoemulsification or drug-induced pupil dilation.

    Science.gov (United States)

    Seyeddain, Orang; Kraker, Hannes; Redlberger, Andreas; Dexl, Alois K; Grabner, Günther; Emesz, Martin

    2014-01-01

    To investigate the reliability of a biometric iris recognition system for personal authentication after cataract surgery or iatrogenic pupil dilation. This was a prospective, nonrandomized, single-center, cohort study for evaluating the performance of an iris recognition system 2-24 hours after phacoemulsification and intraocular lens implantation (group 1) and before and after iatrogenic pupil dilation (group 2). Of the 173 eyes that could be enrolled before cataract surgery, 164 (94.8%) were easily recognized postoperatively, whereas in 9 (5.2%) this was not possible. However, these 9 eyes could be reenrolled and afterwards recognized successfully. In group 2, of a total of 184 eyes that were enrolled in miosis, a total of 22 (11.9%) could not be recognized in mydriasis and therefore needed reenrollment. No single case of false-positive acceptance occurred in either group. The results of this trial indicate that standard cataract surgery seems not to be a limiting factor for iris recognition in the large majority of cases. Some patients (5.2% in this study) might need "reenrollment" after cataract surgery. Iris recognition was primarily successful in eyes with medically dilated pupils in nearly 9 out of 10 eyes. No single case of false-positive acceptance occurred in either group in this trial. It seems therefore that iris recognition is a valid biometric method in the majority of cases after cataract surgery or after pupil dilation.

  6. Automatic detection and recognition of multiple macular lesions in retinal optical coherence tomography images with multi-instance multilabel learning

    Science.gov (United States)

    Fang, Leyuan; Yang, Liumao; Li, Shutao; Rabbani, Hossein; Liu, Zhimin; Peng, Qinghua; Chen, Xiangdong

    2017-06-01

    Detection and recognition of macular lesions in optical coherence tomography (OCT) are very important for retinal diseases diagnosis and treatment. As one kind of retinal disease (e.g., diabetic retinopathy) may contain multiple lesions (e.g., edema, exudates, and microaneurysms) and eye patients may suffer from multiple retinal diseases, multiple lesions often coexist within one retinal image. Therefore, one single-lesion-based detector may not support the diagnosis of clinical eye diseases. To address this issue, we propose a multi-instance multilabel-based lesions recognition (MIML-LR) method for the simultaneous detection and recognition of multiple lesions. The proposed MIML-LR method consists of the following steps: (1) segment the regions of interest (ROIs) for different lesions, (2) compute descriptive instances (features) for each lesion region, (3) construct multilabel detectors, and (4) recognize each ROI with the detectors. The proposed MIML-LR method was tested on 823 clinically labeled OCT images with normal macular and macular with three common lesions: epiretinal membrane, edema, and drusen. For each input OCT image, our MIML-LR method can automatically identify the number of lesions and assign the class labels, achieving the average accuracy of 88.72% for the cases with multiple lesions, which better assists macular disease diagnosis and treatment.
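
    The four-step MIML-LR pipeline (segment ROIs, compute features per region, build multilabel detectors, recognize each ROI) can be caricatured with synthetic feature vectors and one linear detector per lesion label. Everything below (the features, the label assignment, the least-squares detectors) is an illustrative stand-in for the paper's method, not its actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ROI feature vectors (step 2) and multilabel targets for
# three lesion types: [epiretinal membrane, edema, drusen].
X = rng.normal(size=(60, 8))
Y = (X[:, :3] > 0.3).astype(int)  # synthetic multilabel ground truth

# Step 3: one independent detector per label; here a least-squares
# linear scorer stands in for the paper's multilabel detectors.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def recognize(roi_features, threshold=0.5):
    """Step 4: score an ROI against every detector and return all labels
    whose score clears the threshold (an image may carry several)."""
    scores = roi_features @ W
    return (scores > threshold).astype(int)

pred = recognize(X)
n_lesions_per_image = pred.sum(axis=1)  # multiple labels may coexist
```

The key multi-instance multilabel property is visible in the output: a single ROI can receive zero, one, or several lesion labels rather than being forced into one class.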

  7. Contribution to automatic image recognition. Application to analysis of plain scenes of overlapping parts in robot technology

    International Nuclear Information System (INIS)

    Tan, Shengbiao

    1987-01-01

    A method for object modeling and automatic recognition of overlapping objects is presented. Our work is composed of three essential parts: image processing, object modeling, and evaluation of the implementation of the stated concepts. In the first part, we present a method of edge encoding based on re-sampling of data encoded according to Freeman; this method generates an isotropic, homogeneous and very precise representation. The second part relates to object modeling. This important step greatly simplifies the recognition work. The proposed method characterizes a model with two groups of information: a description group containing the primitives, and a discrimination group containing data packs called 'transition vectors'. Based on this original method of information organization, a 'relative learning' scheme is able to select, ignore and update the information concerning already-learned objects according to the new information to be included in the database. Recognition is a two-pass process: the first pass very efficiently hypothesizes the presence of objects by exploiting each object's particularities, and this hypothesis is then confirmed or rejected by a fine verification pass. The last part describes the experimental results in detail. We demonstrate the robustness of the algorithms on images with both poor lighting and overlapping objects. The system, named SOFIA, has been installed in an industrial vision system series and works in real time. (author) [fr

  8. Automatic recognition of touch gestures in the corpus of social touch

    NARCIS (Netherlands)

    Jung, Merel Madeleine; Poel, Mannes; Poppe, Ronald Walter; Heylen, Dirk K.J.

    For an artifact such as a robot or a virtual agent to respond appropriately to human social touch behavior, it should be able to automatically detect and recognize touch. This paper describes the data collection of CoST: Corpus of Social Touch, a data set containing 7805 captures of 14 different

  9. Automatic Target Recognition in Synthetic Aperture Sonar Images Based on Geometrical Feature Extraction

    Directory of Open Access Journals (Sweden)

    J. Del Rio Vera

    2009-01-01

    Full Text Available This paper presents a new supervised classification approach for automated target recognition (ATR in SAS images. The recognition procedure starts with a novel segmentation stage based on the Hilbert transform. A number of geometrical features are then extracted and used to classify observed objects against a previously compiled database of target and non-target features. The proposed approach has been tested on a set of 1528 simulated images created by the NURC SIGMAS sonar model, achieving up to 95% classification accuracy.

  10. User Experience of a Mobile Speaking Application with Automatic Speech Recognition for EFL Learning

    Science.gov (United States)

    Ahn, Tae youn; Lee, Sangmin-Michelle

    2016-01-01

    With the spread of mobile devices, mobile phones have enormous potential regarding their pedagogical use in language education. The goal of this study is to analyse user experience of a mobile-based learning system that is enhanced by speech recognition technology for the improvement of EFL (English as a foreign language) learners' speaking…

  11. Automaticity of Basic-Level Categorization Accounts for Labeling Effects in Visual Recognition Memory

    Science.gov (United States)

    Richler, Jennifer J.; Gauthier, Isabel; Palmeri, Thomas J.

    2011-01-01

    Are there consequences of calling objects by their names? Lupyan (2008) suggested that overtly labeling objects impairs subsequent recognition memory because labeling shifts stored memory representations of objects toward the category prototype (representational shift hypothesis). In Experiment 1, we show that processing objects at the basic…

  12. Automatic speech recognition used for evaluation of text-to-speech systems

    Czech Academy of Sciences Publication Activity Database

    Vích, Robert; Nouza, J.; Vondra, Martin

    -, č. 5042 (2008), s. 136-148 ISSN 0302-9743 R&D Projects: GA AV ČR 1ET301710509; GA AV ČR 1QS108040569 Institutional research plan: CEZ:AV0Z20670512 Keywords : speech recognition * speech processing Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering

  13. Communication Interface for Mexican Spanish Dysarthric Speakers

    Directory of Open Access Journals (Sweden)

    Gladys Bonilla-Enriquez

    2012-03-01

    Full Text Available Dysarthria is a motor speech disorder due to weakness or poor coordination of the speech muscles. This condition can be caused by a stroke, cerebral palsy, or by a traumatic brain injury. For Mexican people with this condition there are few, if any, assistive technologies to improve their social interaction skills. In this paper we present our advances towards the development of a communication interface for dysarthric speakers whose native language is Mexican Spanish. We propose a methodology that relies on (1) special design of a training normal-speech corpus with limited resources, (2) standard speaker adaptation, and (3) control of language model perplexity, to achieve high Automatic Speech Recognition (ASR) accuracy. The interface allows the user and therapist to perform tasks such as dynamic speaker adaptation, vocabulary adaptation, and text-to-speech synthesis. Live tests were performed with a user with mild dysarthria, achieving accuracies of 93%-95% for spontaneous speech.

  14. Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition

    Directory of Open Access Journals (Sweden)

    Suzuki Motoyuki

    2009-01-01

    Full Text Available We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves Web documents using the chosen keywords. A problem is that the selected keywords tend to contain misrecognized words. The proposed method introduces two new ideas for avoiding the effects of keywords derived from misrecognized words. The first idea is to compose multiple queries from selected keyword candidates so that misrecognized words and correct words do not fall into one query. The second idea is that the number of Web documents downloaded for each query is determined according to the "query relevance." Combining these two ideas, we can alleviate the bad effects of misrecognized keywords by decreasing the number of Web documents downloaded for queries that contain misrecognized keywords. Finally, we examine a method of determining the number of iterative adaptations based on the recognition likelihood. Experiments have shown that the proposed stopping criterion can determine almost the optimum number of iterations. In the final experiment, the word accuracy without adaptation (55.29%) was improved to 60.38%, which was 1.13 points better than the result of the conventional unsupervised adaptation method (59.25%).

  15. Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition

    Directory of Open Access Journals (Sweden)

    Akinori Ito

    2009-01-01

    Full Text Available We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves Web documents using the chosen keywords. A problem is that the selected keywords tend to contain misrecognized words. The proposed method introduces two new ideas for avoiding the effects of keywords derived from misrecognized words. The first idea is to compose multiple queries from selected keyword candidates so that misrecognized words and correct words do not fall into one query. The second idea is that the number of Web documents downloaded for each query is determined according to the "query relevance." Combining these two ideas, we can alleviate the bad effects of misrecognized keywords by decreasing the number of Web documents downloaded for queries that contain misrecognized keywords. Finally, we examine a method of determining the number of iterative adaptations based on the recognition likelihood. Experiments have shown that the proposed stopping criterion can determine almost the optimum number of iterations. In the final experiment, the word accuracy without adaptation (55.29%) was improved to 60.38%, which was 1.13 points better than the result of the conventional unsupervised adaptation method (59.25%).
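
    The two ideas above, spreading keyword candidates across several queries and scaling the download budget by query relevance, can be sketched directly. Everything here (the keyword list, the confidence scores, and the relevance proxy) is a hypothetical illustration, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical keyword candidates from a preliminary recognition pass,
# with made-up confidence scores (low confidence hints at misrecognition).
candidates = [("parliament", 0.95), ("budget", 0.90),
              ("taxis", 0.40), ("reform", 0.85)]

def compose_queries(cands, n_queries=2):
    # Idea 1: spread the candidates over several queries so that one
    # suspect keyword cannot contaminate every query at once.
    ranked = sorted(cands, key=lambda kv: -kv[1])
    queries = [[] for _ in range(n_queries)]
    for i, (word, _) in enumerate(ranked):
        queries[i % n_queries].append(word)
    return queries

def downloads_per_query(queries, scores, total=100):
    # Idea 2: allocate the download budget in proportion to a simple
    # query-relevance proxy (mean confidence of the query's keywords).
    rel = np.array([np.mean([scores[w] for w in q]) for q in queries])
    return np.floor(total * rel / rel.sum()).astype(int)

scores = dict(candidates)
queries = compose_queries(candidates)
alloc = downloads_per_query(queries, scores)
```

With this allocation, the query containing the low-confidence keyword ("taxis") receives a smaller share of the downloads, which is exactly the mitigation the abstract describes.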

  16. Wavelet-SVM classification and automatic recognition of unstained viable cells in phase-contrast microscopy

    International Nuclear Information System (INIS)

    Skoczylas, M.; Rakowski, W.; Cherubini, R.; Gerardi, S.

    2011-01-01

    Irradiation of individual cultured mammalian cells with a pre-selected number of ions, down to one ion per single cell, is a useful experimental approach to investigating the effects of low-dose ionising radiation exposure and thus contributing to a more realistic human cancer risk assessment. One of the crucial tasks of all microbeam apparatuses is the visualisation, recognition and positioning of every individual cell of the cell culture to be irradiated. Before irradiation, mammalian cells (specifically, Chinese hamster V79 cells) are seeded and grown as a monolayer on a mylar surface used as the bottom of a specially designed holder. Manual recognition of unstained cells in a bright-field microscope is a time-consuming procedure; therefore, a parallel algorithm has been conceived and developed in order to speed up this step of the irradiation protocol. Many technical problems had to be overcome due to the complexity of the images to be analysed: cells must be discriminated against an inhomogeneous background containing many disturbing bodies, mainly due to the roughness of the mylar surface and to bodies in the culture medium; and cell shapes vary depending on how the cells attach to the surface, on the phase of the cell cycle they are in, and on cell density. Preliminary results of recognition and classification based on wavelet kernels for the support vector machine classifier will be presented. (authors)

  17. MO-F-CAMPUS-J-02: Automatic Recognition of Patient Treatment Site in Portal Images Using Machine Learning

    International Nuclear Information System (INIS)

    Chang, X; Yang, D

    2015-01-01

    Purpose: To investigate a method to automatically recognize the treatment site in X-Ray portal images. It could be useful to detect potential treatment errors and to provide guidance for subsequent tasks, e.g. automatically verifying the patient's daily setup. Methods: The portal images were exported from MOSAIQ as DICOM files, and were 1) processed with a threshold-based intensity transformation algorithm to enhance contrast, and 2) then down-sampled (from 1024×768 to 128×96) using a bi-cubic interpolation algorithm. An appearance-based vector space model (VSM) was used to rearrange the images into vectors. A principal component analysis (PCA) method was used to reduce the vector dimensions. A multi-class support vector machine (SVM), with radial basis function kernel, was used to build the treatment site recognition models. These models were then used to recognize the treatment sites in the portal images. Portal images of 120 patients were included in the study. The images were selected to cover six treatment sites: brain, head and neck, breast, lung, abdomen and pelvis. Each site had images of twenty patients. Cross-validation experiments were performed to evaluate the performance. Results: The MATLAB Image Processing Toolbox and scikit-learn (a machine learning library in Python) were used to implement the proposed method. The average accuracies using the AP and RT images separately were 95% and 94%, respectively. The average accuracy using AP and RT images together was 98%. Computation time was ∼0.16 seconds per patient with an AP or RT image, and ∼0.33 seconds per patient with both AP and RT images. Conclusion: The proposed method of treatment site recognition is efficient and accurate. It is not sensitive to differences in image intensity, size, or patient position in the portal images. It could be useful for patient safety assurance. The work was partially supported by a research grant from Varian Medical System

  18. MO-F-CAMPUS-J-02: Automatic Recognition of Patient Treatment Site in Portal Images Using Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    Chang, X; Yang, D [Washington University in St Louis, St Louis, MO (United States)

    2015-06-15

    Purpose: To investigate a method to automatically recognize the treatment site in X-Ray portal images. It could be useful to detect potential treatment errors and to provide guidance for subsequent tasks, e.g. automatically verifying the patient's daily setup. Methods: The portal images were exported from MOSAIQ as DICOM files, and were 1) processed with a threshold-based intensity transformation algorithm to enhance contrast, and 2) then down-sampled (from 1024×768 to 128×96) using a bi-cubic interpolation algorithm. An appearance-based vector space model (VSM) was used to rearrange the images into vectors. A principal component analysis (PCA) method was used to reduce the vector dimensions. A multi-class support vector machine (SVM), with radial basis function kernel, was used to build the treatment site recognition models. These models were then used to recognize the treatment sites in the portal images. Portal images of 120 patients were included in the study. The images were selected to cover six treatment sites: brain, head and neck, breast, lung, abdomen and pelvis. Each site had images of twenty patients. Cross-validation experiments were performed to evaluate the performance. Results: The MATLAB Image Processing Toolbox and scikit-learn (a machine learning library in Python) were used to implement the proposed method. The average accuracies using the AP and RT images separately were 95% and 94%, respectively. The average accuracy using AP and RT images together was 98%. Computation time was ∼0.16 seconds per patient with an AP or RT image, and ∼0.33 seconds per patient with both AP and RT images. Conclusion: The proposed method of treatment site recognition is efficient and accurate. It is not sensitive to differences in image intensity, size, or patient position in the portal images. It could be useful for patient safety assurance. The work was partially supported by a research grant from Varian Medical System.
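
    The recognition chain named in the abstract (flatten images into vectors, PCA for dimensionality reduction, multi-class RBF-kernel SVM) maps directly onto scikit-learn, which the authors mention using. The sketch below runs the same chain on synthetic stand-in data (six artificial "sites", random images), so the dimensions and the resulting accuracy are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in for the down-sampled 128x96 portal images flattened into
# vectors (the appearance-based vector space model): six synthetic
# "treatment sites" with 20 samples each.
n_sites, per_site, dim = 6, 20, 128 * 96
centers = rng.normal(scale=2.0, size=(n_sites, dim))
X = np.vstack([c + rng.normal(size=(per_site, dim)) for c in centers])
y = np.repeat(np.arange(n_sites), per_site)

# PCA for dimensionality reduction, then a multi-class SVM with an RBF
# kernel, mirroring the components named in the abstract.
clf = make_pipeline(PCA(n_components=20), SVC(kernel="rbf", gamma="scale"))
scores = cross_val_score(clf, X, y, cv=5)
```

Because the synthetic site clusters are well separated, cross-validation accuracy comes out near 100%; on real portal images the reported figures were 94-98%.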

  19. An image-based automatic recognition method for the flowering stage of maize

    Science.gov (United States)

    Yu, Zhenghong; Zhou, Huabing; Li, Cuina

    2018-03-01

    In this paper, we propose an image-based approach for automatically recognizing the flowering stage of maize. A modified HOG/SVM detection framework is first adopted to detect the ears of maize. Then, we use low-rank matrix recovery technology to precisely extract the ears at the pixel level. Finally, a new feature called the color gradient histogram is proposed as an indicator to determine the flowering stage. A comparative experiment was carried out to verify the validity of our method, and the results indicate that our method can meet the demands of practical observation.
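
    As a rough illustration of what a "color gradient histogram" feature could look like, the sketch below pools per-channel gradient magnitudes into normalized histograms and concatenates them. This is one plausible reading of the feature name; the paper's exact formulation may differ.

```python
import numpy as np

def color_gradient_histogram(img, bins=16):
    # For each color channel: gradient magnitude per pixel, binned into
    # a normalized histogram; the three histograms are concatenated.
    hists = []
    for c in range(img.shape[2]):
        gy, gx = np.gradient(img[:, :, c].astype(float))
        mag = np.hypot(gx, gy)
        h, _ = np.histogram(mag, bins=bins, range=(0, 255))
        hists.append(h / max(h.sum(), 1))
    return np.concatenate(hists)

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(64, 64, 3))  # stand-in RGB ear image
feat = color_gradient_histogram(img)
```

The resulting 48-dimensional vector (16 bins x 3 channels) could then serve as the indicator fed to a stage classifier.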

  20. Application of pattern recognition techniques to random signals for automatic monitoring of dynamic systems with emphasis on nuclear reactors

    International Nuclear Information System (INIS)

    Nascimento, J.A. do.

    1981-01-01

    The time-varying or noise component of dynamic system parameters contains information on the system state. Pattern recognition analysis of noise signals from such systems is a powerful technique for assessing 'system normality' or 'correct operation'. Data analysis with modern small computers enables the otherwise unmanageable volumes of data to be processed on line and the results presented in a meaningful form. This information provides the necessary data for maintaining the system at optimum operating conditions. An automatic pattern recognition program, PSDREC, developed for the surveillance of nuclear reactors and rotating machinery, is described, and the relevant theory is outlined. This program, which applies 8 statistical tests to calculated power spectral density (PSD) distributions, was earlier installed on a PDP-11/45 computer at IPEN. In this work it has been used to separately analyse recorded signals from three systems, namely an operational BWR power reactor (neutron signals), a water pump and a diesel motor (vibration signals). The latter two were, respectively, operated over a wide range of flow and load conditions. The statistical tests were applied to frequency bands of (0.1-40) Hz, (0-1000) Hz and (0-20000) Hz for the BWR, pump and diesel signal data, respectively. Operation and analysis conditions are given together with representative graphs of the analysed PSD distributions. Results of the tests, discussed in some detail, are considered to be satisfactory. (Author) [pt
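
    One of the simplest statistical tests that can be applied to a PSD in this surveillance setting is integrated band power compared against a baseline. The sketch below uses Welch's method on synthetic vibration-like signals; the signals, the band, and the threshold are illustrative assumptions, not PSDREC's actual tests.

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
fs = 2000.0
t = np.arange(0, 4.0, 1 / fs)

# A baseline vibration-like signal, and a monitored signal with a new
# resonance at 120 Hz standing in for a developing fault.
baseline = np.sin(2 * np.pi * 50 * t) + 0.5 * rng.normal(size=t.size)
faulty = baseline + 0.8 * np.sin(2 * np.pi * 120 * t)

def band_power(sig, fs, lo, hi):
    # Integrated PSD over [lo, hi] Hz: one simple surveillance statistic
    # in the spirit of the tests applied to PSD distributions.
    f, psd = welch(sig, fs=fs, nperseg=1024)
    mask = (f >= lo) & (f <= hi)
    return psd[mask].sum() * (f[1] - f[0])

ref = band_power(baseline, fs, 100, 140)
cur = band_power(faulty, fs, 100, 140)
anomalous = cur > 3 * ref  # crude 'normality' threshold vs. the baseline
```

In an online monitor, `ref` would come from a library of PSDs recorded during known-normal operation, and the threshold would be tuned per frequency band.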

  1. Role of Speaker Cues in Attention Inference

    OpenAIRE

    Jin Joo Lee; Cynthia Breazeal; David DeSteno

    2017-01-01

    Current state-of-the-art approaches to emotion recognition primarily focus on modeling the nonverbal expressions of the sole individual without reference to contextual elements such as the co-presence of the partner. In this paper, we demonstrate that the accurate inference of listeners’ social-emotional state of attention depends on accounting for the nonverbal behaviors of their storytelling partner, namely their speaker cues. To gain a deeper understanding of the role of speaker cues in at...

  2. Automatic recognition of coronal type II radio bursts: The ARBIS 2 method and first observations

    Science.gov (United States)

    Lobzin, Vasili; Cairns, Iver; Robinson, Peter; Steward, Graham; Patterson, Garth

    Major space weather events such as solar flares and coronal mass ejections are usually accompanied by solar radio bursts, which can potentially be used for real-time space weather forecasts. Type II radio bursts are produced near the local plasma frequency and its harmonic by fast electrons accelerated by a shock wave moving through the corona and solar wind with a typical speed of 1000 km s-1. The coronal bursts have dynamic spectra with frequency gradually falling with time and durations of several minutes. We present a new method developed to detect type II coronal radio bursts automatically and describe its implementation in an extended Automated Radio Burst Identification System (ARBIS 2). Preliminary tests of the method with spectra obtained in 2002 show that the performance of the current implementation is quite high, ~80%, while the probability of false positives is reasonably low, with one false positive per 100-200 hr for high solar activity and less than one false event per 10000 hr for low solar activity periods. The first automatically detected coronal type II radio bursts are also presented. ARBIS 2 is now operational with IPS Radio and Space Services, providing email alerts and event lists internationally.

  3. Automatic Fault Recognition of Photovoltaic Modules Based on Statistical Analysis of Uav Thermography

    Science.gov (United States)

    Kim, D.; Youn, J.; Kim, C.

    2017-08-01

    As a malfunctioning PV (Photovoltaic) cell has a higher temperature than adjacent normal cells, we can detect it easily with a thermal infrared sensor. However, inspecting large-scale PV power plants with a hand-held thermal infrared sensor is time-consuming. This paper presents an algorithm for automatically detecting defective PV panels using images captured with a thermal imaging camera from a UAV (unmanned aerial vehicle). The proposed algorithm uses statistical analysis of the thermal intensity (surface temperature) characteristics of each PV module, with the mean intensity and standard deviation of each panel serving as parameters for fault diagnosis. One of the characteristics of thermal infrared imaging is that the larger the distance between sensor and target, the lower the measured temperature of the object. Consequently, a global detection rule using the mean intensity of all panels in the fault detection algorithm is not applicable. Therefore, a local detection rule based on the mean intensity and standard deviation range was developed to detect defective PV modules within each individual array automatically. The performance of the proposed algorithm was tested on three sample images; this verified a detection accuracy for defective panels of 97% or higher. In addition, as the proposed algorithm can adjust the range of threshold values for judging malfunction at the array level, the local detection rule is considered better suited for highly sensitive fault detection than a global detection rule.

  4. AUTOMATIC FAULT RECOGNITION OF PHOTOVOLTAIC MODULES BASED ON STATISTICAL ANALYSIS OF UAV THERMOGRAPHY

    Directory of Open Access Journals (Sweden)

    D. Kim

    2017-08-01

    Full Text Available As a malfunctioning PV (Photovoltaic) cell has a higher temperature than adjacent normal cells, we can detect it easily with a thermal infrared sensor. However, inspecting large-scale PV power plants with a hand-held thermal infrared sensor is time-consuming. This paper presents an algorithm for automatically detecting defective PV panels using images captured with a thermal imaging camera from a UAV (unmanned aerial vehicle). The proposed algorithm uses statistical analysis of the thermal intensity (surface temperature) characteristics of each PV module, with the mean intensity and standard deviation of each panel serving as parameters for fault diagnosis. One of the characteristics of thermal infrared imaging is that the larger the distance between sensor and target, the lower the measured temperature of the object. Consequently, a global detection rule using the mean intensity of all panels in the fault detection algorithm is not applicable. Therefore, a local detection rule based on the mean intensity and standard deviation range was developed to detect defective PV modules within each individual array automatically. The performance of the proposed algorithm was tested on three sample images; this verified a detection accuracy for defective panels of 97% or higher. In addition, as the proposed algorithm can adjust the range of threshold values for judging malfunction at the array level, the local detection rule is considered better suited for highly sensitive fault detection than a global detection rule.
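
    The local detection rule described above, flagging modules whose mean intensity deviates too far from their own array's statistics rather than from a plant-wide mean, can be sketched in a few lines of numpy. The array size, temperatures, and the 3-sigma threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean thermal intensities for one PV array of 24 modules;
# module 7 runs several degrees hotter, mimicking a defective cell.
panel_means = rng.normal(loc=35.0, scale=0.5, size=24)
panel_means[7] += 6.0

def local_fault_rule(means, k=3.0):
    # Local rule: compare each module against the mean and standard
    # deviation of ITS OWN array, so sensor-to-target distance offsets
    # that shift a whole array do not trigger false alarms.
    mu, sigma = means.mean(), means.std()
    return np.flatnonzero(np.abs(means - mu) > k * sigma)

faults = local_fault_rule(panel_means)
```

A global rule applied across arrays imaged at different distances would mix those distance-dependent temperature offsets into one threshold, which is exactly what the per-array statistics avoid.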

  5. On Automatic Music Genre Recognition by Sparse Representation Classification using Auditory Temporal Modulations

    DEFF Research Database (Denmark)

    Sturm, Bob L.; Noorzad, Pardis

    2012-01-01

    A recent system combining sparse representation classification (SRC) and a perceptually-based acoustic feature (ATM) \cite{Panagakis2009,Panagakis2009b,Panagakis2010c} outperforms by a significant margin the state of the art in music genre recognition, e.g., \cite{Bergstra2006}. With genre so… to reproduce the results of \cite{Panagakis2009}. First, we find that classification results are consistent for features extracted from different analyses. Second, we find that SRC accuracy improves when we pose the sparse representation problem with inequality constraints. Finally, we find that only when we…

  6. Optical implementation of a feature-based neural network with application to automatic target recognition

    Science.gov (United States)

    Chao, Tien-Hsin; Stoner, William W.

    1993-01-01

    An optical neural network based on the neocognitron paradigm is introduced. A novel aspect of the architecture design is shift-invariant multichannel Fourier optical correlation within each processing layer. Multilayer processing is achieved by iteratively feeding back the output of the feature correlator to the input spatial light modulator and by updating the Fourier filters. By training the neural net with characteristic features extracted from the target images, successful pattern recognition with intraclass fault tolerance and interclass discrimination is achieved. A detailed system description is provided. An experimental demonstration of a two-layer neural network for space-object discrimination is also presented.

  7. Automatic target recognition using a feature-based optical neural network

    Science.gov (United States)

    Chao, Tien-Hsin

    1992-01-01

    An optical neural network based upon the Neocognitron paradigm (K. Fukushima et al. 1983) is introduced. A novel aspect of the architectural design is shift-invariant multichannel Fourier optical correlation within each processing layer. Multilayer processing is achieved by iteratively feeding back the output of the feature correlator to the input spatial light modulator and updating the Fourier filters. By training the neural net with characteristic features extracted from the target images, successful pattern recognition with intra-class fault tolerance and inter-class discrimination is achieved. A detailed system description is provided. Experimental demonstration of a two-layer neural network for space objects discrimination is also presented.

  8. The Use of an Autonomous Pedagogical Agent and Automatic Speech Recognition for Teaching Sight Words to Students with Autism Spectrum Disorder

    Science.gov (United States)

    Saadatzi, Mohammad Nasser; Pennington, Robert C.; Welch, Karla C.; Graham, James H.; Scott, Renee E.

    2017-01-01

    In the current study, we examined the effects of an instructional package comprised of an autonomous pedagogical agent, automatic speech recognition, and constant time delay during the instruction of reading sight words aloud to young adults with autism spectrum disorder. We used a concurrent multiple baseline across participants design to…

  9. A robust automatic leukocyte recognition method based on island-clustering texture

    Directory of Open Access Journals (Sweden)

    Xiaoshun Li

    2016-01-01

    Full Text Available A leukocyte recognition method for human peripheral blood smears based on island-clustering texture (ICT) is proposed. By analyzing the features of the five typical classes of leukocyte images, a new ICT model is established. First, feature points are extracted from a gray leukocyte image by mean-shift clustering to serve as the centers of islands. Second, region growing is employed to create the island regions, using these feature points as seeds. The distribution of these islands describes a new texture. Finally, a distinguishing parameter vector is created by combining the ICT features of these islands with the geometric features of the leukocyte. The five typical classes of leukocytes can then be recognized at a correct recognition rate of more than 92.3% on a total sample of 1310 leukocytes. Experimental results show the feasibility of the proposed method. Further analysis reveals that the method is robust and the results can provide important information for disease diagnosis.

  10. A Novel Approach to Speaker Weight Estimation Using a Fusion of the i-vector and NFA Frameworks

    DEFF Research Database (Denmark)

    Poorjam, Amir Hossein; Bahari, Mohamad Hasan; Van hamme, Hogo

    2017-01-01

    This paper proposes a novel approach for automatic speaker weight estimation from spontaneous telephone speech signals. In this method, each utterance is modeled using the i-vector framework, which is based on factor analysis on Gaussian Mixture Model (GMM) mean supervectors, and the Non-negative Factor Analysis (NFA) framework, which is based on a constrained factor analysis on GMM weight supervectors. Then, the available information in both Gaussian means and Gaussian weights is exploited through a feature-level fusion of the i-vectors and the NFA vectors. Finally, a least-squares support vector regression is employed to estimate the weight of speakers from the given utterances. The proposed approach is evaluated on spontaneous telephone speech signals of the National Institute of Standards and Technology 2008 and 2010 Speaker Recognition Evaluation corpora. To investigate the effectiveness…
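
The fusion-and-regression pipeline described in this abstract can be sketched roughly as follows. The dimensions, random data, and the closed-form ridge solution (standing in for least-squares support vector regression with a linear kernel) are all illustrative assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two representations (dimensions are illustrative):
# 400-dim i-vectors (from GMM mean supervectors) and 100-dim NFA vectors
# (from GMM weight supervectors), one row per utterance.
ivectors = rng.normal(size=(200, 400))
nfa_vectors = rng.normal(size=(200, 100))
weights_kg = 70 + 15 * rng.normal(size=200)  # hypothetical speaker weights

# Feature-level fusion: simple concatenation of the two vectors.
fused = np.hstack([ivectors, nfa_vectors])

# Regularized least-squares regression as a stand-in for LS-SVR with a
# linear kernel (closed-form ridge solution with a bias column).
lam = 1.0
X = np.hstack([fused, np.ones((fused.shape[0], 1))])
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ weights_kg)

predicted = X @ w
print(predicted.shape)  # one weight estimate per utterance
```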

  11. Automatic feature design for optical character recognition using an evolutionary search procedure.

    Science.gov (United States)

    Stentiford, F W

    1985-03-01

    An automatic evolutionary search is applied to the problem of feature extraction in an OCR application. A performance measure based on feature independence is used to generate features which do not appear to suffer from peaking effects [17]. Features are extracted from a training set of 30,600 machine-printed alphanumeric characters from 34 classes, derived from British mail. Classification results on the training set and on a test set of 10,200 characters are reported for an increasing number of features. A 1.01 percent forced-decision error rate is obtained on the test data using 316 features. The hardware implementation should be cheap and fast to operate, and the performance compares favorably with current low-cost OCR page readers.
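
An evolutionary search over candidate features of the kind this abstract describes can be illustrated with a toy genetic algorithm over feature-subset bit-masks. The Fisher-like fitness function, mutation rate, and synthetic data below are assumptions for illustration; the paper's actual performance measure is based on feature independence.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy data: 20 candidate features, only features 0-4 carry class signal.
n, d = 200, 20
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, :5] += y[:, None] * 3.0  # informative features

def fitness(mask):
    """Class-separation (Fisher-like) score of the selected subset."""
    if mask.sum() == 0:
        return -np.inf
    Xs = X[:, mask.astype(bool)]
    mu0, mu1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    return float(np.linalg.norm(mu1 - mu0) / np.sqrt(mask.sum()))

# Evolutionary search: mutate a population of bit-masks, keep the fittest.
pop = rng.integers(0, 2, size=(30, d))
for _ in range(60):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]             # selection (elitist)
    children = parents[rng.integers(0, 10, size=20)].copy()
    flips = rng.random(children.shape) < 0.05           # mutation
    children[flips] = 1 - children[flips]
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print(best[:5].sum(), best[5:].sum())  # informative features should dominate
```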

  12. VISUAL PERCEPTION BASED AUTOMATIC RECOGNITION OF CELL MOSAICS IN HUMAN CORNEAL ENDOTHELIUM MICROSCOPY IMAGES

    Directory of Open Access Journals (Sweden)

    Yann Gavet

    2011-05-01

    The human corneal endothelium can be observed with two types of microscopes: a classical optical microscope for ex-vivo imaging, and a specular optical microscope for in-vivo imaging. The quality of the cornea is correlated with endothelial cell density and morphometry. Automatic methods to analyze human corneal endothelium images are still not fully efficient: image analysis methods that focus only on cell contours do not give good results in the presence of noise and poor acquisition conditions. More elaborate methods introduce regional information in order to perform cell contour completion, thus implementing the contour-region duality. Their good performance can be explained by their connections with several basic principles of human visual perception (Gestalt theory and Marr's computational theory).

  13. NEUROIMAGING AND PATTERN RECOGNITION TECHNIQUES FOR AUTOMATIC DETECTION OF ALZHEIMER’S DISEASE: A REVIEW

    Directory of Open Access Journals (Sweden)

    Rupali Kamathe

    2017-08-01

    Alzheimer's disease (AD) is the most common form of dementia, and there is currently no firm treatment that can stop or reverse disease progression. A combination of brain imaging and clinical tests for checking signs of memory impairment is used to identify patients with AD. In recent years, neuroimaging techniques combined with machine learning algorithms have received a lot of attention in this field. There is a need to develop automated techniques that detect the disease well before the patient suffers irreversible loss. This paper reviews such semi- and fully automatic techniques, with a detailed comparison of the methods implemented, the class labels considered, the databases used, and the results obtained in related studies. The review compares different neuroimaging techniques in detail and reveals the potential of machine learning algorithms in medical image analysis, particularly in AD, enabling even early detection of the disease in the class labelled as Mild Cognitive Impairment (MCI).

  14. Towards automatic recognition of irregular, short-open answers in Fill-in-the-blank tests

    Directory of Open Access Journals (Sweden)

    Sergio A. Rojas

    2014-01-01

    Assessment of student knowledge in Learning Management Systems such as Moodle is mostly conducted using closed-ended questions (e.g., multiple choice), whose answers are straightforward to grade without human intervention. Fill-in-the-blank tests are usually more challenging, since they require test-takers to recall concepts and associations not available in the statement of the question itself (no choices or hints are given). Automatic assessment of the latter currently requires the test-taker to give a verbatim answer, that is, one free of spelling or typographical mistakes. In this paper, we consider an adapted version of a classical text-matching algorithm that may prevent wrong grading in automatic assessment of fill-in-the-blank questions whenever irregular (similar but not exact) answers occur due to such errors. The technique was tested in two scenarios. In the first, misspelled single-word answers to an Internet-security questionnaire were correctly recognized within a two-letter editing tolerance (achieving 99% accuracy). The second scenario involved short open answers to computer-programming quizzes (i.e., small blocks of code whose structure must conform to the syntactic rules of the programming language). Twenty-one real-world answers written by students taking a computer programming course were assessed by the method, which tolerates imprecision in programmer-style artifacts (such as unfamiliar variable or function nomenclature) and admits up to 20% letter-level typos. The scores were satisfactorily corroborated by a human expert. Additional findings and potential enhancements to the technique are also discussed.
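
The core of such tolerant matching is a classical edit (Levenshtein) distance with an acceptance threshold. A minimal sketch, with hypothetical answers; the two-letter tolerance mirrors the first scenario described above.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def accept_answer(given, expected, tolerance=2):
    """Accept an irregular (misspelled) answer within an edit tolerance."""
    return edit_distance(given.lower(), expected.lower()) <= tolerance

print(accept_answer("fishing", "phishing"))   # True: two edits away
print(accept_answer("firewall", "phishing"))  # False: far beyond tolerance
```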

  15. The development of an automatic recognition system for earmark and earprint comparisons.

    Science.gov (United States)

    Junod, Stéphane; Pasquier, Julien; Champod, Christophe

    2012-10-10

    The value of earmarks as an efficient means of personal identification is still subject to debate. It has been argued that the field lacks a firm, systematic and structured data basis to help practitioners form their conclusions. Typically, there is a paucity of research guidance regarding the selectivity of the features used in the comparison process between an earmark and reference earprints taken from an individual. This study proposes a system for the automatic comparison of earprints and earmarks, operating without any manual extraction of key points or manual annotations. For each donor, a model is created using multiple reference prints, hence capturing the donor's within-source variability. For each comparison between a mark and a model, images are automatically aligned and a proximity score, based on a normalized 2D correlation coefficient, is calculated. Appropriate use of this score allows deriving a likelihood ratio that can be explored under known states of affairs (both in cases where it is known that the mark was left by the donor that gave the model, and conversely in cases where it is established that the mark originates from a different source). To assess the system performance, a first dataset containing 1229 donors, compiled during the FearID research project, was used. Based on these data, for mark-to-print comparisons the system performed with an equal error rate (EER) of 2.3%, and about 88% of marks were found in the first 3 positions of a hit list. For print-to-print transactions, results show an equal error rate of 0.5%. The system was then tested using real-case data obtained from police forces. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
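
The proximity score (normalized 2D correlation coefficient) and the equal error rate reported in this record can both be sketched compactly. The toy numpy data are made up, the correlation assumes the images are already aligned, and the EER routine is a crude threshold sweep rather than the study's evaluation code.

```python
import numpy as np

def norm_xcorr2(a, b):
    """Normalized 2D correlation coefficient between two aligned images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom)

def equal_error_rate(genuine, impostor):
    """Crude EER: sweep thresholds and find where the false-accept rate
    and false-reject rate meet (approximated at score grid points)."""
    best = 1.0
    for th in sorted(set(genuine) | set(impostor)):
        far = sum(s >= th for s in impostor) / len(impostor)
        frr = sum(s < th for s in genuine) / len(genuine)
        best = min(best, max(far, frr))
    return best

rng = np.random.default_rng(1)
mark = rng.normal(size=(64, 64))
same = mark + 0.1 * rng.normal(size=(64, 64))   # same source, slight noise
other = rng.normal(size=(64, 64))               # different source

print(norm_xcorr2(mark, same) > norm_xcorr2(mark, other))   # True
print(equal_error_rate([0.9, 0.8], [0.1, 0.2]))             # 0.0 (separable)
```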

  16. Working with Speakers.

    Science.gov (United States)

    Pestel, Ann

    1989-01-01

    The author discusses working with speakers from business and industry to present career information at the secondary level. Advice for speakers is presented, as well as tips for program coordinators. (CH)

  17. Using pattern recognition to automatically localize reflection hyperbolas in data from ground penetrating radar

    Science.gov (United States)

    Maas, Christian; Schmalzl, Jörg

    2013-08-01

    Ground Penetrating Radar (GPR) is used for the localization of supply lines, land mines, pipes and many other buried objects. These objects can be recognized in the recorded data as reflection hyperbolas with a typical shape depending on the depth and material of the object and on the surrounding material. To obtain these parameters, the shape of the hyperbola has to be fitted. In recent years several methods have been developed to automate this task during post-processing. In this paper we show another approach to the automated localization of reflection hyperbolas in GPR data, by solving a pattern recognition problem in grayscale images. In contrast to other methods, our detection program is also able to immediately mark potential objects in real time. For this task we use a version of the Viola-Jones learning algorithm, which is part of the open-source library "OpenCV". This algorithm was initially developed for face recognition but can be adapted to any other simple shape. In our program it is used to narrow down the location of reflection hyperbolas to certain areas in the GPR data. In order to extract the exact location and velocity of the hyperbolas, we apply a simple Hough transform for hyperbolas. Because the Viola-Jones algorithm dramatically reduces the input to the computationally expensive Hough transform, the detection system can also be implemented on ordinary field computers, so on-site application is possible. The developed detection system shows promising results and detection rates in unprocessed radargrams. To improve the detection results and apply the program to noisy radar images, more data from different GPR systems are needed as input for the learning algorithm.
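
The Hough transform for hyperbolas mentioned above can be illustrated with a tiny accumulator over candidate apex positions, assuming the usual two-way traveltime model for a point reflector in a homogeneous medium. All values here are synthetic, and a full implementation would also vote over velocity.

```python
import numpy as np

# Reflection hyperbola model (two-way traveltime, homogeneous medium):
#   t(x) = sqrt(t0**2 + (2 * (x - x0) / v)**2)
# Candidate apex positions (x0, t0) accumulate votes from picked points.

def hough_hyperbola(points, x0_grid, t0_grid, v, tol=0.05):
    acc = np.zeros((len(x0_grid), len(t0_grid)))
    for (x, t) in points:
        for i, x0 in enumerate(x0_grid):
            for j, t0 in enumerate(t0_grid):
                t_model = np.sqrt(t0 ** 2 + (2 * (x - x0) / v) ** 2)
                if abs(t_model - t) < tol:
                    acc[i, j] += 1
    return acc

# Synthetic picks from a hyperbola with apex x0=5 m, t0=20 ns, v=0.1 m/ns.
v_true, x0_true, t0_true = 0.1, 5.0, 20.0
xs = np.linspace(3, 7, 21)
pts = [(x, np.sqrt(t0_true ** 2 + (2 * (x - x0_true) / v_true) ** 2))
       for x in xs]

x0_grid = np.linspace(4, 6, 21)
t0_grid = np.linspace(18, 22, 21)
acc = hough_hyperbola(pts, x0_grid, t0_grid, v_true)
i, j = np.unravel_index(acc.argmax(), acc.shape)
print(x0_grid[i], t0_grid[j])  # accumulator peak at the true apex
```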

  18. Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: an annotation and machine learning study.

    Science.gov (United States)

    Skeppstedt, Maria; Kvist, Maria; Nilsson, Gunnar H; Dalianis, Hercules

    2014-06-01

    Automatic recognition of clinical entities in the narrative text of health records is useful for constructing applications for documentation of patient care, as well as for secondary usage in the form of medical knowledge extraction. There are a number of named entity recognition studies on English clinical text, but less work has been carried out on clinical text in other languages. This study was performed on Swedish health records, and focused on four entities that are highly relevant for constructing a patient overview and for medical hypothesis generation, namely the entities: Disorder, Finding, Pharmaceutical Drug and Body Structure. The study had two aims: to explore how well named entity recognition methods previously applied to English clinical text perform on similar texts written in Swedish; and to evaluate whether it is meaningful to divide the more general category Medical Problem, which has been used in a number of previous studies, into the two more granular entities, Disorder and Finding. Clinical notes from a Swedish internal medicine emergency unit were annotated for the four selected entity categories, and the inter-annotator agreement between two pairs of annotators was measured, resulting in an average F-score of 0.79 for Disorder, 0.66 for Finding, 0.90 for Pharmaceutical Drug and 0.80 for Body Structure. A subset of the developed corpus was thereafter used for finding suitable features for training a conditional random fields model. Finally, a new model was trained on this subset, using the best features and settings, and its ability to generalise to held-out data was evaluated. This final model obtained an F-score of 0.81 for Disorder, 0.69 for Finding, 0.88 for Pharmaceutical Drug, 0.85 for Body Structure and 0.78 for the combined category Disorder+Finding. The obtained results, which are in line with or slightly lower than those for similar studies on English clinical text (many of them conducted using a larger training data set), show that…
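
Feature engineering for a conditional random fields model of this kind typically turns each token into a dictionary of lexical and contextual features. A minimal sketch; the feature set and example sentence are illustrative, not the study's actual features.

```python
def token_features(tokens, i):
    """Feature dict for token i, in the style commonly fed to a CRF
    (word identity, shape, affixes, and immediate context)."""
    w = tokens[i]
    feats = {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),
        "word.isdigit": w.isdigit(),
        "prefix3": w[:3],
        "suffix3": w[-3:],
    }
    feats["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next.lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats

# Hypothetical clinical sentence; one feature dict per token would be
# paired with BIO labels (e.g. B-Pharmaceutical) to train the CRF.
sent = ["Patient", "given", "paracetamol", "for", "fever"]
print(token_features(sent, 2)["prev.lower"])  # 'given'
```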

  19. Integration of low level and ontology derived features for automatic weapon recognition and identification

    Science.gov (United States)

    Sirakov, Nikolay M.; Suh, Sang; Attardo, Salvatore

    2011-06-01

    This paper presents a further step in research toward the development of a quick and accurate weapon identification methodology and system. A basic stage of this methodology is the automatic acquisition and updating of a weapons ontology as a source for deriving high-level weapons information. The present paper outlines the main ideas used to approach this goal. In the next stage, a clustering approach is suggested on the basis of a hierarchy of concepts. An inherent slot of every node of the proposed ontology is a low-level feature vector (LLFV), which facilitates search through the ontology. Part of the LLFV is information about the object's parts. To partition an object, a new approach is presented that is capable of defining the object's concavities, used to mark the end points of weapon parts, which are considered convexities. Further, an existing matching approach is optimized to determine whether an ontological object matches the objects in an input image. Objects from derived ontological clusters are considered for the matching process. Image resizing is studied and applied to decrease the runtime of the matching approach and to investigate its rotational and scaling invariance. A set of experiments is performed to validate the theoretical concepts.

  20. FRankenstein becomes a cyborg: the automatic recombination and realignment of fold recognition models in CASP6.

    Science.gov (United States)

    Kosinski, Jan; Gajda, Michal J; Cymerman, Iwona A; Kurowski, Michal A; Pawlowski, Marcin; Boniecki, Michal; Obarska, Agnieszka; Papaj, Grzegorz; Sroczynska-Obuchowicz, Paulina; Tkaczuk, Karolina L; Sniezynska, Paulina; Sasin, Joanna M; Augustyn, Anna; Bujnicki, Janusz M; Feder, Marcin

    2005-01-01

    In the course of CASP6, we generated models for all targets using a new version of the "FRankenstein's monster approach." Previously (in CASP5), we were able to build many very accurate full-atom models by selection and recombination of well-folded fragments obtained from crude fold recognition (FR) results, followed by optimization of the sequence-structure fit and assessment of alternative alignments on the structural level. This procedure was, however, very arduous, as most of the steps required extensive visual and manual input from the human modeler. Now, we have automated the most tedious steps, such as superposition of alternative models, extraction of best-scoring fragments, and construction of a hybrid "monster" structure, as well as generation of alternative alignments in the regions that remain poorly scored in the refined hybrid model. We have also included the ROSETTA method to construct those parts of the target for which no reasonable structures were generated by FR methods (such as long insertions and terminal extensions). The analysis of successes and failures of the current version of the FRankenstein approach in modeling of CASP6 targets reveals that the considerably streamlined and automated method performs almost as well as the initial, mostly manual version, which suggests that it may be a useful tool for accurate protein structure prediction even in the hands of nonexperts. 2005 Wiley-Liss, Inc.

  1. Fraudulent ID using face morphs: Experiments on human and automatic recognition.

    Science.gov (United States)

    Robertson, David J; Kramer, Robin S S; Burton, A Mike

    2017-01-01

    Matching unfamiliar faces is known to be difficult, and this can give an opportunity to those engaged in identity fraud. Here we examine a relatively new form of fraud: the use of photo-ID containing a graphical morph between two faces. Such a document may look sufficiently like two people to serve as ID for both. We present two experiments with human viewers, and a third with a smartphone face recognition system. In Experiment 1, viewers were asked to match pairs of faces without being warned that one of the pair could be a morph; they very commonly accepted a morphed face as a match. However, in Experiment 2, following very short training on morph detection, their acceptance rate fell considerably. Nevertheless, there remained large individual differences in people's ability to detect a morph. In Experiment 3, we show that a smartphone makes errors at a similar rate to 'trained' human viewers, i.e., accepting a small number of morphs as genuine ID. We discuss these results in reference to the use of face photos for security.
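
A face morph of the kind studied here is, at its simplest, a weighted cross-dissolve between two aligned photographs. Real morphing tools also warp facial landmarks before blending; this numpy sketch with synthetic "images" shows only the blending step.

```python
import numpy as np

def morph(face_a, face_b, alpha=0.5):
    """Pixel-wise blend of two pre-aligned face images.
    alpha=0.5 mixes both identities equally."""
    return (alpha * face_a + (1 - alpha) * face_b).astype(face_a.dtype)

# Synthetic 4x4 "faces": one uniformly darker, one uniformly brighter.
a = np.full((4, 4), 100.0)
b = np.full((4, 4), 200.0)

m = morph(a, b, 0.5)
print(m[0, 0])  # 150.0: halfway between the two source intensities
```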

  2. Recognition

    DEFF Research Database (Denmark)

    Gimmler, Antje

    2017-01-01

    In this article, I shall examine the cognitive, heuristic and theoretical functions of the concept of recognition. To evaluate both the explanatory power and the limitations of a sociological concept, the theory construction must be analysed and its actual productivity for sociological theory must…

  3. Clinical pilot study for the automatic segmentation and recognition of abdominal adipose tissue compartments from MRI data

    International Nuclear Information System (INIS)

    Noel, P.B.; Bauer, J.S.; Ganter, C.; Markus, C.; Rummeny, E.J.; Engels, H.P.; Hauner, H.

    2012-01-01

    Purpose: In the diagnosis and risk assessment of obesity, both the amount and distribution of adipose tissue compartments are critical factors. We present a hybrid method for the quantitative measurement of human body fat compartments. Materials and Methods: MR imaging was performed on a 1.5 T scanner. In a pre-processing step, the images were corrected for bias field inhomogeneity. For segmentation and recognition, a hybrid algorithm was developed to automatically differentiate between adipose tissue compartments. The presented algorithm combines shape- and intensity-based techniques. To incorporate the algorithm into the clinical routine, we developed a graphical user interface. Results from our methods were compared with the known volume of an adipose tissue phantom. To evaluate our method, we analyzed 40 clinical MRI scans of the abdominal region. Results: Relatively low segmentation errors were found for subcutaneous adipose tissue (3.56 %) and visceral adipose tissue (0.29 %) in phantom studies. The clinical results indicated high correlations between the distribution of adipose tissue compartments and obesity. Conclusion: We present an approach that rapidly identifies and quantifies adipose tissue depots of interest. With this method, examination and analysis can be performed in a clinically feasible timeframe. (orig.)
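
The intensity-based part of such a segmentation can be caricatured as thresholding a bias-corrected image and counting voxels. All values here, the threshold, voxel size, and synthetic slice, are assumptions for illustration, not the authors' hybrid algorithm.

```python
import numpy as np

# Hypothetical bias-corrected T1-weighted slice: adipose tissue is bright.
rng = np.random.default_rng(2)
slice_ = rng.uniform(0.0, 0.4, size=(128, 128))              # lean tissue
slice_[30:60, 30:90] = rng.uniform(0.7, 1.0, size=(30, 60))  # fat depot

fat_mask = slice_ > 0.55      # intensity threshold (assumed value)
voxel_volume_ml = 0.004       # hypothetical voxel size in millilitres
fat_volume_ml = fat_mask.sum() * voxel_volume_ml

print(fat_mask.sum())  # 1800 voxels: exactly the synthetic depot
```

A shape-based step (e.g. separating subcutaneous from visceral fat along the abdominal wall contour) would then partition this mask into compartments.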

  4. Image processing and pattern recognition with CVIPtools MATLAB toolbox: automatic creation of masks for veterinary thermographic images

    Science.gov (United States)

    Mishra, Deependra K.; Umbaugh, Scott E.; Lama, Norsang; Dahal, Rohini; Marino, Dominic J.; Sackman, Joseph

    2016-09-01

    CVIPtools is a software package for the exploration of computer vision and image processing developed in the Computer Vision and Image Processing Laboratory at Southern Illinois University Edwardsville. CVIPtools is available in three variants - a) CVIPtools Graphical User Interface, b) CVIPtools C library and c) CVIPtools MATLAB toolbox, which makes it accessible to a variety of different users. It offers students, faculty, researchers and any user a free and easy way to explore computer vision and image processing techniques. Many functions have been implemented and are updated on a regular basis, the library has reached a level of sophistication that makes it suitable for both educational and research purposes. In this paper, the detail list of the functions available in the CVIPtools MATLAB toolbox are presented and how these functions can be used in image analysis and computer vision applications. The CVIPtools MATLAB toolbox allows the user to gain practical experience to better understand underlying theoretical problems in image processing and pattern recognition. As an example application, the algorithm for the automatic creation of masks for veterinary thermographic images is presented.

  5. Clinical pilot study for the automatic segmentation and recognition of abdominal adipose tissue compartments from MRI data

    Energy Technology Data Exchange (ETDEWEB)

    Noel, P.B.; Bauer, J.S.; Ganter, C.; Markus, C.; Rummeny, E.J.; Engels, H.P. [Klinikum rechts der Isar, Technische Univ. Muenchen (Germany). Inst. fuer Radiologie; Hauner, H. [Klinikum rechts der Isar, Technische Univ. Muenchen (Germany). Else Kroener-Fresenius-Center for Nutritional Medicine

    2012-06-15

    Purpose: In the diagnosis and risk assessment of obesity, both the amount and distribution of adipose tissue compartments are critical factors. We present a hybrid method for the quantitative measurement of human body fat compartments. Materials and Methods: MR imaging was performed on a 1.5 T scanner. In a pre-processing step, the images were corrected for bias field inhomogeneity. For segmentation and recognition, a hybrid algorithm was developed to automatically differentiate between adipose tissue compartments. The presented algorithm combines shape- and intensity-based techniques. To incorporate the algorithm into the clinical routine, we developed a graphical user interface. Results from our methods were compared with the known volume of an adipose tissue phantom. To evaluate our method, we analyzed 40 clinical MRI scans of the abdominal region. Results: Relatively low segmentation errors were found for subcutaneous adipose tissue (3.56 %) and visceral adipose tissue (0.29 %) in phantom studies. The clinical results indicated high correlations between the distribution of adipose tissue compartments and obesity. Conclusion: We present an approach that rapidly identifies and quantifies adipose tissue depots of interest. With this method, examination and analysis can be performed in a clinically feasible timeframe. (orig.)

  6. Comparison of Diarization Tools for Building Speaker Database

    Directory of Open Access Journals (Sweden)

    Eva Kiktova

    2015-01-01

    This paper compares open-source diarization toolkits (LIUM, DiarTK, ALIZE-Lia_Ral) designed to extract speaker identity from audio recordings without any prior information about the analysed data. The comparative study of the diarization tools was performed on three types of analysed data (broadcast news (BN) and TV shows), and the corresponding values of the achieved diarization error rate (DER) measure are presented. The automatic speaker diarization system developed by LIUM identified speech segments belonging to individual speakers at a very good level; its segmentation outputs can be used to build a speaker database.
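
The DER measure compared in this study can be approximated, at its very simplest, as a frame-level label mismatch rate. This is only a sketch: the official DER also scores missed and false-alarm speech and optimally maps hypothesis speaker labels to reference labels before counting errors.

```python
import numpy as np

def frame_der(reference, hypothesis):
    """Frame-level diarization error: fraction of frames whose
    hypothesized speaker label differs from the reference
    (assumes labels are already aligned between the two)."""
    reference = np.asarray(reference)
    hypothesis = np.asarray(hypothesis)
    return float((reference != hypothesis).mean())

ref = [0, 0, 0, 1, 1, 1, 1, 1]   # who speaks in each frame (reference)
hyp = [0, 0, 1, 1, 1, 1, 1, 1]   # system output: one frame wrong
print(frame_der(ref, hyp))  # 0.125
```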

  7. Speaker Clustering for a Mixture of Singing and Reading (Preprint)

    Science.gov (United States)

    2012-03-01

    Speaker diarization [2, 3], which answers the question of "who spoke when?", is a combination of speaker segmentation and clustering. Although it is possible to… focuses on speaker clustering, the techniques developed here can be applied to speaker diarization. For the remainder of this paper, the term "speech… and retrieval," Proceedings of the IEEE, vol. 88, 2000. [2] S. Tranter and D. Reynolds, "An overview of automatic speaker diarization systems," IEEE…

  8. Biologically-Inspired Spike-Based Automatic Speech Recognition of Isolated Digits Over a Reproducing Kernel Hilbert Space

    Directory of Open Access Journals (Sweden)

    Kan Li

    2018-04-01

    This paper presents a novel real-time dynamic framework for quantifying time-series structure in spoken words using spikes. Audio signals are converted into multi-channel spike trains using a biologically-inspired leaky integrate-and-fire (LIF) spike generator. These spike trains are mapped into a function space of infinite dimension, i.e., a Reproducing Kernel Hilbert Space (RKHS), using point-process kernels, where a state-space model learns the dynamics of the multidimensional spike input using gradient descent learning. This kernelized recurrent system is very parsimonious and achieves the necessary memory depth via feedback of its internal states when trained discriminatively, utilizing the full context of the phoneme sequence. A main advantage of modeling nonlinear dynamics using state-space trajectories in the RKHS is that it imposes no restriction on the relationship between the exogenous input and its internal state. We are free to choose the input representation with an appropriate kernel, and changing the kernel impacts neither the system nor the learning algorithm. Moreover, we show that this novel framework can outperform both traditional hidden Markov model (HMM) speech processing and neuromorphic implementations based on spiking neural networks (SNN), yielding accurate and ultra-low-power word spotters. As a proof of concept, we demonstrate its capabilities using the benchmark TI-46 digit corpus for isolated-word automatic speech recognition (ASR) or keyword spotting. Compared to an HMM using a Mel-frequency cepstral coefficient (MFCC) front-end without time-derivatives, our MFCC-KAARMA offered improved performance. For the spike-train front-end, spike-KAARMA also outperformed state-of-the-art SNN solutions. Furthermore, compared to MFCCs, spike trains provided enhanced noise robustness in certain low signal-to-noise ratio (SNR) regimes.

  9. Biologically-Inspired Spike-Based Automatic Speech Recognition of Isolated Digits Over a Reproducing Kernel Hilbert Space.

    Science.gov (United States)

    Li, Kan; Príncipe, José C

    2018-01-01

    This paper presents a novel real-time dynamic framework for quantifying time-series structure in spoken words using spikes. Audio signals are converted into multi-channel spike trains using a biologically-inspired leaky integrate-and-fire (LIF) spike generator. These spike trains are mapped into a function space of infinite dimension, i.e., a Reproducing Kernel Hilbert Space (RKHS), using point-process kernels, where a state-space model learns the dynamics of the multidimensional spike input using gradient descent learning. This kernelized recurrent system is very parsimonious and achieves the necessary memory depth via feedback of its internal states when trained discriminatively, utilizing the full context of the phoneme sequence. A main advantage of modeling nonlinear dynamics using state-space trajectories in the RKHS is that it imposes no restriction on the relationship between the exogenous input and its internal state. We are free to choose the input representation with an appropriate kernel, and changing the kernel impacts neither the system nor the learning algorithm. Moreover, we show that this novel framework can outperform both traditional hidden Markov model (HMM) speech processing and neuromorphic implementations based on spiking neural networks (SNN), yielding accurate and ultra-low-power word spotters. As a proof of concept, we demonstrate its capabilities using the benchmark TI-46 digit corpus for isolated-word automatic speech recognition (ASR) or keyword spotting. Compared to an HMM using a Mel-frequency cepstral coefficient (MFCC) front-end without time-derivatives, our MFCC-KAARMA offered improved performance. For the spike-train front-end, spike-KAARMA also outperformed state-of-the-art SNN solutions. Furthermore, compared to MFCCs, spike trains provided enhanced noise robustness in certain low signal-to-noise ratio (SNR) regimes.
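
The leaky integrate-and-fire front-end described in both versions of this record can be sketched as a simple discrete-time loop. The parameter values and the test drive signal below are illustrative, not the authors' multi-channel generator.

```python
import numpy as np

def lif_spikes(signal, dt=1e-3, tau=0.02, threshold=1.0):
    """Leaky integrate-and-fire encoding: the membrane potential leaks
    toward zero, integrates the input, and emits a spike (then resets)
    whenever it crosses the threshold. Parameter values are illustrative."""
    v, spikes = 0.0, []
    for x in signal:
        v += dt * (-v / tau + x)   # Euler step of dv/dt = -v/tau + x
        if v >= threshold:
            spikes.append(1)
            v = 0.0                # reset after firing
        else:
            spikes.append(0)
    return spikes

# Rectified sinusoidal test drive, 0.5 s at 1 kHz.
t = np.arange(0, 0.5, 1e-3)
drive = 60.0 * (1 + np.sin(2 * np.pi * 4 * t))
train = lif_spikes(drive)
print(sum(train))  # number of spikes emitted over 0.5 s
```

In the paper's setting, one such spike train per filter channel would then feed the kernel state-space model.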

  10. Rapid Word Recognition as a Measure of Word-Level Automaticity and Its Relation to Other Measures of Reading

    Science.gov (United States)

    Frye, Elizabeth M.; Gosky, Ross

    2012-01-01

    The present study investigated the relationship between rapid recognition of individual words (Word Recognition Test) and two measures of contextual reading: (1) grade-level Passage Reading Test (IRI passage) and (2) performance on standardized STAR Reading Test. To establish if time of presentation on the word recognition test was a factor in…

  11. Role of Speaker Cues in Attention Inference

    Directory of Open Access Journals (Sweden)

    Jin Joo Lee

    2017-10-01

    Current state-of-the-art approaches to emotion recognition primarily focus on modeling the nonverbal expressions of a sole individual, without reference to contextual elements such as the co-presence of a partner. In this paper, we demonstrate that accurate inference of listeners' social-emotional state of attention depends on accounting for the nonverbal behaviors of their storytelling partner, namely their speaker cues. To gain a deeper understanding of the role of speaker cues in attention inference, we investigate real-world interactions of children (5–6 years old) storytelling with their peers. Through in-depth analysis of human–human interaction data, we first identify nonverbal speaker cues (i.e., backchannel-inviting cues) and listener responses (i.e., backchannel feedback). We then demonstrate how speaker cues can modify the interpretation of attention-related backchannels and serve as a means to regulate the responsiveness of listeners. We discuss the design implications of our findings toward our primary goal of developing attention recognition models for storytelling robots, and we argue that social robots can proactively use speaker cues to form more accurate inferences about the attentive state of their human partners.

  12. Towards Automatic Threat Recognition

    Science.gov (United States)

    2006-12-01

    Report outline: Preliminaries about Information Fusion; The System Ontology; Unification as Processing Principle; Back to the Example; Conclusion and Outlook. (Forschungsinstitut für Kommunikation, Informationsverarbeitung und Ergonomie (FGAN), Informationstechnik und Führungssysteme (KIE).)

  13. Learning speaker-specific characteristics with a deep neural architecture.

    Science.gov (United States)

    Chen, Ke; Salman, Ahmad

    2011-11-01

    Speech signals convey various yet mixed information, ranging from linguistic to speaker-specific information. However, most acoustic representations characterize all these different kinds of information as a whole, which can hinder either a speech or a speaker recognition (SR) system from producing better performance. In this paper, we propose a novel deep neural architecture (DNA) especially for learning speaker-specific characteristics from mel-frequency cepstral coefficients, an acoustic representation commonly used in both speech recognition and SR, which results in a speaker-specific overcomplete representation. In order to learn intrinsic speaker-specific characteristics, we formulate an objective function consisting of contrastive losses, in terms of speaker similarity/dissimilarity, and data reconstruction losses used as regularization to normalize the interference of non-speaker-related information. Moreover, we employ a hybrid learning strategy for learning the parameters of the deep neural networks: local yet greedy layerwise unsupervised pretraining for initialization, and global supervised learning for the ultimate discriminative goal. With four Linguistic Data Consortium (LDC) benchmarks and two non-English corpora, we demonstrate that our overcomplete representation is robust in characterizing various speakers, no matter whether their utterances have been used in training our DNA, and is highly insensitive to text and languages spoken. Extensive comparative studies suggest that our approach yields favorable results in speaker verification and segmentation. Finally, we discuss several issues concerning our proposed approach.
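
The contrastive term of the objective described above is commonly written as a margin loss over embedding distances. A minimal numpy sketch; the margin value and toy embeddings are assumptions, and the paper combines this term with reconstruction losses.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    """Contrastive loss on a pair of embeddings: pull same-speaker
    pairs together, push different-speaker pairs at least `margin`
    apart (zero loss once they are far enough)."""
    d = np.linalg.norm(emb_a - emb_b)
    if same_speaker:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

a = np.array([0.1, 0.2])
b = np.array([0.1, 0.2])   # identical embedding: same speaker, zero loss
c = np.array([2.0, -1.0])  # distant embedding: beyond margin, zero loss

print(contrastive_loss(a, b, True))    # 0.0
print(contrastive_loss(a, c, False))   # 0.0
```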

  14. Signal-to-Signal Ratio Independent Speaker Identification for Co-channel Speech Signals

    DEFF Research Database (Denmark)

    Saeidi, Rahim; Mowlaee, Pejman; Kinnunen, Tomi

    2010-01-01

    In this paper, we consider speaker identification for the co-channel scenario, in which the speech mixture from two speakers is recorded by one microphone only. The goal is to identify both of the speakers from their mixed signal. High recognition accuracies have already been reported when an accurately...

  15. Upgrade of the automatic analysis system in the TJ-II Thomson Scattering diagnostic: New image recognition classifier and fault condition detection

    International Nuclear Information System (INIS)

    Makili, L.; Vega, J.; Dormido-Canto, S.; Pastor, I.; Pereira, A.; Farias, G.; Portas, A.; Perez-Risco, D.; Rodriguez-Fernandez, M.C.; Busch, P.

    2010-01-01

    An automatic image classification system based on support vector machines (SVM) has been in operation for years in the TJ-II Thomson Scattering diagnostic. It recognizes five different types of images: CCD camera background, measurement of stray light without plasma or in a collapsed discharge, image during the ECH phase, image during the NBI phase, and image after reaching the cut-off density during ECH heating. Each kind of image implies the execution of different application software. Because the recognition system is based on a learning system, and major modifications have been carried out in both the diagnostic (optics) and TJ-II plasmas (injected power), the classifier model is no longer valid. A new SVM model has been developed for the current conditions. Also, specific error conditions in the data acquisition process can now be detected and managed automatically. The recovery process has been automated, thereby avoiding the loss of data in ensuing discharges.

  16. Upgrade of the automatic analysis system in the TJ-II Thomson Scattering diagnostic: New image recognition classifier and fault condition detection

    Energy Technology Data Exchange (ETDEWEB)

    Makili, L. [Dpto. Informatica y Automatica - UNED, Madrid (Spain); Vega, J. [Asociacion EURATOM/CIEMAT para Fusion, Madrid (Spain); Dormido-Canto, S., E-mail: sebas@dia.uned.e [Dpto. Informatica y Automatica - UNED, Madrid (Spain); Pastor, I.; Pereira, A. [Asociacion EURATOM/CIEMAT para Fusion, Madrid (Spain); Farias, G. [Dpto. Informatica y Automatica - UNED, Madrid (Spain); Portas, A.; Perez-Risco, D.; Rodriguez-Fernandez, M.C. [Asociacion EURATOM/CIEMAT para Fusion, Madrid (Spain); Busch, P. [FOM Institut voor PlasmaFysica Rijnhuizen, Nieuwegein (Netherlands)

    2010-07-15

    An automatic image classification system based on support vector machines (SVM) has been in operation for years in the TJ-II Thomson Scattering diagnostic. It recognizes five different types of images: CCD camera background, measurement of stray light without plasma or in a collapsed discharge, image during the ECH phase, image during the NBI phase, and image after reaching the cut-off density during ECH heating. Each kind of image implies the execution of different application software. Because the recognition system is based on a learning system, and major modifications have been carried out in both the diagnostic (optics) and TJ-II plasmas (injected power), the classifier model is no longer valid. A new SVM model has been developed for the current conditions. Also, specific error conditions in the data acquisition process can now be detected and managed automatically. The recovery process has been automated, thereby avoiding the loss of data in ensuing discharges.

  17. Automatic Picking of Foraminifera: Design of the Foraminifera Image Recognition and Sorting Tool (FIRST) Prototype and Results of the Image Classification Scheme

    Science.gov (United States)

    de Garidel-Thoron, T.; Marchant, R.; Soto, E.; Gally, Y.; Beaufort, L.; Bolton, C. T.; Bouslama, M.; Licari, L.; Mazur, J. C.; Brutti, J. M.; Norsa, F.

    2017-12-01

    Foraminifera tests are the main proxy carriers for paleoceanographic reconstructions. Both geochemical and taxonomical studies require large numbers of tests to achieve statistical relevance. To date, the extraction of foraminifera from the sediment coarse fraction is still done by hand and is thus time-consuming. Moreover, the recognition of ecologically relevant morphotypes requires taxonomical skills that are not easily taught. The automatic recognition and extraction of foraminifera would greatly help paleoceanographers to overcome these issues. Recent advances in automatic image classification using machine learning open the way to automatic extraction of foraminifera. Here we detail progress on the design of an automatic picking machine as part of the FIRST project. The machine handles 30 pre-sieved samples (100-1000 µm), separating them into individual particles (including foraminifera) and imaging each in pseudo-3D. The particles are classified and specimens of interest are sorted either for Individual Foraminifera Analyses (44 per slide) and/or for classical multiple analyses (8 morphological classes per slide, up to 1000 individuals per hole). The classification is based on machine learning using Convolutional Neural Networks (CNNs), similar to the approach used in the coccolithophorid imaging system SYRACO. To prove its feasibility, we built two training image datasets of modern planktonic foraminifera containing approximately 2000 and 5000 images, corresponding to 15 and 25 morphological classes respectively. Using a CNN with a residual topology (ResNet) we achieve over 95% correct classification for each dataset. We tested the network on 160,000 images from 45 depths of a sediment core from the Pacific Ocean, for which we have human counts. The current algorithm is able to reproduce the downcore variability in both Globigerinoides ruber and the fragmentation index (r2 = 0.58 and 0.88, respectively). The FIRST prototype yields some promising results for high

  18. Generalized synthetic aperture radar automatic target recognition by convolutional neural network with joint use of two-dimensional principal component analysis and support vector machine

    Science.gov (United States)

    Zheng, Ce; Jiang, Xue; Liu, Xingzhao

    2017-10-01

    Convolutional neural network (CNN), as a vital part of the deep learning research field, has shown powerful potential for automatic target recognition (ATR) of synthetic aperture radar (SAR). However, the high complexity caused by the deep structure of CNN makes it difficult to generalize. An improved form of CNN with higher generalization capability and less probability of overfitting, which further improves the efficiency and robustness of the SAR ATR system, is proposed. The convolution layers of CNN are combined with a two-dimensional principal component analysis algorithm. Correspondingly, the kernel support vector machine is utilized as the classifier layer instead of the multilayer perceptron. The verification experiments are implemented using the moving and stationary target acquisition and recognition database, and the results validate the efficiency of the proposed method.
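    The record above combines convolutional layers with two-dimensional PCA (2DPCA), which builds its scatter matrix from image rows directly instead of flattening each image into one long vector. A minimal numpy sketch of 2DPCA itself, with illustrative function names and shapes (the full system additionally uses a kernel SVM as the classifier layer):

```python
import numpy as np

def two_d_pca(images, k):
    """2DPCA: learn a (w, k) projection from an (n, h, w) image stack."""
    mean = images.mean(axis=0)
    centered = images - mean
    # image scatter matrix, (w, w): built from rows, no flattening
    G = sum(a.T @ a for a in centered) / len(images)
    vals, vecs = np.linalg.eigh(G)
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:k]]          # top-k eigenvectors as columns

def project(images, X):
    """Project every (h, w) image to an (h, k) feature matrix."""
    return images @ X
```

    The projected (h, k) feature matrices are much smaller than the original images, which is one reason 2DPCA is used here to curb overfitting before the SVM stage.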

  19. Development and Comparative Study of Effects of Training Algorithms on Performance of Artificial Neural Network Based Analog and Digital Automatic Modulation Recognition

    Directory of Open Access Journals (Sweden)

    Jide Julius Popoola

    2015-11-01

    Full Text Available This paper proposes two new classifiers that automatically recognise twelve combined analog and digital modulated signals without any a priori knowledge of the modulation schemes or the modulation parameters. The classifiers are developed using a pattern recognition approach. Feature keys extracted from the instantaneous amplitude, the instantaneous phase and the spectrum symmetry of the simulated signals are used as inputs to the artificial neural network employed in developing the classifiers. The two developed classifiers are trained using the scaled conjugate gradient (SCG) and conjugate gradient (CONJGRAD) training algorithms. Sample results of the two classifiers show good recognition performance, with an average overall recognition rate above 99.50% at signal-to-noise ratio (SNR) values from 0 dB and above for both training algorithms, and average overall recognition rates slightly above 99.00% and 96.40%, respectively, at -5 dB SNR for the SCG and CONJGRAD training algorithms. The comparative performance evaluation of the two developed classifiers shows that the two training algorithms have different effects on both the response rate and the efficiency of the two developed artificial neural network classifiers. In addition, a comparison of the overall success recognition rates of the two classifiers developed in this study, using the pattern recognition approach, against a classifier reported in the surveyed literature using a decision-theoretic approach shows that the classifiers developed in this study perform favourably with regard to accuracy and performance probability.
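    Feature keys from the instantaneous amplitude and phase are computed from the analytic signal of the waveform. The sketch below is a numpy-only version of two classic such features from the modulation-recognition literature (the paper does not list its exact feature set, so the definitions here are assumptions; `gamma_max` follows the standard peak-spectral-density form, and `sigma_phi` is only a crude phase-spread stand-in):

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (the same construction scipy.signal.hilbert uses)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0        # double positive frequencies, zero negative ones
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def modulation_features(x):
    """gamma_max: peak spectral power density of the centred, normalized
    instantaneous amplitude. sigma_phi: spread of the centred instantaneous
    phase (real classifiers remove the linear carrier component first)."""
    z = analytic_signal(x)
    amp = np.abs(z)                               # instantaneous amplitude
    phase = np.unwrap(np.angle(z))                # instantaneous phase
    a_cn = amp / amp.mean() - 1.0                 # centred-normalized amplitude
    gamma_max = np.max(np.abs(np.fft.fft(a_cn)) ** 2) / len(x)
    sigma_phi = np.std(phase - phase.mean())
    return gamma_max, sigma_phi
```

    For an unmodulated carrier the instantaneous amplitude is constant, so `gamma_max` is essentially zero; amplitude-modulated signals push it up, which is what makes it a useful discriminating key.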

  20. Quantile Acoustic Vectors vs. MFCC Applied to Speaker Verification

    Directory of Open Access Journals (Sweden)

    Mayorga-Ortiz Pedro

    2014-02-01

    Full Text Available In this paper we describe speaker and command recognition experiments using quantile vectors and Gaussian Mixture Modelling (GMM). Over the past several years, GMM and MFCC have become two of the dominant approaches in speaker and speech recognition applications. However, memory and computational costs are important drawbacks, because autonomous systems suffer processing and power consumption constraints; thus, a good trade-off between accuracy and computational requirements is mandatory. We decided to explore another approach (quantile vectors) in several tasks, and a comparison with MFCC was made. Quantile acoustic vectors are proposed for speaker verification and command recognition tasks, and the results showed very good recognition efficiency. This method offered a good trade-off between computation time, feature vector complexity and overall achieved efficiency.
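    The record does not spell out how its quantile vectors are built, so the following is only a plausible minimal construction: summarize a variable-length sequence of frame-level features by a fixed set of per-band quantiles, yielding one fixed-length vector per utterance. The quantile levels and shapes are illustrative assumptions:

```python
import numpy as np

def quantile_vector(frames, q=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Summarize an (n_frames, n_bands) feature matrix by per-band quantiles.

    Returns a fixed-length vector of n_bands * len(q) values, ordered band
    by band, usable as a compact utterance-level feature for a classifier.
    """
    return np.quantile(frames, q, axis=0).T.ravel()
```

    Because the output length is independent of the number of frames, utterances of any duration map to vectors of the same size, which keeps downstream model complexity low.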

  1. Speakers of different languages process the visual world differently.

    Science.gov (United States)

    Chabal, Sarah; Marian, Viorica

    2015-06-01

    Language and vision are highly interactive. Here we show that people activate language when they perceive the visual world, and that this language information impacts how speakers of different languages focus their attention. For example, when searching for an item (e.g., clock) in the same visual display, English and Spanish speakers look at different objects. Whereas English speakers searching for the clock also look at a cloud, Spanish speakers searching for the clock also look at a gift, because the Spanish names for gift (regalo) and clock (reloj) overlap phonologically. These different looking patterns emerge despite an absence of direct language input, showing that linguistic information is automatically activated by visual scene processing. We conclude that the varying linguistic information available to speakers of different languages affects visual perception, leading to differences in how the visual world is processed. (c) 2015 APA, all rights reserved.

  2. Automatic recognition of 3D GGO CT imaging signs through the fusion of hybrid resampling and layer-wise fine-tuning CNNs.

    Science.gov (United States)

    Han, Guanghui; Liu, Xiabi; Zheng, Guangyuan; Wang, Murong; Huang, Shan

    2018-06-06

    Ground-glass opacity (GGO) is a common CT imaging sign on high-resolution CT, which means the lesion is more likely to be malignant compared to common solid lung nodules. The automatic recognition of GGO CT imaging signs is of great importance for early diagnosis and possible cure of lung cancers. Present GGO recognition methods employ traditional low-level features, and system performance is improving slowly. Considering the high performance of CNN models in the computer vision field, we propose an automatic recognition method for 3D GGO CT imaging signs through the fusion of hybrid resampling and layer-wise fine-tuning CNN models. Our hybrid resampling is performed on multi-views and multi-receptive fields, which reduces the risk of missing small or large GGOs by adopting representative sampling panels and processing GGOs with multiple scales simultaneously. The layer-wise fine-tuning strategy has the ability to obtain the optimal fine-tuning model. The multi-CNN-model fusion strategy obtains better performance than any single trained model. We evaluated our method on the GGO nodule samples in the publicly available LIDC-IDRI dataset of chest CT scans. The experimental results show that our method yields excellent results with 96.64% sensitivity, 71.43% specificity, and 0.83 F1 score. Our method is a promising approach for applying deep learning to computer-aided analysis of specific CT imaging signs with insufficient labeled images.

  3. Automatic recognition of seismic intensity based on RS and GIS: a case study in Wenchuan Ms8.0 earthquake of China.

    Science.gov (United States)

    Zhang, Qiuwen; Zhang, Yan; Yang, Xiaohong; Su, Bin

    2014-01-01

    In recent years, earthquakes have frequently occurred all over the world, causing huge casualties and economic losses. It is therefore necessary and urgent to obtain the seismic intensity map quickly, so as to determine the distribution of the disaster and provide support for rapid earthquake relief. Compared with traditional methods of drawing seismic intensity maps, which require extensive field investigation of the earthquake area or depend heavily on empirical formulas, spatial information technologies such as Remote Sensing (RS) and Geographical Information System (GIS) can provide a fast and economical way to automatically recognize seismic intensity. With the integrated application of RS and GIS, this paper proposes a RS/GIS-based approach for automatic recognition of seismic intensity, in which RS is used to retrieve and extract information on damage caused by the earthquake, and GIS is applied to manage and display the seismic intensity data. The case study of the Wenchuan Ms8.0 earthquake in China shows that information on seismic intensity can be automatically extracted from remotely sensed images as quickly as possible after an earthquake occurs, and that the Digital Intensity Model (DIM) can be used to visually query and display the distribution of seismic intensity.

  4. Automatic Recognition of Seismic Intensity Based on RS and GIS: A Case Study in Wenchuan Ms8.0 Earthquake of China

    Directory of Open Access Journals (Sweden)

    Qiuwen Zhang

    2014-01-01

    Full Text Available In recent years, earthquakes have frequently occurred all over the world, causing huge casualties and economic losses. It is therefore necessary and urgent to obtain the seismic intensity map quickly, so as to determine the distribution of the disaster and provide support for rapid earthquake relief. Compared with traditional methods of drawing seismic intensity maps, which require extensive field investigation of the earthquake area or depend heavily on empirical formulas, spatial information technologies such as Remote Sensing (RS) and Geographical Information System (GIS) can provide a fast and economical way to automatically recognize seismic intensity. With the integrated application of RS and GIS, this paper proposes a RS/GIS-based approach for automatic recognition of seismic intensity, in which RS is used to retrieve and extract information on damage caused by the earthquake, and GIS is applied to manage and display the seismic intensity data. The case study of the Wenchuan Ms8.0 earthquake in China shows that information on seismic intensity can be automatically extracted from remotely sensed images as quickly as possible after an earthquake occurs, and that the Digital Intensity Model (DIM) can be used to visually query and display the distribution of seismic intensity.

  5. Gender Identification of the Speaker Using VQ Method

    Directory of Open Access Journals (Sweden)

    Vasif V. Nabiyev

    2009-11-01

    Full Text Available Speaking is the easiest and most natural form of communication between people, and intensive studies have been made to provide this communication between people via computers. Systems using voice biometric technology are attracting attention, especially from the angle of cost and usage. Compared with other biometric systems, the application is much more practical: for example, by using a microphone placed in the environment, a voice record can be obtained even without notifying the user, and the system can be applied. Moreover, the remote access facility is another advantage of voice biometry. In this study, the aim is to automatically determine the gender of the speaker from the speech waves, which include personal information. If the speaker's gender can be determined when composing models according to gender information, the success of voice recognition systems can be increased to an important degree. Generally, all speaker recognition systems are composed of two parts: feature extraction and matching. Feature extraction is the procedure in which the least information representing the speech and the speaker is determined from the voice signal. There are different features used in voice applications, such as LPC, MFCC and PLP. In this study, MFCC is used as the feature vector. Feature matching is the procedure in which the features derived from unknown speakers and a known speaker group are compared. According to the text used in the comparison, systems are divided into two types: text-dependent and text-independent. While the same text is used in text-dependent systems, different texts are used in text-independent systems. Nowadays, DTW and HMM are text-dependent matching methods, while VQ and GMM are text-independent. In this study, due to its high success ratio and simple application, the VQ approach is used. A system which determines the speaker's gender automatically and text-independently is proposed. 
The proposed
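    The VQ matching described in this record trains one codebook per class and assigns an utterance to the class whose codebook quantizes it with the least distortion. A minimal numpy sketch of that pipeline (plain k-means as the codebook trainer; all names, `k`, and the 2-D toy features are illustrative, not the paper's configuration):

```python
import numpy as np

def train_codebook(features, k=4, iters=20, seed=0):
    """Train a VQ codebook (plain k-means) over (n, d) feature vectors."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(features[:, None] - codebook[None], axis=2)
        labels = dist.argmin(axis=1)              # nearest codeword per frame
        for j in range(k):
            if np.any(labels == j):               # skip empty cells
                codebook[j] = features[labels == j].mean(axis=0)
    return codebook

def distortion(features, codebook):
    """Average distance of each feature vector to its nearest codeword."""
    dist = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return dist.min(axis=1).mean()

def classify_gender(features, male_cb, female_cb):
    """Pick the gender whose codebook quantizes the utterance with less distortion."""
    if distortion(features, male_cb) < distortion(features, female_cb):
        return "male"
    return "female"
```

    Because only the codebooks are stored, the method is text-independent: any utterance's frames can be scored against each codebook regardless of what was spoken.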

  6. Extricating Manual and Non-Manual Features for Subunit Level Medical Sign Modelling in Automatic Sign Language Classification and Recognition.

    Science.gov (United States)

    R, Elakkiya; K, Selvamani

    2017-09-22

    Subunit segmentation and modelling in medical sign language is one of the important studies in linguistic-oriented and vision-based Sign Language Recognition (SLR). Many past efforts derived functional subunits from linguistic syllables, but implementing such syllable-based subunit extraction is not feasible with real-world computer vision techniques. Moreover, present recognition systems are designed in such a way that they detect signer-dependent actions only under restricted, laboratory conditions. This research paper aims at solving these two important issues: (1) subunit extraction and (2) signer-independent recognition in visual sign language recognition. Subunit extraction involves the sequential and parallel breakdown of sign gestures without any prior knowledge of syllables or the number of subunits. A novel Bayesian Parallel Hidden Markov Model (BPaHMM) is introduced for subunit extraction, combining the features of manual and non-manual parameters to yield better results in the classification and recognition of signs. Signer-independent recognition aims at using a single web camera for different signer behaviour patterns and for cross-signer validation. Experimental results prove that the proposed signer-independent, subunit-level modelling for sign language classification and recognition shows improvement over other existing works.

  7. Multimodal Speaker Diarization

    NARCIS (Netherlands)

    Noulas, A.; Englebienne, G.; Kröse, B.J.A.

    2012-01-01

    We present a novel probabilistic framework that fuses information coming from the audio and video modality to perform speaker diarization. The proposed framework is a Dynamic Bayesian Network (DBN) that is an extension of a factorial Hidden Markov Model (fHMM) and models the people appearing in an

  8. Fully automatized renal parenchyma volumetry using a support vector machine based recognition system for subject-specific probability map generation in native MR volume data

    Science.gov (United States)

    Gloger, Oliver; Tönnies, Klaus; Mensel, Birger; Völzke, Henry

    2015-11-01

    In epidemiological studies as well as in clinical practice the amount of produced medical image data strongly increased in the last decade. In this context organ segmentation in MR volume data gained increasing attention for medical applications. Especially in large-scale population-based studies organ volumetry is highly relevant requiring exact organ segmentation. Since manual segmentation is time-consuming and prone to reader variability, large-scale studies need automatized methods to perform organ segmentation. Fully automatic organ segmentation in native MR image data has proven to be a very challenging task. Imaging artifacts as well as inter- and intrasubject MR-intensity differences complicate the application of supervised learning strategies. Thus, we propose a modularized framework of a two-stepped probabilistic approach that generates subject-specific probability maps for renal parenchyma tissue, which are refined subsequently by using several, extended segmentation strategies. We present a three class-based support vector machine recognition system that incorporates Fourier descriptors as shape features to recognize and segment characteristic parenchyma parts. Probabilistic methods use the segmented characteristic parenchyma parts to generate high quality subject-specific parenchyma probability maps. Several refinement strategies including a final shape-based 3D level set segmentation technique are used in subsequent processing modules to segment renal parenchyma. Furthermore, our framework recognizes and excludes renal cysts from parenchymal volume, which is important to analyze renal functions. Volume errors and Dice coefficients show that our presented framework outperforms existing approaches.
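    The recognition system above feeds Fourier descriptors of shape into its SVM. A minimal numpy sketch of translation-, scale- and rotation-invariant Fourier descriptors for a closed 2D contour (the normalization scheme here is one common choice, not necessarily the paper's; it assumes the first harmonic is nonzero):

```python
import numpy as np

def fourier_descriptors(contour, n_desc=8):
    """Invariant Fourier descriptors of a closed 2D contour.

    contour: (n, 2) array of boundary points, ordered along the shape.
    """
    z = contour[:, 0] + 1j * contour[:, 1]   # boundary as a complex signal
    F = np.fft.fft(z)
    F[0] = 0                                  # drop DC -> translation invariance
    mags = np.abs(F)                          # magnitudes -> rotation/start-point invariance
    mags = mags / mags[1]                     # divide by 1st harmonic -> scale invariance
    return mags[1:n_desc + 1]
```

    A perfect circle concentrates all energy in the first harmonic, so its descriptor vector is [1, 0, 0, ...]; scaled or translated copies of a shape map to the same vector, which is exactly the property a shape classifier needs.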

  9. Fully automatized renal parenchyma volumetry using a support vector machine based recognition system for subject-specific probability map generation in native MR volume data

    International Nuclear Information System (INIS)

    Gloger, Oliver; Völzke, Henry; Tönnies, Klaus; Mensel, Birger

    2015-01-01

    In epidemiological studies as well as in clinical practice the amount of produced medical image data strongly increased in the last decade. In this context organ segmentation in MR volume data gained increasing attention for medical applications. Especially in large-scale population-based studies organ volumetry is highly relevant requiring exact organ segmentation. Since manual segmentation is time-consuming and prone to reader variability, large-scale studies need automatized methods to perform organ segmentation. Fully automatic organ segmentation in native MR image data has proven to be a very challenging task. Imaging artifacts as well as inter- and intrasubject MR-intensity differences complicate the application of supervised learning strategies. Thus, we propose a modularized framework of a two-stepped probabilistic approach that generates subject-specific probability maps for renal parenchyma tissue, which are refined subsequently by using several, extended segmentation strategies. We present a three class-based support vector machine recognition system that incorporates Fourier descriptors as shape features to recognize and segment characteristic parenchyma parts. Probabilistic methods use the segmented characteristic parenchyma parts to generate high quality subject-specific parenchyma probability maps. Several refinement strategies including a final shape-based 3D level set segmentation technique are used in subsequent processing modules to segment renal parenchyma. Furthermore, our framework recognizes and excludes renal cysts from parenchymal volume, which is important to analyze renal functions. Volume errors and Dice coefficients show that our presented framework outperforms existing approaches. (paper)

  10. Multimodal Speaker Diarization.

    Science.gov (United States)

    Noulas, A; Englebienne, G; Krose, B J A

    2012-01-01

    We present a novel probabilistic framework that fuses information coming from the audio and video modality to perform speaker diarization. The proposed framework is a Dynamic Bayesian Network (DBN) that is an extension of a factorial Hidden Markov Model (fHMM) and models the people appearing in an audiovisual recording as multimodal entities that generate observations in the audio stream, the video stream, and the joint audiovisual space. The framework is very robust to different contexts, makes no assumptions about the location of the recording equipment, and does not require labeled training data as it acquires the model parameters using the Expectation Maximization (EM) algorithm. We apply the proposed model to two meeting videos and a news broadcast video, all of which come from publicly available data sets. The results acquired in speaker diarization are in favor of the proposed multimodal framework, which outperforms the single modality analysis results and improves over the state-of-the-art audio-based speaker diarization.

  11. Upgrade of the Automatic Analysis System in the TJ-II Thomson Scattering Diagnostic: New Image Recognition Classifier and Fault Condition Detection

    Energy Technology Data Exchange (ETDEWEB)

    Makili, L.; Dormido-Canto, S. [UNED, Madrid (Spain); Vega, J.; Pastor, I.; Pereira, A.; Portas, A.; Perez-Risco, D.; Rodriguez-Fernandez, M. [Asociacion EURATOM/CIEMAT para Fusion, Madrid (Spain); Busch, P. [FOM Instituut voor PlasmaFysica Rijnhuizen, Nieuwegein (Netherlands)

    2009-07-01

    Full text of publication follows: An automatic image classification system has been in operation for years in the TJ-II Thomson diagnostic. It recognizes five different types of images: CCD camera background, measurement of stray light without plasma or in a collapsed discharge, image during the ECH phase, image during the NBI phase and image after reaching the cut-off density during ECH heating. Each kind of image implies the execution of different application software. Therefore, the classification system was developed to launch the corresponding software in an automatic way. The method to recognize the several classes was based on a learning system, in particular Support Vector Machines (SVM). Since the first implementation of the classifier, a relevant improvement has been accomplished in the diagnostic: a new notch filter is in operation, having a larger stray-light rejection at the ruby wavelength than the previous filter. In addition, its location in the optical system has been modified. As a consequence, the stray-light pattern in the CCD image is located in a different position. In addition to these transformations, the power of the neutral beams injected into the TJ-II plasma has been increased by about a factor of 2. Because the recognition system is based on a learning system and major modifications have been carried out in both the diagnostic (optics) and TJ-II plasmas (injected power), the classifier model is no longer valid. The creation of a new model (also based on SVM) under the present conditions has been necessary. Finally, specific error conditions in the data acquisition process can now be detected automatically. The recovery process can be automated, thereby avoiding the loss of data in ensuing discharges. (authors)

  12. The software for automatic creation of the formal grammars used by speech recognition, computer vision, editable text conversion systems, and some new functions

    Science.gov (United States)

    Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan

    2017-02-01

    To make environmental perception by artificial intelligence more flexible, supporting software modules are needed that can automate the creation of language-specific syntax and perform further analysis for relevant decisions based on semantic functions. With our proposed approach, pairs of formal rules for given sentences (in the case of natural languages) or statements (in the case of special languages) can be created with the help of computer vision, speech recognition or editable-text conversion systems, and then automatically refined. In other words, we have developed an approach that significantly improves the automation of the training process of artificial intelligence, which in turn yields a higher level of self-developing skills independent of the user. Based on this approach, we have developed a software demo version, which includes the algorithm and software code implementing all of the above-mentioned components (computer vision, speech recognition and an editable-text conversion system). The program is able to work in multi-stream mode and simultaneously create a syntax from information received from several sources.

  13. Speaker gender identification based on majority vote classifiers

    Science.gov (United States)

    Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

    2017-03-01

    Speaker gender identification is considered among the most important tools in several multimedia applications, namely automatic speech recognition, interactive voice response systems and audio browsing systems. The performance of gender identification systems is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best performing classification method or searching for the optimum tuning of one classifier's parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch and MFCCs, as well as other temporal and frequency-domain descriptors. Five classification models were evaluated: decision tree, discriminant analysis, naïve Bayes, support vector machine and k-nearest neighbor. The three best performing classifiers among the five contribute by majority voting over their scores. Experiments were performed on three datasets spoken in three languages (English, German and Arabic) in order to validate the language independence of the proposed scheme. Results confirm that the presented system reaches a satisfying accuracy rate and promising classification performance, thanks to the discriminating abilities and diversity of the used features combined with mid-level statistics.
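    The fusion rule described above, keep the best-performing classifiers and let them vote, can be sketched with the standard library alone. The sketch below is an illustrative simplification (plain hard voting over labels; the record's exact scoring and tie-breaking are not specified, and all names are hypothetical):

```python
from collections import Counter

def top_k_majority_vote(classifier_scores, predictions, k=3):
    """Keep the k best classifiers by validation accuracy, then majority-vote.

    classifier_scores: {classifier name: validation accuracy}
    predictions      : {classifier name: predicted label for one sample}
    """
    top = sorted(classifier_scores, key=classifier_scores.get, reverse=True)[:k]
    votes = Counter(predictions[name] for name in top)
    return votes.most_common(1)[0][0]   # most frequent label among the top-k
```

    With five base models, taking the top three guarantees an odd vote count, so a binary male/female decision can never tie.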

  14. AUTOMATIC RECOGNITION OF CORONAL TYPE II RADIO BURSTS: THE AUTOMATED RADIO BURST IDENTIFICATION SYSTEM METHOD AND FIRST OBSERVATIONS

    International Nuclear Information System (INIS)

    Lobzin, Vasili V.; Cairns, Iver H.; Robinson, Peter A.; Steward, Graham; Patterson, Garth

    2010-01-01

    Major space weather events such as solar flares and coronal mass ejections are usually accompanied by solar radio bursts, which can potentially be used for real-time space weather forecasts. Type II radio bursts are produced near the local plasma frequency and its harmonic by fast electrons accelerated by a shock wave moving through the corona and solar wind with a typical speed of ∼1000 km s⁻¹. The coronal bursts have dynamic spectra with frequency gradually falling with time and durations of several minutes. This Letter presents a new method developed to detect type II coronal radio bursts automatically and describes its implementation in an extended Automated Radio Burst Identification System (ARBIS 2). Preliminary tests of the method with spectra obtained in 2002 show that the performance of the current implementation is quite high, ∼80%, while the probability of false positives is reasonably low, with one false positive per 100-200 hr for high solar activity and less than one false event per 10000 hr for low solar activity periods. The first automatically detected coronal type II radio burst is also presented.

  15. ROBIN: a platform for evaluating automatic target recognition algorithms: I. Overview of the project and presentation of the SAGEM DS competition

    Science.gov (United States)

    Duclos, D.; Lonnoy, J.; Guillerm, Q.; Jurie, F.; Herbin, S.; D'Angelo, E.

    2008-04-01

    The last five years have seen a renewal of Automatic Target Recognition applications, mainly because of the latest advances in machine learning techniques. In this context, large collections of image datasets are essential for training algorithms as well as for their evaluation. Indeed, the recent proliferation of recognition algorithms, generally applied to slightly different problems, makes their comparison through clean evaluation campaigns necessary. The ROBIN project tries to fulfil these two needs by putting unclassified datasets, ground truths, competitions and metrics for the evaluation of ATR algorithms at the disposal of the scientific community. The scope of this project includes single- and multi-class generic target detection and generic target recognition, in military and security contexts. To our knowledge, it is the first time that a database of this importance (several hundred thousand visible and infrared hand-annotated images) has been publicly released. Funded by the French Ministry of Defence (DGA) and by the French Ministry of Research, ROBIN is one of the ten Techno-vision projects. Techno-vision is a large and ambitious government initiative for building evaluation means for computer vision technologies, for various application contexts. ROBIN's consortium includes major companies and research centres involved in Computer Vision R&D in the field of defence: Bertin Technologies, CNES, ECA, DGA, EADS, INRIA, ONERA, MBDA, SAGEM, THALES. This paper, which first gives an overview of the whole project, is focused on one of ROBIN's key competitions, the SAGEM Defence Security database. This dataset contains more than eight hundred ground and aerial infrared images of six different vehicles in cluttered scenes including distracters. Two different sets of data are available for each target. The first set includes different views of each vehicle at close range in a "simple" background, and can be used to train algorithms. The second set

  16. Further optimisations of constant Q cepstral processing for integrated utterance and text-dependent speaker verification

    DEFF Research Database (Denmark)

    Delgado, Hector; Todisco, Massimiliano; Sahidullah, Md

    2016-01-01

    Many authentication applications involving automatic speaker verification (ASV) demand robust performance using short-duration, fixed or prompted text utterances. Text constraints not only reduce the phone-mismatch between enrollment and test utterances, which generally leads to improved performa...

  17. Limited data speaker identification

    Indian Academy of Sciences (India)

    recognition can be either identification or verification depending on the task objective. .... like Bayesian formalism, voting method and Dempster-Shafer (D–S) theory ..... self-organizing map (SOM) (Kohonen 1990), learning vector quantization ...

  18. Fault prevention by early stage symptoms detection for automatic vehicle transmission using pattern recognition and curve fitting

    Science.gov (United States)

    Balbin, Jessie R.; Cruz, Febus Reidj G.; Abu, Jon Ervin A.; Siño, Carlo G.; Ubaldo, Paolo E.; Zulueta, Christelle Jianne T.

    2017-06-01

    Automobiles have become essential parts of our everyday lives. Many factors can affect a vehicle, primarily those which may inconvenience or, in some cases, harm lives or property. Thus, this work focuses on detecting faults in an automatic transmission vehicle's engine, body and other parts that cause vibration and sound, to help prevent car problems, using MATLAB. Sound, vibration, and temperature sensors detect the defects of the car, and a transmitter and receiver gather data wirelessly, making the system easy to install on the vehicle. Following a technique used at Toyota Balintawak Philippines, every car is treated as panels (a, b, c, d, and e), 'a' being from the hood to the front wheel of the car and 'e' from the rear shield to the back of the car; this guided the proper placement of the sensors so that precise data could be gathered. Gathered data are compared to a reference graph taken from the normal status or performance of a vehicle; data that surpass the normal graph by 50% are considered to indicate that a problem has occurred. The system is designed to prevent car accidents by determining the current status or performance of the vehicle, thereby keeping people away from harm.
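    The 50% deviation rule described in the abstract can be sketched as a simple per-panel threshold check. The panel names follow the a-e scheme mentioned above, but all baseline and reading values are invented for illustration:

    ```python
    def flag_panel_faults(readings, baseline, tolerance=0.5):
        """Flag panels whose sensor reading deviates from the known-good
        baseline by more than the stated margin (tolerance=0.5 = 50%)."""
        return [panel for panel, value in readings.items()
                if abs(value - baseline[panel]) > tolerance * baseline[panel]]

    # Hypothetical vibration levels per panel (a = hood-to-front-wheel,
    # e = rear-shield-to-back), from a healthy reference run vs. a new run.
    baseline = {"a": 2.0, "b": 1.5, "c": 1.0, "d": 1.2, "e": 0.8}
    readings = {"a": 2.1, "b": 2.4, "c": 1.0, "d": 1.1, "e": 0.7}
    print(flag_panel_faults(readings, baseline))  # ['b']
    ```

    Only panel 'b' exceeds its baseline by more than 50% (2.4 vs. 1.5), so only it would be reported as a potential early-stage symptom.
    
    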

  19. Development an Automatic Speech to Facial Animation Conversion for Improve Deaf Lives

    Directory of Open Access Journals (Sweden)

    S. Hamidreza Kasaei

    2011-05-01

    Full Text Available In this paper, we propose the design and initial implementation of a robust system which automatically translates voice into text and text into sign language animations. Sign language
    translation systems could significantly improve deaf lives, especially in communications, exchange of information, and the use of machines for translating conversations from one language to another. Considering these points, it seems necessary to study speech recognition. Voice recognition algorithms usually address three major challenges: the first is extracting features from speech, the second arises when only a limited sound gallery is available for recognition, and the final challenge is moving from speaker-dependent to speaker-independent voice recognition. Extracting features from speech is an important stage in our method. Different procedures are available for extracting features from speech; one of the most common in speech
    recognition systems is Mel-Frequency Cepstral Coefficients (MFCCs). The algorithm starts with preprocessing and signal conditioning. Next, features are extracted from the speech using cepstral coefficients. The result of this process is then sent to the segmentation part. Finally, the recognition part recognizes the words and converts the recognized words to facial animation. The project is still in progress and some new interesting methods are described in the current report.
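    Since the abstract names Mel-Frequency Cepstral Coefficients as its feature front end, a minimal numpy/scipy sketch of a standard MFCC pipeline follows. Frame sizes, filterbank count, and all other parameter values are common textbook defaults, not values taken from the paper:

    ```python
    import numpy as np
    from scipy.fft import dct

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
             n_mels=26, n_ceps=13):
        # Pre-emphasis boosts high frequencies before analysis.
        sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
        # Slice into overlapping Hamming-windowed frames.
        n_frames = 1 + (len(sig) - frame_len) // hop
        idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
        frames = sig[idx] * np.hamming(frame_len)
        # Power spectrum of each frame.
        power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
        # Triangular mel-spaced filterbank.
        mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
        bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
        fbank = np.zeros((n_mels, n_fft // 2 + 1))
        for i in range(n_mels):
            l, c, r = bins[i], bins[i + 1], bins[i + 2]
            fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
            fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
        # Log filterbank energies, then DCT-II to get cepstral coefficients.
        feat = np.log(power @ fbank.T + 1e-10)
        return dct(feat, type=2, axis=1, norm="ortho")[:, :n_ceps]

    coeffs = mfcc(np.random.randn(16000))  # 1 s of noise at 16 kHz
    print(coeffs.shape)  # (98, 13): 98 frames, 13 coefficients each
    ```

    The DCT at the end decorrelates the log filterbank energies, which is why a handful of leading coefficients suffices as a compact frame descriptor.
    
    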

  20. The contribution of discrete-trial naming and visual recognition to rapid automatized naming deficits of dyslexic children with and without a history of language delay

    Directory of Open Access Journals (Sweden)

    Filippo Gasperini

    2014-09-01

    Full Text Available Children with Developmental Dyslexia (DD) are impaired in Rapid Automatized Naming (RAN) tasks, where subjects are asked to name arrays of high frequency items as quickly as possible. However, the reasons why RAN speed discriminates DD from typical readers are not yet fully understood. Our study aimed to identify some of the cognitive mechanisms underlying the RAN-reading relationship by comparing one group of 32 children with DD with an age-matched control group of typical readers on a naming and a visual recognition task, both using a discrete-trial methodology, in addition to a serial RAN task, all using the same stimuli (digits and colors). Results showed a significant slowness of DD children in both serial and discrete-trial naming tasks regardless of type of stimulus, but no difference between the two groups on the discrete-trial recognition task. Significant differences between DD and control participants in the RAN task disappeared when performance in the discrete-trial naming task was partialled out by covariance analysis for colors, but not for digits. The same pattern held in a subgroup of DD subjects with a history of early language delay (LD). By contrast, in a subsample of DD children without LD the RAN deficit was specific for digits and disappeared after slowness in discrete-trial naming was partialled out. Slowness in discrete-trial naming was more evident for LD than for noLD DD children. Overall, our results confirm previous evidence indicating a name-retrieval deficit as a cognitive impairment underlying RAN slowness in DD children. This deficit seems to be more marked in DD children with previous LD. Moreover, additional cognitive deficits specifically associated with serial RAN tasks have to be taken into account when explaining deficient RAN speed of these latter children. We suggest that partially different cognitive dysfunctions underpin superficially similar RAN impairments in different subgroups of DD subjects.

  1. Automatic Recognition of Acute Myelogenous Leukemia in Blood Microscopic Images Using K-means Clustering and Support Vector Machine.

    Science.gov (United States)

    Kazemi, Fatemeh; Najafabadi, Tooraj Abbasian; Araabi, Babak Nadjar

    2016-01-01

    Acute myelogenous leukemia (AML) is a subtype of acute leukemia, which is characterized by the accumulation of myeloid blasts in the bone marrow. Careful microscopic examination of stained blood smear or bone marrow aspirate is still the most significant diagnostic methodology for initial AML screening and is considered the first step toward diagnosis. It is time-consuming, and owing to the elusive nature of the signs and symptoms of AML, pathologists may make a wrong diagnosis. Therefore, the need for automation of leukemia detection has arisen. In this paper, an automatic technique for identification and detection of AML and its prevalent subtypes, i.e., M2-M5, is presented. At first, microscopic images are acquired from blood smears of patients with AML and normal cases. After applying image preprocessing, a color segmentation strategy is applied for segmenting white blood cells from other blood components, and then discriminative features, i.e., irregularity, nucleus-cytoplasm ratio, Hausdorff dimension, shape, color, and texture features, are extracted from the entire nucleus in the whole images containing multiple nuclei. Images are classified into cancerous and noncancerous images by a binary support vector machine (SVM) classifier with a 10-fold cross validation technique. Classifier performance is evaluated by three parameters, i.e., sensitivity, specificity, and accuracy. Cancerous images are also classified into their prevalent subtypes by a multi-SVM classifier. The results show that the proposed algorithm has achieved an acceptable performance for diagnosis of AML and its common subtypes. Therefore, it can be used as an assistant diagnostic tool for pathologists.
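    The three evaluation parameters named in the abstract (sensitivity, specificity, accuracy) are computed from the binary confusion matrix. The sketch below is a generic illustration with toy labels, not the paper's code:

    ```python
    import numpy as np

    def binary_metrics(y_true, y_pred, positive=1):
        """Sensitivity, specificity, and accuracy from binary labels."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        tp = np.sum((y_true == positive) & (y_pred == positive))
        tn = np.sum((y_true != positive) & (y_pred != positive))
        fp = np.sum((y_true != positive) & (y_pred == positive))
        fn = np.sum((y_true == positive) & (y_pred != positive))
        sensitivity = tp / (tp + fn)   # true positive rate: AML cases found
        specificity = tn / (tn + fp)   # true negative rate: healthy cleared
        accuracy = (tp + tn) / len(y_true)
        return sensitivity, specificity, accuracy

    # Toy labels: 1 = cancerous image, 0 = normal smear.
    sens, spec, acc = binary_metrics([1, 1, 1, 0, 0, 0],
                                     [1, 1, 0, 0, 0, 1])
    # sens = 2/3, spec = 2/3, acc = 4/6
    ```

    Reporting sensitivity and specificity separately matters for a screening aid: a missed AML case (low sensitivity) is costlier than a false alarm, so accuracy alone would hide the clinically relevant error mode.
    
    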

  2. Deep vector-based convolutional neural network approach for automatic recognition of colonies of induced pluripotent stem cells.

    Directory of Open Access Journals (Sweden)

    Muthu Subash Kavitha

    Full Text Available Pluripotent stem cells can potentially be used in clinical applications as a model for studying disease progress. This tracking of disease-causing events in cells requires constant assessment of the quality of stem cells. Existing approaches are inadequate for robust and automated differentiation of stem cell colonies. In this study, we developed a new model of vector-based convolutional neural network (V-CNN) with respect to extracted features of the induced pluripotent stem cell (iPSC) colony for distinguishing colony characteristics. A transfer function from the feature vectors to the virtual image was generated at the front of the CNN in order to classify feature vectors of healthy and unhealthy colonies. The robustness of the proposed V-CNN model in distinguishing colonies was compared with that of the competitive support vector machine (SVM) classifier based on morphological, textural, and combined features. Additionally, five-fold cross-validation was used to investigate the performance of the V-CNN model. The precision, recall, and F-measure values of the V-CNN model were comparatively higher than those of the SVM classifier, with a range of 87-93%, indicating lower false positive and false negative rates. Furthermore, for determining the quality of colonies, the V-CNN model showed higher accuracy values based on morphological (95.5%), textural (91.0%), and combined (93.2%) features than those estimated with the SVM classifier (86.7%, 83.3%, and 83.4%, respectively). Similarly, the accuracy of the feature sets using five-fold cross-validation was above 90% for the V-CNN model, whereas that yielded by the SVM model was in the range of 75-77%. We thus concluded that the proposed V-CNN model outperforms the conventional SVM classifier, which strongly suggests that it is a reliable framework for robust colony classification of iPSCs. It can also serve as a cost-effective quality recognition tool during culture and other experimental

  3. Infrared variation reduction by simultaneous background suppression and target contrast enhancement for deep convolutional neural network-based automatic target recognition

    Science.gov (United States)

    Kim, Sungho

    2017-06-01

    Automatic target recognition (ATR) is a traditionally challenging problem in military applications because of the wide range of infrared (IR) image variations and the limited number of training images. IR variations are caused by various three-dimensional target poses, noncooperative weather conditions (fog and rain), and difficult target acquisition environments. Recently, deep convolutional neural network-based approaches for RGB images (RGB-CNN) showed breakthrough performance in computer vision problems, such as object detection and classification. The direct use of RGB-CNN to the IR ATR problem fails to work because of the IR database problems (limited database size and IR image variations). An IR variation-reduced deep CNN (IVR-CNN) to cope with the problems is presented. The problem of limited IR database size is solved by a commercial thermal simulator (OKTAL-SE). The second problem of IR variations is mitigated by the proposed shifted ramp function-based intensity transformation. This can suppress the background and enhance the target contrast simultaneously. The experimental results on the synthesized IR images generated by the thermal simulator (OKTAL-SE) validated the feasibility of IVR-CNN for military ATR applications.

  4. Does verbatim sentence recall underestimate the language competence of near-native speakers?

    Directory of Open Access Journals (Sweden)

    Judith Schweppe

    2015-02-01

    Full Text Available Verbatim sentence recall is widely used to test the language competence of native and non-native speakers, since it involves comprehension and production of connected speech. However, we assume that, to maintain surface information, sentence recall relies particularly on attentional resources, which differentially affects native and non-native speakers. Since language processing is less automatized even in near-native speakers than in native speakers, processing a sentence in a foreign language while also retaining its surface form may result in cognitive overload. We contrasted the sentence recall performance of German native speakers with that of highly proficient non-natives. Non-natives recalled the sentences significantly more poorly than the natives, but performed equally well on a cloze test. This implies that sentence recall underestimates the language competence of good non-native speakers in mixed groups with native speakers. The findings also suggest that theories of sentence recall need to consider both its linguistic and its attentional aspects.

  5. The speaker's formant.

    Science.gov (United States)

    Bele, Irene Velsvik

    2006-12-01

    The current study concerns speaking voice quality in two groups of professional voice users, teachers (n = 35) and actors (n = 36), representing trained and untrained voices. The voice quality of text reading at two intensity levels was acoustically analyzed. The central concept was the speaker's formant (SPF), related to the perceptual characteristics "better normal voice quality" (BNQ) and "worse normal voice quality" (WNQ). The purpose of the current study was to get closer to the origin of the phenomenon of the SPF, and to discover the differences in spectral and formant characteristics between the two professional groups and the two voice quality groups. The acoustic analyses were long-term average spectrum (LTAS) and spectrographical measurements of formant frequencies. At very high intensities, the spectral slope was rather quadrangular without a clear SPF peak. The trained voices had a higher energy level in the SPF region compared with the untrained, significantly so in loud phonation. The SPF seemed to be related to both sufficiently strong overtones and a glottal setting, allowing for a lowering of F4 and a closeness of F3 and F4. However, the existence of SPF also in LTAS of the WNQ voices implies that more research is warranted concerning the formation of SPF, and concerning the acoustic correlates of the BNQ voices.

  6. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar.

    Science.gov (United States)

    Shin, Young Hoon; Seo, Jiwon

    2016-10-29

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.
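    The idea of gating recognition on detected activity segments, as the abstract describes, can be illustrated with a generic energy-threshold detector. The paper's algorithm operates on radar signals and is certainly more elaborate; this is only a sketch of the concept, with all parameter values chosen for the toy example:

    ```python
    import numpy as np

    def detect_speech_segments(signal, frame_len=160, threshold_ratio=0.1):
        """Return (start, end) sample indices of contiguous active regions,
        where a frame is active if its mean energy exceeds a fraction of
        the maximum frame energy."""
        n = len(signal) // frame_len
        frames = signal[: n * frame_len].reshape(n, frame_len)
        energy = np.mean(frames ** 2, axis=1)
        active = energy > threshold_ratio * energy.max()
        segments, start = [], None
        for i, is_active in enumerate(active):
            if is_active and start is None:
                start = i                      # segment opens
            elif not is_active and start is not None:
                segments.append((start * frame_len, i * frame_len))
                start = None                   # segment closes
        if start is not None:                  # signal ended while active
            segments.append((start * frame_len, n * frame_len))
        return segments

    # Silence, a burst of "speech", then silence again.
    sig = np.concatenate([np.zeros(800), np.ones(800), np.zeros(800)])
    segs = detect_speech_segments(sig)
    print(segs)  # [(800, 1600)]
    ```

    Downstream recognition would then run only on the returned segments, which is the efficiency argument the abstract makes for its own activity detection step.
    
    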

  7. Shhh… I Need Quiet! Children's Understanding of American, British, and Japanese-accented English Speakers.

    Science.gov (United States)

    Bent, Tessa; Holt, Rachael Frush

    2018-02-01

    Children's ability to understand speakers with a wide range of dialects and accents is essential for efficient language development and communication in a global society. Here, the impact of regional dialect and foreign-accent variability on children's speech understanding was evaluated in both quiet and noisy conditions. Five- to seven-year-old children ( n = 90) and adults ( n = 96) repeated sentences produced by three speakers with different accents-American English, British English, and Japanese-accented English-in quiet or noisy conditions. Adults had no difficulty understanding any speaker in quiet conditions. Their performance declined for the nonnative speaker with a moderate amount of noise; their performance only substantially declined for the British English speaker (i.e., below 93% correct) when their understanding of the American English speaker was also impeded. In contrast, although children showed accurate word recognition for the American and British English speakers in quiet conditions, they had difficulty understanding the nonnative speaker even under ideal listening conditions. With a moderate amount of noise, their perception of British English speech declined substantially and their ability to understand the nonnative speaker was particularly poor. These results suggest that although school-aged children can understand unfamiliar native dialects under ideal listening conditions, their ability to recognize words in these dialects may be highly susceptible to the influence of environmental degradation. Fully adult-like word identification for speakers with unfamiliar accents and dialects may exhibit a protracted developmental trajectory.

  8. Arctic Visiting Speakers Series (AVS)

    Science.gov (United States)

    Fox, S. E.; Griswold, J.

    2011-12-01

    The Arctic Visiting Speakers (AVS) Series funds researchers and other arctic experts to travel and share their knowledge in communities where they might not otherwise connect. Speakers cover a wide range of arctic research topics and can address a variety of audiences including K-12 students, graduate and undergraduate students, and the general public. Host applications are accepted on an on-going basis, depending on funding availability. Applications need to be submitted at least 1 month prior to the expected tour dates. Interested hosts can choose speakers from an online Speakers Bureau or invite a speaker of their choice. Preference is given to individuals and organizations to host speakers that reach a broad audience and the general public. AVS tours are encouraged to span several days, allowing ample time for interactions with faculty, students, local media, and community members. Applications for both domestic and international visits will be considered. Applications for international visits should involve participation of more than one host organization and must include either a US-based speaker or a US-based organization. This is a small but important program that educates the public about Arctic issues. There have been 27 tours since 2007 that have impacted communities across the globe including: Gatineau, Quebec Canada; St. Petersburg, Russia; Piscataway, New Jersey; Cordova, Alaska; Nuuk, Greenland; Elizabethtown, Pennsylvania; Oslo, Norway; Inari, Finland; Borgarnes, Iceland; San Francisco, California and Wolcott, Vermont to name a few. Tours have included lectures to K-12 schools, college and university students, tribal organizations, Boy Scout troops, science center and museum patrons, and the general public. There are approximately 300 attendees enjoying each AVS tour; roughly 4100 people have been reached since 2007. The expectations for each tour are extremely manageable: hosts must submit a schedule of events and a tour summary to be posted online.

  9. Affective processing in bilingual speakers: disembodied cognition?

    Science.gov (United States)

    Pavlenko, Aneta

    2012-01-01

    A recent study by Keysar, Hayakawa, and An (2012) suggests that "thinking in a foreign language" may reduce decision biases because a foreign language provides a greater emotional distance than a native tongue. The possibility of such "disembodied" cognition is of great interest for theories of affect and cognition and for many other areas of psychological theory and practice, from clinical and forensic psychology to marketing, but first this claim needs to be properly evaluated. The purpose of this review is to examine the findings of clinical, introspective, cognitive, psychophysiological, and neuroimaging studies of affective processing in bilingual speakers in order to identify converging patterns of results, to evaluate the claim about "disembodied cognition," and to outline directions for future inquiry. The findings to date reveal two interrelated processing effects. First-language (L1) advantage refers to increased automaticity of affective processing in the L1 and heightened electrodermal reactivity to L1 emotion-laden words. Second-language (L2) advantage refers to decreased automaticity of affective processing in the L2, which reduces interference effects and lowers electrodermal reactivity to negative emotional stimuli. The differences in L1 and L2 affective processing suggest that in some bilingual speakers, in particular late bilinguals and foreign language users, respective languages may be differentially embodied, with the later learned language processed semantically but not affectively. This difference accounts for the reduction of framing biases in L2 processing in the study by Keysar et al. (2012). The follow-up discussion identifies the limits of the findings to date in terms of participant populations, levels of processing, and types of stimuli, puts forth alternative explanations of the documented effects, and articulates predictions to be tested in future research.

  10. Voice congruency facilitates word recognition.

    Directory of Open Access Journals (Sweden)

    Sandra Campeanu

    Full Text Available Behavioral studies of spoken word memory have shown that context congruency facilitates both word and source recognition, though the level at which context exerts its influence remains equivocal. We measured event-related potentials (ERPs) while participants performed both types of recognition task with words spoken in four voices. Two voice parameters (i.e., gender and accent) varied between speakers, with the possibility that none, one or two of these parameters was congruent between study and test. Results indicated that reinstating the study voice at test facilitated both word and source recognition, compared to similar or no context congruency at test. Behavioral effects were paralleled by two ERP modulations. First, in the word recognition test, the left parietal old/new effect showed a positive deflection reflective of context congruency between study and test words. Namely, the same speaker condition provided the most positive deflection of all correctly identified old words. In the source recognition test, a right frontal positivity was found for the same speaker condition compared to the different speaker conditions, regardless of response success. Taken together, the results of this study suggest that the benefit of context congruency is reflected behaviorally and in ERP modulations traditionally associated with recognition memory.

  11. Voice congruency facilitates word recognition.

    Science.gov (United States)

    Campeanu, Sandra; Craik, Fergus I M; Alain, Claude

    2013-01-01

    Behavioral studies of spoken word memory have shown that context congruency facilitates both word and source recognition, though the level at which context exerts its influence remains equivocal. We measured event-related potentials (ERPs) while participants performed both types of recognition task with words spoken in four voices. Two voice parameters (i.e., gender and accent) varied between speakers, with the possibility that none, one or two of these parameters was congruent between study and test. Results indicated that reinstating the study voice at test facilitated both word and source recognition, compared to similar or no context congruency at test. Behavioral effects were paralleled by two ERP modulations. First, in the word recognition test, the left parietal old/new effect showed a positive deflection reflective of context congruency between study and test words. Namely, the same speaker condition provided the most positive deflection of all correctly identified old words. In the source recognition test, a right frontal positivity was found for the same speaker condition compared to the different speaker conditions, regardless of response success. Taken together, the results of this study suggest that the benefit of context congruency is reflected behaviorally and in ERP modulations traditionally associated with recognition memory.

  12. Proper Names and Named Entities Recognition in the Automatic Text Processing. Review of the book: Nouvel, D., Ehrmann, M., & Rosset, S. (2016). Named Entities for Computational Linguistics. London; Hoboken: ISTE Ltd; John Wiley & Sons, Inc., 2016.

    Directory of Open Access Journals (Sweden)

    Daria M. Golikova

    2018-03-01

    Full Text Available The reviewed book by Damien Nouvel, Maud Ehrmann, and Sophie Rosset Named Entities for Computational Linguistics deals with automatic processing of texts, written in a natural language, and with named entities recognition, aimed at extracting most important information in these texts. The notion of named entities here extends to the entire set of linguistic units referring to an object. The researchers minutely consider the concept of named entities, juxtaposing this category to that of proper names and comparing their definitions, and describe all the stages of creation and implementation of automatic text annotation algorithms, as well as different ways of evaluating their performance quality. Proper names, in this context, are seen as a particular instance of named entities, one of the typical sources of reference to real objects to be electronically recognized in the text. The book provides a detailed overview and analysis of previous studies in the same field, based mainly on the English language data. It presents instruments and resources required to create and implement the algorithms in question, these may include typologies, knowledge or databases, and various types of corpora. Theoretical considerations, proposed by the authors, are supported by a significant number of exemplary cases, with algorithms operation principles presented in charts. The reviewed book gives quite a comprehensive picture of modern computational linguistic studies focused on named entities recognition and indicates some problems which are unresolved as yet.

  13. Google Home: smart speaker as environmental control unit.

    Science.gov (United States)

    Noda, Kenichiro

    2017-08-23

    Environmental Control Units (ECU) are devices or systems that allow a person to control appliances in their home or work environment. Such a system can be utilized by clients with physical and/or functional disabilities to enhance their ability to control their environment, to promote independence and to improve their quality of life. Over the last several years, there has been an emergence of several inexpensive, commercially available, voice-activated smart speakers on the market, such as Google Home and Amazon Echo. These smart speakers are equipped with far-field microphones that support voice recognition and allow for completely hands-free operation for various purposes, including playing music, information retrieval and, most importantly, environmental control. Clients with disability could utilize these features to turn the unit into a simple ECU that is completely voice activated and wirelessly connected to appliances. Smart speakers, with their ease of setup, low cost and versatility, may be a more affordable and accessible alternative to the traditional ECU. Implications for Rehabilitation: Environmental Control Units (ECU) enable independence for physically and functionally disabled clients, and reduce the burden and frequency of demands on carers. Traditional ECUs can be costly and may require clients to learn specialized skills to use. Smart speakers have the potential to be used as a new-age ECU by overcoming these barriers, and can be used by a wider range of clients.

  14. SU-D-201-05: On the Automatic Recognition of Patient Safety Hazards in a Radiotherapy Setup Using a Novel 3D Camera System and a Deep Learning Framework

    Energy Technology Data Exchange (ETDEWEB)

    Santhanam, A; Min, Y; Beron, P; Agazaryan, N; Kupelian, P; Low, D [UCLA, Los Angeles, CA (United States)

    2016-06-15

    Purpose: Patient safety hazards such as a wrong patient/site getting treated can lead to catastrophic results. The purpose of this project is to automatically detect potential patient safety hazards during the radiotherapy setup and alert the therapist before the treatment is initiated. Methods: We employed a set of co-located and co-registered 3D cameras placed inside the treatment room. Each camera provided a point-cloud of fraxels (fragment pixels with 3D depth information). Each camera was calibrated using a custom-built calibration target to provide 3D information with less than 2 mm error in the 500 mm neighborhood around the isocenter. To identify potential patient safety hazards, the treatment room components and the patient's body needed to be identified and tracked in real-time. For feature recognition purposes, we used graph-cut based feature recognition with principal component analysis (PCA) based feature-to-object correlation to segment the objects in real-time. Changes in an object's position were tracked using the CamShift algorithm. The 3D object information was then stored for each classified object (e.g. gantry, couch). A deep learning framework was then used to analyze all the classified objects in both 2D and 3D and to fine-tune a convolutional network for object recognition. The number of network layers was optimized to identify the tracked objects with >95% accuracy. Results: Our systematic analyses showed that the system was able to effectively recognize wrong patient setups and wrong patient accessories. The combined usage of 2D camera information (color + depth) enabled a topology-preserving approach to verify patient safety hazards automatically, even in scenarios where the depth information is only partially available. Conclusion: By utilizing the 3D cameras inside the treatment room and deep learning based image classification, potential patient safety hazards can be effectively avoided.
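The CamShift tracking step mentioned above can be illustrated with a minimal mean-shift sketch. This is an assumption-laden stand-in (synthetic probability map, fixed hand-picked window size), not the authors' implementation:

```python
import numpy as np

# Mean-shift update at the core of CamShift: the search window is
# repeatedly re-centred on the centroid of the object-probability mass
# it currently contains. The back-projection map here is synthetic.
prob = np.zeros((240, 320))        # P(pixel belongs to tracked object)
prob[80:160, 100:200] = 1.0        # synthetic "object" region

x, y, w, h = 20, 20, 120, 100      # initial window (x, y, width, height)
for _ in range(20):                # iterate until the window stops moving
    patch = prob[y:y + h, x:x + w]
    mass = patch.sum()
    if mass == 0:
        break
    ys, xs = np.mgrid[y:y + h, x:x + w]
    cx = (xs * patch).sum() / mass  # centroid of probability mass
    cy = (ys * patch).sum() / mass
    nx, ny = int(cx - w / 2), int(cy - h / 2)
    if (nx, ny) == (x, y):          # converged
        break
    x, y = max(nx, 0), max(ny, 0)

print((x, y))                       # window now centred on the object
```

OpenCV's `cv2.CamShift` additionally adapts the window size and orientation at each step; the fixed-size loop above only shows the re-centring idea.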

  15. SU-D-201-05: On the Automatic Recognition of Patient Safety Hazards in a Radiotherapy Setup Using a Novel 3D Camera System and a Deep Learning Framework

    International Nuclear Information System (INIS)

    Santhanam, A; Min, Y; Beron, P; Agazaryan, N; Kupelian, P; Low, D

    2016-01-01

    Purpose: Patient safety hazards such as a wrong patient/site getting treated can lead to catastrophic results. The purpose of this project is to automatically detect potential patient safety hazards during the radiotherapy setup and alert the therapist before the treatment is initiated. Methods: We employed a set of co-located and co-registered 3D cameras placed inside the treatment room. Each camera provided a point-cloud of fraxels (fragment pixels with 3D depth information). Each camera was calibrated using a custom-built calibration target to provide 3D information with less than 2 mm error in the 500 mm neighborhood around the isocenter. To identify potential patient safety hazards, the treatment room components and the patient's body needed to be identified and tracked in real-time. For feature recognition purposes, we used graph-cut based feature recognition with principal component analysis (PCA) based feature-to-object correlation to segment the objects in real-time. Changes in an object's position were tracked using the CamShift algorithm. The 3D object information was then stored for each classified object (e.g. gantry, couch). A deep learning framework was then used to analyze all the classified objects in both 2D and 3D and to fine-tune a convolutional network for object recognition. The number of network layers was optimized to identify the tracked objects with >95% accuracy. Results: Our systematic analyses showed that the system was able to effectively recognize wrong patient setups and wrong patient accessories. The combined usage of 2D camera information (color + depth) enabled a topology-preserving approach to verify patient safety hazards automatically, even in scenarios where the depth information is only partially available. Conclusion: By utilizing the 3D cameras inside the treatment room and deep learning based image classification, potential patient safety hazards can be effectively avoided.

  16. Forensic Face Recognition: A Survey

    NARCIS (Netherlands)

    Ali, Tauseef; Spreeuwers, Lieuwe Jan; Veldhuis, Raymond N.J.; Quaglia, Adamo; Epifano, Calogera M.

    2012-01-01

    The improvements in automatic face recognition over the last two decades have opened up new applications such as border control and camera surveillance. A new application field is forensic face recognition. Traditionally, face recognition by human experts has been used in forensics, but now there is a

  17. Upgrade of the automatic analysis system in the TJ-II Thomson Scattering diagnostic: New image recognition classifier and fault condition detection

    NARCIS (Netherlands)

    Makili, L.; Vega, J.; Dormido-Canto, S.; Pastor, I.; Pereira, A.; Farias, G.; Portas, A.; Perez-Risco, D.; Rodriguez-Fernandez, M. C.; Busch, P.

    2010-01-01

    An automatic image classification system based on support vector machines (SVM) has been in operation for years in the TJ-II Thomson Scattering diagnostic. It recognizes five different types of images: CCD camera background, measurement of stray light without plasma or in a collapsed discharge,

  18. The native-speaker fever in English language teaching (ELT): Pitting pedagogical competence against historical origin

    Directory of Open Access Journals (Sweden)

    Anchimbe, Eric A.

    2006-01-01

    Full Text Available This paper discusses English language teaching (ELT) around the world, and argues that as a profession, it should emphasise pedagogical competence rather than the native-speaker requirement in the recruitment of teachers in English as a foreign language (EFL) and English as a second language (ESL) contexts. It establishes that being a native speaker does not automatically make one a competent speaker or, for that matter, a competent teacher of the language. It observes that on many grounds, including physical, sociocultural, technological and economic changes in the world, as well as the status of English as an official and national language in many post-colonial regions, the distinction between native and non-native speakers is no longer valid.

  19. Developing a Speaker Identification System for the DARPA RATS Project

    DEFF Research Database (Denmark)

    Plchot, O; Matsoukas, S; Matejka, P

    2013-01-01

    This paper describes the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state of the art detection capabilities on audio from highly degraded communication channels. ...... such as CFCCs out-perform MFCC front-ends on noisy audio, and (c) fusion of multiple systems provides 24% relative improvement in EER compared to the single best system when using a novel SVM-based fusion algorithm that uses side information such as gender, language, and channel id....
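The EER figures quoted here can be made concrete with a small sketch; the score lists below are invented placeholders, not RATS data. The equal error rate is the operating point where the false-accept rate (impostors accepted) meets the false-reject rate (true speakers rejected):

```python
import numpy as np

# Hedged sketch of the EER metric: sweep the decision threshold over all
# observed scores and keep the point where FAR and FRR are closest.
genuine = np.array([0.9, 0.8, 0.7, 0.6, 0.4])   # scores for true speakers
impostor = np.array([0.5, 0.3, 0.2, 0.1])       # scores for impostors

thresholds = np.sort(np.concatenate([genuine, impostor]))
best = None
for t in thresholds:
    far = float((impostor >= t).mean())   # impostors accepted at threshold t
    frr = float((genuine < t).mean())     # true speakers rejected at t
    if best is None or abs(far - frr) < abs(best[1] - best[2]):
        best = (t, far, frr)

print(best)   # (threshold, FAR, FRR) at the closest-to-equal point
```

On real systems the threshold sweep is done on a fine grid (or a DET curve) and the EER is interpolated where the two rates cross exactly.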

  20. Speech overlap detection in a two-pass speaker diarization system

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; Leeuwen, D.A. van; Jong, F. M. G de

    2009-01-01

    In this paper we present the two-pass speaker diarization system that we developed for the NIST RT09s evaluation. In the first pass of our system a model for speech overlap detection is generated automatically. This model is used in two ways to reduce the diarization errors due to overlapping

  1. Speech overlap detection in a two-pass speaker diarization system

    NARCIS (Netherlands)

    Huijbregts, M.; Leeuwen, D.A. van; Jong, F.M.G. de

    2009-01-01

    In this paper we present the two-pass speaker diarization system that we developed for the NIST RT09s evaluation. In the first pass of our system a model for speech overlap detection is generated automatically. This model is used in two ways to reduce the diarization errors due to overlapping

  2. Automatic recognition of damaged town buildings caused by earthquake using remote sensing information: Taking the 2001 Bhuj, India, earthquake and the 1976 Tangshan, China, earthquake as examples

    Science.gov (United States)

    Liu, Jia-Hang; Shan, Xin-Jian; Yin, Jing-Yuan

    2004-11-01

    In high-resolution images, undamaged buildings generally show a natural textural feature, while damaged or semi-damaged buildings exhibit low-grayscale blocks at their coarsely damaged sections. If a proper threshold is used to classify the grayscale of the image, independent holes appear in the damaged regions. By using statistical information such as the number of holes in every region, or the ratio between the area of the holes and that of the region, the damaged buildings can be separated from the undamaged ones, and automatic detection of damaged buildings can thus be realized. Based on these characteristics, a new method to automatically detect damaged buildings using regional structure and statistical texture information is presented in this paper. To test its validity, a 1-m-resolution IKONOS merged image of the 2001 Bhuj earthquake and grayscale aerial photos of the 1976 Tangshan earthquake are used as two examples for automatic detection of damaged buildings. Satisfactory results are obtained.
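The hole-statistics idea above reduces to a simple per-region computation. The sketch below (synthetic roof textures and an assumed grayscale threshold, not the authors' code) flags a region as damaged when the ratio of dark "hole" pixels exceeds a cutoff:

```python
import numpy as np

# Pixels below a grayscale threshold inside a building region are treated
# as "holes"; a region whose hole-area ratio exceeds a cutoff is flagged
# as damaged. Regions, threshold and cutoff here are synthetic assumptions.
rng = np.random.default_rng(0)

def hole_ratio(region, gray_threshold=60):
    """Fraction of the region's pixels darker than the threshold."""
    return float((region < gray_threshold).mean())

# Intact roof: uniformly bright texture. Damaged roof: the same texture
# with dark low-grayscale blocks where sections have collapsed.
intact = rng.integers(120, 200, size=(64, 64))
damaged = intact.copy()
damaged[10:30, 10:40] = rng.integers(0, 40, size=(20, 30))

for name, region in [("intact", intact), ("damaged", damaged)]:
    ratio = hole_ratio(region)
    label = "damaged" if ratio > 0.05 else "undamaged"
    print(f"{name}: hole ratio {ratio:.2f} -> {label}")
```

A production version would first segment building regions (e.g. with connected components) and could also count distinct holes per region, as the abstract suggests.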

  3. What makes a charismatic speaker?

    DEFF Research Database (Denmark)

    Niebuhr, Oliver; Voße, Jana; Brem, Alexander

    2016-01-01

    The former Apple CEO Steve Jobs was one of the most charismatic speakers of the past decades. However, there is, as yet, no detailed quantitative profile of his way of speaking. We used state-of-the-art computer techniques to acoustically analyze his speech behavior and relate it to reference...... samples. Our paper provides the first-ever acoustic profile of Steve Jobs, based on about 4000 syllables and 12,000 individual speech sounds from his two most outstanding and well-known product presentations: the introductions of the iPhone 4 and the iPad 2. Our results show that Steve Jobs stands out...

  4. Speaker identification for the improvement of the security communication between law enforcement units

    Science.gov (United States)

    Tovarek, Jaromir; Partila, Pavol

    2017-05-01

    This article discusses speaker identification for the improvement of secure communication between law enforcement units. The main task of this research was to develop a text-independent speaker identification system which can be used for real-time recognition. The system is designed for identification in an open set, meaning that the unknown speaker can be anyone. Communication itself is secured, but the authorization of the communicating parties has to be checked: we have to decide whether the unknown speaker is authorized for the given action. The calls are recorded by an IP telephony server and these recordings are then evaluated by classification. If the system determines that the speaker is not authorized, it sends a warning message to the administrator. This can reveal, for example, a stolen phone or another unusual situation, and the administrator then performs the appropriate actions. Our proposed system uses a multilayer neural network for classification, consisting of three layers (input layer, hidden layer, and output layer). The number of neurons in the input layer corresponds to the length of the speech feature vector, and the output layer represents the classified speakers. The artificial neural network classifies the speech signal frame by frame, but the final decision is made over the complete recording; this rule substantially increases the accuracy of the classification. The input data for the neural network are thirteen Mel-frequency cepstral coefficients (MFCCs), which describe the behavior of the vocal tract and are the parameters most widely used for speaker recognition. Parameters for training, testing and validation were extracted from recordings of authorized users. Recording conditions for the training data correspond to the real traffic of the system (sampling frequency, bit rate). The main benefit of the research is a system for text-independent speaker identification that can be applied to secure communication between law enforcement units.
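The frame-by-frame classification with a record-level decision described above can be sketched as follows; the per-frame scores are random stand-ins for MLP outputs (an assumption, not the authors' network):

```python
import numpy as np

# The network scores each frame separately; the speaker for the whole
# recording is chosen by aggregating the per-frame decisions, which is
# far more robust than trusting any single noisy frame.
rng = np.random.default_rng(1)
n_frames, n_speakers = 200, 4

# Stand-in for per-frame MLP outputs: speaker 2 wins on most frames,
# but individual frames are noisy.
frame_scores = rng.random((n_frames, n_speakers))
frame_scores[:, 2] += 0.3

per_frame = frame_scores.argmax(axis=1)                       # noisy frame decisions
record_decision = np.bincount(per_frame, minlength=n_speakers).argmax()
print(record_decision)                                        # majority speaker
```

Averaging the raw scores over frames before the argmax is an equally common variant of the same record-level rule.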

  5. A REVIEW: OPTICAL CHARACTER RECOGNITION

    OpenAIRE

    Swati Tomar*1 & Amit Kishore2

    2018-01-01

    This paper presents a detailed review of the field of Optical Character Recognition. Various techniques that have been proposed to realize the core of character recognition in an optical character recognition system are examined. Numerous studies and papers describe techniques for converting textual content from a paper document into machine-readable form. Optical character recognition is a process where the computer understands automatically the image of handwritten ...

  6. Speaker Segmentation and Clustering Using Gender Information

    Science.gov (United States)

    2006-02-01

    Gender information is used in the first stages of segmentation and in the clustering of opposite-gender files in speaker diarization of news broadcasts. (AFRL-HE-WP-TP-2006-0026, Air Force Research Laboratory; Brian M. Ore, General Dynamics; proceedings, February 2006.)

  7. Foreign Language Analysis and Recognition (FLARe) Initial Progress

    Science.gov (United States)

    2012-11-29

    The speaker diarization code was optimized to execute faster and yield a lower Diarization Error Rate (DER). Minimizing the file read and write operations... PLP features were calculated using the same procedure described in Section 2.1.1. A second set of models was estimated that include Speaker Adaptive... non-SAT HMMs. Constrained Maximum Likelihood Linear Regression (CMLLR) transforms were estimated for each speaker, and recognition lattices were

  8. Markov Models for Handwriting Recognition

    CERN Document Server

    Plotz, Thomas

    2011-01-01

    Since their first inception, automatic reading systems have evolved substantially, yet the recognition of handwriting remains an open research problem due to its substantial variation in appearance. With the introduction of Markovian models to the field, a promising modeling and recognition paradigm was established for automatic handwriting recognition. However, no standard procedures for building Markov model-based recognizers have yet been established. This text provides a comprehensive overview of the application of Markov models in the field of handwriting recognition, covering both hidden

  9. Electrolarynx Voice Recognition Utilizing Pulse Coupled Neural Network

    Directory of Open Access Journals (Sweden)

    Fatchul Arifin

    2010-08-01

    Full Text Available Laryngectomy patients are unable to speak normally because their vocal cords have been removed. The easiest option for such patients to speak again is electrolarynx speech. This tool is placed on the lower chin, and vibration of the neck while speaking is used to produce sound. Meanwhile, voice recognition technology has been growing very rapidly, and it is expected that it can also be used by laryngectomy patients who use an electrolarynx. This paper describes a system for electrolarynx speech recognition. The two main parts of the system are feature extraction and pattern recognition. A Pulse Coupled Neural Network (PCNN) is used to extract the features and characteristics of electrolarynx speech, and the effect of varying β (one of the PCNN parameters) was also investigated. A multilayer perceptron is used to recognize the sound patterns. Two kinds of recognition are conducted in this paper: speech recognition and speaker recognition. Speech recognition recognizes specific speech from every person, while speaker recognition recognizes specific speech from a specific person. The system ran well. The electrolarynx speech recognition was tested by recognizing "A" and "not A" voices, and the system achieved 94.4% validation accuracy. The electrolarynx speaker recognition was tested by recognizing the "saya" voice from several different speakers, and the system achieved 92.2% validation accuracy. The best β parameter of the PCNN for electrolarynx recognition is 3.
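As an illustration of the PCNN feature-extraction stage, here is a minimal sketch under assumed decay and gain constants (not the authors' parameter values). The per-iteration firing counts form the kind of signature a classifier can consume, with β as the linking coefficient the abstract varies:

```python
import numpy as np

# Illustrative PCNN sketch: each neuron has a feeding input F and a
# linking input L, an internal activity U = F * (1 + beta * L), and a
# dynamic threshold T that jumps after the neuron fires. All decay/gain
# constants below are assumptions chosen for the demo.
def pcnn_signature(img, beta=3.0, iters=20):
    S = img.astype(float) / (img.max() + 1e-9)   # normalised stimulus
    F = np.zeros_like(S); L = np.zeros_like(S)
    T = np.ones_like(S);  Y = np.zeros_like(S)
    signature = []
    for _ in range(iters):
        # 3x3 neighbourhood sum of the previous firings (linking field)
        W = sum(np.roll(np.roll(Y, dy, axis=0), dx, axis=1)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1))
        F = 0.7 * F + S + 0.1 * W        # feeding: decays, driven by stimulus
        L = 0.6 * L + 0.2 * W            # linking: decays, driven by neighbours
        U = F * (1.0 + beta * L)         # modulated internal activity
        Y = (U > T).astype(float)        # fire where activity beats threshold
        T = 0.8 * T + 5.0 * Y            # threshold jumps after firing
        signature.append(int(Y.sum()))   # neurons firing this iteration
    return signature

# synthetic "frame" with a bright blob on a dark background
img = np.zeros((32, 32)); img[8:24, 8:24] = 1.0
sig = pcnn_signature(img, beta=3.0)
print(sig[:5])
```

The resulting time series of firing counts is what would be fed to the multilayer perceptron in a setup like the one the abstract describes.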

  10. Speaker's presentations. Energy supply security

    International Nuclear Information System (INIS)

    Pierret, Ch.

    2000-01-01

    This document is a collection of most of the papers used by the speakers at the European Seminar on Energy Supply Security organised in Paris (at the French Ministry of Economy, Finance and Industry) on 24 November 2000 by the General Direction of Energy and Raw Materials, in co-operation with the European Commission and the French Planning Office. About 250 attendees were present, including many high-level civil servants from the 15 European member states, and their questions gave rise to a rich debate. It took place five days before the publication, on 29 November 2000, by the European Commission, of the Green Paper 'Towards a European Strategy for the Security of Energy Supply'. This French initiative, which took place within the framework of the French Presidency of the European Union during the second half of 2000, will give a first impetus to the brainstorming launched by the Commission. (author)

  11. Speaker-specific variability of phoneme durations

    CSIR Research Space (South Africa)

    Van Heerden, CJ

    2007-11-01

    Full Text Available The durations of phonemes vary across speakers. The correlations between phoneme durations across different speakers are therefore studied, along with a novel approach to predicting unknown phoneme durations from the values of known phoneme durations for a...

  12. Automatic Imitation

    Science.gov (United States)

    Heyes, Cecilia

    2011-01-01

    "Automatic imitation" is a type of stimulus-response compatibility effect in which the topographical features of task-irrelevant action stimuli facilitate similar, and interfere with dissimilar, responses. This article reviews behavioral, neurophysiological, and neuroimaging research on automatic imitation, asking in what sense it is "automatic"…

  13. Apology Strategy in English By Native Speaker

    Directory of Open Access Journals (Sweden)

    Mezia Kemala Sari

    2016-05-01

    Full Text Available This research discusses apology strategies in English by native speakers. This descriptive study is presented within the framework of pragmatics, based on the forms of strategies in the coding manual of the CCSARP (Cross-Cultural Speech Act Realization Project). The goals of this study were to describe the apology strategies used in English by native speakers and to identify the factors influencing them. Data were collected through a questionnaire in the form of a Discourse Completion Test, which was distributed to 30 native speakers. Data were classified based on the degree of familiarity and the social distance between speaker and hearer, and the native-speaker data were then classified by strategy type according to the coding manual. The results show that native speakers' apologies are brief, with the most frequent pattern being IFID plus Offer of Repair plus Taking on Responsibility, while Alerters, Explanation and Downgrading appear with lower percentages. The factors that influence the apology utterances of native speakers are the social situation, the degree of familiarity and the degree of the offence: the more serious the offence, the more complex the utterance produced by the speaker.

  14. Forensic Face Recognition: A Survey

    NARCIS (Netherlands)

    Ali, Tauseef; Veldhuis, Raymond N.J.; Spreeuwers, Lieuwe Jan

    2010-01-01

    Beside a few papers which focus on the forensic aspects of automatic face recognition, there is not much published about it in contrast to the literature on developing new techniques and methodologies for biometric face recognition. In this report, we review forensic facial identification which is

  15. On the optimization of a mixed speaker array in an enclosed space using the virtual-speaker weighting method

    Science.gov (United States)

    Peng, Bo; Zheng, Sifa; Liao, Xiangning; Lian, Xiaomin

    2018-03-01

    In order to achieve sound field reproduction in a wide frequency band, multiple-type speakers are used. The reproduction accuracy is not only affected by the signals sent to the speakers, but also depends on the position and the number of each type of speaker. The method of optimizing a mixed speaker array is investigated in this paper. A virtual-speaker weighting method is proposed to optimize both the position and the number of each type of speaker. In this method, a virtual-speaker model is proposed to quantify the increment of controllability of the speaker array when the speaker number increases. While optimizing a mixed speaker array, the gain of the virtual-speaker transfer function is used to determine the priority orders of the candidate speaker positions, which optimizes the position of each type of speaker. Then the relative gain of the virtual-speaker transfer function is used to determine whether the speakers are redundant, which optimizes the number of each type of speaker. Finally the virtual-speaker weighting method is verified by reproduction experiments of the interior sound field in a passenger car. The results validate that the optimum mixed speaker array can be obtained using the proposed method.
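The position-ranking and redundancy test described above can be mimicked with a greedy selection sketch. The transfer matrix and the 5% cutoff below are random/assumed placeholders standing in for the virtual-speaker transfer-function gains, not the paper's model:

```python
import numpy as np

# Greedy speaker selection: candidate positions are ranked by how much
# adding each one reduces the reproduction error at the control points,
# and selection stops once the relative gain of the next speaker falls
# below a cutoff (our stand-in for the "redundant speaker" criterion).
rng = np.random.default_rng(3)
n_points, n_candidates = 12, 8
H = rng.standard_normal((n_points, n_candidates))   # point-to-speaker transfer
target = rng.standard_normal(n_points)              # desired sound field

def residual(cols):
    """Least-squares reproduction error using only the chosen speakers."""
    if not cols:
        return float(np.linalg.norm(target))
    Hs = H[:, cols]
    q, *_ = np.linalg.lstsq(Hs, target, rcond=None)
    return float(np.linalg.norm(Hs @ q - target))

chosen = []
err = residual(chosen)
while len(chosen) < n_candidates:
    best = min((c for c in range(n_candidates) if c not in chosen),
               key=lambda c: residual(chosen + [c]))
    new_err = residual(chosen + [best])
    if (err - new_err) / err < 0.05:     # next speaker is redundant
        break
    chosen.append(best)
    err = new_err

print(len(chosen), round(err, 3))
```

The actual virtual-speaker weighting method works on transfer-function gains in an enclosed sound field rather than a random least-squares toy, but the ranking/stopping structure is the same.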

  16. Effects of emotional and perceptual-motor stress on a voice recognition system's accuracy: An applied investigation

    Science.gov (United States)

    Poock, G. K.; Martin, B. J.

    1984-02-01

    This was an applied investigation examining the ability of a speech recognition system to recognize speakers' inputs when the speakers were under different stress levels. Subjects were asked to speak to a voice recognition system under three conditions: (1) normal office environment, (2) emotional stress, and (3) perceptual-motor stress. Results indicate a definite relationship between voice recognition system performance and the type of low stress reference patterns used to achieve recognition.

  17. Pitch Correlogram Clustering for Fast Speaker Identification

    Directory of Open Access Journals (Sweden)

    Nitin Jhanwar

    2004-12-01

    Full Text Available Gaussian mixture models (GMMs) are commonly used in text-independent speaker identification systems. However, for large speaker databases, their high computational run-time limits their use in online or real-time speaker identification situations. Two-stage identification systems, in which the database is partitioned into clusters based on some proximity criteria and only a single-cluster GMM is run in every test, have been suggested in the literature to speed up the identification process. However, most clustering algorithms used have shown limited success, apparently because the clustering and GMM feature spaces used are derived from similar speech characteristics. This paper presents a new clustering approach based on the concept of a pitch correlogram that captures frame-to-frame pitch variations of a speaker rather than short-time spectral characteristics such as cepstral coefficients or spectral slopes. The effectiveness of this two-stage identification process is demonstrated on the IVIE corpus of 110 speakers. The overall system achieves a run-time advantage of 500% as well as a 10% reduction of error in overall speaker identification.
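The two-stage structure can be sketched with synthetic stand-ins (random vectors instead of GMMs and real pitch correlograms; names like `pitch_sig` are illustrative): stage one picks a cluster from a coarse pitch-dynamics signature, stage two scores only that cluster's speaker models:

```python
import numpy as np

# Stage 1: a coarse pitch-dynamics signature selects a cluster.
# Stage 2: full-feature scoring runs only inside that cluster, so each
# test touches only a fraction of the speaker database.
rng = np.random.default_rng(4)
n_speakers, dim, n_clusters = 12, 8, 3
models = rng.standard_normal((n_speakers, dim))          # per-speaker model stand-ins
pitch_sig = models[:, :2] + 0.1 * rng.standard_normal((n_speakers, 2))

# crude clustering of the pitch signatures: split them in sorted order
order = np.argsort(pitch_sig[:, 0])
clusters = np.array_split(order, n_clusters)

def identify(test_feat, test_pitch):
    # stage 1: nearest pitch signature decides which cluster to search
    nearest = int(np.argmin(np.linalg.norm(pitch_sig - test_pitch, axis=1)))
    members = next(c for c in clusters if nearest in c)
    # stage 2: full-feature scoring restricted to that cluster
    best = members[np.argmin(np.linalg.norm(models[members] - test_feat, axis=1))]
    return int(best), len(members)

true_spk = 7
spk, scored = identify(models[true_spk] + 0.05 * rng.standard_normal(dim),
                       pitch_sig[true_spk])
print(spk, scored)   # models scored per test << n_speakers
```

In the paper's setting the stage-2 scoring is a GMM log-likelihood rather than a nearest-neighbour distance, but the run-time saving comes from the same restriction of the search to one cluster.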

  18. Speakers' choice of frame in binary choice

    Directory of Open Access Journals (Sweden)

    Marc van Buiten

    2009-02-01

    Full Text Available A distinction is proposed between "recommending for" preferred choice options and "recommending against" non-preferred choice options. In binary choice, both recommendation modes are logically, though not psychologically, equivalent. We report empirical evidence showing that speakers recommending for preferred options predominantly select positive frames, which are less common when speakers recommend against non-preferred options. In addition, option attractiveness is shown to affect speakers' choice of frame, and adoption of recommendation mode. The results are interpreted in terms of three compatibility effects: (i) recommendation mode - valence framing compatibility: speakers' preference for positive framing is enhanced under "recommending for" and diminished under "recommending against" instructions; (ii) option attractiveness - valence framing compatibility: speakers' preference for positive framing is more pronounced for attractive than for unattractive options; and (iii) recommendation mode - option attractiveness compatibility: speakers are more likely to adopt a "recommending for" approach for attractive than for unattractive binary choice pairs.

  19. Audiovisual perceptual learning with multiple speakers.

    Science.gov (United States)

    Mitchel, Aaron D; Gerfen, Chip; Weiss, Daniel J

    2016-05-01

    One challenge for speech perception is between-speaker variability in the acoustic parameters of speech. For example, the same phoneme (e.g. the vowel in "cat") may have substantially different acoustic properties when produced by two different speakers and yet the listener must be able to interpret these disparate stimuli as equivalent. Perceptual tuning, the use of contextual information to adjust phonemic representations, may be one mechanism that helps listeners overcome obstacles they face due to this variability during speech perception. Here we test whether visual contextual cues to speaker identity may facilitate the formation and maintenance of distributional representations for individual speakers, allowing listeners to adjust phoneme boundaries in a speaker-specific manner. We familiarized participants to an audiovisual continuum between /aba/ and /ada/. During familiarization, the "b-face" mouthed /aba/ when an ambiguous token was played, while the "D-face" mouthed /ada/. At test, the same ambiguous token was more likely to be identified as /aba/ when paired with a stilled image of the "b-face" than with an image of the "D-face." This was not the case in the control condition when the two faces were paired equally with the ambiguous token. Together, these results suggest that listeners may form speaker-specific phonemic representations using facial identity cues.

  20. The problem of automatic identification of concepts

    International Nuclear Information System (INIS)

    Andreewsky, Alexandre

    1975-11-01

    This paper deals with the problem of the automatic recognition of concepts and describes an important language tool, the ''linguistic filter'', which facilitates the construction of statistical algorithms. Certain special filters, of prepositions, conjunctions, negatives, logical implication, compound words, are presented. This is followed by a detailed description of a statistical algorithm allowing recognition of pronoun referents, and finally the problem of the automatic treatment of negatives in French is discussed [fr

  1. ISOLATED SPEECH RECOGNITION SYSTEM FOR TAMIL LANGUAGE USING STATISTICAL PATTERN MATCHING AND MACHINE LEARNING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    VIMALA C.

    2015-05-01

    Full Text Available In recent years, speech technology has become a vital part of our daily lives. Various techniques have been proposed for developing Automatic Speech Recognition (ASR) systems and have achieved great success in many applications. Among them, template matching techniques like Dynamic Time Warping (DTW), statistical pattern matching techniques such as Hidden Markov Models (HMM) and Gaussian Mixture Models (GMM), and machine learning techniques such as Neural Networks (NN), Support Vector Machines (SVM), and Decision Trees (DT) are most popular. The main objective of this paper is to design and develop a speaker-independent isolated speech recognition system for the Tamil language using the above speech recognition techniques. The background of ASR systems, the steps involved in ASR, the merits and demerits of the conventional and machine learning algorithms, and the observations made based on the experiments are presented in this paper. For the developed system, the highest word recognition accuracy is achieved with the HMM technique: it offered 100% accuracy during the training process and 97.92% during testing.
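Among the techniques listed, DTW is the easiest to show in full. The sketch below computes a classic DTW distance between two one-dimensional feature sequences; real systems align per-frame MFCC vectors, so the scalar features here are a simplification:

```python
import numpy as np

# Dynamic Time Warping: find the minimum-cost alignment between two
# sequences, allowing stretching/compression along the time axis.
def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessors
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

slow = [0, 0, 1, 1, 2, 2, 3, 3]   # the same "word", spoken slowly
fast = [0, 1, 2, 3]
print(dtw(slow, fast))            # → 0.0: DTW absorbs the tempo change
print(dtw(fast, [3, 2, 1, 0]))    # much larger: the contours do not align
```

In template-matching ASR, an unknown utterance is assigned the label of the stored template with the smallest DTW distance.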

  2. Human phoneme recognition depending on speech-intrinsic variability.

    Science.gov (United States)

    Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger

    2010-11-01

    The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).

  3. Speech Recognition on Mobile Devices

    DEFF Research Database (Denmark)

    Tan, Zheng-Hua; Lindberg, Børge

    2010-01-01

    The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context, covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within...

  4. Gricean Semantics and Vague Speaker-Meaning

    OpenAIRE

    Schiffer, Stephen

    2017-01-01

    Presentations of Gricean semantics, including Stephen Neale’s in “Silent Reference,” totally ignore vagueness, even though virtually every utterance is vague. I ask how Gricean semantics might be adjusted to accommodate vague speaker-meaning. My answer is that it can’t accommodate it: the Gricean program collapses in the face of vague speaker-meaning. The Gricean might, however, find some solace in knowing that every other extant meta-semantic and semantic program is in the same boat.

  5. Automatic recognition of emotions from facial expressions

    Science.gov (United States)

    Xue, Henry; Gertner, Izidor

    2014-06-01

    In the human-computer interaction (HCI) process it is desirable to have an artificial intelligent (AI) system that can identify and categorize human emotions from facial expressions. Such systems can be used in security, in entertainment industries, and also to study visual perception, social interactions and disorders (e.g. schizophrenia and autism). In this work we survey and compare the performance of different feature extraction algorithms and classification schemes. We introduce a faster feature extraction method that resizes and applies a set of filters to the data images without sacrificing the accuracy. In addition, we have enhanced SVM to multiple dimensions while retaining the high accuracy rate of SVM. The algorithms were tested using the Japanese Female Facial Expression (JAFFE) Database and the Database of Faces (AT&T Faces).

  6. Automatic Target Recognition for Hyperspectral Imagery

    Science.gov (United States)

    2012-03-01

    covariance matrix of the current processing window (Smetek, 2007). RX scores are then compared to a given threshold, Trx, and if RX is greater than Trx, the pixel is labeled as an anomaly. Trx is based on the χ2-distribution with p degrees of freedom, where p is the dimensionality of the data (Smetek
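    The RX screening rule summarized above (score each pixel by its Mahalanobis distance to the background statistics, then compare to a chi-square threshold) can be sketched as follows. This is a generic illustration of the global RX detector, not the report's implementation; the function names and the false-alarm rate alpha are assumptions:

```python
import numpy as np
from scipy.stats import chi2

def rx_scores(pixels, mean=None, cov=None):
    """RX anomaly score for each pixel (rows = pixels, columns = spectral bands):
    squared Mahalanobis distance to the background mean and covariance."""
    if mean is None:
        mean = pixels.mean(axis=0)
    if cov is None:
        cov = np.cov(pixels, rowvar=False)
    diff = pixels - mean
    inv_cov = np.linalg.inv(cov)
    # score_i = (x_i - mu)^T Sigma^{-1} (x_i - mu)
    return np.einsum('ij,jk,ik->i', diff, inv_cov, diff)

def rx_threshold(p, alpha=0.001):
    """Threshold Trx from the chi-square distribution with p degrees of freedom,
    where p is the dimensionality of the data."""
    return chi2.ppf(1.0 - alpha, df=p)
```

    A pixel whose score exceeds rx_threshold(p) would be labeled as an anomaly.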

  7. Composite Classifiers for Automatic Target Recognition

    National Research Council Canada - National Science Library

    Wang, Lin-Cheng

    1998-01-01

    ...) using forward-looking infrared (FLIR) imagery. Two existing classifiers, one based on learning vector quantization and the other on modular neural networks, are used as the building blocks for our composite classifiers...

  8. Bibliographic Entity Automatic Recognition and Disambiguation

    CERN Document Server

    AUTHOR|(SzGeCERN)766022

    This master thesis reports on an applied machine learning research internship done at the digital library of the European Organization for Nuclear Research (CERN). The way an author’s name may vary in its representation across scientific publications creates ambiguity when it comes to uniquely identifying an author: in the database of any scientific digital library, the same full-name variation can be used by more than one author. This may occur even between authors from the same research affiliation. In this work, we built a machine learning based author name disambiguation solution. The approach consists in learning a distance function from ground-truth data, blocking publications of broadly similar author names, and clustering the publications using a semi-supervised strategy within each of the blocks. The main contributions of this work are twofold: first, improving the distance model by taking into account the (estimated) ethnicity of the author’s full name. Indeed, names from different ethnicities, for e...

  9. Neural decoding of attentional selection in multi-speaker environments without access to clean sources

    Science.gov (United States)

    O'Sullivan, James; Chen, Zhuo; Herrero, Jose; McKhann, Guy M.; Sheth, Sameer A.; Mehta, Ashesh D.; Mesgarani, Nima

    2017-10-01

    Objective. People who suffer from hearing impairments can find it difficult to follow a conversation in a multi-speaker environment. Current hearing aids can suppress background noise; however, there is little that can be done to help a user attend to a single conversation amongst many without knowing which speaker the user is attending to. Cognitively controlled hearing aids that use auditory attention decoding (AAD) methods are the next step in offering help. Translating the successes in AAD research to real-world applications poses a number of challenges, including the lack of access to the clean sound sources in the environment with which to compare with the neural signals. We propose a novel framework that combines single-channel speech separation algorithms with AAD. Approach. We present an end-to-end system that (1) receives a single audio channel containing a mixture of speakers that is heard by a listener along with the listener’s neural signals, (2) automatically separates the individual speakers in the mixture, (3) determines the attended speaker, and (4) amplifies the attended speaker’s voice to assist the listener. Main results. Using invasive electrophysiology recordings, we identified the regions of the auditory cortex that contribute to AAD. Given appropriate electrode locations, our system is able to decode the attention of subjects and amplify the attended speaker using only the mixed audio. Our quality assessment of the modified audio demonstrates a significant improvement in both subjective and objective speech quality measures. Significance. Our novel framework for AAD bridges the gap between the most recent advancements in speech processing technologies and speech prosthesis research and moves us closer to the development of cognitively controlled hearable devices for the hearing impaired.

  10. Semantic Activity Recognition

    OpenAIRE

    Thonnat , Monique

    2008-01-01

    International audience; Extracting the semantics from visual data automatically is a real challenge. We describe in this paper how recent work in cognitive vision leads to significant results in activity recognition for visual surveillance and video monitoring. In particular we present work performed in the domain of video understanding in our PULSAR team at INRIA in Sophia Antipolis. Our main objective is to analyse in real-time video streams captured by static video cameras and to recogniz...

  11. Speaker Prediction based on Head Orientations

    NARCIS (Netherlands)

    Rienks, R.J.; Poppe, Ronald Walter; van Otterlo, M.; Poel, Mannes; Poel, M.; Nijholt, A.; Nijholt, Antinus

    2005-01-01

    To gain insight into gaze behavior in meetings, this paper compares the results from a Naive Bayes classifier, Neural Networks and humans on speaker prediction in four-person meetings given solely the azimuth head angles. The Naive Bayes classifier scored 69.4% correctly, Neural Networks 62.3% and

  12. Study of audio speakers containing ferrofluid

    Energy Technology Data Exchange (ETDEWEB)

    Rosensweig, R E [34 Gloucester Road, Summit, NJ 07901 (United States); Hirota, Y; Tsuda, S [Ferrotec, 1-4-14 Kyobashi, chuo-Ku, Tokyo 104-0031 (Japan); Raj, K [Ferrotec, 33 Constitution Drive, Bedford, NH 03110 (United States)

    2008-05-21

    This work validates a method for increasing the radial restoring force on the voice coil in audio speakers containing ferrofluid. In addition, a study is made of factors influencing splash loss of the ferrofluid due to shock. Ferrohydrodynamic analysis is employed throughout to model behavior, and predictions are compared to experimental data.

  13. B Anand | Speakers | Indian Academy of Sciences

    Indian Academy of Sciences (India)

    However, the mechanism by which this protospacer fragment gets integrated in a directional fashion into the leader-proximal end is elusive. The speaker's group identified that the leader region abutting the first CRISPR repeat localizes Integration Host Factor (IHF) and the Cas1-2 complex in Escherichia coli. IHF binding to the ...

  14. Using timing information in speaker verification

    CSIR Research Space (South Africa)

    Van Heerden, CJ

    2005-11-01

    Full Text Available This paper presents an analysis of temporal information as a feature for use in speaker verification systems. The relevance of temporal information in a speaker’s utterances is investigated, both with regard to improving the robustness of modern...

  15. Age of Acquisition and Sensitivity to Gender in Spanish Word Recognition

    Science.gov (United States)

    Foote, Rebecca

    2014-01-01

    Speakers of gender-agreement languages use gender-marked elements of the noun phrase in spoken-word recognition: A congruent marking on a determiner or adjective facilitates the recognition of a subsequent noun, while an incongruent marking inhibits its recognition. However, while monolinguals and early language learners evidence this…

  16. A virtual speaker in noisy classroom conditions: supporting or disrupting children's listening comprehension?

    Science.gov (United States)

    Nirme, Jens; Haake, Magnus; Lyberg Åhlander, Viveka; Brännström, Jonas; Sahlén, Birgitta

    2018-04-05

    Seeing a speaker's face facilitates speech recognition, particularly under noisy conditions. Evidence for how it might affect comprehension of the content of the speech is more sparse. We investigated how children's listening comprehension is affected by multi-talker babble noise, with or without presentation of a digitally animated virtual speaker, and whether successful comprehension is related to performance on a test of executive functioning. We performed a mixed-design experiment with 55 (34 female) participants (8- to 9-year-olds), recruited from Swedish elementary schools. The children were presented with four different narratives, each in one of four conditions: audio-only presentation in a quiet setting, audio-only presentation in noisy setting, audio-visual presentation in a quiet setting, and audio-visual presentation in a noisy setting. After each narrative, the children answered questions on the content and rated their perceived listening effort. Finally, they performed a test of executive functioning. We found significantly fewer correct answers to explicit content questions after listening in noise. This negative effect was only mitigated to a marginally significant degree by audio-visual presentation. Strong executive function only predicted more correct answers in quiet settings. Altogether, our results are inconclusive regarding how seeing a virtual speaker affects listening comprehension. We discuss how methodological adjustments, including modifications to our virtual speaker, can be used to discriminate between possible explanations to our results and contribute to understanding the listening conditions children face in a typical classroom.

  17. Pattern recognition

    CERN Document Server

    Theodoridis, Sergios

    2003-01-01

    Pattern recognition is a scientific discipline that is becoming increasingly important in the age of automation and information handling and retrieval. Pattern Recognition, 2e covers the entire spectrum of pattern recognition applications, from image analysis to speech recognition and communications. This book presents cutting-edge material on neural networks - a set of linked microprocessors that can form associations and use pattern recognition to "learn" - and enhances student motivation by approaching pattern recognition from the designer's point of view. A direct result of more than 10

  18. Unsupervised Speaker Change Detection for Broadcast News Segmentation

    DEFF Research Database (Denmark)

    Jørgensen, Kasper Winther; Mølgaard, Lasse Lohilahti; Hansen, Lars Kai

    2006-01-01

    This paper presents a speaker change detection system for news broadcast segmentation based on a vector quantization (VQ) approach. The system does not make any assumption about the number of speakers or speaker identity. The system uses mel frequency cepstral coefficients and change detection...
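    A VQ-based change score of the kind this record describes can be sketched as follows: train a codebook on one analysis window of cepstral features and measure how much worse it quantizes the adjacent window. This is a minimal illustration, not the paper's system; the codebook size and the synthetic 12-dimensional features stand in for real MFCC frames:

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

def vq_distortion(frames, codebook):
    """Mean distance from each feature frame to its nearest codeword."""
    _, dist = vq(frames, codebook)
    return dist.mean()

def change_score(left, right, k=8, seed=0):
    """Dissimilarity between two windows of feature frames: the extra distortion
    incurred when the right window is quantized with the left window's codebook."""
    codebook, _ = kmeans2(left, k, minit='++', seed=seed)
    return vq_distortion(right, codebook) - vq_distortion(left, codebook)
```

    A speaker change would be hypothesized where this score exceeds a tuned threshold, without assuming anything about the number or identity of the speakers.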

  19. Accent Attribution in Speakers with Foreign Accent Syndrome

    Science.gov (United States)

    Verhoeven, Jo; De Pauw, Guy; Pettinato, Michele; Hirson, Allen; Van Borsel, John; Marien, Peter

    2013-01-01

    Purpose: The main aim of this experiment was to investigate the perception of Foreign Accent Syndrome in comparison to speakers with an authentic foreign accent. Method: Three groups of listeners attributed accents to conversational speech samples of 5 FAS speakers which were embedded amongst those of 5 speakers with a real foreign accent and 5…

  20. Young Children's Sensitivity to Speaker Gender When Learning from Others

    Science.gov (United States)

    Ma, Lili; Woolley, Jacqueline D.

    2013-01-01

    This research explores whether young children are sensitive to speaker gender when learning novel information from others. Four- and 6-year-olds ("N" = 144) chose between conflicting statements from a male versus a female speaker (Studies 1 and 3) or decided which speaker (male or female) they would ask (Study 2) when learning about the functions…

  1. Speaker Reliability Guides Children's Inductive Inferences about Novel Properties

    Science.gov (United States)

    Kim, Sunae; Kalish, Charles W.; Harris, Paul L.

    2012-01-01

    Prior work shows that children can make inductive inferences about objects based on their labels rather than their appearance (Gelman, 2003). A separate line of research shows that children's trust in a speaker's label is selective. Children accept labels from a reliable speaker over an unreliable speaker (e.g., Koenig & Harris, 2005). In the…

  2. Effects of a metronome on the filled pauses of fluent speakers.

    Science.gov (United States)

    Christenfeld, N

    1996-12-01

    Filled pauses (the "ums" and "uhs" that litter spontaneous speech) seem to be a product of the speaker paying deliberate attention to the normally automatic act of talking. This is the same sort of explanation that has been offered for stuttering. In this paper we explore whether a manipulation that has long been known to decrease stuttering, synchronizing speech to the beats of a metronome, will then also decrease filled pauses. Two experiments indicate that a metronome has a dramatic effect on the production of filled pauses. This effect is not due to any simplification or slowing of the speech and supports the view that a metronome causes speakers to attend more to how they are talking and less to what they are saying. It also lends support to the connection between stutters and filled pauses.

  3. Bilingual Language Switching: Production vs. Recognition

    Science.gov (United States)

    Mosca, Michela; de Bot, Kees

    2017-01-01

    This study aims at assessing how bilinguals select words in the appropriate language in production and recognition while minimizing interference from the non-appropriate language. Two prominent models are considered which assume that when one language is in use, the other is suppressed. The Inhibitory Control (IC) model suggests that, in both production and recognition, the amount of inhibition on the non-target language is greater for the stronger compared to the weaker language. In contrast, the Bilingual Interactive Activation (BIA) model proposes that, in language recognition, the amount of inhibition on the weaker language is stronger than otherwise. To investigate whether bilingual language production and recognition can be accounted for by a single model of bilingual processing, we tested a group of native speakers of Dutch (L1), advanced speakers of English (L2) in a bilingual recognition and production task. Specifically, language switching costs were measured while participants performed a lexical decision (recognition) and a picture naming (production) task involving language switching. Results suggest that while in language recognition the amount of inhibition applied to the non-appropriate language increases along with its dominance as predicted by the IC model, in production the amount of inhibition applied to the non-relevant language is not related to language dominance, but rather it may be modulated by speakers' unconscious strategies to foster the weaker language. This difference indicates that bilingual language recognition and production might rely on different processing mechanisms and cannot be accounted within one of the existing models of bilingual language processing. PMID:28638361

  4. Vocal caricatures reveal signatures of speaker identity

    Science.gov (United States)

    López, Sabrina; Riera, Pablo; Assaneo, María Florencia; Eguía, Manuel; Sigman, Mariano; Trevisan, Marcos A.

    2013-12-01

    What are the features that impersonators select to elicit a speaker's identity? We built a voice database of public figures (targets) and imitations produced by professional impersonators. They produced one imitation based on their memory of the target (caricature) and another one after listening to the target audio (replica). A set of naive participants then judged identity and similarity of pairs of voices. Identity was better evoked by the caricatures and replicas were perceived to be closer to the targets in terms of voice similarity. We used this data to map relevant acoustic dimensions for each task. Our results indicate that speaker identity is mainly associated with vocal tract features, while perception of voice similarity is related to vocal folds parameters. We therefore show the way in which acoustic caricatures emphasize identity features at the cost of losing similarity, which allows drawing an analogy with caricatures in the visual space.

  5. Methods of Speakers' Effects on the Audience

    Directory of Open Access Journals (Sweden)

    فریبا حسینی

    2010-09-01

    Full Text Available Methods of Speakers' Effects on the Audience. Nasrollah Shameli*, Fariba Hosayni**. Abstract: This article is focused on four issues. The first issue is related to the speaker's external appearance, including the beauty of the face, the power of the voice, gestures with the hand, stick and eyebrow, as well as height; such characteristics can have an important effect on the audience. The second issue is related to internal features of the speaker, including the ethics of the preacher, his/her piety and intention, personality, habits and emotions, knowledge and culture, and speed of learning. The third issue is concerned with the rhetoric of the lecture: words and phrases should be clear and correct, as well as mixed with Quranic verses, poetry, proverbs and parables. The final issue is related to the content: the subject of the talk should be in accordance with the level of understanding of the listeners, as well as being new and interesting for them, since an innovative theme affects the audience all the more. Key words: Oratory, Preacher, Audience, Influence of speech. * Associate Professor, Department of Arabic Language and Literature, University of Isfahan. E-mail: Dr-Nasrolla Shameli@Yahoo.com. ** M.A. in Arabic Language and Literature from Isfahan University. E-mail: faribahosayni@yahoo.com

  6. Shopping Behavior Recognition using a Language Modeling Analogy

    NARCIS (Netherlands)

    Popa, M.C.; Rothkrantz, L.J.M.; Wiggers, P.; Shan, C.

    2012-01-01

    Automatic understanding and recognition of human shopping behavior has many potential applications, attracting an increasing interest in the marketing domain. The reliability and performance of the automatic recognition system is highly influenced by the adopted theoretical model of behavior. In

  7. Physiological responses at short distances from a parametric speaker

    Directory of Open Access Journals (Sweden)

    Lee Soomin

    2012-06-01

    Full Text Available Abstract In recent years, parametric speakers have been used in various circumstances. In our previous studies, we verified that the physiological burden of the sound of a parametric speaker set at 2.6 m from the subjects was lower than that of a general speaker. However, nothing has yet been demonstrated about the effects of the sound of a parametric speaker at shorter distances between the parametric speaker and the human body. Therefore, we studied this effect on physiological functions and task performance. Nine male subjects participated in this study. They completed three consecutive sessions: a 20-minute quiet period as a baseline, a 30-minute mental task period with general speakers or parametric speakers, and a 20-minute recovery period. We measured electrocardiogram (ECG), photoplethysmogram (PTG), electroencephalogram (EEG), and systolic and diastolic blood pressure. Four experiments, crossing a speaker condition (general speaker and parametric speaker) with a distance condition (0.3 m and 1.0 m), were conducted at the same time of day on separate days. To examine the effects of speaker and distance, three-way repeated measures ANOVAs (speaker factor x distance factor x time factor) were conducted. In conclusion, we found that the physiological responses were not significantly different between the speaker conditions or the distance conditions. Meanwhile, it was shown that the physiological burdens increased with the progress of time independently of speaker condition and distance condition. In summary, the effects of the parametric speaker found at the 2.6 m distance were not obtained at distances of 1 m or less.

  8. English Speakers Attend More Strongly than Spanish Speakers to Manner of Motion when Classifying Novel Objects and Events

    Science.gov (United States)

    Kersten, Alan W.; Meissner, Christian A.; Lechuga, Julia; Schwartz, Bennett L.; Albrechtsen, Justin S.; Iglesias, Adam

    2010-01-01

    Three experiments provide evidence that the conceptualization of moving objects and events is influenced by one's native language, consistent with linguistic relativity theory. Monolingual English speakers and bilingual Spanish/English speakers tested in an English-speaking context performed better than monolingual Spanish speakers and bilingual…

  9. Grammatical Planning Units during Real-Time Sentence Production in Speakers with Agrammatic Aphasia and Healthy Speakers

    Science.gov (United States)

    Lee, Jiyeon; Yoshida, Masaya; Thompson, Cynthia K.

    2015-01-01

    Purpose: Grammatical encoding (GE) is impaired in agrammatic aphasia; however, the nature of such deficits remains unclear. We examined grammatical planning units during real-time sentence production in speakers with agrammatic aphasia and control speakers, testing two competing models of GE. We queried whether speakers with agrammatic aphasia…

  10. Facial recognition in education system

    Science.gov (United States)

    Krithika, L. B.; Venkatesh, K.; Rathore, S.; Kumar, M. Harish

    2017-11-01

    Human beings exploit emotions comprehensively for conveying messages and their resolution. Emotion detection and face recognition can provide an interface between individuals and technologies. The most successful application of recognition analysis is the recognition of faces. Many different techniques have been used to recognize facial expressions and to detect emotion under varying poses. In this paper, we propose an efficient method to recognize facial expressions by tracking face points and distances. This can automatically identify the observer's face movements and facial expression in an image, and can capture different aspects of emotion and facial expressions.

  11. Data complexity in pattern recognition

    CERN Document Server

    Kam Ho Tin

    2006-01-01

    Machines capable of automatic pattern recognition have many fascinating uses. Algorithms for supervised classification, where one infers a decision boundary from a set of training examples, are at the core of this capability. This book looks at data complexity and its role in shaping the theories and techniques in different disciplines

  12. Speech recognition from spectral dynamics

    Indian Academy of Sciences (India)

    Some of the history of gradual infusion of the modulation spectrum concept into Automatic recognition of speech (ASR) comes next, pointing to the relationship of modulation spectrum processing to well-accepted ASR techniques such as dynamic speech features or RelAtive SpecTrAl (RASTA) filtering. Next, the frequency ...

  13. License plate recognition using DTCNNs

    NARCIS (Netherlands)

    ter Brugge, M.H; Stevens, J.H; Nijhuis, J.A G; Spaanenburg, L; Tavsanonoglu, V

    1998-01-01

    Automatic license plate recognition requires a series of complex image processing steps. For practical use, the amount of data to be processed must be minimized early on. This paper shows that the computationally most intensive steps can be realized by DTCNNs. Moreover, high-level operations like

  14. SURVEY OF BIOMETRIC SYSTEMS USING IRIS RECOGNITION

    OpenAIRE

    S.PON SANGEETHA; DR.M.KARNAN

    2014-01-01

    Security plays an important role in any type of organization in today's life. Iris recognition is one of the leading automatic biometric systems in the area of security, used to identify an individual person. Biometric systems include fingerprints, facial features, voice recognition, hand geometry, handwriting, the eye retina and, the most secure one and the one presented in this paper, iris recognition. Biometric systems have become very popular in security systems because it is not possi...

  15. Target recognition based on convolutional neural network

    Science.gov (United States)

    Wang, Liqiang; Wang, Xin; Xi, Fubiao; Dong, Jian

    2017-11-01

    One of the important parts of object target recognition is feature extraction, which can be classified into manual feature extraction and automatic feature extraction. The traditional neural network is one of the automatic feature extraction methods, but its global connectivity brings a high possibility of over-fitting. The deep learning algorithm used in this paper is a hierarchical automatic feature extraction method, trained with a layer-by-layer convolutional neural network (CNN), which can extract features from lower layers to higher layers. The resulting features are more discriminative, which is beneficial to object target recognition.

  16. Speaker-dependent Dictionary-based Speech Enhancement for Text-Dependent Speaker Verification

    DEFF Research Database (Denmark)

    Thomsen, Nicolai Bæk; Thomsen, Dennis Alexander Lehmann; Tan, Zheng-Hua

    2016-01-01

    The problem of text-dependent speaker verification under noisy conditions is becoming ever more relevant, due to increased usage for authentication in real-world applications. Classical methods for noise reduction such as spectral subtraction and Wiener filtering introduce distortion and do not perform well in this setting. In this work we compare the performance of different noise reduction methods under different noise conditions in terms of speaker verification when the text is known and the system is trained on clean data (mis-matched conditions). We furthermore propose a new approach based on dictionary-based noise reduction and compare it to the baseline methods.

  17. Listening to a non-native speaker: Adaptation and generalization

    Science.gov (United States)

    Clarke, Constance M.

    2004-05-01

    Non-native speech can cause perceptual difficulty for the native listener, but experience can moderate this difficulty. This study explored the perceptual benefit of a brief (approximately 1 min) exposure to foreign-accented speech using a cross-modal word matching paradigm. Processing speed was tracked by recording reaction times (RTs) to visual probe words following English sentences produced by a Spanish-accented speaker. In experiment 1, RTs decreased significantly over 16 accented utterances and by the end were equal to RTs to a native voice. In experiment 2, adaptation to one Spanish-accented voice improved perceptual efficiency for a new Spanish-accented voice, indicating that abstract properties of accented speech are learned during adaptation. The control group in Experiment 2 also adapted to the accented voice during the test block, suggesting adaptation can occur within two to four sentences. The results emphasize the flexibility of the human speech processing system and the need for a mechanism to explain this adaptation in models of spoken word recognition. [Research supported by an NSF Graduate Research Fellowship and the University of Arizona Cognitive Science Program.] a)Currently at SUNY at Buffalo, Dept. of Psych., Park Hall, Buffalo, NY 14260, cclarke2@buffalo.edu

  18. Speaker's voice as a memory cue.

    Science.gov (United States)

    Campeanu, Sandra; Craik, Fergus I M; Alain, Claude

    2015-02-01

    Speaker's voice occupies a central role as the cornerstone of auditory social interaction. Here, we review the evidence suggesting that speaker's voice constitutes an integral context cue in auditory memory. Investigation into the nature of voice representation as a memory cue is essential to understanding auditory memory and the neural correlates which underlie it. Evidence from behavioral and electrophysiological studies suggest that while specific voice reinstatement (i.e., same speaker) often appears to facilitate word memory even without attention to voice at study, the presence of a partial benefit of similar voices between study and test is less clear. In terms of explicit memory experiments utilizing unfamiliar voices, encoding methods appear to play a pivotal role. Voice congruency effects have been found when voice is specifically attended at study (i.e., when relatively shallow, perceptual encoding takes place). These behavioral findings coincide with neural indices of memory performance such as the parietal old/new recollection effect and the late right frontal effect. The former distinguishes between correctly identified old words and correctly identified new words, and reflects voice congruency only when voice is attended at study. Characterization of the latter likely depends upon voice memory, rather than word memory. There is also evidence to suggest that voice effects can be found in implicit memory paradigms. However, the presence of voice effects appears to depend greatly on the task employed. Using a word identification task, perceptual similarity between study and test conditions is, like for explicit memory tests, crucial. In addition, the type of noise employed appears to have a differential effect. While voice effects have been observed when white noise is used at both study and test, using multi-talker babble does not confer the same results. In terms of neuroimaging research modulations, characterization of an implicit memory effect

  19. Speaker diarization system using HXLPS and deep neural network

    Directory of Open Access Journals (Sweden)

    V. Subba Ramaiah

    2018-03-01

    Full Text Available In general, speaker diarization is defined as the process of segmenting the input speech signal and grouping the homogeneous regions with regard to speaker identity. The main idea behind this system is that it is able to discriminate the speaker signals by assigning a label to each speaker signal. Due to the rapid growth of broadcast and meeting recordings, speaker diarization is needed to enhance the readability of speech transcriptions. In order to address this, Holoentropy with the eXtended Linear Prediction using autocorrelation Snapshot (HXLPS) and a deep neural network (DNN) are proposed for the speaker diarization system. The HXLPS extraction method is newly developed by incorporating Holoentropy with the XLPS. Once the features are attained, the speech and non-speech signals are detected by the Voice Activity Detection (VAD) method. Then, an i-vector representation of every segmented signal is obtained using a Universal Background Model (UBM). Consequently, the DNN is utilized to assign a label to each speaker signal, which is then clustered according to the speaker label. The performance is analysed using evaluation metrics such as tracking distance, false alarm rate and diarization error rate. The proposed method ensures better diarization performance by achieving a lower DER of 1.36% based on the lambda value and a DER of 2.23% depending on the frame length. Keywords: Speaker diarization, HXLPS feature extraction, Voice activity detection, Deep neural network, Speaker clustering, Diarization Error Rate (DER)
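    The VAD stage named in this abstract is often approximated by a short-time energy gate; the sketch below is such a generic energy-based detector, not the HXLPS system's actual VAD, and the frame length, hop and threshold are illustrative assumptions:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a signal into overlapping frames (e.g. 25 ms frames, 10 ms hop at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def energy_vad(x, frame_len=400, hop=160, threshold_db=-30.0):
    """Mark frames whose short-time energy is within threshold_db of the loudest frame."""
    frames = frame_signal(x, frame_len, hop)
    energy = (frames ** 2).mean(axis=1)
    level_db = 10.0 * np.log10(energy / energy.max() + 1e-12)
    return level_db > threshold_db
```

    Frames flagged True would be passed on to feature extraction; the remaining frames would be discarded as non-speech.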

  20. Pradyut Ghosh | Speakers | Indian Academy of Sciences

    Indian Academy of Sciences (India)

    ... synthetic receptors has received increasing attention for its importance in environment, ... Thus, the fundamental chemistry of selective recognition, sensing and ... considered to be very important to develop technologies in water purification ...

  1. Teaching Arabic to Native Speakers Educational Games in a New Curriculum

    DEFF Research Database (Denmark)

    Alshikhabobakr, Hanan; Papadopoulos, Pantelis M.; Ibrahim, Zeinab

    2016-01-01

    This paper presents nine educational games and activities for learning the Arabic language developed for the Arabiyyatii project, a three-year endeavor that involves re-conceptualization of the standard Arabic language learning curriculum as a first language for kindergarten students. The applications presented in this paper are developed for tabletop surface computers, which support a collaborative and interactive learning environment. These applications focus on speaking, word production, and sentence recognition of Modern Standard Arabic for young native speakers. This work employs an interdisciplinary research framework, exploiting best practices from related disciplines, namely: computer-supported collaborative learning, language learning, teaching and learning pedagogy, instructional design and scaffolding.

  2. Speech Recognition

    Directory of Open Access Journals (Sweden)

    Adrian Morariu

    2009-01-01

    Full Text Available This paper presents a method of speech recognition based on pattern recognition techniques. Learning consists of determining the unique characteristics of a word (its cepstral coefficients), eliminating those characteristics that differ from one word to another. For learning and recognition, the system builds a dictionary of words by determining the characteristics of each word to be used in recognition. Determining the characteristics of an audio signal consists of the following steps: removing noise, sampling the signal, applying a Hamming window, switching to the frequency domain through the Fourier transform, calculating the magnitude spectrum, filtering the data, and determining the cepstral coefficients.
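
    The frame-level steps listed above can be sketched as follows. This is a minimal illustration assuming a real-cepstrum variant (window, FFT, magnitude, log, inverse transform); the frame length and number of coefficients are assumptions, not the paper's exact recipe.

```python
import numpy as np

def cepstral_coefficients(frame, n_coeffs=12):
    """Compute real cepstral coefficients of one speech frame:
    Hamming window -> FFT -> magnitude spectrum -> log -> inverse FFT."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    log_spectrum = np.log(spectrum + 1e-10)   # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_spectrum)
    return cepstrum[:n_coeffs]                # keep the low-quefrency part
```

    A dictionary of words would then store one sequence of such coefficient vectors per word, to be matched at recognition time.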

  3. Finite element speaker-specific face model generation for the study of speech production.

    Science.gov (United States)

    Bucki, Marek; Nazari, Mohammad Ali; Payan, Yohan

    2010-08-01

    In situations where automatic mesh generation is unsuitable, the finite element (FE) mesh registration technique known as mesh-match-and-repair (MMRep) is an interesting option for quickly creating a subject-specific FE model by fitting a predefined template mesh onto the target organ. The irregular or poor quality elements produced by the elastic deformation are corrected by a 'mesh reparation' procedure ensuring that the desired regularity and quality standards are met. Here, we further extend the MMRep capabilities and demonstrate the possibility of taking into account additional relevant anatomical features. We illustrate this approach with an example of biomechanical model generation of a speaker's face comprising face muscle insertions. While taking advantage of the a priori knowledge about tissues conveyed by the template model, this novel, fast and automatic mesh registration technique makes it possible to achieve greater modelling realism by accurately representing the organ surface as well as inner anatomical or functional structures of interest.

  4. Speech pattern recognition for forensic acoustic purposes

    OpenAIRE

    Herrera Martínez, Marcelo; Aldana Blanco, Andrea Lorena; Guzmán Palacios, Ana María

    2014-01-01

    The present paper describes the development of software for the analysis of acoustic voice parameters (APAVOIX), which can be used for forensic acoustic purposes based on speaker recognition and identification. This software makes it possible to clearly observe the parameters that are sufficient and necessary for comparing two voice signals, the suspect one and the original one. These parameters are used according to the classic method, generally used by state entit...

  5. Influence of binary mask estimation errors on robust speaker identification

    DEFF Research Database (Denmark)

    May, Tobias

    2017-01-01

    Missing-data strategies have been developed to improve the noise-robustness of automatic speech recognition systems in adverse acoustic conditions. This is achieved by classifying time-frequency (T-F) units into reliable and unreliable components, as indicated by a so-called binary mask. Different approaches have been proposed to handle unreliable feature components, each with distinct advantages. The direct masking (DM) approach attenuates unreliable T-F units in the spectral domain, which allows the extraction of conventionally used mel-frequency cepstral coefficients (MFCCs). Instead of attenuating ... Since each of these approaches utilizes the knowledge about reliable and unreliable feature components in a different way, they will respond differently to estimation errors in the binary mask. The goal of this study was to identify the most effective strategy to exploit knowledge about reliable...
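
    The direct masking (DM) step can be sketched as elementwise attenuation of the T-F units marked unreliable; the attenuation factor below is an illustrative assumption, not a value from the study.

```python
import numpy as np

def direct_masking(spectrogram, binary_mask, attenuation=0.1):
    """Attenuate T-F units marked unreliable (mask == 0) in the spectral
    domain, so that conventional MFCCs can still be extracted afterwards."""
    return np.where(binary_mask, spectrogram, spectrogram * attenuation)
```

    Because the output is still a full spectrogram, the standard MFCC pipeline can run on it unchanged, which is the practical appeal of DM over marginalization-style approaches.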

  6. Effect of lisping on audience evaluation of male speakers.

    Science.gov (United States)

    Mowrer, D E; Wahl, P; Doolan, S J

    1978-05-01

    The social consequences of adult listeners' first impression of lisping were evaluated in two studies. Five adult speakers were rated by adult listeners with regard to speaking ability, intelligence, education, masculinity, and friendship. Results from both studies indicate that listeners rate adult speakers who demonstrate frontal lisping lower than nonlispers in all five categories investigated. Efforts to correct frontal lisping are justifiable on the basis of the poor impression lisping speakers make on the listener.

  7. The Speaker Gender Gap at Critical Care Conferences.

    Science.gov (United States)

    Mehta, Sangeeta; Rose, Louise; Cook, Deborah; Herridge, Margaret; Owais, Sawayra; Metaxa, Victoria

    2018-06-01

    To review women's participation as faculty at five critical care conferences over 7 years. Retrospective analysis of five scientific programs to identify the proportion of females and each speaker's profession based on conference conveners, program documents, or internet research. Three international (European Society of Intensive Care Medicine, International Symposium on Intensive Care and Emergency Medicine, Society of Critical Care Medicine) and two national (Critical Care Canada Forum, U.K. Intensive Care Society State of the Art Meeting) annual critical care conferences held between 2010 and 2016. Female faculty speakers. None. Male speakers outnumbered female speakers at all five conferences, in all 7 years. Overall, women represented 5-31% of speakers, and female physicians represented 5-26% of speakers. Nursing and allied health professional faculty represented 0-25% of speakers; in general, more than 50% of allied health professionals were women. Over the 7 years, Society of Critical Care Medicine had the highest representation of female (27% overall) and nursing/allied health professional (16-25%) speakers; notably, male physicians substantially outnumbered female physicians in all years (62-70% vs 10-19%, respectively). Women's representation on conference program committees ranged from 0% to 40%, with Society of Critical Care Medicine having the highest representation of women (26-40%). The female proportions of speakers, physician speakers, and program committee members increased significantly over time at the Society of Critical Care Medicine and U.K. Intensive Care Society State of the Art Meeting conferences. There is a speaker gender gap at critical care conferences, with male faculty outnumbering female faculty. This gap is more marked among physician speakers than among speakers representing nursing and allied health professionals. Several organizational strategies can address this gender gap.

  8. On the improvement of speaker diarization by detecting overlapped speech

    OpenAIRE

    Hernando Pericás, Francisco Javier

    2010-01-01

    Simultaneous speech in meeting environments is responsible for a certain number of errors made by standard speaker diarization systems. We present an overlap detection system for far-field data based on spectral and spatial features, where the spatial features obtained from different microphone pairs are fused by means of principal component analysis. Detected overlap segments are applied to speaker diarization in order to increase the purity of speaker clusters an...

  9. Speaker-dependent Multipitch Tracking Using Deep Neural Networks

    Science.gov (United States)

    2015-01-01

    sentences spoken by each of 34 speakers (18 male, 16 female). Two male and two female speakers (No. 1, 2, 18, 20, same as [30]), denoted as MA1, MA2, ... [figure residue: bar chart comparing results over the speaker pairs MA1-MA2, MA1-FE1, MA1-FE2, MA2-FE1, MA2-FE2, FE1-FE2] ... Figure 6: Multipitch tracking results on a test mixture (pbbv6n and priv3n) for the MA1-MA2 speaker pair. (a) Groundtruth

  10. Temporal visual cues aid speech recognition

    DEFF Research Database (Denmark)

    Zhou, Xiang; Ross, Lars; Lehn-Schiøler, Tue

    2006-01-01

    BACKGROUND: It is well known that under noisy conditions, viewing a speaker's articulatory movement aids the recognition of spoken words. Conventionally it is thought that the visual input disambiguates otherwise confusing auditory input. HYPOTHESIS: In contrast, we hypothesize that it is the temporal synchronicity of the visual input that aids parsing of the auditory stream. More specifically, we expected that purely temporal information, which does not convey information such as place of articulation, may facilitate word recognition. METHODS: To test this prediction we used temporal features of audio to generate an artificial talking-face video and measured word recognition performance on simple monosyllabic words. RESULTS: When presenting words together with the artificial video we find that word recognition is improved over purely auditory presentation. The effect is significant (p...

  11. Differences in Sickness Allowance Receipt between Swedish Speakers and Finnish Speakers in Finland

    Directory of Open Access Journals (Sweden)

    Kaarina S. Reini

    2017-12-01

    Full Text Available Previous research has documented lower disability retirement and mortality rates of Swedish speakers as compared with Finnish speakers in Finland. This paper is the first to compare the two language groups with regard to the receipt of sickness allowance, which is an objective health measure that reflects a less severe poor health condition. Register-based data covering the years 1988-2011 are used. We estimate logistic regression models with generalized estimating equations to account for repeated observations at the individual level. We find that Swedish-speaking men have approximately 30 percent lower odds of receiving sickness allowance than Finnish-speaking men, whereas the difference in women is about 15 percent. In correspondence with previous research on all-cause mortality at working ages, we find no language-group difference in sickness allowance receipt in the socially most successful subgroup of the population.

  12. The Status of Native Speaker Intuitions in a Polylectal Grammar.

    Science.gov (United States)

    Debose, Charles E.

    A study of one speaker's intuitions about and performance in Black English is presented with relation to Saussure's "langue-parole" dichotomy. Native speakers of a language have intuitions about the static synchronic entities although the data of their speaking is variable and panchronic. These entities are in a diglossic relationship to each…

  13. Speaker and Observer Perceptions of Physical Tension during Stuttering.

    Science.gov (United States)

    Tichenor, Seth; Leslie, Paula; Shaiman, Susan; Yaruss, J Scott

    2017-01-01

    Speech-language pathologists routinely assess physical tension during evaluation of those who stutter. If speakers experience tension that is not visible to clinicians, then judgments of severity may be inaccurate. This study addressed this potential discrepancy by comparing judgments of tension by people who stutter and expert clinicians to determine if clinicians could accurately identify the speakers' experience of physical tension. Ten adults who stutter were audio-video recorded in two speaking samples. Two board-certified specialists in fluency evaluated the samples using the Stuttering Severity Instrument-4 and a checklist adapted for this study. Speakers rated their tension using the same forms, and then discussed their experiences in a qualitative interview so that themes related to physical tension could be identified. The degree of tension reported by speakers was higher than that observed by specialists. Tension in parts of the body that were less visible to the observer (chest, abdomen, throat) was reported more by speakers than by specialists. The thematic analysis revealed that speakers' experience of tension changes over time and that these changes may be related to speakers' acceptance of stuttering. The lack of agreement between speaker and specialist perceptions of tension suggests that using self-reports is a necessary component for supporting the accurate diagnosis of tension in stuttering. © 2018 S. Karger AG, Basel.

  14. Request Strategies in Everyday Interactions of Persian and English Speakers

    Directory of Open Access Journals (Sweden)

    Shiler Yazdanfar

    2016-12-01

    Full Text Available Cross-cultural studies of speech acts in different linguistic contexts might have interesting implications for language researchers and practitioners. Drawing on Speech Act Theory, the present study aimed at conducting a comparative study of the request speech act in Persian and English. Specifically, the study endeavored to explore the request strategies used in daily interactions of Persian and English speakers based on directness level and supportive moves. To this end, English and Persian TV series were observed and requestive utterances were transcribed. The utterances were then categorized based on Blum-Kulka and Olshtain’s Cross-Cultural Study of Speech Act Realization Pattern (CCSARP) for directness level and internal and external mitigation devices. According to the results, although speakers of both languages opted for the direct level as their most frequently used strategy in their daily interactions, the English speakers used more conventionally indirect strategies than the Persian speakers did, and the Persian speakers used more non-conventionally indirect strategies than the English speakers did. Furthermore, the analyzed data revealed that American English speakers use more mitigation devices in their daily interactions with friends and family members than Persian speakers.

  15. Visual speaker gender affects vowel identification in Danish

    DEFF Research Database (Denmark)

    Larsen, Charlotte; Tøndering, John

    2013-01-01

    The experiment examined the effect of visual speaker gender on the vowel perception of 20 native Danish-speaking subjects. Auditory stimuli consisting of a continuum between /muːlə/ ‘muzzle’ and /moːlə/ ‘pier’ generated using TANDEM-STRAIGHT matched with video clips of a female and a male speaker...

  16. Dysprosody and Stimulus Effects in Cantonese Speakers with Parkinson's Disease

    Science.gov (United States)

    Ma, Joan K.-Y.; Whitehill, Tara; Cheung, Katherine S.-K.

    2010-01-01

    Background: Dysprosody is a common feature in speakers with hypokinetic dysarthria. However, speech prosody varies across different types of speech materials. This raises the question of what is the most appropriate speech material for the evaluation of dysprosody. Aims: To characterize the prosodic impairment in Cantonese speakers with…

  17. Teaching Portuguese to Spanish Speakers: A Case for Trilingualism

    Science.gov (United States)

    Carvalho, Ana M.; Freire, Juliana Luna; da Silva, Antonio J. B.

    2010-01-01

    Portuguese is the sixth-most-spoken native language in the world, with approximately 240,000,000 speakers. Within the United States, there is a growing demand for K-12 language programs to engage the community of Portuguese heritage speakers. According to the 2000 U.S. census, 85,000 school-age children speak Portuguese at home. As a result, more…

  18. Profiles of an Acquisition Generation: Nontraditional Heritage Speakers of Spanish

    Science.gov (United States)

    DeFeo, Dayna Jean

    2018-01-01

    Though definitions vary, the literature on heritage speakers of Spanish identifies two primary attributes: a linguistic and cultural connection to the language. This article profiles four Anglo college students who grew up in bilingual or Spanish-dominant communities in the Southwest who self-identified as Spanish heritage speakers, citing…

  19. Progress in the AMIDA speaker diarization system for meeting data

    NARCIS (Netherlands)

    Leeuwen, D.A. van; Konečný, M.

    2008-01-01

    In this paper we describe the AMIDA speaker diarization system as it was submitted to the NIST Rich Transcription evaluation 2007 for conference room data. This is done in the context of the history of this system and of other speaker diarization systems. One of the goals of our system is to have as

  20. A hybrid generative-discriminative approach to speaker diarization

    NARCIS (Netherlands)

    Noulas, A.K.; van Kasteren, T.; Kröse, B.J.A.

    2008-01-01

    In this paper we present a sound probabilistic approach to speaker diarization. We use a hybrid framework where a distribution over the number of speakers at each point of a multimodal stream is estimated with a discriminative model. The output of this process is used as input in a generative model

  1. Guest Speakers in School-Based Sexuality Education

    Science.gov (United States)

    McRee, Annie-Laurie; Madsen, Nikki; Eisenberg, Marla E.

    2014-01-01

    This study, using data from a statewide survey (n = 332), examined teachers' practices regarding the inclusion of guest speakers to cover sexuality content. More than half of teachers (58%) included guest speakers. In multivariate analyses, teachers who taught high school, had professional preparation in health education, or who received…

  2. Internal request modification by first and second language speakers ...

    African Journals Online (AJOL)

    This study focuses on the question of whether Luganda English speakers would negatively transfer into their English speech the use of syntactic and lexical down graders resulting in pragmatic failure. Data were collected from Luganda and Luganda English speakers by means of a Discourse Completion Test (DCT) ...

  3. (En)countering native-speakerism global perspectives

    CERN Document Server

    Holliday, Adrian; Swan, Anne

    2015-01-01

    The book addresses the issue of native-speakerism, an ideology based on the assumption that 'native speakers' of English have a special claim to the language itself, through critical qualitative studies of the lived experiences of practising teachers and students in a range of scenarios.

  4. The development of the Athens Emotional States Inventory (AESI): collection, validation and automatic processing of emotionally loaded sentences.

    Science.gov (United States)

    Chaspari, Theodora; Soldatos, Constantin; Maragos, Petros

    2015-01-01

    The aim is the development of ecologically valid procedures for collecting reliable and unbiased emotional data for computer interfaces with social and affective intelligence targeting patients with mental disorders. To this end, the Athens Emotional States Inventory (AESI) proposes the design, recording and validation of an audiovisual database for five emotional states: anger, fear, joy, sadness and neutral. The items of the AESI consist of sentences, each having content indicative of the corresponding emotion. Emotional content was assessed through a survey of 40 young participants with a questionnaire following a Latin square design. The emotional sentences that were correctly identified by 85% of the participants were recorded in a soundproof room with microphones and cameras. A preliminary validation of the AESI is performed through automatic emotion recognition experiments from speech. The resulting database contains 696 utterances recorded in the Greek language by 20 native speakers and has a total duration of approximately 28 min. Speech classification results yield accuracy of up to 75.15% for automatically recognizing the emotions in the AESI. These results indicate the usefulness of our approach for collecting emotional data with reliable content, balanced across classes and with reduced environmental variability.

  5. The Communication of Public Speaking Anxiety: Perceptions of Asian and American Speakers.

    Science.gov (United States)

    Martini, Marianne; And Others

    1992-01-01

    Finds that U.S. audiences perceive Asian speakers to have more speech anxiety than U.S. speakers, even though Asian speakers do not self-report higher anxiety levels. Confirms that speech state anxiety is not communicated effectively between speakers and audiences for Asian or U.S. speakers. (SR)

  6. Byblos Speech Recognition Benchmark Results

    National Research Council Canada - National Science Library

    Kubala, F; Austin, S; Barry, C; Makhoul, J; Placeway, P; Schwartz, R

    1991-01-01

    .... Surprisingly, the 12-speaker model performs as well as the one made from 109 speakers. Also within the RM domain, we demonstrate that state-of-the-art SI models perform poorly for speakers with strong dialects...

  7. A comparative study between MFCC and LSF coefficients in automatic recognition of isolated digits pronounced in Portuguese and English - doi: 10.4025/actascitechnol.v35i4.19825

    Directory of Open Access Journals (Sweden)

    Diego Furtado Silva

    2013-10-01

    Full Text Available Recognition of isolated spoken digits is the core procedure for a large number of applications which rely solely on speech for data exchange, as in telephone-based services, such as dialing, airline reservation, bank transaction and price quotation. Spoken digit recognition is generally a challenging task since the signals last for a short period of time and often some digits are acoustically very similar to other digits. The objective of this paper is to investigate the use of machine learning algorithms for spoken digit recognition and disclose the free availability of a database with digits pronounced in English and Portuguese to the scientific community. Since machine learning algorithms are fully dependent on predictive attributes to build precise classifiers, we believe that the most important task for successfully recognizing spoken digits is feature extraction. In this work, we show that Line Spectral Frequencies (LSF provide a set of highly predictive coefficients. We evaluated our classifiers in different settings by altering the sampling rate to simulate low quality channels and varying the number of coefficients.  

  8. Multimodal fusion of polynomial classifiers for automatic person recognition

    Science.gov (United States)

    Broun, Charles C.; Zhang, Xiaozheng

    2001-03-01

    With the prevalence of the information age, privacy and personalization are forefront in today's society. As such, biometrics are viewed as essential components of current evolving technological systems. Consumers demand unobtrusive and non-invasive approaches. In our previous work, we have demonstrated a speaker verification system that meets these criteria. However, there are additional constraints for fielded systems. The required recognition transactions are often performed in adverse environments and across diverse populations, necessitating robust solutions. There are two significant problem areas in current generation speaker verification systems. The first is the difficulty of acquiring clean audio signals in all environments without encumbering the user with a head-mounted close-talking microphone. Second, unimodal biometric systems do not work for a significant percentage of the population. To combat these issues, multimodal techniques are being investigated to improve system robustness to environmental conditions, as well as to improve overall accuracy across the population. We propose a multimodal approach that builds on our current state-of-the-art speaker verification technology. In order to maintain the transparent nature of the speech interface, we focus on optical sensing technology to provide the additional modality, giving us an audio-visual person recognition system. For the audio domain, we use our existing speaker verification system. For the visual domain, we focus on lip motion. This is chosen, rather than static face or iris recognition, because it provides dynamic information about the individual. In addition, the lip dynamics can aid speech recognition to provide liveness testing. The visual processing method makes use of both color and edge information, combined within a Markov random field (MRF) framework, to localize the lips. Geometric features are extracted and input to a polynomial classifier for the person recognition process. A late
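
    The late-fusion stage that such an audio-visual system typically ends with can be sketched as a weighted combination of per-modality scores; the weight, threshold and function names below are illustrative assumptions, not the authors' design.

```python
def fuse_scores(audio_score, visual_score, weight=0.7):
    """Weighted late fusion of per-modality match scores (higher = better)."""
    return weight * audio_score + (1 - weight) * visual_score

def verify(claimed_id, audio_scores, visual_scores, threshold=0.5):
    """Accept an identity claim when the fused score passes a threshold."""
    fused = fuse_scores(audio_scores[claimed_id], visual_scores[claimed_id])
    return fused >= threshold
```

    The weight would normally be tuned on held-out data, since the relative reliability of the audio and lip-motion channels varies with noise conditions.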

  9. Studies on inter-speaker variability in speech and its application in ...

    Indian Academy of Sciences (India)

    tic representation of vowel realizations by different speakers. ... in regional background, education level and gender of speaker. A more ...... formal maps such as bilinear transform and its generalizations for speaker normalization. Since.

  10. Voice reinstatement modulates neural indices of continuous word recognition.

    Science.gov (United States)

    Campeanu, Sandra; Craik, Fergus I M; Backer, Kristina C; Alain, Claude

    2014-09-01

    The present study was designed to examine listeners' ability to use voice information incidentally during spoken word recognition. We recorded event-related brain potentials (ERPs) during a continuous recognition paradigm in which participants indicated on each trial whether the spoken word was "new" or "old." Old items were presented at 2, 8 or 16 words following the first presentation. Context congruency was manipulated by having the same word repeated by either the same speaker or a different speaker. The different speaker could share the gender, accent or neither feature with the word presented the first time. Participants' accuracy was greatest when the old word was spoken by the same speaker than by a different speaker. In addition, accuracy decreased with increasing lag. The correct identification of old words was accompanied by an enhanced late positivity over parietal sites, with no difference found between voice congruency conditions. In contrast, an earlier voice reinstatement effect was observed over frontal sites, an index of priming that preceded recollection in this task. Our results provide further evidence that acoustic and semantic information are integrated into a unified trace and that acoustic information facilitates spoken word recollection. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition

    OpenAIRE

    Jesse, A.; McQueen, J.

    2014-01-01

    Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes in duration, amplitude, and pitch) in spoken-word recognition. We asked here whether they can also use visual suprasegmental cues. In two categorization experiments, Dutch participants saw a speaker...

  12. Artificial Intelligence In Automatic Target Recognizers: Technology And Timelines

    Science.gov (United States)

    Gilmore, John F.

    1984-12-01

    The recognition of targets in thermal imagery has been a problem exhaustively analyzed in its current localized dimension. This paper discusses the application of artificial intelligence (AI) technology to automatic target recognition, a concept capable of expanding current ATR efforts into a new globalized dimension. Deficiencies of current automatic target recognition systems are reviewed in terms of system shortcomings. Areas of artificial intelligence which show the most promise in improving ATR performance are analyzed, and a timeline is formed in light of how near (as well as far) term artificial intelligence applications may exist. Current research in the area of high level expert vision systems is reviewed and the possible utilization of artificial intelligence architectures to improve low level image processing functions is also discussed. Additional application areas of relevance to solving the problem of automatic target recognition utilizing both high and low level processing are also explored.

  13. FPGA-Based Implementation of Lithuanian Isolated Word Recognition Algorithm

    Directory of Open Access Journals (Sweden)

    Tomyslav Sledevič

    2013-05-01

    Full Text Available The paper describes the FPGA-based implementation of a Lithuanian isolated word recognition algorithm. An FPGA is selected for parallel process implementation using VHDL to ensure fast signal processing at a low clock rate. Cepstrum analysis was applied to feature extraction from voice. The dynamic time warping (DTW) algorithm was used to compare the vectors of cepstrum coefficients. A library of features for 100 words was created and stored in the internal FPGA BRAM memory. Experimental testing with speaker-dependent records demonstrated a recognition rate of 94%. A recognition rate of 58% was achieved for speaker-independent records. Calculation of the cepstrum coefficients lasted 8.52 ms at a 50 MHz clock, while 100 DTW comparisons took 66.56 ms at a 25 MHz clock. Article in Lithuanian
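
    The DTW comparison of cepstrum-coefficient vector sequences can be sketched in plain Python as a software analogue of the recurrence the FPGA evaluates; the Euclidean frame distance and unconstrained warping path here are simplifying assumptions.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature
    vectors, using Euclidean distance between frames and the standard
    three-way recurrence over the accumulated-cost matrix."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

    Recognition then amounts to returning the library word whose stored feature sequence has the smallest DTW distance to the input.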

  14. End-to-end visual speech recognition with LSTMS

    NARCIS (Netherlands)

    Petridis, Stavros; Li, Zuwei; Pantic, Maja

    2017-01-01

    Traditional visual speech recognition systems consist of two stages, feature extraction and classification. Recently, several deep learning approaches have been presented which automatically extract features from the mouth images and aim to replace the feature extraction stage. However, research on

  15. Fuzzy Spatiotemporal Data Mining to Activity Recognition in Smart Homes

    Data.gov (United States)

    National Aeronautics and Space Administration — A primary goal to design smart homes is to provide automatic assistance for the residents to make them able to live independently at home. Activity recognition is...

  16. Person Recognition in Social Media Photos

    OpenAIRE

    Oh, Seong Joon; Benenson, Rodrigo; Fritz, Mario; Schiele, Bernt

    2017-01-01

    People nowadays share large parts of their personal lives through social media. Being able to automatically recognise people in personal photos may greatly enhance user convenience by easing photo album organisation. For the human identification task, however, the traditional focus of computer vision has been face recognition and pedestrian re-identification. Person recognition in social media photos sets new challenges for computer vision, including non-cooperative subjects (e.g. backward viewpoints...

  17. Activity Recognition for Personal Time Management

    Science.gov (United States)

    Prekopcsák, Zoltán; Soha, Sugárka; Henk, Tamás; Gáspár-Papanek, Csaba

    We describe an accelerometer-based activity recognition system for mobile phones with a special focus on personal time management. We compare several data mining algorithms for the automatic recognition task in single-user and multi-user scenarios, and improve accuracy with heuristics and advanced data mining methods. The results show that daily activities can be recognized with high accuracy and that integration with the RescueTime software can give good insights for personal time management.

  18. Security training symposium: Meeting the challenge: Firearms and explosives recognition and detection

    Energy Technology Data Exchange (ETDEWEB)

    1990-09-01

    These conference proceedings have been prepared in support of the US Nuclear Regulatory Commission's Security Training Symposium on "Meeting the Challenge -- Firearms and Explosives Recognition and Detection," November 28 through 30, 1989, in Bethesda, Maryland. This document contains the edited transcripts of the guest speakers. It also contains some of the speakers' formal papers that were distributed and some of the slides that were shown at the symposium (Appendix A).

  19. Connected word recognition using a cascaded neuro-computational model

    Science.gov (United States)

    Hoya, Tetsuya; van Leeuwen, Cees

    2016-10-01

    We propose a novel framework for processing a continuous speech stream that contains a varying number of words, as well as non-speech periods. Speech samples are segmented into word-tokens and non-speech periods. An augmented version of an earlier-proposed, cascaded neuro-computational model is used for recognising individual words within the stream. Simulation studies using both a multi-speaker-dependent and speaker-independent digit string database show that the proposed method yields a recognition performance comparable to that obtained by a benchmark approach using hidden Markov models with embedded training.

  20. Fundamental frequency characteristics of Jordanian Arabic speakers.

    Science.gov (United States)

    Natour, Yaser S; Wingate, Judith M

    2009-09-01

    This study is the first in a series of investigations designed to test the acoustic characteristics of the normal Arabic voice. The subjects were three hundred normal Jordanian Arabic speakers (100 adult males, 100 adult females, and 100 children). The subjects produced a sustained phonation of the vowel /a:/ and stated their complete names (i.e., first, second, third, and surname) using a carrier phrase. The samples were analyzed using the Multi-Dimensional Voice Program (MDVP). Fundamental frequency (F0) from the /a:/ and speaking fundamental frequency (SF0) from the sentence were analyzed. Results revealed a significant difference in both F0 and SF0 values among adult Jordanian Arabic-speaking males (F0 = 131.34 Hz ± 18.65, SF0 = 137.45 ± 18.93), females (F0 = 231.13 Hz ± 20.86, SF0 = 230.84 ± 16.50), and children (F0 = 270.93 Hz ± 20.01, SF0 = 278.04 ± 32.07). Comparison with other ethnicities indicated that F0 values of adult Jordanian Arabic-speaking males and females are generally consistent with adult Caucasian and African-American values. For Jordanian Arabic-speaking children, however, F0 values trended higher than those of their Western counterparts. SF0 values for adult Jordanian Arabic-speaking males are generally consistent with adult Caucasian male SF0 values. However, SF0 values of adult Jordanian-speaking females and children were relatively higher than the reported Western values. It is recommended that speech-language pathologists in Arabic-speaking countries, Jordan in particular, utilize the new data provided (F0 and SF0) when evaluating and/or treating Arabic-speaking patients. Due to its cross-linguistic variability, SF0 emerged as a preferred measurement when conducting cross-cultural comparisons of voice features.
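F0 extraction itself was done with MDVP in this study. Purely as an illustration of the underlying idea, a bare-bones autocorrelation pitch estimator for a sustained phonation might look like this (the search window limits and the synthetic test tone are my own choices, not the study's method):

```python
import numpy as np

def estimate_f0(signal, sr, fmin=75.0, fmax=400.0):
    """Crude autocorrelation-based F0 estimate for a sustained phonation."""
    sig = signal - np.mean(signal)
    ac = np.correlate(sig, sig, mode="full")[len(sig) - 1:]  # lags >= 0
    lo, hi = int(sr / fmax), int(sr / fmin)  # plausible pitch-period range
    lag = lo + int(np.argmax(ac[lo:hi]))     # lag of strongest periodicity
    return sr / lag
```

For a synthetic 231 Hz tone (the adult female F0 region reported above), the estimate lands within a few hertz of the true value; real voices need more care (windowing, interpolation, voicing detection).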

  1. On Speaker-Specific Prosodic Models for Automatic Dialog Act Segmentation of Multi-Party Meetings

    National Research Council Canada - National Science Library

    Kolar, Jachym; Shriberg, Elizabeth; Liu, Yang

    2006-01-01

    ... speech. We find positive results for both questions, although the second is more complex. Feature analysis reveals that duration is the most used feature type, followed by pause and pitch features...

  2. Joint Single-Channel Speech Separation and Speaker Identification

    DEFF Research Database (Denmark)

    Mowlaee, Pejman; Saeidi, Rahim; Tan, Zheng-Hua

    2010-01-01

    In this paper, we propose a closed loop system to improve the performance of single-channel speech separation in a speaker-independent scenario. The system is composed of two interconnected blocks: a separation block and a speaker identification block. The improvement is accomplished by incorporating … enhances the quality of the separated output signals. To assess the improvements, the results are reported in terms of PESQ for both target and masked signals.

  3. a sociophonetic study of young nigerian english speakers

    African Journals Online (AJOL)

    Oladipupo

    between male and female speakers in boundary consonant deletion, (F(1, .... speech perception (Foulkes 2006, Clopper & Pisoni, 2005, Thomas 2002). ... in Nigeria, and had had the privilege of travelling to Europe and the Americas for the.

  4. Speaker Linking and Applications using Non-Parametric Hashing Methods

    Science.gov (United States)

    2016-09-08

    nonparametric estimate of a multivariate density function," The Annals of Mathematical Statistics, vol. 36, no. 3, pp. 1049–1051, 1965. [9] E. A. Patrick...Speaker Linking and Applications using Non-Parametric Hashing Methods† Douglas Sturim and William M. Campbell MIT Lincoln Laboratory, Lexington, MA...with many approaches [1, 2]. For this paper, we focus on using i-vectors [2], but the methods apply to any embedding. For the task of speaker QBE and

  5. Electrophysiology of subject-verb agreement mediated by speakers' gender.

    Science.gov (United States)

    Hanulíková, Adriana; Carreiras, Manuel

    2015-01-01

    An important property of speech is that it explicitly conveys features of a speaker's identity such as age or gender. This event-related potential (ERP) study examined the effects of social information provided by a speaker's gender, i.e., the conceptual representation of gender, on subject-verb agreement. Despite numerous studies on agreement, little is known about syntactic computations generated by speaker characteristics extracted from the acoustic signal. Slovak is well suited to investigate this issue because it is a morphologically rich language in which agreement involves features for number, case, and gender. Grammaticality of a sentence can be evaluated by checking a speaker's gender as conveyed by his/her voice. We examined how conceptual information about speaker gender, which is not syntactic but rather social and pragmatic in nature, is interpreted for the computation of agreement patterns. ERP responses to verbs disagreeing with the speaker's gender (e.g., a sentence including a masculine verbal inflection spoken by a female person: 'the neighbors were upset because I *stole-MASC plums') elicited a larger early posterior negativity compared to correct sentences. When the agreement was purely syntactic and did not depend on the speaker's gender, a disagreement between a formally marked subject and the verb inflection (e.g., 'the woman-FEM *stole-MASC plums') resulted in a larger P600 preceded by a larger anterior negativity compared to the control sentences. This result is in line with proposals according to which the recruitment of non-syntactic information such as the gender of the speaker results in N400-like effects, while formally marked syntactic features lead to structural integration as reflected in a LAN/P600 complex.

  6. FPGA Implementation for GMM-Based Speaker Identification

    Directory of Open Access Journals (Sweden)

    Phaklen EhKan

    2011-01-01

    Full Text Available In today's society, highly accurate personal identification systems are required. Passwords or pin numbers can be forgotten or forged and are no longer considered to offer a high level of security. The use of biological features, biometrics, is becoming widely accepted as the next level for security systems. Biometric-based speaker identification is a method of identifying persons from their voice. Speaker-specific characteristics exist in speech signals due to different speakers having different resonances of the vocal tract. These differences can be exploited by extracting feature vectors such as Mel-Frequency Cepstral Coefficients (MFCCs) from the speech signal. A well-known statistical modelling process, the Gaussian Mixture Model (GMM), then models the distribution of each speaker's MFCCs in a multidimensional acoustic space. The GMM-based speaker identification system has features that make it promising for hardware acceleration. This paper describes the hardware implementation for classification of a text-independent GMM-based speaker identification system. The aim was to produce a system that can perform simultaneous identification of large numbers of voice streams in real time. This has important potential applications in security and in automated call centre applications. A speedup factor of ninety was achieved compared to a software implementation on a standard PC.
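As a software sketch of the statistical core only (not the paper's FPGA design, and with far fewer mixture components than a production system), a diagonal-covariance GMM per speaker can be fit with plain EM, and identification done by picking the model with the highest average log-likelihood:

```python
import numpy as np

def fit_gmm(X, k=2, iters=50, seed=0):
    """Fit a diagonal-covariance GMM to feature rows X with plain EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]          # init means from data
    var = np.ones((k, d)) * X.var(axis=0)
    w = np.ones(k) / k
    for _ in range(iters):
        # E-step: per-component log densities -> responsibilities
        logp = (np.log(w)[:, None]
                - 0.5 * np.sum(np.log(2 * np.pi * var)[:, None, :]
                               + (X[None] - mu[:, None]) ** 2 / var[:, None, :],
                               axis=2))
        logp -= logp.max(axis=0)
        r = np.exp(logp)
        r /= r.sum(axis=0)
        # M-step: re-estimate weights, means, variances
        nk = r.sum(axis=1) + 1e-9
        w = nk / n
        mu = (r @ X) / nk[:, None]
        var = ((r[:, :, None] * (X[None] - mu[:, None]) ** 2).sum(axis=1)
               / nk[:, None] + 1e-6)
    return w, mu, var

def avg_loglik(X, gmm):
    """Average per-frame log-likelihood of X under a fitted GMM."""
    w, mu, var = gmm
    logp = (np.log(w)[:, None]
            - 0.5 * np.sum(np.log(2 * np.pi * var)[:, None, :]
                           + (X[None] - mu[:, None]) ** 2 / var[:, None, :],
                           axis=2))
    m = logp.max(axis=0)
    return float(np.mean(m + np.log(np.exp(logp - m).sum(axis=0))))

def identify(X, models):
    """Return the enrolled speaker whose model best explains the frames."""
    return max(models, key=lambda s: avg_loglik(X, models[s]))
```

In a real system X would be MFCC frames; here any feature rows demonstrate the mechanics.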

  7. Understanding speaker attitudes from prosody by adults with Parkinson's disease.

    Science.gov (United States)

    Monetta, Laura; Cheang, Henry S; Pell, Marc D

    2008-09-01

    The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease, with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical 'pseudo-utterances' were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than healthy control (HC) participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).

  8. A fundamental residue pitch perception bias for tone language speakers

    Science.gov (United States)

    Petitti, Elizabeth

    A complex tone composed of only higher-order harmonics typically elicits a pitch percept equivalent to the tone's missing fundamental frequency (f0). When judging the direction of residue pitch change between two such tones, however, listeners may have completely opposite perceptual experiences depending on whether they are biased to perceive changes based on the overall spectrum or the missing f0 (harmonic spacing). Individual differences in residue pitch change judgments are reliable and have been associated with musical experience and functional neuroanatomy. Tone languages put greater pitch processing demands on their speakers than non-tone languages, and we investigated whether these lifelong differences in linguistic pitch processing affect listeners' bias for residue pitch. We asked native tone language speakers and native English speakers to perform a pitch judgment task for two tones with missing fundamental frequencies. Given tone pairs with ambiguous pitch changes, listeners were asked to judge the direction of pitch change, where the direction of their response indicated whether they attended to the overall spectrum (exhibiting a spectral bias) or the missing f0 (exhibiting a fundamental bias). We found that tone language speakers are significantly more likely to perceive pitch changes based on the missing f0 than English speakers. These results suggest that tone-language speakers' privileged experience with linguistic pitch fundamentally tunes their basic auditory processing.
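The two response biases can be captured in a toy model: a "spectral" listener tracks the mean of the harmonic frequencies, while a "fundamental" listener tracks the harmonic spacing (the missing f0). The specific harmonic complexes below are my own illustration of an ambiguous pair, not stimuli from the study:

```python
import numpy as np

def residue_pitch(harmonics):
    """Missing fundamental, approximated as the mean spacing of harmonics."""
    h = np.sort(np.asarray(harmonics, dtype=float))
    return float(np.mean(np.diff(h)))

def judged_direction(tone1, tone2):
    """Sign of the pitch change a spectral-bias vs a fundamental-bias
    listener would report for the pair (tone1 -> tone2)."""
    spectral = float(np.sign(np.mean(tone2) - np.mean(tone1)))
    fundamental = float(np.sign(residue_pitch(tone2) - residue_pitch(tone1)))
    return spectral, fundamental

# Harmonics 4-6 of 200 Hz, then harmonics 3-5 of 220 Hz: the spectrum
# moves DOWN while the missing fundamental moves UP, so the two
# listener types report opposite directions.
print(judged_direction([800, 1000, 1200], [660, 880, 1100]))
```

Pairs constructed this way are exactly the ambiguous trials on which tone language speakers, per the study, lean toward the fundamental judgment.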

  9. Speech variability effects on recognition accuracy associated with concurrent task performance by pilots

    Science.gov (United States)

    Simpson, C. A.

    1985-01-01

    In the present study of the responses of pairs of pilots to aircraft warning classification tasks using an isolated word, speaker-dependent speech recognition system, the induced stress was manipulated by means of different scoring procedures for the classification task and by the inclusion of a competitive manual control task. Both speech patterns and recognition accuracy were analyzed, and recognition errors were recorded by type for an isolated word speaker-dependent system and by an offline technique for a connected word speaker-dependent system. While errors increased with task loading for the isolated word system, there was no such effect for task loading in the case of the connected word system.

  10. SAR: Stroke Authorship Recognition

    KAUST Repository

    Shaheen, Sara; Rockwood, Alyn; Ghanem, Bernard

    2015-01-01

    Are simple strokes unique to the artist or designer who renders them? If so, can this idea be used to identify authorship or to classify artistic drawings? Also, could training methods be devised to develop particular styles? To answer these questions, we propose the Stroke Authorship Recognition (SAR) approach, a novel method that distinguishes the authorship of 2D digitized drawings. SAR converts a drawing into a histogram of stroke attributes that is discriminative of authorship. We provide extensive classification experiments on a large variety of data sets, which validate SAR's ability to distinguish unique authorship of artists and designers. We also demonstrate the usefulness of SAR in several applications including the detection of fraudulent sketches, the training and monitoring of artists in learning a particular new style and the first quantitative way to measure the quality of automatic sketch synthesis tools. © 2015 The Eurographics Association and John Wiley & Sons Ltd.

  11. SAR: Stroke Authorship Recognition

    KAUST Repository

    Shaheen, Sara

    2015-10-15

    Are simple strokes unique to the artist or designer who renders them? If so, can this idea be used to identify authorship or to classify artistic drawings? Also, could training methods be devised to develop particular styles? To answer these questions, we propose the Stroke Authorship Recognition (SAR) approach, a novel method that distinguishes the authorship of 2D digitized drawings. SAR converts a drawing into a histogram of stroke attributes that is discriminative of authorship. We provide extensive classification experiments on a large variety of data sets, which validate SAR's ability to distinguish unique authorship of artists and designers. We also demonstrate the usefulness of SAR in several applications including the detection of fraudulent sketches, the training and monitoring of artists in learning a particular new style and the first quantitative way to measure the quality of automatic sketch synthesis tools. © 2015 The Eurographics Association and John Wiley & Sons Ltd.

  12. Some results of automatic processing of images

    International Nuclear Information System (INIS)

    Golenishchev, I.A.; Gracheva, T.N.; Khardikov, S.V.

    1975-01-01

    The problems of automatic deciphering of radiographic pictures, the purpose of which is to draw a conclusion about the quality of the inspected product from the images of product defects in the picture, are considered. The methods of defect-image recognition are listed, and the algorithms and the class features of defects are described. The results of deciphering a small radiographic picture by means of the ''Minsk-22'' computer are presented. It is established that the sensitivity of the automatic deciphering method is close to that obtained with visual deciphering.

  13. View Invariant Gesture Recognition using 3D Motion Primitives

    DEFF Research Database (Denmark)

    Holte, Michael Boelstoft; Moeslund, Thomas B.

    2008-01-01

    This paper presents a method for automatic recognition of human gestures. The method works with 3D image data from a range camera to achieve invariance to viewpoint. The recognition is based solely on motion from characteristic instances of the gestures. These instances are denoted 3D motion...

  14. Real-time embedded face recognition for smart home

    NARCIS (Netherlands)

    Zuo, F.; With, de P.H.N.

    2005-01-01

    We propose a near real-time face recognition system for embedding in consumer applications. The system is embedded in a networked home environment and enables personalized services by automatic identification of users. The aim of our research is to design and build a face recognition system that is

  15. Material recognition based on thermal cues: Mechanisms and applications.

    Science.gov (United States)

    Ho, Hsin-Ni

    2018-01-01

    Some materials feel colder to the touch than others, and we can use this difference in perceived coldness for material recognition. This review focuses on the mechanisms underlying material recognition based on thermal cues. It provides an overview of the physical, perceptual, and cognitive processes involved in material recognition. It also describes engineering domains in which material recognition based on thermal cues has been applied. This includes haptic interfaces that seek to reproduce the sensations associated with contact in virtual environments and tactile sensors that aim at automatic material recognition. The review concludes by considering the contributions of this line of research in both science and engineering.
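The physical basis of the coldness cue can be made concrete: when skin touches a semi-infinite body, the interface temperature is the effusivity-weighted mean of the two initial temperatures. The sketch below uses approximate, textbook-level material properties of my choosing, not values from the review:

```python
import math

def effusivity(k, rho, c):
    """Thermal effusivity e = sqrt(k * rho * c), in J m^-2 K^-1 s^-1/2."""
    return math.sqrt(k * rho * c)

def contact_temperature(t_skin, t_obj, e_skin, e_obj):
    """Interface temperature of two semi-infinite bodies in contact;
    a high-effusivity object pulls the interface closer to its own
    temperature, which is the 'coldness' cue discussed above."""
    return (e_skin * t_skin + e_obj * t_obj) / (e_skin + e_obj)

# Approximate room-temperature properties (k in W/mK, rho in kg/m^3, c in J/kgK):
e_copper = effusivity(400.0, 8960.0, 385.0)  # roughly 37,000
e_wood = effusivity(0.15, 600.0, 1700.0)     # roughly 390
e_skin = 1500.0                              # typical literature value

t_copper = contact_temperature(33.0, 20.0, e_skin, e_copper)
t_wood = contact_temperature(33.0, 20.0, e_skin, e_wood)
# Copper drags the interface much closer to 20 C than wood does, so it
# "feels colder" even though both objects are at the same temperature.
```

A tactile sensor can invert this relation: measure the contact-temperature drop and infer the object's effusivity, and hence its material class.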

  16. SOFIR: Securely Outsourced Forensic Image Recognition

    NARCIS (Netherlands)

    Bösch, C.T.; Peter, Andreas; Hartel, Pieter H.; Jonker, Willem

    Forensic image recognition tools are used by law enforcement agencies all over the world to automatically detect illegal images on confiscated equipment. This detection is commonly done with the help of a strictly confidential database consisting of hash values of known illegal images. To detect and

  17. The Neural Correlates of Everyday Recognition Memory

    Science.gov (United States)

    Milton, F.; Muhlert, N.; Butler, C. R.; Benattayallah, A.; Zeman, A. Z.

    2011-01-01

    We used a novel automatic camera, SenseCam, to create a recognition memory test for real-life events. Adapting a "Remember/Know" paradigm, we asked healthy undergraduates, who wore SenseCam for 2 days, in their everyday environments, to classify images as strongly or weakly remembered, strongly or weakly familiar or novel, while brain activation…

  18. Biometric Score Calibration for Forensic Face Recognition

    NARCIS (Netherlands)

    Ali, Tauseef

    2014-01-01

    When two biometric specimens are compared using an automatic biometric recognition system, a similarity metric called "score" can be computed. In forensics, one of the biometric specimens is from an unknown source, for example, from a CCTV footage or a fingermark found at a crime scene and the other

  19. Speech and audio processing for coding, enhancement and recognition

    CERN Document Server

    Togneri, Roberto; Narasimha, Madihally

    2015-01-01

    This book describes the basic principles underlying the generation, coding, transmission and enhancement of speech and audio signals, including advanced statistical and machine learning techniques for speech and speaker recognition with an overview of the key innovations in these areas. Key research undertaken in speech coding, speech enhancement, speech recognition, emotion recognition and speaker diarization are also presented, along with recent advances and new paradigms in these areas. · Offers readers a single-source reference on the significant applications of speech and audio processing to speech coding, speech enhancement and speech/speaker recognition. Enables readers involved in algorithm development and implementation issues for speech coding to understand the historical development and future challenges in speech coding research; · Discusses speech coding methods yielding bit-streams that are multi-rate and scalable for Voice-over-IP (VoIP) Networks; ...

  20. Semantic Ambiguity Effects in L2 Word Recognition.

    Science.gov (United States)

    Ishida, Tomomi

    2018-06-01

    The present study examined the ambiguity effects in second language (L2) word recognition. Previous studies on first language (L1) lexical processing have observed that ambiguous words are recognized faster and more accurately than unambiguous words on lexical decision tasks. In this research, L1 and L2 speakers of English were asked whether a letter string on a computer screen was an English word or not. An ambiguity advantage was found for both groups and greater ambiguity effects were found for the non-native speaker group when compared to the native speaker group. The findings imply that the larger ambiguity advantage for L2 processing is due to their slower response time in producing adequate feedback activation from the semantic level to the orthographic level.

  1. Gender Differences in the Recognition of Vocal Emotions

    Science.gov (United States)

    Lausen, Adi; Schacht, Annekathrin

    2018-01-01

    The conflicting findings from the few studies conducted with regard to gender differences in the recognition of vocal expressions of emotion have left the exact nature of these differences unclear. Several investigators have argued that a comprehensive understanding of gender differences in vocal emotion recognition can only be achieved by replicating these studies while accounting for influential factors such as stimulus type, gender-balanced samples, number of encoders, decoders, and emotional categories. This study aimed to account for these factors by investigating whether emotion recognition from vocal expressions differs as a function of both listeners' and speakers' gender. A total of N = 290 participants were randomly and equally allocated to two groups. One group listened to words and pseudo-words, while the other group listened to sentences and affect bursts. Participants were asked to categorize the stimuli with respect to the expressed emotions in a fixed-choice response format. Overall, females were more accurate than males when decoding vocal emotions, however, when testing for specific emotions these differences were small in magnitude. Speakers' gender had a significant impact on how listeners' judged emotions from the voice. The group listening to words and pseudo-words had higher identification rates for emotions spoken by male than by female actors, whereas in the group listening to sentences and affect bursts the identification rates were higher when emotions were uttered by female than male actors. The mixed pattern for emotion-specific effects, however, indicates that, in the vocal channel, the reliability of emotion judgments is not systematically influenced by speakers' gender and the related stereotypes of emotional expressivity. Together, these results extend previous findings by showing effects of listeners' and speakers' gender on the recognition of vocal emotions. They stress the importance of distinguishing these factors to explain

  2. Gender Differences in the Recognition of Vocal Emotions

    Directory of Open Access Journals (Sweden)

    Adi Lausen

    2018-06-01

    Full Text Available The conflicting findings from the few studies conducted with regard to gender differences in the recognition of vocal expressions of emotion have left the exact nature of these differences unclear. Several investigators have argued that a comprehensive understanding of gender differences in vocal emotion recognition can only be achieved by replicating these studies while accounting for influential factors such as stimulus type, gender-balanced samples, number of encoders, decoders, and emotional categories. This study aimed to account for these factors by investigating whether emotion recognition from vocal expressions differs as a function of both listeners' and speakers' gender. A total of N = 290 participants were randomly and equally allocated to two groups. One group listened to words and pseudo-words, while the other group listened to sentences and affect bursts. Participants were asked to categorize the stimuli with respect to the expressed emotions in a fixed-choice response format. Overall, females were more accurate than males when decoding vocal emotions, however, when testing for specific emotions these differences were small in magnitude. Speakers' gender had a significant impact on how listeners' judged emotions from the voice. The group listening to words and pseudo-words had higher identification rates for emotions spoken by male than by female actors, whereas in the group listening to sentences and affect bursts the identification rates were higher when emotions were uttered by female than male actors. The mixed pattern for emotion-specific effects, however, indicates that, in the vocal channel, the reliability of emotion judgments is not systematically influenced by speakers' gender and the related stereotypes of emotional expressivity. Together, these results extend previous findings by showing effects of listeners' and speakers' gender on the recognition of vocal emotions. They stress the importance of distinguishing these

  3. Hierarchical Recognition Scheme for Human Facial Expression Recognition Systems

    Directory of Open Access Journals (Sweden)

    Muhammad Hameed Siddiqi

    2013-12-01

    Full Text Available Over the last decade, human facial expressions recognition (FER) has emerged as an important research area. Several factors make FER a challenging research problem. These include varying light conditions in training and test images; need for automatic and accurate face detection before feature extraction; and high similarity among different expressions that makes it difficult to distinguish these expressions with a high accuracy. This work implements a hierarchical linear discriminant analysis-based facial expressions recognition (HL-FER) system to tackle these problems. Unlike the previous systems, the HL-FER uses a pre-processing step to eliminate light effects, incorporates a new automatic face detection scheme, employs methods to extract both global and local features, and utilizes a HL-FER to overcome the problem of high similarity among different expressions. Unlike most of the previous works that were evaluated using a single dataset, the performance of the HL-FER is assessed using three publicly available datasets under three different experimental settings: n-fold cross validation based on subjects for each dataset separately; n-fold cross validation rule based on datasets; and, finally, a last set of experiments to assess the effectiveness of each module of the HL-FER separately. Weighted average recognition accuracy of 98.7% across three different datasets, using three classifiers, indicates the success of employing the HL-FER for human FER.

  4. Hierarchical Recognition Scheme for Human Facial Expression Recognition Systems

    Science.gov (United States)

    Siddiqi, Muhammad Hameed; Lee, Sungyoung; Lee, Young-Koo; Khan, Adil Mehmood; Truc, Phan Tran Ho

    2013-01-01

    Over the last decade, human facial expressions recognition (FER) has emerged as an important research area. Several factors make FER a challenging research problem. These include varying light conditions in training and test images; need for automatic and accurate face detection before feature extraction; and high similarity among different expressions that makes it difficult to distinguish these expressions with a high accuracy. This work implements a hierarchical linear discriminant analysis-based facial expressions recognition (HL-FER) system to tackle these problems. Unlike the previous systems, the HL-FER uses a pre-processing step to eliminate light effects, incorporates a new automatic face detection scheme, employs methods to extract both global and local features, and utilizes a HL-FER to overcome the problem of high similarity among different expressions. Unlike most of the previous works that were evaluated using a single dataset, the performance of the HL-FER is assessed using three publicly available datasets under three different experimental settings: n-fold cross validation based on subjects for each dataset separately; n-fold cross validation rule based on datasets; and, finally, a last set of experiments to assess the effectiveness of each module of the HL-FER separately. Weighted average recognition accuracy of 98.7% across three different datasets, using three classifiers, indicates the success of employing the HL-FER for human FER. PMID:24316568
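The discriminative core of an LDA-based stage can be sketched for the two-class case. This is generic Fisher LDA on synthetic features of my own construction, not the authors' hierarchical pipeline or their datasets:

```python
import numpy as np

def fisher_lda(X0, X1):
    """Two-class Fisher discriminant: w maximises between-class scatter
    relative to within-class scatter; returns (w, midpoint threshold)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))  # within-class scatter
    w = np.linalg.solve(Sw + 1e-6 * np.eye(len(m0)), m1 - m0)
    threshold = w @ (m0 + m1) / 2.0
    return w, threshold

def predict(X, w, threshold):
    """Label 1 for samples whose projection passes the midpoint."""
    return (X @ w > threshold).astype(int)
```

A hierarchical scheme like HL-FER chains several such discriminants, first separating groups of similar expressions and then expressions within a group.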

  5. Machine Learning for Text-Independent Speaker Verification : How to Teach a Machine to Recognize Human Voices

    OpenAIRE

    Imoscopi, Stefano

    2016-01-01

    The aim of speaker recognition and verification is to identify people's identity from the characteristics of their voices (voice biometrics). Traditionally this technology has been employed mostly for security or authentication purposes, identification of employees/customers and criminal investigations. During the last decade the increasing popularity of hands-free and voice-controlled systems and the massive growth of media content generated on the internet has increased the need for technique...

  6. Finding weak points automatically

    International Nuclear Information System (INIS)

    Archinger, P.; Wassenberg, M.

    1999-01-01

    Operators of nuclear power stations have to carry out material tests on selected components at regular intervals. A fully automated test, which achieves a clearly higher reproducibility compared with partially automated variants, would therefore provide a solution. In addition, the fully automated test reduces the radiation dose for the test person. (orig.) [de

  7. Automatic modulation classification principles, algorithms and applications

    CERN Document Server

    Zhu, Zhechen

    2014-01-01

    Automatic Modulation Classification (AMC) has been a key technology in many military, security, and civilian telecommunication applications for decades. In military and security applications, modulation often serves as another level of encryption; in modern civilian applications, multiple modulation types can be employed by a signal transmitter to control the data rate and link reliability. This book offers comprehensive documentation of AMC models, algorithms and implementations for successful modulation recognition. It provides an invaluable theoretical and numerical comparison of AMC algo
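Feature-based classification is one classical AMC family. As an illustrative toy of my own construction (ideal, noiseless baseband symbols; not an algorithm quoted from the book), raising a PSK signal to the M-th power collapses it to a single spectral line exactly when M matches the modulation order:

```python
import numpy as np

def classify_psk(x, threshold=0.5):
    """Classify ideal baseband PSK by the M-th power spectral-line test:
    x**M becomes a constant (one dominant FFT line) once M equals the
    PSK order, so test the lowest order first."""
    for m, name in ((2, "BPSK"), (4, "QPSK")):
        y = x ** m
        spec = np.abs(np.fft.fft(y))
        if spec.max() > threshold * np.abs(y).sum():
            return name
    return "unknown"
```

With noise, carrier offset, and pulse shaping, practical classifiers replace this raw test with cyclostationary features or likelihood ratios, which is the territory the book covers.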

  8. Noise Reduction with Microphone Arrays for Speaker Identification

    Energy Technology Data Exchange (ETDEWEB)

    Cohen, Z

    2011-12-22

    Reducing acoustic noise in audio recordings is an ongoing problem that plagues many applications. This noise is hard to reduce because of interfering sources and non-stationary behavior of the overall background noise. Many single channel noise reduction algorithms exist but are limited in that the more the noise is reduced, the more the signal of interest is distorted due to the fact that the signal and noise overlap in frequency. Specifically acoustic background noise causes problems in the area of speaker identification. Recording a speaker in the presence of acoustic noise ultimately limits the performance and confidence of speaker identification algorithms. In situations where it is impossible to control the environment where the speech sample is taken, noise reduction filtering algorithms need to be developed to clean the recorded speech of background noise. Because single channel noise reduction algorithms would distort the speech signal, the overall challenge of this project was to see if spatial information provided by microphone arrays could be exploited to aid in speaker identification. The goals are: (1) Test the feasibility of using microphone arrays to reduce background noise in speech recordings; (2) Characterize and compare different multichannel noise reduction algorithms; (3) Provide recommendations for using these multichannel algorithms; and (4) Ultimately answer the question - Can the use of microphone arrays aid in speaker identification?
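A minimal multichannel baseline of the kind such projects compare against is the delay-and-sum beamformer. The sketch below (integer sample delays and a synthetic setup of my own choosing) shows the key property: averaging aligned channels preserves the coherent speech while averaging down uncorrelated noise:

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each microphone channel by its known integer sample delay,
    then average: coherent signal adds up, independent noise cancels."""
    n = min(len(c) - d for c, d in zip(channels, delays))
    return np.mean([c[d:d + n] for c, d in zip(channels, delays)], axis=0)
```

With M microphones and uncorrelated noise, the array gain is roughly 10*log10(M) dB, about 6 dB for four channels, and, unlike single-channel filtering, the in-band speech is not distorted.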

  9. Fluency profile: comparison between Brazilian and European Portuguese speakers.

    Science.gov (United States)

    Castro, Blenda Stephanie Alves e; Martins-Reis, Vanessa de Oliveira; Baptista, Ana Catarina; Celeste, Letícia Correa

    2014-01-01

    The purpose of the study was to compare the speech fluency of Brazilian Portuguese speakers with that of European Portuguese speakers. The study participants were 76 individuals of any ethnicity or skin color aged 18-29 years; 38 lived in Brazil and 38 in Portugal. Speech samples from all participants were obtained and analyzed according to the variables of typology and frequency of speech disruptions and speech rate. Descriptive and inferential statistical analyses were performed to assess the association between the fluency profile and linguistic variant variables. We found that the speech rate of European Portuguese speakers was higher than that of Brazilian Portuguese speakers in words per minute (p=0.004). The qualitative distribution of the typology of common dysfluencies also differed between the two groups. Because a reference fluency profile for European Portuguese speakers is not available, speech therapists in Portugal can use the same speech fluency assessment as has been used in Brazil to establish a diagnosis of stuttering, especially in regard to typical and stuttering dysfluencies, with care taken when evaluating the speech rate.

  10. THE HUMOROUS SPEAKER: THE CONSTRUCTION OF ETHOS IN COMEDY

    Directory of Open Access Journals (Sweden)

    Maria Flávia Figueiredo

    2016-07-01

    Full Text Available Rhetoric is guided by three dimensions: logos, pathos, and ethos. Logos is the speech itself; pathos comprises the passions that the speaker, through logos, awakens in the audience; and ethos is the image that the speaker creates of himself, also through logos, before an audience. The rhetorical genres are three: deliberative (which drives the audience or the judge to think about future events, characterizing them as convenient or harmful), judiciary (the audience thinks about past events in order to classify them as fair or unfair), and epidictic (the audience judges a fact that has occurred, or even the character of a person, as beautiful or not). Following Figueiredo (2014) and based on Eggs (2005), we advocate that ethos is not a mark left by the speaker only in the rhetorical genres but in any textual genre, since, as the result of human production, even the simplest choices in textual construction reproduce something closely linked to the speaker, thus demarcating his/her ethos. To verify this assumption, we selected a video of the comedian Danilo Gentili, which is examined in the light of Rhetoric and Textual Linguistics. Our objective is thus to find, in the stand-up comedy genre, marks left by the speaker in the speech that characterize his/her ethos. The analysis results show that ethos, discursive genre, and communicational purpose amalgamate into an indissoluble complex in which the success of each depends on how the others were built.

  11. Direct Speaker Gaze Promotes Trust in Truth-Ambiguous Statements.

    Science.gov (United States)

    Kreysa, Helene; Kessler, Luise; Schweinberger, Stefan R

    2016-01-01

    A speaker's gaze behaviour can provide perceivers with a multitude of cues which are relevant for communication, thus constituting an important non-verbal interaction channel. The present study investigated whether direct eye gaze of a speaker affects the likelihood of listeners believing truth-ambiguous statements. Participants were presented with videos in which a speaker produced such statements with either direct or averted gaze. The statements were selected through a rating study to ensure that participants were unlikely to know a-priori whether they were true or not (e.g., "sniffer dogs cannot smell the difference between identical twins"). Participants indicated in a forced-choice task whether or not they believed each statement. We found that participants were more likely to believe statements by a speaker looking at them directly, compared to a speaker with averted gaze. Moreover, when participants disagreed with a statement, they were slower to do so when the statement was uttered with direct (compared to averted) gaze, suggesting that the process of rejecting a statement as untrue may be inhibited when that statement is accompanied by direct gaze.

  12. Direct Speaker Gaze Promotes Trust in Truth-Ambiguous Statements.

    Directory of Open Access Journals (Sweden)

    Helene Kreysa

    Full Text Available A speaker's gaze behaviour can provide perceivers with a multitude of cues which are relevant for communication, thus constituting an important non-verbal interaction channel. The present study investigated whether direct eye gaze of a speaker affects the likelihood of listeners believing truth-ambiguous statements. Participants were presented with videos in which a speaker produced such statements with either direct or averted gaze. The statements were selected through a rating study to ensure that participants were unlikely to know a-priori whether they were true or not (e.g., "sniffer dogs cannot smell the difference between identical twins"). Participants indicated in a forced-choice task whether or not they believed each statement. We found that participants were more likely to believe statements by a speaker looking at them directly, compared to a speaker with averted gaze. Moreover, when participants disagreed with a statement, they were slower to do so when the statement was uttered with direct (compared to averted) gaze, suggesting that the process of rejecting a statement as untrue may be inhibited when that statement is accompanied by direct gaze.

  13. Auditory Modeling as a Basis for Spectral Modulation Analysis with Application to Speaker Recognition

    National Research Council Canada - National Science Library

    Wang, Tianyu T; Quatieri, Thomas F

    2007-01-01

    ...) variations in analysis filter-bank size, and (3) nonlinear adaptation. Our methods are motivated both by a desire to better mimic auditory processing relative to traditional front-ends (e.g., the mel-cepstrum...

  14. American or British? L2 Speakers' Recognition and Evaluations of Accent Features in English

    Science.gov (United States)

    Carrie, Erin; McKenzie, Robert M.

    2018-01-01

    Recent language attitude research has attended to the processes involved in identifying and evaluating spoken language varieties. This article investigates the ability of second-language learners of English in Spain (N = 71) to identify Received Pronunciation (RP) and General American (GenAm) speech and their perceptions of linguistic variation…

  15. Speaker Recognition Using Real vs Synthetic Parallel Data for DNN Channel Compensation

    Science.gov (United States)

    2016-09-08

    Analysis of factors affecting system performance in the ASpIRE challenge,” in Proc. of IEEE ASRU, 2015. [22] M. Jeub, M. Schafer, and P. Vary, “A binaural...Introduction Recently there has been a great deal of interest in using deep neural networks (DNNs) for channel compensation under reverberant or noisy...time align the data from each microphone channel to the telephone channel. Audio files were rejected if the alignment process failed. At the end of

  16. Speaker Recognition Using Real vs. Synthetic Parallel Data for DNN Channel Compensation

    Science.gov (United States)

    2016-09-08

    Audio Technica Hanging Mic) 05 Jabra Cellphone Earwrap Mic 06 Motorola Cellphone Earbud 07 Olympus Pearlcorder 08 Radio Shack Computer Desktop Mic...on conversational telephone speech. To assess the performance impact of the denoising DNN on telephony data we evaluated the DNNs on the SRE10...ASpIRE recipe. Importantly, all three denoising DNN systems did not adversely impact telephone SR performance as measured on the SRE10 telephone task

  17. Channel Compensation for Speaker Recognition using MAP Adapted PLDA and Denoising DNNs

    Science.gov (United States)

    2016-06-21

    05 Jabra Cellphone Earwrap Mic 06 Motorola Cellphone Earbud 07 Olympus Pearlcorder 08 Radio Shack Computer Desktop Mic Table 1: Mixer 1 and 2...EER and min DCF vs λ for 2cov map adapt PLDA the MAP adapted PLDA model using a λ of 0.5. The remaining rows demonstrate the impact of the feature...degrading performance on conversational telephone speech. To assess the performance impact of the denoising DNN on telephony data we evaluated the

  18. Speech activity detection for the automated speaker recognition system of critical use

    Directory of Open Access Journals (Sweden)

    M. M. Bykov

    2017-06-01

    Full Text Available In this article, the authors develop a speech activity detection method for an automated speaker recognition system of critical use, with wavelet parameterization of the speech signal and classification of "speech"/"pause" intervals using a curvilinear neural network. The proposed wavelet parameterization method allows choosing the optimal parameters of the wavelet transform in accordance with a user-specified error of representation of the speech signal. It also allows estimating the loss of information depending on the selected parameters of the continuous wavelet transform (CWT), which made it possible to reduce the number of scale coefficients of the CWT of the speech signal by an order of magnitude while keeping the distortion of its local spectrum within an allowable degree. An algorithm for detecting speech activity with a curvilinear neural network classifier is also proposed; it achieves high-quality segmentation of speech signals into "speech"/"pause" intervals and is robust to narrowband and technogenic noise in the speech signal owing to the inherent properties of the curvilinear neural network.
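
    The speech/pause segmentation task that this record addresses can be illustrated with a much simpler baseline than the article's wavelet-plus-neural-network detector: a short-time energy threshold. The frame length, threshold ratio, and synthetic signal below are invented for illustration only.

```python
import numpy as np

def energy_vad(signal, frame_len=256, threshold_ratio=0.1):
    """Mark a frame as speech when its short-time energy exceeds a
    fraction of the peak frame energy. A crude baseline, shown only to
    illustrate what "speech"/"pause" interval classification means."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).sum(axis=1)
    return energy > threshold_ratio * energy.max()

# Silence, then a burst of "speech" (a tone), then silence again.
rng = np.random.default_rng(2)
sig = np.concatenate([0.01 * rng.standard_normal(2048),
                      np.sin(np.linspace(0, 300, 2048)),
                      0.01 * rng.standard_normal(2048)])
mask = energy_vad(sig)
# The middle frames are flagged as speech, the edges as pause.
```

Unlike this sketch, the article's detector is designed to keep working when the "pause" regions contain structured narrowband or technogenic noise, which defeats a plain energy threshold.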

  19. Utilising Tree-Based Ensemble Learning for Speaker Segmentation

    DEFF Research Database (Denmark)

    Abou-Zleikha, Mohamed; Tan, Zheng-Hua; Christensen, Mads Græsbøll

    2014-01-01

    In audio and speech processing, accurate detection of the changing points between multiple speakers in speech segments is an important stage for several applications such as speaker identification and tracking. Bayesian Information Criteria (BIC)-based approaches are the most traditionally used...... for a certain condition, the model becomes biased to the data used for training limiting the model’s generalisation ability. In this paper, we propose a BIC-based tuning-free approach for speaker segmentation through the use of ensemble-based learning. A forest of segmentation trees is constructed in which each...... tree is trained using a sampled version of the speech segment. During the tree construction process, a set of randomly selected points in the input sequence is examined as potential segmentation points. The point that yields the highest ΔBIC is chosen and the same process is repeated for the resultant...
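
    The ΔBIC test that this record builds on (and whose tuning sensitivity motivates the ensemble approach) can be sketched directly: fit one full-covariance Gaussian to the whole segment and one to each side of a candidate split, and penalise the extra parameters. The feature dimensionality, penalty weight λ, and synthetic two-speaker data below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def delta_bic(X, t, lam=1.0):
    """ΔBIC for hypothesising a speaker change at frame t of feature
    matrix X (shape (N, d), e.g. MFCC frames). A positive value favours
    modelling the two sides with different Gaussians, i.e. a change point."""
    N, d = X.shape
    X1, X2 = X[:t], X[t:]
    logdet = lambda A: np.linalg.slogdet(np.cov(A, rowvar=False))[1]
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(N)
    return (0.5 * N * logdet(X)
            - 0.5 * t * logdet(X1)
            - 0.5 * (N - t) * logdet(X2)
            - penalty)

# Two synthetic "speakers" with different feature statistics; the true
# change point is at frame 300, where ΔBIC should peak.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(300, 4))
b = rng.normal(3.0, 2.0, size=(300, 4))
X = np.vstack([a, b])
scores = {t: delta_bic(X, t) for t in (150, 300, 450)}
```

The penalty weight λ is exactly the tuning parameter the paper's tree-ensemble approach is designed to avoid.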

  20. Do We Perceive Others Better than Ourselves? A Perceptual Benefit for Noise-Vocoded Speech Produced by an Average Speaker.

    Directory of Open Access Journals (Sweden)

    William L Schuerman

    Full Text Available In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.

  1. The Suitability of Cloud-Based Speech Recognition Engines for Language Learning

    Science.gov (United States)

    Daniels, Paul; Iwago, Koji

    2017-01-01

    As online automatic speech recognition (ASR) engines become more accurate and more widely implemented with call software, it becomes important to evaluate the effectiveness and the accuracy of these recognition engines using authentic speech samples. This study investigates two of the most prominent cloud-based speech recognition engines--Apple's…

  2. A Unified Approach to the Recognition of Complex Actions from Sequences of Zone-Crossings

    NARCIS (Netherlands)

    Sanromà, G.; Patino, L.; Burghouts, G.J.; Schutte, K.; Ferryman, J.

    2014-01-01

    We present a method for the recognition of complex actions. Our method combines automatic learning of simple actions and manual definition of complex actions in a single grammar. Contrary to the general trend in complex action recognition, that consists in dividing recognition into two stages, our

  3. Neural network for automatic analysis of motility data

    DEFF Research Database (Denmark)

    Jakobsen, Erik; Kruse-Andersen, S; Kolberg, Jens Godsk

    1994-01-01

    Continuous recording of intraluminal pressures for extended periods of time is currently regarded as a valuable method for detection of esophageal motor abnormalities. A subsequent automatic analysis of the resulting motility data relies on strict mathematical criteria for recognition of pressure...... comparable. However, the neural network recognized pressure peaks clearly generated by muscular activity that had escaped detection by the conventional program. In conclusion, we believe that neurocomputing has potential advantages for automatic analysis of gastrointestinal motility data.

  4. Using Avatars for Improving Speaker Identification in Captioning

    Science.gov (United States)

    Vy, Quoc V.; Fels, Deborah I.

    Captioning is the main method for accessing television and film content by people who are deaf or hard-of-hearing. One major difficulty consistently identified by the community is knowing who is speaking, particularly for an off-screen narrator. A captioning system was created using a participatory design method to improve speaker identification. The final prototype contained avatars and a coloured border for identifying specific speakers. Evaluation results were very positive; however, participants also wanted to customize various components such as caption and avatar location.

  5. Automatic detection of laughter

    NARCIS (Netherlands)

    Truong, K.P.; Leeuwen, D.A. van

    2005-01-01

    In the context of detecting ‘paralinguistic events’ with the aim to make classification of the speaker’s emotional state possible, a detector was developed for one of the most obvious ‘paralinguistic events’, namely laughter. Gaussian Mixture Models were trained with Perceptual Linear Prediction

  6. Social power and recognition of emotional prosody: High power is associated with lower recognition accuracy than low power.

    Science.gov (United States)

    Uskul, Ayse K; Paulmann, Silke; Weick, Mario

    2016-02-01

    Listeners have to pay close attention to a speaker's tone of voice (prosody) during daily conversations. This is particularly important when trying to infer the emotional state of the speaker. Although a growing body of research has explored how emotions are processed from speech in general, little is known about how psychosocial factors such as social power can shape the perception of vocal emotional attributes. Thus, the present studies explored how social power affects emotional prosody recognition. In a correlational study (Study 1) and an experimental study (Study 2), we show that high power is associated with lower accuracy in emotional prosody recognition than low power. These results, for the first time, suggest that individuals experiencing high or low power perceive emotional tone of voice differently.

  7. Motion Primitives for Action Recognition

    DEFF Research Database (Denmark)

    Fihl, Preben; Holte, Michael Boelstoft; Moeslund, Thomas B.

    2007-01-01

    The number of potential applications has made automatic recognition of human actions a very active research area. Different approaches have been followed based on trajectories through some state space. In this paper we also model an action as a trajectory through a state space, but we represent...... the actions as a sequence of temporal isolated instances, denoted primitives. These primitives are each defined by four features extracted from motion images. The primitives are recognized in each frame based on a trained classifier resulting in a sequence of primitives. From this sequence we recognize...... different temporal actions using a probabilistic Edit Distance method. The method is tested on different actions with and without noise and the results show recognition rates of 88.7% and 85.5%, respectively....
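
    Matching an observed primitive sequence against reference actions by Edit Distance, as this record describes, can be illustrated with the classic (non-probabilistic) Levenshtein algorithm; the primitive alphabet and reference sequences below are invented for illustration, and the paper's probabilistic variant refines this basic idea.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance: minimum number of insertions,
    deletions, and substitutions turning sequence a into sequence b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

# Recognise a noisy observed primitive string as the closest reference
# action (hypothetical primitive labels).
references = {"wave": "ABAB", "punch": "CDDC"}
observed = "ABB"  # one primitive missed by the per-frame classifier
best = min(references, key=lambda k: edit_distance(observed, references[k]))
```

Because the distance tolerates a few insertions, deletions, and substitutions, recognition degrades gracefully when the per-frame primitive classifier makes occasional errors, which is consistent with the small drop from 88.7% to 85.5% under noise reported above.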

  8. Face recognition in the thermal infrared domain

    Science.gov (United States)

    Kowalski, M.; Grudzień, A.; Palka, N.; Szustakowski, M.

    2017-10-01

    Biometrics refers to unique human characteristics. Each unique characteristic may be used to label and describe individuals and for automatic recognition of a person based on physiological or behavioural properties. One of the most natural and most popular biometric traits is the face. Most research on face recognition is based on visible light, and state-of-the-art face recognition systems operating in the visible spectrum achieve very high recognition accuracy under controlled environmental conditions. Thermal infrared imagery seems to be a promising alternative or complement to visible-range imaging due to its relatively high resistance to illumination changes. A thermal infrared image of the human face presents its unique heat signature and can be used for recognition. The characteristics of thermal images maintain advantages over visible-light images and can be used to improve algorithms of human face recognition in several aspects. Mid-wavelength and far-wavelength infrared, also referred to as thermal infrared, seem to be promising alternatives. We present a study on 1:1 recognition in the thermal infrared domain. The two approaches we consider are stand-off face verification of a non-moving person and stop-less face verification on the move. The paper presents the methodology of our studies and challenges for face recognition systems in the thermal infrared domain.

  9. A Hybrid Acoustic and Pronunciation Model Adaptation Approach for Non-native Speech Recognition

    Science.gov (United States)

    Oh, Yoo Rhee; Kim, Hong Kook

    In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or triphone-modeling level, depending at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts pronunciation models and acoustic models by accommodating the pronunciation variants in the pronunciation dictionary and by clustering the states of triphone acoustic models using the acoustic variants, respectively. On the other hand, the triphone-modeling level hybrid method initially adapts pronunciation models in the same way as in the state-tying level hybrid method; however, for the acoustic model adaptation, the triphone acoustic models are then re-estimated based on the adapted pronunciation models and the states of the re-estimated triphone acoustic models are clustered using the acoustic variants. From the Korean-spoken English speech recognition experiments, it is shown that ASR systems employing the state-tying and triphone-modeling level adaptation methods can relatively reduce the average word error rates (WERs) by 17.1% and 22.1% for non-native speech, respectively, when compared to a baseline ASR system.

  10. L2 Word Recognition: Influence of L1 Orthography on Multi-syllabic Word Recognition.

    Science.gov (United States)

    Hamada, Megumi

    2017-10-01

    L2 reading research suggests that L1 orthographic experience influences L2 word recognition. Nevertheless, findings on multi-syllabic words in English are still limited, despite the fact that the vast majority of words are multi-syllabic. This study investigated whether L1 orthography influences the recognition of multi-syllabic words, focusing on the position of an embedded word. The participants were Arabic ESL learners, Chinese ESL learners, and native speakers of English. The task was a word search task, in which the participants identified a target word embedded in a pseudoword at the initial, middle, or final position. The search accuracy and speed indicated that all groups showed a strong preference for the initial position. The accuracy data further indicated group differences: the Arabic group showed higher accuracy in the final position than in the middle, the Chinese group showed the opposite pattern, and the native speakers showed no difference between the two positions. The findings suggest that L2 multi-syllabic word recognition involves unique processes.

  11. Speech recognition in natural background noise.

    Directory of Open Access Journals (Sweden)

    Julien Meyer

    Full Text Available In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances between 11 and 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB). Our results showed that in such conditions, identity of vowels is mostly preserved, with the striking peculiarity of the absence of confusion in vowels. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending on the place in the words (onset vs. coda). Finally our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for future studies.

  12. Speech recognition in natural background noise.

    Science.gov (United States)

    Meyer, Julien; Dentel, Laure; Meunier, Fanny

    2013-01-01

    In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances between 11 and 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB). Our results showed that in such conditions, identity of vowels is mostly preserved, with the striking peculiarity of the absence of confusion in vowels. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending on the place in the words (onset vs. coda). Finally our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for future studies.
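
    The distance-to-SNR mapping used in this record follows from simple spherical spreading: the speech level falls by 20·log10(d) dB relative to its 1-meter reference while the diffuse background noise stays constant. The constant noise floor below (about 53.3 dB(A)) is not stated in the abstract; it is inferred here from the reported 11 m endpoint so the 33 m endpoint can be checked.

```python
import math

def snr_at_distance(level_at_1m_db, noise_db, distance_m):
    """SNR under spherical spreading: the speech level drops by
    20*log10(distance) dB relative to its 1-metre reference, while the
    diffuse background noise level stays constant."""
    return level_at_1m_db - 20 * math.log10(distance_m) - noise_db

# Infer the constant noise floor from the reported -8.8 dB SNR at 11 m
# (speech at 65.3 dB(A) re 1 m), then check the 33 m endpoint.
noise = 65.3 - 20 * math.log10(11) + 8.8   # about 53.3 dB(A), inferred
snr_33 = snr_at_distance(65.3, noise, 33)  # close to the reported -18.4 dB
```

The two reported endpoints are mutually consistent under this model, which supports the abstract's claim that distance acts here essentially as an SNR control.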

  13. Facial Expression Recognition Through Machine Learning

    Directory of Open Access Journals (Sweden)

    Nazia Perveen

    2015-08-01

    Full Text Available Facial expressions communicate non-verbal cues which play an important role in interpersonal relations. Automatic recognition of facial expressions can be an important element of natural human-machine interfaces; it might likewise be utilized in behavioral science and in clinical practice. Although people perceive facial expressions virtually immediately, robust expression recognition by machine is still a challenge. From the point of view of automatic recognition, a facial expression can be considered to comprise deformations of the facial parts and their spatial relations, or changes in the face's pigmentation. Research into automatic recognition of facial expressions addresses the issues surrounding the representation and classification of static or dynamic characteristics of these deformations or of face pigmentation. We obtained results using CVIPtools. We took a training data set of six facial expressions of three persons, with a total of 90 border-mask samples for training and 30 border-mask samples for testing, used RST-invariant features and texture features for feature analysis, and then classified them using the k-Nearest Neighbor classification algorithm. The maximum accuracy is 90%.
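
    The classification step this record describes, k-Nearest Neighbor over extracted feature vectors, can be sketched as follows. The 2-D "feature" vectors and class labels are toy stand-ins, not the RST-invariant and texture features of the actual 90-sample training set.

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples
    under Euclidean distance (the k-Nearest Neighbor classifier)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.array(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy 2-D feature vectors for two hypothetical expression classes.
train_X = np.array([[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
                    [0.90, 0.80], [0.80, 0.90], [0.85, 0.85]])
train_y = ["neutral", "neutral", "neutral", "smile", "smile", "smile"]

pred = knn_predict(train_X, train_y, np.array([0.12, 0.18]))
```

kNN needs no training phase beyond storing the samples, which fits the small 90-sample set described above, but its accuracy depends heavily on how discriminative the extracted features are.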

  14. The Space-Time Topography of English Speakers

    Science.gov (United States)

    Duman, Steve

    2016-01-01

    English speakers talk and think about Time in terms of physical space. The past is behind us, and the future is in front of us. In this way, we "map" space onto Time. This dissertation addresses the specificity of this physical space, or its topography. Inspired by languages like Yupno (Nunez, et al., 2012) and Bamileke-Dschang (Hyman,…

  15. Within the School and the Community--A Speaker's Bureau.

    Science.gov (United States)

    McClintock, Joy H.

    Student interest prompted the formation of a Speaker's Bureau in Seminole Senior High School, Seminole, Florida. First, students compiled a list of community contacts, including civic clubs, churches, retirement villages, newspaper offices, and the County School Administration media center. A letter of introduction was composed and speaking…

  16. Gesturing by Speakers with Aphasia: How Does It Compare?

    Science.gov (United States)

    Mol, Lisette; Krahmer, Emiel; van de Sandt-Koenderman, Mieke

    2013-01-01

    Purpose: To study the independence of gesture and verbal language production. The authors assessed whether gesture can be semantically compensatory in cases of verbal language impairment and whether speakers with aphasia and control participants use similar depiction techniques in gesture. Method: The informativeness of gesture was assessed in 3…

  17. Sentence comprehension in Swahili-English bilingual agrammatic speakers

    NARCIS (Netherlands)

    Abuom, Tom O.; Shah, Emmah; Bastiaanse, Roelien

    For this study, sentence comprehension was tested in Swahili-English bilingual agrammatic speakers. The sentences were controlled for four factors: (1) order of the arguments (base vs. derived); (2) embedding (declarative vs. relative sentences); (3) overt use of the relative pronoun "who"; (4)

  18. Schizophrenia among Sesotho speakers in South Africa | Mosotho ...

    African Journals Online (AJOL)

    Results: Core symptoms of schizophrenia among Sesotho speakers do not differ significantly from other cultures. However, the content of psychological symptoms such as delusions and hallucinations is strongly affected by cultural variables. Somatic symptoms such as headaches, palpitations, dizziness and excessive ...

  19. Native Speakers' Perception of Non-Native English Speech

    Science.gov (United States)

    Jaber, Maysa; Hussein, Riyad F.

    2011-01-01

    This study is aimed at investigating the rating and intelligibility of different non-native varieties of English, namely French English, Japanese English and Jordanian English by native English speakers and their attitudes towards these foreign accents. To achieve the goals of this study, the researchers used a web-based questionnaire which…

  20. Openings and Closings in Telephone Conversations between Native Spanish Speakers.

    Science.gov (United States)

    Coronel-Molina, Serafin M.

    1998-01-01

    A study analyzed the opening and closing sequences of 11 dyads of native Spanish-speakers in natural telephone conversations conducted in Spanish. The objective was to determine how closely Hispanic cultural patterns of conduct for telephone conversations follow the sequences outlined in previous research. It is concluded that Spanish…