machine coded speech: Topics by WorldWideScience.org

Sample records for machine coded speech

Speech and audio processing for coding, enhancement and recognition

CERN Document Server

Togneri, Roberto; Narasimha, Madihally

2015-01-01

This book describes the basic principles underlying the generation, coding, transmission and enhancement of speech and audio signals, including advanced statistical and machine learning techniques for speech and speaker recognition with an overview of the key innovations in these areas. Key research undertaken in speech coding, speech enhancement, speech recognition, emotion recognition and speaker diarization are also presented, along with recent advances and new paradigms in these areas. · Offers readers a single-source reference on the significant applications of speech and audio processing to speech coding, speech enhancement and speech/speaker recognition. Enables readers involved in algorithm development and implementation issues for speech coding to understand the historical development and future challenges in speech coding research; · Discusses speech coding methods yielding bit-streams that are multi-rate and scalable for Voice-over-IP (VoIP) Networks; · �...
An analysis of machine translation and speech synthesis in speech-to-speech translation system

OpenAIRE

Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.

2011-01-01

This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...
Speech coding

Energy Technology Data Exchange (ETDEWEB)

Ravishankar, C., Hughes Network Systems, Germantown, MD

1998-05-08

Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained to be the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of speech signal getting corrupted by noise, cross-talk and distortion Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand digital transmission is relatively immune to noise, cross-talk and distortion primarily because of the capability to faithfully regenerate digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link Hence from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modem requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often referred to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques that is often interchangeably used with speech coding is the term voice coding. This term is more generic in the sense that the
INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

Directory of Open Access Journals (Sweden)

J. SANGEETHA

2015-02-01

Full Text Available This paper provides an interface between the machine translation and speech synthesis system for converting English speech to Tamil text in English to Tamil speech to speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation and text to speech synthesis. Many procedures for incorporation of speech recognition and machine translation have been projected. Still speech synthesis system has not yet been measured. In this paper, we focus on integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of speech synthesis, machine translation and the integration of machine translation and speech synthesis components. Here we implement a hybrid machine translation (combination of rule based and statistical machine translation and concatenative syllable based speech synthesis technique. In order to retain the naturalness and intelligibility of synthesized speech Auto Associative Neural Network (AANN prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.
Principles of speech coding

CERN Document Server

Ogunfunmi, Tokunbo

2010-01-01

It is becoming increasingly apparent that all forms of communication-including voice-will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networksOffering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the
Ultra low bit-rate speech coding

CERN Document Server

Ramasubramanian, V

2015-01-01

"Ultra Low Bit-Rate Speech Coding" focuses on the specialized topic of speech coding at very low bit-rates of 1 Kbits/sec and less, particularly at the lower ends of this range, down to 100 bps. The authors set forth the fundamental results and trends that form the basis for such ultra low bit-rates to be viable and provide a comprehensive overview of various techniques and systems in literature to date, with particular attention to their work in the paradigm of unit-selection based segment quantization. The book is for research students, academic faculty and researchers, and industry practitioners in the areas of speech processing and speech coding.
Sparsity in Linear Predictive Coding of Speech

DEFF Research Database (Denmark)

Giacobello, Daniele

of the effectiveness of their application in audio processing. The second part of the thesis deals with introducing sparsity directly in the linear prediction analysis-by-synthesis (LPAS) speech coding paradigm. We first propose a novel near-optimal method to look for a sparse approximate excitation using a compressed...... one with direct applications to coding but also consistent with the speech production model of voiced speech, where the excitation of the all-pole filter can be modeled as an impulse train, i.e., a sparse sequence. Introducing sparsity in the LP framework will also bring to de- velop the concept...... sensing formulation. Furthermore, we define a novel re-estimation procedure to adapt the predictor coefficients to the given sparse excitation, balancing the two representations in the context of speech coding. Finally, the advantages of the compact parametric representation of a segment of speech, given...
Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem.

Directory of Open Access Journals (Sweden)

Cai Wingfield

2017-09-01

Full Text Available There is widespread interest in the relationship between the neurobiological systems supporting human cognition and emerging computational systems capable of emulating these capacities. Human speech comprehension, poorly understood as a neurobiological process, is an important case in point. Automatic Speech Recognition (ASR systems with near-human levels of performance are now available, which provide a computationally explicit solution for the recognition of words in continuous speech. This research aims to bridge the gap between speech recognition processes in humans and machines, using novel multivariate techniques to compare incremental 'machine states', generated as the ASR analysis progresses over time, to the incremental 'brain states', measured using combined electro- and magneto-encephalography (EMEG, generated as the same inputs are heard by human listeners. This direct comparison of dynamic human and machine internal states, as they respond to the same incrementally delivered sensory input, revealed a significant correspondence between neural response patterns in human superior temporal cortex and the structural properties of ASR-derived phonetic models. Spatially coherent patches in human temporal cortex responded selectively to individual phonetic features defined on the basis of machine-extracted regularities in the speech to lexicon mapping process. These results demonstrate the feasibility of relating human and ASR solutions to the problem of speech recognition, and suggest the potential for further studies relating complex neural computations in human speech comprehension to the rapidly evolving ASR systems that address the same problem domain.
Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

International Nuclear Information System (INIS)

Holzrichter, J.F.; Ng, L.C.

1998-01-01

The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs
Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

Science.gov (United States)

Holzrichter, John F.; Ng, Lawrence C.

1998-01-01

The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.
Speech rhythms and multiplexed oscillatory sensory coding in the human brain.

Directory of Open Access Journals (Sweden)

Joachim Gross

2013-12-01

Full Text Available Cortical oscillations are likely candidates for segmentation and coding of continuous speech. Here, we monitored continuous speech processing with magnetoencephalography (MEG to unravel the principles of speech segmentation and coding. We demonstrate that speech entrains the phase of low-frequency (delta, theta and the amplitude of high-frequency (gamma oscillations in the auditory cortex. Phase entrainment is stronger in the right and amplitude entrainment is stronger in the left auditory cortex. Furthermore, edges in the speech envelope phase reset auditory cortex oscillations thereby enhancing their entrainment to speech. This mechanism adapts to the changing physical features of the speech envelope and enables efficient, stimulus-specific speech sampling. Finally, we show that within the auditory cortex, coupling between delta, theta, and gamma oscillations increases following speech edges. Importantly, all couplings (i.e., brain-speech and also within the cortex attenuate for backward-presented speech, suggesting top-down control. We conclude that segmentation and coding of speech relies on a nested hierarchy of entrained cortical oscillations.
Speech Rhythms and Multiplexed Oscillatory Sensory Coding in the Human Brain

Science.gov (United States)

Gross, Joachim; Hoogenboom, Nienke; Thut, Gregor; Schyns, Philippe; Panzeri, Stefano; Belin, Pascal; Garrod, Simon

2013-01-01

Cortical oscillations are likely candidates for segmentation and coding of continuous speech. Here, we monitored continuous speech processing with magnetoencephalography (MEG) to unravel the principles of speech segmentation and coding. We demonstrate that speech entrains the phase of low-frequency (delta, theta) and the amplitude of high-frequency (gamma) oscillations in the auditory cortex. Phase entrainment is stronger in the right and amplitude entrainment is stronger in the left auditory cortex. Furthermore, edges in the speech envelope phase reset auditory cortex oscillations thereby enhancing their entrainment to speech. This mechanism adapts to the changing physical features of the speech envelope and enables efficient, stimulus-specific speech sampling. Finally, we show that within the auditory cortex, coupling between delta, theta, and gamma oscillations increases following speech edges. Importantly, all couplings (i.e., brain-speech and also within the cortex) attenuate for backward-presented speech, suggesting top-down control. We conclude that segmentation and coding of speech relies on a nested hierarchy of entrained cortical oscillations. PMID:24391472
Shared acoustic codes underlie emotional communication in music and speech-Evidence from deep transfer learning.

Science.gov (United States)

Coutinho, Eduardo; Schuller, Björn

2017-01-01

Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain, in such a way that the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an Affective Sciences points of view, determining the degree of overlap between both domains is fundamental to understand the shared mechanisms underlying such phenomenon. From a Machine learning perspective, the overlap between acoustic codes for emotional expression in music and speech opens new possibilities to enlarge the amount of data available to develop music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (Arousal and Valence) in music and speech, and the Transfer Learning between these domains. We establish a comparative framework including intra- (i.e., models trained and tested on the same modality, either music or speech) and cross-domain experiments (i.e., models trained in one modality and tested on the other). In the cross-domain context, we evaluated two strategies-the direct transfer between domains, and the contribution of Transfer Learning techniques (feature-representation-transfer based on Denoising Auto Encoders) for reducing the gap in the feature space distributions. Our results demonstrate an excellent cross-domain generalisation performance with and without feature representation transfer in both directions. In the case of music, cross-domain approaches outperformed intra-domain models for Valence estimation, whereas for Speech intra-domain models achieve the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain.
A Support Vector Machine Approach to Dutch Part-of-Speech Tagging

NARCIS (Netherlands)

Poel, Mannes; Stegeman, L.; op den Akker, Hendrikus J.A.; Berthold, M.R.; Shawe-Taylor, J.; Lavrac, N.

Part-of-Speech tagging, the assignment of Parts-of-Speech to the words in a given context of use, is a basic technique in many systems that handle natural languages. This paper describes a method for supervised training of a Part-of-Speech tagger using a committee of Support Vector Machines on a
Shared acoustic codes underlie emotional communication in music and speech-Evidence from deep transfer learning.

Directory of Open Access Journals (Sweden)

Eduardo Coutinho

Full Text Available Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain, in such a way that the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an Affective Sciences points of view, determining the degree of overlap between both domains is fundamental to understand the shared mechanisms underlying such phenomenon. From a Machine learning perspective, the overlap between acoustic codes for emotional expression in music and speech opens new possibilities to enlarge the amount of data available to develop music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (Arousal and Valence in music and speech, and the Transfer Learning between these domains. We establish a comparative framework including intra- (i.e., models trained and tested on the same modality, either music or speech and cross-domain experiments (i.e., models trained in one modality and tested on the other. In the cross-domain context, we evaluated two strategies-the direct transfer between domains, and the contribution of Transfer Learning techniques (feature-representation-transfer based on Denoising Auto Encoders for reducing the gap in the feature space distributions. Our results demonstrate an excellent cross-domain generalisation performance with and without feature representation transfer in both directions. In the case of music, cross-domain approaches outperformed intra-domain models for Valence estimation, whereas for Speech intra-domain models achieve the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain.
Magnified Neural Envelope Coding Predicts Deficits in Speech Perception in Noise.

Science.gov (United States)

Millman, Rebecca E; Mattys, Sven L; Gouws, André D; Prendergast, Garreth

2017-08-09

Verbal communication in noisy backgrounds is challenging. Understanding speech in background noise that fluctuates in intensity over time is particularly difficult for hearing-impaired listeners with a sensorineural hearing loss (SNHL). The reduction in fast-acting cochlear compression associated with SNHL exaggerates the perceived fluctuations in intensity in amplitude-modulated sounds. SNHL-induced changes in the coding of amplitude-modulated sounds may have a detrimental effect on the ability of SNHL listeners to understand speech in the presence of modulated background noise. To date, direct evidence for a link between magnified envelope coding and deficits in speech identification in modulated noise has been absent. Here, magnetoencephalography was used to quantify the effects of SNHL on phase locking to the temporal envelope of modulated noise (envelope coding) in human auditory cortex. Our results show that SNHL enhances the amplitude of envelope coding in posteromedial auditory cortex, whereas it enhances the fidelity of envelope coding in posteromedial and posterolateral auditory cortex. This dissociation was more evident in the right hemisphere, demonstrating functional lateralization in enhanced envelope coding in SNHL listeners. However, enhanced envelope coding was not perceptually beneficial. Our results also show that both hearing thresholds and, to a lesser extent, magnified cortical envelope coding in left posteromedial auditory cortex predict speech identification in modulated background noise. We propose a framework in which magnified envelope coding in posteromedial auditory cortex disrupts the segregation of speech from background noise, leading to deficits in speech perception in modulated background noise. SIGNIFICANCE STATEMENT People with hearing loss struggle to follow conversations in noisy environments. Background noise that fluctuates in intensity over time poses a particular challenge. Using magnetoencephalography, we demonstrate
Speech coding code- excited linear prediction

CERN Document Server

Bäckström, Tom

2017-01-01

This book provides scientific understanding of the most central techniques used in speech coding both for advanced students as well as professionals with a background in speech audio and or digital signal processing. It provides a clear connection between the whys hows and whats thus enabling a clear view of the necessity purpose and solutions provided by various tools as well as their strengths and weaknesses in each respect Equivalently this book sheds light on the following perspectives for each technology presented Objective What do we want to achieve and especially why is this goal important Resource Information What information is available and how can it be useful and Resource Platform What kind of platforms are we working with and what are their capabilities restrictions This includes computational memory and acoustic properties and the transmission capacity of devices used. The book goes on to address Solutions Which solutions have been proposed and how can they be used to reach the stated goals and ...
Man machine interface based on speech recognition

International Nuclear Information System (INIS)

Jorge, Carlos A.F.; Aghina, Mauricio A.C.; Mol, Antonio C.A.; Pereira, Claudio M.N.A.

2007-01-01

This work reports the development of a Man Machine Interface based on speech recognition. The system must recognize spoken commands, and execute the desired tasks, without manual interventions of operators. The range of applications goes from the execution of commands in an industrial plant's control room, to navigation and interaction in virtual environments. Results are reported for isolated word recognition, the isolated words corresponding to the spoken commands. For the pre-processing stage, relevant parameters are extracted from the speech signals, using the cepstral analysis technique, that are used for isolated word recognition, and corresponds to the inputs of an artificial neural network, that performs recognition tasks. (author)
Reversible machine code and its abstract processor architecture

DEFF Research Database (Denmark)

Axelsen, Holger Bock; Glück, Robert; Yokoyama, Tetsuo

2007-01-01

A reversible abstract machine architecture and its reversible machine code are presented and formalized. For machine code to be reversible, both the underlying control logic and each instruction must be reversible. A general class of machine instruction sets was proven to be reversible, building...
Compressed Domain Packet Loss Concealment of Sinusoidally Coded Speech

DEFF Research Database (Denmark)

Rødbro, Christoffer A.; Christensen, Mads Græsbøll; Andersen, Søren Vang

2003-01-01

We consider the problem of packet loss concealment for voice over IP (VoIP). The speech signal is compressed at the transmitter using a sinusoidal coding scheme working at 8 kbit/s. At the receiver, packet loss concealment is carried out working directly on the quantized sinusoidal parameters......, based on time-scaling of the packets surrounding the missing ones. Subjective listening tests show promising results indicating the potential of sinusoidal speech coding for VoIP....

Detecting Abnormal Word Utterances in Children With Autism Spectrum Disorders: Machine-Learning-Based Voice Analysis Versus Speech Therapists.

Science.gov (United States)

Nakai, Yasushi; Takiguchi, Tetsuya; Matsui, Gakuyo; Yamaoka, Noriko; Takada, Satoshi

2017-10-01

Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders ( n = 30) and typical development ( n = 51). Using stimuli limited to single-word utterances, machine-learning-based voice analysis was superior to speech therapist judgments. There was a significantly higher true-positive than false-negative rate for machine-learning-based voice analysis but not for speech therapists. Results are discussed in terms of some artificiality of clinician judgments based on single-word utterances, and the objectivity machine-learning-based voice analysis adds to judging abnormal prosody.
ISOLATED SPEECH RECOGNITION SYSTEM FOR TAMIL LANGUAGE USING STATISTICAL PATTERN MATCHING AND MACHINE LEARNING TECHNIQUES

Directory of Open Access Journals (Sweden)

VIMALA C.

2015-05-01

Full Text Available In recent years, speech technology has become a vital part of our daily lives. Various techniques have been proposed for developing Automatic Speech Recognition (ASR system and have achieved great success in many applications. Among them, Template Matching techniques like Dynamic Time Warping (DTW, Statistical Pattern Matching techniques such as Hidden Markov Model (HMM and Gaussian Mixture Models (GMM, Machine Learning techniques such as Neural Networks (NN, Support Vector Machine (SVM, and Decision Trees (DT are most popular. The main objective of this paper is to design and develop a speaker-independent isolated speech recognition system for Tamil language using the above speech recognition techniques. The background of ASR system, the steps involved in ASR, merits and demerits of the conventional and machine learning algorithms and the observations made based on the experiments are presented in this paper. For the above developed system, highest word recognition accuracy is achieved with HMM technique. It offered 100% accuracy during training process and 97.92% for testing process.
Development of a speech-based dialogue system for report dictation and machine control in the endoscopic laboratory.

Science.gov (United States)

Molnar, B; Gergely, J; Toth, G; Pronai, L; Zagoni, T; Papik, K; Tulassay, Z

2000-01-01

Reporting and machine control based on speech technology can enhance work efficiency in the gastrointestinal endoscopy laboratory. The status and activation of endoscopy laboratory equipment were described as a multivariate parameter and function system. Speech recognition, text evaluation and action definition engines were installed. Special programs were developed for the grammatical analysis of command sentences, and a rule-based expert system for the definition of machine answers. A speech backup engine provides feedback to the user. Techniques were applied based on the "Hidden Markov" model of discrete word, user-independent speech recognition and on phoneme-based speech synthesis. Speech samples were collected from three male low-tone investigators. The dictation module and machine control modules were incorporated in a personal computer (PC) simulation program. Altogether 100 unidentified patient records were analyzed. The sentences were grouped according to keywords, which indicate the main topics of a gastrointestinal endoscopy report. They were: "endoscope", "esophagus", "cardia", "fundus", "corpus", "antrum", "pylorus", "bulbus", and "postbulbar section", in addition to the major pathological findings: "erosion", "ulceration", and "malignancy". "Biopsy" and "diagnosis" were also included. We implemented wireless speech communication control commands for equipment including an endoscopy unit, video, monitor, printer, and PC. The recognition rate was 95%. Speech technology may soon become an integrated part of our daily routine in the endoscopy laboratory. A central speech and laboratory computer could be the most efficient alternative to having separate speech recognition units in all items of equipment.
Integrating Automatic Speech Recognition and Machine Translation for Better Translation Outputs

DEFF Research Database (Denmark)

Liyanapathirana, Jeevanthi

translations, combining machine translation with computer assisted translation has drawn attention in current research. This combines two prospects: the opportunity of ensuring high quality translation along with a significant performance gain. Automatic Speech Recognition (ASR) is another important area......, which caters important functionalities in language processing and natural language understanding tasks. In this work we integrate automatic speech recognition and machine translation in parallel. We aim to avoid manual typing of possible translations as dictating the translation would take less time...... to the n-best list rescoring, we also use word graphs with the expectation of arriving at a tighter integration of ASR and MT models. Integration methods include constraining ASR models using language and translation models of MT, and vice versa. We currently develop and experiment different methods...
A portable virtual machine target for proof-carrying code

DEFF Research Database (Denmark)

Franz, Michael; Chandra, Deepak; Gal, Andreas

2005-01-01

Virtual Machines (VMs) and Proof-Carrying Code (PCC) are two techniques that have been used independently to provide safety for (mobile) code. Existing virtual machines, such as the Java VM, have several drawbacks: First, the effort required for safety verification is considerable. Second and mor...... simultaneously providing efficient justin-time compilation and target-machine independence. In particular, our approach reduces the complexity of the required proofs, resulting in fewer proof obligations that need to be discharged at the target machine....
A wireless brain-machine interface for real-time speech synthesis.

Directory of Open Access Journals (Sweden)

Frank H Guenther

2009-12-01

Full Text Available Brain-machine interfaces (BMIs involving electrodes implanted into the human cerebral cortex have recently been developed in an attempt to restore function to profoundly paralyzed individuals. Current BMIs for restoring communication can provide important capabilities via a typing process, but unfortunately they are only capable of slow communication rates. In the current study we use a novel approach to speech restoration in which we decode continuous auditory parameters for a real-time speech synthesizer from neuronal activity in motor cortex during attempted speech.Neural signals recorded by a Neurotrophic Electrode implanted in a speech-related region of the left precentral gyrus of a human volunteer suffering from locked-in syndrome, characterized by near-total paralysis with spared cognition, were transmitted wirelessly across the scalp and used to drive a speech synthesizer. A Kalman filter-based decoder translated the neural signals generated during attempted speech into continuous parameters for controlling a synthesizer that provided immediate (within 50 ms auditory feedback of the decoded sound. Accuracy of the volunteer's vowel productions with the synthesizer improved quickly with practice, with a 25% improvement in average hit rate (from 45% to 70% and 46% decrease in average endpoint error from the first to the last block of a three-vowel task.Our results support the feasibility of neural prostheses that may have the potential to provide near-conversational synthetic speech output for individuals with severely impaired speech motor control. They also provide an initial glimpse into the functional properties of neurons in speech motor cortical areas.
Model-Based Speech Signal Coding Using Optimized Temporal Decomposition for Storage and Broadcasting Applications

Science.gov (United States)

Athaudage, Chandranath R. N.; Bradley, Alan B.; Lech, Margaret

2003-12-01

A dynamic programming-based optimization strategy for a temporal decomposition (TD) model of speech and its application to low-rate speech coding in storage and broadcasting is presented. In previous work with the spectral stability-based event localizing (SBEL) TD algorithm, the event localization was performed based on a spectral stability criterion. Although this approach gave reasonably good results, there was no assurance on the optimality of the event locations. In the present work, we have optimized the event localizing task using a dynamic programming-based optimization strategy. Simulation results show that an improved TD model accuracy can be achieved. A methodology of incorporating the optimized TD algorithm within the standard MELP speech coder for the efficient compression of speech spectral information is also presented. The performance evaluation results revealed that the proposed speech coding scheme achieves 50%-60% compression of speech spectral information with negligible degradation in the decoded speech quality.
The Cortical Organization of Speech Processing: Feedback Control and Predictive Coding the Context of a Dual-Stream Model

Science.gov (United States)

Hickok, Gregory

2012-01-01

Speech recognition is an active process that involves some form of predictive coding. This statement is relatively uncontroversial. What is less clear is the source of the prediction. The dual-stream model of speech processing suggests that there are two possible sources of predictive coding in speech perception: the motor speech system and the…
Look at the Gato! Code-Switching in Speech to Toddlers

Science.gov (United States)

Bail, Amelie; Morini, Giovanna; Newman, Rochelle S.

2015-01-01

We examined code-switching (CS) in the speech of twenty-four bilingual caregivers when speaking with their 18- to 24-month-old children. All parents CS at least once in a short play session, and some code-switched quite often (over 1/3 of utterances). This CS included both inter-sentential and intra-sentential switches, suggesting that at least…
Speech Compression

Directory of Open Access Journals (Sweden)

Jerry D. Gibson

2016-06-01

Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
Understanding and Writing G & M Code for CNC Machines

Science.gov (United States)

Loveland, Thomas

2012-01-01

In modern CAD and CAM manufacturing companies, engineers design parts for machines and consumable goods. Many of these parts are cut on CNC machines. Whether using a CNC lathe, milling machine, or router, the ideas and designs of engineers must be translated into a machine-readable form called G & M Code that can be used to cut parts to precise…
Ocean circulation code on machine connection

International Nuclear Information System (INIS)

Vitart, F.

1993-01-01

This work is part of a development of a global climate model based on a coupling between an ocean model and an atmosphere model. The objective was to develop this global model on a massively parallel machine (CM2). The author presents the OPA7 code (equations, boundary conditions, equation system resolution) and parallelization on the CM2 machine. CM2 data structure is briefly evoked, and two tests are reported (on a flat bottom basin, and a topography with eight islands). The author then gives an overview of studies aimed at improving the ocean circulation code: use of a new state equation, use of a formulation of surface pressure, use of a new mesh. He reports the study of the use of multi-block domains on CM2 through advection tests, and two-block tests
Improving Language Models in Speech-Based Human-Machine Interaction

Directory of Open Access Journals (Sweden)

Raquel Justo

2013-02-01

Full Text Available This work focuses on speech-based human-machine interaction. Specifically, a Spoken Dialogue System (SDS that could be integrated into a robot is considered. Since Automatic Speech Recognition is one of the most sensitive tasks that must be confronted in such systems, the goal of this work is to improve the results obtained by this specific module. In order to do so, a hierarchical Language Model (LM is considered. Different series of experiments were carried out using the proposed models over different corpora and tasks. The results obtained show that these models provide greater accuracy in the recognition task. Additionally, the influence of the Acoustic Modelling (AM in the improvement percentage of the Language Models has also been explored. Finally the use of hierarchical Language Models in a language understanding task has been successfully employed, as shown in an additional series of experiments.
Tools for signal compression applications to speech and audio coding

CERN Document Server

Moreau, Nicolas

2013-01-01

This book presents tools and algorithms required to compress/uncompress signals such as speech and music. These algorithms are largely used in mobile phones, DVD players, HDTV sets, etc. In a first rather theoretical part, this book presents the standard tools used in compression systems: scalar and vector quantization, predictive quantization, transform quantization, entropy coding. In particular we show the consistency between these different tools. The second part explains how these tools are used in the latest speech and audio coders. The third part gives Matlab programs simulating t
Fifty years of progress in speech coding standards

Science.gov (United States)

Cox, Richard

2004-10-01

Over the past 50 years, speech coding has taken root worldwide. Early applications were for the military and transmission for telephone networks. The military gave equal priority to intelligibility and low bit rate. The telephone network gave priority to high quality and low delay. These illustrate three of the four areas in which requirements must be set for any speech coder application: bit rate, quality, delay, and complexity. While the military could afford relatively expensive terminal equipment for secure communications, the telephone network needed low cost for massive deployment in switches and transmission equipment worldwide. Today speech coders are at the heart of the wireless phones and telephone answering systems we use every day. In addition to the technology and technical invention that has occurred, standards make it possible for all these different systems to interoperate. The primary areas of standardization are the public switched telephone network, wireless telephony, and secure telephony for government and military applications. With the advent of IP telephony there are additional standardization efforts and challenges. In this talk the progress in all areas is reviewed as well as a reflection on Jim Flanagan's impact on this field during the past half century.
Complete permutation Gray code implemented by finite state machine

Directory of Open Access Journals (Sweden)

Li Peng

2014-09-01

Full Text Available An enumerating method of complete permutation array is proposed. The list of n! permutations based on Gray code defined over finite symbol set Z(n = {1, 2, …, n} is implemented by finite state machine, named as n-RPGCF. An RPGCF can be used to search permutation code and provide improved lower bounds on the maximum cardinality of a permutation code in some cases.
Model-Driven Engineering of Machine Executable Code

Science.gov (United States)

Eichberg, Michael; Monperrus, Martin; Kloppenburg, Sven; Mezini, Mira

Implementing static analyses of machine-level executable code is labor intensive and complex. We show how to leverage model-driven engineering to facilitate the design and implementation of programs doing static analyses. Further, we report on important lessons learned on the benefits and drawbacks while using the following technologies: using the Scala programming language as target of code generation, using XML-Schema to express a metamodel, and using XSLT to implement (a) transformations and (b) a lint like tool. Finally, we report on the use of Prolog for writing model transformations.
A video, text, and speech-driven realistic 3-d virtual head for human-machine interface.

Science.gov (United States)

Yu, Jun; Wang, Zeng-Fu

2015-05-01

A multiple inputs-driven realistic facial animation system based on 3-D virtual head for human-machine interface is proposed. The system can be driven independently by video, text, and speech, thus can interact with humans through diverse interfaces. The combination of parameterized model and muscular model is used to obtain a tradeoff between computational efficiency and high realism of 3-D facial animation. The online appearance model is used to track 3-D facial motion from video in the framework of particle filtering, and multiple measurements, i.e., pixel color value of input image and Gabor wavelet coefficient of illumination ratio image, are infused to reduce the influence of lighting and person dependence for the construction of online appearance model. The tri-phone model is used to reduce the computational consumption of visual co-articulation in speech synchronized viseme synthesis without sacrificing any performance. The objective and subjective experiments show that the system is suitable for human-machine interaction.
Focal versus distributed temporal cortex activity for speech sound category assignment

Science.gov (United States)

Bouton, Sophie; Chambon, Valérian; Tyrand, Rémi; Seeck, Margitta; Karkar, Sami; van de Ville, Dimitri; Giraud, Anne-Lise

2018-01-01

Percepts and words can be decoded from distributed neural activity measures. However, the existence of widespread representations might conflict with the more classical notions of hierarchical processing and efficient coding, which are especially relevant in speech processing. Using fMRI and magnetoencephalography during syllable identification, we show that sensory and decisional activity colocalize to a restricted part of the posterior superior temporal gyrus (pSTG). Next, using intracortical recordings, we demonstrate that early and focal neural activity in this region distinguishes correct from incorrect decisions and can be machine-decoded to classify syllables. Crucially, significant machine decoding was possible from neuronal activity sampled across different regions of the temporal and frontal lobes, despite weak or absent sensory or decision-related responses. These findings show that speech-sound categorization relies on an efficient readout of focal pSTG neural activity, while more distributed activity patterns, although classifiable by machine learning, instead reflect collateral processes of sensory perception and decision. PMID:29363598
Memory of AMR coded speech distorted by packet loss

OpenAIRE

Nykänen, Arne; Lindegren, David; Wruck, Louisa; Ljung, Robert; Odelius, Johan; Möller, Sebastian

2014-01-01

Previous studies have shown that free recall of spoken word lists is impaired if the speech is presented in background noise, even if the signal-to-noise ratio is kept at a level allowing full word identification. The objective of this study was to examine recall rates for word lists presented in noise and word lists coded by an AMR (Adaptive Multi Rate) telephone codec distorted by packet loss. Twenty subjects performed a word recall test. Word lists consisting of ten words were played to th...

Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach

Directory of Open Access Journals (Sweden)

W. Bastiaan Kleijn

2005-06-01

Full Text Available Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features that are most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation using an invertible auditory model. The auditory representation is quantized and coded. Upon decoding, it is then transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating quantization with low complexity. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method to reconstruct the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the exploitation of the inherent redundancy of the human auditory system for the purpose of multiple description (joint source-channel coding.
Multilevel Analysis in Analyzing Speech Data

Science.gov (United States)

Guddattu, Vasudeva; Krishna, Y.

2011-01-01

The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…
Accuracy comparison among different machine learning techniques for detecting malicious codes

Science.gov (United States)

Narang, Komal

2016-03-01

In this paper, a machine learning based model for malware detection is proposed. It can detect newly released malware i.e. zero day attack by analyzing operation codes on Android operating system. The accuracy of Naïve Bayes, Support Vector Machine (SVM) and Neural Network for detecting malicious code has been compared for the proposed model. In the experiment 400 benign files, 100 system files and 500 malicious files have been used to construct the model. The model yields the best accuracy 88.9% when neural network is used as classifier and achieved 95% and 82.8% accuracy for sensitivity and specificity respectively.
Human phoneme recognition depending on speech-intrinsic variability.

Science.gov (United States)

Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger

2010-11-01

The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
Using machine-coded event data for the micro-level study of political violence

Directory of Open Access Journals (Sweden)

Jesse Hammond

2014-07-01

Full Text Available Machine-coded datasets likely represent the future of event data analysis. We assess the use of one of these datasets—Global Database of Events, Language and Tone (GDELT—for the micro-level study of political violence by comparing it to two hand-coded conflict event datasets. Our findings indicate that GDELT should be used with caution for geo-spatial analyses at the subnational level: its overall correlation with hand-coded data is mediocre, and at the local level major issues of geographic bias exist in how events are reported. Overall, our findings suggest that due to these issues, researchers studying local conflict processes may want to wait for a more reliable geocoding method before relying too heavily on this set of machine-coded data.
The Interpretation Of Speech Code In A Communication Ethnographic Context For Outsider Students Of Graduate Communication Science Universitas Sumatera Utara In Medan

Directory of Open Access Journals (Sweden)

Fauzi Eka Putra

2017-06-01

Full Text Available Interpreting the typical Medan speech code is something unique and distinctive which could create confusion for the outsider students because of the speech code similarities and differences in Medan. Therefore the graduate students of communication science Universitas Sumatera Utara whose originated from outside of North Sumatera needs to learn comprehend and aware in order to perform effective communication. The purpose of this research is to discover how the interpretation of speech code for the graduate students of communication science Universitas Sumatera Utara whose originated from outside of North Sumatera in adapting themselves in Medan. This research uses qualitative method with the study of ethnography and acculturation communication. The subject of this research is the graduate students of communication science Universitas Sumatera Utara whose originated from outside of North Sumatera in adapting themselves in Medan. Data were collected through interviews observation and documentation. The conclusion of this research shows that speech code interpretation by students from outside of North Sumatera in adapting themselves in Medan leads to an acculturation process of assimilation and integration.
Temporal Fine-Structure Coding and Lateralized Speech Perception in Normal-Hearing and Hearing-Impaired Listeners

DEFF Research Database (Denmark)

Locsei, Gusztav; Pedersen, Julie Hefting; Laugesen, Søren

2016-01-01

This study investigated the relationship between speech perception performance in spatially complex, lateralized listening scenarios and temporal fine-structure (TFS) coding at low frequencies. Young normal-hearing (NH) and two groups of elderly hearing-impaired (HI) listeners with mild or moderate...... hearing loss above 1.5 kHz participated in the study. Speech reception thresholds (SRTs) were estimated in the presence of either speech-shaped noise, two-, four-, or eight-talker babble played reversed, or a nonreversed two-talker masker. Target audibility was ensured by applying individualized linear...... threshold nor the interaural phase difference threshold tasks showed a correlation with the SRTs or with the amount of masking release due to binaural unmasking, respectively. The results suggest that, although HI listeners with normal hearing thresholds below 1.5 kHz experienced difficulties with speech...
Bilingual Voicing: A Study of Code-Switching in the Reported Speech of Finnish Immigrants in Estonia

Science.gov (United States)

Frick, Maria; Riionheimo, Helka

2013-01-01

Through a conversation analytic investigation of Finnish-Estonian bilingual (direct) reported speech (i.e., voicing) by Finns who live in Estonia, this study shows how code-switching is used as a double contextualization device. The code-switched voicings are shaped by the on-going interactional situation, serving its needs by opening up a context…
A Digital Liquid State Machine With Biologically Inspired Learning and Its Application to Speech Recognition.

Science.gov (United States)

Zhang, Yong; Li, Peng; Jin, Yingyezhe; Choe, Yoonsuck

2015-11-01

This paper presents a bioinspired digital liquid-state machine (LSM) for low-power very-large-scale-integration (VLSI)-based machine learning applications. To the best of the authors' knowledge, this is the first work that employs a bioinspired spike-based learning algorithm for the LSM. With the proposed online learning, the LSM extracts information from input patterns on the fly without needing intermediate data storage as required in offline learning methods such as ridge regression. The proposed learning rule is local such that each synaptic weight update is based only upon the firing activities of the corresponding presynaptic and postsynaptic neurons without incurring global communications across the neural network. Compared with the backpropagation-based learning, the locality of computation in the proposed approach lends itself to efficient parallel VLSI implementation. We use subsets of the TI46 speech corpus to benchmark the bioinspired digital LSM. To reduce the complexity of the spiking neural network model without performance degradation for speech recognition, we study the impacts of synaptic models on the fading memory of the reservoir and hence the network performance. Moreover, we examine the tradeoffs between synaptic weight resolution, reservoir size, and recognition performance and present techniques to further reduce the overhead of hardware implementation. Our simulation results show that in terms of isolated word recognition evaluated using the TI46 speech corpus, the proposed digital LSM rivals the state-of-the-art hidden Markov-model-based recognizer Sphinx-4 and outperforms all other reported recognizers including the ones that are based upon the LSM or neural networks.
Optimal interference code based on machine learning

Science.gov (United States)

Qian, Ye; Chen, Qian; Hu, Xiaobo; Cao, Ercong; Qian, Weixian; Gu, Guohua

2016-10-01

In this paper, we analyze the characteristics of pseudo-random code, by the case of m sequence. Depending on the description of coding theory, we introduce the jamming methods. We simulate the interference effect or probability model by the means of MATLAB to consolidate. In accordance with the length of decoding time the adversary spends, we find out the optimal formula and optimal coefficients based on machine learning, then we get the new optimal interference code. First, when it comes to the phase of recognition, this study judges the effect of interference by the way of simulating the length of time over the decoding period of laser seeker. Then, we use laser active deception jamming simulate interference process in the tracking phase in the next block. In this study we choose the method of laser active deception jamming. In order to improve the performance of the interference, this paper simulates the model by MATLAB software. We find out the least number of pulse intervals which must be received, then we can make the conclusion that the precise interval number of the laser pointer for m sequence encoding. In order to find the shortest space, we make the choice of the greatest common divisor method. Then, combining with the coding regularity that has been found before, we restore pulse interval of pseudo-random code, which has been already received. Finally, we can control the time period of laser interference, get the optimal interference code, and also increase the probability of interference as well.
Development of non-linear vibration analysis code for CANDU fuelling machine

International Nuclear Information System (INIS)

Murakami, Hajime; Hirai, Takeshi; Horikoshi, Kiyomi; Mizukoshi, Kaoru; Takenaka, Yasuo; Suzuki, Norio.

1988-01-01

This paper describes the development of a non-linear, dynamic analysis code for the CANDU 600 fuelling machine (F-M), which includes a number of non-linearities such as gap with or without Coulomb friction, special multi-linear spring connections, etc. The capabilities and features of the code and the mathematical treatment for the non-linearities are explained. The modeling and numerical methodology for the non-linearities employed in the code are verified experimentally. Finally, the simulation analyses for the full-scale F-M vibration testing are carried out, and the applicability of the code to such multi-degree of freedom systems as F-M is demonstrated. (author)
Practical speech user interface design

CERN Document Server

Lewis, James R

2010-01-01

Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application
Speech Alarms Pilot Study

Science.gov (United States)

Sandor, Aniko; Moses, Haifa

2016-01-01

Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.
Machine-learning-assisted correction of correlated qubit errors in a topological code

Directory of Open Access Journals (Sweden)

Paul Baireuther

2018-01-01

Full Text Available A fault-tolerant quantum computation requires an efficient means to detect and correct errors that accumulate in encoded quantum information. In the context of machine learning, neural networks are a promising new approach to quantum error correction. Here we show that a recurrent neural network can be trained, using only experimentally accessible data, to detect errors in a widely used topological code, the surface code, with a performance above that of the established minimum-weight perfect matching (or blossom decoder. The performance gain is achieved because the neural network decoder can detect correlations between bit-flip (X and phase-flip (Z errors. The machine learning algorithm adapts to the physical system, hence no noise model is needed. The long short-term memory layers of the recurrent neural network maintain their performance over a large number of quantum error correction cycles, making it a practical decoder for forthcoming experimental realizations of the surface code.
The classification problem in machine learning: an overview with study cases in emotion recognition and music-speech differentiation

OpenAIRE

Rodríguez Cadavid, Santiago

2015-01-01

This work addresses the well-known classification problem in machine learning -- The goal of this study is to approach the reader to the methodological aspects of the feature extraction, feature selection and classifier performance through simple and understandable theoretical aspects and two study cases -- Finally, a very good classification performance was obtained for the emotion recognition from speech
Predicting automatic speech recognition performance over communication channels from instrumental speech quality and intelligibility scores

NARCIS (Netherlands)

Gallardo, L.F.; Möller, S.; Beerends, J.

2017-01-01

The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility
The vector and parallel processing of MORSE code on Monte Carlo Machine

International Nuclear Information System (INIS)

Hasegawa, Yukihiro; Higuchi, Kenji.

1995-11-01

Multi-group Monte Carlo Code for particle transport, MORSE is modified for high performance computing on Monte Carlo Machine Monte-4. The method and the results are described. Monte-4 was specially developed to realize high performance computing of Monte Carlo codes for particle transport, which have been difficult to obtain high performance in vector processing on conventional vector processors. Monte-4 has four vector processor units with the special hardware called Monte Carlo pipelines. The vectorization and parallelization of MORSE code and the performance evaluation on Monte-4 are described. (author)
Compiler design handbook optimizations and machine code generation

CERN Document Server

Srikant, YN

2003-01-01

The widespread use of object-oriented languages and Internet security concerns are just the beginning. Add embedded systems, multiple memory banks, highly pipelined units operating in parallel, and a host of other advances and it becomes clear that current and future computer architectures pose immense challenges to compiler designers-challenges that already exceed the capabilities of traditional compilation techniques. The Compiler Design Handbook: Optimizations and Machine Code Generation is designed to help you meet those challenges. Written by top researchers and designers from around the
The semaphore codes attached to a Turing machine via resets and their various limits

OpenAIRE

Rhodes, John; Schilling, Anne; Silva, Pedro V.

2016-01-01

We introduce semaphore codes associated to a Turing machine via resets. Semaphore codes provide an approximation theory for resets. In this paper we generalize the set-up of our previous paper "Random walks on semaphore codes and delay de Bruijn semigroups" to the infinite case by taking the profinite limit of $k$-resets to obtain $(-\\omega)$-resets. We mention how this opens new avenues to attack the P versus NP problem.
Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

Science.gov (United States)

Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

2010-01-01

A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It can enjoy a fast rate of data/text entry, small overall size, and can be lightweight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beam forming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaption, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMMbased ASR components were developed. They can help real-time ASR system designers select proper tasks when in the face of constraints in computational resources.

Support vector machine and mel frequency Cepstral coefficient based algorithm for hand gestures and bidirectional speech to text device

Science.gov (United States)

Balbin, Jessie R.; Padilla, Dionis A.; Fausto, Janette C.; Vergara, Ernesto M.; Garcia, Ramon G.; Delos Angeles, Bethsedea Joy S.; Dizon, Neil John A.; Mardo, Mark Kevin N.

2017-02-01

This research is about translating series of hand gesture to form a word and produce its equivalent sound on how it is read and said in Filipino accent using Support Vector Machine and Mel Frequency Cepstral Coefficient analysis. The concept is to detect Filipino speech input and translate the spoken words to their text form in Filipino. This study is trying to help the Filipino deaf community to impart their thoughts through the use of hand gestures and be able to communicate to people who do not know how to read hand gestures. This also helps literate deaf to simply read the spoken words relayed to them using the Filipino speech to text system.
Speech recognition technology: an outlook for human-to-machine interaction.

Science.gov (United States)

Erdel, T; Crooks, S

2000-01-01

Speech recognition, as an enabling technology in healthcare-systems computing, is a topic that has been discussed for quite some time, but is just now coming to fruition. Traditionally, speech-recognition software has been constrained by hardware, but improved processors and increased memory capacities are starting to remove some of these limitations. With these barriers removed, companies that create software for the healthcare setting have the opportunity to write more successful applications. Among the criticisms of speech-recognition applications are the high rates of error and steep training curves. However, even in the face of such negative perceptions, there remains significant opportunities for speech recognition to allow healthcare providers and, more specifically, physicians, to work more efficiently and ultimately spend more time with their patients and less time completing necessary documentation. This article will identify opportunities for inclusion of speech-recognition technology in the healthcare setting and examine major categories of speech-recognition software--continuous speech recognition, command and control, and text-to-speech. We will discuss the advantages and disadvantages of each area, the limitations of the software today, and how future trends might affect them.
Speech enhancement

CERN Document Server

Benesty, Jacob; Chen, Jingdong

2006-01-01

We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be ""cleaned"" with digital signal processing tools before it is played out, transmitted, or stored.This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red
Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

Energy Technology Data Exchange (ETDEWEB)

Hogden, J.

1996-11-05

The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.
Speech Analysis and Synthesis and Man-Machine Speech Communications for Air Operations. (Synthese et Analyse de la Parole et Liaisons Vocales Homme- Machine dans les Operations Aeriennes)

Science.gov (United States)

1990-05-01

speech processing area are faced . He presents speech communication as an interactive process, in which the listener actively reconstructs the message...speech produced by these systems. Finally, perhaps the greatest recent impetus in advancing digital Finally, in the area of speech and speaker recognitio
Experiences and results multitasking a hydrodynamics code on global and local memory machines

International Nuclear Information System (INIS)

Mandell, D.

1987-01-01

A one-dimensional, time-dependent Lagrangian hydrodynamics code using a Godunov solution method has been multimasked for the Cray X-MP/48, the Intel iPSC hypercube, the Alliant FX series and the IBM RP3 computers. Actual multitasking results have been obtained for the Cray, Intel and Alliant computers and simulated results were obtained for the Cray and RP3 machines. The differences in the methods required to multitask on each of the machines is discussed. Results are presented for a sample problem involving a shock wave moving down a channel. Comparisons are made between theoretical speedups, predicted by Amdahl's law, and the actual speedups obtained. The problems of debugging on the different machines are also described
Vector Quantization of Harmonic Magnitudes in Speech Coding ApplicationsÃ¢Â€Â”A Survey and New Technique

Directory of Open Access Journals (Sweden)

Wai C. Chu

2004-12-01

Full Text Available A harmonic coder extracts the harmonic components of a signal and represents them efficiently using a few parameters. The principles of harmonic coding have become quite successful and several standardized speech and audio coders are based on it. One of the key issues in harmonic coder design is in the quantization of harmonic magnitudes, where many propositions have appeared in the literature. The objective of this paper is to provide a survey of the various techniques that have appeared in the literature for vector quantization of harmonic magnitudes, with emphasis on those adopted by the major speech coding standards; these include constant magnitude approximation, partial quantization, dimension conversion, and variable-dimension vector quantization (VDVQ. In addition, a refined VDVQ technique is proposed where experimental data are provided to demonstrate its effectiveness.
Spoken Language Understanding Systems for Extracting Semantic Information from Speech

CERN Document Server

Tur, Gokhan

2011-01-01

Spoken language understanding (SLU) is an emerging field in between speech and language processing, investigating human/ machine and human/ human communication by leveraging technologies from signal processing, pattern recognition, machine learning and artificial intelligence. SLU systems are designed to extract the meaning from speech utterances and its applications are vast, from voice search in mobile devices to meeting summarization, attracting interest from both commercial and academic sectors. Both human/machine and human/human communications can benefit from the application of SLU, usin
Hateful Help--A Practical Look at the Issue of Hate Speech.

Science.gov (United States)

Shelton, Michael W.

Many college and university administrators have responded to the recent increase in hateful incidents on campus by putting hate speech codes into place. The establishment of speech codes has sparked a heated debate over the impact that such codes have upon free speech and First Amendment values. Some commentators have suggested that viewing hate…
Code-expanded radio access protocol for machine-to-machine communications

DEFF Research Database (Denmark)

Thomsen, Henning; Kiilerich Pratas, Nuno; Stefanovic, Cedomir

2013-01-01

The random access methods used for support of machine-to-machine, also referred to as Machine-Type Communications, in current cellular standards are derivatives of traditional framed slotted ALOHA and therefore do not support high user loads efficiently. We propose an approach that is motivated b...... subframes and orthogonal preambles, the amount of available contention resources is drastically increased, enabling the massive support of Machine-Type Communication users that is beyond the reach of current systems.......The random access methods used for support of machine-to-machine, also referred to as Machine-Type Communications, in current cellular standards are derivatives of traditional framed slotted ALOHA and therefore do not support high user loads efficiently. We propose an approach that is motivated...... by the random access method employed in LTE, which significantly increases the amount of contention resources without increasing the system resources, such as contention subframes and preambles. This is accomplished by a logical, rather than physical, extension of the access method in which the available system...
Spotlight on Speech Codes 2011: The State of Free Speech on Our Nation's Campuses

Science.gov (United States)

Foundation for Individual Rights in Education (NJ1), 2011

2011-01-01

Each year, the Foundation for Individual Rights in Education (FIRE) conducts a rigorous survey of restrictions on speech at America's colleges and universities. The survey and accompanying report explore the extent to which schools are meeting their legal and moral obligations to uphold students' and faculty members' rights to freedom of speech,…
Spotlight on Speech Codes 2009: The State of Free Speech on Our Nation's Campuses

Science.gov (United States)

Foundation for Individual Rights in Education (NJ1), 2009

2009-01-01

Each year, the Foundation for Individual Rights in Education (FIRE) conducts a wide, detailed survey of restrictions on speech at America's colleges and universities. The survey and resulting report explore the extent to which schools are meeting their obligations to uphold students' and faculty members' rights to freedom of speech, freedom of…
Spotlight on Speech Codes 2010: The State of Free Speech on Our Nation's Campuses

Science.gov (United States)

Foundation for Individual Rights in Education (NJ1), 2010

2010-01-01

Each year, the Foundation for Individual Rights in Education (FIRE) conducts a rigorous survey of restrictions on speech at America's colleges and universities. The survey and resulting report explore the extent to which schools are meeting their legal and moral obligations to uphold students' and faculty members' rights to freedom of speech,…
Using supervised machine learning to code policy issues: Can classifiers generalize across contexts?

NARCIS (Netherlands)

Burscher, B.; Vliegenthart, R.; de Vreese, C.H.

2015-01-01

Content analysis of political communication usually covers large amounts of material and makes the study of dynamics in issue salience a costly enterprise. In this article, we present a supervised machine learning approach for the automatic coding of policy issues, which we apply to news articles
Numerical code to determine the particle trapping region in the LISA machine

International Nuclear Information System (INIS)

Azevedo, M.T. de; Raposo, C.C. de; Tomimura, A.

1984-01-01

A numerical code is constructed to determine the trapping region in machine like LISA. The variable magnetic field is two deimensional and is coupled to the Runge-Kutta through the Tchebichev polynomial. Various particle orbits including particle interactions were analysed. Beside this, a strong electric field is introduced to see the possible effects happening inside the plasma. (Author) [pt
An Efficient VQ Codebook Search Algorithm Applied to AMR-WB Speech Coding

Directory of Open Access Journals (Sweden)

Cheng-Yu Yeh

2017-04-01

Full Text Available The adaptive multi-rate wideband (AMR-WB speech codec is widely used in modern mobile communication systems for high speech quality in handheld devices. Nonetheless, a major disadvantage is that vector quantization (VQ of immittance spectral frequency (ISF coefficients takes a considerable computational load in the AMR-WB coding. Accordingly, a binary search space-structured VQ (BSS-VQ algorithm is adopted to efficiently reduce the complexity of ISF quantization in AMR-WB. This search algorithm is done through a fast locating technique combined with lookup tables, such that an input vector is efficiently assigned to a subspace where relatively few codeword searches are required to be executed. In terms of overall search performance, this work is experimentally validated as a superior search algorithm relative to a multiple triangular inequality elimination (MTIE, a TIE with dynamic and intersection mechanisms (DI-TIE, and an equal-average equal-variance equal-norm nearest neighbor search (EEENNS approach. With a full search algorithm as a benchmark for overall search load comparison, this work provides an 87% search load reduction at a threshold of quantization accuracy of 0.96, a figure far beyond 55% in the MTIE, 76% in the EEENNS approach, and 83% in the DI-TIE approach.
A study of speech interfaces for the vehicle environment.

Science.gov (United States)

2013-05-01

Over the past few years, there has been a shift in automotive human machine interfaces from : visual-manual interactions (pushing buttons and rotating knobs) to speech interaction. In terms of : distraction, the industry views speech interaction as a...
A friend man-machine interface for thermo-hydraulic simulation codes of nuclear installations

International Nuclear Information System (INIS)

Araujo Filho, F. de; Belchior Junior, A.; Barroso, A.C.O.; Gebrim, A.

1994-01-01

This work presents the development of a Man-Machine Interface to the TRAC-PF1 code, a computer program to perform best estimate analysis of transients and accidents at nuclear power plants. The results were considered satisfactory and a considerable productivity gain was achieved in the activity of preparing and analyzing simulations. (author)
[Prosody, speech input and language acquisition].

Science.gov (United States)

Jungheim, M; Miller, S; Kühn, D; Ptok, M

2014-04-01

In order to acquire language, children require speech input. The prosody of the speech input plays an important role. In most cultures adults modify their code when communicating with children. Compared to normal speech this code differs especially with regard to prosody. For this review a selective literature search in PubMed and Scopus was performed. Prosodic characteristics are a key feature of spoken language. By analysing prosodic features, children gain knowledge about underlying grammatical structures. Child-directed speech (CDS) is modified in a way that meaningful sequences are highlighted acoustically so that important information can be extracted from the continuous speech flow more easily. CDS is said to enhance the representation of linguistic signs. Taking into consideration what has previously been described in the literature regarding the perception of suprasegmentals, CDS seems to be able to support language acquisition due to the correspondence of prosodic and syntactic units. However, no findings have been reported, stating that the linguistically reduced CDS could hinder first language acquisition.
Machine function based control code algebras

NARCIS (Netherlands)

Bergstra, J.A.

Machine functions have been introduced by Earley and Sturgis in [6] in order to provide a mathematical foundation of the use of the T-diagrams proposed by Bratman in [5]. Machine functions describe the operation of a machine at a very abstract level. A theory of hardware and software based on

Applications of Hilbert Spectral Analysis for Speech and Sound Signals

Science.gov (United States)

Huang, Norden E.

2003-01-01

A new method for analyzing nonlinear and nonstationary data has been developed, and the natural applications are to speech and sound signals. The key part of the method is the Empirical Mode Decomposition method with which any complicated data set can be decomposed into a finite and often small number of Intrinsic Mode Functions (IMF). An IMF is defined as any function having the same numbers of zero-crossing and extrema, and also having symmetric envelopes defined by the local maxima and minima respectively. The IMF also admits well-behaved Hilbert transform. This decomposition method is adaptive, and, therefore, highly efficient. Since the decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and nonstationary processes. With the Hilbert transform, the Intrinsic Mode Functions yield instantaneous frequencies as functions of time, which give sharp identifications of imbedded structures. This method invention can be used to process all acoustic signals. Specifically, it can process the speech signals for Speech synthesis, Speaker identification and verification, Speech recognition, and Sound signal enhancement and filtering. Additionally, as the acoustical signals from machinery are essentially the way the machines are talking to us. Therefore, the acoustical signals, from the machines, either from sound through air or vibration on the machines, can tell us the operating conditions of the machines. Thus, we can use the acoustic signal to diagnosis the problems of machines.
Contributions of speech science to the technology of man-machine voice interactions

Science.gov (United States)

Lea, Wayne A.

1977-01-01

Research in speech understanding was reviewed. Plans which include prosodics research, phonological rules for speech understanding systems, and continued interdisciplinary phonetics research are discussed. Improved acoustic phonetic analysis capabilities in speech recognizers are suggested.
Parallelization of MCNP Monte Carlo neutron and photon transport code in parallel virtual machine and message passing interface

International Nuclear Information System (INIS)

Deng Li; Xie Zhongsheng

1999-01-01

The coupled neutron and photon transport Monte Carlo code MCNP (version 3B) has been parallelized in parallel virtual machine (PVM) and message passing interface (MPI) by modifying a previous serial code. The new code has been verified by solving sample problems. The speedup increases linearly with the number of processors and the average efficiency is up to 99% for 12-processor. (author)
Speech emotion recognition methods: A literature review

Science.gov (United States)

Basharirad, Babak; Moradhaseli, Mohammadreza

2017-10-01

Recently, attention of the emotional speech signals research has been boosted in human machine interfaces due to availability of high computation capability. There are many systems proposed in the literature to identify the emotional state through speech. Selection of suitable feature sets, design of a proper classifications methods and prepare an appropriate dataset are the main key issues of speech emotion recognition systems. This paper critically analyzed the current available approaches of speech emotion recognition methods based on the three evaluating parameters (feature set, classification of features, accurately usage). In addition, this paper also evaluates the performance and limitations of available methods. Furthermore, it highlights the current promising direction for improvement of speech emotion recognition systems.
Joint Machine Learning and Game Theory for Rate Control in High Efficiency Video Coding.

Science.gov (United States)

Gao, Wei; Kwong, Sam; Jia, Yuheng

2017-08-25

In this paper, a joint machine learning and game theory modeling (MLGT) framework is proposed for inter frame coding tree unit (CTU) level bit allocation and rate control (RC) optimization in High Efficiency Video Coding (HEVC). First, a support vector machine (SVM) based multi-classification scheme is proposed to improve the prediction accuracy of CTU-level Rate-Distortion (R-D) model. The legacy "chicken-and-egg" dilemma in video coding is proposed to be overcome by the learning-based R-D model. Second, a mixed R-D model based cooperative bargaining game theory is proposed for bit allocation optimization, where the convexity of the mixed R-D model based utility function is proved, and Nash bargaining solution (NBS) is achieved by the proposed iterative solution search method. The minimum utility is adjusted by the reference coding distortion and frame-level Quantization parameter (QP) change. Lastly, intra frame QP and inter frame adaptive bit ratios are adjusted to make inter frames have more bit resources to maintain smooth quality and bit consumption in the bargaining game optimization. Experimental results demonstrate that the proposed MLGT based RC method can achieve much better R-D performances, quality smoothness, bit rate accuracy, buffer control results and subjective visual quality than the other state-of-the-art one-pass RC methods, and the achieved R-D performances are very close to the performance limits from the FixedQP method.
Shared acoustic codes underlie emotional communication in music and speech—Evidence from deep transfer learning

Science.gov (United States)

Schuller, Björn

2017-01-01

Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain, in such a way that the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an Affective Sciences points of view, determining the degree of overlap between both domains is fundamental to understand the shared mechanisms underlying such phenomenon. From a Machine learning perspective, the overlap between acoustic codes for emotional expression in music and speech opens new possibilities to enlarge the amount of data available to develop music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (Arousal and Valence) in music and speech, and the Transfer Learning between these domains. We establish a comparative framework including intra- (i.e., models trained and tested on the same modality, either music or speech) and cross-domain experiments (i.e., models trained in one modality and tested on the other). In the cross-domain context, we evaluated two strategies—the direct transfer between domains, and the contribution of Transfer Learning techniques (feature-representation-transfer based on Denoising Auto Encoders) for reducing the gap in the feature space distributions. Our results demonstrate an excellent cross-domain generalisation performance with and without feature representation transfer in both directions. In the case of music, cross-domain approaches outperformed intra-domain models for Valence estimation, whereas for Speech intra-domain models achieve the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain. PMID:28658285
A Bit Stream Scalable Speech/Audio Coder Combining Enhanced Regular Pulse Excitation and Parametric Coding

Directory of Open Access Journals (Sweden)

Albertus C. den Brinker

2007-01-01

Full Text Available This paper introduces a new audio and speech broadband coding technique based on the combination of a pulse excitation coder and a standardized parametric coder, namely, MPEG-4 high-quality parametric coder. After presenting a series of enhancements to regular pulse excitation (RPE to make it suitable for the modeling of broadband signals, it is shown how pulse and parametric codings complement each other and how they can be merged to yield a layered bit stream scalable coder able to operate at different points in the quality bit rate plane. The performance of the proposed coder is evaluated in a listening test. The major result is that the extra functionality of the bit stream scalability does not come at the price of a reduced performance since the coder is competitive with standardized coders (MP3, AAC, SSC.
A Bit Stream Scalable Speech/Audio Coder Combining Enhanced Regular Pulse Excitation and Parametric Coding

Science.gov (United States)

Riera-Palou, Felip; den Brinker, Albertus C.

2007-12-01

This paper introduces a new audio and speech broadband coding technique based on the combination of a pulse excitation coder and a standardized parametric coder, namely, MPEG-4 high-quality parametric coder. After presenting a series of enhancements to regular pulse excitation (RPE) to make it suitable for the modeling of broadband signals, it is shown how pulse and parametric codings complement each other and how they can be merged to yield a layered bit stream scalable coder able to operate at different points in the quality bit rate plane. The performance of the proposed coder is evaluated in a listening test. The major result is that the extra functionality of the bit stream scalability does not come at the price of a reduced performance since the coder is competitive with standardized coders (MP3, AAC, SSC).
Optimization and Openmp Parallelization of a Discrete Element Code for Convex Polyhedra on Multi-Core Machines

Science.gov (United States)

Chen, Jian; Matuttis, Hans-Georg

2013-02-01

We report our experiences with the optimization and parallelization of a discrete element code for convex polyhedra on multi-core machines and introduce a novel variant of the sort-and-sweep neighborhood algorithm. While in theory the whole code in itself parallelizes ideally, in practice the results on different architectures with different compilers and performance measurement tools depend very much on the particle number and optimization of the code. After difficulties with the interpretation of the data for speedup and efficiency are overcome, respectable parallelization speedups could be obtained.
Machine speech and speaking about machines

Energy Technology Data Exchange (ETDEWEB)

Nye, A. [Univ. of Wisconsin, Whitewater, WI (United States)

1996-12-31

Current philosophy of language prides itself on scientific status. It boasts of being no longer contaminated with queer mental entities or idealist essences. It theorizes language as programmable variants of formal semantic systems, reimaginable either as the properly epiphenomenal machine functions of computer science or the properly material neural networks of physiology. Whether or not such models properly capture the physical workings of a living human brain is a question that scientists will have to answer. I, as a philosopher, come at the problem from another direction. Does contemporary philosophical semantics, in its dominant truth-theoretic and related versions, capture actual living human thought as it is experienced, or does it instead reflect, regardless of (perhaps dubious) scientific credentials, pathology of thought, a pathology with a disturbing social history.
Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

Science.gov (United States)

Schafer, Phillip B; Jin, Dezhe Z

2014-03-01

Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
Voice Activity Detection. Fundamentals and Speech Recognition System Robustness

OpenAIRE

Ramirez, J.; Gorriz, J. M.; Segura, J. C.

2007-01-01

This chapter has shown an overview of the main challenges in robust speech detection and a review of the state of the art and applications. VADs are frequently used in a number of applications including speech coding, speech enhancement and speech recognition. A precise VAD extracts a set of discriminative speech features from the noisy speech and formulates the decision in terms of well defined rule. The chapter has summarized three robust VAD methods that yield high speech/non-speech discri...
Implementation of support vector machine for classification of speech marked hijaiyah letters based on Mel frequency cepstrum coefficient feature extraction

Science.gov (United States)

Adhi Pradana, Wisnu; Adiwijaya; Novia Wisesty, Untari

2018-03-01

Support Vector Machine or commonly called SVM is one method that can be used to process the classification of a data. SVM classifies data from 2 different classes with hyperplane. In this study, the system was built using SVM to develop Arabic Speech Recognition. In the development of the system, there are 2 kinds of speakers that have been tested that is dependent speakers and independent speakers. The results from this system is an accuracy of 85.32% for speaker dependent and 61.16% for independent speakers.
A Navier-Strokes Chimera Code on the Connection Machine CM-5: Design and Performance

Science.gov (United States)

Jespersen, Dennis C.; Levit, Creon; Kwak, Dochan (Technical Monitor)

1994-01-01

We have implemented a three-dimensional compressible Navier-Stokes code on the Connection Machine CM-5. The code is set up for implicit time-stepping on single or multiple structured grids. For multiple grids and geometrically complex problems, we follow the 'chimera' approach, where flow data on one zone is interpolated onto another in the region of overlap. We will describe our design philosophy and give some timing results for the current code. A parallel machine like the CM-5 is well-suited for finite-difference methods on structured grids. The regular pattern of connections of a structured mesh maps well onto the architecture of the machine. So the first design choice, finite differences on a structured mesh, is natural. We use centered differences in space, with added artificial dissipation terms. When numerically solving the Navier-Stokes equations, there are liable to be some mesh cells near a solid body that are small in at least one direction. This mesh cell geometry can impose a very severe CFL (Courant-Friedrichs-Lewy) condition on the time step for explicit time-stepping methods. Thus, though explicit time-stepping is well-suited to the architecture of the machine, we have adopted implicit time-stepping. We have further taken the approximate factorization approach. This creates the need to solve large banded linear systems and creates the first possible barrier to an efficient algorithm. To overcome this first possible barrier we have considered two options. The first is just to solve the banded linear systems with data spread over the whole machine, using whatever fast method is available. This option is adequate for solving scalar tridiagonal systems, but for scalar pentadiagonal or block tridiagonal systems it is somewhat slower than desired. The second option is to 'transpose' the flow and geometry variables as part of the time-stepping process: Start with x-lines of data in-processor. Form explicit terms in x, then transpose so y-lines of data are
Incorporating Speech Recognition into a Natural User Interface

Science.gov (United States)

Chapa, Nicholas

2017-01-01

The Augmented/ Virtual Reality (AVR) Lab has been working to study the applicability of recent virtual and augmented reality hardware and software to KSC operations. This includes the Oculus Rift, HTC Vive, Microsoft HoloLens, and Unity game engine. My project in this lab is to integrate voice recognition and voice commands into an easy to modify system that can be added to an existing portion of a Natural User Interface (NUI). A NUI is an intuitive and simple to use interface incorporating visual, touch, and speech recognition. The inclusion of speech recognition capability will allow users to perform actions or make inquiries using only their voice. The simplicity of needing only to speak to control an on-screen object or enact some digital action means that any user can quickly become accustomed to using this system. Multiple programs were tested for use in a speech command and recognition system. Sphinx4 translates speech to text using a Hidden Markov Model (HMM) based Language Model, an Acoustic Model, and a word Dictionary running on Java. PocketSphinx had similar functionality to Sphinx4 but instead ran on C. However, neither of these programs were ideal as building a Java or C wrapper slowed performance. The most ideal speech recognition system tested was the Unity Engine Grammar Recognizer. A Context Free Grammar (CFG) structure is written in an XML file to specify the structure of phrases and words that will be recognized by Unity Grammar Recognizer. Using Speech Recognition Grammar Specification (SRGS) 1.0 makes modifying the recognized combinations of words and phrases very simple and quick to do. With SRGS 1.0, semantic information can also be added to the XML file, which allows for even more control over how spoken words and phrases are interpreted by Unity. Additionally, using a CFG with SRGS 1.0 produces a Finite State Machine (FSM) functionality limiting the potential for incorrectly heard words or phrases. The purpose of my project was to
ACOUSTIC SPEECH RECOGNITION FOR MARATHI LANGUAGE USING SPHINX

Directory of Open Access Journals (Sweden)

Aman Ankit

2016-09-01

Full Text Available Speech recognition or speech to text processing, is a process of recognizing human speech by the computer and converting into text. In speech recognition, transcripts are created by taking recordings of speech as audio and their text transcriptions. Speech based applications which include Natural Language Processing (NLP techniques are popular and an active area of research. Input to such applications is in natural language and output is obtained in natural language. Speech recognition mostly revolves around three approaches namely Acoustic phonetic approach, Pattern recognition approach and Artificial intelligence approach. Creation of acoustic model requires a large database of speech and training algorithms. The output of an ASR system is recognition and translation of spoken language into text by computers and computerized devices. ASR today finds enormous application in tasks that require human machine interfaces like, voice dialing, and etc. Our key contribution in this paper is to create corpora for Marathi language and explore the use of Sphinx engine for automatic speech recognition
Auditory-neurophysiological responses to speech during early childhood: Effects of background noise.

Science.gov (United States)

White-Schwoch, Travis; Davies, Evan C; Thompson, Elaine C; Woodruff Carr, Kali; Nicol, Trent; Bradlow, Ann R; Kraus, Nina

2015-10-01

Early childhood is a critical period of auditory learning, during which children are constantly mapping sounds to meaning. But this auditory learning rarely occurs in ideal listening conditions-children are forced to listen against a relentless din. This background noise degrades the neural coding of these critical sounds, in turn interfering with auditory learning. Despite the importance of robust and reliable auditory processing during early childhood, little is known about the neurophysiology underlying speech processing in children so young. To better understand the physiological constraints these adverse listening scenarios impose on speech sound coding during early childhood, auditory-neurophysiological responses were elicited to a consonant-vowel syllable in quiet and background noise in a cohort of typically-developing preschoolers (ages 3-5 yr). Overall, responses were degraded in noise: they were smaller, less stable across trials, slower, and there was poorer coding of spectral content and the temporal envelope. These effects were exacerbated in response to the consonant transition relative to the vowel, suggesting that the neural coding of spectrotemporally-dynamic speech features is more tenuous in noise than the coding of static features-even in children this young. Neural coding of speech temporal fine structure, however, was more resilient to the addition of background noise than coding of temporal envelope information. Taken together, these results demonstrate that noise places a neurophysiological constraint on speech processing during early childhood by causing a breakdown in neural processing of speech acoustics. These results may explain why some listeners have inordinate difficulties understanding speech in noise. Speech-elicited auditory-neurophysiological responses offer objective insight into listening skills during early childhood by reflecting the integrity of neural coding in quiet and noise; this paper documents typical response
Reading your own lips: common-coding theory and visual speech perception.

Science.gov (United States)

Tye-Murray, Nancy; Spehar, Brent P; Myerson, Joel; Hale, Sandra; Sommers, Mitchell S

2013-02-01

Common-coding theory posits that (1) perceiving an action activates the same representations of motor plans that are activated by actually performing that action, and (2) because of individual differences in the ways that actions are performed, observing recordings of one's own previous behavior activates motor plans to an even greater degree than does observing someone else's behavior. We hypothesized that if observing oneself activates motor plans to a greater degree than does observing others, and if these activated plans contribute to perception, then people should be able to lipread silent video clips of their own previous utterances more accurately than they can lipread video clips of other talkers. As predicted, two groups of participants were able to lipread video clips of themselves, recorded more than two weeks earlier, significantly more accurately than video clips of others. These results suggest that visual input activates speech motor activity that links to word representations in the mental lexicon.
Reviewing the connection between speech and obstructive sleep apnea.

Science.gov (United States)

Espinoza-Cuadros, Fernando; Fernández-Pozo, Rubén; Toledano, Doroteo T; Alcázar-Ramírez, José D; López-Gonzalo, Eduardo; Hernández-Gómez, Luis A

2016-02-20

Sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in OSA speakers has led to hypothesize the automatic analysis of speech for OSA assessment. In this paper we critically review several approaches using speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications. A large speech database including 426 male Spanish speakers suspected to suffer OSA and derived to a sleep disorders unit was used to study the clinical validity of several proposals using machine learning techniques to predict the apnea-hypopnea index (AHI) or classify individuals according to their OSA severity. AHI describes the severity of patients' condition. We first evaluate AHI prediction using state-of-the-art speaker recognition technologies: speech spectral information is modelled using supervectors or i-vectors techniques, and AHI is predicted through support vector regression (SVR). Using the same database we then critically review several OSA classification approaches previously proposed. The influence and possible interference of other clinical variables or characteristics available for our OSA population: age, height, weight, body mass index, and cervical perimeter, are also studied. The poor results obtained when estimating AHI using supervectors or i-vectors followed by SVR contrast with the positive results reported by previous research. This fact prompted us to a careful review of these approaches, also testing some reported results over our database. Several methodological limitations and deficiencies were detected that may have led to overoptimistic results. The methodological deficiencies observed after critically reviewing previous research can be relevant examples of potential pitfalls when using machine learning techniques for
Personality in speech assessment and automatic classification

CERN Document Server

Polzehl, Tim

2015-01-01

This work combines interdisciplinary knowledge and experience from research fields of psychology, linguistics, audio-processing, machine learning, and computer science. The work systematically explores a novel research topic devoted to automated modeling of personality expression from speech. For this aim, it introduces a novel personality assessment questionnaire and presents the results of extensive labeling sessions to annotate the speech data with personality assessments. It provides estimates of the Big 5 personality traits, i.e. openness, conscientiousness, extroversion, agreeableness, and neuroticism. Based on a database built on the questionnaire, the book presents models to tell apart different personality types or classes from speech automatically.

Machine medical ethics

CERN Document Server

Pontier, Matthijs

2015-01-01

The essays in this book, written by researchers from both humanities and sciences, describe various theoretical and experimental approaches to adding medical ethics to a machine in medical settings. Medical machines are in close proximity with human beings, and getting closer: with patients who are in vulnerable states of health, who have disabilities of various kinds, with the very young or very old, and with medical professionals. In such contexts, machines are undertaking important medical tasks that require emotional sensitivity, knowledge of medical codes, human dignity, and privacy. As machine technology advances, ethical concerns become more urgent: should medical machines be programmed to follow a code of medical ethics? What theory or theories should constrain medical machine conduct? What design features are required? Should machines share responsibility with humans for the ethical consequences of medical actions? How ought clinical relationships involving machines to be modeled? Is a capacity for e...
Improved Methods for Pitch Synchronous Linear Prediction Analysis of Speech

OpenAIRE

劉, 麗清

2015-01-01

Linear prediction (LP) analysis has been applied to speech system over the last few decades. LP technique is well-suited for speech analysis due to its ability to model speech production process approximately. Hence LP analysis has been widely used for speech enhancement, low-bit-rate speech coding in cellular telephony, speech recognition, characteristic parameter extraction (vocal tract resonances frequencies, fundamental frequency called pitch) and so on. However, the performance of the co...
The cognitive approach to conscious machines

CERN Document Server

Haikonen, Pentti O

2003-01-01

Could a machine have an immaterial mind? The author argues that true conscious machines can be built, but rejects artificial intelligence and classical neural networks in favour of the emulation of the cognitive processes of the brain-the flow of inner speech, inner imagery and emotions. This results in a non-numeric meaning-processing machine with distributed information representation and system reactions. It is argued that this machine would be conscious; it would be aware of its own existence and its mental content and perceive this as immaterial. Novel views on consciousness and the mind-
Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

Directory of Open Access Journals (Sweden)

Heracleous Panikos

2007-01-01

Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible speech, but also very quietly uttered speech (nonaudible murmur. As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc. for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved for a 20 k dictation task a word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.
Current trends in small vocabulary speech recognition for equipment control

Science.gov (United States)

Doukas, Nikolaos; Bardis, Nikolaos G.

2017-09-01

Speech recognition systems allow human - machine communication to acquire an intuitive nature that approaches the simplicity of inter - human communication. Small vocabulary speech recognition is a subset of the overall speech recognition problem, where only a small number of words need to be recognized. Speaker independent small vocabulary recognition can find significant applications in field equipment used by military personnel. Such equipment may typically be controlled by a small number of commands that need to be given quickly and accurately, under conditions where delicate manual operations are difficult to achieve. This type of application could hence significantly benefit by the use of robust voice operated control components, as they would facilitate the interaction with their users and render it much more reliable in times of crisis. This paper presents current challenges involved in attaining efficient and robust small vocabulary speech recognition. These challenges concern feature selection, classification techniques, speaker diversity and noise effects. A state machine approach is presented that facilitates the voice guidance of different equipment in a variety of situations.
A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

Science.gov (United States)

Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong

2012-01-01

Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.
A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

Directory of Open Access Journals (Sweden)

Ai-bing Zhang

Full Text Available Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish and two representing non-coding ITS barcodes (rust fungi and brown algae. Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ and Maximum likelihood (ML methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40% for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37% for 1094 brown algae queries, both using ITS barcodes.
Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation

DEFF Research Database (Denmark)

Liyanapathirana, Jeevanthi; Popescu-Belis, Andrei

2016-01-01

This paper presents a solution to evaluate spoken post-editing of imperfect machine translation output by a human translator. We compare two approaches to the combination of machine translation (MT) and automatic speech recognition (ASR): a heuristic algorithm and a machine learning method...
Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

Directory of Open Access Journals (Sweden)

Hiroshi Saruwatari

2007-01-01

Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible speech, but also very quietly uttered speech (nonaudible murmur. As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc. for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved for a 20 k dictation task a 93.9% word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.
Soft computing in machine learning

CERN Document Server

Park, Jooyoung; Inoue, Atsushi

2014-01-01

As users or consumers are now demanding smarter devices, intelligent systems are revolutionizing by utilizing machine learning. Machine learning as part of intelligent systems is already one of the most critical components in everyday tools ranging from search engines and credit card fraud detection to stock market analysis. You can train machines to perform some things, so that they can automatically detect, diagnose, and solve a variety of problems. The intelligent systems have made rapid progress in developing the state of the art in machine learning based on smart and deep perception. Using machine learning, the intelligent systems make widely applications in automated speech recognition, natural language processing, medical diagnosis, bioinformatics, and robot locomotion. This book aims at introducing how to treat a substantial amount of data, to teach machines and to improve decision making models. And this book specializes in the developments of advanced intelligent systems through machine learning. It...
Speech Recognition for the iCub Platform

Directory of Open Access Journals (Sweden)

Bertrand Higy

2018-02-01

Full Text Available This paper describes open source software (available at https://github.com/robotology/natural-speech to build automatic speech recognition (ASR systems and run them within the YARP platform. The toolkit is designed (i to allow non-ASR experts to easily create their own ASR system and run it on iCub and (ii to build deep learning-based models specifically addressing the main challenges an ASR system faces in the context of verbal human–iCub interactions. The toolkit mostly consists of Python, C++ code and shell scripts integrated in YARP. As additional contribution, a second codebase (written in Matlab is provided for more expert ASR users who want to experiment with bio-inspired and developmental learning-inspired ASR systems. Specifically, we provide code for two distinct kinds of speech recognition: “articulatory” and “unsupervised” speech recognition. The first is largely inspired by influential neurobiological theories of speech perception which assume speech perception to be mediated by brain motor cortex activities. Our articulatory systems have been shown to outperform strong deep learning-based baselines. The second type of recognition systems, the “unsupervised” systems, do not use any supervised information (contrary to most ASR systems, including our articulatory systems. To some extent, they mimic an infant who has to discover the basic speech units of a language by herself. In addition, we provide resources consisting of pre-trained deep learning models for ASR, and a 2.5-h speech dataset of spoken commands, the VoCub dataset, which can be used to adapt an ASR system to the typical acoustic environments in which iCub operates.
From Vocal Replication to Shared Combinatorial Speech Codes: A Small Step for Evolution, A Big Step for Language

Science.gov (United States)

Oudeyer, Pierre-Yves

Humans use spoken vocalizations, or their signed equivalent, as a physical support to carry language. This support is highly organized: vocalizations are built with the re-use of a small number of articulatory units, which are themselves discrete elements carved up by each linguistic community in the articulatory continuum. Moreover, the repertoires of these elementary units (the gestures, the phonemes, the morphemes) have a number of structural regularities: for example, while our vocal tract allows physically the production of hundreds of vowels, each language uses most often 5, and never more than 20 of them. Also, certain vowels are very frequent, like /a,e,i,o,u/, and some others are very rare, like /en/. All the speakers of a given linguistic community categorize the speech sounds in the same manner, and share the same repertoire of vocalizations. Speakers of different communities may have very different ways of categorizing sounds (for example, Chinese use tones to distinguish sounds), and repertoires of vocalizations. Such an organized physical support of language is crucial for the existence of language, and thus asking how it may have appeared in the biological and/or cultural history of humans is a fundamental questions. In particular, one can wonder how much the evolution of human speech codes relied on specific evolutionary innovations, and thus how difficult (or not) it was for speech to appear.
Non linear analyses of speech and prosody in Asperger's syndrome

DEFF Research Database (Denmark)

Fusaroli, Riccardo; Bang, Dan; Weed, Ethan

It is widely acknowledged that people on the ASD spectrum behave atypically in the way they modulate aspects of speech and voice, including pitch, fluency, and voice quality. ASD speech has been described at times as “odd”, “mechanical”, or “monotone”. However, it has proven difficult to quantify...... the results in a supervised machine-learning process to classify speech production as either belonging to the control or the AS group as well as to assess the severity of the disorder (as measured by Autism Spectrum Quotient), based solely on acoustic features....
Impact of dynamic rate coding aspects of mobile phone networks on forensic voice comparison.

Science.gov (United States)

Alzqhoul, Esam A S; Nair, Balamurali B T; Guillemin, Bernard J

2015-09-01

Previous studies have shown that landline and mobile phone networks are different in their ways of handling the speech signal, and therefore in their impact on it. But the same is also true of the different networks within the mobile phone arena. There are two major mobile phone technologies currently in use today, namely the global system for mobile communications (GSM) and code division multiple access (CDMA) and these are fundamentally different in their design. For example, the quality of the coded speech in the GSM network is a function of channel quality, whereas in the CDMA network it is determined by channel capacity (i.e., the number of users sharing a cell site). This paper examines the impact on the speech signal of a key feature of these networks, namely dynamic rate coding, and its subsequent impact on the task of likelihood-ratio-based forensic voice comparison (FVC). Surprisingly, both FVC accuracy and precision are found to be better for both GSM- and CDMA-coded speech than for uncoded. Intuitively one expects FVC accuracy to increase with increasing coded speech quality. This trend is shown to occur for the CDMA network, but, surprisingly, not for the GSM network. Further, in respect to comparisons between these two networks, FVC accuracy for CDMA-coded speech is shown to be slightly better than for GSM-coded speech, particularly when the coded-speech quality is high, but in terms of FVC precision the two networks are shown to be very similar. Copyright © 2015 The Chartered Society of Forensic Sciences. Published by Elsevier Ireland Ltd. All rights reserved.
Non-linear Dynamics of Speech in Schizophrenia

DEFF Research Database (Denmark)

Fusaroli, Riccardo; Simonsen, Arndis; Weed, Ethan

(regularity and complexity) of speech. Our aims are (1) to achieve a more fine-grained understanding of the speech patterns in schizophrenia than has previously been achieved using traditional, linear measures of prosody and fluency, and (2) to employ the results in a supervised machine-learning process......-effects inference. SANS and SAPS scores were predicted using a 10-fold cross-validated multiple linear regression. Both analyses were iterated 1000 to test for stability of results. Results: Voice dynamics allowed discrimination of patients with schizophrenia from healthy controls with a balanced accuracy of 85...
Autocoding State Machine in Erlang

DEFF Research Database (Denmark)

Guo, Yu; Hoffman, Torben; Gunder, Nicholas

2008-01-01

This paper presents an autocoding tool suit, which supports development of state machine in a model-driven fashion, where models are central to all phases of the development process. The tool suit, which is built on the Eclipse platform, provides facilities for the graphical specification...... of a state machine model. Once the state machine is specified, it is used as input to a code generation engine that generates source code in Erlang....
Random Deep Belief Networks for Recognizing Emotions from Speech Signals.

Science.gov (United States)

Wen, Guihua; Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang

2017-01-01

Now the human emotions can be recognized from speech signals using machine learning methods; however, they are challenged by the lower recognition accuracies in real applications due to lack of the rich representation ability. Deep belief networks (DBN) can automatically discover the multiple levels of representations in speech signals. To make full of its advantages, this paper presents an ensemble of random deep belief networks (RDBN) method for speech emotion recognition. It firstly extracts the low level features of the input speech signal and then applies them to construct lots of random subspaces. Each random subspace is then provided for DBN to yield the higher level features as the input of the classifier to output an emotion label. All outputted emotion labels are then fused through the majority voting to decide the final emotion label for the input speech signal. The conducted experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition.
Random Deep Belief Networks for Recognizing Emotions from Speech Signals

Directory of Open Access Journals (Sweden)

Guihua Wen

2017-01-01

Full Text Available Now the human emotions can be recognized from speech signals using machine learning methods; however, they are challenged by the lower recognition accuracies in real applications due to lack of the rich representation ability. Deep belief networks (DBN can automatically discover the multiple levels of representations in speech signals. To make full of its advantages, this paper presents an ensemble of random deep belief networks (RDBN method for speech emotion recognition. It firstly extracts the low level features of the input speech signal and then applies them to construct lots of random subspaces. Each random subspace is then provided for DBN to yield the higher level features as the input of the classifier to output an emotion label. All outputted emotion labels are then fused through the majority voting to decide the final emotion label for the input speech signal. The conducted experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition.
An introduction to quantum machine learning

OpenAIRE

Schuld, M.; Sinayskiy, I.; Petruccione, F.

2014-01-01

Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum compute...
Hybrid methodological approach to context-dependent speech recognition

Directory of Open Access Journals (Sweden)

Dragiša Mišković

2017-01-01

Full Text Available Although the importance of contextual information in speech recognition has been acknowledged for a long time now, it has remained clearly underutilized even in state-of-the-art speech recognition systems. This article introduces a novel, methodologically hybrid approach to the research question of context-dependent speech recognition in human–machine interaction. To the extent that it is hybrid, the approach integrates aspects of both statistical and representational paradigms. We extend the standard statistical pattern-matching approach with a cognitively inspired and analytically tractable model with explanatory power. This methodological extension allows for accounting for contextual information which is otherwise unavailable in speech recognition systems, and using it to improve post-processing of recognition hypotheses. The article introduces an algorithm for evaluation of recognition hypotheses, illustrates it for concrete interaction domains, and discusses its implementation within two prototype conversational agents.

Do North Carolina Students Have Freedom of Speech? A Review of Campus Speech Codes

Science.gov (United States)

Robinson, Jenna Ashley

2010-01-01

America's colleges and universities are supposed to be strongholds of classically liberal ideals, including the protection of individual rights and openness to debate and inquiry. Too often, this is not the case. Across the country, universities deny students and faculty their fundamental rights to freedom of speech and expression. The report…
Fishing for meaningful units in connected speech

DEFF Research Database (Denmark)

Henrichsen, Peter Juel; Christiansen, Thomas Ulrich

2009-01-01

In many branches of spoken language analysis including ASR, the set of smallest meaningful units of speech is taken to coincide with the set of phones or phonemes. However, fishing for phones is difficult, error-prone, and computationally expensive. We present an experiment, based on machine...
What makes an automated teller machine usable by blind users?

Science.gov (United States)

Manzke, J M; Egan, D H; Felix, D; Krueger, H

1998-07-01

Fifteen blind and sighted subjects, who featured as a control group for acceptance, were asked for their requirements for automated teller machines (ATMs). Both groups also tested the usability of a partially operational ATM mock-up. This machine was based on an existing cash dispenser, providing natural speech output, different function menus and different key arrangements. Performance and subjective evaluation data of blind and sighted subjects were collected. All blind subjects were able to operate the ATM successfully. The implemented speech output was the main usability factor for them. The different interface designs did not significantly affect performance and subjective evaluation. Nevertheless, design recommendations can be derived from the requirement assessment. The sighted subjects were rather open for design modifications, especially the implementation of speech output. However, there was also a mismatch of the requirements of the two subject groups, mainly concerning the key arrangement.
Brain cells in the avian 'prefrontal cortex' code for features of slot-machine-like gambling.

Directory of Open Access Journals (Sweden)

Damian Scarf

2011-01-01

Full Text Available Slot machines are the most common and addictive form of gambling. In the current study, we recorded from single neurons in the 'prefrontal cortex' of pigeons while they played a slot-machine-like task. We identified four categories of neurons that coded for different aspects of our slot-machine-like task. Reward-Proximity neurons showed a linear increase in activity as the opportunity for a reward drew near. I-Won neurons fired only when the fourth stimulus of a winning (four-of-a-kind combination was displayed. I-Lost neurons changed their firing rate at the presentation of the first nonidentical stimulus, that is, when it was apparent that no reward was forthcoming. Finally, Near-Miss neurons also changed their activity the moment it was recognized that a reward was no longer available, but more importantly, the activity level was related to whether the trial contained one, two, or three identical stimuli prior to the display of the nonidentical stimulus. These findings not only add to recent neurophysiological research employing simulated gambling paradigms, but also add to research addressing the functional correspondence between the avian NCL and primate PFC.
Objective voice and speech analysis of persons with chronic hoarseness by prosodic analysis of speech samples.

Science.gov (United States)

Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar

2016-10-01

Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
Bi-Modal Face and Speech Authentication: a BioLogin Demonstration System

OpenAIRE

Marcel, Sébastien; Mariéthoz, Johnny; Rodriguez, Yann; Cardinaux, Fabien

2006-01-01

This paper presents a bi-modal (face and speech) authentication demonstration system that simulates the login of a user using its face and its voice. This demonstration is called BioLogin. It runs both on Linux and Windows and the Windows version is freely available for download. Bio\\-Login is implemented using an open source machine learning library and its machine vision package.
Particle-in-cell plasma simulation codes on the connection machine

International Nuclear Information System (INIS)

Walker, D.W.

1991-01-01

Methods for implementing three-dimensional, electromagnetic, relativistic PIC plasma simulation codes on the Connection Machine (CM-2) are discussed. The gather and scatter phases of the PIC algorithm involve indirect indexing of data, which results in a large amount of communication on the CM-2. Different data decompositions are described that seek to reduce the amount of communication while maintaining good load balance. These methods require the particles to be spatially sorted at the start of each time step, which introduced another form of overhead. The different methods are implemented in CM Fortran on the CM-2 and compared. It was found that the general router is slow in performing the communication in the gather and scatter steps, which precludes an efficient CM Fortran implementation. An alternative method that uses PARIS calls and the NEWS communication network to pipeline data along the axes of the VP set is suggested as a more efficient algorithm
Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes.

Science.gov (United States)

Meyer, Bernd T; Brand, Thomas; Kollmeier, Birger

2011-01-01

The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system with special focus on intrinsic variations of speech, such as speaking rate and effort, altered pitch, and the presence of dialect and accent. Second, it is investigated if the most common ASR features contain all information required to recognize speech in noisy environments by using resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system achieved the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which is an estimate for the human-machine gap in terms of the SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variabilities result in strong increases of error rates, both in human speech recognition (HSR) and ASR (with a relative increase of up to 120%). An analysis of phoneme duration and recognition rates indicates that human listeners are better able to identify temporal cues than the machine at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems.
OLIVE: Speech-Based Video Retrieval

NARCIS (Netherlands)

de Jong, Franciska M.G.; Gauvain, Jean-Luc; den Hartog, Jurgen; den Hartog, Jeremy; Netter, Klaus

1999-01-01

This paper describes the Olive project which aims to support automated indexing of video material by use of human language technologies. Olive is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which serve as the
Untyped Memory in the Java Virtual Machine

DEFF Research Database (Denmark)

Gal, Andreas; Probst, Christian; Franz, Michael

2005-01-01

We have implemented a virtual execution environment that executes legacy binary code on top of the type-safe Java Virtual Machine by recompiling native code instructions to type-safe bytecode. As it is essentially impossible to infer static typing into untyped machine code, our system emulates...... untyped memory on top of Java’s type system. While this approach allows to execute native code on any off-the-shelf JVM, the resulting runtime performance is poor. We propose a set of virtual machine extensions that add type-unsafe memory objects to JVM. We contend that these JVM extensions do not relax...... Java’s type system as the same functionality can be achieved in pure Java, albeit much less efficiently....
78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

Science.gov (United States)

2013-08-15

...-Speech Services for Individuals with Hearing and Speech Disabilities, Report and Order (Order), document...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...
Social Robotics in Therapy of Apraxia of Speech

Directory of Open Access Journals (Sweden)

José Carlos Castillo

2018-01-01

Full Text Available Apraxia of speech is a motor speech disorder in which messages from the brain to the mouth are disrupted, resulting in an inability for moving lips or tongue to the right place to pronounce sounds correctly. Current therapies for this condition involve a therapist that in one-on-one sessions conducts the exercises. Our aim is to work in the line of robotic therapies in which a robot is able to perform partially or autonomously a therapy session, endowing a social robot with the ability of assisting therapists in apraxia of speech rehabilitation exercises. Therefore, we integrate computer vision and machine learning techniques to detect the mouth pose of the user and, on top of that, our social robot performs autonomously the different steps of the therapy using multimodal interaction.
Emotion Recognition of Speech Signals Based on Filter Methods

Directory of Open Access Journals (Sweden)

Narjes Yazdanian

2016-10-01

Full Text Available Speech is the basic mean of communication among human beings.With the increase of transaction between human and machine, necessity of automatic dialogue and removing human factor has been considered. The aim of this study was to determine a set of affective features the speech signal is based on emotions. In this study system was designs that include three mains sections, features extraction, features selection and classification. After extraction of useful features such as, mel frequency cepstral coefficient (MFCC, linear prediction cepstral coefficients (LPC, perceptive linear prediction coefficients (PLP, ferment frequency, zero crossing rate, cepstral coefficients and pitch frequency, Mean, Jitter, Shimmer, Energy, Minimum, Maximum, Amplitude, Standard Deviation, at a later stage with filter methods such as Pearson Correlation Coefficient, t-test, relief and information gain, we came up with a method to rank and select effective features in emotion recognition. Then Result, are given to the classification system as a subset of input. In this classification stage, multi support vector machine are used to classify seven type of emotion. According to the results, that method of relief, together with multi support vector machine, has the most classification accuracy with emotion recognition rate of 93.94%.
Human speech articulator measurements using low power, 2GHz Homodyne sensors

International Nuclear Information System (INIS)

Barnes, T; Burnett, G C; Holzrichter, J F

1999-01-01

Very low power, short-range microwave ''radar-like'' sensors can measure the motions and vibrations of internal human speech articulators as speech is produced. In these animate (and also in inanimate acoustic systems) microwave sensors can measure vibration information associated with excitation sources and other interfaces. These data, together with the corresponding acoustic data, enable the calculation of system transfer functions. This information appears to be useful for a surprisingly wide range of applications such as speech coding and recognition, speaker or object identification, speech and musical instrument synthesis, noise cancellation, and other applications
Using Peephole Optimization on Intermediate Code

NARCIS (Netherlands)

Tanenbaum, A.S.; van Staveren, H.; Stevenson, J.W.

1982-01-01

Many portable compilers generate an intermediate code that is subsequently translated into the target machine's assembly language. In this paper a stack-machine-based intermediate code suitable for algebraic languages (e.g., PASCAL, C, FORTRAN) and most byte-addressed mini- and microcomputers is
Identifying Deceptive Speech Across Cultures

Science.gov (United States)

2016-06-25

enough from the truth. Subjects were then interviewed individually in a sound booth to obtain “norming” speech data, pre- interview. We also...e.g. pitch, intensity, speaking rate, voice quality), gender, ethnicity and personality information, our machine learning experiments can classify...Have you ever been in trouble with the police?” vs. open-ended (e.g. “What is the last movie you saw that you really hated ?”) DISTRIBUTION A
LINGUISTIC ANALYSIS FOR THE BELARUSIAN CORPUS WITH THE APPLICATION OF NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING TECHNIQUES

Directory of Open Access Journals (Sweden)

Yu. S. Hetsevich

2017-01-01

Full Text Available The article focuses on the problems existing in text-to-speech synthesis. Different morphological, lexical and syntactical elements were localized with the help of the Belarusian unit of NooJ program. Those types of errors, which occur in Belarusian texts, were analyzed and corrected. Language model and part of speech tagging model were built. The natural language processing of Belarusian corpus with the help of developed algorithm using machine learning was carried out. The precision of developed models of machine learning has been 80–90 %. The dictionary was enriched with new words for the further using it in the systems of Belarusian speech synthesis.
Research on the optoacoustic communication system for speech transmission by variable laser-pulse repetition rates

Science.gov (United States)

Jiang, Hongyan; Qiu, Hongbing; He, Ning; Liao, Xin

2018-06-01

For the optoacoustic communication from in-air platforms to submerged apparatus, a method based on speech recognition and variable laser-pulse repetition rates is proposed, which realizes character encoding and transmission for speech. Firstly, the theories and spectrum characteristics of the laser-generated underwater sound are analyzed; and moreover character conversion and encoding for speech as well as the pattern of codes for laser modulation is studied; lastly experiments to verify the system design are carried out. Results show that the optoacoustic system, where laser modulation is controlled by speech-to-character baseband codes, is beneficial to improve flexibility in receiving location for underwater targets as well as real-time performance in information transmission. In the overwater transmitter, a pulse laser is controlled to radiate by speech signals with several repetition rates randomly selected in the range of one to fifty Hz, and then in the underwater receiver laser pulse repetition rate and data can be acquired by the preamble and information codes of the corresponding laser-generated sound. When the energy of the laser pulse is appropriate, real-time transmission for speaker-independent speech can be realized in that way, which solves the problem of underwater bandwidth resource and provides a technical approach for the air-sea communication.
Evaluation of pitch coding alternatives for vibrotactile stimulation in speech training of the deaf

Energy Technology Data Exchange (ETDEWEB)

Barbacena, I L; Barros, A T [CEFET/PB, Joao Pessoa - PB (Brazil); Freire, R C S [DEE, UFCG, Campina Grande-PB (Brazil); Vieira, E C A [CEFET/PB, Joao Pessoa - PB (Brazil)

2007-11-15

Use of vibrotactile feedback stimulation as an aid for speech vocalization by the hearing impaired or deaf is reviewed. Architecture of a vibrotactile based speech therapy system is proposed. Different formulations for encoding the fundamental frequency of the vocalized speech into the pulsed stimulation frequency are proposed and investigated. Simulation results are also presented to obtain a comparative evaluation of the effectiveness of the different formulated transformations. Results of the perception sensitivity to the vibrotactile stimulus frequency to verify effectiveness of the above transformations are included.
Evaluation of pitch coding alternatives for vibrotactile stimulation in speech training of the deaf

International Nuclear Information System (INIS)

Barbacena, I L; Barros, A T; Freire, R C S; Vieira, E C A

2007-01-01

Use of vibrotactile feedback stimulation as an aid for speech vocalization by the hearing impaired or deaf is reviewed. Architecture of a vibrotactile based speech therapy system is proposed. Different formulations for encoding the fundamental frequency of the vocalized speech into the pulsed stimulation frequency are proposed and investigated. Simulation results are also presented to obtain a comparative evaluation of the effectiveness of the different formulated transformations. Results of the perception sensitivity to the vibrotactile stimulus frequency to verify effectiveness of the above transformations are included

Trunnion Collar Removal Machine - Gap Analysis Table

International Nuclear Information System (INIS)

Johnson, M.

2005-01-01

The purpose of this document is to review the existing the trunnion collar removal machine against the ''Nuclear Safety Design Bases for License Application'' (NSDB) [Ref. 10] requirements and to identify codes and standards and supplemental requirements to meet these requirements. If these codes and standards can not fully meet these requirements then a ''gap'' is identified. These gaps will be identified here and addressed using the ''Trunnion Collar Removal Machine Design Development Plan'' [Ref. 15]. The codes and standards, supplemental requirements, and design development requirements for the trunnion collar removal machine are provided in the gap analysis table (Appendix A, Table 1). Because the trunnion collar removal machine is credited with performing functions important to safety (ITS) in the NSDB [Ref. 10], design basis requirements are applicable to ensure equipment is available and performs required safety functions when needed. The gap analysis table is used to identify design objectives and provide a means to satisfy safety requirements. To ensure that the trunnion collar removal machine performs required safety functions and meets performance criteria, this portion of the gap analysis tables supplies codes and standards sections and the supplemental requirements and identifies design development requirements, if needed
Speech-to-Speech Relay Service

Science.gov (United States)

Consumer Guide Speech to Speech Relay Service Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...
Verbal Short-Term Memory Span in Speech-Disordered Children: Implications for Articulatory Coding in Short-Term Memory.

Science.gov (United States)

Raine, Adrian; And Others

1991-01-01

Children with speech disorders had lower short-term memory capacity and smaller word length effect than control children. Children with speech disorders also had reduced speech-motor activity during rehearsal. Results suggest that speech rate may be a causal determinant of verbal short-term memory capacity. (BC)
Normativity in 18th century discourse on speech.

Science.gov (United States)

MacNamee, T

1984-11-01

Eighteenth century phoneticians, such as Dodart, Ferrein, and Hellwag, extended the taxonomy of visible articulatory processes into the realm of the invisible, notably with the exploration of the voicing mechanism. Remedial initiatives were not simply confined to consideration of the outward manifestations of speech and its disorders: The work of Haller, Kuestner, and Morgagni shows an acute awareness of the nervous organization underlying verbal behavior. There was a characteristic preoccupation with mechanical models of speech, which led to the attempts of Kempelen and other investigators to construct actual "speaking machines." Eighteenth century scholars regarded language as not only an innate capacity peculiar to human nature, but also as a bodily habit learned by experience. The function of the orthoepist was to teach the right speech habits, and the upward mobility of the bourgeoisie created a demand for his services.
Use of system code to estimate equilibrium tritium inventory in fusion DT machines, such as ARIES-AT and components testing facilities

International Nuclear Information System (INIS)

Wong, C.P.C.; Merrill, B.

2014-01-01

Highlights: • With the use of a system code, tritium burn-up fraction (f burn ) can be determined. • Initial tritium inventory for steady state DT machines can be estimated. • f burn of ARIES-AT, CFETR and FNSF-AT are in the range of 1–2.8%. • Respective total tritium inventories of are 7.6 kg, 6.1 kg, and 5.2 kg. - Abstract: ITER is under construction and will begin operation in 2020. This is the first 500 MW fusion class DT device, and since it is not going to breed tritium, it will consume most of the limited supply of tritium resources in the world. Yet, in parallel, DT fusion nuclear component testing machines will be needed to provide technical data for the design of DEMO. It becomes necessary to estimate the tritium burn-up fraction and corresponding initial tritium inventory and the doubling time of these machines for the planning of future supply and utilization of tritium. With the use of a system code, tritium burn-up fraction and initial tritium inventory for steady state DT machines can be estimated. Estimated tritium burn-up fractions of FNSF-AT, CFETR-R and ARIES-AT are in the range of 1–2.8%. Corresponding total equilibrium tritium inventories of the plasma flow and tritium processing system, and with the DCLL blanket option are 7.6 kg, 6.1 kg, and 5.2 kg for ARIES-AT, CFETR-R and FNSF-AT, respectively
The DNA of prophetic speech

African Journals Online (AJOL)

2014-03-04

Mar 4, 2014 ... It is expected that people will be drawn into the reality of God by authentic prophetic speech, .... strands of the DNA molecule show themselves to be arranged ... explains, chemical patterns act like the letters of a code, .... viewing the self-reflection regarding the ministry of renewal from the .... Irresistible force.
Human speech articulator measurements using low power, 2GHz Homodyne sensors

Energy Technology Data Exchange (ETDEWEB)

Barnes, T; Burnett, G C; Holzrichter, J F

1999-06-29

Very low power, short-range microwave ''radar-like'' sensors can measure the motions and vibrations of internal human speech articulators as speech is produced. In these animate (and also in inanimate acoustic systems) microwave sensors can measure vibration information associated with excitation sources and other interfaces. These data, together with the corresponding acoustic data, enable the calculation of system transfer functions. This information appears to be useful for a surprisingly wide range of applications such as speech coding and recognition, speaker or object identification, speech and musical instrument synthesis, noise cancellation, and other applications.
Criteria for Labelling Prosodic Aspects of English Speech.

Science.gov (United States)

Bagshaw, Paul C.; Williams, Briony J.

A study reports a set of labelling criteria which have been developed to label prosodic events in clear, continuous speech, and proposes a scheme whereby this information can be transcribed in a machine readable format. A prosody in a syllabic domain which is synchronized with a phonemic segmentation was annotated. A procedural definition of…
An introduction to quantum machine learning

Science.gov (United States)

Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco

2015-04-01

Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.
Neural Decoder for Topological Codes

Science.gov (United States)

Torlai, Giacomo; Melko, Roger G.

2017-07-01

We present an algorithm for error correction in topological codes that exploits modern machine learning techniques. Our decoder is constructed from a stochastic neural network called a Boltzmann machine, of the type extensively used in deep learning. We provide a general prescription for the training of the network and a decoding strategy that is applicable to a wide variety of stabilizer codes with very little specialization. We demonstrate the neural decoder numerically on the well-known two-dimensional toric code with phase-flip errors.
Implications of Sepedi/English code switching for ASR systems

CSIR Research Space (South Africa)

Modipa, TI

2013-12-01

Full Text Available . We also perform an initial acoustic analysis to determine the impact of such code switching on speech recognition performance. We nd that the frequency of code switching is unexpectedly high, and that the continuum of code switching (from unmodi ed...
Modelling the Architecture of Phonetic Plans: Evidence from Apraxia of Speech

Science.gov (United States)

Ziegler, Wolfram

2009-01-01

In theories of spoken language production, the gestural code prescribing the movements of the speech organs is usually viewed as a linear string of holistic, encapsulated, hard-wired, phonetic plans, e.g., of the size of phonemes or syllables. Interactions between phonetic units on the surface of overt speech are commonly attributed to either the…
Speech-To-Text Conversion STT System Using Hidden Markov Model HMM

Directory of Open Access Journals (Sweden)

Su Myat Mon

2015-06-01

Full Text Available Abstract Speech is an easiest way to communicate with each other. Speech processing is widely used in many applications like security devices household appliances cellular phones ATM machines and computers. The human computer interface has been developed to communicate or interact conveniently for one who is suffering from some kind of disabilities. Speech-to-Text Conversion STT systems have a lot of benefits for the deaf or dumb people and find their applications in our daily lives. In the same way the aim of the system is to convert the input speech signals into the text output for the deaf or dumb students in the educational fields. This paper presents an approach to extract features by using Mel Frequency Cepstral Coefficients MFCC from the speech signals of isolated spoken words. And Hidden Markov Model HMM method is applied to train and test the audio files to get the recognized spoken word. The speech database is created by using MATLAB.Then the original speech signals are preprocessed and these speech samples are extracted to the feature vectors which are used as the observation sequences of the Hidden Markov Model HMM recognizer. The feature vectors are analyzed in the HMM depending on the number of states.
Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

Science.gov (United States)

Ye, Sherry

2015-01-01

NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.
Use of system code to estimate equilibrium tritium inventory in fusion DT machines, such as ARIES-AT and components testing facilities

Energy Technology Data Exchange (ETDEWEB)

Wong, C.P.C., E-mail: wongc@fusion.gat.com [General Atomics, San Diego, CA (United States); Merrill, B. [Idaho National Laboratory, Idaho Falls, ID (United States)

2014-10-15

Highlights: • With the use of a system code, tritium burn-up fraction (f{sub burn}) can be determined. • Initial tritium inventory for steady state DT machines can be estimated. • f{sub burn} of ARIES-AT, CFETR and FNSF-AT are in the range of 1–2.8%. • Respective total tritium inventories of are 7.6 kg, 6.1 kg, and 5.2 kg. - Abstract: ITER is under construction and will begin operation in 2020. This is the first 500 MW{sub fusion} class DT device, and since it is not going to breed tritium, it will consume most of the limited supply of tritium resources in the world. Yet, in parallel, DT fusion nuclear component testing machines will be needed to provide technical data for the design of DEMO. It becomes necessary to estimate the tritium burn-up fraction and corresponding initial tritium inventory and the doubling time of these machines for the planning of future supply and utilization of tritium. With the use of a system code, tritium burn-up fraction and initial tritium inventory for steady state DT machines can be estimated. Estimated tritium burn-up fractions of FNSF-AT, CFETR-R and ARIES-AT are in the range of 1–2.8%. Corresponding total equilibrium tritium inventories of the plasma flow and tritium processing system, and with the DCLL blanket option are 7.6 kg, 6.1 kg, and 5.2 kg for ARIES-AT, CFETR-R and FNSF-AT, respectively.
SII-Based Speech Prepocessing for Intelligibility Improvement in Noise

DEFF Research Database (Denmark)

Taal, Cees H.; Jensen, Jesper

2013-01-01

filter sets certain frequency bands to zero when they do not contribute to intelligibility anymore. Experiments show large intelligibility improvements with the proposed method when used in stationary speech-shaped noise. However, it was also found that the method does not perform well for speech...... corrupted by a competing speaker. This is due to the fact that the SII is not a reliable intelligibility predictor for fluctuating noise sources. MATLAB code is provided....
Gesture-speech integration in children with specific language impairment.

Science.gov (United States)

Mainela-Arnold, Elina; Alibali, Martha W; Hostetter, Autumn B; Evans, Julia L

2014-11-01

Previous research suggests that speakers are especially likely to produce manual communicative gestures when they have relative ease in thinking about the spatial elements of what they are describing, paired with relative difficulty organizing those elements into appropriate spoken language. Children with specific language impairment (SLI) exhibit poor expressive language abilities together with within-normal-range nonverbal IQs. This study investigated whether weak spoken language abilities in children with SLI influence their reliance on gestures to express information. We hypothesized that these children would rely on communicative gestures to express information more often than their age-matched typically developing (TD) peers, and that they would sometimes express information in gestures that they do not express in the accompanying speech. Participants were 15 children with SLI (aged 5;6-10;0) and 18 age-matched TD controls. Children viewed a wordless cartoon and retold the story to a listener unfamiliar with the story. Children's gestures were identified and coded for meaning using a previously established system. Speech-gesture combinations were coded as redundant if the information conveyed in speech and gesture was the same, and non-redundant if the information conveyed in speech was different from the information conveyed in gesture. Children with SLI produced more gestures than children in the TD group; however, the likelihood that speech-gesture combinations were non-redundant did not differ significantly across the SLI and TD groups. In both groups, younger children were significantly more likely to produce non-redundant speech-gesture combinations than older children. The gesture-speech integration system functions similarly in children with SLI and TD, but children with SLI rely more on gesture to help formulate, conceptualize or express the messages they want to convey. This provides motivation for future research examining whether interventions
Cultural and biological evolution of phonemic speech

NARCIS (Netherlands)

de Boer, B.; Freitas, A.A.; Capcarrere, M.S.; Bentley, Peter J.; Johnson, Colin G.; Timmis, Jon

2005-01-01

This paper investigates the interaction between cultural evolution and biological evolution in the emergence of phonemic coding in speech. It is observed that our nearest relatives, the primates, use holistic utterances, whereas humans use phonemic utterances. It can therefore be argued that our
Towards advanced code simulators

International Nuclear Information System (INIS)

Scriven, A.H.

1990-01-01

The Central Electricity Generating Board (CEGB) uses advanced thermohydraulic codes extensively to support PWR safety analyses. A system has been developed to allow fully interactive execution of any code with graphical simulation of the operator desk and mimic display. The system operates in a virtual machine environment, with the thermohydraulic code executing in one virtual machine, communicating via interrupts with any number of other virtual machines each running other programs and graphics drivers. The driver code itself does not have to be modified from its normal batch form. Shortly following the release of RELAP5 MOD1 in IBM compatible form in 1983, this code was used as the driver for this system. When RELAP5 MOD2 became available, it was adopted with no changes needed in the basic system. Overall the system has been used for some 5 years for the analysis of LOBI tests, full scale plant studies and for simple what-if studies. For gaining rapid understanding of system dependencies it has proved invaluable. The graphical mimic system, being independent of the driver code, has also been used with other codes to study core rewetting, to replay results obtained from batch jobs on a CRAY2 computer system and to display suitably processed experimental results from the LOBI facility to aid interpretation. For the above work real-time execution was not necessary. Current work now centers on implementing the RELAP 5 code on a true parallel architecture machine. Marconi Simulation have been contracted to investigate the feasibility of using upwards of 100 processors, each capable of a peak of 30 MIPS to run a highly detailed RELAP5 model in real time, complete with specially written 3D core neutronics and balance of plant models. This paper describes the experience of using RELAP5 as an analyzer/simulator, and outlines the proposed methods and problems associated with parallel execution of RELAP5
Parallelization of the MAAP-A code neutronics/thermal hydraulics coupling

International Nuclear Information System (INIS)

Froehle, P.H.; Wei, T.Y.C.; Weber, D.P.; Henry, R.E.

1998-01-01

A major new feature, one-dimensional space-time kinetics, has been added to a developmental version of the MAAP code through the introduction of the DIF3D-K module. This code is referred to as MAAP-A. To reduce the overall job time required, a capability has been provided to run the MAAP-A code in parallel. The parallel version of MAAP-A utilizes two machines running in parallel, with the DIF3D-K module executing on one machine and the rest of the MAAP-A code executing on the other machine. Timing results obtained during the development of the capability indicate that reductions in time of 30--40% are possible. The parallel version can be run on two SPARC 20 (SUN OS 5.5) workstations connected through the ethernet. MPI (Message Passing Interface standard) needs to be implemented on the machines. If necessary the parallel version can also be run on only one machine. The results obtained running in this one-machine mode identically match the results obtained from the serial version of the code

Marital conflict and adjustment: speech nonfluencies in intimate disclosure.

Science.gov (United States)

Paul, E L; White, K M; Speisman, J C; Costos, D

1988-06-01

Speech nonfluency in response to questions about the marital relationship was used to assess anxiety. Subjects were 31 husbands and 31 wives, all white, college educated, from middle- to lower-middle-class families, and ranging from 20 to 30 years of age. Three types of nonfluencies were coded: filled pauses, unfilled pauses, and repetitions. Speech-disturbance ratios were computed by dividing the sum of speech nonfluencies by the total words spoken. The results support the notion that some issues within marriage are more sensitive and/or problematic than others, and that, in an interview situation, gender interacts with question content in the production of nonfluencies.
Analysis of Feature Extraction Methods for Speaker Dependent Speech Recognition

Directory of Open Access Journals (Sweden)

Gurpreet Kaur

2017-02-01

Full Text Available Speech recognition is about what is being said, irrespective of who is saying. Speech recognition is a growing field. Major progress is taking place on the technology of automatic speech recognition (ASR. Still, there are lots of barriers in this field in terms of recognition rate, background noise, speaker variability, speaking rate, accent etc. Speech recognition rate mainly depends on the selection of features and feature extraction methods. This paper outlines the feature extraction techniques for speaker dependent speech recognition for isolated words. A brief survey of different feature extraction techniques like Mel-Frequency Cepstral Coefficients (MFCC, Linear Predictive Coding Coefficients (LPCC, Perceptual Linear Prediction (PLP, Relative Spectra Perceptual linear Predictive (RASTA-PLP analysis are presented and evaluation is done. Speech recognition has various applications from daily use to commercial use. We have made a speaker dependent system and this system can be useful in many areas like controlling a patient vehicle using simple commands.
Machine-Learning Algorithms to Code Public Health Spending Accounts.

Science.gov (United States)

Brady, Eoghan S; Leider, Jonathon P; Resnick, Beth A; Alfonso, Y Natalia; Bishai, David

Government public health expenditure data sets require time- and labor-intensive manipulation to summarize results that public health policy makers can use. Our objective was to compare the performances of machine-learning algorithms with manual classification of public health expenditures to determine if machines could provide a faster, cheaper alternative to manual classification. We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147 280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms. Compared with manual classification, the machine-learning random forests algorithm produced 84% recall and 91% precision. With algorithm ensembling, we achieved our target criterion of 90% recall by using a consensus ensemble of ≥6 algorithms while still retaining 93% coverage, leaving only 7% of the summary expenditure records unclassified. Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help parse public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence-based public health resource allocation.
3D equilibrium codes for mirror machines

International Nuclear Information System (INIS)

Kaiser, T.B.

1983-01-01

The codes developed for cumputing three-dimensional guiding center equilibria for quadrupole tandem mirrors are discussed. TEBASCO (Tandem equilibrium and ballooning stability code) is a code developed at LLNL that uses a further expansion of the paraxial equilibrium equation in powers of β (plasma pressure/magnetic pressure). It has been used to guide the design of the TMX-U and MFTF-B experiments at Livermore. Its principal weakness is its perturbative nature, which renders its validity for high-β calculation open to question. In order to compute high-β equilibria, the reduced MHD technique that has been proven useful for determining toroidal equilibria was adapted to the tandem mirror geometry. In this approach, the paraxial expansion of the MHD equations yields a set of coupled nonlinear equations of motion valid for arbitrary β, that are solved as an initial-value problem. Two particular formulations have been implemented in computer codes developed at NYU/Kyoto U and LLNL. They differ primarily in the type of grid, the location of the lateral boundary and the damping techniques employed, and in the method of calculating pressure-balance equilibrium. Discussions on these codes are presented in this paper. (Kato, T.)
Continuous speech recognition with sparse coding

CSIR Research Space (South Africa)

Smit, WJ

2009-04-01

Full Text Available generative model. The spike train is classified by making use of a spike train model and dynamic programming. It is computationally expensive to find a sparse code. We use an iterative subset selection algorithm with quadratic programming for this process...
Neuroscience-inspired computational systems for speech recognition under noisy conditions

Science.gov (United States)

Schafer, Phillip B.

Humans routinely recognize speech in challenging acoustic environments with background music, engine sounds, competing talkers, and other acoustic noise. However, today's automatic speech recognition (ASR) systems perform poorly in such environments. In this dissertation, I present novel methods for ASR designed to approach human-level performance by emulating the brain's processing of sounds. I exploit recent advances in auditory neuroscience to compute neuron-based representations of speech, and design novel methods for decoding these representations to produce word transcriptions. I begin by considering speech representations modeled on the spectrotemporal receptive fields of auditory neurons. These representations can be tuned to optimize a variety of objective functions, which characterize the response properties of a neural population. I propose an objective function that explicitly optimizes the noise invariance of the neural responses, and find that it gives improved performance on an ASR task in noise compared to other objectives. The method as a whole, however, fails to significantly close the performance gap with humans. I next consider speech representations that make use of spiking model neurons. The neurons in this method are feature detectors that selectively respond to spectrotemporal patterns within short time windows in speech. I consider a number of methods for training the response properties of the neurons. In particular, I present a method using linear support vector machines (SVMs) and show that this method produces spikes that are robust to additive noise. I compute the spectrotemporal receptive fields of the neurons for comparison with previous physiological results. To decode the spike-based speech representations, I propose two methods designed to work on isolated word recordings. The first method uses a classical ASR technique based on the hidden Markov model. The second method is a novel template-based recognition scheme that takes
Family Worlds: Couple Satisfaction, Parenting Style, and Mothers' and Fathers' Speech to Young Children.

Science.gov (United States)

Pratt, Michael W.; And Others

1992-01-01

Investigated relations between certain family context variables and the conversational behavior of 36 parents who were playing with their 3 year olds. Transcripts were coded for types of conversational functions and structure of parent speech. Marital satisfaction was associated with aspects of parent speech. (LB)
Voice Activity Detection Using Fuzzy Entropy and Support Vector Machine

Directory of Open Access Journals (Sweden)

R. Johny Elton

2016-08-01

Full Text Available This paper proposes support vector machine (SVM based voice activity detection using FuzzyEn to improve detection performance under noisy conditions. The proposed voice activity detection (VAD uses fuzzy entropy (FuzzyEn as a feature extracted from noise-reduced speech signals to train an SVM model for speech/non-speech classification. The proposed VAD method was tested by conducting various experiments by adding real background noises of different signal-to-noise ratios (SNR ranging from −10 dB to 10 dB to actual speech signals collected from the TIMIT database. The analysis proves that FuzzyEn feature shows better results in discriminating noise and corrupted noisy speech. The efficacy of the SVM classifier was validated using 10-fold cross validation. Furthermore, the results obtained by the proposed method was compared with those of previous standardized VAD algorithms as well as recently developed methods. Performance comparison suggests that the proposed method is proven to be more efficient in detecting speech under various noisy environments with an accuracy of 93.29%, and the FuzzyEn feature detects speech efficiently even at low SNR levels.
Kurzweil Reading Machine: A Partial Evaluation of Its Optical Character Recognition Error Rate.

Science.gov (United States)

Goodrich, Gregory L.; And Others

1979-01-01

A study designed to assess the ability of the Kurzweil reading machine (a speech reading device for the visually handicapped) to read three different type styles produced by five different means indicated that the machines tested had different error rates depending upon the means of producing the copy and upon the type style used. (Author/CL)
Lean coding machine. Facilities target productivity and job satisfaction with coding automation.

Science.gov (United States)

Rollins, Genna

2010-07-01

Facilities are turning to coding automation to help manage the volume of electronic documentation, streamlining workflow, boosting productivity, and increasing job satisfaction. As EHR adoption increases, computer-assisted coding may become a necessity, not an option.
Is talking to an automated teller machine natural and fun?

Science.gov (United States)

Chan, F Y; Khalid, H M

Usability and affective issues of using automatic speech recognition technology to interact with an automated teller machine (ATM) are investigated in two experiments. The first uncovered dialogue patterns of ATM users for the purpose of designing the user interface for a simulated speech ATM system. Applying the Wizard-of-Oz methodology, multiple mapping and word spotting techniques, the speech driven ATM accommodates bilingual users of Bahasa Melayu and English. The second experiment evaluates the usability of a hybrid speech ATM, comparing it with a simulated manual ATM. The aim is to investigate how natural and fun can talking to a speech ATM be for these first-time users. Subjects performed the withdrawal and balance enquiry tasks. The ANOVA was performed on the usability and affective data. The results showed significant differences between systems in the ability to complete the tasks as well as in transaction errors. Performance was measured on the time taken by subjects to complete the task and the number of speech recognition errors that occurred. On the basis of user emotions, it can be said that the hybrid speech system enabled pleasurable interaction. Despite the limitations of speech recognition technology, users are set to talk to the ATM when it becomes available for public use.
lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine.

Science.gov (United States)

Sun, Lei; Liu, Hui; Zhang, Lin; Meng, Jia

2015-01-01

Functional long non-coding RNAs (lncRNAs) have been bringing novel insight into biological study, however it is still not trivial to accurately distinguish the lncRNA transcripts (LNCTs) from the protein coding ones (PCTs). As various information and data about lncRNAs are preserved by previous studies, it is appealing to develop novel methods to identify the lncRNAs more accurately. Our method lncRScan-SVM aims at classifying PCTs and LNCTs using support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed according to the GENCODE gene annotations of human and mouse respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches, which is evaluated by several criteria such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. LncRScan-SVM is an efficient tool for predicting the lncRNAs, and it is quite useful for current lncRNA study.
A sparse neural code for some speech sounds but not for others.

Directory of Open Access Journals (Sweden)

Mathias Scharinger

Full Text Available The precise neural mechanisms underlying speech sound representations are still a matter of debate. Proponents of 'sparse representations' assume that on the level of speech sounds, only contrastive or otherwise not predictable information is stored in long-term memory. Here, in a passive oddball paradigm, we challenge the neural foundations of such a 'sparse' representation; we use words that differ only in their penultimate consonant ("coronal" [t] vs. "dorsal" [k] place of articulation and for example distinguish between the German nouns Latz ([lats]; bib and Lachs ([laks]; salmon. Changes from standard [t] to deviant [k] and vice versa elicited a discernible Mismatch Negativity (MMN response. Crucially, however, the MMN for the deviant [lats] was stronger than the MMN for the deviant [laks]. Source localization showed this difference to be due to enhanced brain activity in right superior temporal cortex. These findings reflect a difference in phonological 'sparsity': Coronal [t] segments, but not dorsal [k] segments, are based on more sparse representations and elicit less specific neural predictions; sensory deviations from this prediction are more readily 'tolerated' and accordingly trigger weaker MMNs. The results foster the neurocomputational reality of 'representationally sparse' models of speech perception that are compatible with more general predictive mechanisms in auditory perception.
Subspace-Based Noise Reduction for Speech Signals via Diagonal and Triangular Matrix Decompositions

DEFF Research Database (Denmark)

Hansen, Per Christian; Jensen, Søren Holdt

2007-01-01

We survey the definitions and use of rank-revealing matrix decompositions in single-channel noise reduction algorithms for speech signals. Our algorithms are based on the rank-reduction paradigm and, in particular, signal subspace techniques. The focus is on practical working algorithms, using both...... with working Matlab code and applications in speech processing....
Variable Frame Rate and Length Analysis for Data Compression in Distributed Speech Recognition

DEFF Research Database (Denmark)

Kraljevski, Ivan; Tan, Zheng-Hua

2014-01-01

This paper addresses the issue of data compression in distributed speech recognition on the basis of a variable frame rate and length analysis method. The method first conducts frame selection by using a posteriori signal-to-noise ratio weighted energy distance to find the right time resolution...... length for steady regions. The method is applied to scalable source coding in distributed speech recognition where the target bitrate is met by adjusting the frame rate. Speech recognition results show that the proposed approach outperforms other compression methods in terms of recognition accuracy...... for noisy speech while achieving higher compression rates....
Machine Translation from Text

Science.gov (United States)

Habash, Nizar; Olive, Joseph; Christianson, Caitlin; McCary, John

Machine translation (MT) from text, the topic of this chapter, is perhaps the heart of the GALE project. Beyond being a well defined application that stands on its own, MT from text is the link between the automatic speech recognition component and the distillation component. The focus of MT in GALE is on translating from Arabic or Chinese to English. The three languages represent a wide range of linguistic diversity and make the GALE MT task rather challenging and exciting.
Monte Carlo codes and Monte Carlo simulator program

International Nuclear Information System (INIS)

Higuchi, Kenji; Asai, Kiyoshi; Suganuma, Masayuki.

1990-03-01

Four typical Monte Carlo codes KENO-IV, MORSE, MCNP and VIM have been vectorized on VP-100 at Computing Center, JAERI. The problems in vector processing of Monte Carlo codes on vector processors have become clear through the work. As the result, it is recognized that these are difficulties to obtain good performance in vector processing of Monte Carlo codes. A Monte Carlo computing machine, which processes the Monte Carlo codes with high performances is being developed at our Computing Center since 1987. The concept of Monte Carlo computing machine and its performance have been investigated and estimated by using a software simulator. In this report the problems in vectorization of Monte Carlo codes, Monte Carlo pipelines proposed to mitigate these difficulties and the results of the performance estimation of the Monte Carlo computing machine by the simulator are described. (author)
Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?

Directory of Open Access Journals (Sweden)

Héctor Delgado

2015-06-01

This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.
Freedom of Speech Wins in Wisconsin

Science.gov (United States)

Downs, Donald Alexander

2006-01-01

One might derive, from the eradication of a particularly heinous speech code, some encouragement that all is not lost in the culture wars. A core of dedicated scholars, working from within, made it obvious, to all but the most radical left, that imposing social justice by restricting thought and expression was a recipe for tyranny. Donald…
Neural Entrainment to Speech Modulates Speech Intelligibility

NARCIS (Netherlands)

Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne

2018-01-01

Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and

IEP goals for school-age children with speech sound disorders.

Science.gov (United States)

Farquharson, Kelly; Tambyraja, Sherine R; Justice, Laura M; Redle, Erin E

2014-01-01

The purpose of the current study was to describe the current state of practice for writing Individualized Education Program (IEP) goals for children with speech sound disorders (SSDs). IEP goals for 146 children receiving services for SSDs within public school systems across two states were coded for their dominant theoretical framework and overall quality. A dichotomous scheme was used for theoretical framework coding: cognitive-linguistic or sensory-motor. Goal quality was determined by examining 7 specific indicators outlined by an empirically tested rating tool. In total, 147 long-term and 490 short-term goals were coded. The results revealed no dominant theoretical framework for long-term goals, whereas short-term goals largely reflected a sensory-motor framework. In terms of quality, the majority of speech production goals were functional and generalizable in nature, but were not able to be easily targeted during common daily tasks or by other members of the IEP team. Short-term goals were consistently rated higher in quality domains when compared to long-term goals. The current state of practice for writing IEP goals for children with SSDs indicates that theoretical framework may be eclectic in nature and likely written to support the individual needs of children with speech sound disorders. Further investigation is warranted to determine the relations between goal quality and child outcomes. (1) Identify two predominant theoretical frameworks and discuss how they apply to IEP goal writing. (2) Discuss quality indicators as they relate to IEP goals for children with speech sound disorders. (3) Discuss the relationship between long-term goals level of quality and related theoretical frameworks. (4) Identify the areas in which business-as-usual IEP goals exhibit strong quality.
Comparative simulation of Stirling and Sibling cycle cryocoolers with two codes

International Nuclear Information System (INIS)

Mitchell, M.P.; Wilson, K.J.; Bauwens, L.

1989-01-01

The authors present a comparative analysis of Stirling and Sibling Cycle cryocoolers conducted with two different computer simulation codes. One code (CRYOWEISS) performs an initial analysis on the assumption of isothermal conditions in the machines and adjusts that result with decoupled loss calculations. The other code (MS*2) models fluid flows and heat transfers more realistically but ignores significant loss mechanisms, including flow friction and heat conduction through the metal of the machines. Surprisingly, MS*2 is less optimistic about performance of all machines even though it ignores losses that are modelled by CRYOWEISS. Comparison between constant-bore Stirling and Sibling machines shows that their performance is generally comparable over a range of temperatures, pressures and operating speeds. No machine was consistently superior or inferior according to both codes over the whole range of conditions studied
Psychoacoustic cues to emotion in speech prosody and music.

Science.gov (United States)

Coutinho, Eduardo; Dibben, Nicola

2013-01-01

There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
Detecting Parkinson's disease from sustained phonation and speech signals.

Directory of Open Access Journals (Sweden)

Evaldas Vaiciukynas

Full Text Available This study investigates signals from sustained phonation and text-dependent speech modalities for Parkinson's disease screening. Phonation corresponds to the vowel /a/ voicing task and speech to the pronunciation of a short sentence in Lithuanian language. Signals were recorded through two channels simultaneously, namely, acoustic cardioid (AC and smart phone (SP microphones. Additional modalities were obtained by splitting speech recording into voiced and unvoiced parts. Information in each modality is summarized by 18 well-known audio feature sets. Random forest (RF is used as a machine learning algorithm, both for individual feature sets and for decision-level fusion. Detection performance is measured by the out-of-bag equal error rate (EER and the cost of log-likelihood-ratio. Essentia audio feature set was the best using the AC speech modality and YAAFE audio feature set was the best using the SP unvoiced modality, achieving EER of 20.30% and 25.57%, respectively. Fusion of all feature sets and modalities resulted in EER of 19.27% for the AC and 23.00% for the SP channel. Non-linear projection of a RF-based proximity matrix into the 2D space enriched medical decision support by visualization.
Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?

Directory of Open Access Journals (Sweden)

Héctor Delgado

2015-12-01

Full Text Available This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.
Automated analysis of free speech predicts psychosis onset in high-risk youths

Science.gov (United States)

Bedi, Gillinder; Carrillo, Facundo; Cecchi, Guillermo A; Slezak, Diego Fernández; Sigman, Mariano; Mota, Natália B; Ribeiro, Sidarta; Javitt, Daniel C; Copelli, Mauro; Corcoran, Cheryl M

2015-01-01

Background/Objectives: Psychiatry lacks the objective clinical tests routinely used in other specializations. Novel computerized methods to characterize complex behaviors such as speech could be used to identify and predict psychiatric illness in individuals. AIMS: In this proof-of-principle study, our aim was to test automated speech analyses combined with Machine Learning to predict later psychosis onset in youths at clinical high-risk (CHR) for psychosis. Methods: Thirty-four CHR youths (11 females) had baseline interviews and were assessed quarterly for up to 2.5 years; five transitioned to psychosis. Using automated analysis, transcripts of interviews were evaluated for semantic and syntactic features predicting later psychosis onset. Speech features were fed into a convex hull classification algorithm with leave-one-subject-out cross-validation to assess their predictive value for psychosis outcome. The canonical correlation between the speech features and prodromal symptom ratings was computed. Results: Derived speech features included a Latent Semantic Analysis measure of semantic coherence and two syntactic markers of speech complexity: maximum phrase length and use of determiners (e.g., which). These speech features predicted later psychosis development with 100% accuracy, outperforming classification from clinical interviews. Speech features were significantly correlated with prodromal symptoms. Conclusions: Findings support the utility of automated speech analysis to measure subtle, clinically relevant mental state changes in emergent psychosis. Recent developments in computer science, including natural language processing, could provide the foundation for future development of objective clinical tests for psychiatry. PMID:27336038
Neural Spike-Train Analyses of the Speech-Based Envelope Power Spectrum Model

Science.gov (United States)

Rallapalli, Varsha H.

2016-01-01

Diagnosing and treating hearing impairment is challenging because people with similar degrees of sensorineural hearing loss (SNHL) often have different speech-recognition abilities. The speech-based envelope power spectrum model (sEPSM) has demonstrated that the signal-to-noise ratio (SNRENV) from a modulation filter bank provides a robust speech-intelligibility measure across a wider range of degraded conditions than many long-standing models. In the sEPSM, noise (N) is assumed to: (a) reduce S + N envelope power by filling in dips within clean speech (S) and (b) introduce an envelope noise floor from intrinsic fluctuations in the noise itself. While the promise of SNRENV has been demonstrated for normal-hearing listeners, it has not been thoroughly extended to hearing-impaired listeners because of limited physiological knowledge of how SNHL affects speech-in-noise envelope coding relative to noise alone. Here, envelope coding to speech-in-noise stimuli was quantified from auditory-nerve model spike trains using shuffled correlograms, which were analyzed in the modulation-frequency domain to compute modulation-band estimates of neural SNRENV. Preliminary spike-train analyses show strong similarities to the sEPSM, demonstrating feasibility of neural SNRENV computations. Results suggest that individual differences can occur based on differential degrees of outer- and inner-hair-cell dysfunction in listeners currently diagnosed into the single audiological SNHL category. The predicted acoustic-SNR dependence in individual differences suggests that the SNR-dependent rate of susceptibility could be an important metric in diagnosing individual differences. Future measurements of the neural SNRENV in animal studies with various forms of SNHL will provide valuable insight for understanding individual differences in speech-in-noise intelligibility.
Natural speech algorithm applied to baseline interview data can predict which patients will respond to psilocybin for treatment-resistant depression.

Science.gov (United States)

Carrillo, Facundo; Sigman, Mariano; Fernández Slezak, Diego; Ashton, Philip; Fitzgerald, Lily; Stroud, Jack; Nutt, David J; Carhart-Harris, Robin L

2018-04-01

Natural speech analytics has seen some improvements over recent years, and this has opened a window for objective and quantitative diagnosis in psychiatry. Here, we used a machine learning algorithm applied to natural speech to ask whether language properties measured before psilocybin for treatment-resistant can predict for which patients it will be effective and for which it will not. A baseline autobiographical memory interview was conducted and transcribed. Patients with treatment-resistant depression received 2 doses of psilocybin, 10 mg and 25 mg, 7 days apart. Psychological support was provided before, during and after all dosing sessions. Quantitative speech measures were applied to the interview data from 17 patients and 18 untreated age-matched healthy control subjects. A machine learning algorithm was used to classify between controls and patients and predict treatment response. Speech analytics and machine learning successfully differentiated depressed patients from healthy controls and identified treatment responders from non-responders with a significant level of 85% of accuracy (75% precision). Automatic natural language analysis was used to predict effective response to treatment with psilocybin, suggesting that these tools offer a highly cost-effective facility for screening individuals for treatment suitability and sensitivity. The sample size was small and replication is required to strengthen inferences on these results. Copyright © 2018 Elsevier B.V. All rights reserved.
Difficulty understanding speech in noise by the hearing impaired: underlying causes and technological solutions.

Science.gov (United States)

Healy, Eric W; Yoho, Sarah E

2016-08-01

A primary complaint of hearing-impaired individuals involves poor speech understanding when background noise is present. Hearing aids and cochlear implants often allow good speech understanding in quiet backgrounds. But hearing-impaired individuals are highly noise intolerant, and existing devices are not very effective at combating background noise. As a result, speech understanding in noise is often quite poor. In accord with the significance of the problem, considerable effort has been expended toward understanding and remedying this issue. Fortunately, our understanding of the underlying issues is reasonably good. In sharp contrast, effective solutions have remained elusive. One solution that seems promising involves a single-microphone machine-learning algorithm to extract speech from background noise. Data from our group indicate that the algorithm is capable of producing vast increases in speech understanding by hearing-impaired individuals. This paper will first provide an overview of the speech-in-noise problem and outline why hearing-impaired individuals are so noise intolerant. An overview of our approach to solving this problem will follow.
An object-oriented extension for debugging the virtual machine

Energy Technology Data Exchange (ETDEWEB)

Pizzi, Jr, Robert G. [Univ. of California, Davis, CA (United States)

1994-12-01

A computer is nothing more then a virtual machine programmed by source code to perform a task. The program`s source code expresses abstract constructs which are compiled into some lower level target language. When a virtual machine breaks, it can be very difficult to debug because typical debuggers provide only low-level target implementation information to the software engineer. We believe that the debugging task can be simplified by introducing aspects of the abstract design and data into the source code. We introduce OODIE, an object-oriented extension to programming languages that allows programmers to specify a virtual environment by describing the meaning of the design and data of a virtual machine. This specification is translated into symbolic information such that an augmented debugger can present engineers with a programmable debugging environment specifically tailored for the virtual machine that is to be debugged.
FUSION DECISION FOR A BIMODAL BIOMETRIC VERIFICATION SYSTEM USING SUPPORT VECTOR MACHINE AND ITS VARIATIONS

Directory of Open Access Journals (Sweden)

A. Teoh

2017-12-01

Full Text Available This paw presents fusion detection technique comparisons based on support vector machine and its variations for a bimodal biometric verification system that makes use of face images and speech utterances. The system is essentially constructed by a face expert, a speech expert and a fusion decision module. Each individual expert has been optimized to operate in automatic mode and designed for security access application. Fusion decision schemes considered are linear, weighted Support Vector Machine (SVM and linear SVM with quadratic transformation. The conditions tested include the balanced and unbalanced conditions between the two experts in order to obtain the optimum fusion module from these techniques best suited to the target application.
Analysis of Parent, Teacher, and Consultant Speech Exchanges and Educational Outcomes of Students With Autism During COMPASS Consultation.

Science.gov (United States)

Ruble, Lisa; Birdwhistell, Jessie; Toland, Michael D; McGrew, John H

2011-01-01

The significant increase in the numbers of students with autism combined with the need for better trained teachers (National Research Council, 2001) call for research on the effectiveness of alternative methods, such as consultation, that have the potential to improve service delivery. Data from 2 randomized controlled single-blind trials indicate that an autism-specific consultation planning framework known as the collaborative model for promoting competence and success (COMPASS) is effective in increasing child Individual Education Programs (IEP) outcomes (Ruble, Dal-rymple, & McGrew, 2010; Ruble, McGrew, & Toland, 2011). In this study, we describe the verbal interactions, defined as speech acts and speech act exchanges that take place during COMPASS consultation, and examine the associations between speech exchanges and child outcomes. We applied the Psychosocial Processes Coding Scheme (Leaper, 1991) to code speech acts. Speech act exchanges were overwhelmingly affiliative, failed to show statistically significant relationships with child IEP outcomes and teacher adherence, but did correlate positively with IEP quality.
Integration of literacy into speech-language therapy: a descriptive analysis of treatment practices.

Science.gov (United States)

Tambyraja, Sherine R; Schmitt, Mary Beth; Justice, Laura M; Logan, Jessica A R; Schwarz, Sadie

2014-01-01

The purpose of the present study was: (a) to examine the extent to which speech-language therapy provided to children with language disorders in the schools targets code-based literacy skills (e.g., alphabet knowledge and phonological awareness) during business-as-usual treatment sessions, and (b) to determine whether literacy-focused therapy time was associated with factors specific to children and/or speech-language pathologists (SLPs). Participants were 151 kindergarten and first-grade children and 40 SLPs. Video-recorded therapy sessions were coded to determine the amount of time that addressed literacy. Assessments of children's literacy skills were administered as well as questionnaires regarding characteristics of SLPs (e.g., service delivery, professional development). Results showed that time spent addressing code-related literacy across therapy sessions was variable. Significant predictors included SLP years of experience, therapy location, and therapy session duration, such that children receiving services from SLPs with more years of experience, and/or who utilized the classroom for therapy, received more literacy-focused time. Additionally, children in longer therapy sessions received more therapy time on literacy skills. There is considerable variability in the extent to which children received literacy-focused time in therapy; however, SLP-level factors predict time spent in literacy more than child-level factors. Further research is needed to understand the nature of literacy-focused therapy in the public schools. Readers will be able to: (a) define code-based literacy skills, (b) discuss the role that speech-language pathologists have in fostering children's literacy development, and (c) identify key factors that may currently influence the inclusion of literacy targets in school-based speech-language therapy. Copyright © 2014 Elsevier Inc. All rights reserved.
Python for probability, statistics, and machine learning

CERN Document Server

Unpingco, José

2016-01-01

This book covers the key ideas that link probability, statistics, and machine learning illustrated using Python modules in these areas. The entire text, including all the figures and numerical results, is reproducible using the Python codes and their associated Jupyter/IPython notebooks, which are provided as supplementary downloads. The author develops key intuitions in machine learning by working meaningful examples using multiple analytical methods and Python codes, thereby connecting theoretical concepts to concrete implementations. Modern Python modules like Pandas, Sympy, and Scikit-learn are applied to simulate and visualize important machine learning concepts like the bias/variance trade-off, cross-validation, and regularization. Many abstract mathematical ideas, such as convergence in probability theory, are developed and illustrated with numerical examples. This book is suitable for anyone with an undergraduate-level exposure to probability, statistics, or machine learning and with rudimentary knowl...
Deriving Word Order in Code-Switching: Feature Inheritance and Light Verbs

Science.gov (United States)

Shim, Ji Young

2013-01-01

This dissertation investigates code-switching (CS), the concurrent use of more than one language in conversation, commonly observed in bilingual speech. Assuming that code-switching is subject to universal principles, just like monolingual grammar, the dissertation provides a principled account of code-switching, with particular emphasis on OV~VO…
Stable 1-Norm Error Minimization Based Linear Predictors for Speech Modeling

DEFF Research Database (Denmark)

Giacobello, Daniele; Christensen, Mads Græsbøll; Jensen, Tobias Lindstrøm

2014-01-01

In linear prediction of speech, the 1-norm error minimization criterion has been shown to provide a valid alternative to the 2-norm minimization criterion. However, unlike 2-norm minimization, 1-norm minimization does not guarantee the stability of the corresponding all-pole filter and can generate...... saturations when this is used to synthesize speech. In this paper, we introduce two new methods to obtain intrinsically stable predictors with the 1-norm minimization. The first method is based on constraining the roots of the predictor to lie within the unit circle by reducing the numerical range...... based linear prediction for modeling and coding of speech....
Machine-to-machine communications architectures, technology, standards, and applications

CERN Document Server

Misic, Vojislav B

2014-01-01

With the number of machine-to-machine (M2M)-enabled devices projected to reach 20 to 50 billion by 2020, there is a critical need to understand the demands imposed by such systems. Machine-to-Machine Communications: Architectures, Technology, Standards, and Applications offers rigorous treatment of the many facets of M2M communication, including its integration with current technology.Presenting the work of a different group of international experts in each chapter, the book begins by supplying an overview of M2M technology. It considers proposed standards, cutting-edge applications, architectures, and traffic modeling and includes case studies that highlight the differences between traditional and M2M communications technology.Details a practical scheme for the forward error correction code designInvestigates the effectiveness of the IEEE 802.15.4 low data rate wireless personal area network standard for use in M2M communicationsIdentifies algorithms that will ensure functionality, performance, reliability, ...
Influence of musical training on understanding voiced and whispered speech in noise.

Science.gov (United States)

Ruggles, Dorea R; Freyman, Richard L; Oxenham, Andrew J

2014-01-01

This study tested the hypothesis that the previously reported advantage of musicians over non-musicians in understanding speech in noise arises from more efficient or robust coding of periodic voiced speech, particularly in fluctuating backgrounds. Speech intelligibility was measured in listeners with extensive musical training, and in those with very little musical training or experience, using normal (voiced) or whispered (unvoiced) grammatically correct nonsense sentences in noise that was spectrally shaped to match the long-term spectrum of the speech, and was either continuous or gated with a 16-Hz square wave. Performance was also measured in clinical speech-in-noise tests and in pitch discrimination. Musicians exhibited enhanced pitch discrimination, as expected. However, no systematic or statistically significant advantage for musicians over non-musicians was found in understanding either voiced or whispered sentences in either continuous or gated noise. Musicians also showed no statistically significant advantage in the clinical speech-in-noise tests. Overall, the results provide no evidence for a significant difference between young adult musicians and non-musicians in their ability to understand speech in noise.
Perception of words and pitch patterns in song and speech

Directory of Open Access Journals (Sweden)

Julia eMerrill

2012-03-01

Full Text Available This fMRI study examines shared and distinct cortical areas involved in the auditory perception of song and speech at the level of their underlying constituents: words, pitch and rhythm. Univariate and multivariate analyses were performed on the brain activity patterns of six conditions, arranged in a subtractive hierarchy: sung sentences including words, pitch and rhythm; hummed speech prosody and song melody containing only pitch patterns and rhythm; as well as the pure musical or speech rhythm.Systematic contrasts between these balanced conditions following their hierarchical organization showed a great overlap between song and speech at all levels in the bilateral temporal lobe, but suggested a differential role of the inferior frontal gyrus (IFG and intraparietal sulcus (IPS in processing song and speech. The left IFG was involved in word- and pitch-related processing in speech, the right IFG in processing pitch in song.Furthermore, the IPS showed sensitivity to discrete pitch relations in song as opposed to the gliding pitch in speech. Finally, the superior temporal gyrus and premotor cortex coded for general differences between words and pitch patterns, irrespective of whether they were sung or spoken. Thus, song and speech share many features which are reflected in a fundamental similarity of brain areas involved in their perception. However, fine-grained acoustic differences on word and pitch level are reflected in the activity of IFG and IPS.
A Search Complexity Improvement of Vector Quantization to Immittance Spectral Frequency Coefficients in AMR-WB Speech Codec

Directory of Open Access Journals (Sweden)

Bing-Jhih Yao

2016-09-01

Full Text Available An adaptive multi-rate wideband (AMR-WB code is a speech codec developed on the basis of an algebraic code-excited linear-prediction (ACELP coding technique, and has a double advantage of low bit rates and high speech quality. This coding technique is widely used in modern mobile communication systems for a high speech quality in handheld devices. However, a major disadvantage is that a vector quantization (VQ of immittance spectral frequency (ISF coefficients occupies a significant computational load in the AMR-WB encoder. Hence, this paper presents a triangular inequality elimination (TIE algorithm combined with a dynamic mechanism and an intersection mechanism, abbreviated as the DI-TIE algorithm, to remarkably improve the complexity of ISF coefficient quantization in the AMR-WB speech codec. Both mechanisms are designed in a way that recursively enhances the performance of the TIE algorithm. At the end of this work, this proposal is experimentally validated as a superior search algorithm relative to a conventional TIE, a multiple TIE (MTIE, and an equal-average equal-variance equal-norm nearest neighbor search (EEENNS approach. With a full search algorithm as a benchmark for search load comparison, this work provides a search load reduction above 77%, a figure far beyond 36% in the TIE, 49% in the MTIE, and 68% in the EEENNS approach.

Inferring imagined speech using EEG signals: a new approach using Riemannian manifold features

Science.gov (United States)

Nguyen, Chuong H.; Karavas, George K.; Artemiadis, Panagiotis

2018-02-01

Objective. In this paper, we investigate the suitability of imagined speech for brain-computer interface (BCI) applications. Approach. A novel method based on covariance matrix descriptors, which lie in Riemannian manifold, and the relevance vector machines classifier is proposed. The method is applied on electroencephalographic (EEG) signals and tested in multiple subjects. Main results. The method is shown to outperform other approaches in the field with respect to accuracy and robustness. The algorithm is validated on various categories of speech, such as imagined pronunciation of vowels, short words and long words. The classification accuracy of our methodology is in all cases significantly above chance level, reaching a maximum of 70% for cases where we classify three words and 95% for cases of two words. Significance. The results reveal certain aspects that may affect the success of speech imagery classification from EEG signals, such as sound, meaning and word complexity. This can potentially extend the capability of utilizing speech imagery in future BCI applications. The dataset of speech imagery collected from total 15 subjects is also published.
Self-Repair and Language Selection in Bilingual Speech Processing

Directory of Open Access Journals (Sweden)

Inga Hennecke

2013-07-01

Full Text Available In psycholinguistic research the exact level of language selection in bilingual lexical access is still controversial and current models of bilingual speech production offer conflicting statements about the mechanisms and location of language selection. This paper aims to provide a corpus analysis of self-repair mechanisms in code-switching contexts of highly fluent bilingual speakers in order to gain further insights into bilingual speech production. The present paper follows the assumptions of the Selection by Proficiency model, which claims that language proficiency and lexical robustness determine the mechanism and level of language selection. In accordance with this hypothesis, highly fluent bilinguals select languages at a prelexical level, which should influence the occurrence of self-repairs in bilingual speech. A corpus of natural speech data of highly fluent and balanced bilingual French-English speakers of the Canadian French variety Franco-Manitoban serves as the basis for a detailed analysis of different self-repair mechanisms in code-switching environments. Although the speech data contain a large amount of code-switching, results reveal that only a few speech errors and self-repairs occur in direct code-switching environments. A detailed analysis of the respective starting point of code-switching and the different repair mechanisms supports the hypothesis that highly proficient bilinguals do not select languages at the lexical level.Le niveau exact de la sélection des langues lors de l’accès lexical chez le bilingue reste une question controversée dans la recherche psycholinguistique. Les modèles actuels de la production verbale bilingue proposent des arguments contradictoires concernant le mécanisme et le lieu de la sélection des langues. La présente recherche vise à fournir une analyse de corpus mettant l’accent sur les mécanismes d’autoréparation dans le contexte d’alternance codique dans la production verbale
Fidelity of Automatic Speech Processing for Adult and Child Talker Classifications.

Directory of Open Access Journals (Sweden)

Mark VanDam

Full Text Available Automatic speech processing (ASP has recently been applied to very large datasets of naturalistically collected, daylong recordings of child speech via an audio recorder worn by young children. The system developed by the LENA Research Foundation analyzes children's speech for research and clinical purposes, with special focus on of identifying and tagging family speech dynamics and the at-home acoustic environment from the auditory perspective of the child. A primary issue for researchers, clinicians, and families using the Language ENvironment Analysis (LENA system is to what degree the segment labels are valid. This classification study evaluates the performance of the computer ASP output against 23 trained human judges who made about 53,000 judgements of classification of segments tagged by the LENA ASP. Results indicate performance consistent with modern ASP such as those using HMM methods, with acoustic characteristics of fundamental frequency and segment duration most important for both human and machine classifications. Results are likely to be important for interpreting and improving ASP output.
Bandwidth extension of speech using perceptual criteria

CERN Document Server

Berisha, Visar; Liss, Julie

2013-01-01

Bandwidth extension of speech is used in the International Telecommunication Union G.729.1 standard in which the narrowband bitstream is combined with quantized high-band parameters. Although this system produces high-quality wideband speech, the additional bits used to represent the high band can be further reduced. In addition to the algorithm used in the G.729.1 standard, bandwidth extension methods based on spectrum prediction have also been proposed. Although these algorithms do not require additional bits, they perform poorly when the correlation between the low and the high band is weak. In this book, two wideband speech coding algorithms that rely on bandwidth extension are developed. The algorithms operate as wrappers around existing narrowband compression schemes. More specifically, in these algorithms, the low band is encoded using an existing toll-quality narrowband system, whereas the high band is generated using the proposed extension techniques. The first method relies only on transmitted high-...
Development an Automatic Speech to Facial Animation Conversion for Improve Deaf Lives

Directory of Open Access Journals (Sweden)

S. Hamidreza Kasaei

2011-05-01

Full Text Available In this paper, we propose design and initial implementation of a robust system which can automatically translates voice into text and text to sign language animations. Sign Language
Translation Systems could significantly improve deaf lives especially in communications, exchange of information and employment of machine for translation conversations from one language to another has. Therefore, considering these points, it seems necessary to study the speech recognition. Usually, the voice recognition algorithms address three major challenges. The first is extracting feature form speech and the second is when limited sound gallery are available for recognition, and the final challenge is to improve speaker dependent to speaker independent voice recognition. Extracting feature form speech is an important stage in our method. Different procedures are available for extracting feature form speech. One of the commonest of which used in speech
recognition systems is Mel-Frequency Cepstral Coefficients (MFCCs. The algorithm starts with preprocessing and signal conditioning. Next extracting feature form speech using Cepstral coefficients will be done. Then the result of this process sends to segmentation part. Finally recognition part recognizes the words and then converting word recognized to facial animation. The project is still in progress and some new interesting methods are described in the current report.
Inner Speech's Relationship With Overt Speech in Poststroke Aphasia.

Science.gov (United States)

Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A

2017-09-18

Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Al, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p speech and language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.
Real time implementation of a linear predictive coding algorithm on digital signal processor DSP32C

International Nuclear Information System (INIS)

Sheikh, N.M.; Usman, S.R.; Fatima, S.

2002-01-01

Pulse Code Modulation (PCM) has been widely used in speech coding. However, due to its high bit rate. PCM has severe limitations in application where high spectral efficiency is desired, for example, in mobile communication, CD quality broadcasting system etc. These limitation have motivated research in bit rate reduction techniques. Linear predictive coding (LPC) is one of the most powerful complex techniques for bit rate reduction. With the introduction of powerful digital signal processors (DSP) it is possible to implement the complex LPC algorithm in real time. In this paper we present a real time implementation of the LPC algorithm on AT and T's DSP32C at a sampling frequency of 8192 HZ. Application of the LPC algorithm on two speech signals is discussed. Using this implementation , a bit rate reduction of 1:3 is achieved for better than tool quality speech, while a reduction of 1.16 is possible for speech quality required in military applications. (author)
Event-related potential evidence of form and meaning coding during online speech recognition.

Science.gov (United States)

Friedrich, Claudia K; Kotz, Sonja A

2007-04-01

It is still a matter of debate whether initial analysis of speech is independent of contextual influences or whether meaning can modulate word activation directly. Utilizing event-related brain potentials (ERPs), we tested the neural correlates of speech recognition by presenting sentences that ended with incomplete words, such as To light up the dark she needed her can-. Immediately following the incomplete words, subjects saw visual words that (i) matched form and meaning, such as candle; (ii) matched meaning but not form, such as lantern; (iii) matched form but not meaning, such as candy; or (iv) mismatched form and meaning, such as number. We report ERP evidence for two distinct cohorts of lexical tokens: (a) a left-lateralized effect, the P250, differentiates form-matching words (i, iii) and form-mismatching words (ii, iv); (b) a right-lateralized effect, the P220, differentiates words that match in form and/or meaning (i, ii, iii) from mismatching words (iv). Lastly, fully matching words (i) reduce the amplitude of the N400. These results accommodate bottom-up and top-down accounts of human speech recognition. They suggest that neural representations of form and meaning are activated independently early on and are integrated at a later stage during sentence comprehension.
Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies.

Science.gov (United States)

Mitra, Vikramjit; Nam, Hosung; Espy-Wilson, Carol Y; Saltzman, Elliot; Goldstein, Louis

2010-09-13

Many different studies have claimed that articulatory information can be used to improve the performance of automatic speech recognition systems. Unfortunately, such articulatory information is not readily available in typical speaker-listener situations. Consequently, such information has to be estimated from the acoustic signal in a process which is usually termed "speech-inversion." This study aims to propose and compare various machine learning strategies for speech inversion: Trajectory mixture density networks (TMDNs), feedforward artificial neural networks (FF-ANN), support vector regression (SVR), autoregressive artificial neural network (AR-ANN), and distal supervised learning (DSL). Further, using a database generated by the Haskins Laboratories speech production model, we test the claim that information regarding constrictions produced by the distinct organs of the vocal tract (vocal tract variables) is superior to flesh-point information (articulatory pellet trajectories) for the inversion process.
Speech Enhancement by Classification of Noisy Signals Decomposed Using NMF and Wiener Filtering

DEFF Research Database (Denmark)

Fakhry, Mahmoud; Poorjam, Amir Hossein; Christensen, Mads Græsbøll

2018-01-01

are identified in the cepstral domain using the trained classifier. We apply unsupervised NMF followed by Wiener filtering for the decomposition, and use a support vector machine trained on the mel-frequency cepstral coefficients of the parts of training speech and noise signals for the classification...
Musician advantage for speech-on-speech perception

NARCIS (Netherlands)

Başkent, Deniz; Gaudrain, Etienne

Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level
Towards a universal code formatter through machine learning

NARCIS (Netherlands)

Parr, T. (Terence); J.J. Vinju (Jurgen)

2016-01-01

textabstractThere are many declarative frameworks that allow us to implement code formatters relatively easily for any specific language, but constructing them is cumbersome. The first problem is that "everybody" wants to format their code differently, leading to either many formatter variants or a
Sparse coding of the modulation spectrum for noise-robust automatic speech recognition

NARCIS (Netherlands)

Ahmadi, S.; Ahadi, S.M.; Cranen, B.; Boves, L.W.J.

2014-01-01

The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equivalent of a sequence of short-time power spectra, mainly to simplify the computation of the
Machine Learning an algorithmic perspective

CERN Document Server

Marsland, Stephen

2009-01-01

Traditional books on machine learning can be divided into two groups - those aimed at advanced undergraduates or early postgraduates with reasonable mathematical knowledge and those that are primers on how to code algorithms. The field is ready for a text that not only demonstrates how to use the algorithms that make up machine learning methods, but also provides the background needed to understand how and why these algorithms work. Machine Learning: An Algorithmic Perspective is that text.Theory Backed up by Practical ExamplesThe book covers neural networks, graphical models, reinforcement le
Representing high-dimensional data to intelligent prostheses and other wearable assistive robots: A first comparison of tile coding and selective Kanerva coding.

Science.gov (United States)

Travnik, Jaden B; Pilarski, Patrick M

2017-07-01

Prosthetic devices have advanced in their capabilities and in the number and type of sensors included in their design. As the space of sensorimotor data available to a conventional or machine learning prosthetic control system increases in dimensionality and complexity, it becomes increasingly important that this data be represented in a useful and computationally efficient way. Well structured sensory data allows prosthetic control systems to make informed, appropriate control decisions. In this study, we explore the impact that increased sensorimotor information has on current machine learning prosthetic control approaches. Specifically, we examine the effect that high-dimensional sensory data has on the computation time and prediction performance of a true-online temporal-difference learning prediction method as embedded within a resource-limited upper-limb prosthesis control system. We present results comparing tile coding, the dominant linear representation for real-time prosthetic machine learning, with a newly proposed modification to Kanerva coding that we call selective Kanerva coding. In addition to showing promising results for selective Kanerva coding, our results confirm potential limitations to tile coding as the number of sensory input dimensions increases. To our knowledge, this study is the first to explicitly examine representations for realtime machine learning prosthetic devices in general terms. This work therefore provides an important step towards forming an efficient prosthesis-eye view of the world, wherein prompt and accurate representations of high-dimensional data may be provided to machine learning control systems within artificial limbs and other assistive rehabilitation technologies.
Code-Expanded Random Access for Machine-Type Communications

DEFF Research Database (Denmark)

Kiilerich Pratas, Nuno; Thomsen, Henning; Stefanovic, Cedomir

2012-01-01

Abstract—The random access methods used for support of machine-type communications (MTC) in current cellular standards are derivatives of traditional framed slotted ALOHA and therefore do not support high user loads efficiently. Motivated by the random access method employed in LTE, we propose...
Modeling speech imitation and ecological learning of auditory-motor maps

Directory of Open Access Journals (Sweden)

Claudia eCanevari

2013-06-01

Full Text Available Classical models of speech consider an antero-posterior distinction between perceptive and productive functions. However, the selective alteration of neural activity in speech motor centers, via transcranial magnetic stimulation, was shown to affect speech discrimination. On the automatic speech recognition (ASR side, the recognition systems have classically relied solely on acoustic data, achieving rather good performance in optimal listening conditions. The main limitations of current ASR are mainly evident in the realistic use of such systems. These limitations can be partly reduced by using normalization strategies that minimize inter-speaker variability by either explicitly removing speakers’ peculiarities or adapting different speakers to a reference model. In this paper we aim at modeling a motor-based imitation learning mechanism in ASR. We tested the utility of a speaker normalization strategy that uses motor representations of speech and compare it with strategies that ignore the motor domain. Specifically, we first trained a regressor through state-of-the-art machine learning techniques to build an auditory-motor mapping, in a sense mimicking a human learner that tries to reproduce utterances produced by other speakers. This auditory-motor mapping maps the speech acoustics of a speaker into the motor plans of a reference speaker. Since, during recognition, only speech acoustics are available, the mapping is necessary to recover motor information. Subsequently, in a phone classification task, we tested the system on either one of the speakers that was used during training or a new one. Results show that in both cases the motor-based speaker normalization strategy almost always outperforms all other strategies where only acoustics is taken into account.
Speech endpoint detection with non-language speech sounds for generic speech processing applications

Science.gov (United States)

McClain, Matthew; Romanowski, Brian

2009-05-01

Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known apriori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detection certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.
Quantum neural network based machine translator for Hindi to English.

Science.gov (United States)

Narayan, Ravi; Singh, V P; Chakraverty, S

2014-01-01

This paper presents the machine learning based machine translation system for Hindi to English, which learns the semantically correct corpus. The quantum neural based pattern recognizer is used to recognize and learn the pattern of corpus, using the information of part of speech of individual word in the corpus, like a human. The system performs the machine translation using its knowledge gained during the learning by inputting the pair of sentences of Devnagri-Hindi and English. To analyze the effectiveness of the proposed approach, 2600 sentences have been evaluated during simulation and evaluation. The accuracy achieved on BLEU score is 0.7502, on NIST score is 6.5773, on ROUGE-L score is 0.9233, and on METEOR score is 0.5456, which is significantly higher in comparison with Google Translation and Bing Translation for Hindi to English Machine Translation.
Digitised evaluation of speech intelligibility using vowels in maxillectomy patients.

Science.gov (United States)

Sumita, Y I; Hattori, M; Murase, M; Elbashti, M E; Taniguchi, H

2018-03-01

Among the functional disabilities that patients face following maxillectomy, speech impairment is a major factor influencing quality of life. Proper rehabilitation of speech, which may include prosthodontic and surgical treatments and speech therapy, requires accurate evaluation of speech intelligibility (SI). A simple, less time-consuming yet accurate evaluation is desirable both for maxillectomy patients and the various clinicians providing maxillofacial treatment. This study sought to determine the utility of digital acoustic analysis of vowels for the prediction of SI in maxillectomy patients, based on a comprehensive understanding of speech production in the vocal tract of maxillectomy patients and its perception. Speech samples were collected from 33 male maxillectomy patients (mean age 57.4 years) in two conditions, without and with a maxillofacial prosthesis, and formant data for the vowels /a/,/e/,/i/,/o/, and /u/ were calculated based on linear predictive coding. The frequency range of formant 2 (F2) was determined by differences between the minimum and maximum frequency. An SI test was also conducted to reveal the relationship between SI score and F2 range. Statistical analyses were applied. F2 range and SI score were significantly different between the two conditions without and with a prosthesis (both P maxillectomy. © 2017 John Wiley & Sons Ltd.

SwingStates: adding state machines to the swing toolkit

OpenAIRE

Appert , Caroline; Beaudouin-Lafon , Michel

2006-01-01

International audience; This article describes SwingStates, a library that adds state machines to the Java Swing user interface toolkit. Unlike traditional approaches, which use callbacks or listeners to define interaction, state machines provide a powerful control structure and localize all of the interaction code in one place. SwingStates takes advantage of Java's inner classes, providing programmers with a natural syntax and making it easier to follow and debug the resulting code. SwingSta...
FCG: a code generator for lazy functional languages

NARCIS (Netherlands)

Kastens, U.; Langendoen, K.G.; Hartel, Pieter H.; Pfahler, P.

1992-01-01

The FCGcode generator produces portable code that supports efficient two-space copying garbage collection. The code generator transforms the output of the FAST compiler front end into an abstract machine code. This code explicitly uses a call stack, which is accessible to the garbage collector. In
Speech parts as Poisson processes.

Science.gov (United States)

Badalamenti, A F

2001-09-01

This paper presents evidence that six of the seven parts of speech occur in written text as Poisson processes, simple or recurring. The six major parts are nouns, verbs, adjectives, adverbs, prepositions, and conjunctions, with the interjection occurring too infrequently to support a model. The data consist of more than the first 5000 words of works by four major authors coded to label the parts of speech, as well as periods (sentence terminators). Sentence length is measured via the period and found to be normally distributed with no stochastic model identified for its occurrence. The models for all six speech parts but the noun significantly distinguish some pairs of authors and likewise for the joint use of all words types. Any one author is significantly distinguished from any other by at least one word type and sentence length very significantly distinguishes each from all others. The variety of word type use, measured by Shannon entropy, builds to about 90% of its maximum possible value. The rate constants for nouns are close to the fractions of maximum entropy achieved. This finding together with the stochastic models and the relations among them suggest that the noun may be a primitive organizer of written text.
Music and Speech Perception in Children Using Sung Speech.

Science.gov (United States)

Nie, Yingjiu; Galvin, John J; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie

2018-01-01

This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet was significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.
Oscillatory Brain Responses Reflect Anticipation during Comprehension of Speech Acts in Spoken Dialog

Directory of Open Access Journals (Sweden)

Rosa S. Gisladottir

2018-02-01

Full Text Available Everyday conversation requires listeners to quickly recognize verbal actions, so-called speech acts, from the underspecified linguistic code and prepare a relevant response within the tight time constraints of turn-taking. The goal of this study was to determine the time-course of speech act recognition by investigating oscillatory EEG activity during comprehension of spoken dialog. Participants listened to short, spoken dialogs with target utterances that delivered three distinct speech acts (Answers, Declinations, Pre-offers. The targets were identical across conditions at lexico-syntactic and phonetic/prosodic levels but differed in the pragmatic interpretation of the speech act performed. Speech act comprehension was associated with reduced power in the alpha/beta bands just prior to Declination speech acts, relative to Answers and Pre-offers. In addition, we observed reduced power in the theta band during the beginning of Declinations, relative to Answers. Based on the role of alpha and beta desynchronization in anticipatory processes, the results are taken to indicate that anticipation plays a role in speech act recognition. Anticipation of speech acts could be critical for efficient turn-taking, allowing interactants to quickly recognize speech acts and respond within the tight time frame characteristic of conversation. The results show that anticipatory processes can be triggered by the characteristics of the interaction, including the speech act type.
Simulation of linac operation using the tracking code L

International Nuclear Information System (INIS)

Drevlak, M.; Timm, M.; Weiland, T.

1996-01-01

In linear accelerators, misalignments of the machine elements can cause considerable emittance growth due to wake fields, dispersion and other effects. Hence, tight limits are imposed on machine tolerances, design parameters and methods of machine operation. In order to simulate the beam dynamics in linacs, the tracking code L has been developed. Including both single- and multi-bunch effects, the behaviour of the beam in the machine can be simulated and adjustments on parameters of the machine elements up to complete correction techniques and operation procedures can be applied. Utilization of the program is facilitated by a graphical user interface. In this paper we will give an overview over the capabilities of this code and demonstrate its efficiency at attacking the problems associated with large linear accelerators. (author)
Apraxia of Speech

Science.gov (United States)

... Health Info » Voice, Speech, and Language Apraxia of Speech On this page: What is apraxia of speech? ... about apraxia of speech? What is apraxia of speech? Apraxia of speech (AOS)—also known as acquired ...
Machine learning and medical imaging

CERN Document Server

Shen, Dinggang; Sabuncu, Mert

2016-01-01

Machine Learning and Medical Imaging presents state-of- the-art machine learning methods in medical image analysis. It first summarizes cutting-edge machine learning algorithms in medical imaging, including not only classical probabilistic modeling and learning methods, but also recent breakthroughs in deep learning, sparse representation/coding, and big data hashing. In the second part leading research groups around the world present a wide spectrum of machine learning methods with application to different medical imaging modalities, clinical domains, and organs. The biomedical imaging modalities include ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), histology, and microscopy images. The targeted organs span the lung, liver, brain, and prostate, while there is also a treatment of examining genetic associations. Machine Learning and Medical Imaging is an ideal reference for medical imaging researchers, industry scientists and engineers, advanced undergraduate and graduate students, a...
Age-related changes to spectral voice characteristics affect judgments of prosodic, segmental, and talker attributes for child and adult speech

Science.gov (United States)

Dilley, Laura C.; Wieland, Elizabeth A.; Gamache, Jessica L.; McAuley, J. Devin; Redford, Melissa A.

2013-01-01

Purpose As children mature, changes in voice spectral characteristics covary with changes in speech, language, and behavior. Spectral characteristics were manipulated to alter the perceived ages of talkers’ voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Method Speech was modified by lowering formants and fundamental frequency, for 5-year-old children’s utterances, or raising them, for adult caregivers’ utterances. Next, participants differing in awareness of the manipulation (Exp. 1a) or amount of speech-language training (Exp. 1b) made judgments of prosodic, segmental, and talker attributes. Exp. 2 investigated the effects of spectral modification on intelligibility. Finally, in Exp. 3 trained analysts used formal prosody coding to assess prosodic characteristics of spectrally-modified and unmodified speech. Results Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Conclusions Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work. PMID:23275414
Legitimization Arguments for Procedural Reforms: a semio-linguistic analysis of statement of reasons from the Civil Procedure Code of 1939 and of the draft bill of the New Civil Procedure Code of 2010.

Directory of Open Access Journals (Sweden)

Matheus Guarino Sant’Anna Lima de Almeida

2016-08-01

Full Text Available This research aims to analyze the arguments of legitimization that were used in the reform of Brazilian procedural legal codes, by comparing the texts of the statement of reasons of the Civil Procedure Code of 1939 and the draft bill of the New Civil Procedure Code. We consider these codes as milestones: the Civil Procedure Code of 1939 was the first one with a national scope; the draft bill of the New Civil Procedure Code was the first one produced during a democratic period. Our goal is to search for similarities and contrasts between the legitimization arguments used in each historical and political period, by asking if they were only arguments to bestow legitimacy to such reforms. We decided to use the methodological tools of sociolinguistic analysis of speech developed by Patrick Charaudeau in his analyses of political speech in order to elucidate how the uses of language and elements of meaning in the speech construction provide justification for the concept of procedure, in both 1939 and 2010. As a result, we conclude that the process of drafting the CPC of 1939 and the New CPC, even if they are very distant in terms of political and historical contexts, they are also very close in their rhetorical construction and their attempt to find justification and adherence. On balance, some of the differences depend on the vocabulary used when the codes were developed, their justification and the need for change.
Can bilingual two-year-olds code-switch?

Science.gov (United States)

Lanza, E

1992-10-01

Sociolinguists have investigated language mixing as code-switching in the speech of bilingual children three years old and older. Language mixing by bilingual two-year-olds, however, has generally been interpreted in the child language literature as a sign of the child's lack of language differentiation. The present study applies perspectives from sociolinguistics to investigate the language mixing of a bilingual two-year-old acquiring Norwegian and English simultaneously in Norway. Monthly recordings of the child's spontaneous speech in interactions with her parents were made from the age of 2;0 to 2;7. An investigation into the formal aspects of the child's mixing and the context of the mixing reveals that she does differentiate her language use in contextually sensitive ways, hence that she can code-switch. This investigation stresses the need to examine more carefully the roles of dominance and context in the language mixing of young bilingual children.
An Automatic Instruction-Level Parallelization of Machine Code

Directory of Open Access Journals (Sweden)

MARINKOVIC, V.

2018-02-01

Full Text Available Prevailing multicores and novel manycores have made a great challenge of modern day - parallelization of embedded software that is still written as sequential. In this paper, automatic code parallelization is considered, focusing on developing a parallelization tool at the binary level as well as on the validation of this approach. The novel instruction-level parallelization algorithm for assembly code which uses the register names after SSA to find independent blocks of code and then to schedule independent blocks using METIS to achieve good load balance is developed. The sequential consistency is verified and the validation is done by measuring the program execution time on the target architecture. Great speedup, taken as the performance measure in the validation process, and optimal load balancing are achieved for multicore RISC processors with 2 to 16 cores (e.g. MIPS, MicroBlaze, etc.. In particular, for 16 cores, the average speedup is 7.92x, while in some cases it reaches 14x. An approach to automatic parallelization provided by this paper is useful to researchers and developers in the area of parallelization as the basis for further optimizations, as the back-end of a compiler, or as the code parallelization tool for an embedded system.
Speech reconstruction using a deep partially supervised neural network.

Science.gov (United States)

McLoughlin, Ian; Li, Jingjie; Song, Yan; Sharifzadeh, Hamid R

2017-08-01

Statistical speech reconstruction for larynx-related dysphonia has achieved good performance using Gaussian mixture models and, more recently, restricted Boltzmann machine arrays; however, deep neural network (DNN)-based systems have been hampered by the limited amount of training data available from individual voice-loss patients. The authors propose a novel DNN structure that allows a partially supervised training approach on spectral features from smaller data sets, yielding very good results compared with the current state-of-the-art.
Common neural substrates support speech and non-speech vocal tract gestures.

Science.gov (United States)

Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M J; Poletto, Christopher J; Ludlow, Christy L

2009-08-01

The issue of whether speech is supported by the same neural substrates as non-speech vocal tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial activation overlap between speech and non-speech function in regions. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings posit a more general role of the previously proposed "auditory dorsal stream" in the left hemisphere--to support the production of vocal tract gestures that are not limited to speech processing.
Auditory information coding by modeled cochlear nucleus neurons.

Science.gov (United States)

Wang, Huan; Isik, Michael; Borst, Alexander; Hemmert, Werner

2011-06-01

In this paper we use information theory to quantify the information in the output spike trains of modeled cochlear nucleus globular bushy cells (GBCs). GBCs are part of the sound localization pathway. They are known for their precise temporal processing, and they code amplitude modulations with high fidelity. Here we investigated the information transmission for a natural sound, a recorded vowel. We conclude that the maximum information transmission rate for a single neuron was close to 1,050 bits/s, which corresponds to a value of approximately 5.8 bits per spike. For quasi-periodic signals like voiced speech, the transmitted information saturated as word duration increased. In general, approximately 80% of the available information from the spike trains was transmitted within about 20 ms. Transmitted information for speech signals concentrated around formant frequency regions. The efficiency of neural coding was above 60% up to the highest temporal resolution we investigated (20 μs). The increase in transmitted information to that precision indicates that these neurons are able to code information with extremely high fidelity, which is required for sound localization. On the other hand, only 20% of the information was captured when the temporal resolution was reduced to 4 ms. As the temporal resolution of most speech recognition systems is limited to less than 10 ms, this massive information loss might be one of the reasons which are responsible for the lack of noise robustness of these systems.
Quantification and Systematic Characterization of Stuttering-Like Disfluencies in Acquired Apraxia of Speech.

Science.gov (United States)

Bailey, Dallin J; Blomgren, Michael; DeLong, Catharine; Berggren, Kiera; Wambaugh, Julie L

2017-06-22

The purpose of this article is to quantify and describe stuttering-like disfluencies in speakers with acquired apraxia of speech (AOS), utilizing the Lidcombe Behavioural Data Language (LBDL). Additional purposes include measuring test-retest reliability and examining the effect of speech sample type on disfluency rates. Two types of speech samples were elicited from 20 persons with AOS and aphasia: repetition of mono- and multisyllabic words from a protocol for assessing AOS (Duffy, 2013), and connected speech tasks (Nicholas & Brookshire, 1993). Sampling was repeated at 1 and 4 weeks following initial sampling. Stuttering-like disfluencies were coded using the LBDL, which is a taxonomy that focuses on motoric aspects of stuttering. Disfluency rates ranged from 0% to 13.1% for the connected speech task and from 0% to 17% for the word repetition task. There was no significant effect of speech sampling time on disfluency rate in the connected speech task, but there was a significant effect of time for the word repetition task. There was no significant effect of speech sample type. Speakers demonstrated both major types of stuttering-like disfluencies as categorized by the LBDL (fixed postures and repeated movements). Connected speech samples yielded more reliable tallies over repeated measurements. Suggestions are made for modifying the LBDL for use in AOS in order to further add to systematic descriptions of motoric disfluencies in this disorder.
Blind estimation of the number of speech source in reverberant multisource scenarios based on binaural signals

DEFF Research Database (Denmark)

May, Tobias; van de Par, Steven

2012-01-01

In this paper we present a new approach for estimating the number of active speech sources in the presence of interfering noise sources and reverberation. First, a binaural front-end is used to detect the spatial positions of all active sound sources, resulting in a binary mask for each candidate...... on a support vector machine (SVM) classifier. A systematic analysis shows that the proposed algorithm is able to blindly determine the number and the corresponding spatial positions of speech sources in multisource scenarios and generalizes well to unknown acoustic conditions...
PRAGMATIC FOUNDATIONS OF COMMUNICATION CODE FAILURE IN PRESENT-DAY DISCOURSE

Directory of Open Access Journals (Sweden)

Pochtar Elena Ivanovna

2014-09-01

Full Text Available The article considers the issue of communicative regulations within the discourse frames as viewed through the fact of existing interconnection between speech arrangement modes and speech functional destinations; it analyzes the basic maxims of the P. Grice's Cooperation principle, initially formulated from the speaker's viewpoint, and finds out its relevance for the listeners, thus providing identity of speech behavior principles as shared by both participants in the communication process. Comparing each of the cooperative maxims with the communicative parameters of the present-day discourse the author discovers in it frequent violations of the Cooperation principles suggested by P. Grice and concludes that this system of speech relation fails in cases of discourse realizing an effective function. The article observes that the traditional communicative code is being pressed out as the basic regulator of conversation and goes through some pragmatic changes resulting in communication code failure in present day discourse, some other means of securing the perlocutionary effect in affective discourse are introduced by the author, the politeness principle and the principle of style in particular. Considering the basic mechanisms of these aestheticethical principles in application to the discourse of advertising the author finds proofs to them being functionally adequate and communicatively effective.
A qualitative analysis of hate speech reported to the Romanian National Council for Combating Discrimination (2003‑2015)

OpenAIRE

Adriana Iordache

2015-01-01

The article analyzes the specificities of Romanian hate speech over a period of twelve years through a qualitative analysis of 384 Decisions of the National Council for Combating Discrimination. The study employs a coding methodology which allows one to separate decisions according to the group that was the victim of hate speech. The article finds that stereotypes employed are similar to those encountered in the international literature. The main target of hate speech is the Roma, who are ...
Auditory analysis for speech recognition based on physiological models

Science.gov (United States)

Jeon, Woojay; Juang, Biing-Hwang

2004-05-01

To address the limitations of traditional cepstrum or LPC based front-end processing methods for automatic speech recognition, more elaborate methods based on physiological models of the human auditory system may be used to achieve more robust speech recognition in adverse environments. For this purpose, a modified version of a model of the primary auditory cortex featuring a three dimensional mapping of auditory spectra [Wang and Shamma, IEEE Trans. Speech Audio Process. 3, 382-395 (1995)] is adopted and investigated for its use as an improved front-end processing method. The study is conducted in two ways: first, by relating the model's redundant representation to traditional spectral representations and showing that the former not only encompasses information provided by the latter, but also reveals more relevant information that makes it superior in describing the identifying features of speech signals; and second, by observing the statistical features of the representation for various classes of sound to show how different identifying features manifest themselves as specific patterns on the cortical map, thereby becoming a place-coded data set on which detection theory could be applied to simulate auditory perception and cognition.

A study on nonlinear characteristics of speech sound with reference to some languages of North East region

Science.gov (United States)

Dutta, Rashmi

INTRODUCTION : Speech science is, in fact, a sub-discipline of the Nonlinear Dynamical System [2,104 ]. There are two different types of Dynamical System. A Continuous Dynamical System may be defined for the continuous time case, by the equation: x = F (x), where x is a vector of length d, defining a point in a d- dimensional space, F is some function (linear or nonlinear) operating on x, and x is the time derivative of x. This system is deterministic, in that it is possible to completely specify its evolution or flow of trajectories in the d- dimensional space, given the initial starting conditions. A Discrete Dynamical System can be defined as a map [by the process of literations]: Xn+1 = G ( Xn ), where Xn is again a d- length vector at time step n, and G is an operator function. Given an initial state, X0, it is possible to calculate the value of xn for any n > 0. Speech has evolved as a primary form of communication between humans, i.e. speech and hearing are the man's most used means of communication [104, 114]. Analysis of human speech has been a goal of Research during the last few decades [105, 108]. With the rapid development of information technology (IT), the human-machine communication, using natural speech, has received wide attention from both academic and business communities. One highly quantitative approach of characterizing the communications potential of speech is in terms of information theory ideas as introduced by Shannon [C.E. Shannon, "A Mathematical Theory of Communication," Bell System Tech journal, Vol 27, pp623- 656, October, 1968]. According to information theory, speech can be represented in terms of its message content, or information. An alternative way of characterizing speech is in terms of the signal carrying the message information, i.e., the acoustic waveform. Although information theoretic ideas have played a major role in sophisticated communications systems, it is the speech representation based on the waveform, or some
A machine-hearing system exploiting head movements for binaural sound localisation in reverberant conditions

DEFF Research Database (Denmark)

May, Tobias; Ma, Ning; Wierstorf, Hagen

2015-01-01

This paper is concerned with machine localisation of multiple active speech sources in reverberant environments using two (binaural) microphones. Such conditions typically present a problem for ‘classical’ binaural models. Inspired by the human ability to utilise head movements, the current study...
Particle swarm optimization based feature enhancement and feature selection for improved emotion recognition in speech and glottal signals.

Science.gov (United States)

Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

2015-01-01

In the recent years, many research works have been published using speech related features for speech emotion recognition, however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstralcoefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and its glottal waveforms(GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature.
Speech Production and Speech Discrimination by Hearing-Impaired Children.

Science.gov (United States)

Novelli-Olmstead, Tina; Ling, Daniel

1984-01-01

Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…
I2D: code for conversion of ISOTXS structured data to DTF and ANISN structured tables

International Nuclear Information System (INIS)

Resnik, W.M. II.

1977-06-01

The I2D code converts neutron cross-section data written in the standard interface file format called ISOTXS to a matrix structured format commonly called DTF tables. Several BCD and binary output options are available including FIDO (ANISN) format. The I2D code adheres to the guidelines established by the Committee on Computer Code Coordination for standardized code development. Since some machine dependency is inherent regardless of the degree of standardization, provisions have been made in the I2D code for easy implementation on either short-word machines (IBM) or on long-word machines (CDC). 3 figures, 5 tables
Keyboard with Universal Communication Protocol Applied to CNC Machine

Directory of Open Access Journals (Sweden)

Mejía-Ugalde Mario

2014-04-01

Full Text Available This article describes the use of a universal communication protocol for industrial keyboard based microcontroller applied to computer numerically controlled (CNC machine. The main difference among the keyboard manufacturers is that each manufacturer has its own programming of source code, producing a different communication protocol, generating an improper interpretation of the function established. The above results in commercial industrial keyboards which are expensive and incompatible in their connection with different machines. In the present work the protocol allows to connect the designed universal keyboard and the standard keyboard of the PC at the same time, it is compatible with all the computers through the communications USB, AT or PS/2, to use in CNC machines, with extension to other machines such as robots, blowing, injection molding machines and others. The advantages of this design include its easy reprogramming, decreased costs, manipulation of various machine functions and easy expansion of entry and exit signals. The results obtained of performance tests were satisfactory, because each key has the programmed and reprogrammed facility in different ways, generating codes for different functions, depending on the application where it is required to be used.
Wind Noise Reduction using Non-negative Sparse Coding

DEFF Research Database (Denmark)

Schmidt, Mikkel N.; Larsen, Jan; Hsiao, Fu-Tien

2007-01-01

We introduce a new speaker independent method for reducing wind noise in single-channel recordings of noisy speech. The method is based on non-negative sparse coding and relies on a wind noise dictionary which is estimated from an isolated noise recording. We estimate the parameters of the model ...... and discuss their sensitivity. We then compare the algorithm with the classical spectral subtraction method and the Qualcomm-ICSI-OGI noise reduction method. We optimize the sound quality in terms of signal-to-noise ratio and provide results on a noisy speech recognition task....
Stuttering Frequency, Speech Rate, Speech Naturalness, and Speech Effort During the Production of Voluntary Stuttering.

Science.gov (United States)

Davidow, Jason H; Grossman, Heather L; Edge, Robin L

2018-05-01

Voluntary stuttering techniques involve persons who stutter purposefully interjecting disfluencies into their speech. Little research has been conducted on the impact of these techniques on the speech pattern of persons who stutter. The present study examined whether changes in the frequency of voluntary stuttering accompanied changes in stuttering frequency, articulation rate, speech naturalness, and speech effort. In total, 12 persons who stutter aged 16-34 years participated. Participants read four 300-syllable passages during a control condition, and three voluntary stuttering conditions that involved attempting to produce purposeful, tension-free repetitions of initial sounds or syllables of a word for two or more repetitions (i.e., bouncing). The three voluntary stuttering conditions included bouncing on 5%, 10%, and 15% of syllables read. Friedman tests and follow-up Wilcoxon signed ranks tests were conducted for the statistical analyses. Stuttering frequency, articulation rate, and speech naturalness were significantly different between the voluntary stuttering conditions. Speech effort did not differ between the voluntary stuttering conditions. Stuttering frequency was significantly lower during the three voluntary stuttering conditions compared to the control condition, and speech effort was significantly lower during two of the three voluntary stuttering conditions compared to the control condition. Due to changes in articulation rate across the voluntary stuttering conditions, it is difficult to conclude, as has been suggested previously, that voluntary stuttering is the reason for stuttering reductions found when using voluntary stuttering techniques. Additionally, future investigations should examine different types of voluntary stuttering over an extended period of time to determine their impact on stuttering frequency, speech rate, speech naturalness, and speech effort.
Comprehension of synthetic speech and digitized natural speech by adults with aphasia.

Science.gov (United States)

Hux, Karen; Knollman-Porter, Kelly; Brown, Jessica; Wallace, Sarah E

2017-09-01

Using text-to-speech technology to provide simultaneous written and auditory content presentation may help compensate for chronic reading challenges if people with aphasia can understand synthetic speech output; however, inherent auditory comprehension challenges experienced by people with aphasia may make understanding synthetic speech difficult. This study's purpose was to compare the preferences and auditory comprehension accuracy of people with aphasia when listening to sentences generated with digitized natural speech, Alex synthetic speech (i.e., Macintosh platform), or David synthetic speech (i.e., Windows platform). The methodology required each of 20 participants with aphasia to select one of four images corresponding in meaning to each of 60 sentences comprising three stimulus sets. Results revealed significantly better accuracy given digitized natural speech than either synthetic speech option; however, individual participant performance analyses revealed three patterns: (a) comparable accuracy regardless of speech condition for 30% of participants, (b) comparable accuracy between digitized natural speech and one, but not both, synthetic speech option for 45% of participants, and (c) greater accuracy with digitized natural speech than with either synthetic speech option for remaining participants. Ranking and Likert-scale rating data revealed a preference for digitized natural speech and David synthetic speech over Alex synthetic speech. Results suggest many individuals with aphasia can comprehend synthetic speech options available on popular operating systems. Further examination of synthetic speech use to support reading comprehension through text-to-speech technology is thus warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
Common neural substrates support speech and non-speech vocal tract gestures

OpenAIRE

Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M.J.; Poletto, Christopher J.; Ludlow, Christy L.

2009-01-01

The issue of whether speech is supported by the same neural substrates as non-speech vocal-tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, were compared to the production of speech sylla...
Introductory speeches

International Nuclear Information System (INIS)

2001-01-01

This CD is multimedia presentation of programme safety upgrading of Bohunice V1 NPP. This chapter consist of introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering
Contribution to automatic speech recognition. Analysis of the direct acoustical signal. Recognition of isolated words and phoneme identification

International Nuclear Information System (INIS)

Dupeyrat, Benoit

1981-01-01

This report deals with the acoustical-phonetic step of the automatic recognition of the speech. The parameters used are the extrema of the acoustical signal (coded in amplitude and duration). This coding method, the properties of which are described, is simple and well adapted to a digital processing. The quality and the intelligibility of the coded signal after reconstruction are particularly satisfactory. An experiment for the automatic recognition of isolated words has been carried using this coding system. We have designed a filtering algorithm operating on the parameters of the coding. Thus the characteristics of the formants can be derived under certain conditions which are discussed. Using these characteristics the identification of a large part of the phonemes for a given speaker was achieved. Carrying on the studies has required the development of a particular methodology of real time processing which allowed immediate evaluation of the improvement of the programs. Such processing on temporal coding of the acoustical signal is extremely powerful and could represent, used in connection with other methods an efficient tool for the automatic processing of the speech.(author) [fr
No, there is no 150 ms lead of visual speech on auditory speech, but a range of audiovisual asynchronies varying from small audio lead to large audio lag.

Directory of Open Access Journals (Sweden)

Jean-Luc Schwartz

2014-07-01

Full Text Available An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call "preparatory gestures". However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call "comodulatory gestures" providing auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.
Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

DEFF Research Database (Denmark)

Jørgensen, Søren; Dau, Torsten

2013-01-01

The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-tonoise ratio in the envelope domain (SNRenv), which was demonstrated...... to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating...
Analysis of Salient Feature Jitter in the Cochlea for Objective Prediction of Temporally Localized Distortion in Synthesized Speech

Directory of Open Access Journals (Sweden)

Wenliang Lu

2009-01-01

Full Text Available Temporally localized distortions account for the highest variance in subjective evaluation of coded speech signals (Sen (2001 and Hall (2001. The ability to discern and decompose perceptually relevant temporally localized coding noise from other types of distortions is both of theoretical importance as well as a valuable tool for deploying and designing speech synthesis systems. The work described within uses a physiologically motivated cochlear model to provide a tractable analysis of salient feature trajectories as processed by the cochlea. Subsequent statistical analysis shows simple relationships between the jitter of these trajectories and temporal attributes of the Diagnostic Acceptability Measure (DAM.
Exploring Australian speech-language pathologists' use and perceptions ofnon-speech oral motor exercises.

Science.gov (United States)

Rumbach, Anna F; Rose, Tanya A; Cheah, Mynn

2018-01-29

To explore Australian speech-language pathologists' use of non-speech oral motor exercises, and rationales for using/not using non-speech oral motor exercises in clinical practice. A total of 124 speech-language pathologists practising in Australia, working with paediatric and/or adult clients with speech sound difficulties, completed an online survey. The majority of speech-language pathologists reported that they did not use non-speech oral motor exercises when working with paediatric or adult clients with speech sound difficulties. However, more than half of the speech-language pathologists working with adult clients who have dysarthria reported using non-speech oral motor exercises with this population. The most frequently reported rationale for using non-speech oral motor exercises in speech sound difficulty management was to improve awareness/placement of articulators. The majority of speech-language pathologists agreed there is no clear clinical or research evidence base to support non-speech oral motor exercise use with clients who have speech sound difficulties. This study provides an overview of Australian speech-language pathologists' reported use and perceptions of non-speech oral motor exercises' applicability and efficacy in treating paediatric and adult clients who have speech sound difficulties. The research findings provide speech-language pathologists with insight into how and why non-speech oral motor exercises are currently used, and adds to the knowledge base regarding Australian speech-language pathology practice of non-speech oral motor exercises in the treatment of speech sound difficulties. Implications for Rehabilitation Non-speech oral motor exercises refer to oral motor activities which do not involve speech, but involve the manipulation or stimulation of oral structures including the lips, tongue, jaw, and soft palate. Non-speech oral motor exercises are intended to improve the function (e.g., movement, strength) of oral structures. The
Hints About Some Baseful but Indispensable Elements in Speech Recognition and Reconstruction

Directory of Open Access Journals (Sweden)

Mihaela Costin

2002-07-01

Full Text Available The cochlear implant (CI is a device used to reconstruct the hearing capabilities of a person diagnosed with total cophosis. This impairment may occur after some accidents, chemotherapy etc., the person still having an intact hearing nerve. The cochlear implant has two parts: a programmable, external part, the Digital Signal Processing (DSP device which process and transform the speech signal, and another surgically implanted part, with a certain number of electrodes (depending on brand used to stimulate the hearing nerve. The speech signal is fully processed in the DSP external device resulting the ``coded'' information on speech. This is modulated with the support of the fundamental frequency F0 and the energy impulses are inductively sent to the hearing nerve. The correct detection of this frequency is very important, determining the manner of hearing and making the difference between a "computer'' voice and a natural one. The results are applicable not only in the medical domain, but also in the Romanian speech synthesis.
Machine learning of the reactor core loading pattern critical parameters

International Nuclear Information System (INIS)

Trontl, K.; Pevec, D.; Smuc, T.

2007-01-01

The usual approach to loading pattern optimization involves high degree of engineering judgment, a set of heuristic rules, an optimization algorithm and a computer code used for evaluating proposed loading patterns. The speed of the optimization process is highly dependent on the computer code used for the evaluation. In this paper we investigate the applicability of a machine learning model which could be used for fast loading pattern evaluation. We employed a recently introduced machine learning technique, Support Vector Regression (SVR), which has a strong theoretical background in statistical learning theory. Superior empirical performance of the method has been reported on difficult regression problems in different fields of science and technology. SVR is a data driven, kernel based, nonlinear modelling paradigm, in which model parameters are automatically determined by solving a quadratic optimization problem. The main objective of the work reported in this paper was to evaluate the possibility of applying SVR method for reactor core loading pattern modelling. The starting set of experimental data for training and testing of the machine learning algorithm was obtained using a two-dimensional diffusion theory reactor physics computer code. We illustrate the performance of the solution and discuss its applicability, i.e., complexity, speed and accuracy, with a projection to a more realistic scenario involving machine learning from the results of more accurate and time consuming three-dimensional core modelling code. (author)
[Improving speech comprehension using a new cochlear implant speech processor].

Science.gov (United States)

Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

2009-06-01

The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise.In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg
Benchmarking MILC code with OpenMP and MPI

International Nuclear Information System (INIS)

Gottlieb, Steven; Tamhankar, Sonali

2001-01-01

A trend in high performance computers that is becoming increasingly popular is the use of symmetric multi-processing (SMP) rather than the older paradigm of MPP. MPI codes that ran and scaled well on MPP machines can often be run on an SMP machine using the vendor's version of MPI. However, this approach may not make optimal use of the (expensive) SMP hardware. More significantly, there are machines like Blue Horizon, an IBM SP with 8-way SMP nodes at the San Diego Supercomputer Center that can only support 4 MPI processes per node (with the current switch). On such a machine it is imperative to be able to use OpenMP parallelism on the node, and MPI between nodes. We describe the challenges of converting MILC MPI code to using a second level of OpenMP parallelism, and benchmarks on IBM and Sun computers

The analysis of speech acts patterns in two Egyptian inaugural speeches

Directory of Open Access Journals (Sweden)

Imad Hayif Sameer

2017-09-01

Full Text Available The theory of speech acts, which clarifies what people do when they speak, is not about individual words or sentences that form the basic elements of human communication, but rather about particular speech acts that are performed when uttering words. A speech act is the attempt at doing something purely by speaking. Many things can be done by speaking. Speech acts are studied under what is called speech act theory, and belong to the domain of pragmatics. In this paper, two Egyptian inaugural speeches from El-Sadat and El-Sisi, belonging to different periods were analyzed to find out whether there were differences within this genre in the same culture or not. The study showed that there was a very small difference between these two speeches which were analyzed according to Searle’s theory of speech acts. In El Sadat’s speech, commissives came to occupy the first place. Meanwhile, in El–Sisi’s speech, assertives occupied the first place. Within the speeches of one culture, we can find that the differences depended on the circumstances that surrounded the elections of the Presidents at the time. Speech acts were tools they used to convey what they wanted and to obtain support from their audiences.
Voice based gender classification using machine learning

Science.gov (United States)

Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.

2017-11-01

Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.
Speech Problems

Science.gov (United States)

... Staying Safe Videos for Educators Search English Español Speech Problems KidsHealth / For Teens / Speech Problems What's in ... a person's ability to speak clearly. Some Common Speech and Language Disorders Stuttering is a problem that ...
Alternative Speech Communication System for Persons with Severe Speech Disorders

Science.gov (United States)

Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

2009-12-01

Assistive speech-enabled systems are proposed to help both French and English speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of the Speech Quality (PESQ) value of 5% and more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
A Danish open-set speech corpus for competing-speech studies

DEFF Research Database (Denmark)

Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias

2014-01-01

Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs......) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded...... with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed...
Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

Science.gov (United States)

Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

2012-01-01

A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production
Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN

Science.gov (United States)

Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai

2017-01-01

Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed. PMID:28737705
Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN.

Science.gov (United States)

Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai; Zhou, Jiehan; Zhang, Weishan

2017-07-24

Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed.
Probability Machines: Consistent Probability Estimation Using Nonparametric Learning Machines

Science.gov (United States)

Malley, J. D.; Kruppa, J.; Dasgupta, A.; Malley, K. G.; Ziegler, A.

2011-01-01

Summary Background Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. Objectives The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Methods Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Results Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Conclusions Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications. PMID:21915433
Multimodal Speech Capture System for Speech Rehabilitation and Learning.

Science.gov (United States)

Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam

2017-11-01

Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we are presenting the multimodal speech capture system (MSCS) that records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. Collected speech modalities, tongue motion, lips gestures, and voice are visualized not only in real-time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities by a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective, and may vary from one SLP to another.
Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

Science.gov (United States)

van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

2007-01-01

Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…
Subspace-Based Noise Reduction for Speech Signals via Diagonal and Triangular Matrix Decompositions

DEFF Research Database (Denmark)

Hansen, Per Christian; Jensen, Søren Holdt

We survey the definitions and use of rank-revealing matrix decompositions in single-channel noise reduction algorithms for speech signals. Our algorithms are based on the rank-reduction paradigm and, in particular, signal subspace techniques. The focus is on practical working algorithms, using both...... diagonal (eigenvalue and singular value) decompositions and rank-revealing triangular decompositions (ULV, URV, VSV, ULLV and ULLIV). In addition we show how the subspace-based algorithms can be evaluated and compared by means of simple FIR filter interpretations. The algorithms are illustrated...... with working Matlab code and applications in speech processing....
Graphic man-machine interface applied to nuclear reactor designs

International Nuclear Information System (INIS)

Pereira, Claudio M.N.A; Mol, Antonio Carlos A.

1999-01-01

The Man-Machine Interfaces have been of interest of many researchers in the area of nuclear human factors engineering, principally applied to monitoring systems. The clarity of information provides best adaptation of the men to the machine. This work proposes the development of a Graphic Man-Machine Interface applied to nuclear reactor designs as a tool to optimize them. Here is present a prototype of a graphic man-machine interface for the Hammer code developed for PC under the Windows environment. The results of its application are commented. (author)
Asymmetric Dynamic Attunement of Speech and Gestures in the Construction of Children's Understanding.

Science.gov (United States)

De Jonge-Hoekstra, Lisette; Van der Steen, Steffie; Van Geert, Paul; Cox, Ralf F A

2016-01-01

As children learn they use their speech to express words and their hands to gesture. This study investigates the interplay between real-time gestures and speech as children construct cognitive understanding during a hands-on science task. 12 children (M = 6, F = 6) from Kindergarten (n = 5) and first grade (n = 7) participated in this study. Each verbal utterance and gesture during the task were coded, on a complexity scale derived from dynamic skill theory. To explore the interplay between speech and gestures, we applied a cross recurrence quantification analysis (CRQA) to the two coupled time series of the skill levels of verbalizations and gestures. The analysis focused on (1) the temporal relation between gestures and speech, (2) the relative strength and direction of the interaction between gestures and speech, (3) the relative strength and direction between gestures and speech for different levels of understanding, and (4) relations between CRQA measures and other child characteristics. The results show that older and younger children differ in the (temporal) asymmetry in the gestures-speech interaction. For younger children, the balance leans more toward gestures leading speech in time, while the balance leans more toward speech leading gestures for older children. Secondly, at the group level, speech attracts gestures in a more dynamically stable fashion than vice versa, and this asymmetry in gestures and speech extends to lower and higher understanding levels. Yet, for older children, the mutual coupling between gestures and speech is more dynamically stable regarding the higher understanding levels. Gestures and speech are more synchronized in time as children are older. A higher score on schools' language tests is related to speech attracting gestures more rigidly and more asymmetry between gestures and speech, only for the less difficult understanding levels. A higher score on math or past science tasks is related to less asymmetry between gestures and
A Very Fast and Angular Momentum Conserving Tree Code

International Nuclear Information System (INIS)

Marcello, Dominic C.

2017-01-01

There are many methods used to compute the classical gravitational field in astrophysical simulation codes. With the exception of the typically impractical method of direct computation, none ensure conservation of angular momentum to machine precision. Under uniform time-stepping, the Cartesian fast multipole method of Dehnen (also known as the very fast tree code) conserves linear momentum to machine precision. We show that it is possible to modify this method in a way that conserves both angular and linear momenta.
A Very Fast and Angular Momentum Conserving Tree Code

Energy Technology Data Exchange (ETDEWEB)

Marcello, Dominic C., E-mail: dmarce504@gmail.com [Department of Physics and Astronomy, and Center for Computation and Technology Louisiana State University, Baton Rouge, LA 70803 (United States)

2017-09-01

There are many methods used to compute the classical gravitational field in astrophysical simulation codes. With the exception of the typically impractical method of direct computation, none ensure conservation of angular momentum to machine precision. Under uniform time-stepping, the Cartesian fast multipole method of Dehnen (also known as the very fast tree code) conserves linear momentum to machine precision. We show that it is possible to modify this method in a way that conserves both angular and linear momenta.
Subjective and objective measurement of the intelligibility of synthesized speech impaired by the very low bit rate STANAG 4591 codec including packet loss

NARCIS (Netherlands)

Počta, P.; Beerends, J.G.

2017-01-01

This paper deals with the intelligibility of speech coded by the STANAG 4591 standard codec, including packet loss, using synthesized speech input. Both subjective and objective assessments are used. It is shown that this codec significantly degrades intelligibility when compared to a standard
Analysing Afrikaans-English bilingual children's conversational code ...

African Journals Online (AJOL)

It has been observed that children mix languages more often if they have been exposed to mixed speech, especially if they are in bilingual company. Very little research, however, exists on the code switching (CS) of children brought up in multilingual contexts. The study discussed in this paper investigates the grammatical ...
Applications guide to the RSIC-distributed version of the MCNP code (coupled Monte Carlo neutron-photon Code)

International Nuclear Information System (INIS)

Cramer, S.N.

1985-09-01

An overview of the RSIC-distributed version of the MCNP code (a soupled Monte Carlo neutron-photon code) is presented. All general features of the code, from machine hardware requirements to theoretical details, are discussed. The current nuclide cross-section and other libraries available in the standard code package are specified, and a realistic example of the flexible geometry input is given. Standard and nonstandard source, estimator, and variance-reduction procedures are outlined. Examples of correct usage and possible misuse of certain code features are presented graphically and in standard output listings. Finally, itemized summaries of sample problems, various MCNP code documentation, and future work are given
Near-toll quality digital speech transmission in the mobile satellite service

Science.gov (United States)

Townes, S. A.; Divsalar, D.

1986-01-01

This paper discusses system considerations for near-toll quality digital speech transmission in a 5 kHz mobile satellite system channel. Tradeoffs are shown for power performance versus delay for a 4800 bps speech compression system in conjunction with a 16 state rate 2/3 trellis coded 8PSK modulation system. The suggested system has an additional 150 ms of delay beyond the propagation delay and requires an E(b)/N(0) of about 7 dB for a Ricean channel assumption with line-of-sight to diffuse component ratio of 10 assuming ideal synchronization. An additional loss of 2 to 3 dB is expected for synchronization in fading environment.

An Approach for Implementing State Machines with Online Testability

Directory of Open Access Journals (Sweden)

P. K. Lala

2010-01-01

Full Text Available During the last two decades, significant amount of research has been performed to simplify the detection of transient or soft errors in VLSI-based digital systems. This paper proposes an approach for implementing state machines that uses 2-hot code for state encoding. State machines designed using this approach allow online detection of soft errors in registers and output logic. The 2-hot code considerably reduces the number of required flip-flops and leads to relatively straightforward implementation of next state and output logic. A new way of designing output logic for online fault detection has also been presented.
Coupling a Basin Modeling and a Seismic Code using MOAB

KAUST Repository

Yan, Mi; Jordan, Kirk; Kaushik, Dinesh; Perrone, Michael; Sachdeva, Vipin; Tautges, Timothy J.; Magerlein, John

2012-01-01

We report on a demonstration of loose multiphysics coupling between a basin modeling code and a seismic code running on a large parallel machine. Multiphysics coupling, which is one critical capability for a high performance computing (HPC) framework, was implemented using the MOAB open-source mesh and field database. MOAB provides for code coupling by storing mesh data and input and output field data for the coupled analysis codes and interpolating the field values between different meshes used by the coupled codes. We found it straightforward to use MOAB to couple the PBSM basin modeling code and the FWI3D seismic code on an IBM Blue Gene/P system. We describe how the coupling was implemented and present benchmarking results for up to 8 racks of Blue Gene/P with 8192 nodes and MPI processes. The coupling code is fast compared to the analysis codes and it scales well up to at least 8192 nodes, indicating that a mesh and field database is an efficient way to implement loose multiphysics coupling for large parallel machines.
Coupling a Basin Modeling and a Seismic Code using MOAB

KAUST Repository

Yan, Mi

2012-06-02

We report on a demonstration of loose multiphysics coupling between a basin modeling code and a seismic code running on a large parallel machine. Multiphysics coupling, which is one critical capability for a high performance computing (HPC) framework, was implemented using the MOAB open-source mesh and field database. MOAB provides for code coupling by storing mesh data and input and output field data for the coupled analysis codes and interpolating the field values between different meshes used by the coupled codes. We found it straightforward to use MOAB to couple the PBSM basin modeling code and the FWI3D seismic code on an IBM Blue Gene/P system. We describe how the coupling was implemented and present benchmarking results for up to 8 racks of Blue Gene/P with 8192 nodes and MPI processes. The coupling code is fast compared to the analysis codes and it scales well up to at least 8192 nodes, indicating that a mesh and field database is an efficient way to implement loose multiphysics coupling for large parallel machines.
Machine-Checked Sequencer for Critical Embedded Code Generator

Science.gov (United States)

Izerrouken, Nassima; Pantel, Marc; Thirioux, Xavier

This paper presents the development of a correct-by-construction block sequencer for GeneAuto a qualifiable (according to DO178B/ED12B recommendation) automatic code generator. It transforms Simulink models to MISRA C code for safety critical systems. Our approach which combines classical development process and formal specification and verification using proof-assistants, led to preliminary fruitful exchanges with certification authorities. We present parts of the classical user and tools requirements and derived formal specifications, implementation and verification for the correctness and termination of the block sequencer. This sequencer has been successfully applied to real-size industrial use cases from various transportation domain partners and led to requirement errors detection and a correct-by-construction implementation.
Coding strategies for cochlear implants under adverse environments

Science.gov (United States)

Tahmina, Qudsia

Cochlear implants are electronic prosthetic devices that restores partial hearing in patients with severe to profound hearing loss. Although most coding strategies have significantly improved the perception of speech in quite listening conditions, there remains limitations on speech perception under adverse environments such as in background noise, reverberation and band-limited channels, and we propose strategies that improve the intelligibility of speech transmitted over the telephone networks, reverberated speech and speech in the presence of background noise. For telephone processed speech, we propose to examine the effects of adding low-frequency and high- frequency information to the band-limited telephone speech. Four listening conditions were designed to simulate the receiving frequency characteristics of telephone handsets. Results indicated improvement in cochlear implant and bimodal listening when telephone speech was augmented with high frequency information and therefore this study provides support for design of algorithms to extend the bandwidth towards higher frequencies. The results also indicated added benefit from hearing aids for bimodal listeners in all four types of listening conditions. Speech understanding in acoustically reverberant environments is always a difficult task for hearing impaired listeners. Reverberated sounds consists of direct sound, early reflections and late reflections. Late reflections are known to be detrimental to speech intelligibility. In this study, we propose a reverberation suppression strategy based on spectral subtraction to suppress the reverberant energies from late reflections. Results from listening tests for two reverberant conditions (RT60 = 0.3s and 1.0s) indicated significant improvement when stimuli was processed with SS strategy. The proposed strategy operates with little to no prior information on the signal and the room characteristics and therefore, can potentially be implemented in real-time CI
How the machine ‘thinks’: Understanding opacity in machine learning algorithms

Directory of Open Access Journals (Sweden)

Jenna Burrell

2016-01-01

Full Text Available This article considers the issue of opacity as a problem for socially consequential mechanisms of classification and ranking, such as spam filters, credit card fraud detection, search engines, news trends, market segmentation and advertising, insurance or loan qualification, and credit scoring. These mechanisms of classification all frequently rely on computational algorithms, and in many cases on machine learning algorithms to do this work. In this article, I draw a distinction between three forms of opacity: (1 opacity as intentional corporate or state secrecy, (2 opacity as technical illiteracy, and (3 an opacity that arises from the characteristics of machine learning algorithms and the scale required to apply them usefully. The analysis in this article gets inside the algorithms themselves. I cite existing literatures in computer science, known industry practices (as they are publicly presented, and do some testing and manipulation of code as a form of lightweight code audit. I argue that recognizing the distinct forms of opacity that may be coming into play in a given application is a key to determining which of a variety of technical and non-technical solutions could help to prevent harm.
Study of wavelet packet energy entropy for emotion classification in speech and glottal signals

Science.gov (United States)

He, Ling; Lech, Margaret; Zhang, Jing; Ren, Xiaomei; Deng, Lihua

2013-07-01

The automatic speech emotion recognition has important applications in human-machine communication. Majority of current research in this area is focused on finding optimal feature parameters. In recent studies, several glottal features were examined as potential cues for emotion differentiation. In this study, a new type of feature parameter is proposed, which calculates energy entropy on values within selected Wavelet Packet frequency bands. The modeling and classification tasks are conducted using the classical GMM algorithm. The experiments use two data sets: the Speech Under Simulated Emotion (SUSE) data set annotated with three different emotions (angry, neutral and soft) and Berlin Emotional Speech (BES) database annotated with seven different emotions (angry, bored, disgust, fear, happy, sad and neutral). The average classification accuracy achieved for the SUSE data (74%-76%) is significantly higher than the accuracy achieved for the BES data (51%-54%). In both cases, the accuracy was significantly higher than the respective random guessing levels (33% for SUSE and 14.3% for BES).
Axon guidance pathways served as common targets for human speech/language evolution and related disorders.

Science.gov (United States)

Lei, Huimeng; Yan, Zhangming; Sun, Xiaohong; Zhang, Yue; Wang, Jianhong; Ma, Caihong; Xu, Qunyuan; Wang, Rui; Jarvis, Erich D; Sun, Zhirong

2017-11-01

Human and several nonhuman species share the rare ability of modifying acoustic and/or syntactic features of sounds produced, i.e. vocal learning, which is the important neurobiological and behavioral substrate of human speech/language. This convergent trait was suggested to be associated with significant genomic convergence and best manifested at the ROBO-SLIT axon guidance pathway. Here we verified the significance of such genomic convergence and assessed its functional relevance to human speech/language using human genetic variation data. In normal human populations, we found the affected amino acid sites were well fixed and accompanied with significantly more associated protein-coding SNPs in the same genes than the rest genes. Diseased individuals with speech/language disorders have significant more low frequency protein coding SNPs but they preferentially occurred outside the affected genes. Such patients' SNPs were enriched in several functional categories including two axon guidance pathways (mediated by netrin and semaphorin) that interact with ROBO-SLITs. Four of the six patients have homozygous missense SNPs on PRAME gene family, one youngest gene family in human lineage, which possibly acts upon retinoic acid receptor signaling, similarly as FOXP2, to modulate axon guidance. Taken together, we suggest the axon guidance pathways (e.g. ROBO-SLIT, PRAME gene family) served as common targets for human speech/language evolution and related disorders. Copyright © 2017 Elsevier Inc. All rights reserved.
Enhancement of speech signals - with a focus on voiced speech models

DEFF Research Database (Denmark)

Nørholm, Sidsel Marie

This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based...... on this model. The basic model used in this thesis is the harmonic model which is a commonly used model for describing the voiced part of the speech signal. We show that it can be beneficial to extend the model to take inharmonicities or the non-stationarity of speech into account. Extending the model...
Stochastic algorithm for channel optimized vector quantization: application to robust narrow-band speech coding

International Nuclear Information System (INIS)

Bouzid, M.; Benkherouf, H.; Benzadi, K.

2011-01-01

In this paper, we propose a stochastic joint source-channel scheme developed for efficient and robust encoding of spectral speech LSF parameters. The encoding system, named LSF-SSCOVQ-RC, is an LSF encoding scheme based on a reduced complexity stochastic split vector quantizer optimized for noisy channel. For transmissions over noisy channel, we will show first that our LSF-SSCOVQ-RC encoder outperforms the conventional LSF encoder designed by the split vector quantizer. After that, we applied the LSF-SSCOVQ-RC encoder (with weighted distance) for the robust encoding of LSF parameters of the 2.4 Kbits/s MELP speech coder operating over a noisy/noiseless channel. The simulation results will show that the proposed LSF encoder, incorporated in the MELP, ensure better performances than the original MELP MSVQ of 25 bits/frame; especially when the transmission channel is highly disturbed. Indeed, we will show that the LSF-SSCOVQ-RC yields significant improvement to the LSFs encoding performances by ensuring reliable transmissions over noisy channel.
Characterization of coded random access with compressive sensing based multi user detection

DEFF Research Database (Denmark)

Ji, Yalei; Stefanovic, Cedomir; Bockelmann, Carsten

2014-01-01

The emergence of Machine-to-Machine (M2M) communication requires new Medium Access Control (MAC) schemes and physical (PHY) layer concepts to support a massive number of access requests. The concept of coded random access, introduced recently, greatly outperforms other random access methods...... coded random access with CS-MUD on the PHY layer and show very promising results for the resulting protocol....
Intelligibility for Binaural Speech with Discarded Low-SNR Speech Components.

Science.gov (United States)

Schoenmaker, Esther; van de Par, Steven

2016-01-01

Speech intelligibility in multitalker settings improves when the target speaker is spatially separated from the interfering speakers. A factor that may contribute to this improvement is the improved detectability of target-speech components due to binaural interaction in analogy to the Binaural Masking Level Difference (BMLD). This would allow listeners to hear target speech components within specific time-frequency intervals that have a negative SNR, similar to the improvement in the detectability of a tone in noise when these contain disparate interaural difference cues. To investigate whether these negative-SNR target-speech components indeed contribute to speech intelligibility, a stimulus manipulation was performed where all target components were removed when local SNRs were smaller than a certain criterion value. It can be expected that for sufficiently high criterion values target speech components will be removed that do contribute to speech intelligibility. For spatially separated speakers, assuming that a BMLD-like detection advantage contributes to intelligibility, degradation in intelligibility is expected already at criterion values below 0 dB SNR. However, for collocated speakers it is expected that higher criterion values can be applied without impairing speech intelligibility. Results show that degradation of intelligibility for separated speakers is only seen for criterion values of 0 dB and above, indicating a negligible contribution of a BMLD-like detection advantage in multitalker settings. These results show that the spatial benefit is related to a spatial separation of speech components at positive local SNRs rather than to a BMLD-like detection improvement for speech components at negative local SNRs.
An experimental Dutch keyboard-to-speech system for the speech impaired

NARCIS (Netherlands)

Deliege, R.J.H.

1989-01-01

An experimental Dutch keyboard-to-speech system has been developed to explor the possibilities and limitations of Dutch speech synthesis in a communication aid for the speech impaired. The system uses diphones and a formant synthesizer chip for speech synthesis. Input to the system is in
ADAPTING HYBRID MACHINE TRANSLATION TECHNIQUES FOR CROSS-LANGUAGE TEXT RETRIEVAL SYSTEM

Directory of Open Access Journals (Sweden)

P. ISWARYA

2017-03-01

Full Text Available This research work aims in developing Tamil to English Cross - language text retrieval system using hybrid machine translation approach. The hybrid machine translation system is a combination of rule based and statistical based approaches. In an existing word by word translation system there are lot of issues and some of them are ambiguity, Out-of-Vocabulary words, word inflections, and improper sentence structure. To handle these issues, proposed architecture is designed in such a way that, it contains Improved Part-of-Speech tagger, machine learning based morphological analyser, collocation based word sense disambiguation procedure, semantic dictionary, and tense markers with gerund ending rules, and two pass transliteration algorithm. From the experimental results it is clear that the proposed Tamil Query based translation system achieves significantly better translation quality over existing system, and reaches 95.88% of monolingual performance.
VOA: a 2-d plasma physics code

International Nuclear Information System (INIS)

Eltgroth, P.G.

1975-12-01

A 2-dimensional relativistic plasma physics code was written and tested. The non-thermal components of the particle distribution functions are represented by expansion into moments in momentum space. These moments are computed directly from numerical equations. Currently three species are included - electrons, ions and ''beam electrons''. The computer code runs on either the 7600 or STAR machines at LLL. Both the physics and the operation of the code are discussed
Enhancing Speech Recognition Using Improved Particle Swarm Optimization Based Hidden Markov Model

Directory of Open Access Journals (Sweden)

Lokesh Selvaraj

2014-01-01

Full Text Available Enhancing speech recognition is the primary intention of this work. In this paper a novel speech recognition method based on vector quantization and improved particle swarm optimization (IPSO is suggested. The suggested methodology contains four stages, namely, (i denoising, (ii feature mining (iii, vector quantization, and (iv IPSO based hidden Markov model (HMM technique (IP-HMM. At first, the speech signals are denoised using median filter. Next, characteristics such as peak, pitch spectrum, Mel frequency Cepstral coefficients (MFCC, mean, standard deviation, and minimum and maximum of the signal are extorted from the denoised signal. Following that, to accomplish the training process, the extracted characteristics are given to genetic algorithm based codebook generation in vector quantization. The initial populations are created by selecting random code vectors from the training set for the codebooks for the genetic algorithm process and IP-HMM helps in doing the recognition. At this point the creativeness will be done in terms of one of the genetic operation crossovers. The proposed speech recognition technique offers 97.14% accuracy.
The minor third communicates sadness in speech, mirroring its use in music.

Science.gov (United States)

Curtis, Meagan E; Bharucha, Jamshed J

2010-06-01

There is a long history of attempts to explain why music is perceived as expressing emotion. The relationship between pitches serves as an important cue for conveying emotion in music. The musical interval referred to as the minor third is generally thought to convey sadness. We reveal that the minor third also occurs in the pitch contour of speech conveying sadness. Bisyllabic speech samples conveying four emotions were recorded by 9 actresses. Acoustic analyses revealed that the relationship between the 2 salient pitches of the sad speech samples tended to approximate a minor third. Participants rated the speech samples for perceived emotion, and the use of numerous acoustic parameters as cues for emotional identification was modeled using regression analysis. The minor third was the most reliable cue for identifying sadness. Additional participants rated musical intervals for emotion, and their ratings verified the historical association between the musical minor third and sadness. These findings support the theory that human vocal expressions and music share an acoustic code for communicating sadness.
Enhancement of a radiation safety system through the use of a microprocessor-controlled speech synthesizer

International Nuclear Information System (INIS)

Keefe, D.J.; McDowell, W.P.

1980-01-01

A speech synthesizer is being used to differentiate eight separate safety alarms on a high energy accelerator at Argonne National Laboratory. A single board microcomputer monitors eight signals from an existing radiation safety logic circuit. The microcomputer is programmed to output the proper code at the proper time and sequence to a speech synthesizer which supplies the audio input to a local public address system. This eliminates the requirement for eight different alarm tones and the personnel training required to differentiate among them. A twenty-word vocabulary was found adequate to supply the necessary safety announcements. The article describes the techniques used to interface the speech synthesizer into the existing safety logic circuit
Speech Function and Speech Role in Carl Fredricksen's Dialogue on Up Movie

OpenAIRE

Rehana, Ridha; Silitonga, Sortha

2013-01-01

One aim of this article is to show through a concrete example how speech function and speech role used in movie. The illustrative example is taken from the dialogue of Up movie. Central to the analysis proper form of dialogue on Up movie that contain of speech function and speech role; i.e. statement, offer, question, command, giving, and demanding. 269 dialogue were interpreted by actor, and it was found that the use of speech function and speech role.
Electronic data processing codes for California wildland plants

Science.gov (United States)

Merton J. Reed; W. Robert Powell; Bur S. Bal

1963-01-01

Systematized codes for plant names are helpful to a wide variety of workers who must record the identity of plants in the field. We have developed such codes for a majority of the vascular plants encountered on California wildlands and have published the codes in pocket size, using photo-reductions of the output from data processing machines. A limited number of the...

Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

Science.gov (United States)

Larm, Petra; Hongisto, Valtteri

2006-02-01

During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
Code portability and data management considerations in the SAS3D LMFBR accident-analysis code

International Nuclear Information System (INIS)

Dunn, F.E.

1981-01-01

The SAS3D code was produced from a predecessor in order to reduce or eliminate interrelated problems in the areas of code portability, the large size of the code, inflexibility in the use of memory and the size of cases that can be run, code maintenance, and running speed. Many conventional solutions, such as variable dimensioning, disk storage, virtual memory, and existing code-maintenance utilities were not feasible or did not help in this case. A new data management scheme was developed, coding standards and procedures were adopted, special machine-dependent routines were written, and a portable source code processing code was written. The resulting code is quite portable, quite flexible in the use of memory and the size of cases that can be run, much easier to maintain, and faster running. SAS3D is still a large, long running code that only runs well if sufficient main memory is available
Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

NARCIS (Netherlands)

Huijbregts, M.A.H.; de Jong, Franciska M.G.

In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because
Automatic analysis of slips of the tongue: Insights into the cognitive architecture of speech production.

Science.gov (United States)

Goldrick, Matthew; Keshet, Joseph; Gustafson, Erin; Heller, Jordana; Needle, Jeremy

2016-04-01

Traces of the cognitive mechanisms underlying speaking can be found within subtle variations in how we pronounce sounds. While speech errors have traditionally been seen as categorical substitutions of one sound for another, acoustic/articulatory analyses show they partially reflect the intended sound. When "pig" is mispronounced as "big," the resulting /b/ sound differs from correct productions of "big," moving towards intended "pig"-revealing the role of graded sound representations in speech production. Investigating the origins of such phenomena requires detailed estimation of speech sound distributions; this has been hampered by reliance on subjective, labor-intensive manual annotation. Computational methods can address these issues by providing for objective, automatic measurements. We develop a novel high-precision computational approach, based on a set of machine learning algorithms, for measurement of elicited speech. The algorithms are trained on existing manually labeled data to detect and locate linguistically relevant acoustic properties with high accuracy. Our approach is robust, is designed to handle mis-productions, and overall matches the performance of expert coders. It allows us to analyze a very large dataset of speech errors (containing far more errors than the total in the existing literature), illuminating properties of speech sound distributions previously impossible to reliably observe. We argue that this provides novel evidence that two sources both contribute to deviations in speech errors: planning processes specifying the targets of articulation and articulatory processes specifying the motor movements that execute this plan. These findings illustrate how a much richer picture of speech provides an opportunity to gain novel insights into language processing. Copyright © 2016 Elsevier B.V. All rights reserved.
Using others' words: conversational use of reported speech by individuals with aphasia and their communication partners.

Science.gov (United States)

Hengst, Julie A; Frame, Simone R; Neuman-Stritzel, Tiffany; Gannaway, Rachel

2005-02-01

Reported speech, wherein one quotes or paraphrases the speech of another, has been studied extensively as a set of linguistic and discourse practices. Researchers agree that reported speech is pervasive, found across languages, and used in diverse contexts. However, to date, there have been no studies of the use of reported speech among individuals with aphasia. Grounded in an interactional sociolinguistic perspective, the study presented here documents and analyzes the use of reported speech by 7 adults with mild to moderately severe aphasia and their routine communication partners. Each of the 7 pairs was videotaped in 4 everyday activities at home or around the community, yielding over 27 hr of conversational interaction for analysis. A coding scheme was developed that identified 5 types of explicitly marked reported speech: direct, indirect, projected, indexed, and undecided. Analysis of the data documented reported speech as a common discourse practice used successfully by the individuals with aphasia and their communication partners. All participants produced reported speech at least once, and across all observations the target pairs produced 400 reported speech episodes (RSEs), 149 by individuals with aphasia and 251 by their communication partners. For all participants, direct and indirect forms were the most prevalent (70% of RSEs). Situated discourse analysis of specific episodes of reported speech used by 3 of the pairs provides detailed portraits of the diverse interactional, referential, social, and discourse functions of reported speech and explores ways that the pairs used reported speech to successfully frame talk despite their ongoing management of aphasia.
The development of speech coding and the first standard coder for public mobile telephony

NARCIS (Netherlands)

Sluijter, R.J.

2005-01-01

This thesis describes in its core chapter (Chapter 4) the original algorithmic and design features of the ??rst coder for public mobile telephony, the GSM full-rate speech coder, as standardized in 1988. It has never been described in so much detail as presented here. The coder is put in a
CMIS arithmetic and multiwire news for QCD on the connection machine

International Nuclear Information System (INIS)

Brickner, R.G.

1991-01-01

Our collaboration has been running Wilson fermion QCD simulations on various Connection Machines for over a year and a half. During this time, we have continually optimized our code for operations found in the fermion matrix inversion. Our current version of the matrix inversion is written almost entirely in CMIS (Connection Machine Instruction Set), and utilizes both high-speed arithmetic and multiwire 'news' (nearest-neighbor communications). We present details of how these and other features of our code are implemented on the CM-2. (orig.)
Asymmetric dynamic attunement of speech and gestures in the construction of children’s understanding

Directory of Open Access Journals (Sweden)

Lisette eDe Jonge-Hoekstra

2016-03-01

Full Text Available As children learn they use their speech to express words and their hands to gesture. This study investigates the interplay between real-time gestures and speech as children construct cognitive understanding during a hands-on science task. 12 children (M = 6, F = 6 from Kindergarten (n = 5 and first grade (n = 7 participated in this study. Each verbal utterance and gesture during the task were coded, on a complexity scale derived from dynamic skill theory. To explore the interplay between speech and gestures, we applied a cross recurrence quantification analysis (CRQA to the two coupled time series of the skill levels of verbalizations and gestures. The analysis focused on 1 the temporal relation between gestures and speech, 2 the relative strength and direction of the interaction between gestures and speech, 3 the relative strength and direction between gestures and speech for different levels of understanding, and 4 relations between CRQA measures and other child characteristics. The results show that older and younger children differ in the (temporal asymmetry in the gestures-speech interaction. For younger children, the balance leans more towards gestures leading speech in time, while the balance leans more towards speech leading gestures for older children. Secondly, at the group level, speech attracts gestures in a more dynamically stable fashion than vice versa, and this asymmetry in gestures and speech extends to lower and higher understanding levels. Yet, for older children, the mutual coupling between gestures and speech is more dynamically stable regarding the higher understanding levels. Gestures and speech are more synchronized in time as children are older. A higher score on schools’ language tests is related to speech attracting gestures more rigidly and more asymmetry between gestures and speech, only for the less difficult understanding levels. A higher score on math or past science tasks is related to less asymmetry between
Intelligibility of speech of children with speech and sound disorders

OpenAIRE

Ivetac, Tina

2014-01-01

The purpose of this study is to examine speech intelligibility of children with primary speech and sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...
Computation of the bounce-average code

International Nuclear Information System (INIS)

Cutler, T.A.; Pearlstein, L.D.; Rensink, M.E.

1977-01-01

The bounce-average computer code simulates the two-dimensional velocity transport of ions in a mirror machine. The code evaluates and bounce-averages the collision operator and sources along the field line. A self-consistent equilibrium magnetic field is also computed using the long-thin approximation. Optionally included are terms that maintain μ, J invariance as the magnetic field changes in time. The assumptions and analysis that form the foundation of the bounce-average code are described. When references can be cited, the required results are merely stated and explained briefly. A listing of the code is appended
Speech disorders - children

Science.gov (United States)

... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...
Steganalysis of recorded speech

Science.gov (United States)

Johnson, Micah K.; Lyu, Siwei; Farid, Hany

2005-03-01

Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
Virtual machine provisioning, code management, and data movement design for the Fermilab HEPCloud Facility

Science.gov (United States)

Timm, S.; Cooper, G.; Fuess, S.; Garzoglio, G.; Holzman, B.; Kennedy, R.; Grassano, D.; Tiradani, A.; Krishnamurthy, R.; Vinayagam, S.; Raicu, I.; Wu, H.; Ren, S.; Noh, S.-Y.

2017-10-01

The Fermilab HEPCloud Facility Project has as its goal to extend the current Fermilab facility interface to provide transparent access to disparate resources including commercial and community clouds, grid federations, and HPC centers. This facility enables experiments to perform the full spectrum of computing tasks, including data-intensive simulation and reconstruction. We have evaluated the use of the commercial cloud to provide elasticity to respond to peaks of demand without overprovisioning local resources. Full scale data-intensive workflows have been successfully completed on Amazon Web Services for two High Energy Physics Experiments, CMS and NOνA, at the scale of 58000 simultaneous cores. This paper describes the significant improvements that were made to the virtual machine provisioning system, code caching system, and data movement system to accomplish this work. The virtual image provisioning and contextualization service was extended to multiple AWS regions, and to support experiment-specific data configurations. A prototype Decision Engine was written to determine the optimal availability zone and instance type to run on, minimizing cost and job interruptions. We have deployed a scalable on-demand caching service to deliver code and database information to jobs running on the commercial cloud. It uses the frontiersquid server and CERN VM File System (CVMFS) clients on EC2 instances and utilizes various services provided by AWS to build the infrastructure (stack). We discuss the architecture and load testing benchmarks on the squid servers. We also describe various approaches that were evaluated to transport experimental data to and from the cloud, and the optimal solutions that were used for the bulk of the data transport. Finally, we summarize lessons learned from this scale test, and our future plans to expand and improve the Fermilab HEP Cloud Facility.
Virtual Machine Provisioning, Code Management, and Data Movement Design for the Fermilab HEPCloud Facility

Energy Technology Data Exchange (ETDEWEB)

Timm, S. [Fermilab; Cooper, G. [Fermilab; Fuess, S. [Fermilab; Garzoglio, G. [Fermilab; Holzman, B. [Fermilab; Kennedy, R. [Fermilab; Grassano, D. [Fermilab; Tiradani, A. [Fermilab; Krishnamurthy, R. [IIT, Chicago; Vinayagam, S. [IIT, Chicago; Raicu, I. [IIT, Chicago; Wu, H. [IIT, Chicago; Ren, S. [IIT, Chicago; Noh, S. Y. [KISTI, Daejeon

2017-11-22

The Fermilab HEPCloud Facility Project has as its goal to extend the current Fermilab facility interface to provide transparent access to disparate resources including commercial and community clouds, grid federations, and HPC centers. This facility enables experiments to perform the full spectrum of computing tasks, including data-intensive simulation and reconstruction. We have evaluated the use of the commercial cloud to provide elasticity to respond to peaks of demand without overprovisioning local resources. Full scale data-intensive workflows have been successfully completed on Amazon Web Services for two High Energy Physics Experiments, CMS and NOνA, at the scale of 58000 simultaneous cores. This paper describes the significant improvements that were made to the virtual machine provisioning system, code caching system, and data movement system to accomplish this work. The virtual image provisioning and contextualization service was extended to multiple AWS regions, and to support experiment-specific data configurations. A prototype Decision Engine was written to determine the optimal availability zone and instance type to run on, minimizing cost and job interruptions. We have deployed a scalable on-demand caching service to deliver code and database information to jobs running on the commercial cloud. It uses the frontiersquid server and CERN VM File System (CVMFS) clients on EC2 instances and utilizes various services provided by AWS to build the infrastructure (stack). We discuss the architecture and load testing benchmarks on the squid servers. We also describe various approaches that were evaluated to transport experimental data to and from the cloud, and the optimal solutions that were used for the bulk of the data transport. Finally, we summarize lessons learned from this scale test, and our future plans to expand and improve the Fermilab HEP Cloud Facility.
Neurophysiology of speech differences in childhood apraxia of speech.

Science.gov (United States)

Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

2014-01-01

Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.
An Adaptive Approach to a 2.4 kb/s LPC Speech Coding System.

Science.gov (United States)

1985-07-01

laryngeal cancer ). Spectral estimation is at the foundation of speech analysis for all these goals and accurate AR model estimation in noise is...S ,5 mWnL NrinKt ) o ,-G p (d va Rmea.imn flU: 5() WOM Lu M(G)INUNM 40 4KeemS! MU= 1 UD M5) SIGHSM A SO= WAGe . M. (d) I U NS maIm ( IW vis MAMA
Performance evaluation based on data from code reviews

OpenAIRE

Andrej, Sekáč

2016-01-01

Context. Modern code review tools such as Gerrit have made available great amounts of code review data from different open source projects as well as other commercial projects. Code reviews are used to keep the quality of produced source code under control but the stored data could also be used for evaluation of the software development process. Objectives. This thesis uses machine learning methods for an approximation of review expert’s performance evaluation function. Due to limitations in ...
Paradox in AI - AI 2.0: The Way to Machine Consciousness

Science.gov (United States)

Palensky, Peter; Bruckner, Dietmar; Tmej, Anna; Deutsch, Tobias

Artificial Intelligence, the big promise of the last millennium, has apparently made its way into our daily lives. Cell phones with speech control, evolutionary computing in data mining or power grids, optimized via neural network, show its applicability in industrial environments. The original expectation of true intelligence and thinking machines lies still ahead of us. Researchers are, however, optimistic as never before. This paper tries to compare the views, challenges and approaches of several disciplines: engineering, psychology, neuroscience, philosophy. It gives a short introduction to Psychoanalysis, discusses the term consciousness, social implications of intelligent machines, related theories, and expectations and shall serve as a starting point for first attempts of combining these diverse thoughts.
Manifold learning in machine vision and robotics

Science.gov (United States)

Bernstein, Alexander

2017-02-01

Smart algorithms are used in Machine vision and Robotics to organize or extract high-level information from the available data. Nowadays, Machine learning is an essential and ubiquitous tool to automate extraction patterns or regularities from data (images in Machine vision; camera, laser, and sonar sensors data in Robotics) in order to solve various subject-oriented tasks such as understanding and classification of images content, navigation of mobile autonomous robot in uncertain environments, robot manipulation in medical robotics and computer-assisted surgery, and other. Usually such data have high dimensionality, however, due to various dependencies between their components and constraints caused by physical reasons, all "feasible and usable data" occupy only a very small part in high dimensional "observation space" with smaller intrinsic dimensionality. Generally accepted model of such data is manifold model in accordance with which the data lie on or near an unknown manifold (surface) of lower dimensionality embedded in an ambient high dimensional observation space; real-world high-dimensional data obtained from "natural" sources meet, as a rule, this model. The use of Manifold learning technique in Machine vision and Robotics, which discovers a low-dimensional structure of high dimensional data and results in effective algorithms for solving of a large number of various subject-oriented tasks, is the content of the conference plenary speech some topics of which are in the paper.
Is the Speech Transmission Index (STI) a robust measure of sound system speech intelligibility performance?

Science.gov (United States)

Mapp, Peter

2002-11-01

Although RaSTI is a good indicator of the speech intelligibility capability of auditoria and similar spaces, during the past 2-3 years it has been shown that RaSTI is not a robust predictor of sound system intelligibility performance. Instead, it is now recommended, within both national and international codes and standards, that full STI measurement and analysis be employed. However, new research is reported, that indicates that STI is not as flawless, nor robust as many believe. The paper highlights a number of potential error mechanisms. It is shown that the measurement technique and signal excitation stimulus can have a significant effect on the overall result and accuracy, particularly where DSP-based equipment is employed. It is also shown that in its current state of development, STI is not capable of appropriately accounting for a number of fundamental speech and system attributes, including typical sound system frequency response variations and anomalies. This is particularly shown to be the case when a system is operating under reverberant conditions. Comparisons between actual system measurements and corresponding word score data are reported where errors of up to 50 implications for VA and PA system performance verification will be discussed.

Bacteriological quality of drinks from vending machines.

Science.gov (United States)

Hunter, P. R.; Burge, S. H.

1986-01-01

A survey on the bacteriological quality of both drinking water and flavoured drinks from coin-operated vending machines is reported. Forty-four per cent of 25 drinking water samples examined contained coliforms and 84% had viable counts of greater than 1000 organisms ml at 30 degrees C. Thirty-one flavoured drinks were examined; 6% contained coliforms and 39% had total counts greater than 1000 organisms ml. It is suggested that the D.H.S.S. code of practice on coin-operated vending machines is not being followed. It is also suggested that drinking water alone should not be dispensed from such machines. PMID:3794325
Next generation Zero-Code control system UI

CERN Multimedia

CERN. Geneva

2017-01-01

Developing ergonomic user interfaces for control systems is challenging, especially during machine upgrade and commissioning where several small changes may suddenly be required. Zero-code systems, such as *Inspector*, provide agile features for creating and maintaining control system interfaces. More so, these next generation Zero-code systems bring simplicity and uniformity and brake the boundaries between Users and Developers. In this talk we present *Inspector*, a CERN made Zero-code application development system, and we introduce the major differences and advantages of using Zero-code control systems to develop operational UI.
Syntheses by rules of the speech signal in its amplitude-time representation - melody study - phonetic, translation program

International Nuclear Information System (INIS)

Santamarina, Carole

1975-01-01

The present paper deals with the real-time speech synthesis implemented on a minicomputer. A first program translates the orthographic text into a string of phonetic codes, which is then processed by the synthesis program itself. The method used, a synthesis by rules, directly computes the speech signal in its amplitude-time representation. Emphasis has been put on special cases (diphthongs, 'e muet', consonant-consonant transition) and the implementation of the rhythm and of the melody. (author) [fr
Listeners Experience Linguistic Masking Release in Noise-Vocoded Speech-in-Speech Recognition

Science.gov (United States)

Viswanathan, Navin; Kokkinakis, Kostas; Williams, Brittany T.

2018-01-01

Purpose: The purpose of this study was to evaluate whether listeners with normal hearing perceiving noise-vocoded speech-in-speech demonstrate better intelligibility of target speech when the background speech was mismatched in language (linguistic release from masking [LRM]) and/or location (spatial release from masking [SRM]) relative to the…
Model-based machine learning.

Science.gov (United States)

Bishop, Christopher M

2013-02-13

Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications.
Object-Oriented Support for Adaptive Methods on Paranel Machines

Directory of Open Access Journals (Sweden)

Sandeep Bhatt

1993-01-01

Full Text Available This article reports on experiments from our ongoing project whose goal is to develop a C++ library which supports adaptive and irregular data structures on distributed memory supercomputers. We demonstrate the use of our abstractions in implementing "tree codes" for large-scale N-body simulations. These algorithms require dynamically evolving treelike data structures, as well as load-balancing, both of which are widely believed to make the application difficult and cumbersome to program for distributed-memory machines. The ease of writing the application code on top of our C++ library abstractions (which themselves are application independent, and the low overhead of the resulting C++ code (over hand-crafted C code supports our belief that object-oriented approaches are eminently suited to programming distributed-memory machines in a manner that (to the applications programmer is architecture-independent. Our contribution in parallel programming methodology is to identify and encapsulate general classes of communication and load-balancing strategies useful across applications and MIMD architectures. This article reports experimental results from simulations of half a million particles using multiple methods.
Speech Perception and Short-Term Memory Deficits in Persistent Developmental Speech Disorder

Science.gov (United States)

Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.

2006-01-01

Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech…
Automatic speech recognition (ASR) based approach for speech therapy of aphasic patients: A review

Science.gov (United States)

Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH

2017-09-01

This paper reviews the state-of-the-art an automatic speech recognition (ASR) based approach for speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that transfers human speech into transcript text by matching with the system's library. This is particularly useful in speech rehabilitation therapies as they provide accurate, real-time evaluation for speech input from an individual with speech disorder. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback response to their mistakes. However, the accuracy of ASR is dependent on many factors such as, phoneme recognition, speech continuity, speaker and environmental differences as well as our depth of knowledge on human language understanding. Hence, the review examines recent development of ASR technologies and its performance for individuals with speech and language disorders.
The effect of network degradation on speech recognition

CSIR Research Space (South Africa)

Joubert, G

2005-11-01

Full Text Available become increasingly popular, VoIP (Voice over Internet Protocol) is predicted to become the standard means of spoken telecommunication. As a consequence, a significant amount of research has been undertaken on the effect of various packet... to measure the effect of network traffic degeneration during a VoIP transmission, on speech-recognition accuracy. Sentences from the TIMIT database [2] were selected as basis for comparison. The open-source toolkit SOX [3] was used to code the samples...
Speech and Language Delay

Science.gov (United States)

... OTC Relief for Diarrhea Home Diseases and Conditions Speech and Language Delay Condition Speech and Language Delay Share Print Table of Contents1. ... Treatment6. Everyday Life7. Questions8. Resources What is a speech and language delay? A speech and language delay ...
Plasticity in the Human Speech Motor System Drives Changes in Speech Perception

Science.gov (United States)

Lametti, Daniel R.; Rochet-Capellan, Amélie; Neufeld, Emily; Shiller, Douglas M.

2014-01-01

Recent studies of human speech motor learning suggest that learning is accompanied by changes in auditory perception. But what drives the perceptual change? Is it a consequence of changes in the motor system? Or is it a result of sensory inflow during learning? Here, subjects participated in a speech motor-learning task involving adaptation to altered auditory feedback and they were subsequently tested for perceptual change. In two separate experiments, involving two different auditory perceptual continua, we show that changes in the speech motor system that accompany learning drive changes in auditory speech perception. Specifically, we obtained changes in speech perception when adaptation to altered auditory feedback led to speech production that fell into the phonetic range of the speech perceptual tests. However, a similar change in perception was not observed when the auditory feedback that subjects' received during learning fell into the phonetic range of the perceptual tests. This indicates that the central motor outflow associated with vocal sensorimotor adaptation drives changes to the perceptual classification of speech sounds. PMID:25080594
Small intragenic deletion in FOXP2 associated with childhood apraxia of speech and dysarthria.

Science.gov (United States)

Turner, Samantha J; Hildebrand, Michael S; Block, Susan; Damiano, John; Fahey, Michael; Reilly, Sheena; Bahlo, Melanie; Scheffer, Ingrid E; Morgan, Angela T

2013-09-01

Relatively little is known about the neurobiological basis of speech disorders although genetic determinants are increasingly recognized. The first gene for primary speech disorder was FOXP2, identified in a large, informative family with verbal and oral dyspraxia. Subsequently, many de novo and familial cases with a severe speech disorder associated with FOXP2 mutations have been reported. These mutations include sequencing alterations, translocations, uniparental disomy, and genomic copy number variants. We studied eight probands with speech disorder and their families. Family members were phenotyped using a comprehensive assessment of speech, oral motor function, language, literacy skills, and cognition. Coding regions of FOXP2 were screened to identify novel variants. Segregation of the variant was determined in the probands' families. Variants were identified in two probands. One child with severe motor speech disorder had a small de novo intragenic FOXP2 deletion. His phenotype included features of childhood apraxia of speech and dysarthria, oral motor dyspraxia, receptive and expressive language disorder, and literacy difficulties. The other variant was found in a family in two of three family members with stuttering, and also in the mother with oral motor impairment. This variant was considered a benign polymorphism as it was predicted to be non-pathogenic with in silico tools and found in database controls. This is the first report of a small intragenic deletion of FOXP2 that is likely to be the cause of severe motor speech disorder associated with language and literacy problems. Copyright © 2013 Wiley Periodicals, Inc.
Stability analysis by ERATO code

International Nuclear Information System (INIS)

Tsunematsu, Toshihide; Takeda, Tatsuoki; Matsuura, Toshihiko; Azumi, Masafumi; Kurita, Gen-ichi

1979-12-01

Problems in MHD stability calculations by ERATO code are described; which concern convergence property of results, equilibrium codes, and machine optimization of ERATO code. It is concluded that irregularity on a convergence curve is not due to a fault of the ERATO code itself but due to inappropriate choice of the equilibrium calculation meshes. Also described are a code to calculate an equilibrium as a quasi-inverse problem and a code to calculate an equilibrium as a result of a transport process. Optimization of the code with respect to I/O operations reduced both CPU time and I/O time considerably. With the FACOM230-75 APU/CPU multiprocessor system, the performance is about 6 times as high as with the FACOM230-75 CPU, showing the effectiveness of a vector processing computer for the kind of MHD computations. This report is a summary of the material presented at the ERATO workshop 1979(ORNL), supplemented with some details. (author)
Childhood apraxia of speech: A survey of praxis and typical speech characteristics.

Science.gov (United States)

Malmenholt, Ann; Lohmander, Anette; McAllister, Anita

2017-07-01

The purpose of this study was to investigate current knowledge of the diagnosis childhood apraxia of speech (CAS) in Sweden and compare speech characteristics and symptoms to those of earlier survey findings in mainly English-speakers. In a web-based questionnaire 178 Swedish speech-language pathologists (SLPs) anonymously answered questions about their perception of typical speech characteristics for CAS. They graded own assessment skills and estimated clinical occurrence. The seven top speech characteristics reported as typical for children with CAS were: inconsistent speech production (85%), sequencing difficulties (71%), oro-motor deficits (63%), vowel errors (62%), voicing errors (61%), consonant cluster deletions (54%), and prosodic disturbance (53%). Motor-programming deficits described as lack of automatization of speech movements were perceived by 82%. All listed characteristics were consistent with the American Speech-Language-Hearing Association (ASHA) consensus-based features, Strand's 10-point checklist, and the diagnostic model proposed by Ozanne. The mode for clinical occurrence was 5%. Number of suspected cases of CAS in the clinical caseload was approximately one new patient/year and SLP. The results support and add to findings from studies of CAS in English-speaking children with similar speech characteristics regarded as typical. Possibly, these findings could contribute to cross-linguistic consensus on CAS characteristics.
Comparison of results between different precision MAFIA codes

International Nuclear Information System (INIS)

Farkas, D.; Tice, B.

1990-01-01

In order to satisfy the inquiries of the Mafia Code users at SLAC, an evaluation of these codes was done. This consisted of running a cavity with known solutions. This study considered only the time independent solutions. No wake-field calculations were tried. The two machines involved were the NMFECC Cray (e-machine) at LLNL and the IBM/3081 at SLAC. The primary difference between the implementation of the codes on these machines is that the Cray has 64-bit accuracy while the IBM version has 32-bit accuracy. Unfortunately this study is incomplete as the Post-processor (P3) could not be made to work properly on the SLAC machine. This meant that no q's were calculated and no field patterns were generated. A certain amount of guessing had to be done when constructing the comparison tables. This problem aside, the probable conclusions that may be drawn are: (1) thirty-two bit precision is adequate for frequency determination; (2) sixty-four bit precision is desirable for field determination. This conclusion is deduced from the accuracy statistics. The cavity selected for study was a rectangular one with the dimensions (4,3,5) in centimeters. Only half of this cavity was used (2,3,5) with the x dimension being the one that was halved. The boundary conditions (B.C.) on the plane of symmetry were varied between Neumann and Dirichlet so as to cover all possible modes. Ten (10) modes were ran for each boundary condition
Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?

Science.gov (United States)

Martinelli, Eugenio; Mencattini, Arianna; Daprati, Elena; Di Natale, Corrado

2016-01-01

Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Albeit numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speeches are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc.).
Private speech in teacher-learner interactions in an EFL context: A sociocultural perspective

Directory of Open Access Journals (Sweden)

Nouzar Gheisari

2017-07-01

Full Text Available Theoretically framed within Vygotskyan sociocultural theory (SCT of mind, the present study investigated resurfacing of private speech markers by Iranian elementary female EFL learners in teacher-learner interactions. To this end, an elementary EFL class including 12 female learners and a same-sex teacher were selected as the participants of the study. As for the data, six 30-minute reading comprehension tasks with the interval of every two weeks were videotaped, while each participant was provided with a sensitive MP3 player to keep track of very low private speech markers. Instances of externalized private speech markers were coded and reports were generated for the patterns of private speech markers regarding their form and content. While a high number of literal translation, metalanguage, and switching to L1 mid-utterance were reported, the generated number of such private markers as self-directed questions, reading aloud, reviewing, and self-explanations in L2 was comparatively less which could be due to low L2 proficiency of the learners. The findings of the study, besides highlighting the importance of paying more attention to private speech as a mediating tool in cognitive regulation of learners in doing tasks in L2, suggest that teachers’ type of classroom practice is effective in production of private speech. Pedagogically speaking, the results suggest that instead of seeing L1 private speech markers as detrimental to L2 learning, they should be seen as signs of cognitive regulation when facing challenging tasks.
Evaluation of speech errors in Putonghua speakers with cleft palate: a critical review of methodology issues.

Science.gov (United States)

Jiang, Chenghui; Whitehill, Tara L

2014-04-01

Speech errors associated with cleft palate are well established for English and several other Indo-European languages. Few articles describing the speech of Putonghua (standard Mandarin Chinese) speakers with cleft palate have been published in English language journals. Although methodological guidelines have been published for the perceptual speech evaluation of individuals with cleft palate, there has been no critical review of methodological issues in studies of Putonghua speakers with cleft palate. A literature search was conducted to identify relevant studies published over the past 30 years in Chinese language journals. Only studies incorporating perceptual analysis of speech were included. Thirty-seven articles which met inclusion criteria were analyzed and coded on a number of methodological variables. Reliability was established by having all variables recoded for all studies. This critical review identified many methodological issues. These design flaws make it difficult to draw reliable conclusions about characteristic speech errors in this group of speakers. Specific recommendations are made to improve the reliability and validity of future studies, as well to facilitate cross-center comparisons.
Induction technology optimization code

International Nuclear Information System (INIS)

Caporaso, G.J.; Brooks, A.L.; Kirbie, H.C.

1992-01-01

A code has been developed to evaluate relative costs of induction accelerator driver systems for relativistic klystrons. The code incorporates beam generation, transport and pulsed power system constraints to provide an integrated design tool. The code generates an injector/accelerator combination which satisfies the top level requirements and all system constraints once a small number of design choices have been specified (rise time of the injector voltage and aspect ratio of the ferrite induction cores, for example). The code calculates dimensions of accelerator mechanical assemblies and values of all electrical components. Cost factors for machined parts, raw materials and components are applied to yield a total system cost. These costs are then plotted as a function of the two design choices to enable selection of an optimum design based on various criteria. (Author) 11 refs., 3 figs
Speech-specific audiovisual perception affects identification but not detection of speech

DEFF Research Database (Denmark)

Eskelund, Kasper; Andersen, Tobias

Speech perception is audiovisual as evidenced by the McGurk effect in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate...... of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task and this could increase...... visual detection task. In our first experiment, observers presented with congruent and incongruent audiovisual sine-wave speech stimuli did only show a McGurk effect when informed of the speech nature of the stimulus. Performance on the secondary visual task was very good, thus supporting the finding...

Robust Transmission of Speech LSFs Using Hidden Markov Model-Based Multiple Description Index Assignments

Directory of Open Access Journals (Sweden)

Rondeau Paul

2008-01-01

Full Text Available Speech coding techniques capable of generating encoded representations which are robust against channel losses play an important role in enabling reliable voice communication over packet networks and mobile wireless systems. In this paper, we investigate the use of multiple description index assignments (MDIAs for loss-tolerant transmission of line spectral frequency (LSF coefficients, typically generated by state-of-the-art speech coders. We propose a simulated annealing-based approach for optimizing MDIAs for Markov-model-based decoders which exploit inter- and intraframe correlations in LSF coefficients to reconstruct the quantized LSFs from coded bit streams corrupted by channel losses. Experimental results are presented which compare the performance of a number of novel LSF transmission schemes. These results clearly demonstrate that Markov-model-based decoders, when used in conjunction with optimized MDIA, can yield average spectral distortion much lower than that produced by methods such as interleaving/interpolation, commonly used to combat the packet losses.
Robust Transmission of Speech LSFs Using Hidden Markov Model-Based Multiple Description Index Assignments

Directory of Open Access Journals (Sweden)

Pradeepa Yahampath

2008-03-01

Full Text Available Speech coding techniques capable of generating encoded representations which are robust against channel losses play an important role in enabling reliable voice communication over packet networks and mobile wireless systems. In this paper, we investigate the use of multiple description index assignments (MDIAs for loss-tolerant transmission of line spectral frequency (LSF coefficients, typically generated by state-of-the-art speech coders. We propose a simulated annealing-based approach for optimizing MDIAs for Markov-model-based decoders which exploit inter- and intraframe correlations in LSF coefficients to reconstruct the quantized LSFs from coded bit streams corrupted by channel losses. Experimental results are presented which compare the performance of a number of novel LSF transmission schemes. These results clearly demonstrate that Markov-model-based decoders, when used in conjunction with optimized MDIA, can yield average spectral distortion much lower than that produced by methods such as interleaving/interpolation, commonly used to combat the packet losses.
Panda code

International Nuclear Information System (INIS)

Altomare, S.; Minton, G.

1975-02-01

PANDA is a new two-group one-dimensional (slab/cylinder) neutron diffusion code designed to replace and extend the FAB series. PANDA allows for the nonlinear effects of xenon, enthalpy and Doppler. Fuel depletion is allowed. PANDA has a completely general search facility which will seek criticality, maximize reactivity, or minimize peaking. Any single parameter may be varied in a search. PANDA is written in FORTRAN IV, and as such is nearly machine independent. However, PANDA has been written with the present limitations of the Westinghouse CDC-6600 system in mind. Most computation loops are very short, and the code is less than half the useful 6600 memory size so that two jobs can reside in the core at once. (auth)
Speech-Language Therapy (For Parents)

Science.gov (United States)

... Staying Safe Videos for Educators Search English Español Speech-Language Therapy KidsHealth / For Parents / Speech-Language Therapy ... most kids with speech and/or language disorders. Speech Disorders, Language Disorders, and Feeding Disorders A speech ...
Digital speech processing using Matlab

CERN Document Server

Gopi, E S

2014-01-01

Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.
Developmental apraxia of speech in children. Quantitive assessment of speech characteristics

NARCIS (Netherlands)

Thoonen, G.H.J.

1998-01-01

Developmental apraxia of speech (DAS) in children is a speech disorder, supposed to have a neurological origin, which is commonly considered to result from particular deficits in speech processing (i.e., phonological planning, motor programming). However, the label DAS has often been used as
Development of Fractal Pattern Making Application using L-System for Enhanced Machine Controller

Directory of Open Access Journals (Sweden)

Gunawan Alexander A S

2014-03-01

Full Text Available One big issue facing the industry today is an automated machine lack of flexibility for customization because it is designed by the manufacturers based on certain standards. In this research, it is developed customized application software for CNC (Computer Numerically Controlled machines using open source platform. The application is enable us to create designs by means of fractal patterns using L-System, developed by turtle geometry interpretation and Python programming languages. The result of the application is the G-Code of fractal pattern formed by the method of L-System. In the experiment on the CNC machine, the G-Code of fractal pattern which involving the branching structure has been able to run well.
Oil and gas field code master list, 1993

Energy Technology Data Exchange (ETDEWEB)

1993-12-16

This document contains data collected through October 1993 and provides standardized field name spellings and codes for all identified oil and/or gas fields in the United States. Other Federal and State government agencies, as well as industry, use the EIA Oil and Gas Field Code Master List as the standard for field identification. A machine-readable version of the Oil and Gas Field Code Master List is available from the National Technical Information Service.
Deep Visual Attributes vs. Hand-Crafted Audio Features on Multidomain Speech Emotion Recognition

Directory of Open Access Journals (Sweden)

Michalis Papakostas

2017-06-01

Full Text Available Emotion recognition from speech may play a crucial role in many applications related to human–computer interaction or understanding the affective state of users in certain tasks, where other modalities such as video or physiological parameters are unavailable. In general, a human’s emotions may be recognized using several modalities such as analyzing facial expressions, speech, physiological parameters (e.g., electroencephalograms, electrocardiograms etc. However, measuring of these modalities may be difficult, obtrusive or require expensive hardware. In that context, speech may be the best alternative modality in many practical applications. In this work we present an approach that uses a Convolutional Neural Network (CNN functioning as a visual feature extractor and trained using raw speech information. In contrast to traditional machine learning approaches, CNNs are responsible for identifying the important features of the input thus, making the need of hand-crafted feature engineering optional in many tasks. In this paper no extra features are required other than the spectrogram representations and hand-crafted features were only extracted for validation purposes of our method. Moreover, it does not require any linguistic model and is not specific to any particular language. We compare the proposed approach using cross-language datasets and demonstrate that it is able to provide superior results vs. traditional ones that use hand-crafted features.
Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages

Directory of Open Access Journals (Sweden)

Sadaoki Furui

2009-01-01

Full Text Available Text corpus size is an important issue when building a language model (LM. This is a particularly important issue for languages where little data is available. This paper introduces an LM adaptation technique to improve an LM built using a small amount of task-dependent text with the help of a machine-translated text corpus. Icelandic speech recognition experiments were performed using data, machine translated (MT from English to Icelandic on a word-by-word and sentence-by-sentence basis. LM interpolation using the baseline LM and an LM built from either word-by-word or sentence-by-sentence translated text reduced the word error rate significantly when manually obtained utterances used as a baseline were very sparse.
Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems.

Science.gov (United States)

Greene, Beth G; Logan, John S; Pisoni, David B

1986-03-01

We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.
Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems

Science.gov (United States)

GREENE, BETH G.; LOGAN, JOHN S.; PISONI, DAVID B.

2012-01-01

We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered. PMID:23225916
WORD SENSE DISAMBIGUATION FOR TAMIL LANGUAGE USING PART-OF-SPEECH AND CLUSTERING TECHNIQUE

Directory of Open Access Journals (Sweden)

P. ISWARYA

2017-09-01

Full Text Available Word sense disambiguation is an important task in Natural Language Processing (NLP, and this paper concentrates on the problem of target word selection in machine translation. The proposed method called enhanced Word Sense Disambiguation with Part-of-Speech and Clustering based Sensecollocation (WSDPCS consists of two steps namely (i Part-of-Speech (POS tagger in disambiguating word senses and (ii Enhanced with Clustering and Sense-collocation dictionary based disambiguation. In the first step an ambiguous Tamil words are disambiguated using Tamil and English POS Tagger. If it has same type of POS category labels, then it passes the word to the next step. In the second step ambiguity is resolved using sense-collocation dictionary. The experimental analysis shows that the accuracy of proposed WSDPCS method achieves 1.86% improvement over an existing method.
The speech perception skills of children with and without speech sound disorder.

Science.gov (United States)

Hearnshaw, Stephanie; Baker, Elise; Munro, Natalie

To investigate whether Australian-English speaking children with and without speech sound disorder (SSD) differ in their overall speech perception accuracy. Additionally, to investigate differences in the perception of specific phonemes and the association between speech perception and speech production skills. Twenty-five Australian-English speaking children aged 48-60 months participated in this study. The SSD group included 12 children and the typically developing (TD) group included 13 children. Children completed routine speech and language assessments in addition to an experimental Australian-English lexical and phonetic judgement task based on Rvachew's Speech Assessment and Interactive Learning System (SAILS) program (Rvachew, 2009). This task included eight words across four word-initial phonemes-/k, ɹ, ʃ, s/. Children with SSD showed significantly poorer perceptual accuracy on the lexical and phonetic judgement task compared with TD peers. The phonemes /ɹ/ and /s/ were most frequently perceived in error across both groups. Additionally, the phoneme /ɹ/ was most commonly produced in error. There was also a positive correlation between overall speech perception and speech production scores. Children with SSD perceived speech less accurately than their typically developing peers. The findings suggest that an Australian-English variation of a lexical and phonetic judgement task similar to the SAILS program is promising and worthy of a larger scale study. Copyright © 2017 Elsevier Inc. All rights reserved.
Speech Matters

DEFF Research Database (Denmark)

Hasse Jørgensen, Stina

2011-01-01

About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011.......About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011....
Hate speech

Directory of Open Access Journals (Sweden)

Anne Birgitta Nilsen

2014-12-01

Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance.The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience.The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the
Speech Inconsistency in Children with Childhood Apraxia of Speech, Language Impairment, and Speech Delay: Depends on the Stimuli

Science.gov (United States)

Iuzzini-Seigel, Jenya; Hogan, Tiffany P.; Green, Jordan R.

2017-01-01

Purpose: The current research sought to determine (a) if speech inconsistency is a core feature of childhood apraxia of speech (CAS) or if it is driven by comorbid language impairment that affects a large subset of children with CAS and (b) if speech inconsistency is a sensitive and specific diagnostic marker that can differentiate between CAS and…
The development of a speech act coding scheme to characterize communication patterns under an off-normal situation in nuclear power plants

International Nuclear Information System (INIS)

Kim, Seung Hwan; Park, Jin Kyun

2009-01-01

Since communication is an important means to exchange information between individuals/teams or auxiliary means to share resources and information given in the team and group activity, effective communication is the prerequisite for construct powerful teamwork by a sharing mental model. Therefore, unless communication is performed efficiently, the quality of task and performance of team lower. Furthermore, since communication is highly related to situation awareness during team activities, inappropriate communication causes a lack of situation awareness and tension and stress are intensified and errors are increased. According to lesson learned from several accidents that have actually occurred in nuclear power plant (NPP), consequence of accident leads most critical results and is more dangerous than those of other industries. In order to improve operator's cope ability and operation ability through simulation training with various off-normal condition, the operation groups are trained regularly every 6 months in the training center of reference NPP. The objective of this study is to suggest modified speech act coding scheme and to elucidate the communication pattern characteristics of an operator's conversation during an abnormal situation in NPP
A Machine Learning Perspective on Predictive Coding with PAQ

OpenAIRE

Knoll, Byron; de Freitas, Nando

2011-01-01

PAQ8 is an open source lossless data compression algorithm that currently achieves the best compression rates on many benchmarks. This report presents a detailed description of PAQ8 from a statistical machine learning perspective. It shows that it is possible to understand some of the modules of PAQ8 and use this understanding to improve the method. However, intuitive statistical explanations of the behavior of other modules remain elusive. We hope the description in this report will be a sta...
Governing sexual behaviour through humanitarian codes of conduct.

Science.gov (United States)

Matti, Stephanie

2015-10-01

Since 2001, there has been a growing consensus that sexual exploitation and abuse of intended beneficiaries by humanitarian workers is a real and widespread problem that requires governance. Codes of conduct have been promoted as a key mechanism for governing the sexual behaviour of humanitarian workers and, ultimately, preventing sexual exploitation and abuse (PSEA). This article presents a systematic study of PSEA codes of conduct adopted by humanitarian non-governmental organisations (NGOs) and how they govern the sexual behaviour of humanitarian workers. It draws on Foucault's analytics of governance and speech act theory to examine the findings of a survey of references to codes of conduct made on the websites of 100 humanitarian NGOs, and to analyse some features of the organisation-specific PSEA codes identified. © 2015 The Author(s). Disasters © Overseas Development Institute, 2015.

Paracantor: A two group, two region reactor code

Energy Technology Data Exchange (ETDEWEB)

Stone, Stuart

1956-07-01

Paracantor I a two energy group, two region, time independent reactor code, which obtains a closed solution for a critical reactor assembly. The code deals with cylindrical reactors of finite length and with a radial reflector of finite thickness. It is programmed for the 1.B.M: Magnetic Drum Data-Processing Machine, Type 650. The limited memory space available does not permit a flux solution to be included in the basic Paracantor code. A supplementary code, Paracantor 11, has been programmed which computes fluxes, .including adjoint fluxes, from the .output of Paracamtor I.
Ink-constrained halftoning with application to QR codes

Science.gov (United States)

Bayeh, Marzieh; Compaan, Erin; Lindsey, Theodore; Orlow, Nathan; Melczer, Stephen; Voller, Zachary

2014-01-01

This paper examines adding visually significant, human recognizable data into QR codes without affecting their machine readability by utilizing known methods in image processing. Each module of a given QR code is broken down into pixels, which are halftoned in such a way as to keep the QR code structure while revealing aspects of the secondary image to the human eye. The loss of information associated to this procedure is discussed, and entropy values are calculated for examples given in the paper. Numerous examples of QR codes with embedded images are included.
Clear Speech - Mere Speech? How segmental and prosodic speech reduction shape the impression that speakers create on listeners

DEFF Research Database (Denmark)

Niebuhr, Oliver

2017-01-01

of reduction levels and perceived speaker attributes in which moderate reduction can make a better impression on listeners than no reduction. In addition to its relevance in reduction models and theories, this interplay is instructive for various fields of speech application from social robotics to charisma...... whether variation in the degree of reduction also has a systematic effect on the attributes we ascribe to the speaker who produces the speech signal. A perception experiment was carried out for German in which 46 listeners judged whether or not speakers showing 3 different combinations of segmental...... and prosodic reduction levels (unreduced, moderately reduced, strongly reduced) are appropriately described by 13 physical, social, and cognitive attributes. The experiment shows that clear speech is not mere speech, and less clear speech is not just reduced either. Rather, results revealed a complex interplay...
Audiovisual Temporal Recalibration for Speech in Synchrony Perception and Speech Identification

Science.gov (United States)

Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato

We investigated whether audiovisual synchrony perception for speech could change after observation of the audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is re-calibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, participants performed an audiovisual simultaneity judgment task (i.e., a direct measurement of the audiovisual synchrony perception) in terms of the speech signal after observation of the speech stimuli which had a constant audiovisual lag. The results showed that the “simultaneous” responses (i.e., proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, we adopted the McGurk identification task (i.e., an indirect measurement of the audiovisual synchrony perception) to exclude the possibility that this modulation of synchrony perception was solely attributable to the response strategy using stimuli identical to those of Experiment 1. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, it was shown that audiovisual synchrony perception for speech could be modulated following exposure to constant lag both in direct and indirect measurement. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.
Contribution of auditory working memory to speech understanding in mandarin-speaking cochlear implant users.

Science.gov (United States)

Tao, Duoduo; Deng, Rui; Jiang, Ye; Galvin, John J; Fu, Qian-Jie; Chen, Bing

2014-01-01

of voice pitch cues (albeit poorly coded by the CI) did not influence the relationship between working memory and speech perception.
Under-resourced speech recognition based on the speech manifold

CSIR Research Space (South Africa)

Sahraeian, R

2015-09-01

Full Text Available Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...
PRACTICING SPEECH THERAPY INTERVENTION FOR SOCIAL INTEGRATION OF CHILDREN WITH SPEECH DISORDERS

Directory of Open Access Journals (Sweden)

Martin Ofelia POPESCU

2016-11-01

Full Text Available The article presents a concise speech correction intervention program in of dyslalia in conjunction with capacity development of intra, interpersonal and social integration of children with speech disorders. The program main objectives represent: the potential increasing of individual social integration by correcting speech disorders in conjunction with intra- and interpersonal capacity, the potential growth of children and community groups for social integration by optimizing the socio-relational context of children with speech disorder. In the program were included 60 children / students with dyslalia speech disorders (monomorphic and polymorphic dyslalia, from 11 educational institutions - 6 kindergartens and 5 schools / secondary schools, joined with inter-school logopedic centre (CLI from Targu Jiu city and areas of Gorj district. The program was implemented under the assumption that therapeutic-formative intervention to correct speech disorders and facilitate the social integration will lead, in combination with correct pronunciation disorders, to social integration optimization of children with speech disorders. The results conirm the hypothesis and gives facts about the intervention program eficiency.
APC-II: an electron beam propagation code

International Nuclear Information System (INIS)

Iwan, D.C.; Freeman, J.R.

1984-05-01

The computer code APC-II simulates the propagation of a relativistic electron beam through air. APC-II is an updated version of the APC envelope model code. It incorporates an improved conductivity model which significantly extends the range of stable calculations. A number of test cases show that these new models are capable of reproducing the simulations of the original APC code. As the result of a major restructuring and reprogramming of the code, APC-II is now friendly to both the occasional user and the experienced user who wishes to make modifications. Most of the code is in standard ANS-II Fortran 77 so that it can be easily transported between machines
Techniques and applications for binaural sound manipulation in human-machine interfaces

Science.gov (United States)

Begault, Durand R.; Wenzel, Elizabeth M.

1992-01-01

The implementation of binaural sound to speech and auditory sound cues (auditory icons) is addressed from both an applications and technical standpoint. Techniques overviewed include processing by means of filtering with head-related transfer functions. Application to advanced cockpit human interface systems is discussed, although the techniques are extendable to any human-machine interface. Research issues pertaining to three-dimensional sound displays under investigation at the Aerospace Human Factors Division at NASA Ames Research Center are described.
Schizophrenia alters intra-network functional connectivity in the caudate for detecting speech under informational speech masking conditions.

Science.gov (United States)

Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang

2018-04-04

Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to health listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its functions of suppressing masking-speech signals.
Speech disorder prevention

Directory of Open Access Journals (Sweden)

Miladis Fornaris-Méndez

2017-04-01

Full Text Available Language therapy has trafficked from a medical focus until a preventive focus. However, difficulties are evidenced in the development of this last task, because he is devoted bigger space to the correction of the disorders of the language. Because the speech disorders is the dysfunction with more frequently appearance, acquires special importance the preventive work that is developed to avoid its appearance. Speech education since early age of the childhood makes work easier for prevent the appearance of speech disorders in the children. The present work has as objective to offer different activities for the prevention of the speech disorders.
Speech and Speech-Related Quality of Life After Late Palate Repair: A Patient's Perspective.

Science.gov (United States)

Schönmeyr, Björn; Wendby, Lisa; Sharma, Mitali; Jacobson, Lia; Restrepo, Carolina; Campbell, Alex

2015-07-01

Many patients with cleft palate deformities worldwide receive treatment at a later age than is recommended for normal speech to develop. The outcomes after late palate repairs in terms of speech and quality of life (QOL) still remain largely unstudied. In the current study, questionnaires were used to assess the patients' perception of speech and QOL before and after primary palate repair. All of the patients were operated at a cleft center in northeast India and had a cleft palate with a normal lip or with a cleft lip that had been previously repaired. A total of 134 patients (7-35 years) were interviewed preoperatively and 46 patients (7-32 years) were assessed in the postoperative survey. The survey showed that scores based on the speech handicap index, concerning speech and speech-related QOL, did not improve postoperatively. In fact, the questionnaires indicated that the speech became more unpredictable (P reported that their self-confidence had improved after the operation. Thus, the majority of interviewed patients who underwent late primary palate repair were satisfied with the surgery. At the same time, speech and speech-related QOL did not improve according to the speech handicap index-based survey. Speech predictability may even become worse and nasal regurgitation may increase after late palate repair, according to these results.
Visual Speech Fills in Both Discrimination and Identification of Non-Intact Auditory Speech in Children

Science.gov (United States)

Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Herve

2018-01-01

To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. baez) coupled to non-intact (excised onsets) auditory speech (signified…
Tackling the complexity in speech

DEFF Research Database (Denmark)

section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore......, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about...... the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations...
Temporal predictive mechanisms modulate motor reaction time during initiation and inhibition of speech and hand movement.

Science.gov (United States)

Johari, Karim; Behroozmand, Roozbeh

2017-08-01

Skilled movement is mediated by motor commands executed with extremely fine temporal precision. The question of how the brain incorporates temporal information to perform motor actions has remained unanswered. This study investigated the effect of stimulus temporal predictability on response timing of speech and hand movement. Subjects performed a randomized vowel vocalization or button press task in two counterbalanced blocks in response to temporally-predictable and unpredictable visual cues. Results indicated that speech and hand reaction time was decreased for predictable compared with unpredictable stimuli. This finding suggests that a temporal predictive code is established to capture temporal dynamics of sensory cues in order to produce faster movements in responses to predictable stimuli. In addition, results revealed a main effect of modality, indicating faster hand movement compared with speech. We suggest that this effect is accounted for by the inherent complexity of speech production compared with hand movement. Lastly, we found that movement inhibition was faster than initiation for both hand and speech, suggesting that movement initiation requires a longer processing time to coordinate activities across multiple regions in the brain. These findings provide new insights into the mechanisms of temporal information processing during initiation and inhibition of speech and hand movement. Copyright © 2017 Elsevier B.V. All rights reserved.
Speech in spinocerebellar ataxia.

Science.gov (United States)

Schalling, Ellika; Hartelius, Lena

2013-12-01

Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.
Spiritualist Writing Machines: Telegraphy, Typtology, Typewriting

Directory of Open Access Journals (Sweden)

Anthony Enns

2015-09-01

Full Text Available This paper examines how religious concepts both reflected and informed the development of new technologies for encoding, transmitting, and printing written information. While many spiritualist writing machines were based on existing technologies that were repurposed for spirit communication, others prefigured or even inspired more advanced technological innovations. The history of spiritualist writing machines thus not only represents a response to the rise of new media technologies in the nineteenth century, but it also reflects a set of cultural demands that helped to shape the development of new technologies, such as the need to replace handwriting with discrete, uniform lettering, which accelerated the speed of composition; the need to translate written information into codes, which could be transmitted across vast distances; and the need to automate the process of transmitting, translating, and transcribing written information, which seemed to endow the machines themselves with a certain degree of autonomy or even intelligence. While spiritualists and inventors were often (but not always motivated by different goals, the development of spiritualist writing machines and the development of technological writing machines were nevertheless deeply interrelated and interdependent.
Filtering, Coding, and Compression with Malvar Wavelets

Science.gov (United States)

1993-12-01

speech coding techniques being investigated by the military (38). Imagery: Space imagery often requires adaptive restoration to deblur out-of-focus...and blurred image, find an estimate of the ideal image using a priori information about the blur, noise , and the ideal image" (12). The research for...recording can be described as the original signal convolved with impulses , which appear as echoes in the seismic event. The term deconvolution indicates
Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

Science.gov (United States)

Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

2014-01-01

Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…
A causal test of the motor theory of speech perception: a case of impaired speech production and spared speech perception.

Science.gov (United States)

Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z

2015-01-01

The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.

Syrthes thermal code and Estet or N3S fluid mechanics codes coupling; Couplage du code de thermique Syrthes et des codes de mecanique des fluides N3S et ou Estet

Energy Technology Data Exchange (ETDEWEB)

Peniguel, C [Electricite de France (EDF), 78 - Chatou (France). Direction des Etudes et Recherches; Rupp, I [SIMULOG, 78 - Guyancourt (France)

1997-06-01

EDF has developed numerical codes for modeling the conductive, radiative and convective thermal transfers and their couplings in complex industrial configurations: the convection in a fluid is solved by Estet in finite volumes or N3S in finite elements, the conduction is solved by Syrthes and the wall-to-wall thermal radiation is modelled by Syrthes with the help of a radiosity method. Syrthes controls the different heat exchanges which may occur between fluid and solid domains, using an explicit iterative method. An extension of Syrthes has been developed in order to allow the consideration of configurations where several fluid codes operate simultaneously, using ``message passing`` tools such as PVM (Parallel Virtual Machine) and the Calcium code coupler developed at EDF. Application examples are given
The Relationship between Speech Production and Speech Perception Deficits in Parkinson's Disease

Science.gov (United States)

De Keyser, Kim; Santens, Patrick; Bockstael, Annelies; Botteldooren, Dick; Talsma, Durk; De Vos, Stefanie; Van Cauwenberghe, Mieke; Verheugen, Femke; Corthals, Paul; De Letter, Miet

2016-01-01

Purpose: This study investigated the possible relationship between hypokinetic speech production and speech intensity perception in patients with Parkinson's disease (PD). Method: Participants included 14 patients with idiopathic PD and 14 matched healthy controls (HCs) with normal hearing and cognition. First, speech production was objectified…
Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

Science.gov (United States)

Borrie, Stephanie A

2015-03-01

This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.
The treatment of apraxia of speech : Speech and music therapy, an innovative joint effort

NARCIS (Netherlands)

Hurkmans, Josephus Johannes Stephanus

2016-01-01

Apraxia of Speech (AoS) is a neurogenic speech disorder. A wide variety of behavioural methods have been developed to treat AoS. Various therapy programmes use musical elements to improve speech production. A unique therapy programme combining elements of speech therapy and music therapy is called
Tool set for distributed real-time machine control

Science.gov (United States)

Carrott, Andrew J.; Wright, Christopher D.; West, Andrew A.; Harrison, Robert; Weston, Richard H.

1997-01-01

Demands for increased control capabilities require next generation manufacturing machines to comprise intelligent building elements, physically located at the point where the control functionality is required. Networks of modular intelligent controllers are increasingly designed into manufacturing machines and usable standards are slowly emerging. To implement a control system using off-the-shelf intelligent devices from multi-vendor sources requires a number of well defined activities, including (a) the specification and selection of interoperable control system components, (b) device independent application programming and (c) device configuration, management, monitoring and control. This paper briefly discusses the support for the above machine lifecycle activities through the development of an integrated computing environment populated with an extendable software toolset. The toolset supports machine builder activities such as initial control logic specification, logic analysis, machine modeling, mechanical verification, application programming, automatic code generation, simulation/test, version control, distributed run-time support and documentation. The environment itself consists of system management tools and a distributed object-oriented database which provides storage for the outputs from machine lifecycle activities and specific target control solutions.
Motor Speech Phenotypes of Frontotemporal Dementia, Primary Progressive Aphasia, and Progressive Apraxia of Speech

Science.gov (United States)

Poole, Matthew L.; Brodtmann, Amy; Darby, David; Vogel, Adam P.

2017-01-01

Purpose: Our purpose was to create a comprehensive review of speech impairment in frontotemporal dementia (FTD), primary progressive aphasia (PPA), and progressive apraxia of speech in order to identify the most effective measures for diagnosis and monitoring, and to elucidate associations between speech and neuroimaging. Method: Speech and…
Quantum Virtual Machine (QVM)

Energy Technology Data Exchange (ETDEWEB)

2016-11-18

There is a lack of state-of-the-art HPC simulation tools for simulating general quantum computing. Furthermore, there are no real software tools that integrate current quantum computers into existing classical HPC workflows. This product, the Quantum Virtual Machine (QVM), solves this problem by providing an extensible framework for pluggable virtual, or physical, quantum processing units (QPUs). It enables the execution of low level quantum assembly codes and returns the results of such executions.
Defending Malicious Script Attacks Using Machine Learning Classifiers

Directory of Open Access Journals (Sweden)

Nayeem Khan

2017-01-01

Full Text Available The web application has become a primary target for cyber criminals by injecting malware especially JavaScript to perform malicious activities for impersonation. Thus, it becomes an imperative to detect such malicious code in real time before any malicious activity is performed. This study proposes an efficient method of detecting previously unknown malicious java scripts using an interceptor at the client side by classifying the key features of the malicious code. Feature subset was obtained by using wrapper method for dimensionality reduction. Supervised machine learning classifiers were used on the dataset for achieving high accuracy. Experimental results show that our method can efficiently classify malicious code from benign code with promising results.
An analysis of the masking of speech by competing speech using self-report data.

Science.gov (United States)

Agus, Trevor R; Akeroyd, Michael A; Noble, William; Bhullar, Navjot

2009-01-01

Many of the items in the "Speech, Spatial, and Qualities of Hearing" scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.
Neural pathways for visual speech perception

Directory of Open Access Journals (Sweden)

Lynne E Bernstein

2014-12-01

Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1 The visual perception of speech relies on visual pathway representations of speech qua speech. (2 A proposed site of these representations, the temporal visual speech area (TVSA has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS. (3 Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.
Part-of-speech effects on text-to-speech synthesis

CSIR Research Space (South Africa)

Schlunz, GI

2010-11-01

Full Text Available One of the goals of text-to-speech (TTS) systems is to produce natural-sounding synthesised speech. Towards this end various natural language processing (NLP) tasks are performed to model the prosodic aspects of the TTS voice. One of the fundamental...
Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?

Directory of Open Access Journals (Sweden)

Eugenio Martinelli

Full Text Available Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Albeit numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speeches are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc. and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc..
75 FR 26701 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

Science.gov (United States)

2010-05-12

...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities... proposed compensation rates for Interstate TRS, Speech-to-Speech Services (STS), Captioned Telephone... costs reported in the data submitted to NECA by VRS providers. In this regard, document DA 10-761 also...
[Non-speech oral motor treatment efficacy for children with developmental speech sound disorders].

Science.gov (United States)

Ygual-Fernandez, A; Cervera-Merida, J F

2016-01-01

In the treatment of speech disorders by means of speech therapy two antagonistic methodological approaches are applied: non-verbal ones, based on oral motor exercises (OME), and verbal ones, which are based on speech processing tasks with syllables, phonemes and words. In Spain, OME programmes are called 'programas de praxias', and are widely used and valued by speech therapists. To review the studies conducted on the effectiveness of OME-based treatments applied to children with speech disorders and the theoretical arguments that could justify, or not, their usefulness. Over the last few decades evidence has been gathered about the lack of efficacy of this approach to treat developmental speech disorders and pronunciation problems in populations without any neurological alteration of motor functioning. The American Speech-Language-Hearing Association has advised against its use taking into account the principles of evidence-based practice. The knowledge gathered to date on motor control shows that the pattern of mobility and its corresponding organisation in the brain are different in speech and other non-verbal functions linked to nutrition and breathing. Neither the studies on their effectiveness nor the arguments based on motor control studies recommend the use of OME-based programmes for the treatment of pronunciation problems in children with developmental language disorders.
Breaking the language barrier: machine assisted diagnosis using the medical speech translator.

Science.gov (United States)

Starlander, Marianne; Bouillon, Pierrette; Rayner, Manny; Chatzichrisafis, Nikos; Hockey, Beth Ann; Isahara, Hitoshi; Kanzaki, Kyoko; Nakao, Yukie; Santaholma, Marianne

2005-01-01

In this paper, we describe and evaluate an Open Source medical speech translation system (MedSLT) intended for safety-critical applications. The aim of this system is to eliminate the language barriers in emergency situation. It translates spoken questions from English into French, Japanese and Finnish in three medical subdomains (headache, chest pain and abdominal pain), using a vocabulary of about 250-400 words per sub-domain. The architecture is a compromise between fixed-phrase translation on one hand and complex linguistically-based systems on the other. Recognition is guided by a Context Free Grammar Language Model compiled from a general unification grammar, automatically specialised for the domain. We present an evaluation of this initial prototype that shows the advantages of this grammar-based approach for this particular translation task in term of both reliability and use.
75 FR 54040 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

Science.gov (United States)

2010-09-03

...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...; speech-to-speech (STS); pay-per-call (900) calls; types of calls; and equal access to interexchange... of a report, due April 16, 2011, addressing whether it is necessary for the waivers to remain in...
Does the Holland Code Predict Job Satisfaction and Productivity in Clothing Factory Workers?

Science.gov (United States)

Heesacker, Martin; And Others

1988-01-01

Administered Self-Directed Search to sewing machine operators to determine Holland code, and assessed work productivity, job satisfaction, absenteeism, and insurance claims. Most workers were of the Social code. Social subjects were the most satisfied, Conventional and Realistic subjects next, and subjects of other codes less so. Productivity of…
Environmental Contamination of Normal Speech.

Science.gov (United States)

Harley, Trevor A.

1990-01-01

Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…
Computer codes for the calculation of vibrations in machines and structures

International Nuclear Information System (INIS)

1989-01-01

After an introductory paper on the typical requirements to be met by vibration calculations, the first two sections of the conference papers present universal as well as specific finite-element codes tailored to solve individual problems. The calculation of dynamic processes increasingly now in addition to the finite elements applies the method of multi-component systems which takes into account rigid bodies or partial structures and linking and joining elements. This method, too, is explained referring to universal computer codes and to special versions. In mechanical engineering, rotary vibrations are a major problem, and under this topic, conference papers exclusively deal with codes that also take into account special effects such as electromechanical coupling, non-linearities in clutches, etc. (orig./HP) [de
Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

Science.gov (United States)

Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

2018-05-01

Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.

Objective Prediction of Hearing Aid Benefit Across Listener Groups Using Machine Learning: Speech Recognition Performance With Binaural Noise-Reduction Algorithms.

Science.gov (United States)

Schädler, Marc R; Warzybok, Anna; Kollmeier, Birger

2018-01-01

The simulation framework for auditory discrimination experiments (FADE) was adopted and validated to predict the individual speech-in-noise recognition performance of listeners with normal and impaired hearing with and without a given hearing-aid algorithm. FADE uses a simple automatic speech recognizer (ASR) to estimate the lowest achievable speech reception thresholds (SRTs) from simulated speech recognition experiments in an objective way, independent from any empirical reference data. Empirical data from the literature were used to evaluate the model in terms of predicted SRTs and benefits in SRT with the German matrix sentence recognition test when using eight single- and multichannel binaural noise-reduction algorithms. To allow individual predictions of SRTs in binaural conditions, the model was extended with a simple better ear approach and individualized by taking audiograms into account. In a realistic binaural cafeteria condition, FADE explained about 90% of the variance of the empirical SRTs for a group of normal-hearing listeners and predicted the corresponding benefits with a root-mean-square prediction error of 0.6 dB. This highlights the potential of the approach for the objective assessment of benefits in SRT without prior knowledge about the empirical data. The predictions for the group of listeners with impaired hearing explained 75% of the empirical variance, while the individual predictions explained less than 25%. Possibly, additional individual factors should be considered for more accurate predictions with impaired hearing. A competing talker condition clearly showed one limitation of current ASR technology, as the empirical performance with SRTs lower than -20 dB could not be predicted.
Integrating speech in time depends on temporal expectancies and attention.

Science.gov (United States)

Scharinger, Mathias; Steinberg, Johanna; Tavano, Alessandro

2017-08-01

Sensory information that unfolds in time, such as in speech perception, relies on efficient chunking mechanisms in order to yield optimally-sized units for further processing. Whether or not two successive acoustic events receive a one-unit or a two-unit interpretation seems to depend on the fit between their temporal extent and a stipulated temporal window of integration. However, there is ongoing debate on how flexible this temporal window of integration should be, especially for the processing of speech sounds. Furthermore, there is no direct evidence of whether attention may modulate the temporal constraints on the integration window. For this reason, we here examine how different word durations, which lead to different temporal separations of sound onsets, interact with attention. In an Electroencephalography (EEG) study, participants actively and passively listened to words where word-final consonants were occasionally omitted. Words had either a natural duration or were artificially prolonged in order to increase the separation of speech sound onsets. Omission responses to incomplete speech input, originating in left temporal cortex, decreased when the critical speech sound was separated from previous sounds by more than 250 msec, i.e., when the separation was larger than the stipulated temporal window of integration (125-150 msec). Attention, on the other hand, only increased omission responses for stimuli with natural durations. We complemented the event-related potential (ERP) analyses by a frequency-domain analysis on the stimulus presentation rate. Notably, the power of stimulation frequency showed the same duration and attention effects than the omission responses. We interpret these findings on the background of existing research on temporal integration windows and further suggest that our findings may be accounted for within the framework of predictive coding. Copyright © 2017 Elsevier Ltd. All rights reserved.
Perceived Liveliness and Speech Comprehensibility in Aphasia: The Effects of Direct Speech in Auditory Narratives

Science.gov (United States)

Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike

2014-01-01

Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in "healthy" communication direct speech constructions contribute to the liveliness, and indirectly to the comprehensibility, of speech.…
Speech Enhancement by MAP Spectral Amplitude Estimation Using a Super-Gaussian Speech Model

Directory of Open Access Journals (Sweden)

Lotter Thomas

2005-01-01

Full Text Available This contribution presents two spectral amplitude estimators for acoustical background noise suppression based on maximum a posteriori estimation and super-Gaussian statistical modelling of the speech DFT amplitudes. The probability density function of the speech spectral amplitude is modelled with a simple parametric function, which allows a high approximation accuracy for Laplace- or Gamma-distributed real and imaginary parts of the speech DFT coefficients. Also, the statistical model can be adapted to optimally fit the distribution of the speech spectral amplitudes for a specific noise reduction system. Based on the super-Gaussian statistical model, computationally efficient maximum a posteriori speech estimators are derived, which outperform the commonly applied Ephraim-Malah algorithm.
Exploring the role of brain oscillations in speech perception in noise: Intelligibility of isochronously retimed speech

Directory of Open Access Journals (Sweden)

Vincent Aubanel

2016-08-01

Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
ClearTK 2.0: Design Patterns for Machine Learning in UIMA.

Science.gov (United States)

Bethard, Steven; Ogren, Philip; Becker, Lee

2014-05-01

ClearTK adds machine learning functionality to the UIMA framework, providing wrappers to popular machine learning libraries, a rich feature extraction library that works across different classifiers, and utilities for applying and evaluating machine learning models. Since its inception in 2008, ClearTK has evolved in response to feedback from developers and the community. This evolution has followed a number of important design principles including: conceptually simple annotator interfaces, readable pipeline descriptions, minimal collection readers, type system agnostic code, modules organized for ease of import, and assisting user comprehension of the complex UIMA framework.
Ear, Hearing and Speech

DEFF Research Database (Denmark)

Poulsen, Torben

2000-01-01

An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)......An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)...
Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech and music

Directory of Open Access Journals (Sweden)

Hwee Ling eLee

2014-08-01

Full Text Available This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300 ±240, ±180, ±120, ±60, and 0 ms. Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech.
Assembly processor program converts symbolic programming language to machine language

Science.gov (United States)

Pelto, E. V.

1967-01-01

Assembly processor program converts symbolic programming language to machine language. This program translates symbolic codes into computer understandable instructions, assigns locations in storage for successive instructions, and computer locations from symbolic addresses.
The development of a speech act coding scheme to characterize communication patterns under an off-normal situation in nuclear power plants

Energy Technology Data Exchange (ETDEWEB)

Kim, Seung Hwan; Park, Jin Kyun [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

2009-05-15

Since communication is an important means to exchange information between individuals/teams or auxiliary means to share resources and information given in the team and group activity, effective communication is the prerequisite for construct powerful teamwork by a sharing mental model. Therefore, unless communication is performed efficiently, the quality of task and performance of team lower. Furthermore, since communication is highly related to situation awareness during team activities, inappropriate communication causes a lack of situation awareness and tension and stress are intensified and errors are increased. According to lesson learned from several accidents that have actually occurred in nuclear power plant (NPP), consequence of accident leads most critical results and is more dangerous than those of other industries. In order to improve operator's cope ability and operation ability through simulation training with various off-normal condition, the operation groups are trained regularly every 6 months in the training center of reference NPP. The objective of this study is to suggest modified speech act coding scheme and to elucidate the communication pattern characteristics of an operator's conversation during an abnormal situation in NPP.
TESLA: Large Signal Simulation Code for Klystrons

International Nuclear Information System (INIS)

Vlasov, Alexander N.; Cooke, Simon J.; Chernin, David P.; Antonsen, Thomas M. Jr.; Nguyen, Khanh T.; Levush, Baruch

2003-01-01

TESLA (Telegraphist's Equations Solution for Linear Beam Amplifiers) is a new code designed to simulate linear beam vacuum electronic devices with cavities, such as klystrons, extended interaction klystrons, twistrons, and coupled cavity amplifiers. The model includes a self-consistent, nonlinear solution of the three-dimensional electron equations of motion and the solution of time-dependent field equations. The model differs from the conventional Particle in Cell approach in that the field spectrum is assumed to consist of a carrier frequency and its harmonics with slowly varying envelopes. Also, fields in the external cavities are modeled with circuit like equations and couple to fields in the beam region through boundary conditions on the beam tunnel wall. The model in TESLA is an extension of the model used in gyrotron code MAGY. The TESLA formulation has been extended to be capable to treat the multiple beam case, in which each beam is transported inside its own tunnel. The beams interact with each other as they pass through the gaps in their common cavities. The interaction is treated by modification of the boundary conditions on the wall of each tunnel to include the effect of adjacent beams as well as the fields excited in each cavity. The extended version of TESLA for the multiple beam case, TESLA-MB, has been developed for single processor machines, and can run on UNIX machines and on PC computers with a large memory (above 2GB). The TESLA-MB algorithm is currently being modified to simulate multiple beam klystrons on multiprocessor machines using the MPI (Message Passing Interface) environment. The code TESLA has been verified by comparison with MAGIC for single and multiple beam cases. The TESLA code and the MAGIC code predict the same power within 1% for a simple two cavity klystron design while the computational time for TESLA is orders of magnitude less than for MAGIC 2D. In addition, recently TESLA was used to model the L-6048 klystron, code
Effect of gap detection threshold on consistency of speech in children with speech sound disorder.

Science.gov (United States)

Sayyahi, Fateme; Soleymani, Zahra; Akbari, Mohammad; Bijankhan, Mahmood; Dolatshahi, Behrooz

2017-02-01

The present study examined the relationship between gap detection threshold and speech error consistency in children with speech sound disorder. The participants were children five to six years of age who were categorized into three groups of typical speech, consistent speech disorder (CSD) and inconsistent speech disorder (ISD).The phonetic gap detection threshold test was used for this study, which is a valid test comprised six syllables with inter-stimulus intervals between 20-300ms. The participants were asked to listen to the recorded stimuli three times and indicate whether they heard one or two sounds. There was no significant difference between the typical and CSD groups (p=0.55), but there were significant differences in performance between the ISD and CSD groups and the ISD and typical groups (p=0.00). The ISD group discriminated between speech sounds at a higher threshold. Children with inconsistent speech errors could not distinguish speech sounds during time-limited phonetic discrimination. It is suggested that inconsistency in speech is a representation of inconsistency in auditory perception, which causes by high gap detection threshold. Copyright © 2016 Elsevier Ltd. All rights reserved.
Categorical speech processing in Broca's area: an fMRI study using multivariate pattern-based analysis.

Science.gov (United States)

Lee, Yune-Sang; Turkeltaub, Peter; Granger, Richard; Raizada, Rajeev D S

2012-03-14

Although much effort has been directed toward understanding the neural basis of speech processing, the neural processes involved in the categorical perception of speech have been relatively less studied, and many questions remain open. In this functional magnetic resonance imaging (fMRI) study, we probed the cortical regions mediating categorical speech perception using an advanced brain-mapping technique, whole-brain multivariate pattern-based analysis (MVPA). Normal healthy human subjects (native English speakers) were scanned while they listened to 10 consonant-vowel syllables along the /ba/-/da/ continuum. Outside of the scanner, individuals' own category boundaries were measured to divide the fMRI data into /ba/ and /da/ conditions per subject. The whole-brain MVPA revealed that Broca's area and the left pre-supplementary motor area evoked distinct neural activity patterns between the two perceptual categories (/ba/ vs /da/). Broca's area was also found when the same analysis was applied to another dataset (Raizada and Poldrack, 2007), which previously yielded the supramarginal gyrus using a univariate adaptation-fMRI paradigm. The consistent MVPA findings from two independent datasets strongly indicate that Broca's area participates in categorical speech perception, with a possible role of translating speech signals into articulatory codes. The difference in results between univariate and multivariate pattern-based analyses of the same data suggest that processes in different cortical areas along the dorsal speech perception stream are distributed on different spatial scales.
Speech Perception as a Multimodal Phenomenon

OpenAIRE

Rosenblum, Lawrence D.

2008-01-01

Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal s...
Poor Speech Perception Is Not a Core Deficit of Childhood Apraxia of Speech: Preliminary Findings

Science.gov (United States)

Zuk, Jennifer; Iuzzini-Seigel, Jenya; Cabbage, Kathryn; Green, Jordan R.; Hogan, Tiffany P.

2018-01-01

Purpose: Childhood apraxia of speech (CAS) is hypothesized to arise from deficits in speech motor planning and programming, but the influence of abnormal speech perception in CAS on these processes is debated. This study examined speech perception abilities among children with CAS with and without language impairment compared to those with…
Direct Simulation Monte Carlo (DSMC) on the Connection Machine

International Nuclear Information System (INIS)

Wong, B.C.; Long, L.N.

1992-01-01

The massively parallel computer Connection Machine is utilized to map an improved version of the direct simulation Monte Carlo (DSMC) method for solving flows with the Boltzmann equation. The kinetic theory is required for analyzing hypersonic aerospace applications, and the features and capabilities of the DSMC particle-simulation technique are discussed. The DSMC is shown to be inherently massively parallel and data parallel, and the algorithm is based on molecule movements, cross-referencing their locations, locating collisions within cells, and sampling macroscopic quantities in each cell. The serial DSMC code is compared to the present parallel DSMC code, and timing results show that the speedup of the parallel version is approximately linear. The correct physics can be resolved from the results of the complete DSMC method implemented on the connection machine using the data-parallel approach. 41 refs
Dynamic analysis of the BPX machine structure

International Nuclear Information System (INIS)

Dahlgen, F.; Citrolo, J.; Knutson, D.; Kalish, M.

1992-01-01

A preliminary analysis of the response of the BPX machine structure to a seismic input was performed. MSC/NASTRAN 5 , a general purpose XXX element computer code, has been used. The purpose of this paper is to assess the probable range of seismically induced stresses and deflections in the machine substructure which connects the machine to the test cell floor, with particular emphasis on the shear pins which will be used to attach the TF coil modules to the machine substructure (for a more detailed description of the shear pins and structure see ref. 4 in these proceedings). The model was developed with sufficient detail to be used subsequently to investigate the transient response to various dynamic loading conditions imposed on the structure by the PF, TF, and Vacuum Vessel, during normal and off-normal operations. The model does not include the mass and stiffness of the building or the building-soil interaction and as such can only be considered an interim assessment of the dynamic response of the machine to the S.S.E.(this is the Safe Shutdown Earthquake which is also the Design XXX Earthquake for all major structural components)
The Neural Bases of Difficult Speech Comprehension and Speech Production: Two Activation Likelihood Estimation (ALE) Meta-Analyses

Science.gov (United States)

Adank, Patti

2012-01-01

The role of speech production mechanisms in difficult speech comprehension is the subject of on-going debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension…
Metaheuristic applications to speech enhancement

CERN Document Server

Kunche, Prajna

2016-01-01

This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.
Systematic Studies of Modified Vocalization: The Effect of Speech Rate on Speech Production Measures during Metronome-Paced Speech in Persons Who Stutter

Science.gov (United States)

Davidow, Jason H.

2014-01-01

Background: Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control…

TongueToSpeech (TTS): Wearable wireless assistive device for augmented speech.

Science.gov (United States)

Marjanovic, Nicholas; Piccinini, Giacomo; Kerr, Kevin; Esmailbeigi, Hananeh

2017-07-01

Speech is an important aspect of human communication; individuals with speech impairment are unable to communicate vocally in real time. Our team has developed the TongueToSpeech (TTS) device with the goal of augmenting speech communication for the vocally impaired. The proposed device is a wearable wireless assistive device that incorporates a capacitive touch keyboard interface embedded inside a discrete retainer. This device connects to a computer, tablet or a smartphone via Bluetooth connection. The developed TTS application converts text typed by the tongue into audible speech. Our studies have concluded that an 8-contact point configuration between the tongue and the TTS device would yield the best user precision and speed performance. On average using the TTS device inside the oral cavity takes 2.5 times longer than the pointer finger using a T9 (Text on 9 keys) keyboard configuration to type the same phrase. In conclusion, we have developed a discrete noninvasive wearable device that allows the vocally impaired individuals to communicate in real time.
Social eye gaze modulates processing of speech and co-speech gesture.

Science.gov (United States)

Holler, Judith; Schubotz, Louise; Kelly, Spencer; Hagoort, Peter; Schuetze, Manuela; Özyürek, Aslı

2014-12-01

In human face-to-face communication, language comprehension is a multi-modal, situated activity. However, little is known about how we combine information from different modalities during comprehension, and how perceived communicative intentions, often signaled through visual signals, influence this process. We explored this question by simulating a multi-party communication context in which a speaker alternated her gaze between two recipients. Participants viewed speech-only or speech+gesture object-related messages when being addressed (direct gaze) or unaddressed (gaze averted to other participant). They were then asked to choose which of two object images matched the speaker's preceding message. Unaddressed recipients responded significantly more slowly than addressees for speech-only utterances. However, perceiving the same speech accompanied by gestures sped unaddressed recipients up to a level identical to that of addressees. That is, when unaddressed recipients' speech processing suffers, gestures can enhance the comprehension of a speaker's message. We discuss our findings with respect to two hypotheses attempting to account for how social eye gaze may modulate multi-modal language comprehension. Copyright © 2014 Elsevier B.V. All rights reserved.
Making extreme computations possible with virtual machines

International Nuclear Information System (INIS)

Reuter, J.; Chokoufe Nejad, B.

2016-02-01

State-of-the-art algorithms generate scattering amplitudes for high-energy physics at leading order for high-multiplicity processes as compiled code (in Fortran, C or C++). For complicated processes the size of these libraries can become tremendous (many GiB). We show that amplitudes can be translated to byte-code instructions, which even reduce the size by one order of magnitude. The byte-code is interpreted by a Virtual Machine with runtimes comparable to compiled code and a better scaling with additional legs. We study the properties of this algorithm, as an extension of the Optimizing Matrix Element Generator (O'Mega). The bytecode matrix elements are available as alternative input for the event generator WHIZARD. The bytecode interpreter can be implemented very compactly, which will help with a future implementation on massively parallel GPUs.
High Order Tensor Formulation for Convolutional Sparse Coding

KAUST Repository

Bibi, Adel Aamer; Ghanem, Bernard

2017-01-01

Convolutional sparse coding (CSC) has gained attention for its successful role as a reconstruction and a classification tool in the computer vision and machine learning community. Current CSC methods can only reconstruct singlefeature 2D images
A Support Vector Machine-Based Gender Identification Using Speech Signal

Science.gov (United States)

Lee, Kye-Hwan; Kang, Sang-Ick; Kim, Deok-Hwan; Chang, Joon-Hyuk

We propose an effective voice-based gender identification method using a support vector machine (SVM). The SVM is a binary classification algorithm that classifies two groups by finding the voluntary nonlinear boundary in a feature space and is known to yield high classification performance. In the present work, we compare the identification performance of the SVM with that of a Gaussian mixture model (GMM)-based method using the mel frequency cepstral coefficients (MFCC). A novel approach of incorporating a features fusion scheme based on a combination of the MFCC and the fundamental frequency is proposed with the aim of improving the performance of gender identification. Experimental results demonstrate that the gender identification performance using the SVM is significantly better than that of the GMM-based scheme. Moreover, the performance is substantially improved when the proposed features fusion technique is applied.
Indonesian Stock Prediction using Support Vector Machine (SVM

Directory of Open Access Journals (Sweden)

Santoso Murtiyanto

2018-01-01

Full Text Available This project is part of developing software to provide predictive information technology-based services artificial intelligence (Machine Intelligence or Machine Learning that will be utilized in the money market community. The prediction method used in this early stages uses the combination of Gaussian Mixture Model and Support Vector Machine with Python programming. The system predicts the price of Astra International (stock code: ASII.JK stock data. The data used was taken during 17 yr period of January 2000 until September 2017. Some data was used for training/modeling (80 % of data and the remainder (20 % was used for testing. An integrated model comprising Gaussian Mixture Model and Support Vector Machine system has been tested to predict stock market of ASII.JK for l d in advance. This model has been compared with the Market Cummulative Return. From the results, it is depicts that the Gaussian Mixture Model-Support Vector Machine based stock predicted model, offers significant improvement over the compared models resulting sharpe ratio of 3.22.
Electrophysiological evidence for speech-specific audiovisual integration.

Science.gov (United States)

Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean

2014-01-01

Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode. © 2013 Published by Elsevier Ltd.
GAME: GAlaxy Machine learning for Emission lines

Science.gov (United States)

Ucci, G.; Ferrara, A.; Pallottini, A.; Gallerani, S.

2018-06-01

We present an updated, optimized version of GAME (GAlaxy Machine learning for Emission lines), a code designed to infer key interstellar medium physical properties from emission line intensities of ultraviolet /optical/far-infrared galaxy spectra. The improvements concern (a) an enlarged spectral library including Pop III stars, (b) the inclusion of spectral noise in the training procedure, and (c) an accurate evaluation of uncertainties. We extensively validate the optimized code and compare its performance against empirical methods and other available emission line codes (PYQZ and HII-CHI-MISTRY) on a sample of 62 SDSS stacked galaxy spectra and 75 observed HII regions. Very good agreement is found for metallicity. However, ionization parameters derived by GAME tend to be higher. We show that this is due to the use of too limited libraries in the other codes. The main advantages of GAME are the simultaneous use of all the measured spectral lines and the extremely short computational times. We finally discuss the code potential and limitations.
GAME: GAlaxy Machine learning for Emission lines

Science.gov (United States)

Ucci, G.; Ferrara, A.; Pallottini, A.; Gallerani, S.

2018-03-01

We present an updated, optimized version of GAME (GAlaxy Machine learning for Emission lines), a code designed to infer key interstellar medium physical properties from emission line intensities of UV/optical/far infrared galaxy spectra. The improvements concern: (a) an enlarged spectral library including Pop III stars; (b) the inclusion of spectral noise in the training procedure, and (c) an accurate evaluation of uncertainties. We extensively validate the optimized code and compare its performance against empirical methods and other available emission line codes (pyqz and HII-CHI-mistry) on a sample of 62 SDSS stacked galaxy spectra and 75 observed HII regions. Very good agreement is found for metallicity. However, ionization parameters derived by GAME tend to be higher. We show that this is due to the use of too limited libraries in the other codes. The main advantages of GAME are the simultaneous use of all the measured spectral lines, and the extremely short computational times. We finally discuss the code potential and limitations.
Free Speech Yearbook 1978.

Science.gov (United States)

Phifer, Gregg, Ed.

The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…
Scaling gysela code beyond 32K-cores on bluegene/Q***

Directory of Open Access Journals (Sweden)

Bigot J.

2013-12-01

Full Text Available Gyrokinetic simulations lead to huge computational needs. Up to now, the semi- Lagrangian code Gysela performed large simulations using a few thousands cores (8k cores typically. Simulation with finer resolutions and with kinetic electrons are expected to increase those needs by a huge factor, providing a good example of applications requiring Exascale machines. This paper presents our work to improve Gysela in order to target an architecture that presents one possible way towards Exascale: the Blue Gene/Q. After analyzing the limitations of the code on this architecture, we have implemented three kinds of improvement: computational performance improvements, memory consumption improvements and disk i/o improvements. As a result, we show that the code now scales beyond 32k cores with much improved performances. This will make it possible to target the most powerful machines available and thus handle much larger physical cases.
Objective Prediction of Hearing Aid Benefit Across Listener Groups Using Machine Learning: Speech Recognition Performance With Binaural Noise-Reduction Algorithms

Science.gov (United States)

Schädler, Marc R.; Warzybok, Anna; Kollmeier, Birger

2018-01-01

The simulation framework for auditory discrimination experiments (FADE) was adopted and validated to predict the individual speech-in-noise recognition performance of listeners with normal and impaired hearing with and without a given hearing-aid algorithm. FADE uses a simple automatic speech recognizer (ASR) to estimate the lowest achievable speech reception thresholds (SRTs) from simulated speech recognition experiments in an objective way, independent from any empirical reference data. Empirical data from the literature were used to evaluate the model in terms of predicted SRTs and benefits in SRT with the German matrix sentence recognition test when using eight single- and multichannel binaural noise-reduction algorithms. To allow individual predictions of SRTs in binaural conditions, the model was extended with a simple better ear approach and individualized by taking audiograms into account. In a realistic binaural cafeteria condition, FADE explained about 90% of the variance of the empirical SRTs for a group of normal-hearing listeners and predicted the corresponding benefits with a root-mean-square prediction error of 0.6 dB. This highlights the potential of the approach for the objective assessment of benefits in SRT without prior knowledge about the empirical data. The predictions for the group of listeners with impaired hearing explained 75% of the empirical variance, while the individual predictions explained less than 25%. Possibly, additional individual factors should be considered for more accurate predictions with impaired hearing. A competing talker condition clearly showed one limitation of current ASR technology, as the empirical performance with SRTs lower than −20 dB could not be predicted. PMID:29692200
Joint Dictionary Learning-Based Non-Negative Matrix Factorization for Voice Conversion to Improve Speech Intelligibility After Oral Surgery.

Science.gov (United States)

Fu, Szu-Wei; Li, Pei-Chun; Lai, Ying-Hui; Yang, Cheng-Chien; Hsieh, Li-Chun; Tsao, Yu

2017-11-01

Objective: This paper focuses on machine learning based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because of the removal of parts of the articulator, a patient's speech may be distorted and difficult to understand. To overcome this problem, VC methods can be applied to convert the distorted speech such that it is clear and more intelligible. To design an effective VC method, two key points must be considered: 1) the amount of training data may be limited (because speaking for a long time is usually difficult for postoperative patients); 2) rapid conversion is desirable (for better communication). Methods: We propose a novel joint dictionary learning based non-negative matrix factorization (JD-NMF) algorithm. Compared to conventional VC techniques, JD-NMF can perform VC efficiently and effectively with only a small amount of training data. Results: The experimental results demonstrate that the proposed JD-NMF method not only achieves notably higher short-time objective intelligibility (STOI) scores (a standardized objective intelligibility evaluation metric) than those obtained using the original unconverted speech but is also significantly more efficient and effective than a conventional exemplar-based NMF VC method. Conclusion: The proposed JD-NMF method may outperform the state-of-the-art exemplar-based NMF VC method in terms of STOI scores under the desired scenario. Significance: We confirmed the advantages of the proposed joint training criterion for the NMF-based VC. Moreover, we verified that the proposed JD-NMF can effectively improve the speech intelligibility scores of oral surgery patients. Objective: This paper focuses on machine learning based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because of the removal of parts of the articulator, a patient
Visual context enhanced. The joint contribution of iconic gestures and visible speech to degraded speech comprehension.

NARCIS (Netherlands)

Drijvers, L.; Özyürek, A.

2017-01-01

Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech
Two-Level Semantics and Code Generation

DEFF Research Database (Denmark)

Nielson, Flemming; Nielson, Hanne Riis

1988-01-01

A two-level denotational metalanguage that is suitable for defining the semantics of Pascal-like languages is presented. The two levels allow for an explicit distinction between computations taking place at compile-time and computations taking place at run-time. While this distinction is perhaps...... not absolutely necessary for describing the input-output semantics of programming languages, it is necessary when issues such as data flow analysis and code generation are considered. For an example stack-machine, the authors show how to generate code for the run-time computations and still perform the compile...
Computational Approaches Reveal New Insights into Regulation and Function of Non; coding RNAs and their Targets

KAUST Repository

Alam, Tanvir

2016-01-01

Regulation and function of protein-coding genes are increasingly well-understood, but no comparable evidence exists for non-coding RNA (ncRNA) genes, which appear to be more numerous than protein-coding genes. We developed a novel machine
Beam Dynamics Studies in Recirculating Machines

CERN Document Server

Pellegrini, Dario; Latina, A

The LHeC and the CLIC Drive Beam share not only the high-current beams that make them prone to show instabilities, but also unconventional lattice topologies and operational schemes in which the time sequence of the bunches varies along the machine. In order to asses the feasibility of these projects, realistic simulations taking into account the most worrisome effects and their interplays, are crucial. These include linear and non-linear optics with time dependent elements, incoherent and coherent synchrotron radiation, short and long-range wakefields, beam-beam effect and ion cloud. In order to investigate multi-bunch effects in recirculating machines, a new version of the tracking code PLACET has been developed from scratch. PLACET2, already integrates most of the effects mentioned before and can easily receive additional physics. Its innovative design allows to describe complex lattices and track one or more bunches accordingly to the machine operation, reproducing the bunch train splitting and recombinat...
Creating a Multi-axis Machining Postprocessor

Directory of Open Access Journals (Sweden)

Petr Vavruška

2012-01-01

Full Text Available This paper focuses on the postprocessor creation process. When using standard commercially available postprocessors it is often very difficult to modify its internal source code, and it is a very complex process, in many cases even impossible, to implement the newly-developed functions. It is therefore very important to have a method for creating a postprocessor for any CAM system, which allows CL data (Cutter Location data to be generated to a separate text file. The goal of our work is to verify the proposed method for creating a postprocessor. Postprocessor functions for multi-axis machiningare dealt with in this work. A file with CL data must be translated by the postprocessor into an NC program that has been customized for a specific production machine and its control system. The postprocessor is therefore verified by applications for machining free-form surfaces of complex parts, and by executing the NC programs that are generated on real machine tools. This is also presented here.
Multisensory integration of speech sounds with letters vs. visual speech : only visual speech induces the mismatch negativity

NARCIS (Netherlands)

Stekelenburg, J.J.; Keetels, M.N.; Vroomen, J.H.M.

2018-01-01

Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect.
Speech Research

Science.gov (United States)

Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: A biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; Phonetic factors in letter detection; categorical perception; Short-term recall by deaf signers of American sign language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaires; and vowel information in postvocalic frictions.

Represented Speech in Qualitative Health Research

DEFF Research Database (Denmark)

Musaeus, Peter

2017-01-01

Represented speech refers to speech where we reference somebody. Represented speech is an important phenomenon in everyday conversation, health care communication, and qualitative research. This case will draw first from a case study on physicians’ workplace learning and second from a case study...... on nurses’ apprenticeship learning. The aim of the case is to guide the qualitative researcher to use own and others’ voices in the interview and to be sensitive to represented speech in everyday conversation. Moreover, reported speech matters to health professionals who aim to represent the voice...... of their patients. Qualitative researchers and students might learn to encourage interviewees to elaborate different voices or perspectives. Qualitative researchers working with natural speech might pay attention to how people talk and use represented speech. Finally, represented speech might be relevant...
Spectral integration in speech and non-speech sounds

Science.gov (United States)

Jacewicz, Ewa

2005-04-01

Spectral integration (or formant averaging) was proposed in vowel perception research to account for the observation that a reduction of the intensity of one of two closely spaced formants (as in /u/) produced a predictable shift in vowel quality [Delattre et al., Word 8, 195-210 (1952)]. A related observation was reported in psychoacoustics, indicating that when the components of a two-tone periodic complex differ in amplitude and frequency, its perceived pitch is shifted toward that of the more intense tone [Helmholtz, App. XIV (1875/1948)]. Subsequent research in both fields focused on the frequency interval that separates these two spectral components, in an attempt to determine the size of the bandwidth for spectral integration to occur. This talk will review the accumulated evidence for and against spectral integration within the hypothesized limit of 3.5 Bark for static and dynamic signals in speech perception and psychoacoustics. Based on similarities in the processing of speech and non-speech sounds, it is suggested that spectral integration may reflect a general property of the auditory system. A larger frequency bandwidth, possibly close to 3.5 Bark, may be utilized in integrating acoustic information, including speech, complex signals, or sound quality of a violin.
SSCTRK: A particle tracking code for the SSC

International Nuclear Information System (INIS)

Ritson, D.

1990-07-01

While many indirect methods are available to evaluate dynamic aperture there appears at this time to be no reliable substitute to tracking particles through realistic machine lattices for a number of turns determined by the storage times. Machine lattices are generated by ''Monte Carlo'' techniques from the expected rms fabrication and survey errors. Any given generated machine can potentially be a lucky or unlucky fluctuation from the average. Therefore simulation to serve as a predictor of future performance must be done for an ensemble of generated machines. Further, several amplitudes and momenta are necessary to predict machine performance. Thus to make Monte Carlo type simulations for the SSC requires very considerable computer resources. Hitherto, it has been assumed that this was not feasible, and alternative indirect methods have been proposed or tried to answer the problem. We reexamined the feasibility of using direct computation. Previous codes have represented lattices by a succession of thin elements separated by bend-drifts. With ''kick-drift'' configurations, tracking time is linear in the multipole order included, and the code is symplectic. Modern vector processors simultaneously handle a large number of cases in parallel. Combining the efficiencies of kick drift tracking with vector processing, in fact, makes realistic Monte Carlo simulation entirely feasible. SSCTRK uses the above features. It is structured to have a very friendly interface, a very wide latitude of choice for cases to be run in parallel, and, by using pure FORTRAN 77, to interchangeably run on a wide variety of computers. We describe in this paper the program structure operational checks and results achieved
Measurement of speech parameters in casual speech of dementia patients

NARCIS (Netherlands)

Ossewaarde, Roelant; Jonkers, Roel; Jalvingh, Fedor; Bastiaanse, Yvonne

Measurement of speech parameters in casual speech of dementia patients Roelant Adriaan Ossewaarde1,2, Roel Jonkers1, Fedor Jalvingh1,3, Roelien Bastiaanse1 1CLCG, University of Groningen (NL); 2HU University of Applied Sciences Utrecht (NL); 33St. Marienhospital - Vechta, Geriatric Clinic Vechta
Parallel processing Monte Carlo radiation transport codes

International Nuclear Information System (INIS)

McKinney, G.W.

1994-01-01

Issues related to distributed-memory multiprocessing as applied to Monte Carlo radiation transport are discussed. Measurements of communication overhead are presented for the radiation transport code MCNP which employs the communication software package PVM, and average efficiency curves are provided for a homogeneous virtual machine
Analysis of the steady-state operation of vacuum systems for fusion machines

International Nuclear Information System (INIS)

Roose, T.R.; Hoffman, M.A.; Carlson, G.A.

1975-01-01

A computer code named GASBAL was written to calculate the steady-state vacuum system performance of multi-chamber mirror machines as well as rather complex conventional multichamber vacuum systems. Application of the code, with some modifications, to the quasi-steady tokamak operating period should also be possible. Basically, GASBAL analyzes free molecular gas flow in a system consisting of a central chamber (the plasma chamber) connected by conductances to an arbitrary number of one- or two-chamber peripheral tanks. Each of the peripheral tanks may have vacuum pumping capability (pumping speed), sources of cold gas, and sources of energetic atoms. The central chamber may have actual vacuum pumping capability, as well as a plasma capable of ionizing injected atoms and impinging gas molecules and ''pumping'' them to a peripheral chamber. The GASBAL code was used in the preliminary design of a large mirror machine experiment--LLL's MX
Tangent: Automatic Differentiation Using Source Code Transformation in Python

OpenAIRE

van Merriënboer, Bart; Wiltschko, Alexander B.; Moldovan, Dan

2017-01-01

Automatic differentiation (AD) is an essential primitive for machine learning programming systems. Tangent is a new library that performs AD using source code transformation (SCT) in Python. It takes numeric functions written in a syntactic subset of Python and NumPy as input, and generates new Python functions which calculate a derivative. This approach to automatic differentiation is different from existing packages popular in machine learning, such as TensorFlow and Autograd. Advantages ar...
Numerical spin tracking in a synchrotron computer code Spink: Examples (RHIC)

International Nuclear Information System (INIS)

Luccio, A.

1995-01-01

In the course of acceleration of polarized protons in a synchrotron, many depolarizing resonances are encountered. They are classified in two categories: Intrinsic resonances that depend on the lattice structure of the ring and arise from the coupling of betatron oscillations with horizontal magnetic fields, and imperfection resonances caused by orbit distortions due to field errors. In general, the spectrum of resonances vs spin tune Gγ(G = 1.7928, the proton gyromagnetic anomaly, and y the proton relativistic energy ratio) for a given lattice tune ν, or vs ν for a given Gγ, contains a multitude of lines with various amplitudes or resonance strengths. The depolarization due to the resonance lines can be studied by numerically tracking protons with spin in a model accelerator. Tracking will allow one to check the strength of resonances, to study the effects of devices like Siberian Snakes, to find safe lattice tune regions where to operate, and finally to study in detail the operation of special devices such as Spin Flippers. A few computer codes exist that calculate resonance strengths E k and perform tracking, for proton and electron machines. Most relevant to our work for the AGS and RHIC machines are the programs Depol and Snake. Depol, calculates the E k 's by Fourier analysis. The input to Depol is the output of a machine model code, such as Synch or Mad, containing all details of the lattice. Snake, does the tracking, starting from a synthetic machine, that contains a certain number of periods, of FODO cells, of Siberian snakes, etc. We believed the complexities of machines like the AGS or RHIC could not be adequately represented by Snake. Then, we decided to write a new code, Spink, that combines some of the features of Depol and Snake. I.E., Spink reads a Mad output like Depol and tracks as Snake does. The structure of the code and examples for RHIC are described in the following
Study on intelligent processing system of man-machine interactive garment frame model

Science.gov (United States)

Chen, Shuwang; Yin, Xiaowei; Chang, Ruijiang; Pan, Peiyun; Wang, Xuedi; Shi, Shuze; Wei, Zhongqian

2018-05-01

A man-machine interactive garment frame model intelligent processing system is studied in this paper. The system consists of several sensor device, voice processing module, mechanical parts and data centralized acquisition devices. The sensor device is used to collect information on the environment changes brought by the body near the clothes frame model, the data collection device is used to collect the information of the environment change induced by the sensor device, voice processing module is used for speech recognition of nonspecific person to achieve human-machine interaction, mechanical moving parts are used to make corresponding mechanical responses to the information processed by data collection device.it is connected with data acquisition device by a means of one-way connection. There is a one-way connection between sensor device and data collection device, two-way connection between data acquisition device and voice processing module. The data collection device is one-way connection with mechanical movement parts. The intelligent processing system can judge whether it needs to interact with the customer, realize the man-machine interaction instead of the current rigid frame model.
Development of The Viking Speech Scale to classify the speech of children with cerebral palsy.

Science.gov (United States)

Pennington, Lindsay; Virella, Daniel; Mjøen, Tone; da Graça Andrada, Maria; Murray, Janice; Colver, Allan; Himmelmann, Kate; Rackauskaite, Gija; Greitane, Andra; Prasauskiene, Audrone; Andersen, Guro; de la Cruz, Javier

2013-10-01

Surveillance registers monitor the prevalence of cerebral palsy and the severity of resulting impairments across time and place. The motor disorders of cerebral palsy can affect children's speech production and limit their intelligibility. We describe the development of a scale to classify children's speech performance for use in cerebral palsy surveillance registers, and its reliability across raters and across time. Speech and language therapists, other healthcare professionals and parents classified the speech of 139 children with cerebral palsy (85 boys, 54 girls; mean age 6.03 years, SD 1.09) from observation and previous knowledge of the children. Another group of health professionals rated children's speech from information in their medical notes. With the exception of parents, raters reclassified children's speech at least four weeks after their initial classification. Raters were asked to rate how easy the scale was to use and how well the scale described the child's speech production using Likert scales. Inter-rater reliability was moderate to substantial (k>.58 for all comparisons). Test-retest reliability was substantial to almost perfect for all groups (k>.68). Over 74% of raters found the scale easy or very easy to use; 66% of parents and over 70% of health care professionals judged the scale to describe children's speech well or very well. We conclude that the Viking Speech Scale is a reliable tool to describe the speech performance of children with cerebral palsy, which can be applied through direct observation of children or through case note review. Copyright © 2013 Elsevier Ltd. All rights reserved.
Visual Context Enhanced: The Joint Contribution of Iconic Gestures and Visible Speech to Degraded Speech Comprehension

Science.gov (United States)

Drijvers, Linda; Ozyurek, Asli

2017-01-01

Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Method:…
Wavelet Packet Entropy in Speaker-Independent Emotional State Detection from Speech Signal

Directory of Open Access Journals (Sweden)

Mina Kadkhodaei Elyaderani

2015-01-01

Full Text Available In this paper, wavelet packet entropy is proposed for speaker-independent emotion detection from speech. After pre-processing, wavelet packet decomposition using wavelet type db3 at level 4 is calculated and Shannon entropy in its nodes is calculated to be used as feature. In addition, prosodic features such as first four formants, jitter or pitch deviation amplitude, and shimmer or energy variation amplitude besides MFCC features are applied to complete the feature vector. Then, Support Vector Machine (SVM is used to classify the vectors in multi-class (all emotions or two-class (each emotion versus normal state format. 46 different utterances of a single sentence from Berlin Emotional Speech Dataset are selected. These are uttered by 10 speakers in sadness, happiness, fear, boredom, anger, and normal emotional state. Experimental results show that proposed features can improve emotional state detection accuracy in multi-class situation. Furthermore, adding to other features wavelet entropy coefficients increase the accuracy of two-class detection for anger, fear, and happiness.
Speech enhancement using emotion dependent codebooks

NARCIS (Netherlands)

Naidu, D.H.R.; Srinivasan, S.

2012-01-01

Several speech enhancement approaches utilize trained models of clean speech data, such as codebooks, Gaussian mixtures, and hidden Markov models. These models are typically trained on neutral clean speech data, without any emotion. However, in practical scenarios, emotional speech is a common
Advanced man-machine interaction. Fundamentals and implementation

Energy Technology Data Exchange (ETDEWEB)

Kraiss, K.F. (ed.) [Aachen Technische Hochschule (Germany). Lehrstuhl fuer Technische Informatik und Computerwissenschaften

2006-07-01

Man-machine interaction is the gateway providing access to functions and services, which, due to the ever increasing complexity of smart systems, threatens to become a bottleneck. This book therefore introduces not only advanced interfacing concepts, but also gives insight into the related theoretical background.This refers mainly to the realization of video-based multimodal interaction via gesture, mimics, and speech, but also to interacting with virtual object in virtual environments, cooperating with local or remote robots, and user assistance. While most publications in the field of human factors engineering focus on interface design, this book puts special emphasis on implementation aspects. To this end it is accompanied by software development environments for image processing, classification, and virtual environment implementation. In addition a test data base is included for gestures, head pose, facial expressions, full-body person recognition, and people tracking. These data are used for the examples throughout the book, but are also meant to encourage the reader to start experimentation on his own. Thus the book may serve as a self-contained introduction both for researchers and developers of man-machine interfaces. It may also be used for graduate-level university courses. (orig.)
Linguistic contributions to speech-on-speech masking for native and non-native listeners: Language familiarity and semantic content

Science.gov (United States)

Brouwer, Susanne; Van Engen, Kristin J.; Calandruccio, Lauren; Bradlow, Ann R.

2012-01-01

This study examined whether speech-on-speech masking is sensitive to variation in the degree of similarity between the target and the masker speech. Three experiments investigated whether speech-in-speech recognition varies across different background speech languages (English vs Dutch) for both English and Dutch targets, as well as across variation in the semantic content of the background speech (meaningful vs semantically anomalous sentences), and across variation in listener status vis-à-vis the target and masker languages (native, non-native, or unfamiliar). The results showed that the more similar the target speech is to the masker speech (e.g., same vs different language, same vs different levels of semantic content), the greater the interference on speech recognition accuracy. Moreover, the listener’s knowledge of the target and the background language modulate the size of the release from masking. These factors had an especially strong effect on masking effectiveness in highly unfavorable listening conditions. Overall this research provided evidence that that the degree of target-masker similarity plays a significant role in speech-in-speech recognition. The results also give insight into how listeners assign their resources differently depending on whether they are listening to their first or second language. PMID:22352516
Speech-specificity of two audiovisual integration effects

DEFF Research Database (Denmark)

Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

2010-01-01

Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers...... often fail to perceive as speech. While audiovisual integration in the identification task only occurred when observers were informed of the speech-like nature of SWS, integration occurred in the detection task both for informed and naïve observers. This shows that both speech-specific and general...... mechanisms underlie audiovisual integration of speech....
Recognizing speech in a novel accent: the motor theory of speech perception reframed.

Science.gov (United States)

Moulin-Frier, Clément; Arbib, Michael A

2013-08-01

The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serve as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisits claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
Advocate: A Distributed Architecture for Speech-to-Speech Translation

Science.gov (United States)

2009-01-01

tecture, are either wrapped natural-language processing ( NLP ) components or objects developed from scratch using the architecture’s API. GATE is...framework, we put together a demonstration Arabic -to- English speech translation system using both internally developed ( Arabic speech recognition and MT...conditions of our Arabic S2S demonstration system described earlier. Once again, the data size was varied and eighty identical requests were
Shot-by-shot spectrum model for rod-pinch, pulsed radiography machines

Directory of Open Access Journals (Sweden)

Wm M. Wood

2018-02-01

Full Text Available A simplified model of bremsstrahlung production is developed for determining the x-ray spectrum output of a rod-pinch radiography machine, on a shot-by-shot basis, using the measured voltage, V(t, and current, I(t. The motivation for this model is the need for an agile means of providing shot-by-shot spectrum prediction, from a laptop or desktop computer, for quantitative radiographic analysis. Simplifying assumptions are discussed, and the model is applied to the Cygnus rod-pinch machine. Output is compared to wedge transmission data for a series of radiographs from shots with identical target objects. Resulting model enables variation of parameters in real time, thus allowing for rapid optimization of the model across many shots. “Goodness of fit” is compared with output from LSP Particle-In-Cell code, as well as the Monte Carlo Neutron Propagation with Xrays (“MCNPX” model codes, and is shown to provide an excellent predictive representation of the spectral output of the Cygnus machine. Improvements to the model, specifically for application to other geometries, are discussed.
Shot-by-shot spectrum model for rod-pinch, pulsed radiography machines

Science.gov (United States)

Wood, Wm M.

2018-02-01

A simplified model of bremsstrahlung production is developed for determining the x-ray spectrum output of a rod-pinch radiography machine, on a shot-by-shot basis, using the measured voltage, V(t), and current, I(t). The motivation for this model is the need for an agile means of providing shot-by-shot spectrum prediction, from a laptop or desktop computer, for quantitative radiographic analysis. Simplifying assumptions are discussed, and the model is applied to the Cygnus rod-pinch machine. Output is compared to wedge transmission data for a series of radiographs from shots with identical target objects. Resulting model enables variation of parameters in real time, thus allowing for rapid optimization of the model across many shots. "Goodness of fit" is compared with output from LSP Particle-In-Cell code, as well as the Monte Carlo Neutron Propagation with Xrays ("MCNPX") model codes, and is shown to provide an excellent predictive representation of the spectral output of the Cygnus machine. Improvements to the model, specifically for application to other geometries, are discussed.

Extraction of state machines of legacy C code with Cpp2XMI

NARCIS (Netherlands)

Brand, van den M.G.J.; Serebrenik, A.; Zeeland, van D.; Serebrenik, A.

2008-01-01

Analysis of legacy code is often focussed on extracting either metrics or relations, e.g. call relations or structure relations. For object-oriented programs, e.g. Java or C++ code, such relations are commonly represented as UML diagrams: e.g., such tools as Columbus [1] and Cpp2XMI [2] are capable
Communicative performance of adolescents with severe speech impairment: influence of context.

Science.gov (United States)

Dalton, B M; Bedrosian, J L

1989-08-01

The communicative performance of 4 preoperational-level adolescents, using limited speech, gestures, and communication board techniques, was examined in a two-part investigation. In Part 1, each subject participated in an academic interaction with a teacher in a therapy room. Data were transcribed and coded for communication mode, function, and role. Two subjects were found to predominantly use the speech mode, while the remaining 2 predominantly used board and one other mode. The majority of productions consisted of responses to requests, and the initiator role was infrequently occupied. These findings were similar to those reported in previous investigations conducted in classroom settings. In Part 2, another examination of the communicative performance of these subjects was conducted in spontaneous interactions involving speaking and nonspeaking peers in a therapy room. Using the same data analysis procedures, gesture and speech modes predominated for 3 of the subjects in the nonspeaking peer interactions. The remaining subject exhibited minimal interaction. No consistent pattern of mode usage was exhibited across the speaking peer interactions. In the nonspeaking peer interactions, request predominated. In contrast, a variety of communication functions was exhibited in the speaking peer interactions. Both the initiator and the maintainer roles were occupied in the majority of interactions. Pertinent variables and clinical implications are discussed.
Machine learning in Python essential techniques for predictive analysis

CERN Document Server

Bowles, Michael

2015-01-01

Learn a simpler and more effective way to analyze data and predict outcomes with Python Machine Learning in Python shows you how to successfully analyze data using only two core machine learning algorithms, and how to apply them using Python. By focusing on two algorithm families that effectively predict outcomes, this book is able to provide full descriptions of the mechanisms at work, and the examples that illustrate the machinery with specific, hackable code. The algorithms are explained in simple terms with no complex math and applied using Python, with guidance on algorithm selection, d
Multilingual Practices in Contemporary and Historical Contexts: Interfaces between Code-Switching and Translation

Science.gov (United States)

Kolehmainen, Leena; Skaffari, Janne

2016-01-01

This article serves as an introduction to a collection of four articles on multilingual practices in speech and writing, exploring both contemporary and historical sources. It not only introduces the articles but also discusses the scope and definitions of code-switching, attitudes towards multilingual interaction and, most pertinently, the…
Where humans meet machines innovative solutions for knotty natural-language problems

CERN Document Server

Markowitz, Judith

2013-01-01

Where Humans Meet Machines: Innovative Solutions for Knotty Natural-Language Problems brings humans and machines closer together by showing how linguistic complexities that confound the speech systems of today can be handled effectively by sophisticated natural-language technology. Some of the most vexing natural-language problems that are addressed in this book entail recognizing and processing idiomatic expressions, understanding metaphors, matching an anaphor correctly with its antecedent, performing word-sense disambiguation, and handling out-of-vocabulary words and phrases. This fourteen-chapter anthology consists of contributions from industry scientists and from academicians working at major universities in North America and Europe. They include researchers who have played a central role in DARPA-funded programs and developers who craft real-world solutions for corporations. These contributing authors analyze the role of natural language technology in the global marketplace; they explore the need f...
Using the Speech Transmission Index for predicting non-native speech intelligibility

NARCIS (Netherlands)

Wijngaarden, S.J. van; Bronkhorst, A.W.; Houtgast, T.; Steeneken, H.J.M.

2004-01-01

While the Speech Transmission Index ~STI! is widely applied for prediction of speech intelligibility in room acoustics and telecommunication engineering, it is unclear how to interpret STI values when non-native talkers or listeners are involved. Based on subjectively measured psychometric functions
Speech Planning Happens before Speech Execution: Online Reaction Time Methods in the Study of Apraxia of Speech

Science.gov (United States)

Maas, Edwin; Mailend, Marja-Liisa

2012-01-01

Purpose: The purpose of this article is to present an argument for the use of online reaction time (RT) methods to the study of apraxia of speech (AOS) and to review the existing small literature in this area and the contributions it has made to our fundamental understanding of speech planning (deficits) in AOS. Method: Following a brief…
Predicting speech intelligibility in adverse conditions: evaluation of the speech-based envelope power spectrum model

DEFF Research Database (Denmark)

Jørgensen, Søren; Dau, Torsten

2011-01-01

conditions by comparing predictions to measured data from [Kjems et al. (2009). J. Acoust. Soc. Am. 126 (3), 1415-1426] where speech is mixed with four different interferers, including speech-shaped noise, bottle noise, car noise, and cafe noise. The model accounts well for the differences in intelligibility......The speech-based envelope power spectrum model (sEPSM) [Jørgensen and Dau (2011). J. Acoust. Soc. Am., 130 (3), 1475–1487] estimates the envelope signal-to-noise ratio (SNRenv) of distorted speech and accurately describes the speech recognition thresholds (SRT) for normal-hearing listeners...... observed for the different interferers. None of the standardized models successfully describe these data....
STATE-OF-THE-ART TASKS AND ACHIEVEMENTS OF PARALINGUISTIC SPEECH ANALYSIS SYSTEMS

Directory of Open Access Journals (Sweden)

A. A. Karpov

2016-07-01

Full Text Available We present analytical survey of state-of-the-art actual tasks in the area of computational paralinguistics, as well as the recent achievements of automatic systems for paralinguistic analysis of conversational speech. Paralinguistics studies non-verbal aspects of human communication and speech such as: natural emotions, accents, psycho-physiological states, pronunciation features, speaker’s voice parameters, etc. We describe architecture of a baseline computer system for acoustical paralinguistic analysis, its main components and useful speech processing methods. We present some information on an International contest called Computational Paralinguistics Challenge (ComParE, which is held each year since 2009 in the framework of the International conference INTERSPEECH organized by the International Speech Communication Association. We present sub-challenges (tasks that were proposed at the ComParE Challenges in 2009-2016, and analyze winning computer systems for each sub-challenge and obtained results. The last completed ComParE-2015 Challenge was organized in September 2015 in Germany and proposed 3 sub-challenges: 1 Degree of Nativeness (DN sub-challenge, determination of nativeness degree of speakers based on acoustics; 2 Parkinson's Condition (PC sub-challenge, recognition of a degree of Parkinson’s condition based on speech analysis; 3 Eating Condition (EC sub-challenge, determination of the eating condition state during speaking or a dialogue, and classification of consumed food type (one of seven classes of food by the speaker. In the last sub-challenge (EC, the winner was a joint Turkish-Russian team consisting of the authors of the given paper. We have developed the most efficient computer-based system for detection and classification of the corresponding (EC acoustical paralinguistic events. The paper deals with the architecture of this system, its main modules and methods, as well as the description of used training and evaluation
Cleft Audit Protocol for Speech (CAPS-A): A Comprehensive Training Package for Speech Analysis

Science.gov (United States)

Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.

2009-01-01

Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…
Perceived liveliness and speech comprehensibility in aphasia : the effects of direct speech in auditory narratives

NARCIS (Netherlands)

Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike

2014-01-01

Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in 'healthy' communication direct speech constructions contribute to the liveliness, and indirectly to
Machine Learning-Based Content Analysis: Automating the analysis of frames and agendas in political communication research

NARCIS (Netherlands)

Burscher, B.

2016-01-01

We used machine learning to study policy issues and frames in political messages. With regard to frames, we investigated the automation of two content-analytical tasks: frame coding and frame identification. We found that both tasks can be successfully automated by means of machine learning
Preschool speech intelligibility and vocabulary skills predict long-term speech and language outcomes following cochlear implantation in early childhood.

Science.gov (United States)

Castellanos, Irina; Kronenberger, William G; Beer, Jessica; Henning, Shirley C; Colson, Bethany G; Pisoni, David B

2014-07-01

Speech and language measures during grade school predict adolescent speech-language outcomes in children who receive cochlear implants (CIs), but no research has examined whether speech and language functioning at even younger ages is predictive of long-term outcomes in this population. The purpose of this study was to examine whether early preschool measures of speech and language performance predict speech-language functioning in long-term users of CIs. Early measures of speech intelligibility and receptive vocabulary (obtained during preschool ages of 3-6 years) in a sample of 35 prelingually deaf, early-implanted children predicted speech perception, language, and verbal working memory skills up to 18 years later. Age of onset of deafness and age at implantation added additional variance to preschool speech intelligibility in predicting some long-term outcome scores, but the relationship between preschool speech-language skills and later speech-language outcomes was not significantly attenuated by the addition of these hearing history variables. These findings suggest that speech and language development during the preschool years is predictive of long-term speech and language functioning in early-implanted, prelingually deaf children. As a result, measures of speech-language functioning at preschool ages can be used to identify and adjust interventions for very young CI users who may be at long-term risk for suboptimal speech and language outcomes.
A qualitative analysis of hate speech reported to the Romanian National Council for Combating Discrimination (2003‑2015

Directory of Open Access Journals (Sweden)

Adriana Iordache

2015-12-01

Full Text Available The article analyzes the specificities of Romanian hate speech over a period of twelve years through a qualitative analysis of 384 Decisions of the National Council for Combating Discrimination. The study employs a coding methodology which allows one to separate decisions according to the group that was the victim of hate speech. The article finds that stereotypes employed are similar to those encountered in the international literature. The main target of hate speech is the Roma, who are seen as „dirty“, „uncivilized“ and a threat to Romania’s image abroad. Other stereotypes encountered were that of the „disloyal“ Hungarian and of the sexually promiscuous woman. Moreover, women are seen as unfit for management positions. The article also discusses stereotypes about homosexuals, who are seen as „sick“ and about non-orthodox religions, portrayed as „sectarian“.
The Feature Extraction Based on Texture Image Information for Emotion Sensing in Speech

Directory of Open Access Journals (Sweden)

Kun-Ching Wang

2014-09-01

Full Text Available In this paper, we present a novel texture image feature for Emotion Sensing in Speech (ESS. This idea is based on the fact that the texture images carry emotion-related information. The feature extraction is derived from time-frequency representation of spectrogram images. First, we transform the spectrogram as a recognizable image. Next, we use a cubic curve to enhance the image contrast. Then, the texture image information (TII derived from the spectrogram image can be extracted by using Laws’ masks to characterize emotional state. In order to evaluate the effectiveness of the proposed emotion recognition in different languages, we use two open emotional databases including the Berlin Emotional Speech Database (EMO-DB and eNTERFACE corpus and one self-recorded database (KHUSC-EmoDB, to evaluate the performance cross-corpora. The results of the proposed ESS system are presented using support vector machine (SVM as a classifier. Experimental results show that the proposed TII-based feature extraction inspired by visual perception can provide significant classification for ESS systems. The two-dimensional (2-D TII feature can provide the discrimination between different emotions in visual expressions except for the conveyance pitch and formant tracks. In addition, the de-noising in 2-D images can be more easily completed than de-noising in 1-D speech.
Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

Science.gov (United States)

Kayasith, Prakasith; Theeramunkong, Thanaruk

It is a tedious and subjective task to measure severity of a dysarthria by manually evaluating his/her speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signal for a certain word and distinguished speech signal for different words. As an application, it can be used to assess speech quality and forecast speech recognition rate of speech made by an individual dysarthric speaker before actual exhaustive implementation of an automatic speech recognition system for the speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations had been done by comparing its predicted recognition rates with ones predicted by the standard methods called the articulatory and intelligibility tests based on the two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting recognition rate of dysarthric speech. All experiments had been done on speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
Automated Speech Rate Measurement in Dysarthria

Science.gov (United States)

Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

2015-01-01

Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…
Simultaneous natural speech and AAC interventions for children with childhood apraxia of speech: lessons from a speech-language pathologist focus group.

Science.gov (United States)

Oommen, Elizabeth R; McCarthy, John W

2015-03-01

In childhood apraxia of speech (CAS), children exhibit varying levels of speech intelligibility depending on the nature of errors in articulation and prosody. Augmentative and alternative communication (AAC) strategies are beneficial, and commonly adopted with children with CAS. This study focused on the decision-making process and strategies adopted by speech-language pathologists (SLPs) when simultaneously implementing interventions that focused on natural speech and AAC. Eight SLPs, with significant clinical experience in CAS and AAC interventions, participated in an online focus group. Thematic analysis revealed eight themes: key decision-making factors; treatment history and rationale; benefits; challenges; therapy strategies and activities; collaboration with team members; recommendations; and other comments. Results are discussed along with clinical implications and directions for future research.
Improved part-of-speech prediction in suffix analysis.

Directory of Open Access Journals (Sweden)

Mario Fruzangohar

Full Text Available MOTIVATION: Predicting the part of speech (POS tag of an unknown word in a sentence is a significant challenge. This is particularly difficult in biomedicine, where POS tags serve as an input to training sophisticated literature summarization techniques, such as those based on Hidden Markov Models (HMM. Different approaches have been taken to deal with the POS tagger challenge, but with one exception--the TnT POS tagger--previous publications on POS tagging have omitted details of the suffix analysis used for handling unknown words. The suffix of an English word is a strong predictor of a POS tag for that word. As a pre-requisite for an accurate HMM POS tagger for biomedical publications, we present an efficient suffix prediction method for integration into a POS tagger. RESULTS: We have implemented a fully functional HMM POS tagger using experimentally optimised suffix based prediction. Our simple suffix analysis method, significantly outperformed the probability interpolation based TnT method. We have also shown how important suffix analysis can be for probability estimation of a known word (in the training corpus with an unseen POS tag; a common scenario with a small training corpus. We then integrated this simple method in our POS tagger and determined an optimised parameter set for both methods, which can help developers to optimise their current algorithm, based on our results. We also introduce the concept of counting methods in maximum likelihood estimation for the first time and show how counting methods can affect the prediction result. Finally, we describe how machine-learning techniques were applied to identify words, for which prediction of POS tags were always incorrect and propose a method to handle words of this type. AVAILABILITY AND IMPLEMENTATION: Java source code, binaries and setup instructions are freely available at http://genomes.sapac.edu.au/text_mining/pos_tagger.zip.
Speech Recognition on Mobile Devices

DEFF Research Database (Denmark)

Tan, Zheng-Hua; Lindberg, Børge

2010-01-01

in the mobile context covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within......The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR...

Classification of Strawberry Fruit Shape by Machine Learning

Science.gov (United States)

Ishikawa, T.; Hayashi, A.; Nagamatsu, S.; Kyutoku, Y.; Dan, I.; Wada, T.; Oku, K.; Saeki, Y.; Uto, T.; Tanabata, T.; Isobe, S.; Kochi, N.

2018-05-01

Shape is one of the most important traits of agricultural products due to its relationships with the quality, quantity, and value of the products. For strawberries, the nine types of fruit shape were defined and classified by humans based on the sampler patterns of the nine types. In this study, we tested the classification of strawberry shapes by machine learning in order to increase the accuracy of the classification, and we introduce the concept of computerization into this field. Four types of descriptors were extracted from the digital images of strawberries: (1) the Measured Values (MVs) including the length of the contour line, the area, the fruit length and width, and the fruit width/length ratio; (2) the Ellipse Similarity Index (ESI); (3) Elliptic Fourier Descriptors (EFDs), and (4) Chain Code Subtraction (CCS). We used these descriptors for the classification test along with the random forest approach, and eight of the nine shape types were classified with combinations of MVs + CCS + EFDs. CCS is a descriptor that adds human knowledge to the chain codes, and it showed higher robustness in classification than the other descriptors. Our results suggest machine learning's high ability to classify fruit shapes accurately. We will attempt to increase the classification accuracy and apply the machine learning methods to other plant species.
Song and speech: examining the link between singing talent and speech imitation ability.

Science.gov (United States)

Christiner, Markus; Reiterer, Susanne M

2013-01-01

In previous research on speech imitation, musicality, and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer's sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and auditory memory with singing fitting better into the category of "speech" on the productive level and "music" on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. (1) Motor flexibility and the ability to sing improve language and musical function. (2) Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. (3) The ability to sing improves the memory span of the auditory working memory.
Song and speech: examining the link between singing talent and speech imitation ability

Directory of Open Access Journals (Sweden)

Markus eChristiner

2013-11-01

Full Text Available In previous research on speech imitation, musicality and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Fourty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64 % of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66 % of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi could be explained by working memory together with a singer’s sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and sound memory with singing fitting better into the category of "speech" on the productive level and "music" on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. 1. Motor flexibility and the ability to sing improve language and musical function. 2. Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. 3. The ability to sing improves the memory span of the auditory short term memory.
Freedom of Speech Newsletter, September, 1975.

Science.gov (United States)

Allen, Winfred G., Jr., Ed.

The Freedom of Speech Newsletter is the communication medium for the Freedom of Speech Interest Group of the Western Speech Communication Association. The newsletter contains such features as a statement of concern by the National Ad Hoc Committee Against Censorship; Reticence and Free Speech, an article by James F. Vickrey discussing the subtle…
Automatic speech recognition used for evaluation of text-to-speech systems

Czech Academy of Sciences Publication Activity Database

Vích, Robert; Nouza, J.; Vondra, Martin

-, č. 5042 (2008), s. 136-148 ISSN 0302-9743 R&D Projects: GA AV ČR 1ET301710509; GA AV ČR 1QS108040569 Institutional research plan: CEZ:AV0Z20670512 Keywords : speech recognition * speech processing Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering
Interactive QR code beautification with full background image embedding

Science.gov (United States)

Lin, Lijian; Wu, Song; Liu, Sijiang; Jiang, Bo

2017-06-01

QR (Quick Response) code is a kind of two dimensional barcode that was first developed in automotive industry. Nowadays, QR code has been widely used in commercial applications like product promotion, mobile payment, product information management, etc. Traditional QR codes in accordance with the international standard are reliable and fast to decode, but are lack of aesthetic appearance to demonstrate visual information to customers. In this work, we present a novel interactive method to generate aesthetic QR code. By given information to be encoded and an image to be decorated as full QR code background, our method accepts interactive user's strokes as hints to remove undesired parts of QR code modules based on the support of QR code error correction mechanism and background color thresholds. Compared to previous approaches, our method follows the intention of the QR code designer, thus can achieve more user pleasant result, while keeping high machine readability.
Comparing and Optimising Parallel Haskell Implementations for Multicore Machines

DEFF Research Database (Denmark)

Berthold, Jost; Marlow, Simon; Hammond, Kevin

2009-01-01

In this paper, we investigate the differences and tradeoffs imposed by two parallel Haskell dialects running on multicore machines. GpH and Eden are both constructed using the highly-optimising sequential GHC compiler, and share thread scheduling, and other elements, from a common code base. The ...
SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support

Directory of Open Access Journals (Sweden)

Giampiero Salvi

2009-01-01

Full Text Available This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling.
The Effect of English Verbal Songs on Connected Speech Aspects of Adult English Learners’ Speech Production

Directory of Open Access Journals (Sweden)

Farshid Tayari Ashtiani

2015-02-01

Full Text Available The present study was an attempt to investigate the impact of English verbal songs on connected speech aspects of adult English learners’ speech production. 40 participants were selected based on the results of their performance in a piloted and validated version of NELSON test given to 60 intermediate English learners in a language institute in Tehran. Then they were equally distributed in two control and experimental groups and received a validated pretest of reading aloud and speaking in English. Afterward, the treatment was performed in 18 sessions by singing preselected songs culled based on some criteria such as popularity, familiarity, amount, and speed of speech delivery, etc. In the end, the posttests of reading aloud and speaking in English were administered. The results revealed that the treatment had statistically positive effects on the connected speech aspects of English learners’ speech production at statistical .05 level of significance. Meanwhile, the results represented that there was not any significant difference between the experimental group’s mean scores on the posttests of reading aloud and speaking. It was thus concluded that providing the EFL learners with English verbal songs could positively affect connected speech aspects of both modes of speech production, reading aloud and speaking. The Findings of this study have pedagogical implications for language teachers to be more aware and knowledgeable of the benefits of verbal songs to promote speech production of language learners in terms of naturalness and fluency. Keywords: English Verbal Songs, Connected Speech, Speech Production, Reading Aloud, Speaking
Machine translation with minimal reliance on parallel resources

CERN Document Server

Tambouratzis, George; Sofianopoulos, Sokratis

2017-01-01

This book provides a unified view on a new methodology for Machine Translation (MT). This methodology extracts information from widely available resources (extensive monolingual corpora) while only assuming the existence of a very limited parallel corpus, thus having a unique starting point to Statistical Machine Translation (SMT). In this book, a detailed presentation of the methodology principles and system architecture is followed by a series of experiments, where the proposed system is compared to other MT systems using a set of established metrics including BLEU, NIST, Meteor and TER. Additionally, a free-to-use code is available, that allows the creation of new MT systems. The volume is addressed to both language professionals and researchers. Prerequisites for the readers are very limited and include a basic understanding of the machine translation as well as of the basic tools of natural language processing.
An analysis of the masking of speech by competing speech using self-report data (L)

OpenAIRE

Agus, Trevor R.; Akeroyd, Michael A.; Noble, William; Bhullar, Navjot

2009-01-01

Many of the items in the “Speech, Spatial, and Qualities of Hearing” scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol.43, 85–99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively ...
Illustrated Speech Anatomy.

Science.gov (United States)

Shearer, William M.

Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…
The Design and Realization of Virtual Machine of Embedded Soft PLC Running System

Directory of Open Access Journals (Sweden)

Qingzhao Zeng

2014-11-01

Full Text Available Currently soft PLC has been the focus of study object for many countries. Soft PLC system consists of the developing system and running system. A Virtual Machine is an important part in running system even in the whole soft PLC system. It explains and performs intermediate code generated by the developing system and updates I/O status of PLC in order to complete its control function. This paper introduced the implementation scheme and execution process of the embedded soft PLC running system Virtual Machine, and mainly introduced its software implementation method, including the realization of the input sampling program, the realization of the instruction execution program and the realization of output refresh program. Besides, an operation code matching method was put forward in the instruction execution program design. Finally, the test takes PowerPC/P1010 (Freescale as the hardware platform and Vxworks as the operating system, the system test result shows that accuracy, the real-time performance and reliability of Virtual Machine.
Speech Entrainment Compensates for Broca's Area Damage

Science.gov (United States)

Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris

2015-01-01

Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443
Parametric Representation of the Speaker's Lips for Multimodal Sign Language and Speech Recognition

Science.gov (United States)

Ryumin, D.; Karpov, A. A.

2017-05-01

In this article, we propose a new method for parametric representation of human's lips region. The functional diagram of the method is described and implementation details with the explanation of its key stages and features are given. The results of automatic detection of the regions of interest are illustrated. A speed of the method work using several computers with different performances is reported. This universal method allows applying parametrical representation of the speaker's lipsfor the tasks of biometrics, computer vision, machine learning, and automatic recognition of face, elements of sign languages, and audio-visual speech, including lip-reading.
Patterns of poststroke brain damage that predict speech production errors in apraxia of speech and aphasia dissociate.

Science.gov (United States)

Basilakos, Alexandra; Rorden, Chris; Bonilha, Leonardo; Moser, Dana; Fridriksson, Julius

2015-06-01

Acquired apraxia of speech (AOS) is a motor speech disorder caused by brain damage. AOS often co-occurs with aphasia, a language disorder in which patients may also demonstrate speech production errors. The overlap of speech production deficits in both disorders has raised questions on whether AOS emerges from a unique pattern of brain damage or as a subelement of the aphasic syndrome. The purpose of this study was to determine whether speech production errors in AOS and aphasia are associated with distinctive patterns of brain injury. Forty-three patients with history of a single left-hemisphere stroke underwent comprehensive speech and language testing. The AOS Rating Scale was used to rate speech errors specific to AOS versus speech errors that can also be associated with both AOS and aphasia. Localized brain damage was identified using structural magnetic resonance imaging, and voxel-based lesion-impairment mapping was used to evaluate the relationship between speech errors specific to AOS, those that can occur in AOS or aphasia, and brain damage. The pattern of brain damage associated with AOS was most strongly associated with damage to cortical motor regions, with additional involvement of somatosensory areas. Speech production deficits that could be attributed to AOS or aphasia were associated with damage to the temporal lobe and the inferior precentral frontal regions. AOS likely occurs in conjunction with aphasia because of the proximity of the brain areas supporting speech and language, but the neurobiological substrate for each disorder differs. © 2015 American Heart Association, Inc.
Portable LQCD Monte Carlo code using OpenACC

Science.gov (United States)

Bonati, Claudio; Calore, Enrico; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Fabio Schifano, Sebastiano; Silvi, Giorgio; Tripiccione, Raffaele

2018-03-01

Varying from multi-core CPU processors to many-core GPUs, the present scenario of HPC architectures is extremely heterogeneous. In this context, code portability is increasingly important for easy maintainability of applications; this is relevant in scientific computing where code changes are numerous and frequent. In this talk we present the design and optimization of a state-of-the-art production level LQCD Monte Carlo application, using the OpenACC directives model. OpenACC aims to abstract parallel programming to a descriptive level, where programmers do not need to specify the mapping of the code on the target machine. We describe the OpenACC implementation and show that the same code is able to target different architectures, including state-of-the-art CPUs and GPUs.
A NOVEL APPROACH TO STUTTERED SPEECH CORRECTION

Directory of Open Access Journals (Sweden)

Alim Sabur Ajibola

2016-06-01

Full Text Available Stuttered speech is a dysfluency rich speech, more prevalent in males than females. It has been associated with insufficient air pressure or poor articulation, even though the root causes are more complex. The primary features include prolonged speech and repetitive speech, while some of its secondary features include, anxiety, fear, and shame. This study used LPC analysis and synthesis algorithms to reconstruct the stuttered speech. The results were evaluated using cepstral distance, Itakura-Saito distance, mean square error, and likelihood ratio. These measures implied perfect speech reconstruction quality. ASR was used for further testing, and the results showed that all the reconstructed speech samples were perfectly recognized while only three samples of the original speech were perfectly recognized.
Prisoner Fasting as Symbolic Speech: The Ultimate Speech-Action Test.

Science.gov (United States)

Sneed, Don; Stonecipher, Harry W.

The ultimate test of the speech-action dichotomy, as it relates to symbolic speech to be considered by the courts, may be the fasting of prison inmates who use hunger strikes to protest the conditions of their confinement or to make political statements. While hunger strikes have been utilized by prisoners for years as a means of protest, it was…
Childhood apraxia of speech and multiple phonological disorders in Cairo-Egyptian Arabic speaking children: language, speech, and oro-motor differences.

Science.gov (United States)

Aziz, Azza Adel; Shohdi, Sahar; Osman, Dalia Mostafa; Habib, Emad Iskander

2010-06-01

Childhood apraxia of speech is a neurological childhood speech-sound disorder in which the precision and consistency of movements underlying speech are impaired in the absence of neuromuscular deficits. Children with childhood apraxia of speech and those with multiple phonological disorder share some common phonological errors that can be misleading in diagnosis. This study posed a question about a possible significant difference in language, speech and non-speech oral performances between children with childhood apraxia of speech, multiple phonological disorder and normal children that can be used for a differential diagnostic purpose. 30 pre-school children between the ages of 4 and 6 years served as participants. Each of these children represented one of 3 possible subject-groups: Group 1: multiple phonological disorder; Group 2: suspected cases of childhood apraxia of speech; Group 3: control group with no communication disorder. Assessment procedures included: parent interviews; testing of non-speech oral motor skills and testing of speech skills. Data showed that children with suspected childhood apraxia of speech showed significantly lower language score only in their expressive abilities. Non-speech tasks did not identify significant differences between childhood apraxia of speech and multiple phonological disorder groups except for those which required two sequential motor performances. In speech tasks, both consonant and vowel accuracy were significantly lower and inconsistent in childhood apraxia of speech group than in the multiple phonological disorder group. Syllable number, shape and sequence accuracy differed significantly in the childhood apraxia of speech group than the other two groups. In addition, children with childhood apraxia of speech showed greater difficulty in processing prosodic features indicating a clear need to address these variables for differential diagnosis and treatment of children with childhood apraxia of speech. Copyright (c

Some links on this page may take you to non-federal websites. Their policies may differ from this site.