This thesis contains two volumes. The first is a written text that describes my compositional techniques in the context of an analysis of Codecs. The second volume is the score of this work. Volume one is divided into six sections: Introduction, harmony, rhythm and time, melodic materials, form, live electronics and future directions. Each section describes techniques and processes I developed throughout the compositional process. Codecs was inspired by the subversive proliferation musical materials though the use of audio codecs. I developed compositional tools based on encryption and compression in order to explore the audio codec metaphor. Volume two is the full score of Codecs, a work for large ensemble and live electronics. It is comprised of three sections and has a duration of approximately 14 minutes. The work is scored for flute (doubling on piccolo), oboe, clarinet in Bb (doubling on bass clarinet), bassoon, horn in F, trumpet, trombone, tuba, string quintet and percussion. Electronic drum pads and captured live sounds are used to control the live electronic elements.
Počta, P.; Beerends, J.G.
This paper deals with the intelligibility of speech coded by the STANAG 4591 standard codec, including packet loss, using synthesized speech input. Both subjective and objective assessments are used. It is shown that this codec significantly degrades intelligibility when compared to a standard
Full Text Available An adaptive multi-rate wideband (AMR-WB code is a speech codec developed on the basis of an algebraic code-excited linear-prediction (ACELP coding technique, and has a double advantage of low bit rates and high speech quality. This coding technique is widely used in modern mobile communication systems for a high speech quality in handheld devices. However, a major disadvantage is that a vector quantization (VQ of immittance spectral frequency (ISF coefficients occupies a significant computational load in the AMR-WB encoder. Hence, this paper presents a triangular inequality elimination (TIE algorithm combined with a dynamic mechanism and an intersection mechanism, abbreviated as the DI-TIE algorithm, to remarkably improve the complexity of ISF coefficient quantization in the AMR-WB speech codec. Both mechanisms are designed in a way that recursively enhances the performance of the TIE algorithm. At the end of this work, this proposal is experimentally validated as a superior search algorithm relative to a conventional TIE, a multiple TIE (MTIE, and an equal-average equal-variance equal-norm nearest neighbor search (EEENNS approach. With a full search algorithm as a benchmark for search load comparison, this work provides a search load reduction above 77%, a figure far beyond 36% in the TIE, 49% in the MTIE, and 68% in the EEENNS approach.
Full Text Available This paper deals with the impact of transcoding on the speech quality. We have focused mainly on the transcoding between codecs without the negative influence of the network parameters such as packet loss and delay. It has ensured objective and repeatable results from our measurement. The measurement was performed on the Transcoding Measuring System developed especially for this purpose. The system is based on the open source projects and is useful as a design tool for VoIP system administrators. The paper compares the most used codecs from the transcoding perspective. The multiple transcoding between G711, GSM and G729 codecs were performed and the speech quality of these calls was evaluated. The speech quality was measured by Perceptual Evaluation of Speech Quality method, which provides results in Mean Opinion Score used to describe the speech quality on a scale from 1 to 5. The obtained results indicate periodical speech quality degradation on every transcoding between two codecs.
Ossewaarde, Roelant; Jonkers, Roel; Jalvingh, Fedor; Bastiaanse, Yvonne
Measurement of speech parameters in casual speech of dementia patients Roelant Adriaan Ossewaarde1,2, Roel Jonkers1, Fedor Jalvingh1,3, Roelien Bastiaanse1 1CLCG, University of Groningen (NL); 2HU University of Applied Sciences Utrecht (NL); 33St. Marienhospital - Vechta, Geriatric Clinic Vechta
Randa Oktavada Zein
Full Text Available Voice Over Internet Protocol (VoIP merupakan layanan bersifat real-time, parameter yang dapat mempengaruhi kulitas layanan seperti delay, jitter dan packet loss. Dalam proses pengkodean sinyal analog menjadi sinyal digital menyebabkan terjadiya delay pada VoIP. Sistem pengcodean ini disebut codec, setiap codec memiliki bitrate yang berbeda pengkodeannya. Pada penelitian ini menggunakan codec G.711, G729 dan codec G.723.1 sebagai perbandingan untuk mengetahui qulity of service (QoS VoIP jika diterapkan pada jaringan MANET. Dengan penambahan aplikasi Hypertext Transfer Protovol (HTTP untuk mendapatkan parameter QoS VoIP dari codec G.711, G.729 dan G.723.1. Hasil yang didapat pada codec G.723.1 lebih baik jika dibandingkan dengan codec g.711 dan G.729 dilihat dari nilai dari parameter QoS seperti delay, jitter dan packet loss. Sesuai standar ITU-T G114 dimana perhitunan teoritis parameter dari codec G.723.1 didapat hasil terendah 7,68 kbps.
Full Text Available Voice over IP (VoIP adalah solusi komunikasi suara yang murah karena menggunakan jaringan IP dibanding penggunaan telephone analog yang banyak memakan biaya. Dalam penerapannya, VoIP mengalami permasalahan karena menggunakan teknologi packet switching yang mana penggunaannya bersamaan dengan paket data sehingga timbul delay, jitter, dan packet loss. Pada penelitian ini, algoritma Low Latency Queuing (LLQ diterapkan pada router cisco. Algoritma LLQ merupakan gabungan dari algoritma Priority Queuing (PQ dan Class Based Weight Fair Queuing (CBWFQ sehingga dapat memprioritaskan paket suara disamping paket data. Algoritma LLQ ini diujikan menggunakan codec GSM FR, G722, dan G711 A-law. Hasil pengujian didapatkan nilai parameter yang tidak jauh berbeda dan memenuhi standar ITU-T.G1010. Nilai delay rata - rata terendah yaitu ketika menggunakan codec G722 sebesar 20,019 ms tetapi G722 memiliki rata - rata jitter yang terbesar yaitu 0,986 ms. Codec dengan jitter rata – rata terkecil adalah G711 A-law sebesar 0,838 ms. Packet loss untuk semua codec yang diujikan adalah 0%. Throughput pada paket data terbesar saat menggunakan codec GSM FR yaitu 18,139 kbps. Codec yang direkomendasikan adalah G711 A-law karena lebih stabil dari segi jitter dan codec GSM FR cocok diimplementasikan pada jaringan yang memiliki bandwitdh kecil.
Hanratty, Jane; Deegan, Catherine; Walsh, Mary; Kirkpatrick, Barry
Diagnosis and monitoring of Parkinson's disease has a number of challenges as there is no definitive biomarker despite the broad range of symptoms. Research is ongoing to produce objective measures that can either diagnose Parkinson's or act as an objective decision support tool. Recent research on speech based measures have demonstrated promising results. This study aims to investigate the characteristics of the glottal source signal in Parkinsonian speech. An experiment is conducted in which a selection of glottal parameters are tested for their ability to discriminate between healthy and Parkinsonian speech. Results for each glottal parameter are presented for a database of 50 healthy speakers and a database of 16 speakers with Parkinsonian speech symptoms. Receiver operating characteristic (ROC) curves were employed to analyse the results and the area under the ROC curve (AUC) values were used to quantify the performance of each glottal parameter. The results indicate that glottal parameters can be used to discriminate between healthy and Parkinsonian speech, although results varied for each parameter tested. For the task of separating healthy and Parkinsonian speech, 2 out of the 7 glottal parameters tested produced AUC values of over 0.9.
Belyaev, Evgeny; Mantel, Claire; Forchhammer, Søren
Future infrared remote sensing systems, such as monitoring of the Earth's environment by satellites, infrastructure inspection by unmanned airborne vehicles etc., will require 16 bit depth infrared images to be compressed and stored or transmitted for further analysis. Such systems are equipped with low power embedded platforms where image or video data is compressed by a hardware block called the video processing unit (VPU). However, in many cases using two 8-bit VPUs can provide advantages compared with using higher bit depth image compression directly. We propose to compress 16 bit depth images via 8 bit depth codecs in the following way. First, an input 16 bit depth image is mapped into 8 bit depth images, e.g., the first image contains only the most significant bytes (MSB image) and the second one contains only the least significant bytes (LSB image). Then each image is compressed by an image or video codec with 8 bits per pixel input format. We analyze how the compression parameters for both MSB and LSB images should be chosen to provide the maximum objective quality for a given compression ratio. Finally, we apply the proposed infrared image compression method utilizing JPEG and H.264/AVC codecs, which are usually available in efficient implementations, and compare their rate-distortion performance with JPEG2000, JPEG-XT and H.265/HEVC codecs supporting direct compression of infrared images in 16 bit depth format. A preliminary result shows that two 8 bit H.264/AVC codecs can achieve similar result as 16 bit HEVC codec.
Ukhanova, Ann; Forchhammer, Søren
This paper discusses the question of video codec enhancement for wireless video transmission of high definition video data taking into account constraints on memory and complexity. Starting from parameter adjustment for JPEG2000 compression algorithm used for wireless transmission and achieving...
Slavata, Oldřich; Holub, Jan
This paper deals with an analysis of the relation between the codec that is used, the QoS method, and the final voice transmission quality. The Cisco 2811 router is used for adjusting QoS. VoIP client Linphone is used for adjusting the codec. The criterion for transmission quality is the MOS parameter investigated with the ITU-T P.862 PESQ and P.863 POLQA algorithms
Zourmand, Alireza; Mirhassani, Seyed Mostafa; Ting, Hua-Nong; Bux, Shaik Ismail; Ng, Kwan Hoong; Bilgen, Mehmet; Jalaludin, Mohd Amin
The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to the tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of MRI images. Consequently, the articulatory parameters that are effectively measured as tongue movement is observed, and the specific shape of the tongue and its position for all six uttered Malay vowels are determined.Speech rehabilitation procedure demands some kind of visual perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters based on acoustic theory of speech production, an acoustic analysis based on the uttered vowels by subjects has been performed. As the acoustic speech and articulatory parameters of uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments reported a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production.
Seo, Chan-Won; Han, Jong-Ki; Nguyen, Truong Q
Multimedia data delivered to mobile devices over wireless channels or the Internet are complicated by bandwidth fluctuation and the variety of mobile devices. Scalable video coding has been developed as an extension of H.264/AVC to solve this problem. Since scalable video codec provides various scalabilities to adapt the bitstream for the channel conditions and terminal types, scalable codec is one of the useful codecs for wired or wireless multimedia communication systems, such as IPTV and streaming services. In such scalable multimedia communication systems, video quality fluctuation degrades the visual perception significantly. It is important to efficiently use the target bits in order to maintain a consistent video quality or achieve a small distortion variation throughout the whole video sequence. The scheme proposed in this paper provides a useful function to control video quality in applications supporting scalability, whereas conventional schemes have been proposed to control video quality in the H.264 and MPEG-4 systems. The proposed algorithm decides the quantization parameter of the enhancement layer to maintain a consistent video quality throughout the entire sequence. The video quality of the enhancement layer is controlled based on a closed-form formula which utilizes the residual data and quantization error of the base layer. The simulation results show that the proposed algorithm controls the frame quality of the enhancement layer in a simple operation, where the parameter decision algorithm is applied to each frame.
Full Text Available The current trend of digital convergence leads to the need of the video encoder/decoder (codec that should support multiple video standards on a single platform as it is expensive to use dedicated video codec chip for each standard. The paper presents a high performance circuit shared architecture that can perform the quantization of five popular video codecs such as H.264/AVC, AVS, VC-1, MPEG-2/4, and JPEG. The proposed quantizer architecture is completely division-free as the division operation is replaced by shift and addition operations for all the standards. The design is implemented on FPGA and later synthesized in CMOS 0.18 μm technology. The results show that the proposed design satisfies the requirement of all five codecs with a maximum decoding capability of 60 fps at 187 MHz on Xilinx FPGA platform for 1080 p HD video.
Rached Tourki; M. Machhout; B. Bouallegue; M. Atri; M. Zeghid; D. Dia
In this paper, we proposed a secure video codec based on the discrete wavelet transformation (DWT) and the Advanced Encryption Standard (AES) processor. Either, use of video coding with DWT or encryption using AES is well known. However, linking these two designs to achieve secure video coding is leading. The contributions of our work are as follows. First, a new method for image and video compression is proposed. This codec is a synthesis of JPEG and JPEG2000,which is implemented using Huffm...
Young Han Lee
Full Text Available A bandwidth extension (BWE algorithm from wideband to superwideband (SWB is proposed for a scalable speech/audio codec that uses modified discrete cosine transform (MDCT coefficients as spectral parameters. The superwideband is first split into several subbands that are represented as gain parameters and normalized MDCT coefficients in the proposed BWE algorithm. We then estimate normalized MDCT coefficients of the wideband to be fetched for the superwideband and quantize the fetch indices. After that, we quantize gain parameters by using relative ratios between adjacent subbands. The proposed BWE algorithm is embedded into a standard superwideband codec, the SWB extension of G.729.1 Annex E, and its bitrate and quality are compared with those of the BWE algorithm already employed in the standard superwideband codec. It is shown from the comparison that the proposed BWE algorithm relatively reduces the bitrate by around 19% with better quality, compared to the BWE algorithm in the SWB extension of G.729.1 Annex E.
Uhler, Kristin M; Baca, Rosalinda; Dudas, Emily; Fredrickson, Tammy
Speech perception measures have long been considered an integral piece of the audiological assessment battery. Currently, a prelinguistic, standardized measure of speech perception is missing in the clinical assessment battery for infants and young toddlers. Such a measure would allow systematic assessment of speech perception abilities of infants as well as the potential to investigate the impact early identification of hearing loss and early fitting of amplification have on the auditory pathways. To investigate the impact of sensation level (SL) on the ability of infants with normal hearing (NH) to discriminate /a-i/ and /ba-da/ and to determine if performance on the two contrasts are significantly different in predicting the discrimination criterion. The design was based on a survival analysis model for event occurrence and a repeated measures logistic model for binary outcomes. The outcome for survival analysis was the minimum SL for criterion and the outcome for the logistic regression model was the presence/absence of achieving the criterion. Criterion achievement was designated when an infant's proportion correct score was >0.75 on the discrimination performance task. Twenty-two infants with NH sensitivity participated in this study. There were 9 males and 13 females, aged 6-14 mo. Testing took place over two to three sessions. The first session consisted of a hearing test, threshold assessment of the two speech sounds (/a/ and /i/), and if time and attention allowed, visual reinforcement infant speech discrimination (VRISD). The second session consisted of VRISD assessment for the two test contrasts (/a-i/ and /ba-da/). The presentation level started at 50 dBA. If the infant was unable to successfully achieve criterion (>0.75) at 50 dBA, the presentation level was increased to 70 dBA followed by 60 dBA. Data examination included an event analysis, which provided the probability of criterion distribution across SL. The second stage of the analysis was a
Přibil, J.; Přibilová, A.
The paper addresses reflection of microintonation and spectral properties in male and female acted emotional speech. Microintonation component of speech melody is analyzed regarding its spectral and statistical parameters. According to psychological research of emotional speech, different emotions are accompanied by different spectral noise. We control its amount by spectral flatness according to which the high frequency noise is mixed in voiced frames during cepstral speech synthesis. Our experiments are aimed at statistical analysis of cepstral coefficient values and ranges of spectral flatness in three emotions (joy, sadness, anger), and a neutral state for comparison. Calculated histograms of spectral flatness distribution are visually compared and modelled by Gamma probability distribution. Histograms of cepstral coefficient distribution are evaluated and compared using skewness and kurtosis. Achieved statistical results show good correlation comparing male and female voices for all emotional states portrayed by several Czech and Slovak professional actors.
Belyaev, Evgeny; Mantel, Claire; Forchhammer, Søren
.264/AVC codecs, which are usually available in efficient implementations, and compare their rate-distortion performance with JPEG2000, JPEG-XT and H.265/HEVC codecs supporting direct compression of infrared images in 16 bit depth format. A preliminary result shows that two 8 bit H.264/AVC codecs can...
R.N. Mekuria (Rufael); C.L. Blom (Kees); P.S. Cesar Garcia (Pablo Santiago)
htmlabstractwe present a generic and real-time time-varying point cloud codec for 3D immersive video. This codec is suitable for mixed reality applications where 3D point clouds are acquired at a fast rate. In this codec, intra frames are coded progressively in an octree subdivision. To further
Full Text Available A deep neural networks (DNN based close talk speech segregation algorithm is introduced. One nearby microphone is used to collect the target speech as close talk indicated, and another microphone is used to get the noise in environments. The time and energy difference between the two microphones signal is used as the segregation cue. A DNN estimator on each frequency channel is used to calculate the parameter masks. The parameter masks represent the target speech energy in each time frequency (T-F units. Experiment results show the good performance of the proposed system. The signal to noise ratio (SNR improvement is 8.1 dB on 0 dB noisy environment.
Zhang, Youhui; Zheng, Weimin
At work, at home, and in some public places, a desktop PC is usually available nowadays. Therefore, it is important for users to be able to play various videos on different PCs smoothly, but the diversity of codec types complicates the situation. Although some mainstream media players can try to download the needed codec automatically, this may fail for average users because installing the codec usually requires administrator privileges to complete, while the user may not be the owner of the PC. We believe an ideal solution should work without users' intervention, and need no special privileges. This paper proposes such a user-friendly, program-transparent solution for Windows-based media players. It runs the media player in a user-mode virtualization environment, and then downloads the needed codec on-the-fly. Because of API (Application Programming Interface) interception, some resource-accessing API calls from the player will be redirected to the downloaded codec resources. Then from the viewpoint of the player, the necessary codec exists locally and it can handle the video smoothly, although neither system registry nor system folders was modified during this process. Besides convenience, the principle of least privilege is maintained and the host system is left clean. This paper completely analyzes the technical issues and presents such a prototype which can work with DirectShow-compatible players. Performance tests show that the overhead is negligible. Moreover, our solution conforms to the Software-As-A-Service (SaaS) mode, which is very promising in the Internet era.
Rodriguez, Arturo A.; Morse, Ken
Software-only video codecs can provide good playback performance in desktop computers with a 486 or 68040 CPU running at 33 MHz without special hardware assistance. Typically, playback of compressed video can be categorized into three tasks: the actual decoding of the video stream, color conversion, and the transfer of decoded video data from system RAM to video RAM. By current standards, good playback performance is the decoding and display of video streams of 320 by 240 (or larger) compressed frames at 15 (or greater) frames-per- second. Software-only video codecs have evolved by modifying and tailoring existing compression methodologies to suit video playback in desktop computers. In this paper we examine the characteristics used to evaluate software-only video codec algorithms, namely: image fidelity (i.e., image quality), bandwidth (i.e., compression) ease-of-decoding (i.e., playback performance), memory consumption, compression to decompression asymmetry, scalability, and delay. We discuss the tradeoffs among these variables and the compromises that can be made to achieve low numerical complexity for software-only playback. Frame- differencing approaches are described since software-only video codecs typically employ them to enhance playback performance. To complement other papers that appear in this session of the Proceedings, we review methods derived from binary pattern image coding since these methods are amenable for software-only playback. In particular, we introduce a novel approach called pixel distribution image coding.
Leensen, Monique C. J.; Dreschler, Wouter A.
The online speech-in-noise test 'Earcheck' is sensitive for noise-induced hearing loss (NIHL). This study investigates effects of uncontrollable parameters in domestic self-screening, such as presentation level and transducer type, on speech reception thresholds (SRTs) obtained with Earcheck.
Full Text Available In this paper, we proposed a secure video codec based on the discrete wavelet transformation (DWT and the Advanced Encryption Standard (AES processor. Either, use of video coding with DWT or encryption using AES is well known. However, linking these two designs to achieve secure video coding is leading. The contributions of our work are as follows. First, a new method for image and video compression is proposed. This codec is a synthesis of JPEG and JPEG2000,which is implemented using Huffman coding to the JPEG and DWT to the JPEG2000. Furthermore, an improved motion estimation algorithm is proposed. Second, the encryptiondecryption effects are achieved by the AES processor. AES is aim to encrypt group of LL bands. The prominent feature of this method is an encryption of LL bands by AES-128 (128-bit keys, or AES-192 (192-bit keys, or AES-256 (256-bit keys.Third, we focus on a method that implements partial encryption of LL bands. Our approach provides considerable levels of security (key size, partial encryption, mode encryption, and has very limited adverse impact on the compression efficiency. The proposed codec can provide up to 9 cipher schemes within a reasonable software cost. Latency, correlation, PSNR and compression rate results are analyzed and shown.
Rao, K. R.
In view of the 56 KBPS digital switched network services and the ISDN, low bit rate codecs for providing real time full motion color video are under various stages of development. Some companies have already brought the codecs into the market. They are being used by industry and some Federal Agencies for video teleconferencing. In general, these codecs have various features such as multiplexing audio and data, high resolution graphics, encryption, error detection and correction, self diagnostics, freezeframe, split video, text overlay etc. To transmit the original color video on a 56 KBPS network requires bit rate reduction of the order of 1400:1. Such a large scale bandwidth compression can be realized only by implementing a number of sophisticated,digital signal processing techniques. This paper provides an overview of such techniques and outlines the newer concepts that are being investigated. Before resorting to the data compression techniques, various preprocessing operations such as noise filtering, composite-component transformation and horizontal and vertical blanking interval removal are to be implemented. Invariably spatio-temporal subsampling is achieved by appropriate filtering. Transform and/or prediction coupled with motion estimation and strengthened by adaptive features are some of the tools in the arsenal of the data reduction methods. Other essential blocks in the system are quantizer, bit allocation, buffer, multiplexer, channel coding etc.
Belyaev, Evgeny; Mantel, Claire; Forchhammer, Søren
images via 8 bit depth codecs in the following way. First, an input 16 bit depth image is mapped into 8 bit depth images, e.g., the first image contains only the most significant bytes (MSB image) and the second one contains only the least significant bytes (LSB image). Then each image is compressed.......264/AVC codecs, which are usually available in efficient implementations, and compare their rate-distortion performance with JPEG2000, JPEG-XT and H.265/HEVC codecs supporting direct compression of infrared images in 16 bit depth format. A preliminary result shows that two 8 bit H.264/AVC codecs can...
Full Text Available Multiview video which is one of the main types of three-dimensional (3D video signals, captured by a set of video cameras from various viewpoints, has attracted much interest recently. Data compression for multiview video has become a major issue. In this paper, a novel high efficiency fractal multiview video codec is proposed. Firstly, intraframe algorithm based on the H.264/AVC intraprediction modes and combining fractal and motion compensation (CFMC algorithm in which range blocks are predicted by domain blocks in the previously decoded frame using translational motion with gray value transformation is proposed for compressing the anchor viewpoint video. Then temporal-spatial prediction structure and fast disparity estimation algorithm exploiting parallax distribution constraints are designed to compress the multiview video data. The proposed fractal multiview video codec can exploit temporal and spatial correlations adequately. Experimental results show that it can obtain about 0.36 dB increase in the decoding quality and 36.21% decrease in encoding bitrate compared with JMVC8.5, and the encoding time is saved by 95.71%. The rate-distortion comparisons with other multiview video coding methods also demonstrate the superiority of the proposed scheme.
Hussmann, Katja; Grande, Marion; Meffert, Elisabeth; Christoph, Swetlana; Piefke, Martina; Willmes, Klaus; Huber, Walter
Although generally accepted as an important part of aphasia assessment, detailed analysis of spontaneous speech is rarely carried out in clinical practice mostly due to time limitations. The Aachener Sprachanalyse (ASPA; Aachen Speech Analysis) is a computer-assisted method for the quantitative analysis of German spontaneous speech that allows for…
Earlier studies by other authors indicate that vestibular stimulation may improve attention and dysarthria in head injured patients. In the present study of five severely head injured patients and five controls, the effect of vestibular stimulation on reaction times (reflecting attention) and some...... motor speech parameters (reflecting dysarthria) was investigated. After eight weeks with regular stimulation, it was concluded that reaction time changes were individual and consistent for a given subject. Only occasionally were they shortened after stimulation. However, reaction time was lengthened...... in three cases, prohibiting further stimulation in one case. Motion sickness was prohibitive in a second case. However, after-stimulation increase of phonation time and/or vital capacity was found in one patient and four controls. Oral diadochokinetic rates were slowed in several cases. Collectively, when...
Full Text Available Steganography is a means of covert communication without revealing the occurrence and the real purpose of communication. The adaptive multirate wideband (AMR-WB is a widely adapted format in mobile handsets and is also the recommended speech codec for VoLTE. In this paper, a novel AMR-WB speech steganography is proposed based on diameter-neighbor codebook partition algorithm. Different embedding capacity may be achieved by adjusting the iterative parameters during codebook division. The experimental results prove that the presented AMR-WB steganography may provide higher and flexible embedding capacity without inducing perceptible distortion compared with the state-of-the-art methods. With 48 iterations of cluster merging, twice the embedding capacity of complementary-neighbor-vertices-based embedding method may be obtained with a decrease of only around 2% in speech quality and much the same undetectability. Moreover, both the quality of stego speech and the security regarding statistical steganalysis are better than the recent speech steganography based on neighbor-index-division codebook partition.
Speech quality assessment is one of the key matters of voice services and every provider should ensure adequate connection quality to end users. Speech quality has to be measured by a trusted method and results have to correlate with intelligibility and clarity of the speech, as perceived by the listener. It can be achieved by subjective methods but in real life we must rely on objective measurements based on reliable models. One of them is E-model that we can consider as...
De Looze, Céline; Moreau, Noémie; Renié, Laurent; Kelly, Finnian; Ghio, Alain; Rico, Audrey; Audoin, Bertrand; Viallet, François; Pelletier, Jean; Petrone, Caterina
Cognitive impairment (CI) affects 40-65% of patients with multiple sclerosis (MS). CI can have a negative impact on a patient's everyday activities, such as engaging in conversations. Speech production planning ability is crucial for successful verbal interactions and thus for preserving social and occupational skills. This study investigates the effect of cognitive-linguistic demand and CI on speech production planning in MS, as reflected in speech prosody. A secondary aim is to explore the clinical potential of prosodic features for the prediction of an individual's cognitive status in MS. A total of 45 subjects, that is 22 healthy controls (HC) and 23 patients in early stages of relapsing-remitting MS, underwent neuropsychological tests probing specific cognitive processes involved in speech production planning. All subjects also performed a read speech task, in which they had to read isolated sentences manipulated as for phonological length. Results show that the speech of MS patients with CI is mainly affected at the temporal level (articulation and speech rate, pause duration). Regression analyses further indicate that rate measures are correlated with working memory scores. In addition, linear discriminant analysis shows the ROC AUC of identifying MS patients with CI is 0.70 (95% confidence interval: 0.68-0.73). Our findings indicate that prosodic planning is deficient in patients with MS-CI and that the scope of planning depends on patients' cognitive abilities. We discuss how speech-based approaches could be used as an ecological method for the assessment and monitoring of CI in MS. © 2017 The British Psychological Society.
Full Text Available Multiview video consists of multiple views of the same scene. They require enormous amount of data to achieve high image quality, which makes it indispensable to compress multiview video. Therefore, data compression is a major issue for multiviews. In this paper, we explore an efficient fractal video codec to compress multiviews. The proposed scheme first compresses a view-dependent geometry of the base view using fractal video encoder with homogeneous region condition. With the extended fractional pel motion estimation algorithm and fast disparity estimation algorithm, it then generates prediction images of other views. The prediction image uses the image-based rendering techniques based on the decoded video. And the residual signals are obtained by the prediction image and the original image. Finally, it encodes residual signals by the fractal video encoder. The idea is also to exploit the statistical dependencies from both temporal and interview reference pictures for motion compensated prediction. Experimental results show that the proposed algorithm is consistently better than JMVC8.5, with 62.25% bit rate decrease and 0.37 dB PSNR increase based on the Bjontegaard metric, and the total encoding time (TET of the proposed algorithm is reduced by 92%.
Přibil, Jiří; Přibilová, Anna
Roč. 9, č. 4 (2009), s. 95-104 ISSN 1335-8871 R&D Projects: GA MŠk OC08010; GA ČR GA102/09/0989 Institutional research plan: CEZ:AV0Z20670512 Keywords : signal processing * speech synthesis Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering
Topiwala, Pankaj; Dai, Wei; Krishnan, Madhu; Abbas, Adeel; Doshi, Sandeep; Newman, David
This paper compares the coding efficiency performance on 360 videos, of three software codecs: (a) AV1 video codec from the Alliance for Open Media (AOM); (b) the HEVC Reference Software HM; and (c) the JVET JEM Reference SW. Note that 360 video is especially challenging content, in that one codes full res globally, but typically looks locally (in a viewport), which magnifies errors. These are tested in two different projection formats ERP and RSP, to check consistency. Performance is tabulated for 1-pass encoding on two fronts: (1) objective performance based on end-to-end (E2E) metrics such as SPSNR-NN, and WS-PSNR, currently developed in the JVET committee; and (2) informal subjective assessment of static viewports. Constant quality encoding is performed with all the three codecs for an unbiased comparison of the core coding tools. Our general conclusion is that under constant quality coding, AV1 underperforms HEVC, which underperforms JVET. We also test with rate control, where AV1 currently underperforms the open source X265 HEVC codec. Objective and visual evidence is provided.
Ravishankar, C., Hughes Network Systems, Germantown, MD
Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained to be the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of speech signal getting corrupted by noise, cross-talk and distortion Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand digital transmission is relatively immune to noise, cross-talk and distortion primarily because of the capability to faithfully regenerate digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link Hence from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modem requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often referred to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques that is often interchangeably used with speech coding is the term voice coding. This term is more generic in the sense that the
Geske, Dagmar; Hess, Robert
The use of software codecs for video compression becomes commonplace in several videoconferencing applications. In order to reduce conflicts with other applications used at the same time, mechanisms for resource reservation on endsystems need to determine an upper bound for computing time used by the codec. This leads to the demand for predictable execution times of compression/decompression. Since compression schemes as H.261 inherently depend on the motion contained in the video, an adaptive admission control is required. This paper presents a data driven approach based on dynamical reduction of the number of processed macroblocks in peak situations. Besides the absolute speed is a point of interest. The question, whether and how software compression of high quality video is feasible on today's desktop computers, is examined.
Alesanco, A; Hernández, C; García, J; Portolés, A; Aured, C; García, M; Ramos, L; Serrano, P
This paper introduces a new clinical distortion index able to measure the decrease in diagnostic content in compressed echocardiograms. It is calculated using cardiologists' answers to a clinical testbed composed of two types of tests: one blind and the other semi-blind. This index may be used to compare clinical performance among video codecs from a clinical perspective. It can also be used to classify compression rates into useful and useless ranges, thus providing recommendations for echocardiogram compression. A study carried out in order to illustrate its use with Xvid video codec is also presented. The results obtained showed that, for 2D and M modes, the transmission rate should be at least 768 kbit s −1 and for color Doppler mode and pulsed/continuous Doppler, 256 kbit s −1
video teleconferencing codecs on the market as of November 1984 to facilitate the choice of an appropriate frame format and data compression algorithm...Engineer, computer company, male 5. Chapter Officer, national civic organization, female Group Y: 6. Marketing Representative, communication systems...both mon:tors to C4ve t e evi uators an idea what kind of cictures they will have to ; ucge . Special suggestions were given regardinc the sequences witn
Joshi, Urvang; Mukherjee, Debargha; Han, Jingning; Chen, Yue; Parker, Sarah; Su, Hui; Chiang, Angie; Xu, Yaowu; Liu, Zoe; Wang, Yunqing; Bankoski, Jim; Wang, Chen; Keyder, Emil
Google started the WebM Project in 2010 to develop open source, royalty- free video codecs designed specifically for media on the Web. The second generation codec released by the WebM project, VP9, is currently served by YouTube, and enjoys billions of views per day. Realizing the need for even greater compression efficiency to cope with the growing demand for video on the web, the WebM team embarked on an ambitious project to develop a next edition codec AV1, in a consortium of major tech companies called the Alliance for Open Media, that achieves at least a generational improvement in coding efficiency over VP9. In this paper, we focus primarily on new tools in AV1 that improve the prediction of pixel blocks before transforms, quantization and entropy coding are invoked. Specifically, we describe tools and coding modes that improve intra, inter and combined inter-intra prediction. Results are presented on standard test sets.
Knowles, Thea; Adams, Scott; Abeyesekera, Anita; Mancinelli, Cynthia; Gilmore, Greydon; Jog, Mandar
Purpose: The settings of 3 electrical stimulation parameters were adjusted in 12 speakers with Parkinson's disease (PD) with deep brain stimulation of the subthalamic nucleus (STN-DBS) to examine their effects on vowel acoustics and speech intelligibility. Method: Participants were tested under permutations of low, mid, and high STN-DBS frequency,…
Latanov, Alexander V.
Full Text Available This paper gives an overview of the published data on eye movement parameters while reading sentences in different languages with both local and global syntactic ambiguity. A locally ambiguous sentence contains a syntactically problematic phrase that leads to only one interpretation, while a globally ambiguous sentence has more than one distinct interpretation. In the first case the ambiguity persists only to the end of the sentence, when it is successfully resolved; in the second case the ambiguity is still present after reading the whole sentence. The obvious difficulty in analyzing the structure of locally and globally ambiguous sentences leads to increased reading time compared with unambiguous sentences. The syntactic ambiguity increases two major parameters: the fixation duration when reading words critical for interpreting the sentence, and the frequency of regressive saccades to reread those words. The reading time for critical words, disambiguating the local ambiguity, depends on the principle of early/late closure (i.e., high/low attachment: preferring a recurrent pattern to associate the critical word with a distant or closer word, respectively (as determined by its position in the sentence, and differs across languages. The first study of eye movement parameters in reading globally syntactic ambiguous sentences in the Russian language is reported in this paper. Our findings open up the prospects of quantitative studies of syntactic disambiguation in Slavonic and Romano-Germanic languages.
Kaganov, A Sh
The objective of the present study was to elucidate the relationship between the spectral speech characteristics and the biometric parameters of the speaker's vocal tract. The secondary objective was to consider the theoretical basis behind the medico-criminalistic personality identification from the biometric parameters of the speaker's vocal tract. The article is based on the results of real forensic medical investigations and the literature data.
WANG Jing; KUANG Jing-ming; ZHAO Sheng-hui
A variable-bit-rate characteristic waveform interpolation (VBR-CWI) speech codec with about 1.8kbit/s average bit rate which integrates phonetic classification into characteristic waveform (CW) decomposition is proposed.Each input frame is classified into one of 4 phonetic classes.Non-speech frames are represented with Bark-band noise model.The extracted CWs become rapidly evolving waveforms (REWs) or slowly evolving waveforms (SEWs) in the cases of unvoiced or stationary voiced frames respectively, while mixed voiced frames use the same CW decomposition as that in the conventional CWI.Experimental results show that the proposed codec can eliminate most buzzy and noisy artifacts existing in the fixed-bit-rate characteristic waveform interpolation (FBR-CWI) speech codec, the average bit rate can be much lower, and its reconstructed speech quality is much better than FS 1016 CELP at 4.8kbit/s and similar to G.723.1 ACELP at 5.3kbit/s.
Huang, Jen-Fa; Chen, Kai-Sheng; Lin, Ying-Chen; Li, Chung-Yu
A reconfiguration scheme based on composite signature codes over waveguide-gratings-based optical code-division multiple-access (OCDMA) network coder/decoders (codecs) is proposed in the paper. By using central control node to monitor network traffic condition and reconfigure the composite signature codes made up of maximal-length sequence (M-sequence) component codes and random changing the signature codes assigned for each user to improve the confidentiality performance in an OCDMA system. The proposed scheme is analyzed with some practical eavesdroppers' attacks.
Liu KJ Ray
Full Text Available A key enabling technology for the proliferation of multimedia PC's is the availability of fast video codecs, which are the basic building blocks of many new multimedia applications. Since most industrial video coding standards (e.g., MPEG1, MPEG2, H.261, H.263 only specify the decoder syntax, there are a lot of rooms for optimization in a practical implementation. When considering a specific hardware platform like the PC, the algorithmic optimization must be considered in tandem with the architecture of the PC. Specifically, an algorithm that is optimal in the sense of number of operations needed may not be the fastest implementation on the PC. This is because special instructions are available which can perform several operations at once under special circumstances. In this work, we describe a fast implementation of H.263 video encoder for the Pentium processor with MMX technology. The described codec is adopted for video mail and video phone softwares used in IBM ThinkPad.
Neji, Nihel; Jridi, Maher; Alfalou, Ayman; Masmoudi, Nouri
This paper presents an optimized H.264/AVC coding system for HDTV displays based on a typical flow with high coding efficiency and statics adaptivity features. For high quality streaming, the codec uses a Binary Arithmetic Encoding/Decoding algorithm with high complexity and a JVCE (Joint Video compression and encryption) scheme. In fact, particular attention is given to simultaneous compression and encryption applications to gain security without compromising the speed of transactions . The proposed design allows us to encrypt the information using a pseudo-random number generator (PRNG). Thus we achieved the two operations (compression and encryption) simultaneously and in a dependent manner which is a novelty in this kind of architecture. Moreover, we investigated the hardware implementation of CABAC (Context-based adaptive Binary Arithmetic Coding) codec. The proposed architecture is based on optimized binarizer/de-binarizer to handle significant pixel rates videos with low cost and high performance for most frequent SEs. This was checked using HD video frames. The obtained synthesis results using an FPGA (Xilinx's ISE) show that our design is relevant to code main profile video stream.
Full Text Available The higher resolutions and new functionality of video applications increase their throughput and processing requirements. In contrast, the energy and heat limitations of mobile devices demand low-power video cores. We propose a memory and communication centric design methodology to reach an energy-efficient dedicated implementation. First, memory optimizations are combined with algorithmic tuning. Then, a partitioning exploration introduces parallelism using a cyclo-static dataflow model that also expresses implementation-specific aspects of communication channels. Towards hardware, these channels are implemented as a restricted set of communication primitives. They enable an automated RTL development strategy for rigorous functional verification. The FPGA/ASIC design of an MPEG-4 Simple Profile video codec demonstrates the methodology. The video pipeline exploits the inherent functional parallelism of the codec and contains a tailored memory hierarchy with burst accesses to external memory. 4CIF encoding at 30 fps, consumes 71 mW in a 180 nm, 1.62 V UMC technology.
Full Text Available The higher resolutions and new functionality of video applications increase their throughput and processing requirements. In contrast, the energy and heat limitations of mobile devices demand low-power video cores. We propose a memory and communication centric design methodology to reach an energy-efficient dedicated implementation. First, memory optimizations are combined with algorithmic tuning. Then, a partitioning exploration introduces parallelism using a cyclo-static dataflow model that also expresses implementation-specific aspects of communication channels. Towards hardware, these channels are implemented as a restricted set of communication primitives. They enable an automated RTL development strategy for rigorous functional verification. The FPGA/ASIC design of an MPEG-4 Simple Profile video codec demonstrates the methodology. The video pipeline exploits the inherent functional parallelism of the codec and contains a tailored memory hierarchy with burst accesses to external memory. 4CIF encoding at 30 fps, consumes 71 mW in a 180 nm, 1.62 V UMC technology.
Full Text Available Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...
Ukhanova, Ann; Belyaev, Evgeny; Forchhammer, Søren
(ME) and CABAC entropy coder consume much power so we eliminate ME from the codec and use CAVLC instead of CABAC. Some investigations show that low-complexity DVC outperforms other algorithms in terms of encoder side energy consumption . However, estimations of power consumption for H.264/AVC and DVC...
Nykänen, Arne; Lindegren, David; Wruck, Louisa; Ljung, Robert; Odelius, Johan; Möller, Sebastian
Previous studies have shown that free recall of spoken word lists is impaired if the speech is presented in background noise, even if the signal-to-noise ratio is kept at a level allowing full word identification. The objective of this study was to examine recall rates for word lists presented in noise and word lists coded by an AMR (Adaptive Multi Rate) telephone codec distorted by packet loss. Twenty subjects performed a word recall test. Word lists consisting of ten words were played to th...
Full Text Available Designing an effective and high performance network requires an accurate characterization and modeling of network traffic. The modeling of video frame sizes is normally applied in simulation studies and mathematical analysis and generating streams for testing and compliance purposes. Besides, video traffic assumed as a major source of multimedia traffic in future heterogeneous network. Therefore, the statistical distribution of video data can be used as the inputs for performance modeling of networks. The finding of this paper comprises the theoretical definition of distribution which seems to be relevant to the video trace in terms of its statistical properties and finds the best distribution using both the graphical method and the hypothesis test. The data set used in this article consists of layered video traces generating from Scalable Video Codec (SVC video compression technique of three different movies.
Huijbregts, M.A.H.; de Jong, Franciska M.G.
In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because
... Staying Safe Videos for Educators Search English Español Speech Problems KidsHealth / For Teens / Speech Problems What's in ... a person's ability to speak clearly. Some Common Speech and Language Disorders Stuttering is a problem that ...
Jerry D. Gibson
Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
Hines, Andrew; Gillen, Eoin; Harte, Naomi
There are many types of degradation which can occur in Voice over IP (VoIP) calls. Of interest in this work are degradations which occur independently of the codec, hardware or network in use. Specifically, their effect on the subjective and objec- tive quality of the speech is examined. Since no dataset suit- able for this purpose exists, a new dataset (TCD-VoIP) has been created and has been made publicly available. The dataset con- tains speech clips suffering from a range of common call q...
Hasse Jørgensen, Stina
About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011.......About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011....
Consumer Guide Speech to Speech Relay Service Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...
... Health Info » Voice, Speech, and Language Apraxia of Speech On this page: What is apraxia of speech? ... about apraxia of speech? What is apraxia of speech? Apraxia of speech (AOS)—also known as acquired ...
This CD is multimedia presentation of programme safety upgrading of Bohunice V1 NPP. This chapter consist of introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering
Full Text Available Wireless-VoIP communications introduce perceptual degradations that are not present with traditional VoIP communications. This paper investigates the effects of such degradations on the performance of three state-of-the-art standard objective quality measurement algorithms—PESQ, P.563, and an "extended" E-model. The comparative study suggests that measurement performance is significantly affected by acoustic background noise type and level as well as speech codec and packet loss concealment strategy. On our data, PESQ attains superior overall performance and P.563 and E-model attain comparable performance figures.
Linear prediction (LP) analysis has been applied to speech system over the last few decades. LP technique is well-suited for speech analysis due to its ability to model speech production process approximately. Hence LP analysis has been widely used for speech enhancement, low-bit-rate speech coding in cellular telephony, speech recognition, characteristic parameter extraction (vocal tract resonances frequencies, fundamental frequency called pitch) and so on. However, the performance of the co...
Carrier nature of speech; modulation spectrum; spectral dynamics ... the relationships between phonetic values of sounds and their short-term spectral envelopes .... the number of free parameters that need to be estimated from training data.
This book shows how networking research and quality engineering can be combined to successfully manage the transmission quality when speech and video telephony is delivered in heterogeneous wireless networks. Nomadic use of services requires intelligent management of ongoing transmission, and to make the best of available resources many fundamental trade-offs must be considered. Network coverage versus throughput and reliability of a connection is one key aspect, efficiency versus robustness of signal compression is another. However, to successfully manage services, user-perceived Quality of Experience (QoE) in heterogeneous networks must be known, and the perception of quality changes must be understood. These issues are addressed in this book, in particular focusing on the perception of quality changes due to switching between diverse networks, speech and video codecs, and encoding bit rates during active calls.
Huo, Xueliang; Ghovanloo, Maysam
Tongue Drive System (TDS) is a wireless tongue operated assistive technology (AT), which can enable people with severe physical disabilities to access computers and drive powered wheelchairs using their volitional tongue movements. TDS offers six discrete commands, simultaneously available to the users, for pointing and typing as a substitute for mouse and keyboard in computer access, respectively. To enhance the TDS performance in typing, we have added a microphone, an audio codec, and a wireless audio link to its readily available 3-axial magnetic sensor array, and combined it with a commercially available speech recognition software, the Dragon Naturally Speaking, which is regarded as one of the most efficient ways for text entry. Our preliminary evaluations indicate that the combined TDS and speech recognition technologies can provide end users with significantly higher performance than using each technology alone, particularly in completing tasks that require both pointing and text entry, such as web surfing.
Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne
Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and
Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: A biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; Phonetic factors in letter detection; categorical perception; Short-term recall by deaf signers of American sign language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaires; and vowel information in postvocalic frictions.
Anne Birgitta Nilsen
Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance.The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience.The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the
Benesty, Jacob; Chen, Jingdong
We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be ""cleaned"" with digital signal processing tools before it is played out, transmitted, or stored.This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red
Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena like the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, benefit using hearing aids or combinations of these things.
...-Speech Services for Individuals with Hearing and Speech Disabilities, Report and Order (Order), document...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...
Светлана Викторовна Иванова
Full Text Available Discourse and media communication researchers pay attention to the fact that popular discursive and communicative practices have a tendency to hybridization and convergence. Discourse which is understood as language in use is flexible. Consequently, it turns out that one and the same text can represent several types of discourses. A vivid example of this tendency is revealed in American commencement speech / commencement address / graduation speech. A commencement speech is a speech university graduates are addressed with which in compliance with the modern trend is delivered by outstanding media personalities (politicians, athletes, actors, etc.. The objective of this study is to define the specificity of the realization of polydiscursive practices within commencement speech. The research involves discursive, contextual, stylistic and definitive analyses. Methodologically the study is based on the discourse analysis theory, in particular the notion of a discursive practice as a verbalized social practice makes up the conceptual basis of the research. This research draws upon a hundred commencement speeches delivered by prominent representatives of American society since 1980s till now. In brief, commencement speech belongs to institutional discourse public speech embodies. Commencement speech institutional parameters are well represented in speeches delivered by people in power like American and university presidents. Nevertheless, as the results of the research indicate commencement speech institutional character is not its only feature. Conceptual information analysis enables to refer commencement speech to didactic discourse as it is aimed at teaching university graduates how to deal with challenges life is rich in. Discursive practices of personal discourse are also actively integrated into the commencement speech discourse. More than that, existential discursive practices also find their way into the discourse under study. Commencement
... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...
The VDE system developed had the capability of recognizing up to 248 separate words in syntactic structures. 4 The two systems described are isolated...AND SPEAKER RECOGNITION by M.J.Hunt 5 ASSESSMENT OF SPEECH SYSTEMS ’ ..- * . by R.K.Moore 6 A SURVEY OF CURRENT EQUIPMENT AND RESEARCH’ by J.S.Bridle...TECHNOLOGY IN NAVY TRAINING SYSTEMS by R.Breaux, M.Blind and R.Lynchard 10 9 I-I GENERAL REVIEW OF MILITARY APPLICATIONS OF VOICE PROCESSING DR. BRUNO
Full Text Available This paper presents a method of speech recognition by pattern recognition techniques. Learning consists in determining the unique characteristics of a word (cepstral coefficients by eliminating those characteristics that are different from one word to another. For learning and recognition, the system will build a dictionary of words by determining the characteristics of each word to be used in the recognition. Determining the characteristics of an audio signal consists in the following steps: noise removal, sampling it, applying Hamming window, switching to frequency domain through Fourier transform, calculating the magnitude spectrum, filtering data, determining cepstral coefficients.
Full Text Available Nowadays, there is a quickly growing demand for the transmission of voice, video and data over an IP based network. Multimedia, whether we are talking about broadcast, audio and video transmission and others, from a global perspective is growing exponentially with time. With incoming requests from users, new technologies for data transfer are continually developing. Data must be delivered reliably and with the fewest losses at such high speed. Video quality as part of multimedia technology has a very important role nowadays. It is influenced by several factors, where each of them can have many forms and processing. Network performance is the major degradation effect that influences the quality of resulting image. Poor network performance (lack of link capacity, high network loadâ€¦ causes data packet losses or different delivery time for each packet. This work focuses exactly on these network phenomena. It examines the impact of different delays and packet losses on the quality parameters of triple play services, to evaluate the results using objective methods. The aim of this work is to bring a detailed view on the performance of video streaming over IP-based networks.
Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.
This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...
... OTC Relief for Diarrhea Home Diseases and Conditions Speech and Language Delay Condition Speech and Language Delay Share Print Table of Contents1. ... Treatment6. Everyday Life7. Questions8. Resources What is a speech and language delay? A speech and language delay ...
A speech modulated white noise device is reported that gives the rhythmic characteristics of a speech signal for intelligible reception by deaf persons. The signal is composed of random amplitudes and frequencies as modulated by the speech envelope characteristics of rhythm and stress. Time intensity parameters of speech are conveyed through the vibro-tactile sensation stimuli.
In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model. A detailed study is presented on unifying the common objective functions for discriminative learning in speech recognition, namely maximum mutual information (MMI), minimum classification error, and minimum phone/word error. The unification is presented, with rigorous mathematical analysis, in a common rational-functio
Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll
In this paper, the issue of single channel speech enhancement of non-stationary voiced speech is addressed. The non-stationarity of speech is well known, but state of the art speech enhancement methods assume stationarity within frames of 20–30 ms. We derive optimal distortionless filters that take...... the non-stationarity nature of voiced speech into account via linear constraints. This is facilitated by imposing a harmonic chirp model on the speech signal. As an implicit part of the filter design, the noise statistics are also estimated based on the observed signal and parameters of the harmonic chirp...... model. Simulations on real speech show that the chirp based filters perform better than their harmonic counterparts. Further, it is seen that the gain of using the chirp model increases when the estimated chirp parameter is big corresponding to periods in the signal where the instantaneous fundamental...
Kavalekalam, Mathew Shaji; Christensen, Mads Græsbøll; Boldt, Jesper B.
This paper deals with the enhancement of speech in presence of non-stationary babble noise. A binaural speech enhancement framework is proposed which takes into account both the voiced and unvoiced speech production model. The usage of this model in enhancement requires the Short term predictor...... (STP) parameters and the pitch information to be estimated. This paper uses a codebook based approach for estimating the STP parameters and a parametric binaural method is proposed for estimating the pitch parameters. Improvements in objective score are shown when using the voicedunvoiced speech model...
Basharirad, Babak; Moradhaseli, Mohammadreza
Recently, attention of the emotional speech signals research has been boosted in human machine interfaces due to availability of high computation capability. There are many systems proposed in the literature to identify the emotional state through speech. Selection of suitable feature sets, design of a proper classifications methods and prepare an appropriate dataset are the main key issues of speech emotion recognition systems. This paper critically analyzed the current available approaches of speech emotion recognition methods based on the three evaluating parameters (feature set, classification of features, accurately usage). In addition, this paper also evaluates the performance and limitations of available methods. Furthermore, it highlights the current promising direction for improvement of speech emotion recognition systems.
Roelant Ossewaarde; Roel Jonkers; Fedor Jalvingh; Roelien Bastiaanse
From the article: "Individuals with dementia often experience a decline in their ability to use language. Language problems have been reported in individuals with dementia caused by Alzheimer’s disease, Parkinson’s disease or degeneration of the fronto-temporal area. Acoustic properties are
Yen, Chih-Ta; Huang, Jen-Fa; Chih, Ping-En
We propose and experimentally demonstrated the two bands optical code-division multiple-access (OCDMA) network over bipolar Walsh-coded liquid-crystal modulators (LCMs) and driven by green light and red light lasers. Achieving system performance depends on the construction of a decoder that implements a true bipolar correlation using only unipolar signals and intensity detection for each band. We took advantage of the phase delay characteristics of LCMs to construct a prototype optical coder/decoder (codec). Matched and unmatched Walsh signature codes were evaluated to detect correlations among multiuser data in the access network. By using LCMs, a red and green laser light source was spectrally encoded and the summed light dots were complementary decoded. Favorable contrast on auto- and cross-correlations indicates that binary information symbols can be properly recovered using a balanced photodetector.
Full Text Available In multi-rate WLANs, users can suffer transmission rate changes due to the link adaptation mechanism. This results in a variable capacity channel, which is very hostile for VoIP and can cause serious quality of service (QoS degradation in all active calls. Various codec adaptation mechanisms have been proposed as a solution to this, as well as to solve congestion problems on WLAN environments. Here, these solutions are presented, categorized according to the adaptation policy and scenario they implement, and evaluated at call-level in terms of the resulting blocking and dropping probabilities, as well as the perceived voice quality. To define a common performance metric, a new index named VGoS-factor is presented, which, by combining these capacity and quality indicators, can provide an overall view of the capacity versus quality trade-off of the proposed mechanisms and consequently help in choosing the adequate policy for each scenario.
Trouvain, Jürgen; Truong, Khiet Phuong
Physical activity leads to a respiratory behaviour that is very different to a resting state and that influences speech production. How speech parameters are exactly affected by physical activity remains largely unknown. Hence, we investigated how several prosodic parameters change under influence
Full Text Available One of the major problems concerning the evolution of human language is to understand how sounds became associated to meaningful gestures. It has been proposed that the circuit controlling gestures and speech evolved from a circuit involved in the control of arm and mouth movements related to ingestion. This circuit contributed to the evolution of spoken language, moving from a system of communication based on arm gestures. The discovery of the mirror neurons has provided strong support for the gestural theory of speech origin because they offer a natural substrate for the embodiment of language and create a direct link between sender and receiver of a message. Behavioural studies indicate that manual gestures are linked to mouth movements used for syllable emission. Grasping with the hand selectively affected movement of inner or outer parts of the mouth according to syllable pronunciation and hand postures, in addition to hand actions, influenced the control of mouth grasp and vocalization. Gestures and words are also related to each other. It was found that when producing communicative gestures (emblems the intention to interact directly with a conspecific was transferred from gestures to words, inducing modification in voice parameters. Transfer effects of the meaning of representational gestures were found on both vocalizations and meaningful words. It has been concluded that the results of our studies suggest the existence of a system relating gesture to vocalization which was precursor of a more general system reciprocally relating gesture to word.
Kovacevic, Branko; Veinović, Mladen; Marković, Milan
This book focuses on speech signal phenomena, presenting a robustification of the usual speech generation models with regard to the presumed types of excitation signals, which is equivalent to the introduction of a class of nonlinear models and the corresponding criterion functions for parameter estimation. Compared to the general class of nonlinear models, such as various neural networks, these models possess good properties of controlled complexity, the option of working in “online” mode, as well as a low information volume for efficient speech encoding and transmission. Providing comprehensive insights, the book is based on the authors’ research, which has already been published, supplemented by additional texts discussing general considerations of speech modeling, linear predictive analysis and robust parameter estimation.
... to being completely unable to speak or understand speech. Causes include Hearing disorders and deafness Voice problems, ... or those caused by cleft lip or palate Speech problems like stuttering Developmental disabilities Learning disorders Autism ...
Phifer, Gregg, Ed.
The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…
Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang
Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.
Schalling, Ellika; Hartelius, Lena
Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.
Gopi, E S
Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.
Full Text Available Background and Aims: Dysarthria affects linguistic domains such as respiration, phonation, articulation, resonance and prosody due to upper motor neuron, lower motor neuron, cerebellar or extrapyramidal tract lesions. Although Bengali is one of the major languages globally, dysarthric Bengali speech has not been subjected to neurolinguistic analysis. We attempted such an analysis with the goal of identifying the speech defects in native Bengali speakers in various types of dysarthria encountered in neurological disorders. Settings and Design: A cross-sectional observational study was conducted with 66 dysarthric subjects, predominantly middle-aged males, attending the Neuromedicine OPD of a tertiary care teaching hospital in Kolkata. Materials and Methods: After neurological examination, an instrument comprising commonly used Bengali words and a text block covering all Bengali vowels and consonants were used to carry out perceptual analysis of dysarthric speech. From recorded speech, 24 parameters pertaining to five linguistic domains were assessed. The Kruskal-Wallis analysis of variance, Chi-square test and Fisher′s exact test were used for analysis. Results: The dysarthria types were spastic (15 subjects, flaccid (10, mixed (12, hypokinetic (12, hyperkinetic (9 and ataxic (8. Of the 24 parameters assessed, 15 were found to occur in one or more types with a prevalence of at least 25%. Imprecise consonant was the most frequently occurring defect in most dysarthrias. The spectrum of defects in each type was identified. Some parameters were capable of distinguishing between types. Conclusions: This perceptual analysis has defined linguistic defects likely to be encountered in dysarthric Bengali speech in neurological disorders. The speech distortion can be described and distinguished by a limited number of parameters. This may be of importance to the speech therapist and neurologist in planning rehabilitation and further management.
Menegueti, Katia Ignacio; Mangilli, Laura Davison; Alonso, Nivaldo; Andrade, Claudia Regina Furquim de
To characterize the profile and speech characteristics of patients undergoing primary palatoplasty in a Brazilian university hospital, considering the time of intervention (early, before two years of age; late, after two years of age). Participants were 97 patients of both genders with cleft palate and/or cleft and lip palate, assigned to the Speech-language Pathology Department, who had been submitted to primary palatoplasty and presented no prior history of speech-language therapy. Patients were divided into two groups: early intervention group (EIG) - 43 patients undergoing primary palatoplasty before 2 years of age and late intervention group (LIG) - 54 patients undergoing primary palatoplasty after 2 years of age. All patients underwent speech-language pathology assessment. The following parameters were assessed: resonance classification, presence of nasal turbulence, presence of weak intraoral air pressure, presence of audible nasal air emission, speech understandability, and compensatory articulation disorder (CAD). At statistical significance level of 5% (p≤0.05), no significant difference was observed between the groups in the following parameters: resonance classification (p=0.067); level of hypernasality (p=0.113), presence of nasal turbulence (p=0.179); presence of weak intraoral air pressure (p=0.152); presence of nasal air emission (p=0.369), and speech understandability (p=0.113). The groups differed with respect to presence of compensatory articulation disorders (p=0.020), with the LIG presenting higher occurrence of altered phonemes. It was possible to assess the general profile and speech characteristics of the study participants. Patients submitted to early primary palatoplasty present better speech profile.
Full Text Available The adaptive multi-rate wideband (AMR-WB speech codec is widely used in modern mobile communication systems for high speech quality in handheld devices. Nonetheless, a major disadvantage is that vector quantization (VQ of immittance spectral frequency (ISF coefficients takes a considerable computational load in the AMR-WB coding. Accordingly, a binary search space-structured VQ (BSS-VQ algorithm is adopted to efficiently reduce the complexity of ISF quantization in AMR-WB. This search algorithm is done through a fast locating technique combined with lookup tables, such that an input vector is efficiently assigned to a subspace where relatively few codeword searches are required to be executed. In terms of overall search performance, this work is experimentally validated as a superior search algorithm relative to a multiple triangular inequality elimination (MTIE, a TIE with dynamic and intersection mechanisms (DI-TIE, and an equal-average equal-variance equal-norm nearest neighbor search (EEENNS approach. With a full search algorithm as a benchmark for overall search load comparison, this work provides an 87% search load reduction at a threshold of quantization accuracy of 0.96, a figure far beyond 55% in the MTIE, 76% in the EEENNS approach, and 83% in the DI-TIE approach.
Sandor, Aniko; Moses, Haifa
Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.
An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)......An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)...
It is becoming increasingly apparent that all forms of communication-including voice-will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networksOffering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the
Full Text Available Language therapy has trafficked from a medical focus until a preventive focus. However, difficulties are evidenced in the development of this last task, because he is devoted bigger space to the correction of the disorders of the language. Because the speech disorders is the dysfunction with more frequently appearance, acquires special importance the preventive work that is developed to avoid its appearance. Speech education since early age of the childhood makes work easier for prevent the appearance of speech disorders in the children. The present work has as objective to offer different activities for the prevention of the speech disorders.
Full Text Available We propose a modular no-reference video quality prediction model for videos that are encoded with H.265/HEVC and VP9 codecs and viewed on mobile devices. The impairments which can affect video transmission are classified into two broad types depending upon which layer of the TCP/IP model they originated from. Impairments from the network layer are called the network QoS factors, while those from the application layer are called the application/payload QoS factors. Initially we treat the network and application QoS factors separately and find out the 1 : 1 relationship between the respective QoS factors and the corresponding perceived video quality or QoE. The mapping from the QoS to the QoE domain is based upon a decision variable that gives an optimal performance. Next, across each group we choose multiple QoS factors and find out the QoE for such multifactor impaired videos by using an additive, multiplicative, and regressive approach. We refer to these as the integrated network and application QoE, respectively. At the end, we use a multiple regression approach to combine the network and application QoE for building the final model. We also use an Artificial Neural Network approach for building the model and compare its performance with the regressive approach.
Cognitive performance may be significantly affected by fatigue. In the case of critical personnel, such as pilots, monitoring human fatigue is essential to ensure safety and success of a given operation. One of the modalities that may be used for this purpose is speech, which is sensitive to respiratory changes and increased muscle tension of vocal cords, induced by fatigue. Age, gender, vocal tract length, physical and emotional state may significantly alter speech intensity, duration, rhythm, and spectral characteristics. In addition to changes in speech rhythm, fatigue may also affect the quality of speech, such as articulation. In a noisy environment, detecting fatigue-related changes in speech signals, particularly subtle changes at the onset of fatigue, may be difficult. Therefore, in a performance-monitoring system, speech parameters which are significantly affected by fatigue need to be identified and extracted from input signals. For this purpose, a series of experiments was performed under slowly varying cognitive load conditions and at different times of the day. The results of the data analysis are presented here.
Meijers, A.W.M.; Tsohatzidis, S.L.
From its early development in the 1960s, speech act theory always had an individualistic orientation. It focused exclusively on speech acts performed by individual agents. Paradigmatic examples are ‘I promise that p’, ‘I order that p’, and ‘I declare that p’. There is a single speaker and a single
Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…
Kane, Peter E., Ed.
The 11 articles in this collection deal with theoretical and practical freedom of speech issues. The topics covered are (1) the United States Supreme Court and communication theory; (2) truth, knowledge, and a democratic respect for diversity; (3) denial of freedom of speech in Jock Yablonski's campaign for the presidency of the United Mine…
Shearer, William M.
Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…
Kane, Peter E., Ed.
This issue of "Free Speech" contains the following articles: "Daniel Schoor Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tome Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds…
Başkent, Deniz; Gaudrain, Etienne
Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level
Novelli-Olmstead, Tina; Ling, Daniel
Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…
Full Text Available One receives much more information through a visual sense than through a tactile one. However, most visual aids for hearing-impaired persons are not wearable because it is difficult to make them compact and it is not a best way to mask always their vision.Generally it is difficult to get the integrated patterns by a single mathematical transform of signals, such as a Foruier transform. In order to obtain the integrated pattern speech parameters should be carefully extracted by an analysis according as each parameter, and a visual pattern, which can intuitively be understood by anyone, must be synthesized from them. Successful integration of speech parameters will never disturb understanding of individual features, so that the system can be used for speech training and communication.
Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A
Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Al, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p speech and language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.
Harley, Trevor A.
Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…
Mario T Carreon
Full Text Available This paper discusses the Speech and Phoneme Recognition as an Educational Aid for the Deaf and Hearing Impaired (SPREAD application and the ongoing research on its deployment as a tool for motivating deaf and hearing impaired students to learn and appreciate speech. This application uses the Sphinx-4 voice recognition system to analyze the vocalization of the student and provide prompt feedback on their pronunciation. The packaging of the application as an interactive game aims to provide additional motivation for the deaf and hearing impaired student through visual motivation for them to learn and appreciate speech.
Binderup, Lars Grassme
, as opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal...... egalitarian reasons for free speech - reasons from overall welfare, from autonomy and from respect for the equality of citizens - it is argued that these reasons outweigh the proposed reasons for curbing culturally offensive speech. Currently controversial cases such as that of the Danish Cartoon Controversy...
Jorge, Carlos A.F.; Aghina, Mauricio A.C.; Mol, Antonio C.A.; Pereira, Claudio M.N.A.
This work reports the development of a Man Machine Interface based on speech recognition. The system must recognize spoken commands, and execute the desired tasks, without manual interventions of operators. The range of applications goes from the execution of commands in an industrial plant's control room, to navigation and interaction in virtual environments. Results are reported for isolated word recognition, the isolated words corresponding to the spoken commands. For the pre-processing stage, relevant parameters are extracted from the speech signals, using the cepstral analysis technique, that are used for isolated word recognition, and corresponds to the inputs of an artificial neural network, that performs recognition tasks. (author)
Claudia Regina Furquim de Andrade
Full Text Available CONTEXT: The speech rate is one of the parameters considered when investigating speech fluency and is an important variable in the assessment of individuals with communication complaints. OBJECTIVE: To correlate the stuttering severity index with one of the indices used for assessing fluency/speech rate. DESIGN: Cross-sectional study. SETTING: Fluency and Fluency Disorders Investigation Laboratory, Faculdade de Medicina da Universidade de São Paulo. PARTICIPANTS: Seventy adults with stuttering diagnosis. MAIN MEASUREMENTS: A speech sample from each participant containing at least 200 fluent syllables was videotaped and analyzed according to a stuttering severity index test and speech rate parameters. RESULTS: The results obtained in this study indicate that the stuttering severity and the speech rate present significant variation, i.e., the more severe the stuttering is, the lower the speech rate in words and syllables per minute. DISCUSSION AND CONCLUSION: The results suggest that speech rate is an important indicator of fluency levels and should be incorporated in the assessment and treatment of stuttering. This study represents a first attempt to identify the possible subtypes of developmental stuttering. DEFINITION: Objective tests that quantify diseases are important in their diagnosis, treatment and prognosis.
Berisha, Visar; Liss, Julie
Bandwidth extension of speech is used in the International Telecommunication Union G.729.1 standard in which the narrowband bitstream is combined with quantized high-band parameters. Although this system produces high-quality wideband speech, the additional bits used to represent the high band can be further reduced. In addition to the algorithm used in the G.729.1 standard, bandwidth extension methods based on spectrum prediction have also been proposed. Although these algorithms do not require additional bits, they perform poorly when the correlation between the low and the high band is weak. In this book, two wideband speech coding algorithms that rely on bandwidth extension are developed. The algorithms operate as wrappers around existing narrowband compression schemes. More specifically, in these algorithms, the low band is encoded using an existing toll-quality narrowband system, whereas the high band is generated using the proposed extension techniques. The first method relies only on transmitted high-...
Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter
to business speeches. Consistent with the public opinion, our findings are indicative of Steve Jobs being a more charismatic speaker than Mark Zuckerberg. Beyond previous studies, our data suggest that rhythm and emphatic accentuation are also involved in conveying charisma. Furthermore, the differences...... between Steve Jobs and Mark Zuckerberg and the investor- and customer-related sections of their speeches support the modern understanding of charisma as a gradual, multiparametric, and context-sensitive concept....
Vích, Robert; Vondra, Martin
Vol. 4775, - (2007), s. 129-137 ISSN 0302-9743. [COST Action 2102 International Workshop. Vietri sul Mare, 29.03.2007-31.03.2007] R&D Projects: GA AV ČR(CZ) 1ET301710509 Institutional research plan: CEZ:AV0Z20670512 Keywords : speech * speech processing * cepstral analysis Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering Impact factor: 0.302, year: 2005
Locke, J L; Kutz, K J
Thirty kindergarteners, 15 who substituted /w/ for /r/ and 15 with correct articulation, received two perception tests and a memory test that included /w/ and /r/ in minimally contrastive syllables. Although both groups had nearly perfect perception of the experimenter's productions of /w/ and /r/, misarticulating subjects perceived their own tape-recorded w/r productions as /w/. In the memory task these same misarticulating subjects committed significantly more /w/-/r/ confusions in unspoken recall. The discussion considers why people subvocally rehearse; a developmental period in which children do not rehearse; ways subvocalization may aid recall, including motor and acoustic encoding; an echoic store that provides additional recall support if subjects rehearse vocally, and perception of self- and other- produced phonemes by misarticulating children-including its relevance to a motor theory of perception. Evidence is presented that speech for memory can be sufficiently impaired to cause memory disorder. Conceptions that restrict speech disorder to an impairment of communication are challenged.
Jørgensen, Søren; Dau, Torsten
The speech-based envelope power spectrum model (sEPSM; ) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-tonoise ratio in the envelope domain (SNRenv), which was demonstrated...... to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating...
Shi, Wenhua; Zhang, Xiongwei; Zou, Xia; Han, Wei
In this paper, a speech enhancement method using noise classification and Deep Neural Network (DNN) was proposed. Gaussian mixture model (GMM) was employed to determine the noise type in speech-absent frames. DNN was used to model the relationship between noisy observation and clean speech. Once the noise type was determined, the corresponding DNN model was applied to enhance the noisy speech. GMM was trained with mel-frequency cepstrum coefficients (MFCC) and the parameters were estimated with an iterative expectation-maximization (EM) algorithm. Noise type was updated by spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method could achieve better objective speech quality and smaller distortion under stationary and non-stationary conditions.
L. M. Paulson
Full Text Available Introduction. Controversy exists over whether tonsillectomy will affect speech in patients with known velopharyngeal insufficiency (VPI, particularly in those with cleft palate. Methods. All patients seen at the OHSU Doernbecher Children's Hospital VPI clinic between 1997 and 2010 with VPI who underwent tonsillectomy were reviewed. Speech parameters were assessed before and after tonsillectomy. Wilcoxon rank-sum testing was used to evaluate for significance. Results. A total of 46 patients with VPI underwent tonsillectomy during this period. Twenty-three had pre- and postoperative speech evaluation sufficient for analysis. The majority (87% had a history of cleft palate. Indications for tonsillectomy included obstructive sleep apnea in 11 (48% and staged tonsillectomy prior to pharyngoplasty in 10 (43%. There was no significant difference between pre- and postoperative speech intelligibility or velopharyngeal competency in this population. Conclusion. In this study, tonsillectomy in patients with VPI did not significantly alter speech intelligibility or velopharyngeal competence.
Nie, Yingjiu; Galvin, John J; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie
This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet was significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.
WEIJianqiang; DULimin; YANZhaoli; ZENGHui
In this paper, a Kalman filter-based speech enhancement algorithm with some improvements of previous work is presented. A new technique based on spectral subtraction is used for separation speech and noise characteristics from noisy speech and for the computation of speech and noise Autoregressive (AR) parameters. In order to obtain a Kalman filter output with high audible quality, a perceptual post-filter is placed at the output of the Kalman filter to smooth the enhanced speech spectra.Extensive experiments indicate that this newly proposed method works well.
Lewis, James R
Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application
Booz, Jaime A.
with making adjustments to improve speech clarity have a larger impact on perceived femininity than others. Using a clear speech strategy alone may not be sufficient for a male speaker to be perceived as female, but could be used as one of many tools to help speakers achieve more "feminine" speech, in conjunction with more specific strategies targeting the acoustic parameters outlined in this study.
John F Magnotti
Full Text Available During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions abut the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
Martínez-Sánchez, F; Meilán, J J G; Carro, J; Gómez Íñiguez, C; Millian-Morell, L; Pujante Valverde, I M; López-Alburquerque, T; López, D E
Speech disturbances will affect most patients with Parkinson's disease (PD) over the course of the disease. The origin and severity of these symptoms are of clinical and diagnostic interest. To evaluate the clinical pattern of speech impairment in PD patients and identify significant differences in speech rate and articulation compared to control subjects. Speech rate and articulation in a reading task were measured using an automatic analytical method. A total of 39 PD patients in the 'on' state and 45 age-and sex-matched asymptomatic controls participated in the study. None of the patients experienced dyskinesias or motor fluctuations during the test. The patients with PD displayed a significant reduction in speech and articulation rates; there were no significant correlations between the studied speech parameters and patient characteristics such as L-dopa dose, duration of the disorder, age, and UPDRS III scores and Hoehn & Yahr scales. Patients with PD show a characteristic pattern of declining speech rate. These results suggest that in PD, disfluencies are the result of the movement disorder affecting the physiology of speech production systems. Copyright © 2014 Sociedad Española de Neurología. Publicado por Elsevier España, S.L.U. All rights reserved.
Sandor, A.; Moses, H. R.
Currently on the International Space Station (ISS) and other space vehicles Caution & Warning (C&W) alerts are represented with various auditory tones that correspond to the type of event. This system relies on the crew's ability to remember what each tone represents in a high stress, high workload environment when responding to the alert. Furthermore, crew receive a year or more in advance of the mission that makes remembering the semantic meaning of the alerts more difficult. The current system works for missions conducted close to Earth where ground operators can assist as needed. On long duration missions, however, they will need to work off-nominal events autonomously. There is evidence that speech alarms may be easier and faster to recognize, especially during an off-nominal event. The Information Presentation Directed Research Project (FY07-FY09) funded by the Human Research Program included several studies investigating C&W alerts. The studies evaluated tone alerts currently in use with NASA flight deck displays along with candidate speech alerts. A follow-on study used four types of speech alerts to investigate how quickly various types of auditory alerts with and without a speech component - either at the beginning or at the end of the tone - can be identified. Even though crew were familiar with the tone alert from training or direct mission experience, alerts starting with a speech component were identified faster than alerts starting with a tone. The current study replicated the results from the previous study in a more rigorous experimental design to determine if the candidate speech alarms are ready for transition to operations or if more research is needed. Four types of alarms (caution, warning, fire, and depressurization) were presented to participants in both tone and speech formats in laboratory settings and later in the Human Exploration Research Analog (HERA). In the laboratory study, the alerts were presented by software and participants were
The purpose of this study is to examine speech intelligibility of children with primary speech and sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...
Akhoun, Idrick; Moulin, Annie; Jeanvoine, Arnaud; Ménard, Mikael; Buret, François; Vollaire, Christian; Scorretti, Riccardo; Veuillet, Evelyne; Berger-Vachon, Christian; Collet, Lionel; Thai-Van, Hung
Speech elicited auditory brainstem responses (Speech ABR) have been shown to be an objective measurement of speech processing in the brainstem. Given the simultaneous stimulation and recording, and the similarities between the recording and the speech stimulus envelope, there is a great risk of artefactual recordings. This study sought to systematically investigate the source of artefactual contamination in Speech ABR response. In a first part, we measured the sound level thresholds over which artefactual responses were obtained, for different types of transducers and experimental setup parameters. A watermelon model was used to model the human head susceptibility to electromagnetic artefact. It was found that impedances between the electrodes had a great effect on electromagnetic susceptibility and that the most prominent artefact is due to the transducer's electromagnetic leakage. The only artefact-free condition was obtained with insert-earphones shielded in a Faraday cage linked to common ground. In a second part of the study, using the previously defined artefact-free condition, we recorded speech ABR in unilateral deaf subjects and bilateral normal hearing subjects. In an additional control condition, Speech ABR was recorded with the insert-earphones used to deliver the stimulation, unplugged from the ears, so that the subjects did not perceive the stimulus. No responses were obtained from the deaf ear of unilaterally hearing impaired subjects, nor in the insert-out-of-the-ear condition in all the subjects, showing that Speech ABR reflects the functioning of the auditory pathways.
Faizel Amri, Umar; Nur Wahidah Nik Hashim, Nik; Hazrin Hany Mohamad Hanif, Noor
In the department of engineering, students are required to fulfil at least 80 percent of class attendance. Conventional method requires student to sign his/her initial on the attendance sheet. However, this method is prone to cheating by having another student signing for their fellow classmate that is absent. We develop our hypothesis according to a verse in the Holy Qur’an (95:4), “We have created men in the best of mould”. Based on the verse, we believe each psychological characteristic of human being is unique and thus, their speech characteristic should be unique. In this paper we present the development of speech biometric-based attendance system. The system requires user’s voice to be installed in the system as trained data and it is saved in the system for registration of the user. The following voice of the user will be the test data in order to verify with the trained data stored in the system. The system uses PSD (Power Spectral Density) and Transition Parameter as the method for feature extraction of the voices. Euclidean and Mahalanobis distances are used in order to verified the user’s voice. For this research, ten subjects of five females and five males were chosen to be tested for the performance of the system. The system performance in term of recognition rate is found to be 60% correct identification of individuals.
Rajesh Kumar Dubey
Full Text Available The use of single time-instance features, where entire speech utterance is used for feature computation, is not accurate and adequate in capturing the time localized information of short-time transient distortions and their distinction from plosive sounds of speech, particularly degraded by impulsive noise. Hence, the importance of estimating features at multiple time-instances is sought. In this, only active speech segments of degraded speech are used for features computation at multiple time-instances on per frame basis. Here, active speech means both voiced and unvoiced frames except silence. The features of different combinations of multiple contiguous active speech segments are computed and called multiple time-instances features. The joint GMM training has been done using these features along with the subjective MOS of the corresponding speech utterance to obtain the parameters of GMM. These parameters of GMM and multiple time-instances features of test speech are used to compute the objective MOS values of different combinations of multiple contiguous active speech segments. The overall objective MOS of the test speech utterance is obtained by assigning equal weight to the objective MOS values of the different combinations of multiple contiguous active speech segments. This algorithm outperforms the Recommendation ITU-T P.563 and recently published algorithms.
Pop, Claudiu B.; Rindel, Jens Holger
In open plan offices the lack of speech privacy between the workstations is one of the major acoustic problems. Improving the speech privacy in an open plan design is therefore the main concern for a successful open plan environment. The project described in this paper aimed to find an objective...... parameter that correlates well with the perceived degree of speech privacy and to derive a clear method for evaluating the acoustic conditions in open plan offices. Acoustic measurements were carried out in an open plan office, followed by data analysis at the Acoustic Department, DTU. A computer model...
section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore......, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about...... the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations...
Proper speech functioning in human being, depends on the precise coordination and timing balances in a series of complex neuro nuscular movements and actions. Starting from the prime organ of energy source of expelled air from respirato y system; deliver such air to trigger vocal cords; swift changes of this phonatory episode to a comprehensible sound in RESONACE and final coordination of all head and neck structures to elicit final speech in ...
The paper contains a transcript of a speech by the chairman of the UKAEA, to mark the publication of the 1985/6 annual report. The topics discussed in the speech include: the Chernobyl accident and its effect on public attitudes to nuclear power, management and disposal of radioactive waste, the operation of UKAEA as a trading fund, and the UKAEA development programmes. The development programmes include work on the following: fast reactor technology, thermal reactors, reactor safety, health and safety aspects of water cooled reactors, the Joint European Torus, and under-lying research. (U.K.)
Rødbro, Christoffer A.; Christensen, Mads Græsbøll; Andersen, Søren Vang
We consider the problem of packet loss concealment for voice over IP (VoIP). The speech signal is compressed at the transmitter using a sinusoidal coding scheme working at 8 kbit/s. At the receiver, packet loss concealment is carried out working directly on the quantized sinusoidal parameters......, based on time-scaling of the packets surrounding the missing ones. Subjective listening tests show promising results indicating the potential of sinusoidal speech coding for VoIP....
Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars
Speech is both beautiful and informative. In this work, a conceptual study of the speech, through investigation of the tower of Babel, the archetypal phonemes, and a study of the reasons of uses of language is undertaken in order to create an artistic work investigating the nature of speech. The ....... The artwork is presented at the Re:New festival in May 2008....
Lam, Choi Ling Coriolanus
One of the major acoustical concerns in classrooms is the establishment of effective verbal communication between teachers and students. Non-optimal acoustical conditions, resulting in reduced verbal communication, can cause two main problems. First, they can lead to reduce learning efficiency. Second, they can also cause fatigue, stress, vocal strain and health problems, such as headaches and sore throats, among teachers who are forced to compensate for poor acoustical conditions by raising their voices. Besides, inadequate acoustical conditions can induce the usage of public address system. Improper usage of such amplifiers or loudspeakers can lead to impairment of students' hearing systems. The social costs of poor classroom acoustics will be large to impair the learning of children. This invisible problem has far reaching implications for learning, but is easily solved. Many researches have been carried out that they have accurately and concisely summarized the research findings on classrooms acoustics. Though, there is still a number of challenging questions remaining unanswered. Most objective indices for speech intelligibility are essentially based on studies of western languages. Even several studies of tonal languages as Mandarin have been conducted, there is much less on Cantonese. In this research, measurements have been done in unoccupied rooms to investigate the acoustical parameters and characteristics of the classrooms. The speech intelligibility tests, which based on English, Mandarin and Cantonese, and the survey were carried out on students aged from 5 years old to 22 years old. It aims to investigate the differences in intelligibility between English, Mandarin and Cantonese of the classrooms in Hong Kong. The significance on speech transmission index (STI) related to Phonetically Balanced (PB) word scores will further be developed. Together with developed empirical relationship between the speech intelligibility in classrooms with the variations
Full Text Available The article deals with a number of relevant methodological issues. First of all, the author analyses psychological peculiarities of dialogic speech and states that the dialogue is the product of at least two persons. Therefore, in this view, dialogic speech, unlike monologic speech, happens impromptu and is not prepared in advance. Dialogic speech is mainly of situational character. The linguistic nature of dialogic speech, in the author’s opinion, lies in the process of exchanging replications, which are coherent in structural and functional character. The author classifies dialogue groups by the number of replications and communicative parameters. The basic goal of dialogic speech teaching is developing the abilities and skills which enable to exchange replications. The author distinguishes two basic stages of dialogic speech teaching: 1. Training of abilities to exchange replications during communicative exercises. 2. Development of skills by training the capability to perform exercises of creative nature during a group dialogue, conversation or debate.
The welcoming speech underlines the fact that any validation process starting with calculation methods and ending with studies on the long-term behaviour of a repository system can only be effected through laboratory, field and natural-analogue studies. The use of natural analogues (NA) is to secure the biosphere and to verify whether this safety really exists. (HP) [de
Ekström, Seth-Reino; Borg, Erik
The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (Ptempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (Pmusic offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.
Full Text Available The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA noise and speech spectrum-filtered noise (SPN]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA. The results showed a significant effect of piano performance speed and octave (P<.01. Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01 and SPN (P<.05. Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01, but there were smaller differences between masking conditions (P<.01. It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.
Kane, Peter E., Ed.
The seven articles in this collection deal with theoretical and practical freedom of speech issues. Topics covered are: the United States Supreme Court, motion picture censorship, and the color line; judicial decision making; the established scientific community's suppression of the ideas of Immanuel Velikovsky; the problems of avant-garde jazz,…
ROH Yong-Wan; KIM Dong-Ju; LEE Woo-Seok; HONG Kwang-Seok
This paper focuses on acoustic features that effectively improve the recognition of emotion in human speech. The novel features in this paper are based on spectral-based entropy parameters such as fast Fourier transform (FFT) spectral entropy, delta FFT spectral entropy, Mel-frequency filter bank (MFB)spectral entropy, and Delta MFB spectral entropy. Spectral-based entropy features are simple. They reflect frequency characteristic and changing characteristic in frequency of speech. We implement an emotion rejection module using the probability distribution of recognized-scores and rejected-scores.This reduces the false recognition rate to improve overall performance. Recognized-scores and rejected-scores refer to probabilities of recognized and rejected emotion recognition results, respectively.These scores are first obtained from a pattern recognition procedure. The pattern recognition phase uses the Gaussian mixture model (GMM). We classify the four emotional states as anger, sadness,happiness and neutrality. The proposed method is evaluated using 45 sentences in each emotion for 30 subjects, 15 males and 15 females. Experimental results show that the proposed method is superior to the existing emotion recognition methods based on GMM using energy, Zero Crossing Rate (ZCR),linear prediction coefficient (LPC), and pitch parameters. We demonstrate the effectiveness of the proposed approach. One of the proposed features, combined MFB and delta MFB spectral entropy improves performance approximately 10% compared to the existing feature parameters for speech emotion recognition methods. We demonstrate a 4% performance improvement in the applied emotion rejection with low confidence score.
Full Text Available The Nobel Peace Prize has long been considered the premier peace prize in the world. According to Geir Lundestad, Secretary of the Nobel Committee, of the 300 some peace prizes awarded worldwide, “none is in any way as well known and as highly respected as the Nobel Peace Prize” (Lundestad, 2001. Nobel peace speech is a unique and significant international site of public discourse committed to articulating the universal grammar of peace. Spanning over 100 years of sociopolitical history on the world stage, Nobel Peace Laureates richly represent an important cross-section of domestic and international issues increasingly germane to many publics. Communication scholars’ interest in this rhetorical genre has increased in the past decade. Yet, the norm has been to analyze a single speech artifact from a prestigious or controversial winner rather than examine the collection of speeches for generic commonalities of import. In this essay, we analyze the discourse of Nobel peace speech inductively and argue that the organizing principle of the Nobel peace speech genre is the repetitive form of normative liberal principles and values that function as rhetorical topoi. These topoi include freedom and justice and appeal to the inviolable, inborn right of human beings to exercise certain political and civil liberties and the expectation of equality of protection from totalitarian and tyrannical abuses. The significance of this essay to contemporary communication theory is to expand our theoretical understanding of rhetoric’s role in the maintenance and development of an international and cross-cultural vocabulary for the grammar of peace.
This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.
Kavalekalam, Mathew Shaji; Christensen, Mads Græsbøll; Boldt, Jesper B.
term predictor (STP) parameters using a codebook based approach, when we have access to binaural noisy signals. The estimated STP parameters are subsequently used for enhancement in a dual channel scenario. Objective measures indicate, that the proposed method is able to improve the speech...
van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.
Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…
Full Text Available In recent years, various real life applications such as talking books, gadgets and humanoid robots have drawn the attention to pursue research in the area of expressive speech synthesis. Speech synthesis is widely used in various applications. However, there is a growing need for an expressive speech synthesis especially for communication and robotic. In this paper, global and local rule are developed to convert neutral to storytelling style speech for the Malay language. In order to generate rules, modification of prosodic parameters such as pitch, intensity, duration, tempo and pauses are considered. Modification of prosodic parameters is examined by performing prosodic analysis on a story collected from an experienced female and male storyteller. The global and local rule is applied in sentence level and synthesized using HNM. Subjective tests are conducted to evaluate the synthesized storytelling speech quality of both rules based on naturalness, intelligibility, and similarity to the original storytelling speech. The results showed that global rule give a better result than local rule
Full Text Available This paper presents a new method to detect speech/nonspeech components of a given noisy signal. Employing the combination of binary Walsh basis functions and an analysis-synthesis scheme, the original noisy speech signal is modified first. From the modified signals, the speech components are distinguished from the nonspeech components by using a simple decision scheme. Minimal number of Walsh basis functions to be applied is determined using singular value decomposition (SVD. The main advantages of the proposed method are low computational complexity, less parameters to be adjusted, and simple implementation. It is observed that the use of Walsh basis functions makes the proposed algorithm efficiently applicable in real-world situations where processing time is crucial. Simulation results indicate that the proposed algorithm achieves high-speech and nonspeech detection rates while maintaining a low error rate for different noisy conditions.
Gallardo, L.F.; Möller, S.; Beerends, J.
The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility
Juel Henrichsen, Peter
on the supply side. The present article reports on a new public action strategy which has taken shape in the course of 2013-14. While Denmark is a small language area, our public sector is well organised and has considerable purchasing power. Across this past year, Danish local authorities have organised around......Most of the Danish municipalities are ready to begin to adopt automatic speech recognition, but at the same time remain nervous following a long series of bad business cases in the recent past. Complaints are voiced over costly licences and low service levels, typical effects of a de facto monopoly...... the speech technology challenge, they have formulated a number of joint questions and new requirements to be met by suppliers and have deliberately worked towards formulating tendering material which will allow fair competition. Public researchers have contributed to this work, including the author...
Guddattu, Vasudeva; Krishna, Y.
The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…
... Staying Safe Videos for Educators Search English Español Speech-Language Therapy KidsHealth / For Parents / Speech-Language Therapy ... most kids with speech and/or language disorders. Speech Disorders, Language Disorders, and Feeding Disorders A speech ...
Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A
The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise.In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg
Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole
Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.
Eficácia da fonoterapia em disfagia neurogênica usando a escala funcional de ingestão por via oral (FOIS como marcador Efficacy of speech therapy in neurogenic dysphagia using functional oral intake scale (FOIS as a parameter
Ana Maria Furkim
, using the functional oral intake scale as parameter of safe progression for oral feeding. METHODS: We carried out a retrospective study on forty-nine patients with neurogenic dysphagia under speech therapy sessions while in the wards and a comparison between the results of the functional oral intake scale before and after the therapy sessions(measuring the amount and type of orally-taken food in a safe manner by the patients. Possible factors affecting the therapy such as etiology of the neurologic disease, age, respiratory and clinical conditions, alertness, time and duration of the therapy sessions were studied. RESULTS: over the 49 studied patients, 36 showed improvement in the FOIS after speech therapy sessions. Regarding possible factors affecting the therapy, statistical analysis showed that deterioration of clinical condition, clinical intercurrences and decreased level of alertness were significant factors for the lack of progress to oral feeding during the speech therapy sessions. The other factors, etiology of disease, age, respiratory condition, time and number of sessions, did not demonstrate statistical significance. However, they did not interfere in improving or worsening the patients' clinical condition. CONCLUSION: there is an effective improvement in feeding by oral intake in patients with neurogenic dysphagia being treated by speech therapy sessions in the hospital wards. However, it cannot be achieved if the patient shows clinical intercurrences and decreased alertness level.
McClain, Matthew; Romanowski, Brian
Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known apriori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detection certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.
Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading. © 2015 American Society of Law, Medicine & Ethics, Inc.
ble value of the parameters (ρ1,ρ2,θ) in their respective ranges. For each block ... Dirac was developed by BBC and aimed at a royalty-free, open technology (Dirac video codec [online]. ..... In: Acoustics, Speech, and Signal Processing, 1995.
Tan, Zheng-Hua; Lindberg, Børge
in the mobile context covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within......The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR...
Chiu, Yi-Fang; Forrest, Karen
Purpose: This study sought to investigate the interaction of speech movement execution with higher order lexical parameters. The authors examined how lexical characteristics affect speech output in individuals with Parkinson's disease (PD) and healthy control (HC) speakers. Method: Twenty speakers with PD and 12 healthy speakers read sentences…
; speech-to-speech translation; language identiﬁcation. ... interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers.
Juste, Fabiola Staróbole; Andrade, Claudia Regina Furquim de
To characterize the speech fluency profile of patients with Parkinson's disease. Study participants were 40 individuals of both genders aged 40 to 80 years divided into 2 groups: Research Group - RG (20 individuals with diagnosis of Parkinson's disease) and Control Group - CG (20 individuals with no communication or neurological disorders). For all of the participants, three speech samples involving different tasks were collected: monologue, individual reading, and automatic speech. The RG presented a significant larger number of speech disruptions, both stuttering-like and typical dysfluencies, and higher percentage of speech discontinuity in the monologue and individual reading tasks compared with the CG. Both groups presented reduced number of speech disruptions (stuttering-like and typical dysfluencies) in the automatic speech task; the groups presented similar performance in this task. Regarding speech rate, individuals in the RG presented lower number of words and syllables per minute compared with those in the CG in all speech tasks. Participants of the RG presented altered parameters of speech fluency compared with those of the CG; however, this change in fluency cannot be considered a stuttering disorder.
Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam
Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we are presenting the multimodal speech capture system (MSCS) that records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. Collected speech modalities, tongue motion, lips gestures, and voice are visualized not only in real-time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities by a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective, and may vary from one SLP to another.
Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas
Assistive speech-enabled systems are proposed to help both French and English speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of the Speech Quality (PESQ) value of 5% and more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
Rosenblum, Lawrence D.
Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal s...
de Bruijn, Marieke J.; ten Bosch, Louis; Kuik, Dirk J.; Witte, Birgit I.; Langendijk, Johannes A.; Leemans, C. Rene; Verdonck-de Leeuw, Irma M.
Speech impairment often occurs in patients after treatment for head and neck cancer. A specific speech characteristic that influences intelligibility and speech quality is voice-onset-time (VOT) in stop consonants. VOT is one of the functionally most relevant parameters that distinguishes voiced and
Partila, Pavol; Tovarek, Jaromir; Voznak, Miroslav
This paper presents a method for detecting speech under stress using Self-Organizing Maps. Most people who are exposed to stressful situations can not adequately respond to stimuli. Army, police, and fire department occupy the largest part of the environment that are typical of an increased number of stressful situations. The role of men in action is controlled by the control center. Control commands should be adapted to the psychological state of a man in action. It is known that the psychological changes of the human body are also reflected physiologically, which consequently means the stress effected speech. Therefore, it is clear that the speech stress recognizing system is required in the security forces. One of the possible classifiers, which are popular for its flexibility, is a self-organizing map. It is one type of the artificial neural networks. Flexibility means independence classifier on the character of the input data. This feature is suitable for speech processing. Human Stress can be seen as a kind of emotional state. Mel-frequency cepstral coefficients, LPC coefficients, and prosody features were selected for input data. These coefficients were selected for their sensitivity to emotional changes. The calculation of the parameters was performed on speech recordings, which can be divided into two classes, namely the stress state recordings and normal state recordings. The benefit of the experiment is a method using SOM classifier for stress speech detection. Results showed the advantage of this method, which is input data flexibility.
Novaes, Priscila Maronezi; Nicolielo-Carrilho, Ana Paola; Lopes-Herrera, Simone Aparecida
To identify and describe the speech rate and fluency of children with phonological disorder (PD) with and without speech-language therapy. Thirty children, aged 5-8 years old, both genders, were divided into three groups: experimental group 1 (G1) — 10 children with PD in intervention; experimental group 2 (G2) — 10 children with PD without intervention; and control group (CG) — 10 children with typical development. Speech samples were collected and analyzed according to parameters of specific protocol. The children in CG had higher number of words per minute compared to those in G1, which, in turn, performed better in this aspect compared to children in G2. Regarding the number of syllables per minute, the CG showed the best result. In this aspect, the children in G1 showed better results than those in G2. Comparing children's performance in the assessed groups regarding the tests, those with PD in intervention had higher time of speech sample and adequate speech rate, which may be indicative of greater auditory monitoring of their own speech as a result of the intervention.
... digital filtering for noise cancellation which interfaces to speech recognition software. It uses auditory features in speech recognition training, and provides applications to multilingual spoken language translation...
Teaching Speech Acts
Full Text Available In this paper I argue that pragmatic ability must become part of what we teach in the classroom if we are to realize the goals of communicative competence for our students. I review the research on pragmatics, especially those articles that point to the effectiveness of teaching pragmatics in an explicit manner, and those that posit methods for teaching. I also note two areas of scholarship that address classroom needs—the use of authentic data and appropriate assessment tools. The essay concludes with a summary of my own experience teaching speech acts in an advanced-level Portuguese class.
Full Text Available Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN trained on electromagnetic articulography (EMA data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open to future speech BCI applications using such articulatory-based speech synthesizer.
Full Text Available Speech segmentation is an essential stage in designing automatic speech recognition systems and one can ﬁnd several algorithms proposed in the literature. It is a difﬁcult problem, as speech is immensely variable. The aim of the authors’ studies was to design an algorithm that could be employed at the stage of automatic speech recognition. This would make it possible to avoid some problems related to speech signal parametrization. Posing the problem in such a way requires the algorithm to be capable of working in real time. The only such algorithm was proposed by Tyagi et al., (2006, and it is a modiﬁed version of Brandt’s algorithm. The article presents a new algorithm for unsupervised automatic speech signal segmentation. It performs segmentation without access to information about the phonetic content of the utterances, relying exclusively on second-order statistics of a speech signal. The starting point for the proposed method is time-varying Schur coefﬁcients of an innovation adaptive ﬁlter. The Schur algorithm is known to be fast, precise, stable and capable of rapidly tracking changes in second order signal statistics. A transfer from one phoneme to another in the speech signal always indicates a change in signal statistics caused by vocal track changes. In order to allow for the properties of human hearing, detection of inter-phoneme boundaries is performed based on statistics deﬁned on the mel spectrum determined from the reﬂection coefﬁcients. The paper presents the structure of the algorithm, deﬁnes its properties, lists parameter values, describes detection efﬁciency results, and compares them with those for another algorithm. The obtained segmentation results, are satisfactory.
Parkinson, Michael G.; Dobkins, David H.
Using a computerized content analysis, the authors demonstrate changes in speech behaviors of prison inmates. They conclude that two to four hours of public speaking training can have only limited effect on students who live in a culture in which "prison speech" is the expected and rewarded form of behavior. (PD)
Pedersen, Michael Syskind; Wang, DeLiang; Larsen, Jan
a method for underdetermined blind source separation of convolutive mixtures. The proposed framework is applicable for separation of instantaneous as well as convolutive speech mixtures. It is possible to iteratively extract each speech signal from the mixture by combining blind source separation...
Some of the history of gradual infusion of the modulation spectrum concept into Automatic recognition of speech (ASR) comes next, pointing to the relationship of modulation spectrum processing to wellaccepted ASR techniques such as dynamic speech features or RelAtive SpecTrAl (RASTA) ﬁltering. Next, the frequency ...
Casper, Maureen A.; Raphael, Lawrence J.; Harris, Katherine S.; Geibel, Jennifer M.
Persons with cerebellar ataxia exhibit changes in physical coordination and speech and voice production. Previously, these alterations of speech and voice production were described primarily via perceptual coordinates. In this study, the spatial-temporal properties of syllable production were examined in 12 speakers, six of whom were healthy…
This PhD thesis in human-computer interfaces (informatics) studies the case of the anaesthesia record used during medical operations and the possibility to supplement it with speech recognition facilities. Problems and limitations have been identified with the traditional paper-based anaesthesia...... and inaccuracies in the anaesthesia record. Supplementing the electronic anaesthesia record interface with speech input facilities is proposed as one possible solution to a part of the problem. The testing of the various hypotheses has involved the development of a prototype of an electronic anaesthesia record...... interface with speech input facilities in Danish. The evaluation of the new interface was carried out in a full-scale anaesthesia simulator. This has been complemented by laboratory experiments on several aspects of speech recognition for this type of use, e.g. the effects of noise on speech recognition...
Full Text Available The current study explored the correlation between speakers' Eysenck personality traits and speech spectrum parameters. Forty-six subjects completed the Eysenck Personality Questionnaire. They were instructed to verbally answer the questions shown on a computer screen and their responses were recorded by the computer. Spectrum parameters of /sh/ and /i/ were analyzed by Praat voice software. Formant frequencies of the consonant /sh/ in lying responses were significantly lower than that in truthful responses, whereas no difference existed on the vowel /i/ speech spectrum. The second formant bandwidth of the consonant /sh/ speech spectrum was significantly correlated with the personality traits of Psychoticism, Extraversion, and Neuroticism, and the correlation differed between truthful and lying responses, whereas the first formant frequency of the vowel /i/ speech spectrum was negatively correlated with Neuroticism in both response types. The results suggest that personality characteristics may be conveyed through the human voice, although the extent to which these effects are due to physiological differences in the organs associated with speech or to a general Pygmalion effect is yet unknown.
ROH; Yong-Wan; KIM; Dong-Ju; LEE; Woo-Seok; HONG; Kwang-Seok
This paper focuses on acoustic features that effectively improve the recognition of emotion in human speech.The novel features in this paper are based on spectral-based entropy parameters such as fast Fourier transform(FFT) spectral entropy,delta FFT spectral entropy,Mel-frequency filter bank(MFB) spectral entropy,and Delta MFB spectral entropy.Spectral-based entropy features are simple.They reflect frequency characteristic and changing characteristic in frequency of speech.We implement an emotion rejection module using the probability distribution of recognized-scores and rejected-scores.This reduces the false recognition rate to improve overall performance.Recognized-scores and rejected-scores refer to probabilities of recognized and rejected emotion recognition results,respectively.These scores are first obtained from a pattern recognition procedure.The pattern recognition phase uses the Gaussian mixture model(GMM).We classify the four emotional states as anger,sadness,happiness and neutrality.The proposed method is evaluated using 45 sentences in each emotion for 30 subjects,15 males and 15 females.Experimental results show that the proposed method is superior to the existing emotion recognition methods based on GMM using energy,Zero Crossing Rate(ZCR),linear prediction coefficient(LPC),and pitch parameters.We demonstrate the effectiveness of the proposed approach.One of the proposed features,combined MFB and delta MFB spectral entropy improves performance approximately 10% compared to the existing feature parameters for speech emotion recognition methods.We demonstrate a 4% performance improvement in the applied emotion rejection with low confidence score.
Miller, Nick; Nath, Uma; Noble, Emma; Burn, David
To determine if perceptual speech measures distinguish people with Parkinson's disease (PD), multiple system atrophy with predominant parkinsonism (MSA-P) and progressive supranuclear palsy (PSP). Speech-language therapists blind to patient characteristics employed clinical rating scales to evaluate speech/voice in 24 people with clinically diagnosed PD, 17 with PSP and 9 with MSA-P, matched for disease duration (mean 4.9 years, standard deviation 2.2). No consistent intergroup differences appeared on specific speech/voice variables. People with PD were significantly less impaired on overall speech/voice severity. Analyses by severity suggested further investigation around laryngeal, resonance and fluency changes may characterize individual groups. MSA-P and PSP compared with PD were distinguished by severity of speech/voice deterioration, but individual speech/voice parameters failed to consistently differentiate groups.
Davidow, Jason H; Grossman, Heather L; Edge, Robin L
Voluntary stuttering techniques involve persons who stutter purposefully interjecting disfluencies into their speech. Little research has been conducted on the impact of these techniques on the speech pattern of persons who stutter. The present study examined whether changes in the frequency of voluntary stuttering accompanied changes in stuttering frequency, articulation rate, speech naturalness, and speech effort. In total, 12 persons who stutter aged 16-34 years participated. Participants read four 300-syllable passages during a control condition, and three voluntary stuttering conditions that involved attempting to produce purposeful, tension-free repetitions of initial sounds or syllables of a word for two or more repetitions (i.e., bouncing). The three voluntary stuttering conditions included bouncing on 5%, 10%, and 15% of syllables read. Friedman tests and follow-up Wilcoxon signed ranks tests were conducted for the statistical analyses. Stuttering frequency, articulation rate, and speech naturalness were significantly different between the voluntary stuttering conditions. Speech effort did not differ between the voluntary stuttering conditions. Stuttering frequency was significantly lower during the three voluntary stuttering conditions compared to the control condition, and speech effort was significantly lower during two of the three voluntary stuttering conditions compared to the control condition. Due to changes in articulation rate across the voluntary stuttering conditions, it is difficult to conclude, as has been suggested previously, that voluntary stuttering is the reason for stuttering reductions found when using voluntary stuttering techniques. Additionally, future investigations should examine different types of voluntary stuttering over an extended period of time to determine their impact on stuttering frequency, speech rate, speech naturalness, and speech effort.
Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Herve
To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. baez) coupled to non-intact (excised onsets) auditory speech (signified…
Naidu, D.H.R.; Srinivasan, S.
Several speech enhancement approaches utilize trained models of clean speech data, such as codebooks, Gaussian mixtures, and hidden Markov models. These models are typically trained on neutral clean speech data, without any emotion. However, in practical scenarios, emotional speech is a common
Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc
Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…
Shannon, Robert V
Music and speech share many acoustic cues but not all are equally important. For example, harmonic pitch is essential for music but not for speech. When birds communicate is their song more like speech or music? A new study contrasting pitch and spectral patterns shows that birds perceive their song more like humans perceive speech. Copyright © 2016 Elsevier Ltd. All rights reserved.
Allen, Winfred G., Jr., Ed.
The Freedom of Speech Newsletter is the communication medium for the Freedom of Speech Interest Group of the Western Speech Communication Association. The newsletter contains such features as a statement of concern by the National Ad Hoc Committee Against Censorship; Reticence and Free Speech, an article by James F. Vickrey discussing the subtle…
Frankle, Christen M.
There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.
Frankle, Christen M.
There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.
Johnson, Micah K.; Lyu, Siwei; Farid, Hany
Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
Shedovskiy E. F.
Full Text Available The article reviews studies of speech disorders in schizophrenia. The authors paid attention to a historical course and characterization of studies of areas: the actual psychopathological (speech disorders as a psychopathological symptoms, their description and taxonomy, psychological (isolated neurons and pathopsychological perspective analysis separately analyzed some modern foreign works, covering a variety of approaches to the study of speech disorders in the endogenous mental disorders. Disorders and features of speech are among the most striking manifestations of schizophrenia along with impaired thinking (Savitskaya A. V., Mikirtumov B. E.. With all the variety of symptoms, speech disorders in schizophrenia could be classified and organized. The few clinical psychological studies of speech activity in schizophrenia presented work on the study of generation and standard speech utterance; features verbal associative process, speed parameters of speech utterances. Special attention is given to integrated research in the mainstream of biological psychiatry and genetic trends. It is shown that the topic for more than a half-century history of originality of speech pathology in schizophrenia has received some coverage in the psychiatric and psychological literature and continues to generate interest in the modern integrated multidisciplinary approach
Loizou, Philipos C
With the proliferation of mobile devices and hearing devices, including hearing aids and cochlear implants, there is a growing and pressing need to design algorithms that can improve speech intelligibility without sacrificing quality. Responding to this need, Speech Enhancement: Theory and Practice, Second Edition introduces readers to the basic problems of speech enhancement and the various algorithms proposed to solve these problems. Updated and expanded, this second edition of the bestselling textbook broadens its scope to include evaluation measures and enhancement algorithms aimed at impr
Full Text Available This paper provides an interface between the machine translation and speech synthesis system for converting English speech to Tamil text in English to Tamil speech to speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation and text to speech synthesis. Many procedures for incorporation of speech recognition and machine translation have been projected. Still speech synthesis system has not yet been measured. In this paper, we focus on integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of speech synthesis, machine translation and the integration of machine translation and speech synthesis components. Here we implement a hybrid machine translation (combination of rule based and statistical machine translation and concatenative syllable based speech synthesis technique. In order to retain the naturalness and intelligibility of synthesized speech Auto Associative Neural Network (AANN prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.
M. M. Bykov
Full Text Available In the article, the authors developed a method for detecting speech activity for an automated system for recognizing critical use of speeches with wavelet parameterization of speech signal and classification at intervals of “language”/“pause” using a curvilinear neural network. The method of wavelet-parametrization proposed by the authors allows choosing the optimal parameters of wavelet transformation in accordance with the user-specified error of presentation of speech signal. Also, the method allows estimating the loss of information depending on the selected parameters of continuous wavelet transformation (NPP, which allowed to reduce the number of scalable coefficients of the LVP of the speech signal in order of magnitude with the allowable degree of distortion of the local spectrum of the LVP. An algorithm for detecting speech activity with a curvilinear neural network classifier is also proposed, which shows the high quality of segmentation of speech signals at intervals "language" / "pause" and is resistant to the presence in the speech signal of narrowband noise and technogenic noise due to the inherent properties of the curvilinear neural network.
Błeszyński, Jacek Jarosław
Speech of people with autism is recognised as one of the basic diagnostic, therapeutic and theoretical problems. One of the most common symptoms of autism in children is echolalia, described here as being of different types and severity. This paper presents the results of studies into different levels of echolalia, both in normally developing children and in children diagnosed with autism, discusses the differences between simple echolalia and echolalic speech - which can be considered to b...
tecture, are either wrapped natural-language processing ( NLP ) components or objects developed from scratch using the architecture’s API. GATE is...framework, we put together a demonstration Arabic -to- English speech translation system using both internally developed ( Arabic speech recognition and MT...conditions of our Arabic S2S demonstration system described earlier. Once again, the data size was varied and eighty identical requests were
Ward, Roslyn; Leitão, Suze; Strauss, Geoff
This study evaluates perceptual changes in speech production accuracy in six children (3-11 years) with moderate-to-severe speech impairment associated with cerebral palsy before, during, and after participation in a motor-speech intervention program (Prompts for Restructuring Oral Muscular Phonetic Targets). An A1BCA2 single subject research design was implemented. Subsequent to the baseline phase (phase A1), phase B targeted each participant's first intervention priority on the PROMPT motor-speech hierarchy. Phase C then targeted one level higher. Weekly speech probes were administered, containing trained and untrained words at the two levels of intervention, plus an additional level that served as a control goal. The speech probes were analysed for motor-speech-movement-parameters and perceptual accuracy. Analysis of the speech probe data showed all participants recorded a statistically significant change. Between phases A1-B and B-C 6/6 and 4/6 participants, respectively, recorded a statistically significant increase in performance level on the motor speech movement patterns targeted during the training of that intervention. The preliminary data presented in this study make a contribution to providing evidence that supports the use of a treatment approach aligned with dynamic systems theory to improve the motor-speech movement patterns and speech production accuracy in children with cerebral palsy.
Hux, Karen; Knollman-Porter, Kelly; Brown, Jessica; Wallace, Sarah E
Using text-to-speech technology to provide simultaneous written and auditory content presentation may help compensate for chronic reading challenges if people with aphasia can understand synthetic speech output; however, inherent auditory comprehension challenges experienced by people with aphasia may make understanding synthetic speech difficult. This study's purpose was to compare the preferences and auditory comprehension accuracy of people with aphasia when listening to sentences generated with digitized natural speech, Alex synthetic speech (i.e., Macintosh platform), or David synthetic speech (i.e., Windows platform). The methodology required each of 20 participants with aphasia to select one of four images corresponding in meaning to each of 60 sentences comprising three stimulus sets. Results revealed significantly better accuracy given digitized natural speech than either synthetic speech option; however, individual participant performance analyses revealed three patterns: (a) comparable accuracy regardless of speech condition for 30% of participants, (b) comparable accuracy between digitized natural speech and one, but not both, synthetic speech option for 45% of participants, and (c) greater accuracy with digitized natural speech than with either synthetic speech option for remaining participants. Ranking and Likert-scale rating data revealed a preference for digitized natural speech and David synthetic speech over Alex synthetic speech. Results suggest many individuals with aphasia can comprehend synthetic speech options available on popular operating systems. Further examination of synthetic speech use to support reading comprehension through text-to-speech technology is thus warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
Morgan, Lewis B.
This article focuses on speech mannerisms often employed by clients in a helping relationship. Eight mannerisms are presented and discussed, as well as possible interpretations. Suggestions are given to help counselors respond to them. (Author)
WANG Zhiping; ZHAO Li; ZOU Cairong
A modified Parzen-window method, which keep high resolution in low frequencies and keep smoothness in high frequencies, is proposed to obtain statistical model. Then, a gender classification method utilizing the statistical model is proposed, which have a 98% accuracy of gender classification while long sentence is dealt with. By separation the male voice and female voice, the mean and standard deviation of speech training samples with different emotion are used to create the corresponding emotion models. Then the Bhattacharyya distance between the test sample and statistical models of pitch, are utilized for emotion recognition in speech.The normalization of pitch for the male voice and female voice are also considered, in order to illustrate them into a uniform space. Finally, the speech emotion recognition experiment based on K Nearest Neighbor shows that, the correct rate of 81% is achieved, where it is only 73.85%if the traditional parameters are utilized.
Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James
Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.
This study asks how speakers adjust their speech to their addressees, focusing on the potential roles of cognitive representations such as partner models, automatic processes such as interactive alignment, and social processes such as interactional negotiation. The nature of addressee orientation......, psycholinguistics and conversation analysis, and offers both overviews of child-directed, foreigner-directed and robot-directed speech and in-depth analyses of the processes involved in adjusting to a communication partner....
The article shows the differences between the speech etiquette of different peoples. The most important thing is to find a common language with this or that interlocutor. Knowledge of national etiquette, national character helps to learn the principles of speech of another nation. The article indicates in which cases certain forms of etiquette considered acceptable. At the same time, the rules of etiquette emphasized in the conduct of a dialogue in official meetings and for example, in the ex...
What happens to a person who speaks out about corruption in their organization, and finds themselves excluded from their profession? In this article, I argue that whistleblowers experience exclusions because they have engaged in ‘impossible speech’, that is, a speech act considered to be unacceptable or illegitimate. Drawing on Butler’s theories of recognition and censorship, I show how norms of acceptable speech working through recruitment practices, alongside the actions of colleagues, can ...
Rehana, Ridha; Silitonga, Sortha
One aim of this article is to show through a concrete example how speech function and speech role used in movie. The illustrative example is taken from the dialogue of Up movie. Central to the analysis proper form of dialogue on Up movie that contain of speech function and speech role; i.e. statement, offer, question, command, giving, and demanding. 269 dialogue were interpreted by actor, and it was found that the use of speech function and speech role.
Full Text Available Background: Deep brain stimulation of the subthalamic nucleus, although highly effective for the treatment of motor impairment in Parkinson´s disease, can induce speech deterioration in a subgroup of patients. The aim of the current study was to survey 1 if there are distinctive stimulation effects on the different parameters of voice and speech and 2 if there is a special pattern of preexisting speech abnormalities indicating a risk for further worsening under stimulation. Methods: N = 38 patients with Parkinson´s disease had to perform a speech test without medication with stimulation ON and OFF. Speech samples were analysed: 1 according to a four-dimensional perceptual speech score and 2 by acoustic analysis to obtain quantifiable measures of distinctive speech parameters.Results: Quality of voice was ameliorated with stimulation ON, and there were trends to increased loudness and better pitch variability. N = 8 patients featured a deterioration of speech with stimulation ON, caused by worsening of articulation or/and fluency. These patients had more severe overall speech impairment with characteristic features of articulatory slurring and articulatory acceleration already under StimOFF condition.Conclusion: The influence of subthalamic stimulation on Parkinsonian speech differs considerably between individual patients, however, there is a trend to amelioration of voice quality and prosody. Patients with stimulation-associated speech deterioration featured higher overall speech impairment and showed a distinctive pattern of articulatory abnormalities at baseline. Further investigations to confirm these preliminary findings are necessary to allow neurologists to pre-surgically estimate the individual risk of deterioration of speech under stimulation.
Rumbach, Anna F; Rose, Tanya A; Cheah, Mynn
To explore Australian speech-language pathologists' use of non-speech oral motor exercises, and rationales for using/not using non-speech oral motor exercises in clinical practice. A total of 124 speech-language pathologists practising in Australia, working with paediatric and/or adult clients with speech sound difficulties, completed an online survey. The majority of speech-language pathologists reported that they did not use non-speech oral motor exercises when working with paediatric or adult clients with speech sound difficulties. However, more than half of the speech-language pathologists working with adult clients who have dysarthria reported using non-speech oral motor exercises with this population. The most frequently reported rationale for using non-speech oral motor exercises in speech sound difficulty management was to improve awareness/placement of articulators. The majority of speech-language pathologists agreed there is no clear clinical or research evidence base to support non-speech oral motor exercise use with clients who have speech sound difficulties. This study provides an overview of Australian speech-language pathologists' reported use and perceptions of non-speech oral motor exercises' applicability and efficacy in treating paediatric and adult clients who have speech sound difficulties. The research findings provide speech-language pathologists with insight into how and why non-speech oral motor exercises are currently used, and adds to the knowledge base regarding Australian speech-language pathology practice of non-speech oral motor exercises in the treatment of speech sound difficulties. Implications for Rehabilitation Non-speech oral motor exercises refer to oral motor activities which do not involve speech, but involve the manipulation or stimulation of oral structures including the lips, tongue, jaw, and soft palate. Non-speech oral motor exercises are intended to improve the function (e.g., movement, strength) of oral structures. The
Full Text Available We describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. Speech lip movements are captured using a 3D stereo face capture system and split up into phonetic units. A dynamic parameterisation of this data is constructed which maintains the relationship between lip shapes and velocities; within this parameterisation a model of how lips move is built and is used in the animation of visual speech movements from speech audio input. The mapping from audio parameters to lip movements is disambiguated by selecting only the most similar stored phonetic units to the target utterance during synthesis. By combining properties of model-based synthesis (e.g., HMMs, neural nets with unit selection we improve the quality of our speech synthesis.
Elmahdy, Mohamed; Minker, Wolfgang
Novel Techniques for Dialectal Arabic Speech describes approaches to improve automatic speech recognition for dialectal Arabic. Since speech resources for dialectal Arabic speech recognition are very sparse, the authors describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic speech recognition, while assuming that MSA is always a second language for all Arabic speakers. In this book, Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect. ECA is the first ranked Arabic dialect in terms of number of speakers, and a high quality ECA speech corpus with accurate phonetic transcription has been collected. MSA acoustic models were trained using news broadcast speech. In order to cross-lingually use MSA in dialectal Arabic speech recognition, the authors have normalized the phoneme sets for MSA and ECA. After this normalization, they have applied state-of-the-art acoustic model adaptation techniques like Maximum Likelihood Linear Regression (MLLR) and M...
Larm, Petra; Hongisto, Valtteri
During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
Lynne E Bernstein
Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1 The visual perception of speech relies on visual pathway representations of speech qua speech. (2 A proposed site of these representations, the temporal visual speech area (TVSA has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS. (3 Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.
Wenmaekers, R.H.C.; Hout, van N.H.A.M.; Luxemburg, van L.C.J.; Hak, C.C.J.M.
The reverberation time and the background noise level are often used as the most important design parameters in European open plan offices to achieve a comfortable acoustic climate and to control speech intelligibility. Good speech intelligibility is desired for people working together, but bad
Full Text Available In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR. Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A, reference at 1 meter at distances between 11 to 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB. Our results showed that in such conditions, identity of vowels is mostly preserved, with the striking peculiarity of the absence of confusion in vowels. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. . Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending of the place in the words (onset vs. coda. Finally our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for
Meyer, Julien; Dentel, Laure; Meunier, Fanny
In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances between 11 to 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB). Our results showed that in such conditions, identity of vowels is mostly preserved, with the striking peculiarity of the absence of confusion in vowels. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. . Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending of the place in the words (onset vs. coda). Finally our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for future studies.
Vick, Jennell C.; Moore, Christopher A.
This study examined the relationship between acoustic correlates of stress in trochaic (strong-weak), spondaic (strong-strong), and iambic (weak-strong) nonword bisyllables produced by children (30-50) with normal speech acquisition and children with speech delay. Ratios comparing the acoustic measures (vowel duration, rms, and f0) of the first syllable to the second syllable were calculated to evaluate the extent to which each phonetic parameter was used to mark stress. In addition, a calculation of the variability of jaw movement in each bisyllable was made. Finally, perceptual judgments of accuracy of stress production were made. Analysis of perceptual judgments indicated a robust difference between groups: While both groups of children produced errors in imitating the contrastive lexical stress models (~40%), the children with normal speech acquisition tended to produce trochaic forms in substitution for other stress types, whereas children with speech delay showed no preference for trochees. The relationship between segmental acoustic parameters, kinematic variability, and the ratings of stress by trained listeners will be presented.
Mehta, G.; Cutler, A.
Although spontaneous speech occurs more frequently in most listeners' experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ
Molnar, B; Gergely, J; Toth, G; Pronai, L; Zagoni, T; Papik, K; Tulassay, Z
Reporting and machine control based on speech technology can enhance work efficiency in the gastrointestinal endoscopy laboratory. The status and activation of endoscopy laboratory equipment were described as a multivariate parameter and function system. Speech recognition, text evaluation and action definition engines were installed. Special programs were developed for the grammatical analysis of command sentences, and a rule-based expert system for the definition of machine answers. A speech backup engine provides feedback to the user. Techniques were applied based on the "Hidden Markov" model of discrete word, user-independent speech recognition and on phoneme-based speech synthesis. Speech samples were collected from three male low-tone investigators. The dictation module and machine control modules were incorporated in a personal computer (PC) simulation program. Altogether 100 unidentified patient records were analyzed. The sentences were grouped according to keywords, which indicate the main topics of a gastrointestinal endoscopy report. They were: "endoscope", "esophagus", "cardia", "fundus", "corpus", "antrum", "pylorus", "bulbus", and "postbulbar section", in addition to the major pathological findings: "erosion", "ulceration", and "malignancy". "Biopsy" and "diagnosis" were also included. We implemented wireless speech communication control commands for equipment including an endoscopy unit, video, monitor, printer, and PC. The recognition rate was 95%. Speech technology may soon become an integrated part of our daily routine in the endoscopy laboratory. A central speech and laboratory computer could be the most efficient alternative to having separate speech recognition units in all items of equipment.
Bartos, Anthony L; Cipr, Tomas; Nelson, Douglas J; Schwarz, Petr; Banowetz, John; Jerabek, Ladislav
A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
Ramirez, J.; Gorriz, J. M.; Segura, J. C.
This chapter has shown an overview of the main challenges in robust speech detection and a review of the state of the art and applications. VADs are frequently used in a number of applications including speech coding, speech enhancement and speech recognition. A precise VAD extracts a set of discriminative speech features from the noisy speech and formulates the decision in terms of well defined rule. The chapter has summarized three robust VAD methods that yield high speech/non-speech discri...
Xia, Youshen; Wang, Jun
This paper proposes a new recurrent neural network-based Kalman filter for speech enhancement, based on a noise-constrained least squares estimate. The parameters of speech signal modeled as autoregressive process are first estimated by using the proposed recurrent neural network and the speech signal is then recovered from Kalman filtering. The proposed recurrent neural network is globally asymptomatically stable to the noise-constrained estimate. Because the noise-constrained estimate has a robust performance against non-Gaussian noise, the proposed recurrent neural network-based speech enhancement algorithm can minimize the estimation error of Kalman filter parameters in non-Gaussian noise. Furthermore, having a low-dimensional model feature, the proposed neural network-based speech enhancement algorithm has a much faster speed than two existing recurrent neural networks-based speech enhancement algorithms. Simulation results show that the proposed recurrent neural network-based speech enhancement algorithm can produce a good performance with fast computation and noise reduction. Copyright © 2015 Elsevier Ltd. All rights reserved.
Full Text Available Speech recognition applications are known to require a significant amount of resources. However, embedded speech recognition only authorizes few KB of memory, few MIPS, and small amount of training data. In order to fit the resource constraints of embedded applications, an approach based on a semicontinuous HMM system using state-independent acoustic modelling is proposed. A transformation is computed and applied to the global model in order to obtain each HMM state-dependent probability density functions, authorizing to store only the transformation parameters. This approach is evaluated on two tasks: digit and voice-command recognition. A fast adaptation technique of acoustic models is also proposed. In order to significantly reduce computational costs, the adaptation is performed only on the global model (using related speaker recognition adaptation techniques with no need for state-dependent data. The whole approach results in a relative gain of more than 20% compared to a basic HMM-based system fitting the constraints.
Iuzzini-Seigel, Jenya; Hogan, Tiffany P.; Green, Jordan R.
Purpose: The current research sought to determine (a) if speech inconsistency is a core feature of childhood apraxia of speech (CAS) or if it is driven by comorbid language impairment that affects a large subset of children with CAS and (b) if speech inconsistency is a sensitive and specific diagnostic marker that can differentiate between CAS and…
Full Text Available The paper deals with the methodology for speech quality measuring in GSM networks using Perceptual Evaluation of Speech Quality (PESQ. The paper brings results of practical measurement of own GSM network build on the Universal Software Radio Peripheral (USRP N210 hardware and OpenBTS software. This OpenBTS station was installed in open terrain, and the speech quality was measured from different distances from the transmitter. The limit parameters of OpenBTS station with USRP N210 were obtained.
Luciana Lemos de Azevedo
Full Text Available Objective Parkinsonian patients usually present speech impairment. The aim of this study was to verify the influence of levodopa and of the adapted Lee Silverman Vocal Treatment® method on prosodic parameters employed by parkinsonian patients. Method Ten patients with idiopathic Parkinson's disease using levodopa underwent recording of utterances produced in four stages: expressing attitudes of certainty and doubt and declarative and interrogative modalities. The sentences were recorded under the effect of levodopa (on, without the effect of levodopa (off; before and after speech therapy during the on and off periods. Results The speech therapy and its association with drug treatment promoted the improvement of prosodic parameters: increase of fundamental frequency measures, reduction of measures of duration and greater intensity. Conclusion The association of speech therapy to medication treatment is of great value in improving the communication of parkinsonian patients.
Jensen, Jesper Rindom; Benesty, Jacob; Christensen, Mads Græsbøll
In this work, we consider enhancement of multichannel speech recordings. Linear filtering and subspace approaches have been considered previously for solving the problem. The current linear filtering methods, although many variants exist, have limited control of noise reduction and speech...
Represented speech refers to speech where we reference somebody. Represented speech is an important phenomenon in everyday conversation, health care communication, and qualitative research. This case will draw first from a case study on physicians’ workplace learning and second from a case study...... on nurses’ apprenticeship learning. The aim of the case is to guide the qualitative researcher to use own and others’ voices in the interview and to be sensitive to represented speech in everyday conversation. Moreover, reported speech matters to health professionals who aim to represent the voice...... of their patients. Qualitative researchers and students might learn to encourage interviewees to elaborate different voices or perspectives. Qualitative researchers working with natural speech might pay attention to how people talk and use represented speech. Finally, represented speech might be relevant...
... here Home » Health Info » Statistics and Epidemiology Quick Statistics About Voice, Speech, Language Voice, Speech, Language, and ... no 205. Hyattsville, MD: National Center for Health Statistics. 2015. Hoffman HJ, Li C-M, Losonczy K, ...
Alim Sabur Ajibola
Full Text Available Stuttered speech is a dysfluency rich speech, more prevalent in males than females. It has been associated with insufficient air pressure or poor articulation, even though the root causes are more complex. The primary features include prolonged speech and repetitive speech, while some of its secondary features include, anxiety, fear, and shame. This study used LPC analysis and synthesis algorithms to reconstruct the stuttered speech. The results were evaluated using cepstral distance, Itakura-Saito distance, mean square error, and likelihood ratio. These measures implied perfect speech reconstruction quality. ASR was used for further testing, and the results showed that all the reconstructed speech samples were perfectly recognized while only three samples of the original speech were perfectly recognized.
Spiel, G; Brunner, E; Allmayer, B; Pletz, A
Speech disabilities (articulation deficits) and language disorders--expressive (vocabulary) receptive (language comprehension) are not uncommon in children. An overview of these along with a global description of the impairment of communication as well as clinical characteristics of language developmental disorders are presented in this article. The diagnostic tables, which are applied in the European and Anglo-American speech areas, ICD-10 and DSM-IV, have been explained and compared. Because of their strengths and weaknesses an alternative classification of language and speech developmental disorders is proposed, which allows a differentiation between expressive and receptive language capabilities with regard to the semantic and the morphological/syntax domains. Prevalence and comorbidity rates, psychosocial influences, biological factors and the biological social interaction have been discussed. The necessity of the use of standardized examinations is emphasised. General logopaedic treatment paradigms, specific therapy concepts and an overview of prognosis have been described.
Poole, Matthew L.; Brodtmann, Amy; Darby, David; Vogel, Adam P.
Purpose: Our purpose was to create a comprehensive review of speech impairment in frontotemporal dementia (FTD), primary progressive aphasia (PPA), and progressive apraxia of speech in order to identify the most effective measures for diagnosis and monitoring, and to elucidate associations between speech and neuroimaging. Method: Speech and…
Drijvers, L.; Özyürek, A.
Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech
Viswanathan, Navin; Kokkinakis, Kostas; Williams, Brittany T.
Purpose: The purpose of this study was to evaluate whether listeners with normal hearing perceiving noise-vocoded speech-in-speech demonstrate better intelligibility of target speech when the background speech was mismatched in language (linguistic release from masking [LRM]) and/or location (spatial release from masking [SRM]) relative to the…
Lee, Jimin; Hustad, Katherine C.; Weismer, Gary
Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…
Drijvers, Linda; Ozyurek, Asli
Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Method:…
An experimental Dutch keyboard-to-speech system has been developed to explor the possibilities and limitations of Dutch speech synthesis in a communication aid for the speech impaired. The system uses diphones and a formant synthesizer chip for speech synthesis. Input to the system is in
Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike
Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in "healthy" communication direct speech constructions contribute to the liveliness, and indirectly to the comprehensibility, of speech.…
Zuk, Jennifer; Iuzzini-Seigel, Jenya; Cabbage, Kathryn; Green, Jordan R.; Hogan, Tiffany P.
Purpose: Childhood apraxia of speech (CAS) is hypothesized to arise from deficits in speech motor planning and programming, but the influence of abnormal speech perception in CAS on these processes is debated. This study examined speech perception abilities among children with CAS with and without language impairment compared to those with…
Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M J; Poletto, Christopher J; Ludlow, Christy L
The issue of whether speech is supported by the same neural substrates as non-speech vocal tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial activation overlap between speech and non-speech function in regions. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings posit a more general role of the previously proposed "auditory dorsal stream" in the left hemisphere--to support the production of vocal tract gestures that are not limited to speech processing.
Hurkmans, Josephus Johannes Stephanus
Apraxia of Speech (AoS) is a neurogenic speech disorder. A wide variety of behavioural methods have been developed to treat AoS. Various therapy programmes use musical elements to improve speech production. A unique therapy programme combining elements of speech therapy and music therapy is called
Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.
Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech…
T. E. Braudo
Full Text Available The purpose of this article is to acquaint the specialists, working with children having developmental disorders, with age-related norms for speech development. Many well-known linguists and psychologists studied speech ontogenesis (logogenesis. Speech is a higher mental function, which integrates many functional systems. Speech development in infants during the first months after birth is ensured by the innate hearing and emerging ability to fix the gaze on the face of an adult. Innate emotional reactions are also being developed during this period, turning into nonverbal forms of communication. At about 6 months a baby starts to pronounce some syllables; at 7–9 months – repeats various sounds combinations, pronounced by adults. At 10–11 months a baby begins to react on the words, referred to him/her. The first words usually appear at an age of 1 year; this is the start of the stage of active speech development. At this time it is acceptable, if a child confuses or rearranges sounds, distorts or misses them. By the age of 1.5 years a child begins to understand abstract explanations of adults. Significant vocabulary enlargement occurs between 2 and 3 years; grammatical structures of the language are being formed during this period (a child starts to use phrases and sentences. Preschool age (3–7 y. o. is characterized by incorrect, but steadily improving pronunciation of sounds and phonemic perception. The vocabulary increases; abstract speech and retelling are being formed. Children over 7 y. o. continue to improve grammar, writing and reading skills. The described stages may not have strict age boundaries, as soon as they are dependent not only on environment, but also on the child’s mental constitution, heredity and character.
Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M.J.; Poletto, Christopher J.; Ludlow, Christy L.
The issue of whether speech is supported by the same neural substrates as non-speech vocal-tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, were compared to the production of speech sylla...
Agüero, P D; Tulli, J C; Moscardi, G; Gonzalez, E L; Uriz, A J
Acoustical analysis of speech using computers has reached an important development in the latest years. The subjective evaluation of a clinician is complemented with an objective measure of relevant parameters of voice. Praat, MDVP (Multi Dimensional Voice Program) and SAV (Software for Voice Analysis) are some examples of software for speech analysis. This paper describes an approach to estimate the subjective characteristics of RASATI scale given objective acoustical parameters. Two approaches were used: linear regression with non-negativity constraints, and neural networks. The experiments show that such approach gives correct evaluations with ±1 error in 80% of the cases.
Full Text Available Dereverberation is required in various speech processing applications such as handsfree telephony and voice-controlled systems, especially when signals are applied that are recorded in a moderately or highly reverberant environment. In this paper, we compare a number of classical and more recently developed multimicrophone dereverberation algorithms, and validate the different algorithmic settings by means of two performance indices and a speech recognition system. It is found that some of the classical solutions obtain a moderate signal enhancement. More advanced subspace-based dereverberation techniques, on the other hand, fail to enhance the signals despite their high-computational load.
The E-model brings a modern approach to the computation of estimated quality, allowing for easy implementation. One of its advantages is that it can be applied in real time. The method is based on a mathematical computation model evaluating transmission path impairments influencing speech signal, especially delays and packet losses. These parameters, common in an IP network, can affect speech quality dramatically. The paper deals with a proposal for a simplified E-model and its pr...
Deep brain stimulation (DBS) has been reported to be successful in relieving the core motor symptoms of Parkinson's disease (PD) and motor fluctuations in the more advanced stages of the disease. However, data on the effects of DBS on speech performance are inconsistent. While there are some series of patients documenting that speech function was relatively unaffected by DBS of the nucleus subthalamicus (STN), other investigators reported on improvements of distinct parameters of oral control...
Holzrichter, J.F.; Ng, L.C.
The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs
Holzrichter, John F.; Ng, Lawrence C.
The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.
Fontan, Lionel; Tardieu, Julien; Gaillard, Pascal; Woisard, Virginie; Ruiz, Robert
Purpose: The authors investigated the relationship between the intelligibility and comprehension of speech presented in babble noise. Method: Forty participants listened to French imperative sentences (commands for moving objects) in a multitalker babble background for which intensity was experimentally controlled. Participants were instructed to…
Spectral integration (or formant averaging) was proposed in vowel perception research to account for the observation that a reduction of the intensity of one of two closely spaced formants (as in /u/) produced a predictable shift in vowel quality [Delattre et al., Word 8, 195-210 (1952)]. A related observation was reported in psychoacoustics, indicating that when the components of a two-tone periodic complex differ in amplitude and frequency, its perceived pitch is shifted toward that of the more intense tone [Helmholtz, App. XIV (1875/1948)]. Subsequent research in both fields focused on the frequency interval that separates these two spectral components, in an attempt to determine the size of the bandwidth for spectral integration to occur. This talk will review the accumulated evidence for and against spectral integration within the hypothesized limit of 3.5 Bark for static and dynamic signals in speech perception and psychoacoustics. Based on similarities in the processing of speech and non-speech sounds, it is suggested that spectral integration may reflect a general property of the auditory system. A larger frequency bandwidth, possibly close to 3.5 Bark, may be utilized in integrating acoustic information, including speech, complex signals, or sound quality of a violin.
Standard articulation tests are not always sensitive enough to discriminate between speech samples which are of high intelligibility. One can increase the sensitivity of such tests by presenting the test materials in noise. In this way, small differences in intelligibility can be magnified into
Larson, Martha; Ordelman, Roeland J.F.; de Jong, Franciska M.G.; Kohler, Joachim; Kraaij, Wessel
After two successful years at SIGIR in 2007 and 2008, the third workshop on Searching Spontaneous Conversational Speech (SSCS 2009) was held conjunction with the ACM Multimedia 2009. The goal of the SSCS series is to serve as a forum that brings together the disciplines that collaborate on spoken
from movements of certain organs with his (man‟s) throat and mouth…. By means ... In other words, government engages language; and how this affects the ... address the audience in a social gathering in order to have a new dawn. ..... Agbedo, C. U. Speech Act Analysis of Political discourse in the Nigerian Print Media in.
Nijland, Lian; Terband, Hayo; Maassen, Ben
Purpose: Childhood apraxia of speech (CAS) is diagnosed on the basis of specific speech characteristics, in the absence of problems in hearing, intelligence, and language comprehension. This does not preclude the possibility that children with this speech disorder might demonstrate additional problems. Method: Cognitive functions were investigated…
Keetels, M.N.; Schakel, L.; de Bonte, M.; Vroomen, J.
Listeners adjust their phonetic categories to cope with variations in the speech signal (phonetic recalibration). Previous studies have shown that lipread speech (and word knowledge) can adjust the perception of ambiguous speech and can induce phonetic adjustments (Bertelson, Vroomen, & de Gelder in
Hogan, J. Michael; Kurr, Jeffrey A.; Johnson, Jeremy D.; Bergmaier, Michael J.
In light of the U.S. Senate's designation of March 15, 2016 as "National Speech and Debate Education Day" (S. Res. 398, 2016), it only seems fitting that "Communication Education" devote a special section to the role of speech and debate in civic education. Speech and debate have been at the heart of the communication…
The experimental addition of speech output to computer-based Esperanto lessons using speech synthesized from text is described. Because of Esperanto's phonetic spelling and simple rhythm, it is particularly easy to describe the mechanisms of Esperanto synthesis. Attention is directed to how the text-to-speech conversion is performed and the ways…
on speech production characteristics, but also helps in accurate analysis of speech. .... include time delay estimation, speech enhancement from single and multi- ...... log. (. E[k]. ∑K−1 l=0. E[l]. ) ,. (7) where K is the number of samples in the ...
Minifie, Fred. D., Ed.; And Others
This book is written as a guide to the understanding of the processes involved in human speech communication. Ten authorities contributed material to provide an introduction to the physiological aspects of speech production and reception, the acoustical aspects of speech production and transmission, the psychophysics of sound reception, the nature…
Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta
Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…
The aim of this article is to describe and classify a number of different forms of English reported speech (or thought), and subsequently to analyze and represent them within the theory of FDG. First, the most prototypical forms of reported speech are discussed (direct and indirect speech);
Nijland, L.; Terband, H.; Maassen, B.
Purpose: Childhood Apraxia of Speech (CAS) is diagnosed on the basis of specific speech characteristics, in the absence of problems in hearing, intelligence, and language comprehension. This does not preclude the possibility that children with this speech disorder might demonstrate additional
Maussen, M.; Grillo, R.
This book focuses on the way in which public debate and legal practice intersect when it comes to the value of free speech and the need to regulate "offensive", "blasphemous" or "hate" speech, especially, though not exclusively where such speech is thought to be offensive to members of ethnic and
Carney, John J., Jr.
The exercise of freedom of speech within our nation has deteriorated. A practical value in teaching free speech is the possibility of restoring a commitment to its principles by educators. What must be taught is why freedom of speech is important, why it has been compromised, and the extent to which it has been compromised. Every technological…
Williams, A. Lynn, Ed.; McLeod, Sharynne, Ed.; McCauley, Rebecca J., Ed.
With detailed discussion and invaluable video footage of 23 treatment interventions for speech sound disorders (SSDs) in children, this textbook and DVD set should be part of every speech-language pathologist's professional preparation. Focusing on children with functional or motor-based speech disorders from early childhood through the early…
Farouk, Mohamed Hesham
This book provides a survey on wide-spread of employing wavelets analysis in different applications of speech processing. The author examines development and research in different application of speech processing. The book also summarizes the state of the art research on wavelet in speech processing.
KARLIN, ISAAC W.; AND OTHERS
THE GROWTH, DEVELOPMENT, AND ABNORMALITIES OF SPEECH IN CHILDHOOD ARE DESCRIBED IN THIS TEXT DESIGNED FOR PEDIATRICIANS, PSYCHOLOGISTS, EDUCATORS, MEDICAL STUDENTS, THERAPISTS, PATHOLOGISTS, AND PARENTS. THE NORMAL DEVELOPMENT OF SPEECH AND LANGUAGE IS DISCUSSED, INCLUDING THEORIES ON THE ORIGIN OF SPEECH IN MAN AND FACTORS INFLUENCING THE NORMAL…
The effectiveness of in-vehicle speech communication can be a good indicator of the perception of the overall vehicle quality and customer satisfaction. Currently available speech intelligibility metrics do not account in their procedures for essential parameters needed for a complete and accurate evaluation of in-vehicle speech intelligibility. These include the directivity and the distance of the talker with respect to the listener, binaural listening, hearing profile of the listener, vocal effort, and multisensory hearing. In the first part of this research the effectiveness of in-vehicle application of these metrics is investigated in a series of studies to reveal their shortcomings, including a wide range of scores resulting from each of the metrics for a given measurement configuration and vehicle operating condition. In addition, the nature of a possible correlation between the scores obtained from each metric is unknown. The metrics and the subjective perception of speech intelligibility using, for example, the same speech material have not been compared in literature. As a result, in the second part of this research, an alternative method for speech intelligibility evaluation is proposed for use in the automotive industry by utilizing a virtual reality driving environment for ultimately setting targets, including the associated statistical variability, for future in-vehicle speech intelligibility evaluation. The Speech Intelligibility Index (SII) was evaluated at the sentence Speech Receptions Threshold (sSRT) for various listening situations and hearing profiles using acoustic perception jury testing and a variety of talker and listener configurations and background noise. In addition, the effect of individual sources and transfer paths of sound in an operating vehicle to the vehicle interior sound, specifically their effect on speech intelligibility was quantified, in the framework of the newly developed speech intelligibility evaluation method. Lastly
Melo, Roberta Michelon; Mota, Helena Bolli; Berti, Larissa Cristina
This study used acoustic and articulatory analyses to characterize the contrast between alveolar and velar stops with typical speech data, comparing the parameters (acoustic and articulatory) of adults and children with typical speech development. The sample consisted of 20 adults and 15 children with typical speech development. The analyzed corpus was organized through five repetitions of each target-word (/'kap ə/, /'tapə/, /'galo/ e /'daɾə/). These words were inserted into a carrier phrase and the participant was asked to name them spontaneously. Simultaneous audio and video data were recorded (tongue ultrasound images). The data was submitted to acoustic analyses (voice onset time; spectral peak and burst spectral moments; vowel/consonant transition and relative duration measures) and articulatory analyses (proportion of significant axes of the anterior and posterior tongue regions and description of tongue curves). Acoustic and articulatory parameters were effective to indicate the contrast between alveolar and velar stops, mainly in the adult group. Both speech analyses showed statistically significant differences between the two groups. The acoustic and articulatory parameters provided signals to characterize the phonic contrast of speech. One of the main findings in the comparison between adult and child speech was evidence of articulatory refinement/maturation even after the period of segment acquisition.
Frank H Guenther
Full Text Available Brain-machine interfaces (BMIs involving electrodes implanted into the human cerebral cortex have recently been developed in an attempt to restore function to profoundly paralyzed individuals. Current BMIs for restoring communication can provide important capabilities via a typing process, but unfortunately they are only capable of slow communication rates. In the current study we use a novel approach to speech restoration in which we decode continuous auditory parameters for a real-time speech synthesizer from neuronal activity in motor cortex during attempted speech.Neural signals recorded by a Neurotrophic Electrode implanted in a speech-related region of the left precentral gyrus of a human volunteer suffering from locked-in syndrome, characterized by near-total paralysis with spared cognition, were transmitted wirelessly across the scalp and used to drive a speech synthesizer. A Kalman filter-based decoder translated the neural signals generated during attempted speech into continuous parameters for controlling a synthesizer that provided immediate (within 50 ms auditory feedback of the decoded sound. Accuracy of the volunteer's vowel productions with the synthesizer improved quickly with practice, with a 25% improvement in average hit rate (from 45% to 70% and 46% decrease in average endpoint error from the first to the last block of a three-vowel task.Our results support the feasibility of neural prostheses that may have the potential to provide near-conversational synthetic speech output for individuals with severely impaired speech motor control. They also provide an initial glimpse into the functional properties of neurons in speech motor cortical areas.
Holland, George E.; Struve, Walter S.; Homer, John F.
A speech analyzer for interpretation of sound includes a sound input which converts the sound into a signal representing the sound. The signal is passed through a plurality of frequency pass filters to derive a plurality of frequency formants. These formants are converted to voltage signals by frequency-to-voltage converters and then are prepared for visual display in continuous real time. Parameters from the inputted sound are also derived and displayed. The display may then be interpreted by the user. The preferred embodiment includes a microprocessor which is interfaced with a television set for displaying of the sound formants. The microprocessor software enables the sound analyzer to present a variety of display modes for interpretive and therapeutic used by the user.
Pontoppidan, Niels Henrik; Dyrholm, Mads
a Factorial Hidden Markov Model, with non-stationary assumptions on the source autocorrelations modelled through the Factorial Hidden Markov Model, leads to separation in the monaural case. By extending Hansens work we find that Roweis' assumptions are necessary for monaural speech separation. Furthermore we...
... for stuttering to change over time or for emotions and attitudes about your speech to change as you have new experiences. It is important for you to have a clear idea about your motivation for going to therapy because your reasons for ...
Dunin-Kȩplicz, Barbara; Strachocka, Alina; Szałas, Andrzej; Verbrugge, Rineke
This paper discusses an implementation of four speech acts: assert, concede, request and challenge in a paraconsistent framework. A natural four-valued model of interaction yields multiple new cognitive situations. They are analyzed in the context of communicative relations, which partially replace
Bradley, Bert E.
Argues for the continuation of liberal education over career-oriented programs. Defines liberal education as one that develops abilities that transcend occupational concerns, and that enables individuals to cope with shifts in values, vocations, careers, and the environment. Argues that speech communication makes a significant contribution to…
Mar 4, 2014 ... In reflecting on possible responses to this ... Through the actions of a prophet, as Philip Wogamen (1998:4) reasons, people are supposed to have a ... The main argument in this article is that the person called to prophetic speech needs to become ..... were like dumb bricks and blocks to be forcefully moved.
White, Keith S.
Continuous speech recognition (SR) is an emerging technology that allows direct digital transcription of dictated radiology reports. The SR systems are being widely deployed in the radiology community. This is a review of technical and practical issues that should be considered when implementing an SR system. (orig.)
Bryant, Gregory A.
Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…
Verbal communication does not rely only on the simple perception of auditory signals. It is rather a parallel and integrative processing of linguistic and non-linguistic information, involving temporal and frontal areas in particular. This review describes the inherent complexity of auditory speech comprehension from a functional-neuroanatomical perspective. The review is divided into two parts. In the first part, structural and functional asymmetry of language relevant structures will be discus. The second part of the review will discuss recent neuroimaging studies, which coherently demonstrate that speech comprehension processes rely on a hierarchical network involving the temporal, parietal, and frontal lobes. Further, the results support the dual-stream model for speech comprehension, with a dorsal stream for auditory-motor integration, and a ventral stream for extracting meaning but also the processing of sentences and narratives. Specific patterns of functional asymmetry between the left and right hemisphere can also be demonstrated. The review article concludes with a discussion on interactions between the dorsal and ventral streams, particularly the involvement of motor related areas in speech perception processes, and outlines some remaining unresolved issues. This article is part of a Special Issue entitled Human Auditory Neuroimaging. Copyright © 2013 Elsevier B.V. All rights reserved.
Mar 4, 2014 ... It is expected that people will be drawn into the reality of God by authentic prophetic speech, .... strands of the DNA molecule show themselves to be arranged ... explains, chemical patterns act like the letters of a code, .... viewing the self-reflection regarding the ministry of renewal from the .... Irresistible force.
Full Text Available Speech is a means of communication which is intrinsically bimodal: the audio signal originates from the dynamics of the articulators. This paper reviews recent works in the field of audiovisual speech, and more specifically techniques developed to measure the level of correspondence between audio and visual speech. It overviews the most common audio and visual speech front-end processing, transformations performed on audio, visual, or joint audiovisual feature spaces, and the actual measure of correspondence between audio and visual speech. Finally, the use of synchrony measure for biometric identity verification based on talking faces is experimented on the BANCA database.
Massaro, Dominic W; Chen, Trevor H
Galantucci, Fowler, and Turvey (2006) have claimed that perceiving speech is perceiving gestures and that the motor system is recruited for perceiving speech. We make the counter argument that perceiving speech is not perceiving gestures, that the motor system is not recruitedfor perceiving speech, and that speech perception can be adequately described by a prototypical pattern recognition model, the fuzzy logical model of perception (FLMP). Empirical evidence taken as support for gesture and motor theory is reconsidered in more detail and in the framework of the FLMR Additional theoretical and logical arguments are made to challenge gesture and motor theory.
Full Text Available In this paper a method for speech quality estimation is evaluated by simulating the transfer of speech over packet switched and mobile networks. The proposed system uses Dynamic Time Warping algorithm for test and received speech comparison. Several tests have been made on a test speech sample of a single speaker with simulated packet (frame loss effects on the perceived speech. The achieved results have been compared with measured PESQ values on the used transmission channel and their correlation has been observed.
Full Text Available The performance of objective speech and audio quality measures for the prediction of the perceived quality of frequency-compressed speech in hearing aids is investigated in this paper. A number of existing quality measures have been applied to speech signals processed by a hearing aid, which compresses speech spectra along frequency in order to make information contained in higher frequencies audible for listeners with severe high-frequency hearing loss. Quality measures were compared with subjective ratings obtained from normal hearing and hearing impaired children and adults in an earlier study. High correlations were achieved with quality measures computed by quality models that are based on the auditory model of Dau et al., namely, the measure PSM, computed by the quality model PEMO-Q; the measure qc, computed by the quality model proposed by Hansen and Kollmeier; and the linear subcomponent of the HASQI. For the prediction of quality ratings by hearing impaired listeners, extensions of some models incorporating hearing loss were implemented and shown to achieve improved prediction accuracy. Results indicate that these objective quality measures can potentially serve as tools for assisting in initial setting of frequency compression parameters.
Chapman, Kathy L; Baylis, Adriane; Trost-Cardamone, Judith; Cordero, Kelly Nett; Dixon, Angela; Dobbelsteyn, Cindy; Thurmes, Anna; Wilson, Kristina; Harding-Bell, Anne; Sweeney, Triona; Stoddard, Gregory; Sell, Debbie
To describe the results of two reliability studies and to assess the effect of training on interrater reliability scores. The first study (1) examined interrater and intrarater reliability scores (weighted and unweighted kappas) and (2) compared interrater reliability scores before and after training on the use of the Cleft Audit Protocol for Speech-Augmented (CAPS-A) with British English-speaking children. The second study examined interrater and intrarater reliability on a modified version of the CAPS-A (CAPS-A Americleft Modification) with American and Canadian English-speaking children. Finally, comparisons were made between the interrater and intrarater reliability scores obtained for Study 1 and Study 2. The participants were speech-language pathologists from the Americleft Speech Project. In Study 1, interrater reliability scores improved for 6 of the 13 parameters following training on the CAPS-A protocol. Comparison of the reliability results for the two studies indicated lower scores for Study 2 compared with Study 1. However, this appeared to be an artifact of the kappa statistic that occurred due to insufficient variability in the reliability samples for Study 2. When percent agreement scores were also calculated, the ratings appeared similar across Study 1 and Study 2. The findings of this study suggested that improvements in interrater reliability could be obtained following a program of systematic training. However, improvements were not uniform across all parameters. Acceptable levels of reliability were achieved for those parameters most important for evaluation of velopharyngeal function.
Full Text Available This paper applied speech recognition and RFID technologies to develop an omni-directional mobile robot into a robot with voice control and guide introduction functions. For speech recognition, the speech signals were captured by short-time processing. The speaker first recorded the isolated words for the robot to create speech database of specific speakers. After the speech pre-processing of this speech database, the feature parameters of cepstrum and delta-cepstrum were obtained using linear predictive coefficient (LPC. Then, the Hidden Markov Model (HMM was used for model training of the speech database, and the Viterbi algorithm was used to find an optimal state sequence as the reference sample for speech recognition. The trained reference model was put into the industrial computer on the robot platform, and the user entered the isolated words to be tested. After processing by the same reference model and comparing with previous reference model, the path of the maximum total probability in various models found using the Viterbi algorithm in the recognition was the recognition result. Finally, the speech recognition and RFID systems were achieved in an actual environment to prove its feasibility and stability, and implemented into the omni-directional mobile robot.
Gao, Pei-pei; Liu, Feng
With the development of information technology and artificial intelligence, speech synthesis plays a significant role in the fields of Human-Computer Interaction Techniques. However, the main problem of current speech synthesis techniques is lacking of naturalness and expressiveness so that it is not yet close to the standard of natural language. Another problem is that the human-computer interaction based on the speech synthesis is too monotonous to realize mechanism of user subjective drive. This thesis introduces the historical development of speech synthesis and summarizes the general process of this technique. It is pointed out that prosody generation module is an important part in the process of speech synthesis. On the basis of further research, using eye activity rules when reading to control and drive prosody generation was introduced as a new human-computer interaction method to enrich the synthetic form. In this article, the present situation of speech synthesis technology is reviewed in detail. Based on the premise of eye gaze data extraction, using eye movement signal in real-time driving, a speech synthesis method which can express the real speech rhythm of the speaker is proposed. That is, when reader is watching corpora with its eyes in silent reading, capture the reading information such as the eye gaze duration per prosodic unit, and establish a hierarchical prosodic pattern of duration model to determine the duration parameters of synthesized speech. At last, after the analysis, the feasibility of the above method is verified.
Greene, Beth G; Logan, John S; Pisoni, David B
We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.
GREENE, BETH G.; LOGAN, JOHN S.; PISONI, DAVID B.
We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered. PMID:23225916
Huh, Young Eun; Park, Jongkyu; Suh, Mee Kyung; Lee, Sang Eun; Kim, Jumin; Jeong, Yuri; Kim, Hee-Tae; Cho, Jin Whan
In Parkinson variant of multiple system atrophy (MSA-P), patterns of early speech impairment and their distinguishing features from Parkinson's disease (PD) require further exploration. Here, we compared speech data among patients with early-stage MSA-P, PD, and healthy subjects using quantitative acoustic and perceptual analyses. Variables were analyzed for men and women in view of gender-specific features of speech. Acoustic analysis revealed that male patients with MSA-P exhibited more profound speech abnormalities than those with PD, regarding increased voice pitch, prolonged pause time, and reduced speech rate. This might be due to widespread pathology of MSA-P in nigrostriatal or extra-striatal structures related to speech production. Although several perceptual measures were mildly impaired in MSA-P and PD patients, none of these parameters showed a significant difference between patient groups. Detailed speech analysis using acoustic measures may help distinguish between MSA-P and PD early in the disease process. Copyright © 2015 Elsevier Inc. All rights reserved.
Rullo, R; Di Maggio, D; Addabbo, F; Rullo, F; Festa, V M; Perillo, L
In this study, resonance and articulation disorders were examined in a group of patients surgically treated for cleft lip and palate, considering family social background, and children's ability of self monitoring their speech output while speaking. Fifty children (32 males and 18 females) mean age 6.5 ± 1.6 years, affected by non-syndromic complete unilateral cleft of the lip and palate underwent the same surgical protocol. The speech level was evaluated using the Accordi's speech assessment protocol that focuses on intelligibility, nasality, nasal air escape, pharyngeal friction, and glottal stop. Pearson product-moment correlation analysis was used to detect significant associations between analysed parameters. A total of 16% (8 children) of the sample had severe to moderate degree of nasality and nasal air escape, presence of pharyngeal friction and glottal stop, which obviously compromise speech intelligibility. Ten children (10%) showed a barely acceptable phonological outcome: nasality and nasal air escape were mild to moderate, but the intelligibility remained poor. Thirty-two children (64%) had normal speech. Statistical analysis revealed a significant correlation between the severity of nasal resonance and nasal air escape (p ≤ 0.05). No statistical significant correlation was found between the final intelligibility and the patient social background, neither between the final intelligibility nor the age of the patients. The differences in speech outcome could be explained with a specific, subjective, and inborn ability, different for each child, in self-monitoring their speech output.
Petar S. Aleksic
Full Text Available We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. The principal component analysis (PCA was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR experiments. Both single-stream and multistream hidden Markov models (HMMs were used to model the ASR system, integrate audio and visual information, and perform a relatively large vocabulary (approximately 1000 words speech recognition experiments. The experiments performed use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER by 20% to 23% relatively to audio-only speech recognition WERs, at various SNRs (0Ã¢Â€Â“30 dB with additive white Gaussian noise, and by 19% relatively to audio-only speech recognition WER under clean audio conditions.
Wu, Liang; Wan, Congying; Wang, Supin; Wan, Mingxi
Electrolarynx (EL) is a medical speech-recovery device designed for patients who have lost their original voice box due to laryngeal cancer. As a substitute for human larynx, the current commercial EL voice source cannot reconstruct natural EL speech under laryngectomy conditions. To eliminate the abnormal acoustic properties of EL speech, a supraglottal voice source with compensation of vocal tract characteristics was proposed and provided through an experimental EL(SGVS-EL) system. The acoustic analyses of simulated EL speech and reconstructed EL speech produced with different voice sources were performed in the normal subject and laryngectomee. The results indicated that the supraglottal voice source was successful in improving the acoustic properties of EL speech by enhancing low- frequency energy, correcting the shifted formants to normal range, and eliminating the visible spectral zeros. Both normal subject and laryngectomee also produced more natural vowels using SGVS-EL than commercial EL, even if the vocal tract parameter was substituted and the supraglottal voice source was biased to a certain degree. Therefore, supraglottal voice source is a feasible and effective approach to improving the acoustic quality of EL speech.
Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris
Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443
Nørholm, Sidsel Marie
This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based...... on this model. The basic model used in this thesis is the harmonic model which is a commonly used model for describing the voiced part of the speech signal. We show that it can be beneficial to extend the model to take inharmonicities or the non-stationarity of speech into account. Extending the model...
Agus, Trevor R; Akeroyd, Michael A; Noble, William; Bhullar, Navjot
Many of the items in the "Speech, Spatial, and Qualities of Hearing" scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.
Full Text Available This present study investigated the link between speech-in-speech perception capacities and four executive function components: response suppression, inhibitory control, switching and working memory. We constructed a cross-modal semantic priming paradigm using a written target word and a spoken prime word, implemented in one of two concurrent auditory sentences (cocktail party situation. The prime and target were semantically related or unrelated. Participants had to perform a lexical decision task on visual target words and simultaneously listen to only one of two pronounced sentences. The attention of the participant was manipulated: The prime was in the pronounced sentence listened to by the participant or in the ignored one. In addition, we evaluate the executive function abilities of participants (switching cost, inhibitory-control cost and response-suppression cost and their working memory span. Correlation analyses were performed between the executive and priming measurements. Our results showed a significant interaction effect between attention and semantic priming. We observed a significant priming effect in the attended but not in the ignored condition. Only priming effects obtained in the ignored condition were significantly correlated with some of the executive measurements. However, no correlation between priming effects and working memory capacity was found. Overall, these results confirm, first, the role of attention for semantic priming effect and, second, the implication of executive functions in speech-in-noise understanding capacities.
Rose, Richard C.; Barnwell, Thomas P., III
The performance of a broad class of analysis-by-synthesis linear predictive speech coders is quantified experimentally. The class of coders includes a number of well-known techniques as well as a very large number of speech coders which have not been named or studied. A general formulation for deriving the parametric representation used in all of the coders in the class is presented. A new coder, named the self-excited vocoder, is discussed because of its good performance with low complexity, and because of the insight this coder gives to analysis-by-synthesis coders in general. The results of a study comparing the performances of different members of this class are presented. The study takes the form of a series of formal subjective and objective speech quality tests performed on selected coders. The results of this study lead to some interesting and important observations concerning the controlling parameters for analysis-by-synthesis speech coders.
Partila, Pavol; Voznak, Miroslav; Tovarek, Jaromir
The impact of the classification method and features selection for the speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is wide usability in nowadays automatic voice controlled systems. Berlin database of emotional recordings was used in this experiment. Classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture model is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system due to its accuracy and efficiency.
Full Text Available The paper makes a quantitative analysis and comparison on the continuous speech emotion of Lhasa Tibetan in the four basic emotional patterns (happy, surprise, sad, neutral pitch, energy and time length by experimental phonetics and the linear statistical research methods, found that there is a positive correlation between the Lhasa Tibetan emotional speech and pitch, energy and duration, etc. And the pitch, energy and duration of negative emotion acoustic parameters are bigger than positive emotion, on this basis, drawing the Lhasa Tibetan speech emotion acoustic feature patterns. Compared with the Chinese language and the Tibetan, even though both have the tone prosodic features, they also have significant differences in the acoustic characteristics of the speech emotion.
Mowlaee, Pejman; Saeidi, Rahim; Christensen, Mads Græsbøll
) accuracy, here, we report the objective and subjective results as well. The results show that the proposed system performs as well as the best of the state-of-the-art in terms of perceived quality while its performance in terms of speaker identification and automatic speech recognition results......In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification a single-channel speaker identification algorithm is proposed which provides an estimate of signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose...... a sinusoidal model-based algorithm. The speech separation algorithm consists of a double-talk/single-talk detector followed by a minimum mean square error estimator of sinusoidal parameters for finding optimal codevectors from pre-trained speaker codebooks. In evaluating the proposed system, we start from...
Full Text Available The impact of the classification method and features selection for the speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is wide usability in nowadays automatic voice controlled systems. Berlin database of emotional recordings was used in this experiment. Classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture model is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system due to its accuracy and efficiency.
Carbonell, Kathy M.
One of the lasting concerns in audiology is the unexplained individual differences in speech perception performance even for individuals with similar audiograms. One proposal is that there are cognitive/perceptual individual differences underlying this vulnerability and that these differences are present in normal hearing (NH) individuals but do not reveal themselves in studies that use clear speech produced in quiet (because of a ceiling effect). However, previous studies have failed to uncover cognitive/perceptual variables that explain much of the variance in NH performance on more challenging degraded speech tasks. This lack of strong correlations may be due to either examining the wrong measures (e.g., working memory capacity) or to there being no reliable differences in degraded speech performance in NH listeners (i.e., variability in performance is due to measurement noise). The proposed project has 3 aims; the first, is to establish whether there are reliable individual differences in degraded speech performance for NH listeners that are sustained both across degradation types (speech in noise, compressed speech, noise-vocoded speech) and across multiple testing sessions. The second aim is to establish whether there are reliable differences in NH listeners' ability to adapt their phonetic categories based on short-term statistics both across tasks and across sessions; and finally, to determine whether performance on degraded speech perception tasks are correlated with performance on phonetic adaptability tasks, thus establishing a possible explanatory variable for individual differences in speech perception for NH and hearing impaired listeners.
Bruderer, Alison G; Danielson, D Kyle; Kandhadai, Padmapriya; Werker, Janet F
The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception-production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants' speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants' tongues. With a looking-time procedure, we found that temporarily restraining infants' articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral-motor movements influence speech sound discrimination. Moreover, an experimentally induced "impairment" in articulator movement can compromise speech perception performance, raising the question of whether long-term oral-motor impairments may impact perceptual development.
Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z
The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.
Full Text Available The speech is a tool for accurate communication of ideas. When we talk about speech prevention as a practical realization of the language, we are referring to the fact that it should be comprised of the elements of the criteria as viewed from the perspective of the standards. This criteria, in the broad sense of the word, presupposes an exact realization of the thought expressed between the speaker and the recipient.The absence of this criterion catches the eye through the practical realization of the language and brings forth consequences, often hidden very deeply in the human psyche. Their outer manifestation already represents a delayed reaction of the social environment. The foundation for overcoming and standardization of this phenomenon must be the anatomy-physiological patterns of the body, accomplished through methods in concordance with the nature of the body.
Asadi, Sima; Wexler, Anthony S.; Cappa, Christopher D.; Bouvier, Nicole M.; Barreda-Castanon, Santiago; Ristenpart, William D.
We show that the rate of aerosol particle emission during healthy human speech is strongly correlated with the loudness (amplitude) of vocalization. Emission rates range from approximately 1 to 50 particles per second for quiet to loud amplitudes, regardless of language spoken (English, Spanish, Mandarin, or Arabic). Intriguingly, a small fraction of individuals behave as ``super emitters,'' consistently emitting an order of magnitude more aerosol particles than their peers. We interpret the results in terms of the eggressive flowrate during vocalization, which is known to vary significantly for different types of vocalization and for different individuals. The results suggest that individual speech patterns could affect the probability of airborne disease transmission. The results also provide a possible explanation for the existence of ``super spreaders'' who transmit pathogens much more readily than average and who play a key role in the spread of epidemics.
The Speech Cycling Task is a novel experimental paradigm developed together with Robert Port and Keiichi Tajima at Indiana University. In a task of this sort, subjects repeat a phrase containing multiple prominent, or stressed, syllables in time with an auditory metronome, which can be simple or complex. A phase-based collective variable is defined in the acoustic speech signal. This paper reports on two experiments using speech cycling which together reveal many of the hallmarks of hierarchically coupled oscillatory processes. The first experiment requires subjects to place the final stressed syllable of a small phrase at specified phases within the overall Phrase Repetition Cycle (PRC). It is clearly demonstrated that only three patterns, characterized by phases around 1/3, 1/2 or 2/3 are reliably produced, and these points are attractors for other target phases. The system is thus multistable, and the attractors correspond to stable couplings between the metrical foot and the PRC. A second experiment examines the behavior of these attractors at increased rates. Faster rates lead to mode jumps between attractors. Previous experiments have also illustrated hysteresis as the system moves from one mode to the next. The dynamical organization is particularly interesting from a modeling point of view, as there is no single part of the speech production system which cycles at the level of either the metrical foot or the phrase repetition cycle. That is, there is no continuous kinematic observable in the system. Nonetheless, there is strong evidence that the oscopic behavior of the entire production system is correctly described as hierarchically coupled oscillators. There are many parallels between this organization and the forms of inter-limb coupling observed in locomotion and rhythmic manual tasks.
Full Text Available It has been suggested that intra- and inter-speaker variability in speech are correlated. Interlocutors have been shown to converge on various phonetic dimensions. In addition, speakers imitate the phonetic properties of voices they are exposed to in shadowing, repetition, and even passive listening tasks. We review three theoretical accounts of speech imitation and convergence phenomena: (i the Episodic Theory (ET of speech perception and production (Goldinger, 1998; (ii the Motor Theory (MT of speech perception (Liberman and Whalen, 2000;Galantucci et al., 2006 ; (iii Communication Accommodation Theory (CAT; Giles et al., 1991;Giles and Coupland, 1991. We argue that no account is able to explain all the available evidence. In particular, there is a need to integrate low-level, mechanistic accounts (like ET and MT and higher-level accounts (like CAT. We propose that this is possible within the framework of an integrated theory of production and comprehension (Pickering & Garrod, in press. Similarly to both ET and MT, this theory assumes parity between production and perception. Uniquely, however, it posits that listeners simulate speakers’ utterances by computing forward-model predictions at many different levels, which are then compared to the incoming phonetic input. In our account phonetic imitation can be achieved via the same mechanism that is responsible for sensorimotor adaptation; i.e. the correction of prediction errors. In addition, the model assumes that the degree to which sensory prediction errors lead to motor adjustments is context-dependent. The notion of context subsumes both the preceding linguistic input and non-linguistic attributes of the situation (e.g., the speaker’s and listener’s social identities, their conversational roles, the listener’s intention to imitate.
enough from the truth. Subjects were then interviewed individually in a sound booth to obtain “norming” speech data, pre- interview. We also...e.g. pitch, intensity, speaking rate, voice quality), gender, ethnicity and personality information, our machine learning experiments can classify...Have you ever been in trouble with the police?” vs. open-ended (e.g. “What is the last movie you saw that you really hated ?”) DISTRIBUTION A
Tovarek, Jaromir; Partila, Pavol; Voznak, Miroslav; Mikulec, Martin; Mehic, Miralem
Impact of changes in blood pressure and pulse from human speech is disclosed in this article. The symptoms of increased physical activity are pulse, systolic and diastolic pressure. There are many methods of measuring and indicating these parameters. The measurements must be carried out using devices which are not used in everyday life. In most cases, the measurement of blood pressure and pulse following health problems or other adverse feelings. Nowadays, research teams are trying to design and implement modern methods in ordinary human activities. The main objective of the proposal is to reduce the delay between detecting the adverse pressure and to the mentioned warning signs and feelings. Common and frequent activity of man is speaking, while it is known that the function of the vocal tract can be affected by the change in heart activity. Therefore, it can be a useful parameter for detecting physiological changes. A method for detecting human physiological changes by speech processing and artificial neural network classification is described in this article. The pulse and blood pressure changes was induced by physical exercises in this experiment. The set of measured subjects was formed by ten healthy volunteers of both sexes. None of the subjects was a professional athlete. The process of the experiment was divided into phases before, during and after physical training. Pulse, systolic, diastolic pressure was measured and voice activity was recorded after each of them. The results of this experiment describe a method for detecting increased cardiac activity from human speech using artificial neural network.
Van Bree, K.C.
For many speech telecommunication technologies a robust speech activity detector is important. An audio-only speech detector will givefalse positives when the interfering signal is speech or has speech characteristics. The modality video is suitable to solve this problem. In this report the approach
Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.
This report describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). Part I describes a classification extension to the SDCS to differentiate motor speech disorders from speech delay and to differentiate among three sub-types of motor speech disorders.…
Badalamenti, A F
This paper presents evidence that six of the seven parts of speech occur in written text as Poisson processes, simple or recurring. The six major parts are nouns, verbs, adjectives, adverbs, prepositions, and conjunctions, with the interjection occurring too infrequently to support a model. The data consist of more than the first 5000 words of works by four major authors coded to label the parts of speech, as well as periods (sentence terminators). Sentence length is measured via the period and found to be normally distributed with no stochastic model identified for its occurrence. The models for all six speech parts but the noun significantly distinguish some pairs of authors and likewise for the joint use of all words types. Any one author is significantly distinguished from any other by at least one word type and sentence length very significantly distinguishes each from all others. The variety of word type use, measured by Shannon entropy, builds to about 90% of its maximum possible value. The rate constants for nouns are close to the fractions of maximum entropy achieved. This finding together with the stochastic models and the relations among them suggest that the noun may be a primitive organizer of written text.
Malik, H.; Darma, S.; Soekirno, S.
This research reported a comparison from a success rate of speech recognition systems that used two types of databases they were existing databases and new databases, that were implemented into quadcopter as motion control. Speech recognition system was using Mel frequency cepstral coefficient method (MFCC) as feature extraction that was trained using recursive neural network method (RNN). MFCC method was one of the feature extraction methods that most used for speech recognition. This method has a success rate of 80% - 95%. Existing database was used to measure the success rate of RNN method. The new database was created using Indonesian language and then the success rate was compared with results from an existing database. Sound input from the microphone was processed on a DSP module with MFCC method to get the characteristic values. Then, the characteristic values were trained using the RNN which result was a command. The command became a control input to the single board computer (SBC) which result was the movement of the quadcopter. On SBC, we used robot operating system (ROS) as the kernel (Operating System).
de Cheveigné, Alain; Kawahara, Hideki
An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. There is no upper limit on the frequency search range, so the algorithm is suited for high-pitched voices and music. The algorithm is relatively simple and may be implemented efficiently and with low latency, and it involves few parameters that must be tuned. It is based on a signal model (periodic signal) that may be extended in several ways to handle various forms of aperiodicity that occur in particular applications. Finally, interesting parallels may be drawn with models of auditory processing.
Lehn-Schiøler, Tue; Hansen, Lars Kai; Larsen, Jan
In this paper a system that transforms speech waveforms to animated faces are proposed. The system relies on continuous state space models to perform the mapping, this makes it possible to ensure video with no sudden jumps and allows continuous control of the parameters in 'face space...... a subjective point of view the model is able to construct an image sequence from an unknown noisy speech sequence even though the number of training examples are limited.......'. The performance of the system is critically dependent on the number of hidden variables, with too few variables the model cannot represent data, and with too many overfitting is noticed. Simulations are performed on recordings of 3-5 sec.\\$\\backslash\\$ video sequences with sentences from the Timit database. From...
Shriberg, Lawrence D.; Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.
Purpose: The goal of this article is to introduce the pause marker (PM), a single-sign diagnostic marker proposed to discriminate early or persistent childhood apraxia of speech (CAS) from speech delay.
Full Text Available The authors analyse the effect of speech rate variation on Afrikaans phone stability from an acoustic perspective. Specifically they introduce two techniques for the acoustic analysis of speech rate variation, apply these techniques to an Afrikaans...
Jørgensen, Søren; Cubick, Jens; Dau, Torsten
In the development process of modern telecommunication systems, such as mobile phones, it is common practice to use computer models to objectively evaluate the transmission quality of the system, instead of time-consuming perceptual listening tests. Such models have typically focused on the quality...... of the transmitted speech, while little or no attention has been provided to speech intelligibility. The present study investigated to what extent three state-of-the art speech intelligibility models could predict the intelligibility of noisy speech transmitted through mobile phones. Sentences from the Danish...... Dantale II speech material were mixed with three different kinds of background noise, transmitted through three different mobile phones, and recorded at the receiver via a local network simulator. The speech intelligibility of the transmitted sentences was assessed by six normal-hearing listeners...
Jung, Youngsin; Duffy, Joseph R; Josephs, Keith A
Primary progressive aphasia is a neurodegenerative syndrome characterized by progressive language dysfunction. The majority of primary progressive aphasia cases can be classified into three subtypes: nonfluent/agrammatic, semantic, and logopenic variants. Each variant presents with unique clinical features, and is associated with distinctive underlying pathology and neuroimaging findings. Unlike primary progressive aphasia, apraxia of speech is a disorder that involves inaccurate production of sounds secondary to impaired planning or programming of speech movements. Primary progressive apraxia of speech is a neurodegenerative form of apraxia of speech, and it should be distinguished from primary progressive aphasia given its discrete clinicopathological presentation. Recently, there have been substantial advances in our understanding of these speech and language disorders. The clinical, neuroimaging, and histopathological features of primary progressive aphasia and apraxia of speech are reviewed in this article. The distinctions among these disorders for accurate diagnosis are increasingly important from a prognostic and therapeutic standpoint. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.
Faundez-Zanuy, Marcos; Esposito, Antonietta; Cordasco, Gennaro; Drugman, Thomas; Solé-Casals, Jordi; Morabito, Francesco
This book presents recent advances in nonlinear speech processing beyond nonlinear techniques. It shows that it exploits heuristic and psychological models of human interaction in order to succeed in the implementations of socially believable VUIs and applications for human health and psychological support. The book takes into account the multifunctional role of speech and what is “outside of the box” (see Björn Schuller’s foreword). To this aim, the book is organized in 6 sections, each collecting a small number of short chapters reporting advances “inside” and “outside” themes related to nonlinear speech research. The themes emphasize theoretical and practical issues for modelling socially believable speech interfaces, ranging from efforts to capture the nature of sound changes in linguistic contexts and the timing nature of speech; labors to identify and detect speech features that help in the diagnosis of psychological and neuronal disease, attempts to improve the effectiveness and performa...
Lü, Tao; Guo, Jin; Zhang, He-yong; Yan, Chun-hui; Wang, Can-jin
To address the challenges of non-cooperative and remote acoustic detection, an all-fiber laser Doppler vibrometer (LDV) is established. The all-fiber LDV system can offer the advantages of smaller size, lightweight design and robust structure, hence it is a better fit for remote speech detection. In order to improve the performance and the efficiency of LDV for long-range hearing, the speech enhancement technology based on optimally modified log-spectral amplitude (OM-LSA) algorithm is used. The experimental results show that the comprehensible speech signals within the range of 150 m can be obtained by the proposed LDV. The signal-to-noise ratio ( SNR) and mean opinion score ( MOS) of the LDV speech signal can be increased by 100% and 27%, respectively, by using the speech enhancement technology. This all-fiber LDV, which combines the speech enhancement technology, can meet the practical demand in engineering.
Mobile Speech and Advanced Natural Language Solutions provides a comprehensive and forward-looking treatment of natural speech in the mobile environment. This fourteen-chapter anthology brings together lead scientists from Apple, Google, IBM, AT&T, Yahoo! Research and other companies, along with academicians, technology developers and market analysts. They analyze the growing markets for mobile speech, new methodological approaches to the study of natural language, empirical research findings on natural language and mobility, and future trends in mobile speech. Mobile Speech opens with a challenge to the industry to broaden the discussion about speech in mobile environments beyond the smartphone, to consider natural language applications across different domains. Among the new natural language methods introduced in this book are Sequence Package Analysis, which locates and extracts valuable opinion-related data buried in online postings; microintonation as a way to make TTS truly human-like; and se...
Korsun, O. N.; Poliyev, A. V.
The algorithm for building an optimal pattern for the purpose of automatic speech recognition, which increases the probability of correct recognition, is developed and presented in this work. The optimal pattern forming is based on the decomposition of an initial pattern to principal components, which enables to reduce the dimension of multi-parameter optimization problem. At the next step the training samples are introduced and the optimal estimates for principal components decomposition coefficients are obtained by a numeric parameter optimization algorithm. Finally, we consider the experiment results that show the improvement in speech recognition introduced by the proposed optimization algorithm.
The role of speech production mechanisms in difficult speech comprehension is the subject of on-going debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension…
...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...; speech-to-speech (STS); pay-per-call (900) calls; types of calls; and equal access to interexchange... of a report, due April 16, 2011, addressing whether it is necessary for the waivers to remain in...
...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities... proposed compensation rates for Interstate TRS, Speech-to-Speech Services (STS), Captioned Telephone... costs reported in the data submitted to NECA by VRS providers. In this regard, document DA 10-761 also...
H. B. Kekre; Tanuja K. Sarode
Mostly transforms are used for speech data compressions which are lossy algorithms. Such algorithms are tolerable for speech data compression since the loss in quality is not perceived by the human ear. However the vector quantization (VQ) has a potential to give more data compression maintaining the same quality. In this paper we propose speech data compression algorithm using vector quantization technique. We have used VQ algorithms LBG, KPE and FCG. The results table s...
Full Text Available This paper presents new Czech language two-channel (stereo speech database recorded in car environment. The created database was designed for experiments with speech enhancement for communication purposes and for the study and the design of a robust speech recognition systems. Tools for automated phoneme labelling based on Baum-Welch re-estimation were realised. The noise analysis of the car background environment was done.
Pollak, P.; Vopicka, J.; Hanzl, V.; Sovka, Pavel
This paper presents new Czech language two-channel (stereo) speech database recorded in car environment. The created database was designed for experiments with speech enhancement for communication purposes and for the study and the design of a robust speech recognition systems. Tools for automated phoneme labelling based on Baum-Welch re-estimation were realised. The noise analysis of the car background environment was done.
Agus, Trevor R.; Akeroyd, Michael A.; Noble, William; Bhullar, Navjot
Many of the items in the “Speech, Spatial, and Qualities of Hearing” scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol.43, 85–99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively ...
Gade, Anders C.
This chapter deals specifically with concepts, tools, and architectural variables of importance when designing auditoria for speech and music. The focus will be on cultivating the useful components of the sound in the room rather than on avoiding noise from outside or from installations, which is dealt with in Chap. 11. The chapter starts by presenting the subjective aspects of the room acoustic experience according to consensus at the time of writing. Then follows a description of their objective counterparts, the objective room acoustic parameters, among which the classical reverberation time measure is only one of many, but still of fundamental value. After explanations on how these parameters can be measured and predicted during the design phase, the remainder of the chapter deals with how the acoustic properties can be controlled by the architectural design of auditoria. This is done by presenting the influence of individual design elements as well as brief descriptions of halls designed for specific purposes, such as drama, opera, and symphonic concerts. Finally, some important aspects of loudspeaker installations in auditoria are briefly touched upon.
Macdonald, Ewen N; Raufer, Stefan
The Lombard effect refers to the phenomenon where talkers automatically increase their level of speech in a noisy environment. While many studies have characterized how the Lombard effect influences different measures of speech production (e.g., F0, spectral tilt, etc.), few have investigated...... the consequences of temporally fluctuating noise. In the present study, 20 talkers produced speech in a variety of noise conditions, including both steady-state and amplitude-modulated white noise. While listening to noise over headphones, talkers produced randomly generated five word sentences. Similar...... of noisy environments and will alter their speech accordingly....
In this paper I argue that one way of explaining what is wrong with hate speech is by critically assessing what kind of freedom free speech involves and, relatedly, what kind of freedom hate speech undermines. More specifically, I argue that the main arguments for freedom of speech (e.g. from truth, from autonomy, and from democracy) rely on a “positive” conception of freedom intended as autonomy and self-mastery (Berlin, 2006), and can only partially help us to understand what is wrong with ...
Polyanskaya, Leona; Ordin, Mikhail
Analysis of English rhythm in speech produced by children and adults revealed that speech rhythm becomes increasingly more stress-timed as language acquisition progresses. Children reach the adult-like target by 11 to 12 years. The employed speech elicitation paradigm ensured that the sentences produced by adults and children at different ages were comparable in terms of lexical content, segmental composition, and phonotactic complexity. Detected differences between child and adult rhythm and between rhythm in child speech at various ages cannot be attributed to acquisition of phonotactic language features or vocabulary, and indicate the development of language-specific phonetic timing in the course of acquisition.
Developmental apraxia of speech (DAS) in children is a speech disorder, supposed to have a neurological origin, which is commonly considered to result from particular deficits in speech processing (i.e., phonological planning, motor programming). However, the label DAS has often been used as
Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.
Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…
Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike
Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in 'healthy' communication direct speech constructions contribute to the liveliness, and indirectly to
Imad Hayif Sameer
Full Text Available The theory of speech acts, which clarifies what people do when they speak, is not about individual words or sentences that form the basic elements of human communication, but rather about particular speech acts that are performed when uttering words. A speech act is the attempt at doing something purely by speaking. Many things can be done by speaking. Speech acts are studied under what is called speech act theory, and belong to the domain of pragmatics. In this paper, two Egyptian inaugural speeches from El-Sadat and El-Sisi, belonging to different periods were analyzed to find out whether there were differences within this genre in the same culture or not. The study showed that there was a very small difference between these two speeches which were analyzed according to Searle’s theory of speech acts. In El Sadat’s speech, commissives came to occupy the first place. Meanwhile, in El–Sisi’s speech, assertives occupied the first place. Within the speeches of one culture, we can find that the differences depended on the circumstances that surrounded the elections of the Presidents at the time. Speech acts were tools they used to convey what they wanted and to obtain support from their audiences.
Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
De Keyser, Kim; Santens, Patrick; Bockstael, Annelies; Botteldooren, Dick; Talsma, Durk; De Vos, Stefanie; Van Cauwenberghe, Mieke; Verheugen, Femke; Corthals, Paul; De Letter, Miet
Purpose: This study investigated the possible relationship between hypokinetic speech production and speech intensity perception in patients with Parkinson's disease (PD). Method: Participants included 14 patients with idiopathic PD and 14 matched healthy controls (HCs) with normal hearing and cognition. First, speech production was objectified…
Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH
This paper reviews the state-of-the-art an automatic speech recognition (ASR) based approach for speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that transfers human speech into transcript text by matching with the system's library. This is particularly useful in speech rehabilitation therapies as they provide accurate, real-time evaluation for speech input from an individual with speech disorder. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback response to their mistakes. However, the accuracy of ASR is dependent on many factors such as, phoneme recognition, speech continuity, speaker and environmental differences as well as our depth of knowledge on human language understanding. Hence, the review examines recent development of ASR technologies and its performance for individuals with speech and language disorders.
Vivien Arief Wardhany
Full Text Available The purpose of multimedia devices development is controlling through voice. Nowdays voice that can be recognized only in English. To overcome the issue, then recognition using Indonesian language model and accousticc model and dictionary. Automatic Speech Recognizier is build using engine CMU Sphinx with modified english language to Indonesian Language database and XBMC used as the multimedia player. The experiment is using 10 volunteers testing items based on 7 commands. The volunteers is classifiedd by the genders, 5 Male & 5 female. 10 samples is taken in each command, continue with each volunteer perform 10 testing command. Each volunteer also have to try all 7 command that already provided. Based on percentage clarification table, the word “Kanan” had the most recognize with percentage 83% while “pilih” is the lowest one. The word which had the most wrong clarification is “kembali” with percentagee 67%, while the word “kanan” is the lowest one. From the result of Recognition Rate by male there are several command such as “Kembali”, “Utama”, “Atas “ and “Bawah” has the low Recognition Rate. Especially for “kembali” cannot be recognized as the command in the female voices but in male voice that command has 4% of RR this is because the command doesn’t have similar word in english near to “kembali” so the system unrecognize the command. Also for the command “Pilih” using the female voice has 80% of RR but for the male voice has only 4% of RR. This problem is mostly because of the different voice characteristic between adult male and female which male has lower voice frequencies (from 85 to 180 Hz than woman (165 to 255 Hz.The result of the experiment showed that each man had different number of recognition rate caused by the difference tone, pronunciation, and speed of speech. For further work needs to be done in order to improving the accouracy of the Indonesian Automatic Speech Recognition system
Hodges-Simeon, Carolyn R.; Gaulin, Steven J. C.; Puts, David A.
Low mean fundamental frequency (F 0) in men’s voices has been found to positively influence perceptions of dominance by men and attractiveness by women using standardized speech. Using natural speech obtained during an ecologically valid social interaction, we examined relationships between multiple vocal parameters and dominance and attractiveness judgments. Male voices from an unscripted dating game were judged by men for physical and social dominance and by women in fert...
abridging the freedom of speech .” Speech is construed broadly and includes both oral and written speech, as well as expressive conduct and displays when...intended to convey a message that is likely to be understood.7 Religious speech is certainly included. As a bedrock constitutional right, freedom of speech has...to good order and discipline or of a nature to bring discredit upon the armed forces)—the First Amendment’s freedom of speech will not provide them
Maas, Edwin; Mailend, Marja-Liisa
Purpose: The purpose of this article is to present an argument for the use of online reaction time (RT) methods to the study of apraxia of speech (AOS) and to review the existing small literature in this area and the contributions it has made to our fundamental understanding of speech planning (deficits) in AOS. Method: Following a brief…
Stekelenburg, J.J.; Keetels, M.N.; Vroomen, J.H.M.
Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect.
Full Text Available One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming realtively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process which implies rigidity of processingd with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing by masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augementation or
Peelle, Jonathan E; Sommers, Mitchell S
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration
Peelle, Jonathan E.; Sommers, Mitchell S.
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported
Full Text Available Deep brain stimulation (DBS has been reported to be successful in relieving the core motor symptoms of Parkinson's disease (PD and motor fluctuations in the more advanced stages of the disease. However, data on the effects of DBS on speech performance are inconsistent. While there are some series of patients documenting that speech function was relatively unaffected by DBS of the nucleus subthalamicus (STN, other investigators reported on improvements of distinct parameters of oral control and voice. Though, these ameliorations of single speech modalities were not always accompanied by an improvement of overall speech intelligibility. On the other hand, there are also indications for an induction of dysarthria as an adverse effect of STN-DBS occurring at least in some patients with PD. Since a deterioration of speech function has more often been observed under high stimulation amplitudes, this phenomenon has been ascribed to a spread of current-to-adjacent pathways which might also be the reason for the sporadic observation of an onset of dysarthria under DBS of other basal ganglia targets (e.g., globus pallidus internus/GPi or thalamus/Vim. The aim of this paper is to review and evaluate reports in the literature on the effects of DBS on speech function in PD.
Waqas, A.; Muhammad, T.; Jamal, H.
Speech is the most essential method of correspondence of humankind. Cell telephony, portable hearing assistants and, hands free are specific provisions in this respect. The performance of these communication devices could be affected because of distortions which might augment them. There are two essential sorts of distortions that might be recognized, specifically: convolutive and additive noises. These mutilations contaminate the clean speech and make it unsatisfactory to human audiences i.e. perceptual value and intelligibility of speech signal diminishes. The objective of speech upgrade systems is to enhance the quality and understandability of speech to make it more satisfactory to audiences. This paper recommends a modified hybrid approach for single channel devices to process the noisy signals considering only the effect of background noises. It is a mixture of pre-processing relative spectral amplitude (RASTA) filter, which is approximated by a straight forward 4th order band-pass filter, and conventional minimum mean square error short time spectral amplitude (MMSE STSA85) estimator. To analyze the performance of the algorithm an objective parameter called Perceptual estimation of speech quality (PESQ) is measured. The results show that the modified algorithm performs well to remove the background noises. SIMULINK implementation is also performed and its profile report has been generated to observe the execution time. (author)
of reduction levels and perceived speaker attributes in which moderate reduction can make a better impression on listeners than no reduction. In addition to its relevance in reduction models and theories, this interplay is instructive for various fields of speech application from social robotics to charisma...... whether variation in the degree of reduction also has a systematic effect on the attributes we ascribe to the speaker who produces the speech signal. A perception experiment was carried out for German in which 46 listeners judged whether or not speakers showing 3 different combinations of segmental...... and prosodic reduction levels (unreduced, moderately reduced, strongly reduced) are appropriately described by 13 physical, social, and cognitive attributes. The experiment shows that clear speech is not mere speech, and less clear speech is not just reduced either. Rather, results revealed a complex interplay...
Sales-Cruz, Mauricio; Heitzig, Martina; Cameron, Ian
of optimisation techniques coupled with dynamic solution of the underlying model. Linear and nonlinear approaches to parameter estimation are investigated. There is also the application of maximum likelihood principles in the estimation of parameters, as well as the use of orthogonal collocation to generate a set......In this chapter the importance of parameter estimation in model development is illustrated through various applications related to reaction systems. In particular, rate constants in a reaction system are obtained through parameter estimation methods. These approaches often require the application...... of algebraic equations as the basis for parameter estimation.These approaches are illustrated using estimations of kinetic constants from reaction system models....
Oard, Douglas W
Libraries and archives collect recorded speech and multimedia objects that contain recorded speech, and such material may comprise a substantial portion of the collection in future digital libraries...
.... To aid in the assessment of various commercially available speech recognition systems, several aircraft speech databases have been developed at the Air Force Research Laboratory's Human Effectiveness Directorate...
Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias
Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers...... often fail to perceive as speech. While audiovisual integration in the identification task only occurred when observers were informed of the speech-like nature of SWS, integration occurred in the detection task both for informed and naïve observers. This shows that both speech-specific and general...... mechanisms underlie audiovisual integration of speech....
... speech intelligibility. Speech intelligibility for signals generated by an acoustic microphone, a throat microphone, and the two microphones together was assessed using the Modified Rhyme Test (MRT...
Weinstein, C. J.
This report documents work performed during FY 1981 on the DCA-sponsored Network Speech Systems Technology Program. The two areas of work reported are: (1) communication system studies in support of the evolving Defense Switched Network (DSN) and (2) design and implementation of satellite/terrestrial interfaces for the Experimental Integrated Switched Network (EISN). The system studies focus on the development and evaluation of economical and endurable network routing procedures. Satellite/terrestrial interface development includes circuit-switched and packet-switched connections to the experimental wideband satellite network. Efforts in planning and coordination of EISN experiments are reported in detail in a separate EISN Experiment Plan.
Actual spoken language of man developed only approximately 200,000 to 100,000 years ago. As a result of natural selection, man has developed hearing, which is most sensitive in the frequency regions of 200 to 4000 Hz, corresponding to those of spoken sounds. Functional hearing has been one of the prerequisites for the development of speech, although according to current opinion the language itself may have evolved by mimicking gestures with the so-called mirror neurons. Due to hearing, gesticulation was no longer necessary, and the hands became available for other purposes.
Schönmeyr, Björn; Wendby, Lisa; Sharma, Mitali; Jacobson, Lia; Restrepo, Carolina; Campbell, Alex
Many patients with cleft palate deformities worldwide receive treatment at a later age than is recommended for normal speech to develop. The outcomes after late palate repairs in terms of speech and quality of life (QOL) still remain largely unstudied. In the current study, questionnaires were used to assess the patients' perception of speech and QOL before and after primary palate repair. All of the patients were operated at a cleft center in northeast India and had a cleft palate with a normal lip or with a cleft lip that had been previously repaired. A total of 134 patients (7-35 years) were interviewed preoperatively and 46 patients (7-32 years) were assessed in the postoperative survey. The survey showed that scores based on the speech handicap index, concerning speech and speech-related QOL, did not improve postoperatively. In fact, the questionnaires indicated that the speech became more unpredictable (P reported that their self-confidence had improved after the operation. Thus, the majority of interviewed patients who underwent late primary palate repair were satisfied with the surgery. At the same time, speech and speech-related QOL did not improve according to the speech handicap index-based survey. Speech predictability may even become worse and nasal regurgitation may increase after late palate repair, according to these results.
Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias
Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs......) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded...... with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed...
Balasubramanian, Venu; Max, Ludo
The present study reports on the first case of crossed apraxia of speech (CAS) in a 69-year-old right-handed female (SE). The possibility of occurrence of apraxia of speech (AOS) following right hemisphere lesion is discussed in the context of known occurrences of ideomotor apraxias and acquired neurogenic stuttering in several cases with right…
Yegnanarayana, B.; Satyanarayana Murthy, P.; Eggen, J.H.
In this paper we propose a speech-analysis method to bring out characteristics of the vocal tract system in short segments which are much less than a pitch period. The method performs windowing in the source and system components of the speech signal and recombines them to obtain a signal reflecting
RUTHERFORD, DAVID; WESTLAKE, HAROLD
DESIGNED TO PROVIDE AN ESSENTIAL CORE OF INFORMATION, THIS BOOK TREATS NORMAL AND ABNORMAL DEVELOPMENT, STRUCTURE, AND FUNCTION OF THE LIPS AND PALATE AND THEIR RELATIONSHIPS TO CLEFT LIP AND CLEFT PALATE SPEECH. PROBLEMS OF PERSONAL AND SOCIAL ADJUSTMENT, HEARING, AND SPEECH IN CLEFT LIP OR CLEFT PALATE INDIVIDUALS ARE DISCUSSED. NASAL RESONANCE…
Gocen, Gokcen; Okur, Alpaslan
Generally, the speaking aspect is not properly debated when discussing the positive and negative effects of television (TV), especially on children. So, to highlight this point, this study was first initialized by asking the question: "What are the effects of TV on speech?" and secondly, to transform the effects that TV has on speech in…
Barker, Larry L.; And Others
The purposes of this paper are (1) to review the background and nature of hypnosis, (2) to synthesize research on hypnosis related to speech communication, and (3) to delineate and compare two potential techniques for reducing speech anxiety--hypnosis and systematic desensitization. Hypnosis has been defined as a mental state characterised by…
not only affect the listener of speech communication in a noisy environment, HPDs can also affect the speaker . Tufts and Frank (2003) found that...of hearing protection on speech intelligibility in noise. Sound and Vibration . 20(10): 12-14. Berger, E. H. 1980. EARLog #4 – The
Namasivayam, Aravind K.; Pukonen, Margit; Goshulak, Debra; Hard, Jennifer; Rudzicz, Frank; Rietveld, Toni; Maassen, Ben; Kroll, Robert; van Lieshout, Pascal
Background: Intensive treatment has been repeatedly recommended for the treatment of speech deficits in childhood apraxia of speech (CAS). However, differences in treatment outcomes as a function of treatment intensity have not been systematically studied in this population. Aim: To investigate the effects of treatment intensity on outcome…
Full Text Available Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants’ attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4-13 months of age were exposed to happy-sounding infant-directed speech versus hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children’s song spoken versus sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children’s song versus a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing was the principal contributor to infant attention, regardless of age.
A keynote speech outlining the importance of collaboration and diversity in the workplace. The 20-minute speech describes NASA's challenges and accomplishments over the years and what lies ahead. Topics include: diversity and inclusion principles, international cooperation, Kennedy Space Center planning and development, opportunities for cooperation, and NASA's vision for exploration.
Engels, H.J.M.; Heynick, F.; Staak, C.P.F. van der
Freud's contemporary fellow psychiatrist Emil Kraepelin collected over the course of several decades some 700 specimens of speech in dreams, mostly his own, along with various concomitant data. These generally exhibit far more obvious primary-process influence than do the dream speech specimens
Riis, Søren Kamaric
We evaluate the hidden neural network HMM/NN hybrid on two speech recognition benchmark tasks; (1) task independent isolated word recognition on the Phonebook database, and (2) recognition of broad phoneme classes in continuous speech from the TIMIT database. It is shown how hidden neural networks...
Holzrichter, John F.
A non-acoustic sensor is used to measure a user's speech and then broadcasts an obscuring acoustic signal diminishing the user's vocal acoustic output intensity and/or distorting the voice sounds making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate or contacting a user's neck or head skin tissue for sensing speech production information.
Lalonde, Kaylah; Holt, Rachael Frush
Purpose: This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. They also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method: Twelve adults and 27 typically developing 3-…
Harris, Katherine Safford
Speaking is universally acknowledged as an important human talent, yet as a topic of educated common knowledge, it is peculiarly neglected. Partly, this is a consequence of the relatively recent growth of research on speech perception, production, and development, but also a function of the way that information is sliced up by undergraduate colleges. Although the basic acoustic mechanism of vowel production was known to Helmholtz, the ability to view speech production as a physiological event is evolving even now with such techniques as fMRI. Intensive research on speech perception emerged only in the early 1930s as Fletcher and the engineers at Bell Telephone Laboratories developed the transmission of speech over telephone lines. The study of speech development was revolutionized by the papers of Eimas and his colleagues on speech perception in infants in the 1970s. Dissemination of knowledge in these fields is the responsibility of no single academic discipline. It forms a center for two departments, Linguistics, and Speech and Hearing, but in the former, there is a heavy emphasis on other aspects of language than speech and, in the latter, a focus on clinical practice. For psychologists, it is a rather minor component of a very diverse assembly of topics. I will focus on these three fields in proposing possible remedies.
Cooper, James W.; Viswanathan, Mahesh; Byron, Donna; Chan, Margaret
The study has applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, a number of post-processing…
Ries, Stephanie; Janssen, Niels; Dufau, Stephane; Alario, F.-Xavier; Burle, Boris
The concept of "monitoring" refers to our ability to control our actions on-line. Monitoring involved in speech production is often described in psycholinguistic models as an inherent part of the language system. We probed the specificity of speech monitoring in two psycholinguistic experiments where electroencephalographic activities were…
Shriberg, Lawrence D.; Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.
Purpose: Previous articles in this supplement described rationale for and development of the pause marker (PM), a diagnostic marker of childhood apraxia of speech (CAS), and studies supporting its validity and reliability. The present article assesses the theoretical coherence of the PM with speech processing deficits in CAS. Method: PM and other…
Morgan, Lewis B.
Introduces counselors to 10 commonly used mannerisms of speech and the part that each mannerism plays in the communication process, especially in the counseling context. Offers suggestions on how to respond to these speech mannerisms in a straightforward and effective manner. (Author)
This paper presents a study of pulmonic ingressive speech, a severely understudied phenomenon within varieties of English. While ingressive speech has been reported for several parts of the British Isles, New England, and eastern Canada, thus far Newfoundland appears to be the only locality where researchers have managed to provide substantial…
Groenewold, Rimke; Bastiaanse, Roelien; Huiskes, Mike
Background: Previous studies have shown that individuals with aphasia are usually able to produce direct reported speech constructions. So far these studies have mainly been conducted in English. The results show that direct speech is beneficial for aphasic speakers for various reasons. In Dutch the
Prelock, Patricia A.; Deppe, Janet
The purpose of this article is to explain the role of speech-language pathology in early intervention. The expected credentials of professionals in the field are described, and the current numbers of practitioners serving young children are identified. Several resource documents available from the American Speech-Language Hearing Association are…
Full Text Available Hidden Markov Model (HMM)-based synthesis in combination with speaker adaptation has proven to be an approach that is well-suited for child speech synthesis. This paper describes the development and evaluation of different HMM-based child speech...
Bender, Brenda K.; Cannito, Michael P.; Murry, Thomas; Woodson, Gayle E.
This study compared speech intelligibility in nondisabled speakers and speakers with adductor spasmodic dysphonia (ADSD) before and after botulinum toxin (Botox) injection. Standard speech samples were obtained from 10 speakers diagnosed with severe ADSD prior to and 1 month following Botox injection, as well as from 10 age- and gender-matched…
Marcus, Laurence R.
This book explores issues typified by a series of hateful speech events at Kean College (New Jersey) and on other U.S. campuses in the early 1990s, by examining the dichotomies that exist between the First and the Fourteenth Amendments and between civil liberties and civil rights, and by contrasting the values of free speech and academic freedom…
New York State Education Dept., Albany. Office of the Professions.
The handbook contains State Education Department rules and regulations that govern speech-language pathology and audiology in New York State. The handbook also describes licensure and first registration as a licensed speech-language pathologist or audiologist. The introduction discusses professional regulation in New York State while the second…
Namasivayam, Aravind K.; Pukonen, Margit; Goshulak, Debra; Hard, Jennifer; Rudzicz, Frank; Rietveld, Toni; Maassen, Ben; Kroll, Robert; van Lieshout, Pascal
BackgroundIntensive treatment has been repeatedly recommended for the treatment of speech deficits in childhood apraxia of speech (CAS). However, differences in treatment outcomes as a function of treatment intensity have not been systematically studied in this population. AimTo investigate the
The primary objective of this position paper is to assess the theoretical and empirical support that exists for the Mayo Clinic view of motor speech disorders in general, and for oromotor, nonverbal tasks as a window to speech production processes in particular. Literature both in support of and against the Mayo clinic view and the associated use…
Theune, Mariet; Walker, M.; Rambow, O.
In concept-to-speech systems, spoken output is generated on the basis of a text that has been produced by the system itself. In such systems, linguistic information from the text generation component may be exploited to achieve a higher prosodic quality of the speech output than can be obtained in a
Jones, Robert E.; Segallis, Greg P.; Boyd, Robert
Coder/decoder constructed on single integrated-circuit chip. Handles data in variety of formats at rates up to 300 Mbps, correcting up to 3 errors per data block of 256 to 512 bits. Helps reduce cost of transmitting data. Useful in many high-data-rate, bandwidth-limited communication systems such as; personal communication networks, cellular telephone networks, satellite communication systems, high-speed computing networks, broadcasting, and high-reliability data-communication links.
Lakhdar Moulay Abdelmounaim
Full Text Available Vector Quantisation (VQ is an efficient coding algorithm that has been widely used in the field of video and image coding, due to its fast decoding efficiency. However, the indexes of VQ are sometimes lost because of signal interference during the transmission. In this paper, we propose an efficient estimation method to conceal and recover the lost indexes on the decoder side, to avoid re-transmitting the whole image again. If the image or video has the limitation of a period of validity, re-transmitting the data wastes the resources of time and network bandwidth. Therefore, using the originally received correct data to estimate and recover the lost data is efficient in time-constrained situations, such as network conferencing or mobile transmissions. In nature images, the pixels are correlated with their neighbours and VQ partitions the image into sub-blocks and quantises them to the indexes that are transmitted; the correlation between adjacent indexes is very strong. There are two parts of the proposed method. The first is pre-processing and the second is an estimation process. In pre-processing, we modify the order of codevectors in the VQ codebook to increase the correlation among the neighbouring vectors. We then use a special filtering method in the estimation process. Using conventional VQ to compress the Lena image and transmit it without any loss of index can achieve a PSNR of 30.429 dB on the decoder. The simulation results demonstrate that our method can estimate the indexes to achieve PSNR values of 29.084 and 28.327 dB when the loss rate is 0.5% and 1%, respectively.
of the effectiveness of their application in audio processing. The second part of the thesis deals with introducing sparsity directly in the linear prediction analysis-by-synthesis (LPAS) speech coding paradigm. We first propose a novel near-optimal method to look for a sparse approximate excitation using a compressed...... one with direct applications to coding but also consistent with the speech production model of voiced speech, where the excitation of the all-pole filter can be modeled as an impulse train, i.e., a sparse sequence. Introducing sparsity in the LP framework will also bring to de- velop the concept...... sensing formulation. Furthermore, we define a novel re-estimation procedure to adapt the predictor coefficients to the given sparse excitation, balancing the two representations in the context of speech coding. Finally, the advantages of the compact parametric representation of a segment of speech, given...
Jørgensen, Søren; Dau, Torsten
Jørgensen and Dau (J Acoust Soc Am 130:1475-1487, 2011) proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech...... subjected to phase jitter, a condition in which the spectral structure of the intelligibility of speech signal is strongly affected, while the broadband temporal envelope is kept largely intact. In contrast, the effects of this distortion can be predicted -successfully by the spectro-temporal modulation...... suggest that the SNRenv might reflect a powerful decision metric, while some explicit across-frequency analysis seems crucial in some conditions. How such across-frequency analysis is "realized" in the auditory system remains unresolved....
Atmaja, Bagus Tris; Farid, Mifta Nur; Arifianto, Dhany
Speech enhancement is challenging task in audio signal processing to enhance the quality of targeted speech signal while suppress other noises. In the beginning, the speech enhancement algorithm growth rapidly from spectral subtraction, Wiener filtering, spectral amplitude MMSE estimator to Non-negative Matrix Factorization (NMF). Smartphone as revolutionary device now is being used in all aspect of life including journalism; personally and professionally. Although many smartphones have two microphones (main and rear) the only main microphone is widely used for voice recording. This is why the NMF algorithm widely used for this purpose of speech enhancement. This paper evaluate speech enhancement on smartphone voice recording by using some algorithms mentioned previously. We also extend the NMF algorithm to Kulback-Leibler NMF with supervised separation. The last algorithm shows improved result compared to others by spectrogram and PESQ score evaluation. (paper)
Full Text Available Huntington’s disease (HD has been described as a genetic condition caused by a mutation in the CAG (cytosine-adenine-guanine nucleotide sequence. Depending on the stage of the disease, people may have difficulties in speech, language and swallowing. The purpose of this paper is to describe these difficulties in detail, as well as to provide an account on speech and language therapy approach to this condition. Regarding speech, it is worth noticing that characteristics typical of hyperkinetic dysarthria can be found due to underlying choreic movements. The speech of people with HD tends to show shorter sentences, with much simpler syntactic structures, and difficulties in tasks that require complex cognitive processing. Moreover, swallowing may present dysphagia that progresses as the disease develops. A timely, comprehensive and effective speech-language intervention is essential to improve the quality of life of people and contribute to their communicative welfare.
Christopher W Bishop
Full Text Available Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech-cues interact with audiovisual spatial integration mechanisms. Here, we combine two well established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech-cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral 'what' and dorsal 'where' pathways.
) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting...... understanding speech when more than one person is talking, even when reduced audibility has been fully compensated for by a hearing aid. The reasons for these difficulties are not well understood. This presentation highlights recent concepts of the monaural and binaural signal processing strategies employed...... by the normal as well as impaired auditory system. Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII...
Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris
A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production
It is becoming crucial to accurately estimate and monitor speech quality in various ambient environments to guarantee high quality speech communication. This practical hands-on book shows speech intelligibility measurement methods so that the readers can start measuring or estimating speech intelligibility of their own system. The book also introduces subjective and objective speech quality measures, and describes in detail speech intelligibility measurement methods. It introduces a diagnostic rhyme test which uses rhyming word-pairs, and includes: An investigation into the effect of word familiarity on speech intelligibility. Speech intelligibility measurement of localized speech in virtual 3-D acoustic space using the rhyme test. Estimation of speech intelligibility using objective measures, including the ITU standard PESQ measures, and automatic speech recognizers.
Doire, C.S.J.; Brookes, M. D.; Naylor, P. A.
The reverberation of an acoustic channel can be characterised by two frequency-dependent parameters: the reverberation time and the direct-to-reverberant energy ratio. This paper presents an algorithm for blindly determining these parameters from a single-channel speech signal. The algorithm uses...
There is a lack of agreement on the features used to differentiate Childhood Apraxia of Speech (CAS) from Phonological Disorders (PD). One criterion which has gained consensus is lexical inconsistency of speech (ASHA, 2007); however, no accepted measure of this feature has been defined. Although lexical assessment provides information about consistency of an item across repeated trials, it may not capture the magnitude of inconsistency within an item. In contrast, segmental analysis provides more extensive information about consistency of phoneme usage across multiple contexts and word-positions. The current research compared segmental and lexical inconsistency metrics in preschool-aged children with PD, CAS, and typical development (TD) to determine how inconsistency varies with age in typical and disordered speakers, and whether CAS and PD were differentiated equally well by both assessment levels. Whereas lexical and segmental analyses may be influenced by listener characteristics or speaker intelligibility, the acoustic signal is less vulnerable to these factors. In addition, the acoustic signal may reveal information which is not evident in the perceptual signal. A second focus of the current research was motivated by Blumstein et al.'s (1980) classic study on voice onset time (VOT) in adults with acquired apraxia of speech (AOS) which demonstrated a motor impairment underlying AOS. In the current study, VOT analyses were conducted to determine the relationship between age and group with the voicing distribution for bilabial and alveolar plosives. Findings revealed that 3-year-olds evidenced significantly higher inconsistency than 5-year-olds; segmental inconsistency approached 0% in 5-year-olds with TD, whereas it persisted in children with PD and CAS suggesting that for child in this age-range, inconsistency is a feature of speech disorder rather than typical development (Holm et al., 2007). Likewise, whereas segmental and lexical inconsistency were
Full Text Available In the past few years, several studies have been directed to understanding the complexity of functional interactions between different brain regions during various human behaviors. Among these, neuroimaging research installed the notion that speech and language require an orchestration of brain regions for comprehension, planning, and integration of a heard sound with a spoken word. However, these studies have been largely limited to mapping the neural correlates of separate speech elements and examining distinct cortical or subcortical circuits involved in different aspects of speech control. As a result, the complexity of the brain network machinery controlling speech and language remained largely unknown. Using graph theoretical analysis of functional MRI (fMRI data in healthy subjects, we quantified the large-scale speech network topology by constructing functional brain networks of increasing hierarchy from the resting state to motor output of meaningless syllables to complex production of real-life speech as well as compared to non-speech-related sequential finger tapping and pure tone discrimination networks. We identified a segregated network of highly connected local neural communities (hubs in the primary sensorimotor and parietal regions, which formed a commonly shared core hub network across the examined conditions, with the left area 4p playing an important role in speech network organization. These sensorimotor core hubs exhibited features of flexible hubs based on their participation in several functional domains across different networks and ability to adaptively switch long-range functional connectivity depending on task content, resulting in a distinct community structure of each examined network. Specifically, compared to other tasks, speech production was characterized by the formation of six distinct neural communities with specialized recruitment of the prefrontal cortex, insula, putamen, and thalamus, which collectively
Full Text Available Automatic detection of voice pathologies enables non-invasive, low cost and objective assessments of the presence of disorders, as well as accelerating and improving the process of diagnosis and clinical treatment given to patients. In this work, a vector made up of 28 acoustic parameters is evaluated using principal component analysis (PCA, kernel principal component analysis (kPCA and an auto-associative neural network (NLPCA in four kinds of pathology detection (hyperfunctional dysphonia, functional dysphonia, laryngitis, vocal cord paralysis using the a, i and u vowels, spoken at a high, low and normal pitch. The results indicate that the kPCA and NLPCA methods can be considered a step towards pathology detection of the vocal folds. The results show that such an approach provides acceptable results for this purpose, with the best efficiency levels of around 100%. The study brings the most commonly used approaches to speech signal processing together and leads to a comparison of the machine learning methods determining the health status of the patient
Full Text Available In order to develop a novel voice sensor to detect human voices, the use of features which are more robust to noise is an important issue. Voice sensor is also called voice activity detection (VAD. Due to that the inherent nature of the formant structure only occurred on the speech spectrogram (well-known as voiceprint, Wu et al. were the first to use band-spectral entropy (BSE to describe the characteristics of voiceprints. However, the performance of VAD based on BSE feature was degraded in colored noise (or voiceprint-like noise environments. In order to solve this problem, we propose the two-dimensional part-band energy entropy (TD-PBEE parameter based on two variables: part-band partition number upon frequency index and long-term window size upon time index to further improve the BSE-based VAD algorithm. The two variables can efficiently represent the characteristics of voiceprints on each critical frequency band and use long-term information for noisy speech spectrograms, respectively. The TD-PBEE parameter can be regarded as a PBEE parameter over time. First, the strength of voiceprints can be partly enhanced by using four entropies applied to four part-bands. We can use the four part-band energy entropies for describing the voiceprints in detail. Due to the characteristics of non-stationary for speech and various noises, we will then use long-term information processing to refine the PBEE, so the voice-like noise can be distinguished from noisy speech through the concept of PBEE with long-term information. Our experiments show that the proposed feature extraction with the TD-PBEE parameter is quite insensitive to background noise. The proposed TD-PBEE-based VAD algorithm is evaluated for four types of noises and five signal-to-noise ratio (SNR levels. We find that the accuracy of the proposed TD-PBEE-based VAD algorithm averaged over all noises and all SNR levels is better than that of other considered VAD algorithms.
Davidow, Jason H.; Bothe, Anne K.; Richardson, Jessica D.; Andreatta, Richard D.
Purpose: This study introduces a series of systematic investigations intended to clarify the parameters of the fluency-inducing conditions (FICs) in stuttering. Method: Participants included 11 adults, aged 20-63 years, with typical speech-production skills. A repeated measures design was used to examine the relationships between several speech…
Christensen, Mads Græsbøll
Speech enhancement and separation algorithms sometimes employ a two-stage processing scheme, wherein the signal is first mapped to an intermediate low-dimensional parametric description after which the parameters are mapped to vectors in codebooks trained on, for exam- ple, individual noise...
Van Lancker Sidtis, Diana; Rogers, Tiffany; Godier, Violette; Tagliati, Michele; Sidtis, John J.
Purpose: Speaking, which naturally occurs in different modes or "tasks" such as conversation and repetition, relies on intact basal ganglia nuclei. Recent studies suggest that voice and fluency parameters are differentially affected by speech task. In this study, the authors examine the effects of subcortical functionality on voice and fluency,…
Olsson, Rasmus Kongsgaard; Hansen, Lars Kai
We discuss an identification framework for noisy speech mixtures. A block-based generative model is formulated that explicitly incorporates the time-varying harmonic plus noise (H+N) model for a number of latent sources observed through noisy convolutive mixtures. All parameters including...
Visscher, Chris; Houwen, Suzanne; Moolenaar, Ben; Lyons, Jim; Scherder, Erik J. A.; Hartman, Esther
Aim: This study compared the gross motor skills of school-age children (mean age 7y 8mo, range 6-9y) with developmental speech and language disorders (DSLDs; n = 105; 76 males, 29 females) and typically developing children (n = 105; 76 males, 29 females). The relationship between the performance parameters and the children's age was investigated…
Farshid Tayari Ashtiani
Full Text Available The present study was an attempt to investigate the impact of English verbal songs on connected speech aspects of adult English learners’ speech production. 40 participants were selected based on the results of their performance in a piloted and validated version of NELSON test given to 60 intermediate English learners in a language institute in Tehran. Then they were equally distributed in two control and experimental groups and received a validated pretest of reading aloud and speaking in English. Afterward, the treatment was performed in 18 sessions by singing preselected songs culled based on some criteria such as popularity, familiarity, amount, and speed of speech delivery, etc. In the end, the posttests of reading aloud and speaking in English were administered. The results revealed that the treatment had statistically positive effects on the connected speech aspects of English learners’ speech production at statistical .05 level of significance. Meanwhile, the results represented that there was not any significant difference between the experimental group’s mean scores on the posttests of reading aloud and speaking. It was thus concluded that providing the EFL learners with English verbal songs could positively affect connected speech aspects of both modes of speech production, reading aloud and speaking. The Findings of this study have pedagogical implications for language teachers to be more aware and knowledgeable of the benefits of verbal songs to promote speech production of language learners in terms of naturalness and fluency. Keywords: English Verbal Songs, Connected Speech, Speech Production, Reading Aloud, Speaking
Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang
Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to health listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its functions of suppressing masking-speech signals.
This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMM's designed to operate in a noisy environment. The author also describes a language for human-robot communications which was defined as a complex multilevel network from an HMM model of speech sounds geared towards Polish inflections. The author also added mandatory lexical and syntactic rules to the system for its communications vocabulary.
Al-Talabani, Abdulbasit; Sellahewa, Harin; Jassim, Sabah A.
Human emotion recognition from speech is studied frequently for its importance in many applications, e.g. human-computer interaction. There is a wide diversity and non-agreement about the basic emotion or emotion-related states on one hand and about where the emotion related information lies in the speech signal on the other side. These diversities motivate our investigations into extracting Meta-features using the PCA approach, or using a non-adaptive random projection RP, which significantly reduce the large dimensional speech feature vectors that may contain a wide range of emotion related information. Subsets of Meta-features are fused to increase the performance of the recognition model that adopts the score-based LDC classifier. We shall demonstrate that our scheme outperform the state of the art results when tested on non-prompted databases or acted databases (i.e. when subjects act specific emotions while uttering a sentence). However, the huge gap between accuracy rates achieved on the different types of datasets of speech raises questions about the way emotions modulate the speech. In particular we shall argue that emotion recognition from speech should not be dealt with as a classification problem. We shall demonstrate the presence of a spectrum of different emotions in the same speech portion especially in the non-prompted data sets, which tends to be more "natural" than the acted datasets where the subjects attempt to suppress all but one emotion.
Kühn, Simone; Brass, Marcel; Gallinat, Jürgen
The so-called embodiment of communication has attracted considerable interest. Recently a growing number of studies have proposed a link between Broca's area's involvement in action processing and its involvement in speech. The present quantitative meta-analysis set out to test whether neuroimaging studies on imitation and overt speech show overlap within inferior frontal gyrus. By means of activation likelihood estimation (ALE), we investigated concurrence of brain regions activated by object-free hand imitation studies as well as overt speech studies including simple syllable and more complex word production. We found direct overlap between imitation and speech in bilateral pars opercularis (BA 44) within Broca's area. Subtraction analyses revealed no unique localization neither for speech nor for imitation. To verify the potential of ALE subtraction analysis to detect unique involvement within Broca's area, we contrasted the results of a meta-analysis on motor inhibition and imitation and found separable regions involved for imitation. This is the first meta-analysis to compare the neural correlates of imitation and overt speech. The results are in line with the proposed evolutionary roots of speech in imitation.
Ding, Nai; Patel, Aniruddh D; Chen, Lin; Butler, Henry; Luo, Cheng; Poeppel, David
Speech and music have structured rhythms. Here we discuss a major acoustic correlate of spoken and musical rhythms, the slow (0.25-32Hz) temporal modulations in sound intensity and compare the modulation properties of speech and music. We analyze these modulations using over 25h of speech and over 39h of recordings of Western music. We show that the speech modulation spectrum is highly consistent across 9 languages (including languages with typologically different rhythmic characteristics). A different, but similarly consistent modulation spectrum is observed for music, including classical music played by single instruments of different types, symphonic, jazz, and rock. The temporal modulations of speech and music show broad but well-separated peaks around 5 and 2Hz, respectively. These acoustically dominant time scales may be intrinsic features of speech and music, a possibility which should be investigated using more culturally diverse samples in each domain. Distinct modulation timescales for speech and music could facilitate their perceptual analysis and its neural processing. Copyright © 2017 Elsevier Ltd. All rights reserved.
Hardison, Debra M.
The majority of studies in second-language (L2) speech processing have involved unimodal (i.e., auditory) input; however, in many instances, speech communication involves both visual and auditory sources of information. Some researchers have argued that multimodal speech is the primary mode of speech perception (e.g., Rosenblum 2005). Research on…
Pilling, Michael; Thomas, Sharon
Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…
Morrill, Ryan J.; Paukner, Annika; Ferrari, Pier F.; Ghazanfar, Asif A.
Across all languages studied to date, audiovisual speech exhibits a consistent rhythmic structure. This rhythm is critical to speech perception. Some have suggested that the speech rhythm evolved "de novo" in humans. An alternative account--the one we explored here--is that the rhythm of speech evolved through the modification of rhythmic facial…
In order to obtain articulatory analysis of speech production the model is improved. the standard model, as used in LPC analysis, to a large extent only models the acoustic properties of speech signal as opposed to articulatory modelling of the speech production. In spite of this the LPC model...... is by far the most widely used model in speech technology....
Newbold, Elisabeth Joy; Stackhouse, Joy; Wells, Bill
Standardised tests of whole-word accuracy are popular in the speech pathology and developmental psychology literature as measures of children's speech performance. However, they may not be sensitive enough to measure changes in speech output in children with severe and persisting speech difficulties (SPSD). To identify the best ways of doing this,…
This paper carries on a tentative interpersonal metafunction analysis of Barack Obama's victory speech from the interpersonal metafunction, which aims to help readers understand and evaluate the speech regarding its suitability, thus to provide some guidance for readers to make better speeches. This study has promising implications for speeches as…
Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.
Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…
Huijbregts, M.A.H.; Wooters, Chuck; Ordelman, Roeland J.F.
In this paper we discuss the speech activity detection system that we used for detecting speech regions in the Dutch TRECVID video collection. The system is designed to filter non-speech like music or sound effects out of the signal without the use of predefined non-speech models. Because the system
Peach, Richard K.; Tonkovich, John D.
Reports describing subcortical apraxia of speech (AOS) have received little consideration in the development of recent speech processing models because the speech characteristics of patients with this diagnosis have not been described precisely. We describe a case of AOS with aphasia secondary to basal ganglia hemorrhage. Speech-language symptoms…
Sheffert, Sonya M; Olson, Elizabeth
In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.
Radel, Rémi; Sarrazin, Philippe; Jehu, Marie; Pelletier, Luc
This study examines whether motivation can be primed through unattended speech. Study 1 used a dichotic-listening paradigm and repeated strength measures. In comparison to the baseline condition, in which the unattended channel was only composed by neutral words, the presence of words related to high (low) intensity of motivation led participants to exert more (less) strength when squeezing a hand dynamometer. In a second study, a barely audible conversation was played while participants' attention was mobilized on a demanding task. Participants who were exposed to a conversation depicting intrinsic motivation performed better and persevered longer in a subsequent word-fragment completion task than those exposed to the same conversation made unintelligible. These findings suggest that motivation can be primed without attention. © 2013 The British Psychological Society.
The present study has surveyed post-editor trainees’ views and attitudes before and after the introduction of speech technology as a front end to a computer-aided translation workbench. The aim of the survey was (i) to identify attitudes and perceptions among post-editor trainees before performing...... a post-editing task using automatic speech recognition (ASR); and (ii) to assess the degree to which post-editors’ attitudes and expectations to the use of speech technology changed after actually using it. The survey was based on two questionnaires: the first one administered before the participants...
Children with HIV have critical speech and language issues because the virus manifests itself primarily in the developing central nervous system, sometimes causing speech, motor control, and language disabilities. Language impediments that develop during the second year of life seem to be especially severe. HIV-infected children are also susceptible to recurrent ear infections, which can damage hearing. Developmental issues must be addressed for these children to reach their full potential. A decline in language skills may coincide with or precede other losses in cognitive ability. A speech pathologist can play an important role on a pediatric HIV team. References are included.
This work combines interdisciplinary knowledge and experience from research fields of psychology, linguistics, audio-processing, machine learning, and computer science. The work systematically explores a novel research topic devoted to automated modeling of personality expression from speech. For this aim, it introduces a novel personality assessment questionnaire and presents the results of extensive labeling sessions to annotate the speech data with personality assessments. It provides estimates of the Big 5 personality traits, i.e. openness, conscientiousness, extroversion, agreeableness, and neuroticism. Based on a database built on the questionnaire, the book presents models to tell apart different personality types or classes from speech automatically.
Murphy, Timothy; Olsen, Kristina; Andersen, Christopher; Reichhardt, Line
The first objective of the project is to show how freedom of speech and democracy are dependent on one another in Denmark. The project’s next focal point is to look at how freedom of speech was framed in relation to the Mohammed publications in 2005. To do this, it identifies how freedom of speech was used by many Danish and European newspapers to justify the publications. Arguments against the publications by both the Danish media and the Muslim community (within Denmark and abroad) are also...
Martins Santo, Ana Luisa
This document compares out-of-box performance of three commercially available speech recognition software: Vocapia VoxSigma TM , Google Cloud Speech, and Lime- craft Transcriber. It is defined a set of evaluation criteria and test methods for speech recognition softwares. The evaluation of these softwares in noisy environments are also included for the testing purposes. Recognition accuracy was compared using noisy environments and languages. Testing in ”ideal” non-noisy environment of a quiet room has been also performed for comparison.
Daniella de Almeida Santos
Full Text Available Our paper intends to discuss how the teacher's speech can interfere in the construction of arguments on the part of the students, when they are involved with the task of solving an experimental problem in sciences classes. Thus, we wanted to understand how teacher and students relate to each other in a discursive movement for the senses structuring of the obtained experimental data. With that concern, our focus is in the processes of the speeches authorship, both students' and teachers', in the episodes in which the actors of the teaching and learning process organize their speeches, mediated by the experimental activity.
Zakaria, Mohd Normani; Jalaei, Bahram
Auditory brainstem responses evoked by complex stimuli such as speech syllables have been studied in normal subjects and subjects with compromised auditory functions. The stability of speech-evoked auditory brainstem response (speech-ABR) when tested over time has been reported but the literature is limited. The present study was carried out to determine the test-retest reliability of speech-ABR in healthy children at a low sensation level. Seventeen healthy children (6 boys, 11 girls) aged from 5 to 9 years (mean = 6.8 ± 3.3 years) were tested in two sessions separated by a 3-month period. The stimulus used was a 40-ms syllable /da/ presented at 30 dB sensation level. As revealed by pair t-test and intra-class correlation (ICC) analyses, peak latencies, peak amplitudes and composite onset measures of speech-ABR were found to be highly replicable. Compared to other parameters, higher ICC values were noted for peak latencies of speech-ABR. The present study was the first to report the test-retest reliability of speech-ABR recorded at low stimulation levels in healthy children. Due to its good stability, it can be used as an objective indicator for assessing the effectiveness of auditory rehabilitation in hearing-impaired children in future studies. Copyright © 2017 Elsevier B.V. All rights reserved.
Eskelund, Kasper; Andersen, Tobias
Speech perception is audiovisual as evidenced by the McGurk effect in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate...... of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task and this could increase...... visual detection task. In our first experiment, observers presented with congruent and incongruent audiovisual sine-wave speech stimuli did only show a McGurk effect when informed of the speech nature of the stimulus. Performance on the secondary visual task was very good, thus supporting the finding...
May, Tobias; Bentsen, Thomas; Dau, Torsten
speech and noise activity on the basis of individual time-frequency (T-F) units. One important parameter of the segregation system is the window duration of the analysis-synthesis stage, which determines the lower limit of modulation frequencies that can be represented but also the temporal acuity...... with which the segregation system can manipulate individual T-F units. To clarify the consequences of this trade-off on modulation-based speech segregation performance, the influence of the window duration was systematically investigated...
This book provides a detailed overview of various parameters/factors involved in inventory analysis. It especially focuses on the assessment and modeling of basic inventory parameters, namely demand, procurement cost, cycle time, ordering cost, inventory carrying cost, inventory stock, stock out level, and stock out cost. In the context of economic lot size, it provides equations related to the optimum values. It also discusses why the optimum lot size and optimum total relevant cost are considered to be key decision variables, and uses numerous examples to explain each of these inventory parameters separately. Lastly, it provides detailed information on parameter estimation for different sectors/products. Written in a simple and lucid style, it offers a valuable resource for a broad readership, especially Master of Business Administration (MBA) students.
In order to obtain articulatory analysis of speech production the model is improved. the standard model, as used in LPC analysis, to a large extent only models the acoustic properties of speech signal as opposed to articulatory modelling of the speech production. In spite of this the LPC model...... is by far the most widely used model in speech technology....
Hurkmans, Joost; Jonkers, Roel; de Bruijn, Madeleen; Boonstra, Anne M.; Hartman, Paul P.; Arendzen, Hans; Reinders - Messelink, Heelen
Background: Several studies using musical elements in the treatment of neurological language and speech disorders have reported improvement of speech production. One such programme, Speech-Music Therapy for Aphasia (SMTA), integrates speech therapy and music therapy (MT) to treat the individual with
Cowan, Gloria; Khatchadourian, Desiree
Women are more intolerant of hate speech than men. This study examined relationality measures as mediators of gender differences in the perception of the harm of hate speech and the importance of freedom of speech. Participants were 107 male and 123 female college students. Questionnaires assessed the perceived harm of hate speech, the importance…
Anastasia E. Vishneva
The paper presents the results of the self-assessment study in patients with aphasia and dysarthria after stroke or traumatic brain injury. All the patients were neurorehabilitation course. Self-esteem is considered as important parameter in the study of the psychological status of patients. This article describes the differences in the quantitative and qualitative parameters of self-esteem in patients with various speech defect (motor aphasia, temporal aphasia, dysarthria). The actual self-e...
This dissertation attempts to find the common traits of great speeches. It does so by closely examining the language of some of the most well-known speeches in world. These speeches are presented in the book Speeches that Changed the World (2006) by Simon Sebag Montefiore. The dissertation specifically looks at four variables: The beginnings and endings of the speeches, the use of passive voice, the use of personal pronouns and the difficulty of the language. These four variables are based on...
Mehta, G.; Cutler, A.
Although spontaneous speech occurs more frequently in most listeners’ experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ considerably, however, which suggests that laboratory results may not generalize to the recognition of spontaneous and read speech materials, and their response time to detect word-initial target phonem...
Malmenholt, Ann; Lohmander, Anette; McAllister, Anita
The purpose of this study was to investigate current knowledge of the diagnosis childhood apraxia of speech (CAS) in Sweden and compare speech characteristics and symptoms to those of earlier survey findings in mainly English-speakers. In a web-based questionnaire 178 Swedish speech-language pathologists (SLPs) anonymously answered questions about their perception of typical speech characteristics for CAS. They graded own assessment skills and estimated clinical occurrence. The seven top speech characteristics reported as typical for children with CAS were: inconsistent speech production (85%), sequencing difficulties (71%), oro-motor deficits (63%), vowel errors (62%), voicing errors (61%), consonant cluster deletions (54%), and prosodic disturbance (53%). Motor-programming deficits described as lack of automatization of speech movements were perceived by 82%. All listed characteristics were consistent with the American Speech-Language-Hearing Association (ASHA) consensus-based features, Strand's 10-point checklist, and the diagnostic model proposed by Ozanne. The mode for clinical occurrence was 5%. Number of suspected cases of CAS in the clinical caseload was approximately one new patient/year and SLP. The results support and add to findings from studies of CAS in English-speaking children with similar speech characteristics regarded as typical. Possibly, these findings could contribute to cross-linguistic consensus on CAS characteristics.
Sayyahi, Fateme; Soleymani, Zahra; Akbari, Mohammad; Bijankhan, Mahmood; Dolatshahi, Behrooz
The present study examined the relationship between gap detection threshold and speech error consistency in children with speech sound disorder. The participants were children five to six years of age who were categorized into three groups of typical speech, consistent speech disorder (CSD) and inconsistent speech disorder (ISD).The phonetic gap detection threshold test was used for this study, which is a valid test comprised six syllables with inter-stimulus intervals between 20-300ms. The participants were asked to listen to the recorded stimuli three times and indicate whether they heard one or two sounds. There was no significant difference between the typical and CSD groups (p=0.55), but there were significant differences in performance between the ISD and CSD groups and the ISD and typical groups (p=0.00). The ISD group discriminated between speech sounds at a higher threshold. Children with inconsistent speech errors could not distinguish speech sounds during time-limited phonetic discrimination. It is suggested that inconsistency in speech is a representation of inconsistency in auditory perception, which causes by high gap detection threshold. Copyright © 2016 Elsevier Ltd. All rights reserved.
Hearnshaw, Stephanie; Baker, Elise; Munro, Natalie
To investigate whether Australian-English speaking children with and without speech sound disorder (SSD) differ in their overall speech perception accuracy. Additionally, to investigate differences in the perception of specific phonemes and the association between speech perception and speech production skills. Twenty-five Australian-English speaking children aged 48-60 months participated in this study. The SSD group included 12 children and the typically developing (TD) group included 13 children. Children completed routine speech and language assessments in addition to an experimental Australian-English lexical and phonetic judgement task based on Rvachew's Speech Assessment and Interactive Learning System (SAILS) program (Rvachew, 2009). This task included eight words across four word-initial phonemes-/k, ɹ, ʃ, s/. Children with SSD showed significantly poorer perceptual accuracy on the lexical and phonetic judgement task compared with TD peers. The phonemes /ɹ/ and /s/ were most frequently perceived in error across both groups. Additionally, the phoneme /ɹ/ was most commonly produced in error. There was also a positive correlation between overall speech perception and speech production scores. Children with SSD perceived speech less accurately than their typically developing peers. The findings suggest that an Australian-English variation of a lexical and phonetic judgement task similar to the SAILS program is promising and worthy of a larger scale study. Copyright © 2017 Elsevier Inc. All rights reserved.
Kumar, G Vinodh; Halder, Tamesh; Jaiswal, Amit K; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan
Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk-effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, if and how does the network across the whole brain participates during multisensory perception processing remains an open question. We posit that a large-scale functional connectivity among the neural population situated in distributed brain sites may provide valuable insights involved in processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs was computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300-600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times along with pre-stimulus decreases of lower frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus, our
Parsa, Vijay; Scollie, Susan; Glista, Danielle; Seelisch, Andreas
Frequency lowering technologies offer an alternative amplification solution for severe to profound high frequency hearing losses. While frequency lowering technologies may improve audibility of high frequency sounds, the very nature of this processing can affect the perceived sound quality. This article reports the results from two studies that investigated the impact of a nonlinear frequency compression (NFC) algorithm on perceived sound quality. In the first study, the cutoff frequency and compression ratio parameters of the NFC algorithm were varied, and their effect on the speech quality was measured subjectively with 12 normal hearing adults, 12 normal hearing children, 13 hearing impaired adults, and 9 hearing impaired children. In the second study, 12 normal hearing and 8 hearing impaired adult listeners rated the quality of speech in quiet, speech in noise, and music after processing with a different set of NFC parameters. Results showed that the cutoff frequency parameter had more impact on sound quality ratings than the compression ratio, and that the hearing impaired adults were more tolerant to increased frequency compression than normal hearing adults. No statistically significant differences were found in the sound quality ratings of speech-in-noise and music stimuli processed through various NFC settings by hearing impaired listeners. These findings suggest that there may be an acceptable range of NFC settings for hearing impaired individuals where sound quality is not adversely affected. These results may assist an Audiologist in clinical NFC hearing aid fittings for achieving a balance between high frequency audibility and sound quality.
"Ultra Low Bit-Rate Speech Coding" focuses on the specialized topic of speech coding at very low bit-rates of 1 Kbits/sec and less, particularly at the lower ends of this range, down to 100 bps. The authors set forth the fundamental results and trends that form the basis for such ultra low bit-rates to be viable and provide a comprehensive overview of various techniques and systems in literature to date, with particular attention to their work in the paradigm of unit-selection based segment quantization. The book is for research students, academic faculty and researchers, and industry practitioners in the areas of speech processing and speech coding.
Liu, Fu-Hua; Stern, Richard M; Huang, Xuedong; Acero, Alejandro
In this paper we describe and compare the performance of a series of cepstrum-based procedures that enable the CMU SPHINX-II speech recognition system to maintain a high level of recognition accuracy...
domain or in the frequency domain. However their .... computer to speech analysis led to important elaborations ... tool for the estimation of formant trajectory (10), ... prediction Linear prediction In effect determines the filter .... Radio Res. Lab.
...". The underlying thesis is that the auditory periphery contributes to the robust performance of humans in speech reception in noise through a concerted contribution of the efferent feedback system...
Hall, M.; Smeele, P.M.T.; Kuhl, P.K.
The integration of auditory and visual speech is observed when modes specify different places of articulation. Influences of auditory variation on integration were examined using consonant identifi-cation, plus quality and similarity ratings. Auditory identification predicted auditory-visual
The supported research provides a careful examination of the many different interrelated factors, processes, and constructs important to the perception by humans of complex acoustic signals, including speech and music...
Full Text Available This study describes the methodology used for designing a database of speech under real stress. Based on limits of existing stress databases, we used a communication task via a computer game to collect speech data. To validate the presence of stress, known psychophysiological indicators such as heart rate and electrodermal activity, as well as subjective self-assessment were used. This paper presents the data from first 5 speakers (3 men, 2 women who participated in initial tests of the proposed design. In 4 out of 5 speakers increases in fundamental frequency and intensity of speech were registered. Similarly, in 4 out of 5 speakers heart rate was significantly increased during the task, when compared with reference measurement from before the task. These first results show that proposed design might be appropriate for building a speech under stress database. However, there are still considerations that need to be addressed.
Children vary in their development of speech and language skills. Health care professionals have lists of milestones ... normal. These milestones help figure out whether a child is on track or if he or she ...
, in the light of the rich and complex Danish sound system. The first two studies report on native adults’ perception of Danish speech sounds in quiet and noise. The third study examined the development of language-specific perception in native Danish infants at 6, 9 and 12 months of age. The book points......Little is known about the perception of speech sounds by native Danish listeners. However, the Danish sound system differs in several interesting ways from the sound systems of other languages. For instance, Danish is characterized, among other features, by a rich vowel inventory and by different...... reductions of speech sounds evident in the pronunciation of the language. This book (originally a PhD thesis) consists of three studies based on the results of two experiments. The experiments were designed to provide knowledge of the perception of Danish speech sounds by Danish adults and infants...
National Aeronautics and Space Administration — The Pitch Synchronous Segmentation (PSS) that accelerates speech without changing its fundamental frequency method could be applied and evaluated for use at NASA....
Nussbaum, Debra B.; Waddy-Smith, Bettie
The article suggests three strategies for the audiologist and speech/communication specialist to use in assisting the preschool teacher to implement student's individualized education program: (1) demonstration teaming, (2) dual teaming; and (3) rotation teaming. (CL)
Silvia F. Hitos
Conclusion: Mouth breathing can affect speech development, socialization, and school performance. Early detection of mouth breathing is essential to prevent and minimize its negative effects on the overall development of individuals.
(i) Feature extractor that extracts the relevant information from the speech ..... extraction we are able to approximately account for this variability across listeners. ..... scores (similar to Zissman (1996)) before making a decision about the word.
Mowlaee, Pejman; Christensen, Mads Græsbøll; Jensen, Søren Holdt
We present new results on single-channel speech separation and suggest a new separation approach to improve the speech quality of separated signals from an observed mix- ture. The key idea is to derive a mixture estimator based on sinusoidal parameters. The proposed estimator is aimed at ﬁnding...... mixture estimator used in binary masks and the Wiener ﬁltering approach, it is observed that the proposed method achieves an acceptable perceptual speech quality with less cross- talk at different signal-to-signal ratios. Moreover, the method is independent of pitch estimates and reduces the computational...... complexity of the separation by replacing the short-time Fourier transform (STFT) feature vectors of high dimensionality with sinusoidal feature vectors. We report separation results for the proposed method and compare them with respect to other benchmark methods. The improvements made by applying...
toward the monolingual English 25 msec value. Miyawaki et a]. (1975) investigated the /ra/ - /la/ continuum with English and Japanese speakers...Standard Dictionary In order to evaluate some of the claims of the learning theory of speech recognition, a computer model was developed. The NEXus...discrimination of synthetic vowels. Language and Speech, 1962, 5, 171-189. Funk and Wagnalls New Standard Dictionary of the English Language. New York: Funk and
Bailly, G.; Theune, Mariet; Meijs, Koen; Campbell, N.; Hamza, W.; Heylen, Dirk K.J.; Ordelman, Roeland J.F.; Hoge, H.; Jianhua, T.
Work on expressive speech synthesis has long focused on the expression of basic emotions. In recent years, however, interest in other expressive styles has been increasing. The research presented in this paper aims at the generation of a storytelling speaking style, which is suitable for storytelling applications and more in general, for applications aimed at children. Based on an analysis of human storytellers' speech, we designed and implemented a set of prosodic rules for converting "neutr...
Ramzi A. Haraty; Omar El Ariss
The research proposed here was for an Arabic speech recognition application, concentrating on the Lebanese dialect. The system starts by sampling the speech, which was the process of transforming the sound from analog to digital and then extracts the features by using the Mel-Frequency Cepstral Coefficients (MFCC). The extracted features are then compared with the system's stored model; in this case the stored model chosen was a phoneme-based model. This reference model differs from the direc...
Despite many years of research, Speech Recognition remains an active area of research in Artificial Intelligence. Currently, the most common commercial application of this technology on mobile devices uses a wireless client – server approach to meet the computational and memory demands of the speech recognition process. Unfortunately, such an approach is unlikely to remain viable when fully applied over the approximately 7.22 Billion mobile phones currently in circulation. In this thesis we p...
Full Text Available The article deals with the basic speech tactics used in mass media discourse. It has been stated that such tactics as contact establishment and speech interaction termination, yielding up initiative or its preserving are compulsory for the communicative situation of a talk show. Language personalities of television talk shows anchors and linguistic ways of the interview organisation are stressed. The material is amply illustrated with relevant examples.
Freedom of speech is a fundamental liberty that imposes a stringent duty of tolerance. Tolerance is limited by direct incitements to violence. False notions and bad laws on speech have obscured our view of this freedom. Hence, perhaps, the self-righteous intolerance, incitements and threats in response to Giubilini and Minerva. Those who disagree have the right to argue back but their attempts to shut us up are morally wrong.
Cera, Maysa Luchesi; Ortiz, Karin Zazo; Bertolucci, Paulo Henrique Ferreira; Minett, Thaís Soares Cianciarullo
Alzheimer's disease (AD) affects not only memory but also other cognitive functions, such as orientation, language, praxis, attention, visual perception, or executive function. Most studies on oral communication in AD focus on aphasia; however, speech and orofacial apraxias are also present in these patients. The aim of this study was to investigate the presence of speech and orofacial apraxias in patients with AD with the hypothesis that apraxia severity is strongly correlated with disease severity. Ninety participants in different stages of AD (mild, moderate, and severe) underwent the following assessments: Clinical Dementia Rating, Mini-Mental State Examination, Lawton Instrumental Activities of Daily Living, a specific speech and orofacial praxis assessment, and the oral agility subtest of the Boston diagnostic aphasia examination. The mean age was 80.2 ± 7.2 years and 73% were women. Patients with AD had significantly lower scores than normal controls for speech praxis (mean difference=-2.9, 95% confidence interval (CI)=-3.3 to -2.4) and orofacial praxis (mean difference=-4.9, 95% CI=-5.4 to -4.3). Dementia severity was significantly associated with orofacial apraxia severity (moderate AD: β =-19.63, p= 0.011; and severe AD: β =-51.68, p speech apraxia severity (moderate AD: β = 7.07, p = 0.001; and severe AD: β =8.16, p Speech and orofacial apraxias were evident in patients with AD and became more pronounced with disease progression.
Falk, Simone; Lanzilotti, Cosima; Schön, Daniele
Musical rhythm positively impacts on subsequent speech processing. However, the neural mechanisms underlying this phenomenon are so far unclear. We investigated whether carryover effects from a preceding musical cue to a speech stimulus result from a continuation of neural phase entrainment to periodicities that are present in both music and speech. Participants listened and memorized French metrical sentences that contained (quasi-)periodic recurrences of accents and syllables. Speech stimuli were preceded by a rhythmically regular or irregular musical cue. Our results show that the presence of a regular cue modulates neural response as estimated by EEG power spectral density, intertrial coherence, and source analyses at critical frequencies during speech processing compared with the irregular condition. Importantly, intertrial coherences for regular cues were indicative of the participants' success in memorizing the subsequent speech stimuli. These findings underscore the highly adaptive nature of neural phase entrainment across fundamentally different auditory stimuli. They also support current models of neural phase entrainment as a tool of predictive timing and attentional selection across cognitive domains.
Bodnaruk Elena Vladimirovna
Full Text Available The article analyzes the characteristics and use of grammatical semantics and lexical and grammatical means used to create future prospects in double indirect discourse. The material for the study were epic works by contemporary German writers. In the analysis of the empirical material it has been pointed out that indirect discourse has preterial basis and is the kind of most frequent inner speech of characters. The most widely used form with future semantics in preterial indirect speech is conditional I, formally having a conjunctive basis, but is mostly used with the indicative semantics. Competitive to conditional I in indirect speech is preterial indicative. A characteristic feature of the indirect speech is the use of modal verbs, which, thanks to its semantics is usually referred as an action at a later term, creating the prospect of future statements. The most frequent were modal verbs wollen and sollen in the form of the preterite, more rare verbs were m ssen and k nnen. German indirect speech distinguishes the ability to use forms on the basis of conjunctive: preterite and plusquamperfect of conjunctive. Both forms express values similar to those of the indicative. However, conjunctive forms the basis of the data shown in a slightly more pronounced seme of uncertainty that accompanies future uses of these forms in indirect speech. In addition, plusquamperfect conjunctive differs from others by the presence of the seme of completeness.
Jadir Mauro Galvao
Full Text Available The theme of sustainability has not yet achieved the feat of make up as an integral part the theoretical medley that brings out our most everyday actions, often visits some of our thoughts and permeates many of our speeches. The big event of 2012, the meeting gathered Rio +20 glances from all corners of the planet around that theme as burning, but we still see forward timidly. Although we have no very clear what the term sustainability closes it does not sound quite strange. Associate with things like ecology, planet, wastes emitted by smokestacks of factories, deforestation, recycling and global warming must be related, but our goal in this article is the least of clarifying the term conceptually and more try to observe as it appears in speeches of such conference. When the competent authorities talk about sustainability relate to what? We intend to investigate the lines and between the lines of these speeches, any assumptions associated with the term. Therefore we will analyze the speech of the People´s Summit, the opening speech of President Dilma and emblematic speech of the President of Uruguay, José Pepe Mujica.
Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean
Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode. © 2013 Published by Elsevier Ltd.
Objective The Internet provides the general public with information about speech pathology services, including client groups and service delivery models, as well as the professionals providing the services. Although this information assists the general public and other professionals to both access and understand speech pathology services, it also potentially provides information about speech pathology as a prospective career, including the types of people who are speech pathologists (i.e. demographics). The aim of the present study was to collect baseline data on how the speech pathology profession was presented via images on the Internet. Methods A pilot prospective observational study using content analysis methodology was conducted to analyse publicly available Internet images related to the speech pathology profession. The terms 'Speech Pathology' and 'speech pathologist' to represent both the profession and the professional were used, resulting in the identification of 200 images. These images were considered across a range of areas, including who was in the image (e.g. professional, client, significant other), the technology used and the types of intervention. Results The majority of images showed both a client and a professional (i.e. speech pathologist). While the professional was predominantly presented as female, the gender of the client was more evenly distributed. The clients were more likely to be preschool or school aged, however male speech pathologists were presented as providing therapy to selected age groups (i.e. school aged and younger adults). Images were predominantly of individual therapy and the few group images that were presented were all paediatric. Conclusion Current images of speech pathology continue to portray narrow professional demographics and client groups (e.g. paediatrics). Promoting images of wider scope to fully represent the depth and breadth of speech pathology professional practice may assist in attracting a more diverse
Somanath, Keerthan; Mau, Ted
(1) To develop an automated algorithm to analyze electroglottographic (EGG) signal in continuous dysphonic speech, and (2) to identify EGG waveform parameters that correlate with the auditory-perceptual quality of strain in the speech of patients with adductor spasmodic dysphonia (ADSD). Software development with application in a prospective controlled study. EGG was recorded from 12 normal speakers and 12 subjects with ADSD reading excerpts from the Rainbow Passage. Data were processed by a new algorithm developed with the specific goal of analyzing continuous dysphonic speech. The contact quotient, pulse width, a new parameter peak skew, and various contact closing slope quotient and contact opening slope quotient measures were extracted. EGG parameters were compared between normal and ADSD speech. Within the ADSD group, intra-subject comparison was also made between perceptually strained syllables and unstrained syllables. The opening slope quotient SO7525 distinguished strained syllables from unstrained syllables in continuous speech within individual subjects with ADSD. The standard deviations, but not the means, of contact quotient, EGGW50, peak skew, and SO7525 were different between normal and ADSD speakers. The strain-stress pattern in continuous speech can be visualized as color gradients based on the variation of EGG parameter values. EGG parameters may provide a within-subject measure of vocal strain and serve as a marker for treatment response. The addition of EGG to multidimensional assessment may lead to improved characterization of the voice disturbance in ADSD. Copyright Â© 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias S
Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech signal. Here, we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers informed of its speech-like nature recognize as speech. While the McGurk illusion only occurred for informed observers, the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multistage account of audiovisual integration of speech in which the many attributes of the audiovisual speech signal are integrated by separate integration processes.
Full Text Available According to a traditional view, speech perception and production are processed largely separately in sensory and motor brain areas. Recent psycholinguistic and neuroimaging studies provide novel evidence that the sensory and motor systems dynamically interact in speech processing, by demonstrating that speech perception and imitation share regional brain activations. However, the exact nature and mechanisms of these sensorimotor interactions are not completely understood yet.Transcranial magnetic stimulation (TMS has often been used in the cognitive neurosciences, including speech research, as a complementary technique to behavioral and neuroimaging studies. Here we provide an up-to-date review focusing on TMS studies that explored speech perception and imitation.Single-pulse TMS of the primary motor cortex (M1 demonstrated a speech specific and somatotopically specific increase of excitability of the M1 lip area during speech perception (listening to speech or lip reading. A paired-coil TMS approach showed increases in effective connectivity from brain regions that are involved in speech processing to the M1 lip area when listening to speech. TMS in virtual lesion mode applied to speech processing areas modulated performance of phonological recognition and imitation of perceived speech.In summary, TMS is an innovative tool to investigate processing of speech perception and imitation. TMS studies have provided strong evidence that the sensory system is critically involved in mapping sensory input onto motor output and that the motor system plays an important role in speech perception.
Heinen, Esther; Birkholz, Peter; Willmes, Klaus; Neuschaefer-Rube, Christiane
To explore possible effects of tongue piercing on perceived speech quality. Using a quasi-experimental design, we analyzed the effect of tongue piercing on speech in a perception experiment. Samples of spontaneous speech and read speech were recorded from 20 long-term pierced and 20 non-pierced individuals (10 males, 10 females each). The individuals having a tongue piercing were recorded with attached and removed piercing. The audio samples were blindly rated by 26 female and 20 male laypersons and by 5 female speech-language pathologists with regard to perceived speech quality along 5 dimensions: speech clarity, speech rate, prosody, rhythm and fluency. We found no statistically significant differences for any of the speech quality dimensions between the pierced and non-pierced individuals, neither for the read nor for the spontaneous speech. In addition, neither length nor position of piercing had a significant effect on speech quality. The removal of tongue piercings had no effects on speech performance either. Rating differences between laypersons and speech-language pathologists were not dependent on the presence of a tongue piercing. People are able to perfectly adapt their articulation to long-term tongue piercings such that their speech quality is not perceptually affected.
Mehta, G; Cutler, A
Although spontaneous speech occurs more frequently in most listeners' experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ considerably, however, which suggests that laboratory results may not generalise to the recognition of spontaneous speech. In the present study listeners were presented with both spontaneous and read speech materials, and their response time to detect word-initial target phonemes was measured. Responses were, overall, equally fast in each speech mode. However, analysis of effects previously reported in phoneme detection studies revealed significant differences between speech modes. In read speech but not in spontaneous speech, later targets were detected more rapidly than targets preceded by short words. In contrast, in spontaneous speech but not in read speech, targets were detected more rapidly in accented than in unaccented words and in strong than in weak syllables. An explanation for this pattern is offered in terms of characteristic prosodic differences between spontaneous and read speech. The results support claims from previous work that listeners pay great attention to prosodic information in the process of recognising speech.
Pinker, Steven; Nowak, Martin A.; Lee, James J.
When people speak, they often insinuate their intent indirectly rather than stating it as a bald proposition. Examples include sexual come-ons, veiled threats, polite requests, and concealed bribes. We propose a three-part theory of indirect speech, based on the idea that human communication involves a mixture of cooperation and conflict. First, indirect requests allow for plausible deniability, in which a cooperative listener can accept the request, but an uncooperative one cannot react adversarially to it. This intuition is supported by a game-theoretic model that predicts the costs and benefits to a speaker of direct and indirect requests. Second, language has two functions: to convey information and to negotiate the type of relationship holding between speaker and hearer (in particular, dominance, communality, or reciprocity). The emotional costs of a mismatch in the assumed relationship type can create a need for plausible deniability and, thereby, select for indirectness even when there are no tangible costs. Third, people perceive language as a digital medium, which allows a sentence to generate common knowledge, to propagate a message with high fidelity, and to serve as a reference point in coordination games. This feature makes an indirect request qualitatively different from a direct one even when the speaker and listener can infer each other's intentions with high confidence. PMID:18199841
Davidow, Jason H.
Background: Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control…
Terband, H.; Maassen, B.
Childhood apraxia of speech (CAS) is a highly controversial clinical entity, with respect to both clinical signs and underlying neuromotor deficit. In the current paper, we advocate a modeling approach in which a computational neural model of speech acquisition and production is utilized in order to
Terband, H.R.; Maassen, B.A.M.
Childhood apraxia of speech (CAS) is a highly controversial clinical entity, with respect to both clinical signs and underlying neuromotor deficit. In the current paper, we advocate a modeling approach in which a computational neural model of speech acquisition and production is utilized in order to
Mailend, Marja-Liisa; Maas, Edwin
Purpose: Apraxia of speech (AOS) is considered a speech motor programming impairment, but the specific nature of the impairment remains a matter of debate. This study investigated 2 hypotheses about the underlying impairment in AOS framed within the Directions Into Velocities of Articulators (DIVA; Guenther, Ghosh, & Tourville, 2006) model: The…
Sneed, Don; Stonecipher, Harry W.
The ultimate test of the speech-action dichotomy, as it relates to symbolic speech to be considered by the courts, may be the fasting of prison inmates who use hunger strikes to protest the conditions of their confinement or to make political statements. While hunger strikes have been utilized by prisoners for years as a means of protest, it was…
Lametti, Daniel R.; Rochet-Capellan, Amélie; Neufeld, Emily; Shiller, Douglas M.
Recent studies of human speech motor learning suggest that learning is accompanied by changes in auditory perception. But what drives the perceptual change? Is it a consequence of changes in the motor system? Or is it a result of sensory inflow during learning? Here, subjects participated in a speech motor-learning task involving adaptation to altered auditory feedback and they were subsequently tested for perceptual change. In two separate experiments, involving two different auditory perceptual continua, we show that changes in the speech motor system that accompany learning drive changes in auditory speech perception. Specifically, we obtained changes in speech perception when adaptation to altered auditory feedback led to speech production that fell into the phonetic range of the speech perceptual tests. However, a similar change in perception was not observed when the auditory feedback that subjects' received during learning fell into the phonetic range of the perceptual tests. This indicates that the central motor outflow associated with vocal sensorimotor adaptation drives changes to the perceptual classification of speech sounds. PMID:25080594
Full Text Available One of the goals of text-to-speech (TTS) systems is to produce natural-sounding synthesised speech. Towards this end various natural language processing (NLP) tasks are performed to model the prosodic aspects of the TTS voice. One of the fundamental...
Ygual-Fernandez, A; Cervera-Merida, J F
In the treatment of speech disorders by means of speech therapy two antagonistic methodological approaches are applied: non-verbal ones, based on oral motor exercises (OME), and verbal ones, which are based on speech processing tasks with syllables, phonemes and words. In Spain, OME programmes are called 'programas de praxias', and are widely used and valued by speech therapists. To review the studies conducted on the effectiveness of OME-based treatments applied to children with speech disorders and the theoretical arguments that could justify, or not, their usefulness. Over the last few decades evidence has been gathered about the lack of efficacy of this approach to treat developmental speech disorders and pronunciation problems in populations without any neurological alteration of motor functioning. The American Speech-Language-Hearing Association has advised against its use taking into account the principles of evidence-based practice. The knowledge gathered to date on motor control shows that the pattern of mobility and its corresponding organisation in the brain are different in speech and other non-verbal functions linked to nutrition and breathing. Neither the studies on their effectiveness nor the arguments based on motor control studies recommend the use of OME-based programmes for the treatment of pronunciation problems in children with developmental language disorders.
Wijngaarden, S.J. van; Bronkhorst, A.W.; Houtgast, T.; Steeneken, H.J.M.
While the Speech Transmission Index ~STI! is widely applied for prediction of speech intelligibility in room acoustics and telecommunication engineering, it is unclear how to interpret STI values when non-native talkers or listeners are involved. Based on subjectively measured psychometric functions
Terband, H.; Maassen, B.; van Lieshout, P.; Nijland, L.
The aim of this study was to investigate the consistency and composition of functional synergies for speech movements in children with developmental speech disorders. Kinematic data were collected on the reiterated productions of syllables spa (/spa:/) and paas (/pa:s/) by 10 6- to 9-year-olds with
Vích, Robert; Nouza, J.; Vondra, Martin
-, č. 5042 (2008), s. 136-148 ISSN 0302-9743 R&D Projects: GA AV ČR 1ET301710509; GA AV ČR 1QS108040569 Institutional research plan: CEZ:AV0Z20670512 Keywords : speech recognition * speech processing Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering
Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato
We investigated whether audiovisual synchrony perception for speech could change after observation of the audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is re-calibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, participants performed an audiovisual simultaneity judgment task (i.e., a direct measurement of the audiovisual synchrony perception) in terms of the speech signal after observation of the speech stimuli which had a constant audiovisual lag. The results showed that the “simultaneous” responses (i.e., proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, we adopted the McGurk identification task (i.e., an indirect measurement of the audiovisual synchrony perception) to exclude the possibility that this modulation of synchrony perception was solely attributable to the response strategy using stimuli identical to those of Experiment 1. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, it was shown that audiovisual synchrony perception for speech could be modulated following exposure to constant lag both in direct and indirect measurement. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.